How to identify .onion links in text?
How do I recognize .onion links in text, bearing in mind that they can appear in several forms:
hfajlhfjkdsflkdsja.onion
http://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
I'm thinking of using a regular expression, but (.*?.onion) returns the entire paragraph that contains the link.
Solution
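Use this pattern: (?:https?://)?(?:www)?(\S*?\.onion)\b
The original pattern overmatches because . matches any character except a newline, including spaces, so .*? starts at the beginning of the paragraph. \S matches only non-whitespace characters, which keeps the match within a single token.
(Non-capturing groups added; credit: @WiktorStribiżew.)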
Demo:
import re

s = '''hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
https://www.google.com
https://stackoverflow.com'''

# group(0) is the whole match, including any scheme and "www" prefix
for m in re.finditer(r'(?:https?://)?(?:www)?(\S*?\.onion)\b', s, re.M | re.IGNORECASE):
    print(m.group(0))
Output
hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
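
The question was about links embedded in running text, so here is a minimal sketch of the same idea wrapped in a helper. The names ONION_RE and find_onion_links and the sample paragraph are made up for illustration. The pattern is also a slight variation on the one above: it uses www\. instead of www, on the assumption that you want group(1) to hold the bare host without a leading dot.

```python
import re

# Variation on the answer's pattern (an assumption, not the original):
# "www\." consumes the dot too, so group(1) is the bare .onion host.
ONION_RE = re.compile(r'(?:https?://)?(?:www\.)?(\S*?\.onion)\b', re.IGNORECASE)

def find_onion_links(text):
    """Return (full_match, host) pairs for each .onion link found in text."""
    return [(m.group(0), m.group(1)) for m in ONION_RE.finditer(text)]

# Hypothetical sample paragraph mixing .onion links with ordinary URLs.
paragraph = ("Mirror at http://www.hfajlhfjkdsflkdsja.onion and also "
             "hfajlhfjkdsflkdsja.onion, see https://stackoverflow.com for help.")

for full, host in find_onion_links(paragraph):
    print(full, '->', host)
```

Because \S*? cannot cross whitespace, each match stays inside a single token, but leading punctuation attached to a link (for example an opening parenthesis) would still be included in the match and may need stripping.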