Python – How to identify .onion links in text?

How do I recognize .onion links in text, given that they can appear in several forms:

hfajlhfjkdsflkdsja.onion
http://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion

I’m thinking of using a regular expression, but (.*?.onion) returns the entire paragraph in which the link appears.
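The behaviour described in the question can be reproduced with a short sketch. The sample sentence is hypothetical; the point is that `.` also matches spaces, so the lazy `.*?` begins at the start of the text and grows until it reaches `.onion`.

```python
import re

# Hypothetical sample text, used only for illustration
text = "Some intro words then http://hfajlhfjkdsflkdsja.onion appears here."

# "." matches any character except a newline, spaces included,
# so the match begins at the first character of the line.
m = re.search(r'(.*?.onion)', text)
print(m.group(1))
# → Some intro words then http://hfajlhfjkdsflkdsja.onion
```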

Solution

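Use this pattern: (?:https?://)?(?:www)?(\S*?\.onion)\b (non-capturing groups added; credit: @WiktorStribiżew). Unlike .*?, \S*? matches only non-whitespace characters, so a match cannot contain whitespace and cannot take in the rest of the paragraph.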

Demo:

import re

s = '''hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
https://www.google.com
https://stackoverflow.com'''

# group(0) is the whole match, including any scheme and "www" prefix
for m in re.finditer(r'(?:https?://)?(?:www)?(\S*?\.onion)\b', s, re.M | re.IGNORECASE):
    print(m.group(0))

Output

hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
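
If only the host name is wanted rather than the whole match, a small variant of the pattern may help. This is a minimal sketch and not part of the original answer: the helper name `find_onion_hosts` is hypothetical, and writing `www\.` instead of `www` (so that the dot after "www" is consumed before the capture group starts) is an assumption made for illustration.

```python
import re

# Variant of the answer's pattern (an assumption, not the original):
# "www\." also consumes the dot, so group 1 holds only the host name.
ONION_RE = re.compile(r'(?:https?://)?(?:www\.)?(\S*?\.onion)\b', re.IGNORECASE)


def find_onion_hosts(text):
    """Return the captured .onion host names found in text, in order."""
    return [m.group(1) for m in ONION_RE.finditer(text)]


sample = '''hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
https://www.google.com'''

print(find_onion_hosts(sample))
# → ['hfajlhfjkdsflkdsja.onion', 'hfajlhfjkdsflkdsja.onion', 'hfajlhfjkdsflkdsja.onion']
```

Because `\S*?` accepts any non-whitespace character, punctuation directly before a link, such as an opening parenthesis, would still end up in the captured text. A narrower character class could be substituted if that matters for the input at hand.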
