A Pythonic method that compares a list of words to a list of sentences and prints matching rows


I’m currently cleaning up our database and it’s getting very time-consuming. The typical "for email in emails:" loop isn’t nearly fast enough.

For example, I’m currently comparing a list of 230,000 emails against a full record list of 39,000,000 lines. It takes several hours to match these emails to the record lines they belong to and print them out. Does anyone know how to use threading to speed this up? Something like this, on the other hand, is super fast:

strings = ("string1", "string2", "string3")
for line in file:
    if any(s in line for s in strings):
        print("yay!")

But that never prints the matching line, only the needle.

Thanks in advance

Solution

One possibility is to store the emails in a set. Checking whether a word is in emails then takes O(1) time on average, so the total work is proportional to the number of words in the file:

emails = {"string1", "string2", "string3"} # this is a set

for line in f:
    if any(word in emails for word in line.split()):
        print("yay!")

Your original approach is O(nm) for n words and m emails, whereas the set-based version is O(n).
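Since the question asks for the matching line rather than a fixed message, the same set lookup can be wrapped in a small generator that yields the line itself. This is only a sketch under a few assumptions: the function name matching_lines and the sample data are hypothetical, and each email is assumed to appear in a record as its own whitespace-separated token (for comma-separated records you would split on the comma instead).

```python
import io


def matching_lines(lines, emails):
    """Yield every line that contains at least one known email as a token."""
    wanted = set(emails)  # set membership checks are O(1) on average
    for line in lines:
        # any() stops at the first token that is found in the set
        if any(word in wanted for word in line.split()):
            yield line.rstrip("\n")


# Hypothetical sample data standing in for the email list and the record file.
emails = ["a@example.com", "b@example.com"]
records = io.StringIO(
    "1 a@example.com Alice\n"
    "2 c@example.com Carol\n"
    "3 b@example.com Bob\n"
)

for line in matching_lines(records, emails):
    print(line)
```

With the sample data above, this prints the first and third records. For a real file you would pass the open file object in place of records, so the 39,000,000 lines are streamed one at a time instead of being loaded into memory.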
