Python Pandas – Find elements (substrings) in the same column

I have a string column ('b') and I want to find the strings that are substrings of other strings in the same column. For example, in column 'b' of the data frame below, world is a substring of helloworld and ness is a substring of greatness. I want to get the strings world and ness in a list. Can you suggest a solution?

     a           b
0  test       world
1  teat  helloworld
2   gor         bye
3   jhr   greatness
4   fre        ness

The expected output, as a list:

listofsubstrings
Out[353]: ['world', 'ness']

Solution

You can use:

import pandas as pd
from itertools import product

# get unique values only
b = df.b.unique()
# create all pairwise combinations
df1 = pd.DataFrame(list(product(b, b)), columns=['a', 'b'])
# keep pairs where a is contained in b, excluding identical strings
df1 = df1[df1.apply(lambda x: x.a in x.b, axis=1) & (df1.a != df1.b)]
print (df1)
        a           b
1   world  helloworld
23   ness   greatness

print (df1.a.tolist())
['world', 'ness']
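
For a small column, the same pairwise check can also be written without building an intermediate DataFrame. The following is only a minimal sketch, not part of the original answer: the helper name `find_substrings` is hypothetical, and it works on a plain list of strings, which is assumed to come from something like `df['b'].tolist()`.

```python
def find_substrings(values):
    """Return the strings that occur inside another, different string."""
    # Drop duplicates while keeping the original order
    # (hypothetical helper, introduced for illustration only).
    unique = list(dict.fromkeys(values))
    # Keep s if at least one other string t contains it.
    return [s for s in unique if any(s != t and s in t for t in unique)]


# Sample values copied from column 'b' in the question.
column_b = ['world', 'helloworld', 'bye', 'greatness', 'ness']
listofsubstrings = find_substrings(column_b)
print(listofsubstrings)  # ['world', 'ness']
```

Like the `product` approach, this compares every pair of unique values, so the cost grows quadratically with the number of distinct strings.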

Alternative solution using a cross join:

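# add a constant key so that every row is paired with every other row
df['tmp'] = 1
df1 = pd.merge(df[['b','tmp']], df[['b','tmp']], on='tmp')
# keep pairs where b_x is contained in b_y, excluding identical strings
df1 = df1[df1.apply(lambda x: x.b_x in x.b_y, axis=1) & (df1.b_x != df1.b_y)]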
print (df1)
      b_x  tmp         b_y
1   world    1  helloworld
23   ness    1   greatness

print (df1.b_x.tolist())
['world', 'ness']
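
On pandas 1.2 or later, the temporary key column should not be needed, because `merge` accepts `how='cross'`. The sketch below is an assumption-laden variant rather than the original answer's code: it rebuilds the sample frame from the question, and the names `unique_b`, `pairs`, `mask` and `result` are introduced here for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    'a': ['test', 'teat', 'gor', 'jhr', 'fre'],
    'b': ['world', 'helloworld', 'bye', 'greatness', 'ness'],
})

# One row per distinct value of 'b'.
unique_b = df[['b']].drop_duplicates()

# Cross join: every value paired with every value (needs pandas >= 1.2).
# The overlapping column name 'b' gets the suffixes _x and _y.
pairs = unique_b.merge(unique_b, how='cross', suffixes=('_x', '_y'))

# True where b_x is a proper substring match inside a different b_y.
mask = [x != y and x in y for x, y in zip(pairs['b_x'], pairs['b_y'])]
result = pairs.loc[mask, 'b_x'].tolist()
print(result)
```

If one value is contained in several other strings, it appears once per match in `result`; passing the list through `dict.fromkeys` would remove such repeats.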
