Python – Use isin() to determine what should be printed

Now I have two dataframes (data1 and data2).

I want to print a list of string values from a dataframe, based on whether the ID exists in both data2 and data1.

What I'm doing now gives me a list of boolean values (True or False, depending on whether the ID exists in both dataframes) instead of the values in the string column.

print(data2['id'].isin(data1.id).to_string())

This yields:

0      True
1      True
2      True
3      True
4      True
5      True

Any ideas would be appreciated.

Here is a sample of data1:

‘user_id’, ‘id’, ‘rating’, ‘unix_timestamp’

196 242 3   881250949
186 302 3   891717742
22  377 1   878887116

data2 looks like this:

‘id’, ‘title’, ‘release date’, ‘video_release_date’, ‘imdb_url’

37|Nadja (1994)|01-Jan-1994||http://us.imdb.com/M/title-exact?Nadja%20(1994)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
38|Net, The (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Net,%20The%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|1|0|0
39|Strange Days (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Strange%20Days%20(1995)|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|1|0|0|0

Solution

If all values of id are unique:

I think you need merge with an inner join. Select only the id column from data2. The on parameter can be omitted, because by default merge joins on all columns the two frames have in common, which here is only id:

df = pd.merge(data1, data2[['id']])

Example:

data1 = pd.DataFrame({'id':list('abcdef'),
                      'B':[4,5,4,5,5,4],
                      'C':[7,8,9,4,2,3]})

print (data1)
   B  C id
0  4  7  a
1  5  8  b
2  4  9  c
3  5  4  d
4  5  2  e
5  4  3  f

data2 = pd.DataFrame({'id':list('frcdeg'),
                      'D':[1,3,5,7,1,0],
                      'E':[5,3,6,9,2,4],})

print (data2)
   D  E id
0  1  5  f
1  3  3  r
2  5  6  c
3  7  9  d
4  1  2  e
5  0  4  g

df = pd.merge(data1, data2[['id']])
print (df)
   B  C id
0  4  9  c
1  5  4  d
2  5  2  e
3  4  3  f
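
The call above relies on the defaults of merge. As a minimal sketch, reusing the sample frames from this example, the same join can be written with the defaults spelled out (how='inner', on='id'). The variable names implicit, explicit, same and matched_ids are only illustrative:

```python
import pandas as pd

data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
data2 = pd.DataFrame({'id': list('frcdeg'),
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})

# Default call: inner join on the only shared column, 'id'
implicit = pd.merge(data1, data2[['id']])

# Same join with the defaults written out
explicit = pd.merge(data1, data2[['id']], on='id', how='inner')

same = implicit.equals(explicit)
matched_ids = explicit['id'].tolist()
print(same)
print(matched_ids)
```

Writing on and how explicitly can make the intent easier to read, and it guards against the join silently changing if data2 later gains another column that data1 also has.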

If id values are duplicated in one or the other DataFrame, use one of these similar solutions instead:

df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]

ids = set(data1['id']) & set(data2['id'])
df = data1.query('id in @ids')

df = data1[np.isin(data1['id'], np.intersect1d(data1['id'], data2['id']))]

Example:

data1 = pd.DataFrame({'id':list('abcdef'),
                      'B':[4,5,4,5,5,4],
                      'C':[7,8,9,4,2,3]})

print (data1)
   B  C id
0  4  7  a
1  5  8  b
2  4  9  c
3  5  4  d
4  5  2  e
5  4  3  f

data2 = pd.DataFrame({'id':list('fecdef'),
                      'D':[1,3,5,7,1,0],
                      'E':[5,3,6,9,2,4],})

print (data2)
   D  E id
0  1  5  f
1  3  3  e
2  5  6  c
3  7  9  d
4  1  2  e
5  0  4  f

df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]
print (df)
   B  C id
2  4  9  c
3  5  4  d
4  5  2  e
5  4  3  f
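
To see why duplicated ids matter, here is a rough sketch using the same sample frames. With repeated ids in data2, merge returns one row per matching pair, while filtering with isin keeps each data1 row at most once. Passing data2['id'] directly to isin should be enough here, since any data1 id found in data2 is by definition in both. The names merged, filtered and deduped are only illustrative:

```python
import pandas as pd

data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
data2 = pd.DataFrame({'id': list('fecdef'),   # 'e' and 'f' appear twice
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})

# merge pairs every data1 row with every matching data2 row,
# so 'e' and 'f' each show up twice in the result
merged = pd.merge(data1, data2[['id']])

# isin only tests membership, so each data1 row is kept at most once
filtered = data1[data1['id'].isin(data2['id'])]

# merge can still be used if the ids are de-duplicated first
deduped = pd.merge(data1, data2[['id']].drop_duplicates())

print(len(merged), len(filtered), len(deduped))
```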

Edit:

To select only the matching titles from data2, build the mask from data2 itself so that it lines up with the rows being selected:

df = data2.loc[data2['id'].isin(set(data1['id']) & set(data2['id'])), ['title']]

ids = set(data1['id']) & set(data2['id'])
df = data2.query('id in @ids')[['title']]

df = data2.loc[np.isin(data2['id'], np.intersect1d(data1['id'], data2['id'])), ['title']]
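
As a minimal sketch of printing the titles themselves rather than booleans: the frames below are small made-up stand-ins for the question's data (the titles come from the data2 sample, while the ratings rows are invented for illustration). The mask is computed on data2, the frame being indexed:

```python
import pandas as pd

# Illustrative rows only; these ratings are made up for the sketch
data1 = pd.DataFrame({'user_id': [196, 186, 22, 244],
                      'id': [37, 39, 39, 242],
                      'rating': [3, 3, 1, 2]})
data2 = pd.DataFrame({'id': [37, 38, 39],
                      'title': ['Nadja (1994)',
                                'Net, The (1995)',
                                'Strange Days (1995)']})

# Boolean mask over data2 rows: True where the id also occurs in data1
mask = data2['id'].isin(data1['id'])

# Use the mask to pick rows, then take the string column
titles = data2.loc[mask, 'title'].tolist()
print(titles)
```

The mask and the frame it indexes need to come from the same DataFrame. A mask built from data1 would have a different length and index from data2.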
