Use isin() to determine what should be printed
I have two dataframes (data1 and data2).
I want to print a list of string values in a data frame named data1 based on whether the ID exists in both data2 and data1.
What I'm doing now gives me a list of bool values (True or False, depending on whether the ID exists in both dataframes) instead of the values from the string column.
print(data2['id'].isin(data1.id).to_string())
Output:
0 True
1 True
2 True
3 True
4 True
5 True
Any ideas would be appreciated.
Here is a sample of data1:
'user_id', 'id', 'rating', 'unix_timestamp'
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
data2 contains content like this:
'id', 'title', 'release date', 'video_release_date', 'imdb_url'
37|Nadja (1994)|01-Jan-1994||http://us.imdb.com/M/title-exact?Nadja%20(1994)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
38|Net, The (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Net,%20The%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|1|0|0
39|Strange Days (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Strange%20Days%20(1995)|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|1|0|0|0
Solution
If all values of id are unique:
I think you need merge with an inner join. Select only the id column from data2; the on parameter can then be omitted, because merge joins on all columns the two frames have in common, which here is only id:
df = pd.merge(data1, data2[['id']])
Example:
import pandas as pd
data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
print (data1)
B C id
0 4 7 a
1 5 8 b
2 4 9 c
3 5 4 d
4 5 2 e
5 4 3 f
data2 = pd.DataFrame({'id': list('frcdeg'),
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})
print (data2)
D E id
0 1 5 f
1 3 3 r
2 5 6 c
3 7 9 d
4 1 2 e
5 0 4 g
df = pd.merge(data1, data2[['id']])
print (df)
B C id
0 4 9 c
1 5 4 d
2 5 2 e
3 4 3 f
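To see why the merge approach assumes unique id values, here is a minimal sketch. The frames left and right are made-up illustration data, not taken from the question. It spells out the merge defaults (on='id', how='inner') and shows that an id repeated on the right side repeats the matching row of the left side, whereas an isin() mask does not.

```python
import pandas as pd

# Hypothetical illustration data (not from the question).
left = pd.DataFrame({'id': list('abc'), 'B': [1, 2, 3]})
right = pd.DataFrame({'id': list('bcc'), 'D': [7, 8, 9]})  # 'c' appears twice

# Equivalent to pd.merge(left, right[['id']]): 'id' is the only shared
# column, and how='inner' is the default.
merged = pd.merge(left, right[['id']], on='id', how='inner')
print(merged)    # the row for 'c' appears twice

# A boolean mask from isin() keeps each row of `left` at most once.
filtered = left[left['id'].isin(right['id'])]
print(filtered)  # one row each for 'b' and 'c'
```

If the repeated rows are unwanted, the isin() form is probably the safer choice.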
If id values are duplicated in one or both DataFrames, use one of these alternative solutions instead:
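# Option 1: boolean mask built with isin()
df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]
# Option 2: query() against the set of shared ids
ids = set(data1['id']) & set(data2['id'])
df = data1.query('id in @ids')
# Option 3: NumPy (needs import numpy as np); np.isin supersedes np.in1d
df = data1[np.isin(data1['id'], np.intersect1d(data1['id'], data2['id']))]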
Example:
data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
print (data1)
B C id
0 4 7 a
1 5 8 b
2 4 9 c
3 5 4 d
4 5 2 e
5 4 3 f
data2 = pd.DataFrame({'id': list('fecdef'),
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})
print (data2)
D E id
0 1 5 f
1 3 3 e
2 5 6 c
3 7 9 d
4 1 2 e
5 0 4 f
df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]
print (df)
B C id
2 4 9 c
3 5 4 d
4 5 2 e
5 4 3 f
Edit:
To get only the title column from data2, build the mask from data2['id'] so that it lines up with data2's rows. You can use any of:
df = data2.loc[data2['id'].isin(set(data1['id']) & set(data2['id'])), ['title']]
ids = set(data1['id']) & set(data2['id'])
df = data2.query('id in @ids')[['title']]
df = data2.loc[np.isin(data2['id'], np.intersect1d(data1['id'], data2['id'])), ['title']]
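As a minimal runnable sketch of the title lookup, assuming data2 has id and title columns as in the question: the frames below are shortened stand-ins, and the id values in data1 are chosen for illustration so that two of them match. The mask is built from data2['id'] so that it has one entry per row of data2.

```python
import pandas as pd

# Shortened stand-ins for the question's frames; values chosen for illustration.
data1 = pd.DataFrame({'user_id': [196, 186, 22],
                      'id': [38, 39, 377],
                      'rating': [3, 3, 1]})
data2 = pd.DataFrame({'id': [37, 38, 39],
                      'title': ['Nadja (1994)',
                                'Net, The (1995)',
                                'Strange Days (1995)']})

# True for each row of data2 whose id also occurs in data1.
mask = data2['id'].isin(data1['id'])

# Select the matching titles and turn them into a plain list of strings.
titles = data2.loc[mask, 'title'].tolist()
print(titles)
```

Passing data1['id'] straight to isin() gives the same mask as intersecting the two sets first, since every value being tested already comes from data2['id'].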