Merge in pandas and output only selected columns
Is there a way to do a merge in pandas that keeps only the columns I want to see?
What I have:
df1
ID Col1 Col2 Col3 Col4
1 1 1 1 D
2 A C C 4
3 B B B d
4 X 2 3 6
df2
ID ColA ColB ColC ColD
1 1 1 1 D
2 A C X 4
3 B B Y d
What I want:
df_final
ID ColA ColB ColC ColD
1 NA NA NA NA
2 A C X 4
3 B B Y d
4 NA NA NA NA
I want to left-join the two dataframes (keeping all IDs from df1), but keep only the columns from df2. I also only need df2's values for rows where Col3 in df1 is C or B; the other rows should be NA.
The following works, but the resulting DataFrame includes all columns from both DataFrames.
I could add a third line that selects only the columns I want, but this is a simplified example. In reality I have a much larger dataset, and it's hard to type out all the column names I want to keep.
import pandas as pd
df = pd.merge(df1, df2, how='left', on='ID')
df_final = df[df['Col3'].isin(['C', 'B'])]  # isin takes a list, so it needs parentheses
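The row filter in that last line removes IDs 1 and 4 entirely instead of blanking their df2 values. A minimal sketch of one way to reproduce the df_final shown above, assuming the sample values are stored as strings: merge as before, pick the result columns with df2.columns instead of typing them out, then set df2's value columns to NaN wherever Col3 is not C or B. The names merged and value_cols are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question (values kept as strings,
# which is an assumption; only ID is numeric).
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Col1": ["1", "A", "B", "X"],
    "Col2": ["1", "C", "B", "2"],
    "Col3": ["1", "C", "B", "3"],
    "Col4": ["D", "4", "d", "6"],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "ColA": ["1", "A", "B"],
    "ColB": ["1", "C", "B"],
    "ColC": ["1", "X", "Y"],
    "ColD": ["D", "4", "d"],
})

# Left join keeps every ID from df1 and brings df2's columns along.
merged = pd.merge(df1, df2, how="left", on="ID")

# Select df2's columns programmatically instead of listing them by hand.
df_final = merged[df2.columns].copy()

# Keep ID, but blank out df2's other columns where Col3 is not C or B.
value_cols = df2.columns.drop("ID")
df_final.loc[~merged["Col3"].isin(["C", "B"]), value_cols] = np.nan

print(df_final)
```

Because the condition is applied with .loc rather than as a row filter, all four IDs survive, matching the desired output.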
The equivalent SQL is:
create table df_final as
select b.*
from df1 a
left join df2 b
on a.ID=b.ID
where a.Col3 in ('C','B')
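A literal pandas translation of that query, sketched under the same assumption that the sample values are strings: apply the WHERE filter to df1 first, keep only the join key so that nothing from df1 leaks into the result (the SELECT b.* part), then left-merge. Note that a WHERE clause removes non-matching rows rather than blanking them, so this returns only IDs 2 and 3. The names left and sql_like are illustrative.

```python
import pandas as pd

# Sample data from the question (string values are an assumption).
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Col1": ["1", "A", "B", "X"],
    "Col2": ["1", "C", "B", "2"],
    "Col3": ["1", "C", "B", "3"],
    "Col4": ["D", "4", "d", "6"],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "ColA": ["1", "A", "B"],
    "ColB": ["1", "C", "B"],
    "ColC": ["1", "X", "Y"],
    "ColD": ["D", "4", "d"],
})

# WHERE a.Col3 IN ('C', 'B'): filter df1's rows, keeping only the join key.
left = df1.loc[df1["Col3"].isin(["C", "B"]), ["ID"]]

# LEFT JOIN df2 ON ID: the result carries only df2's columns.
sql_like = left.merge(df2, how="left", on="ID")

print(sql_like)
```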
Solution
Mask df1 with your isin condition before merging:
df1.where(df1.Col3.isin(['C', 'B']))[['ID']].merge(df2, how='left', on='ID')
Or, equivalently, use mask with the negated condition:
df1.mask(~df1.Col3.isin(['C', 'B']))[['ID']].merge(df2, how='left', on='ID')
ID ColA ColB ColC ColD
0 NaN NaN NaN NaN NaN
1 2 A C X 4
2 3 B B Y d
3 NaN NaN NaN NaN NaN
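A self-contained run of both one-liners, assuming the sample values are strings. It checks that where(cond) and mask(~cond) give the same frame, and shows why the ID column reads NaN in the first and last rows: where/mask blank out the whole row of df1, including the join key, before the merge. The names cond, via_where and via_mask are illustrative.

```python
import pandas as pd

# Sample data from the question (string values are an assumption).
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Col1": ["1", "A", "B", "X"],
    "Col2": ["1", "C", "B", "2"],
    "Col3": ["1", "C", "B", "3"],
    "Col4": ["D", "4", "d", "6"],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "ColA": ["1", "A", "B"],
    "ColB": ["1", "C", "B"],
    "ColC": ["1", "X", "Y"],
    "ColD": ["D", "4", "d"],
})

# Boolean Series: True for rows whose Col3 is C or B.
cond = df1["Col3"].isin(["C", "B"])

# where() keeps rows where cond is True and turns every other row
# (ID included) into NaN; the NaN IDs then match nothing in df2.
via_where = df1.where(cond)[["ID"]].merge(df2, how="left", on="ID")

# mask() is the inverse of where(), so negating the condition is equivalent.
via_mask = df1.mask(~cond)[["ID"]].merge(df2, how="left", on="ID")

print(via_where)
print(via_where.equals(via_mask))
```

Since the masked rows lose their ID, the ID column also becomes float. If you need the original IDs kept, as in the desired df_final, apply the condition to df2's value columns after the merge instead of to df1 before it.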