Delete rows if values in one column do not meet the requirements in another column
Let’s say I have this data frame:
import pandas as pd

df = pd.DataFrame({'ID': [1001, 4003, 1001, 4003, 7000, 7000],
                   'col_2': ['3', '8', '2', '1', '7', '9'],
                   'col_3': ['Steak', 'Chicken', 'Chicken', 'Steak', 'Chicken', 'Chicken']})
I want to create 3 data frames.
The first contains every row where an ID had Steak. The second contains every row where an ID had Chicken. That part is simple:
dfsteak = df[df['col_3'] == 'Steak']
dfchicken = df[df['col_3'] == 'Chicken']
But for the third, I want to remove every row belonging to an ID that has had Chicken at one point and Steak at another. So, in this example, the resulting data frame would contain only ID 7000, who ordered only Chicken. How can I achieve this?
Solution
Here is an intuitive approach. The idea is to build a Series that aggregates col_3 into a set for each ID. That Series is then mapped back onto the ID column, and a row is kept only when its mapped set is not a superset of {'Steak', 'Chicken'}.
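# Collect the set of col_3 values seen for each ID
s = df.groupby('ID')['col_3'].apply(set)
# Keep only rows whose ID's set is not a superset of {'Steak', 'Chicken'}
df = df[~(df['ID'].map(s) >= {'Steak', 'Chicken'})]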
print(df)
     ID col_2    col_3
4  7000     7  Chicken
5  7000     9  Chicken
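
The same idea can also be written with the superset test spelled out per ID, which avoids relying on how a Series of sets compares against a plain set. This is only a minimal sketch: the names required, items_per_id, has_both, row_has_both, df_both and df_not_both are made up for illustration and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1001, 4003, 1001, 4003, 7000, 7000],
    'col_2': ['3', '8', '2', '1', '7', '9'],
    'col_3': ['Steak', 'Chicken', 'Chicken', 'Steak', 'Chicken', 'Chicken'],
})

# Items an ID must have ordered to count as "has had both" (illustrative name)
required = {'Steak', 'Chicken'}

# One set of col_3 values per ID, e.g. 1001 -> {'Steak', 'Chicken'}
items_per_id = df.groupby('ID')['col_3'].apply(set)

# True for IDs whose set contains every required item
has_both = items_per_id.map(required.issubset).astype(bool)

# Broadcast the per-ID flag back onto the rows of df
row_has_both = df['ID'].map(has_both).astype(bool)

df_both = df[row_has_both]       # rows for IDs with both Steak and Chicken
df_not_both = df[~row_has_both]  # rows for IDs missing at least one of them

print(df_not_both)
```

Keeping the mask in its own variable makes both halves available: df_not_both is the third data frame asked for in the question, and df_both holds the rows that were removed.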