Python – Delete rows if values in one column do not meet the requirements in another column

Let’s say I have this data frame:

import pandas as pd

df = pd.DataFrame({'ID': [1001, 4003, 1001, 4003, 7000, 7000],
                   'col_2': ['3', '8', '2', '1', '7', '9'],
                   'col_3': ['Steak', 'Chicken', 'Chicken', 'Steak', 'Chicken', 'Chicken']})

I want to create 3 data frames.
The first two contain the rows where col_3 is Steak and the rows where col_3 is Chicken, respectively. That part is simple:

dfsteak = df[df['col_3'] == 'Steak']
dfchicken = df[df['col_3'] == 'Chicken']

For the third, I'd like to delete every row whose ID doesn't have Chicken in one row and Steak in another. In this example, ID 7000 would be removed, since he only ordered Chicken. How can I achieve this?

Solution

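This is an intuitive approach. The idea is to build a Series, indexed by ID, that aggregates each ID's col_3 values into a set.

Each row's ID is then mapped to its set, and a row is kept only if that set is a superset of {'Steak', 'Chicken'} (for sets, >= is the superset test). Negating the mask with ~ would instead select the rows being dropped, here the two rows for ID 7000.

# Set of col_3 values seen for each ID
s = df.groupby('ID')['col_3'].apply(set)
# Keep only rows whose ID has ordered both Steak and Chicken
df = df[df['ID'].map(s) >= {'Steak', 'Chicken'}]

print(df)

     ID col_2    col_3
0  1001     3    Steak
1  4003     8  Chicken
2  1001     2  Chicken
3  4003     1    Steak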
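
For comparison, the requirement "keep only the IDs that ordered both Steak and Chicken" can also be sketched with groupby().filter, which keeps or drops whole groups at once. This is an alternative under the same sample data, not the method used above; the names required, has_all_required and both are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1001, 4003, 1001, 4003, 7000, 7000],
    'col_2': ['3', '8', '2', '1', '7', '9'],
    'col_3': ['Steak', 'Chicken', 'Chicken', 'Steak', 'Chicken', 'Chicken'],
})

# Hypothetical name: the items every ID must have ordered at least once
required = {'Steak', 'Chicken'}

def has_all_required(group):
    # True when every required item appears in this ID's col_3 values
    return required.issubset(group['col_3'])

# filter() keeps all rows of the groups for which the function returns True
both = df.groupby('ID').filter(has_all_required)
print(both)  # ID 7000 is dropped; rows 0-3 remain
```

The filtered frame keeps the original row order and index labels, so it can be compared directly against the result of the set-based mask.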
