Python Pandas: check whether a value appears multiple times on the same day

I have a Pandas data frame as shown below. I want to check whether a station has the value yyy together with any other value in variable1 on the same day (as is the case with station1). If so, I need to delete the entire row containing yyy.

Currently I do this with iterrows(): I iterate over the dates on which this value appears, change the value to something like “delete me”, build a new data frame from the result (because pandas doesn’t support replacing in place), and then filter the new data frame to remove the unwanted rows. This works for now because my data frame is small, but it is unlikely to scale.

Question: This seems like a very “non-Pandas” approach. Is there another way to remove the unwanted rows?

                dateuse         station         variable1
0   2012-08-12 00:00:00        station1               xxx
1   2012-08-12 00:00:00        station1               yyy
2   2012-08-23 00:00:00        station2               aaa
3   2012-08-23 00:00:00        station3               bbb
4   2012-08-25 00:00:00        station4               ccc
5   2012-08-25 00:00:00        station4               ccc
6   2012-08-25 00:00:00        station4               ccc

Solution

I would use a boolean array for indexing. We want to remove rows that contain yyy and belong to a dateuse/station combination with more than one row (if I understand the question correctly).

We can use transform to broadcast the size of each dateuse/station group to the length of the data frame, and then flag the rows whose group has more than one row. We can then combine that mask with & against a mask marking where yyy occurs, and negate the result to keep everything else.

>>> multiple = df.groupby(["dateuse", "station"])["variable1"].transform(len) > 1
>>> must_be_isolated = df["variable1"] == "yyy"
>>> df[~(multiple & must_be_isolated)]
               dateuse   station variable1
0  2012-08-12 00:00:00  station1       xxx
2  2012-08-23 00:00:00  station2       aaa
3  2012-08-23 00:00:00  station3       bbb
4  2012-08-25 00:00:00  station4       ccc
5  2012-08-25 00:00:00  station4       ccc
6  2012-08-25 00:00:00  station4       ccc
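
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and wraps the mask logic in two helper functions. The function names drop_shared_value and drop_shared_value_strict are hypothetical, introduced only for illustration. The strict variant is an assumption about one possible reading of the question: it drops yyy only when the group also holds a different value, rather than whenever the group has more than one row.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame(
    {
        "dateuse": pd.to_datetime(
            ["2012-08-12", "2012-08-12", "2012-08-23", "2012-08-23",
             "2012-08-25", "2012-08-25", "2012-08-25"]
        ),
        "station": ["station1", "station1", "station2", "station3",
                    "station4", "station4", "station4"],
        "variable1": ["xxx", "yyy", "aaa", "bbb", "ccc", "ccc", "ccc"],
    }
)


def drop_shared_value(frame, value="yyy"):
    """Drop rows holding `value` whose dateuse/station group has more than one row.

    Hypothetical helper; same logic as the answer above.
    """
    group_size = frame.groupby(["dateuse", "station"])["variable1"].transform("size")
    is_value = frame["variable1"] == value
    return frame[~((group_size > 1) & is_value)]


def drop_shared_value_strict(frame, value="yyy"):
    """Drop rows holding `value` only if the group also contains a different value.

    Hypothetical helper; assumes "any other variables" means distinct values.
    """
    distinct = frame.groupby(["dateuse", "station"])["variable1"].transform("nunique")
    is_value = frame["variable1"] == value
    return frame[~((distinct > 1) & is_value)]


result = drop_shared_value(df)
print(result)
```

On the sample data both helpers remove only row 1. They differ when a group contains nothing but repeated yyy rows: the size-based version drops them all, while the strict version keeps them. Grouping on dateuse directly works here because every timestamp is at midnight; if the column carried times of day, grouping on df["dateuse"].dt.normalize() instead would likely be needed to compare by calendar day.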
