Python – Remove the entire day from the Pandas hourly data frame

I’m using an hourly data frame that contains a row for every hour of the day starting in 2016. I want to remove from it the days listed in the “df_outlayers” data frame, which is a daily data frame. I tried the following:

remove = df_hourly.loc[df_outlayers.index]
df_clean = df_hourly.drop(remove.index)
df_clean.loc['2017-04-17']  # check which rows of that day remain

But it only deletes the first hour of each day: it removes the row 2017-04-17 00:00:00, while 2017-04-17 01:00:00 and the rest of that day remain. How do I delete every hour of the given outlier days?

Note: my “df_outlayers” data frame has an index named “date”; for example, df_outlayers.index gives:

DatetimeIndex(['2016-07-06', '2016-07-08', '2016-10-10', '2017-04-09',
           '2017-04-17', '2017-04-26', '2017-07-05', '2017-07-07',
           '2017-09-01', '2017-09-22', '2017-09-29'],
          dtype='datetime64[ns]', name='date', freq=None)

My df_hourly data frame also has an index named “date”; for example, df_hourly.index gives:

DatetimeIndex(['2014-07-19 00:00:00', '2014-07-19 01:00:00', ...],
              dtype='datetime64[ns]', name='date', length=13214, freq=None)
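The behaviour described above can be reproduced with a minimal sketch. The frames here are hypothetical (48 hourly rows starting 2017-04-16, a single outlier day); only the index name “date” and the outlier date come from the question. It shows that .loc with a daily DatetimeIndex matches exact timestamps, so each outlier day only hits its midnight row.

```python
import pandas as pd

# Hypothetical hourly frame: two full days of hourly rows, index named "date".
hourly_idx = pd.date_range("2017-04-16", periods=48, freq="h", name="date")
df_hourly = pd.DataFrame({"value": range(48)}, index=hourly_idx)

# Hypothetical daily outlier frame: one day, stored as a midnight timestamp.
outlier_idx = pd.DatetimeIndex(["2017-04-17"], name="date")
df_outlayers = pd.DataFrame({"value": [0]}, index=outlier_idx)

# The attempt from the question: .loc looks up exact timestamps only.
remove = df_hourly.loc[df_outlayers.index]
df_clean = df_hourly.drop(remove.index)

print(len(remove))                      # 1: only 2017-04-17 00:00:00 matched
print(len(df_clean.loc["2017-04-17"]))  # 23: the other hours of that day remain
```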

Solution

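The attempt fails because df_hourly.loc[df_outlayers.index] matches exact timestamps, and a daily date such as 2017-04-17 equals only the row 2017-04-17 00:00:00. It seems that you need to compare calendar dates instead: DatetimeIndex.date returns a NumPy array of datetime.date objects, so build a mask with numpy.isin (the successor to numpy.in1d, which NumPy 2.0 deprecates), invert it with ~ and use boolean indexing:

import numpy as np

mask = np.isin(df_hourly.index.date, df_outlayers.index.date)  # True for rows on an outlier day
df_clean = df_hourly[~mask]  # keep every other row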
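A pandas-only alternative, offered as a sketch rather than as part of the original answer: DatetimeIndex.normalize() sets every timestamp to midnight, so the hourly index can be compared directly with the daily index through isin. This assumes the df_outlayers index holds midnight timestamps, as the one shown in the question does; the frames below are hypothetical test data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three days of hourly rows and two daily outlier dates.
hourly_idx = pd.date_range("2017-04-16", periods=72, freq="h", name="date")
df_hourly = pd.DataFrame({"value": range(72)}, index=hourly_idx)
df_outlayers = pd.DataFrame(
    {"value": [0, 0]},
    index=pd.DatetimeIndex(["2017-04-17", "2017-04-18"], name="date"),
)

# normalize() drops the time of day, so each hourly row is compared
# by its day against the (midnight) timestamps of the daily index.
mask = df_hourly.index.normalize().isin(df_outlayers.index)
df_clean = df_hourly[~mask]

# Cross-check against the NumPy approach based on DatetimeIndex.date.
mask_np = np.isin(df_hourly.index.date, df_outlayers.index.date)
assert df_clean.equals(df_hourly[~mask_np])

print(len(df_clean))  # 24: only the 2017-04-16 rows are left
```

Keeping the comparison in datetime64 avoids building an object array of Python datetime.date objects, which tends to be slower on large indexes.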
