Remove entire days from an hourly pandas data frame
I have an hourly data frame, df_hourly, that contains a row for every hour of the day, starting in 2016. I want to remove from it the days listed in “df_outlayers”, which is a daily data frame. I tried the following:
remove = df_hourly.loc[df_outlayers.index]
df_clean = df_hourly.drop(remove.index)
df_clean['2017-04-17']
But this only removes the first hour of each day: for example, it removes the row 2017-04-17 00:00:00 but not 2017-04-17 01:00:00. How do I remove every hour of the given outlier days?
Note: My “df_outlayers” data frame has an index named “date”. For example, df_outlayers.index gives:
DatetimeIndex(['2016-07-06', '2016-07-08', '2016-10-10', '2017-04-09',
'2017-04-17', '2017-04-26', '2017-07-05', '2017-07-07',
'2017-09-01', '2017-09-22', '2017-09-29'],
dtype='datetime64[ns]', name='date', freq=None)
My df_hourly data frame also has an index named “date”. For example, df_hourly.index gives:
DatetimeIndex(['2014-07-19 00:00:00', '2014-07-19 01:00:00', ...]
dtype='datetime64[ns]', name='date', length=13214, freq=None)
Solution
It seems you need boolean indexing with a mask that is inverted by ~. Build the mask with numpy.in1d, because DatetimeIndex.date returns a NumPy array:
import numpy as np

# True for every hourly row whose calendar date is one of the outlier days
mask = np.in1d(df_hourly.index.date, df_outlayers.index.date)
df_clean = df_hourly[~mask]
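To check the idea on something small, here is a minimal self-contained sketch. The sample frames, the "value" and "flag" columns, and the three-day date range are made up for illustration; only the names df_hourly, df_outlayers and df_clean come from the question. It uses numpy.isin, since newer NumPy releases deprecate numpy.in1d in its favor, and also shows a pandas-only variant that should give the same result as long as the outlier index holds midnight timestamps, as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three full days of hourly rows.
hourly_index = pd.date_range(
    "2017-04-16", periods=72, freq=pd.Timedelta(hours=1), name="date"
)
df_hourly = pd.DataFrame({"value": np.arange(72)}, index=hourly_index)

# Hypothetical daily frame listing the days to drop.
df_outlayers = pd.DataFrame(
    {"flag": [1]}, index=pd.DatetimeIndex(["2017-04-17"], name="date")
)

# Compare calendar dates, so every hour of a listed day matches.
mask = np.isin(df_hourly.index.date, df_outlayers.index.date)
df_clean = df_hourly[~mask]

# pandas-only variant: floor each timestamp to midnight, then test membership.
# Assumes the outlier index contains midnight timestamps.
mask_alt = df_hourly.index.normalize().isin(df_outlayers.index)
df_clean_alt = df_hourly[~mask_alt]

print(len(df_hourly), len(df_clean))  # 72 48
```

The original attempt only removed the 00:00:00 row because a bare date such as 2017-04-17 is the timestamp 2017-04-17 00:00:00, and the drop matched on that exact label. Comparing dates rather than full timestamps avoids that.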