Create a datetime for each ID in the data frame
I have a data frame (named df) from 1/1/2016 00:00 to 25/11/2018 23:00, with a timestamp every hour, a object_id and a value. The dataset contains only object_id rows that have values.
timestampHour object_id value
2016/1/1 00:00 1 2
2016/1/1 00:00 3 1
2016/1/1 01:00 1 1
2016/1/1 01:00 2 3
2016/1/1 02:00 2 3
2016/1/1 02:00 3 2
I want to get a data frame that shows all object IDs per hour, null if there is no value.
timestampHour object_id value
2016/1/1 00:00 1 2
2016/1/1 00:00 2 null
2016/1/1 00:00 3 1
2016/1/1 01:00 1 1
2016/1/1 01:00 2 3
2016/1/1 01:00 3 null
2016/1/1 02:00 1 null
2016/1/1 02:00 2 3
2016/1/1 02:00 3 2
I’ve created a datetime based on the timestamp. And use the following code to round them to the hour:
df["timestamp"] = pd.to_datetime(df["result_timestamp"])
df['timestampHour'] = df['result_timestamp'].dt.round('60min')
(I don’t know if there’s a better option, but I’ve been trying to create timestampHour rows until 12 (I have 12 unique object_id each) and populate those newly created rows with unused object_id (for that hour). But I can’t create empty rows based on the condition)
I’m still new to programming, and I haven’t found a clue to come closer to solving this problem by searching for other posts.
Solution
Use pivot_table
and unstack
:
df.pivot_table(
index='object_id', columns='timestampHour', values='value'
).unstack().rename('value').reset_index()
timestampHour object_id value
0 2016/1/1 00:00 1 2.0
1 2016/1/1 00:00 2 NaN
2 2016/1/1 00:00 3 1.0
3 2016/1/1 01:00 1 1.0
4 2016/1/1 01:00 2 3.0
5 2016/1/1 01:00 3 NaN
6 2016/1/1 02:00 1 NaN
7 2016/1/1 02:00 2 3.0
8 2016/1/1 02:00 3 2.0
To understand how this works, the middle pivot_table
helps to see:
timestampHour 2016/1/1 00:00 2016/1/1 01:00 2016/1/1 02:00
object_id
1 2.0 1.0 NaN
2 NaN 3.0 3.0
3 1.0 NaN 2.0
When the value of the object_id
and timestampHour
combination cannot be found, NaN
is added to the table. When you use unstack
, these NaNs
are preserved, giving you the desired results with missing values.