Python – Create a datetime for each ID in the data frame

Create a datetime for each ID in the data frame… here is a solution to the problem.

Create a datetime for each ID in the data frame

I have a data frame (named df) from 1/1/2016 00:00 to 25/11/2018 23:00, with a timestamp every hour, a object_id and a value. The dataset contains only object_id rows that have values.

timestampHour     object_id  value
2016/1/1 00:00    1          2
2016/1/1 00:00    3          1
2016/1/1 01:00    1          1
2016/1/1 01:00    2          3
2016/1/1 02:00    2          3
2016/1/1 02:00    3          2

I want to get a data frame that shows all object IDs per hour, null if there is no value.

timestampHour     object_id  value
2016/1/1 00:00    1          2
2016/1/1 00:00    2          null
2016/1/1 00:00    3          1
2016/1/1 01:00    1          1
2016/1/1 01:00    2          3
2016/1/1 01:00    3          null
2016/1/1 02:00    1          null
2016/1/1 02:00    2          3
2016/1/1 02:00    3          2

I’ve created a datetime based on the timestamp. And use the following code to round them to the hour:

df["timestamp"] = pd.to_datetime(df["result_timestamp"])
df['timestampHour'] = df['result_timestamp'].dt.round('60min')

(I don’t know if there’s a better option, but I’ve been trying to create timestampHour rows until 12 (I have 12 unique object_id each) and populate those newly created rows with unused object_id (for that hour). But I can’t create empty rows based on the condition)

I’m still new to programming, and I haven’t found a clue to come closer to solving this problem by searching for other posts.

Solution

Use pivot_table and unstack:

df.pivot_table(
    index='object_id',  columns='timestampHour', values='value'
).unstack().rename('value').reset_index()

    timestampHour  object_id  value
0  2016/1/1 00:00          1    2.0
1  2016/1/1 00:00          2    NaN
2  2016/1/1 00:00          3    1.0
3  2016/1/1 01:00          1    1.0
4  2016/1/1 01:00          2    3.0
5  2016/1/1 01:00          3    NaN
6  2016/1/1 02:00          1    NaN
7  2016/1/1 02:00          2    3.0
8  2016/1/1 02:00          3    2.0

To understand how this works, the middle pivot_table helps to see:

timestampHour  2016/1/1 00:00  2016/1/1 01:00  2016/1/1 02:00
object_id
1                         2.0             1.0             NaN
2                         NaN             3.0             3.0
3                         1.0             NaN             2.0

When the value of the object_id and timestampHour combination cannot be found, NaN is added to the table. When you use unstack, these NaNs are preserved, giving you the desired results with missing values.

Related Problems and Solutions