Python - Create a datetime for each ID in the data frame

Create a datetime for each ID in the data frame… here is a solution to the problem.

Create a datetime for each ID in the data frame

I have a data frame (named df) from 1/1/2016 00:00 to 25/11/2018 23:00, with a timestamp every hour, a object_id and a value. The dataset contains only object_id rows that have values.

timestampHour     object_id  value
2016/1/1 00:00    1          2
2016/1/1 00:00    3          1
2016/1/1 01:00    1          1
2016/1/1 01:00    2          3
2016/1/1 02:00    2          3
2016/1/1 02:00    3          2

I want to get a data frame that shows all object IDs per hour, null if there is no value.

timestampHour     object_id  value
2016/1/1 00:00    1          2
2016/1/1 00:00    2          null
2016/1/1 00:00    3          1
2016/1/1 01:00    1          1
2016/1/1 01:00    2          3
2016/1/1 01:00    3          null
2016/1/1 02:00    1          null
2016/1/1 02:00    2          3
2016/1/1 02:00    3          2

I’ve created a datetime based on the timestamp. And use the following code to round them to the hour:

df["timestamp"] = pd.to_datetime(df["result_timestamp"])
df['timestampHour'] = df['result_timestamp'].dt.round('60min')

(I don’t know if there’s a better option, but I’ve been trying to create timestampHour rows until 12 (I have 12 unique object_id each) and populate those newly created rows with unused object_id (for that hour). But I can’t create empty rows based on the condition)

I’m still new to programming, and I haven’t found a clue to come closer to solving this problem by searching for other posts.

Solution

Use pivot_table and unstack:

df.pivot_table(
    index='object_id',  columns='timestampHour', values='value'
).unstack().rename('value').reset_index()

    timestampHour  object_id  value
0  2016/1/1 00:00          1    2.0
1  2016/1/1 00:00          2    NaN
2  2016/1/1 00:00          3    1.0
3  2016/1/1 01:00          1    1.0
4  2016/1/1 01:00          2    3.0
5  2016/1/1 01:00          3    NaN
6  2016/1/1 02:00          1    NaN
7  2016/1/1 02:00          2    3.0
8  2016/1/1 02:00          3    2.0

To understand how this works, the middle pivot_table helps to see:

timestampHour  2016/1/1 00:00  2016/1/1 01:00  2016/1/1 02:00
object_id
1                         2.0             1.0             NaN
2                         NaN             3.0             3.0
3                         1.0             NaN             2.0

When the value of the object_id and timestampHour combination cannot be found, NaN is added to the table. When you use unstack, these NaNs are preserved, giving you the desired results with missing values.

Python – Create a datetime for each ID in the data frame

Create a datetime for each ID in the data frame

Solution

Related Problems and Solutions