Python – How to count consecutive occurrences in a time period? – Pandas python

I have a data frame like this:

datetime              levels       shear
2016-05-01 00:10:00    100           5
2016-05-01 00:10:00    200           6 
2016-05-01 00:10:00    300           10
2016-05-01 00:10:00    400           11
2016-05-01 00:20:00    100           7  
2016-05-01 00:20:00    300           10
2016-05-01 00:30:00    100           5
2016-05-01 00:30:00    200           9
2016-05-01 00:30:00    300           12
2016-05-01 00:30:00    400           15
2016-05-01 00:40:00    100           19
2016-05-01 00:40:00    200           2
2016-05-01 00:40:00    300           18

This is wind shear at different heights (levels), recorded at 10-minute intervals.
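To make the problem reproducible, the sample table can be rebuilt as a DataFrame. This is a sketch that assumes the column names shown in the table and uses the variable name data3_5 from the question's code.

```python
import pandas as pd

# Sample data from the question's table (10-minute timestamps on 2016-05-01)
times = ["00:10", "00:10", "00:10", "00:10", "00:20", "00:20",
         "00:30", "00:30", "00:30", "00:30", "00:40", "00:40", "00:40"]
data3_5 = pd.DataFrame({
    "datetime": pd.to_datetime(["2016-05-01 " + t for t in times]),
    "levels": [100, 200, 300, 400, 100, 300, 100, 200, 300, 400, 100, 200, 300],
    "shear": [5, 6, 10, 11, 7, 10, 5, 9, 12, 15, 19, 2, 18],
})
print(data3_5)
```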

What I want to do is group the rows by level and then count, for each level, how many shear values occur consecutively in time. So first I create another column, Delta, holding the time difference between consecutive rows of the same level:

data3_5['Delta'] = data3_5.sort_values(['levels','datetime']).groupby('levels')['datetime'].diff()
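On a subset of the sample data (levels 100 and 200 only), this step gives NaT for the first timestamp of each level, because diff has no earlier row in the group to subtract from. That NaT is what the question's first problem refers to. The sketch assumes the column is named datetime, as in the table.

```python
import pandas as pd

# Rows for levels 100 and 200 from the question's table, in the original time-major order
data3_5 = pd.DataFrame({
    "datetime": pd.to_datetime(["2016-05-01 00:10", "2016-05-01 00:10", "2016-05-01 00:20",
                                "2016-05-01 00:30", "2016-05-01 00:30", "2016-05-01 00:40",
                                "2016-05-01 00:40"]),
    "levels": [100, 200, 100, 100, 200, 100, 200],
})
# Same step as in the question; the result is aligned back to data3_5 by index
data3_5["Delta"] = (data3_5.sort_values(["levels", "datetime"])
                    .groupby("levels")["datetime"].diff())
s = data3_5.sort_values(["levels", "datetime"])
print(s)  # Delta is NaT on the first row of each level
```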

Then I define a function that counts how many times the same time difference occurs in a row:

def rolling_count(val):
    # Return how many times in a row `val` has been seen so far
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

rolling_count.count = 0        # "static" state stored on the function object
rolling_count.previous = None  # note: this state is never reset between levels
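Feeding the function the sorted Delta values of levels 100 and 200 reproduces the counts in the question's output. It compares each value with the previous one, so a run of four 10-minute timestamps yields only three equal Delta values, and the first of them is counted as 1 because it follows a NaT. The function attributes also carry over from one level to the next. The Delta list here is typed in by hand from the question's table.

```python
import pandas as pd

def rolling_count(val):
    # Same logic as the question: how many times in a row `val` has been seen
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

rolling_count.count = 0
rolling_count.previous = None

ten, twenty = pd.Timedelta("10min"), pd.Timedelta("20min")
# Sorted Delta values: level 100 (first four), then level 200 (last three)
deltas = [pd.NaT, ten, ten, ten, pd.NaT, twenty, ten]
counts = [rolling_count(v) for v in deltas]
print(counts)  # [1, 1, 2, 3, 1, 1, 1]
```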

Apply the function and add a new column:

data3_5['count'] = data3_5.sort_values(['levels','datetime'])['Delta'].apply(rolling_count)

Finally, I print the result for each level:

# groupby sorts the level keys by default, so the levels print in ascending order
for level, group_result in data3_5.groupby('levels'):
    print(group_result)

Some of the results are as follows:

datetime              levels  shear  Delta     count
2016-05-01 00:10:00   100     5      NaT       1
2016-05-01 00:20:00   100     7      00:10:00  1
2016-05-01 00:30:00   100     5      00:10:00  2
2016-05-01 00:40:00   100     19     00:10:00  3
datetime              levels  shear  Delta     count
2016-05-01 00:10:00   200     6      NaT       1
2016-05-01 00:30:00   200     9      00:20:00  1
2016-05-01 00:40:00   200     2      00:10:00  1

This is not the result I want, because there are two problems:
1. The first Delta of each level is NaT, so the next consecutive occurrence is counted as the first occurrence instead of the second.
2. Similarly, for level 200, the third row (00:40, which comes 10 minutes after 00:30) should have a count of 2.

How do I fix this? Thank you.

Solution

Let’s try:

First, make sure df['datetime'] really has a datetime dtype (the frame is called df here instead of data3_5):

df['datetime'] = pd.to_datetime(df['datetime'])
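A quick way to confirm the conversion worked is pandas' dtype check. This sketch uses two timestamps from the question stored as strings, which is how they arrive when a CSV is read without parse_dates; arithmetic such as x - x.shift(1) would fail on them before conversion.

```python
import pandas as pd

# Timestamps stored as strings, e.g. from read_csv without parse_dates
df = pd.DataFrame({"datetime": ["2016-05-01 00:10:00", "2016-05-01 00:20:00"]})
print(pd.api.types.is_datetime64_any_dtype(df["datetime"]))  # False

df["datetime"] = pd.to_datetime(df["datetime"])
print(pd.api.types.is_datetime64_any_dtype(df["datetime"]))  # True

# After conversion, differences between rows are Timedelta values
print(df["datetime"].diff().iloc[1])  # 0 days 00:10:00
```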

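# Time since the previous row of the same level (NaT on each level's first row);
# assumes each level's rows are already in time order, as in the sample data
df['Delta'] = df.groupby('levels')['datetime'].transform(lambda x: x - x.shift(1))
# Running number of steps of at most 10 minutes within each level
df['count'] = df.groupby('levels')['Delta'].transform(lambda x: x.le(pd.Timedelta('10 minutes')).cumsum())
# Sort by level, then time, so each level's rows appear in chronological order
df.sort_values(by=['levels', 'datetime'])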

Output:

              datetime  levels  shear    Delta  count
0  2016-05-01 00:10:00     100      5      NaT      0
4  2016-05-01 00:20:00     100      7 00:10:00      1
6  2016-05-01 00:30:00     100      5 00:10:00      2
10 2016-05-01 00:40:00     100     19 00:10:00      3
1  2016-05-01 00:10:00     200      6      NaT      0
7  2016-05-01 00:30:00     200      9 00:20:00      0
11 2016-05-01 00:40:00     200      2 00:10:00      1
2  2016-05-01 00:10:00     300     10      NaT      0
5  2016-05-01 00:20:00     300     10 00:10:00      1
8  2016-05-01 00:30:00     300     12 00:10:00      2
12 2016-05-01 00:40:00     300     18 00:10:00      3
3  2016-05-01 00:10:00     400     11      NaT      0
9  2016-05-01 00:30:00     400     15 00:20:00      0
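The cumulative sum in this answer counts every step of at most 10 minutes seen so far in a level, so it never restarts after a longer gap, and it starts at 0 rather than 1. For the question's data this makes no difference beyond that offset. If the count should instead restart at each gap and start at 1, as the question expects for level 200, one option (a different technique from the answer, sketched under the assumption of a fixed 10-minute step) is to label runs of 10-minute steps and number the rows inside each run with cumcount. The sketch adds a hypothetical level 500 with a 20-minute gap, which is not in the question's data, to show where the two counts differ.

```python
import pandas as pd

# Sample data from the question's table
times = ["00:10", "00:10", "00:10", "00:10", "00:20", "00:20",
         "00:30", "00:30", "00:30", "00:30", "00:40", "00:40", "00:40"]
levels = [100, 200, 300, 400, 100, 300, 100, 200, 300, 400, 100, 200, 300]
df = pd.DataFrame({"datetime": pd.to_datetime(["2016-05-01 " + t for t in times]),
                   "levels": levels})

# Hypothetical level 500 with a 20-minute gap (00:20 -> 00:40), not part of the question
extra = pd.DataFrame({"datetime": pd.to_datetime(["2016-05-01 " + t for t in
                                                  ["00:10", "00:20", "00:40", "00:50"]]),
                      "levels": 500})
df = pd.concat([df, extra], ignore_index=True).sort_values(["levels", "datetime"])

step = pd.Timedelta("10min")  # assumed sampling interval
delta = df.groupby("levels")["datetime"].diff()  # NaT on each level's first row

# Answer's idea: running total of steps <= 10 minutes; never restarts within a level
df["cumsum_count"] = delta.le(step).astype(int).groupby(df["levels"]).cumsum()

# Run-based variant: a run starts on each level's first row (NaT) or after any other gap
run_id = delta.ne(step).cumsum().rename("run_id")
df["run_count"] = df.groupby(run_id).cumcount() + 1
print(df)
```

For level 500 the answer's count reads 0, 1, 1, 2, while the run-based count reads 1, 2, 1, 2, restarting after the gap.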
