Enumerates data frames based on columns… here is a solution to the problem.
Enumerates data frames based on columns
I’m dealing with a time series data frame that looks like this, except that there are over a few thousand rows. I want to create a new column to enumerate rows with the same “symbol” value of block. i.e. row 0 will be 0, lines 1 to 23 will be 1, lines 24 to 30 will be 2 and so on… (Chronology is important) What is the most pythonic way to accomplish this? Thank you in advance
Date sign
0 2011-01-27 1
1 2011-01-28 -1
2 2011-01-31 -1
3 2011-02-01 -1
4 2011-02-02 -1
5 2011-02-07 -1
6 2011-02-08 -1
7 2011-02-09 -1
8 2011-02-10 -1
9 2011-02-11 -1
10 2011-02-14 -1
11 2011-02-15 -1
12 2011-02-16 -1
13 2011-02-17 -1
14 2011-02-18 -1
15 2011-02-21 -1
16 2011-02-22 -1
17 2011-02-23 -1
18 2011-02-24 -1
19 2011-02-25 -1
20 2011-02-28 -1
21 2011-03-01 -1
22 2011-03-02 -1
23 2011-03-03 -1
24 2011-03-04 1
25 2011-03-07 1
26 2011-03-08 1
27 2011-03-09 1
28 2011-03-10 1
29 2011-03-11 1
30 2011-03-14 1
31 2011-03-15 -1
32 2011-03-16 -1
33 2011-03-17 -1
34 2011-03-18 -1
35 2011-03-21 -1
36 2011-03-22 -1
37 2011-03-23 -1
38 2011-03-24 -1
39 2011-03-25 -1
40 2011-03-28 -1
41 2011-03-29 1
42 2011-03-30 1
Solution
You can get the cumsum
at the symbol change, use diff
to get:
df['new_column'] = (df.sign.diff()!=0).cumsum()-1
>>> df
Date sign new_column
0 2011-01-27 1 0
1 2011-01-28 -1 1
2 2011-01-31 -1 1
3 2011-02-01 -1 1
4 2011-02-02 -1 1
5 2011-02-07 -1 1
6 2011-02-08 -1 1
7 2011-02-09 -1 1
8 2011-02-10 -1 1
9 2011-02-11 -1 1
10 2011-02-14 -1 1
11 2011-02-15 -1 1
12 2011-02-16 -1 1
13 2011-02-17 -1 1
14 2011-02-18 -1 1
15 2011-02-21 -1 1
16 2011-02-22 -1 1
17 2011-02-23 -1 1
18 2011-02-24 -1 1
19 2011-02-25 -1 1
20 2011-02-28 -1 1
21 2011-03-01 -1 1
22 2011-03-02 -1 1
23 2011-03-03 -1 1
24 2011-03-04 1 2
25 2011-03-07 1 2
26 2011-03-08 1 2
27 2011-03-09 1 2
28 2011-03-10 1 2
29 2011-03-11 1 2
30 2011-03-14 1 2
31 2011-03-15 -1 3
32 2011-03-16 -1 3
33 2011-03-17 -1 3
34 2011-03-18 -1 3
35 2011-03-21 -1 3
36 2011-03-22 -1 3
37 2011-03-23 -1 3
38 2011-03-24 -1 3
39 2011-03-25 -1 3
40 2011-03-28 -1 3
41 2011-03-29 1 4
42 2011-03-30 1 4