Aggregate run percentage within each group and Python in descending order per group… here is a solution to the problem.
Aggregate run percentage within each group and Python in descending order per group
df = pd. DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,'id':
[1,2,3,4,5,6]*2 ,'sales': [np.random.randint(100000, 999999) for _ in
range(12)]})
This is the output of df:
id sales state
0 1 847754 CA
1 2 362532 WA
2 3 615849 CO
3 4 376480 AZ
4 5 381286 CA
5 6 411001 WA
6 1 946795 CO
7 2 857435 AZ
8 3 928087 CA
9 4 675593 WA
10 5 371339 CO
11 6 440285 AZ
I can’t calculate the cumulative percentage for each group in descending order. I want output like this:
id sales state cumsum run_pct
0 2 857435 AZ 857435 0.5121460996296738
1 6 440285 AZ 1297720 0.7751284195436626
2 4 376480 AZ 1674200 1.0
3 3 928087 CA 928087 0.43024216932985404
4 1 847754 CA 1775841 0.8232436013271356
5 5 381286 CA 2157127 1.0
6 1 946795 CO 946795 0.48955704367618535
7 3 615849 CO 1562644 0.807992624547372
8 5 371339 CO 1933983 1.0
9 4 675593 WA 675593 0.46620721731581655
10 6 411001 WA 1086594 0.7498271371847582
11 2 362532 WA 1449126 1.0
Solution
One possible solution is to first sort the data, calculate the cumsum, and then calculate the percentage.
Sort by status ascending and sales descending:
df = df.sort_values(['state', 'sales'], ascending=[True, False])
Calculate the cumulative sum:
df['cumsum'] = df.groupby('state')['sales'].cumsum()
and percentage:
df['run_pct'] = df.groupby('state')['sales'].apply(lambda x: (x/x.sum()).cumsum())
This will give:
id sales state cumsum run_pct
0 4 846079 AZ 846079 0.608566
1 2 312708 AZ 1158787 0.833491
2 6 231495 AZ 1390282 1.000000
3 3 790291 CA 790291 0.506795
4 1 554631 CA 1344922 0.862467
5 5 214467 CA 1559389 1.000000
6 1 983878 CO 983878 0.388139
7 5 779497 CO 1763375 0.695650
8 3 771486 CO 2534861 1.000000
9 6 794407 WA 794407 0.420899
10 2 587843 WA 1382250 0.732355
11 4 505155 WA 1887405 1.000000