Python – Aggregate run percentage within each group and Python in descending order per group

Aggregate run percentage within each group and Python in descending order per group… here is a solution to the problem.

Aggregate run percentage within each group and Python in descending order per group

df = pd. DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,'id': 
       [1,2,3,4,5,6]*2 ,'sales': [np.random.randint(100000, 999999) for _ in 
        range(12)]})

This is the output of df:

 id sales state
0    1    847754    CA
1    2    362532    WA
2    3    615849    CO
3    4    376480    AZ
4    5    381286    CA
5    6    411001    WA
6    1    946795    CO
7    2    857435    AZ
8    3    928087    CA
9    4    675593    WA
10   5    371339    CO
11   6    440285    AZ

I can’t calculate the cumulative percentage for each group in descending order. I want output like this:

 id sales state cumsum run_pct
0    2    857435    AZ    857435     0.5121460996296738
1    6    440285    AZ    1297720    0.7751284195436626
2    4    376480    AZ    1674200    1.0
3    3    928087    CA    928087     0.43024216932985404
4    1    847754    CA    1775841    0.8232436013271356
5    5    381286    CA    2157127    1.0
6    1    946795    CO    946795     0.48955704367618535
7    3    615849    CO    1562644    0.807992624547372
8    5    371339    CO    1933983    1.0
9    4    675593    WA    675593     0.46620721731581655
10   6    411001    WA    1086594    0.7498271371847582
11   2    362532    WA    1449126    1.0

Solution

One possible solution is to first sort the data, calculate the cumsum, and then calculate the percentage.
Sort by status ascending and sales descending:

df = df.sort_values(['state', 'sales'], ascending=[True, False])

Calculate the cumulative sum:

df['cumsum'] = df.groupby('state')['sales'].cumsum()

and percentage:

df['run_pct'] = df.groupby('state')['sales'].apply(lambda x: (x/x.sum()).cumsum())

This will give:

    id  sales   state   cumsum  run_pct
0   4   846079  AZ  846079  0.608566
1   2   312708  AZ  1158787 0.833491
2   6   231495  AZ  1390282 1.000000
3   3   790291  CA  790291  0.506795
4   1   554631  CA  1344922 0.862467
5   5   214467  CA  1559389 1.000000
6   1   983878  CO  983878  0.388139
7   5   779497  CO  1763375 0.695650
8   3   771486  CO  2534861 1.000000
9   6   794407  WA  794407  0.420899
10  2   587843  WA  1382250 0.732355
11  4   505155  WA  1887405 1.000000

Related Problems and Solutions