Python - Cumulative running percentage within each group, sorted in descending order per group

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'id': [1, 2, 3, 4, 5, 6] * 2,
                   'sales': [np.random.randint(100000, 999999) for _ in range(12)]})

This is the output of df:

     id   sales     state
0    1    847754    CA
1    2    362532    WA
2    3    615849    CO
3    4    376480    AZ
4    5    381286    CA
5    6    411001    WA
6    1    946795    CO
7    2    857435    AZ
8    3    928087    CA
9    4    675593    WA
10   5    371339    CO
11   6    440285    AZ

I can't work out how to calculate the running (cumulative) percentage of sales within each state, with each group sorted by sales in descending order. I want output like this:

     id   sales     state cumsum     run_pct
0    2    857435    AZ    857435     0.5121460996296738
1    6    440285    AZ    1297720    0.7751284195436626
2    4    376480    AZ    1674200    1.0
3    3    928087    CA    928087     0.43024216932985404
4    1    847754    CA    1775841    0.8232436013271356
5    5    381286    CA    2157127    1.0
6    1    946795    CO    946795     0.48955704367618535
7    3    615849    CO    1562644    0.807992624547372
8    5    371339    CO    1933983    1.0
9    4    675593    WA    675593     0.46620721731581655
10   6    411001    WA    1086594    0.7498271371847582
11   2    362532    WA    1449126    1.0

Solution

One possible solution is to first sort the data, calculate the cumsum, and then calculate the percentage.
Sort by state ascending and sales descending:

df = df.sort_values(['state', 'sales'], ascending=[True, False])

Calculate the cumulative sum:

df['cumsum'] = df.groupby('state')['sales'].cumsum()

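Then calculate the running percentage. Using transform rather than apply keeps the original row index, so the result aligns with df when it is assigned as a new column:

df['run_pct'] = df.groupby('state')['sales'].transform(lambda x: (x / x.sum()).cumsum())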
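An alternative way to get the same percentage, sketched here on a tiny hand-made frame (the `demo` data and its values are illustrative assumptions, not from the question), is to divide the per-group cumulative sum by the per-group total obtained from `transform('sum')`:

```python
import pandas as pd

# Tiny hand-made frame (illustrative values, not taken from the question)
demo = pd.DataFrame({
    'state': ['AZ', 'AZ', 'CA', 'CA'],
    'sales': [300, 100, 50, 150],
})

# Sort by state ascending, then sales descending within each state
demo = demo.sort_values(['state', 'sales'], ascending=[True, False])

# Running total within each state, and each state's overall total
group_cumsum = demo.groupby('state')['sales'].cumsum()
group_total = demo.groupby('state')['sales'].transform('sum')

# Running share of the state's total; the last row per state is 1.0
demo['run_pct'] = group_cumsum / group_total
print(demo)
```

Both `cumsum` and `transform` return results indexed like the sorted frame, so the division and the column assignment line up row by row without any extra index handling.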

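This gives output like the following (the sales values are random, so the numbers differ from the question; the index is shown after being reset):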

    id   sales state   cumsum   run_pct
0    4  846079    AZ   846079  0.608566
1    2  312708    AZ  1158787  0.833491
2    6  231495    AZ  1390282  1.000000
3    3  790291    CA   790291  0.506795
4    1  554631    CA  1344922  0.862467
5    5  214467    CA  1559389  1.000000
6    1  983878    CO   983878  0.388139
7    5  779497    CO  1763375  0.695650
8    3  771486    CO  2534861  1.000000
9    6  794407    WA   794407  0.420899
10   2  587843    WA  1382250  0.732355
11   4  505155    WA  1887405  1.000000
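For reference, the steps can be combined into one runnable sketch. The helper name `add_running_pct`, the fixed seed, and the index reset are assumptions added for illustration. They are not part of the original answer.

```python
import numpy as np
import pandas as pd

def add_running_pct(df, group_col='state', value_col='sales'):
    """Hypothetical helper: sort within groups, then add cumsum and run_pct."""
    # Groups ascending, values descending inside each group
    out = df.sort_values([group_col, value_col], ascending=[True, False])
    out = out.reset_index(drop=True)  # renumber rows 0..n-1 after sorting

    # Compute both pieces before adding any new columns
    running = out.groupby(group_col)[value_col].cumsum()
    total = out.groupby(group_col)[value_col].transform('sum')

    out['cumsum'] = running
    out['run_pct'] = running / total
    return out

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
    'id': [1, 2, 3, 4, 5, 6] * 2,
    'sales': rng.integers(100000, 999999, size=12),
})

result = add_running_pct(df)
print(result)

# Sanity check: the last row of every group should reach 100%
last_pct = result.groupby('state')['run_pct'].last()
print(last_pct)
```

The final check is a quick way to confirm that the sort and the grouping agree: if the running percentage does not end at 1.0 in every state, the cumulative sum was probably taken over the wrong grouping.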
