Python – Get low, high, and average values from columns

I’m trying to get the low, high, and average values of a column, aggregated only over rows that share the same column values. For example, if two rows have the same class, they are aggregated together, but only if they also belong to the same carrier. Like this:

Before processing:

carrier   class   price
SP        A       22
VZ        C       33
XM        A       50 
XM        D       20     
SP        A       88
VZ        C       100

After processing:

carrier   class   price   low   high   mean
SP        A       22      22    88     55
VZ        C       33      33    100    66.5
XM        A       50      50    50     50
XM        D       20      20    20     20
SP        A       88      22    88     55
VZ        C       100     33    100    66.5

As you can see, rows with the same carrier and the same class are aggregated to get the low, high, and mean. Rows with the same carrier but different classes are not aggregated together; each such row simply gets its own price as its low, high, and mean.

I want the result to match the "After processing" table exactly, and it should be a DataFrame. How can I do this?

Solution

Use DataFrameGroupBy.agg with a list of tuples, each pairing a new column name with an aggregate function, then join the resulting columns to the original DataFrame:

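# (new column name, aggregate function) pairs
d = [('low', 'min'), ('high', 'max'), ('mean', 'mean')]
# aggregate price per (carrier, class), then join the stats back onto every row
df1 = df.join(df.groupby(['carrier', 'class'])['price'].agg(d), on=['carrier', 'class'])
print(df1)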
  carrier class  price  low  high  mean
0      SP     A     22   22    88  55.0
1      VZ     C     33   33   100  66.5
2      XM     A     50   50    50  50.0
3      XM     D     20   20    20  20.0
4      SP     A     88   22    88  55.0
5      VZ     C    100   33   100  66.5

Details (the aggregated result before it is joined back):

print(df.groupby(['carrier', 'class'])['price'].agg(d))
               low  high  mean
carrier class                 
SP      A       22    88  55.0
VZ      C       33   100  66.5
XM      A       50    50  50.0
        D       20    20  20.0
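
For reference, here is a self-contained sketch of the same agg-and-join idea. The DataFrame is rebuilt from the table in the question, and the keyword form of agg (named aggregation, which assumes pandas 0.25 or later) is used in place of the tuple list; the variable names stats and df1 are only illustrative.

```python
import pandas as pd

# Sample data copied from the "Before processing" table in the question.
df = pd.DataFrame({
    "carrier": ["SP", "VZ", "XM", "XM", "SP", "VZ"],
    "class": ["A", "C", "A", "D", "A", "C"],
    "price": [22, 33, 50, 20, 88, 100],
})

# One row of statistics per (carrier, class) pair.
# Keyword names become the new column names (assumes pandas >= 0.25).
stats = df.groupby(["carrier", "class"])["price"].agg(
    low="min", high="max", mean="mean"
)

# Join the per-group statistics back onto every original row.
df1 = df.join(stats, on=["carrier", "class"])
print(df1)
```

Because the join is keyed on both carrier and class, the original row order and row count are preserved, and rows in the same group receive identical statistics.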

Or use GroupBy.transform, which returns values aligned to the original rows, so they can be assigned directly as new columns:

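# (new column name, aggregate function) pairs
d = [('low', 'min'), ('high', 'max'), ('mean', 'mean')]
g = df.groupby(['carrier', 'class'])['price']
# transform broadcasts each group's statistic back to that group's rows
for name, func in d:
    df[name] = g.transform(func)
print(df)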
  carrier class  price  low  high  mean
0      SP     A     22   22    88  55.0
1      VZ     C     33   33   100  66.5
2      XM     A     50   50    50  50.0
3      XM     D     20   20    20  20.0
4      SP     A     88   22    88  55.0
5      VZ     C    100   33   100  66.5
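
If modifying df in place is not desired, the transform approach can also be written with DataFrame.assign, which returns a new DataFrame. This is a minimal sketch, not part of the original answer; the sample data is rebuilt from the question's table and the name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the "Before processing" table in the question.
df = pd.DataFrame({
    "carrier": ["SP", "VZ", "XM", "XM", "SP", "VZ"],
    "class": ["A", "C", "A", "D", "A", "C"],
    "price": [22, 33, 50, 20, 88, 100],
})

# Group prices by (carrier, class) once and reuse the grouped object.
g = df.groupby(["carrier", "class"])["price"]

# assign builds a new DataFrame; each transform result is indexed like df,
# so the values line up row by row without a join.
result = df.assign(
    low=g.transform("min"),
    high=g.transform("max"),
    mean=g.transform("mean"),
)
print(result)
```

The original df keeps only its three columns here, which can be convenient when the same input is reused for other computations.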
