Get low, high, and average values from columns
I'm trying to get the low, high, and average values of a column, but aggregated only within groups of matching column values. For example, if two rows have the same class, they should be aggregated together, provided they also belong to the same carrier. Like this:
Before processing:
carrier class price
SP A 22
VZ C 33
XM A 50
XM D 20
SP A 88
VZ C 100
After processing:
carrier class price low high mean
SP A 22 22 88 55
VZ C 33 33 100 66.5
XM A 50 50 50 50
XM D 20 20 20 20
SP A 88 22 88 55
VZ C 100 33 100 66.5
As you can see, rows with the same carrier and the same class are aggregated to get the low, high, and mean. Rows with the same carrier but a different class are not aggregated together; a row that is alone in its group simply gets its own price as low, high, and mean.
I want the result to be exactly the same as after processing. The result should be a data frame. How can I do this?
Solution
Use DataFrameGroupBy.agg with a list of tuples that pair each new column name with an aggregate function, then join the aggregated columns back to the original DataFrame:
d = [('low','min'),('high','max'),('mean','mean')]
df1 = df.join(df.groupby(['carrier','class'])['price'].agg(d), on=['carrier','class'])
print (df1)
  carrier class  price  low  high  mean
0      SP     A     22   22    88  55.0
1      VZ     C     33   33   100  66.5
2      XM     A     50   50    50  50.0
3      XM     D     20   20    20  20.0
4      SP     A     88   22    88  55.0
5      VZ     C    100   33   100  66.5
Details:
print (df.groupby(['carrier','class'])['price'].agg(d))
               low  high  mean
carrier class
SP      A       22    88  55.0
VZ      C       33   100  66.5
XM      A       50    50  50.0
        D       20    20  20.0
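For reference, here is a self-contained sketch of the same join-based approach. It rebuilds the sample data from the question and, as a swapped-in variation, uses keyword-style named aggregation instead of the list of tuples; this assumes pandas 0.25 or newer. The variable names `stats` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "carrier": ["SP", "VZ", "XM", "XM", "SP", "VZ"],
    "class": ["A", "C", "A", "D", "A", "C"],
    "price": [22, 33, 50, 20, 88, 100],
})

# Named aggregation: each keyword becomes an output column name
# (assumes pandas >= 0.25).
stats = df.groupby(["carrier", "class"])["price"].agg(
    low="min", high="max", mean="mean"
)

# join(on=...) matches the two key columns of df against the
# (carrier, class) MultiIndex of stats, so the original row order is kept.
result = df.join(stats, on=["carrier", "class"])
print(result)
```

Because the join is driven by the left frame, every original row survives, and rows in the same group receive identical low, high, and mean values.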
Or use GroupBy.transform, which returns values already aligned to the original rows, so no join is needed:
d = [('low','min'),('high','max'),('mean','mean')]
g = df.groupby(['carrier','class'])['price']
for i, j in d:
    df[i] = g.transform(j)
print (df)
  carrier class  price  low  high  mean
0      SP     A     22   22    88  55.0
1      VZ     C     33   33   100  66.5
2      XM     A     50   50    50  50.0
3      XM     D     20   20    20  20.0
4      SP     A     88   22    88  55.0
5      VZ     C    100   33   100  66.5
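A runnable sketch of the transform approach, again rebuilding the sample data from the question. As a small variation on the loop above, it uses `DataFrame.assign` so the original frame is left unmodified; the name `out` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "carrier": ["SP", "VZ", "XM", "XM", "SP", "VZ"],
    "class": ["A", "C", "A", "D", "A", "C"],
    "price": [22, 33, 50, 20, 88, 100],
})

g = df.groupby(["carrier", "class"])["price"]

# transform broadcasts each group's statistic back onto the group's rows,
# returning a Series with the same index as df.
out = df.assign(
    low=g.transform("min"),
    high=g.transform("max"),
    mean=g.transform("mean"),
)
print(out)
```

The trade-off is one pass over the groups per statistic, whereas the agg-and-join version computes all statistics in a single aggregation.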