How do I aggregate the result of a groupby apply back into a DataFrame in Python pandas?
def my_per_group_func(temp):
    # apply some tricks here
    return a, b, c, d

output = dataframe.groupby('group_id').apply(my_per_group_func)
My question is how to turn "output" back into a DataFrame with named columns (the index of that DataFrame would be group_id).
Usually I use the built-in aggregate functions for this.
The problem is that my_per_group_func is very complex and can't be expressed with the usual "aggregate" function syntax.
Does anyone know how to do this?
Thanks
Solution
It seems that you need to return a DataFrame or a Series from the function; check the flexible apply docs:
import pandas as pd

dataframe = pd.DataFrame({'group_id': [1, 1, 3],
                          'B': [4, 5, 6],
                          'C': [7, 8, 9],
                          'D': [1, 3, 5],
                          'E': [5, 3, 6],
                          'F': [7, 4, 3]})
print(dataframe)
   B  C  D  E  F  group_id
0  4  7  1  5  7         1
1  5  8  3  3  4         1
2  6  9  5  6  3         3
def my_per_group_func(x):
    # print(x)
    # some sample element-wise operations; each result keeps one row per input row
    a = x.B + x.C
    b = x.E + x.B
    c = x.D + x.F
    d = x.F + x.E
    return pd.DataFrame({'group_id': x.group_id, 'a': a, 'b': b, 'c': c, 'd': d})

output = dataframe.groupby('group_id').apply(my_per_group_func)
print(output)
    a   b  c   d  group_id
0  11   9  8  12         1
1  13   8  7   7         1
2  15  12  8   9         3
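The index and column layout printed above depend on the pandas version. As a hedged sketch for more recent releases, where apply may prepend the group key to the result index and may no longer pass the grouping column to the function, one option is to select the value columns explicitly, pass group_keys=False, and join group_id back afterwards. The names value_cols and per_row_func are illustrative and not part of the original answer.

```python
import pandas as pd

dataframe = pd.DataFrame({'group_id': [1, 1, 3],
                          'B': [4, 5, 6],
                          'C': [7, 8, 9],
                          'D': [1, 3, 5],
                          'E': [5, 3, 6],
                          'F': [7, 4, 3]})

# Assumption: these are the only columns the per-group function needs.
value_cols = ['B', 'C', 'D', 'E', 'F']

def per_row_func(x):
    # Same element-wise operations as above, but without touching group_id,
    # so the function does not rely on the grouping column being passed in.
    return pd.DataFrame({'a': x['B'] + x['C'],
                         'b': x['E'] + x['B'],
                         'c': x['D'] + x['F'],
                         'd': x['F'] + x['E']})

# group_keys=False keeps the original row index instead of adding group_id to it.
result = dataframe.groupby('group_id', group_keys=False)[value_cols].apply(per_row_func)

# Re-attach group_id by aligning on the original row index.
output = dataframe[['group_id']].join(result)
print(output)
```

Selecting the columns up front also makes it explicit which data the function sees, at the cost of one extra join.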
def my_per_group_func(x):
    # print(x)
    # some sample aggregations; each one reduces the group to a single value
    a = (x.B + x.C).mean()
    b = (x.E + x.B).sum()
    c = (x.D + x.F).median()
    d = (x.F + x.E).std()
    return pd.Series([a, b, c, d], index=['a', 'b', 'c', 'd'])

output = dataframe.groupby('group_id').apply(my_per_group_func)
print(output)
             a     b    c         d
group_id
1         12.0  17.0  7.5  3.535534
3         15.0  12.0  8.0       NaN
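If the per-group function has to keep returning a plain tuple, as in the question, the result of apply is a Series of tuples that can be expanded afterwards. The following is a minimal sketch of that alternative, not part of the original answer; tuple_per_group and value_cols are hypothetical names, and the aggregations are the same sample ones used above.

```python
import pandas as pd

dataframe = pd.DataFrame({'group_id': [1, 1, 3],
                          'B': [4, 5, 6],
                          'C': [7, 8, 9],
                          'D': [1, 3, 5],
                          'E': [5, 3, 6],
                          'F': [7, 4, 3]})

def tuple_per_group(x):
    # Hypothetical stand-in for a complex function that returns a tuple.
    a = (x['B'] + x['C']).mean()
    b = (x['E'] + x['B']).sum()
    c = (x['D'] + x['F']).median()
    d = (x['F'] + x['E']).std()
    return a, b, c, d

# Assumption: these are the only columns the function needs.
value_cols = ['B', 'C', 'D', 'E', 'F']

# apply yields one tuple per group, indexed by group_id.
tuples = dataframe.groupby('group_id')[value_cols].apply(tuple_per_group)

# Expand the tuples into named columns, keeping group_id as the index.
output = pd.DataFrame(tuples.tolist(), index=tuples.index,
                      columns=['a', 'b', 'c', 'd'])
print(output)
```

Returning a Series from the function, as shown in the answer, avoids the extra expansion step; this variant is mainly useful when the function's return type cannot be changed.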