Python – How do I aggregate the results of groupby apply back into a DataFrame in pandas?

def my_per_group_func(temp):
    # apply some tricks here
    return a, b, c, d

output = dataframe.groupby('group_id').apply(my_per_group_func)

My question is: how do I collect “output” back into a DataFrame with named columns (with group_id as the index, obviously)?

Usually I would use the built-in aggregate functions, but my_per_group_func is too complex to express with the usual “aggregate” syntax.

Does anyone know?

Thanks

Solution

It seems that you need to return a DataFrame or a Series from the function; see the flexible apply section of the pandas docs:

import pandas as pd

dataframe = pd.DataFrame({'group_id': [1, 1, 3],
                          'B': [4, 5, 6],
                          'C': [7, 8, 9],
                          'D': [1, 3, 5],
                          'E': [5, 3, 6],
                          'F': [7, 4, 3]})

print(dataframe)
   B  C  D  E  F  group_id
0  4  7  1  5  7         1
1  5  8  3  3  4         1
2  6  9  5  6  3         3

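def my_per_group_func(x):
    # print(x)  # uncomment to inspect each group
    # some sample row-wise operations; each result keeps one row per input row
    a = x.B + x.C
    b = x.E + x.B
    c = x.D + x.F
    d = x.F + x.E
    # returning a DataFrame gives one output row per input row
    return pd.DataFrame({'group_id': x.group_id, 'a': a, 'b': b, 'c': c, 'd': d})

output = dataframe.groupby('group_id').apply(my_per_group_func)
print(output)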
    a   b  c   d  group_id
0  11   9  8  12         1
1  13   8  7   7         1
2  15  12  8   9         3

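def my_per_group_func(x):
    # print(x)  # uncomment to inspect each group
    # some sample aggregations; each result is a single scalar per group
    a = (x.B + x.C).mean()
    b = (x.E + x.B).sum()
    c = (x.D + x.F).median()
    d = (x.F + x.E).std()
    # returning a Series gives one output row per group, with the Series index as column names
    return pd.Series([a, b, c, d], index=['a', 'b', 'c', 'd'])

output = dataframe.groupby('group_id').apply(my_per_group_func)
print(output)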
             a     b    c         d
group_id                           
1         12.0  17.0  7.5  3.535534
3         15.0  12.0  8.0       NaN
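
If the function has to keep returning a plain tuple, as in the question, one possible workaround is to expand the tuples afterwards. The sketch below is an illustration rather than part of the original answer: it assumes that apply yields a Series of tuples indexed by group_id, and the names tuples and output are only placeholders.

```python
import pandas as pd

dataframe = pd.DataFrame({'group_id': [1, 1, 3],
                          'B': [4, 5, 6],
                          'C': [7, 8, 9],
                          'D': [1, 3, 5],
                          'E': [5, 3, 6],
                          'F': [7, 4, 3]})

def my_per_group_func(x):
    # same sample aggregations as above, but returned as a plain tuple
    a = (x.B + x.C).mean()
    b = (x.E + x.B).sum()
    c = (x.D + x.F).median()
    d = (x.F + x.E).std()
    return a, b, c, d

# assumption: apply returns a Series holding one tuple per group
tuples = dataframe.groupby('group_id').apply(my_per_group_func)

# expand each tuple into named columns, keeping group_id as the index
output = pd.DataFrame(tuples.tolist(), index=tuples.index,
                      columns=['a', 'b', 'c', 'd'])
print(output)
```

Calling output.reset_index() afterwards should turn group_id back into an ordinary column, if that is preferred over having it as the index.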
