Python – Groupby/aggregate data frames with non-numeric types


I have the following situation:

import itertools
import numpy as np
import pandas as pd

# Five consecutive dates, each repeated twice
date_range = pd.date_range('20180101', '20180105')
date_list = list(itertools.chain.from_iterable(itertools.repeat(date, 2) for date in date_range))
num_list = np.random.randint(1, 100, size=10)
date2 = ['2018-12-31'] * 10

df = pd.DataFrame({'date1': date_list, 'numbers': num_list, 'date2': date2})

This gives the following data frame:

      date1        date2    numbers
0   2018-01-01  2018-12-31  38
1   2018-01-01  2018-12-31  2
2   2018-01-02  2018-12-31  8
3   2018-01-02  2018-12-31  51
4   2018-01-03  2018-12-31  16
5   2018-01-03  2018-12-31  22
6   2018-01-04  2018-12-31  43
7   2018-01-04  2018-12-31  76
8   2018-01-05  2018-12-31  47
9   2018-01-05  2018-12-31  50

I want to get a new data frame that a) is grouped by date1, b) sums the values in the numbers column for each date1, and c) keeps the date2 value (we can assume it is the same within each date1 group or, in this case, across the whole data frame).

I can do the following to implement a) and b), but if I try to include something like 'date2': 'mean' in the aggregation dictionary, it does not work and raises DataError: No numeric types to aggregate.

df.groupby(['date1'],as_index=False).agg({'numbers':'sum'})

Any suggestions?

Solution

If date2 is the same within each group, it seems you can simply add it to the grouping keys:

df.groupby(['date1', 'date2'],as_index=False).agg({'numbers':'sum'})

Or aggregate date2 with first:

df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':'first'})
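
To make the two options above easier to compare, here is a minimal self-contained sketch. The four-row frame and the names by_keys and by_first are illustrative assumptions, not part of the original question, which used random numbers.

```python
import pandas as pd

# Small fixed-value frame (illustrative values, not the random ones from the question)
df = pd.DataFrame({
    'date1': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02']),
    'numbers': [38, 2, 8, 51],
    'date2': ['2018-12-31'] * 4,
})

# Option 1: make date2 part of the grouping key
by_keys = df.groupby(['date1', 'date2'], as_index=False).agg({'numbers': 'sum'})

# Option 2: group by date1 only and keep the first date2 seen in each group
by_first = df.groupby(['date1'], as_index=False).agg({'numbers': 'sum', 'date2': 'first'})

print(by_keys)
print(by_first)
```

The two options only agree when date2 really is constant within each date1 group. If it is not, grouping by both keys splits the group into several rows, while first silently keeps one value and drops the others.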

But if you need the mean of the datetimes, it is a bit more complicated:

df['date2'] = pd.to_datetime(df['date2'])
# Average the underlying int64 timestamps, then convert the result back to a datetime
f = lambda x: pd.to_datetime(x.values.astype(np.int64).mean())
df1 = df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':f})
print(df1)
       date1  numbers      date2
0 2018-01-01      159 2018-12-31
1 2018-01-02      104 2018-12-31
2 2018-01-03       75 2018-12-31
3 2018-01-04       98 2018-12-31
4 2018-01-05      184 2018-12-31
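
Since date2 is constant in the question's data, the output above cannot show whether the mean is really computed. The following sketch uses date2 values that differ within each group; the data and the helper name mean_datetime are hypothetical. It casts to nanosecond resolution explicitly and rounds the mean to an integer before converting back, which is a defensive assumption on my part rather than something the original answer does.

```python
import numpy as np
import pandas as pd

# Illustrative frame where date2 varies inside each date1 group
df = pd.DataFrame({
    'date1': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02']),
    'numbers': [38, 2, 8, 51],
    'date2': pd.to_datetime(['2018-12-30', '2019-01-01', '2018-06-01', '2018-06-03']),
})

def mean_datetime(x):
    """Hypothetical helper: mean of a datetime Series via int64 nanoseconds."""
    ns = x.values.astype('datetime64[ns]').astype(np.int64)
    # The mean is computed in float64, so sub-microsecond rounding is possible
    return pd.to_datetime(int(ns.mean()))

df1 = df.groupby(['date1'], as_index=False).agg({'numbers': 'sum', 'date2': mean_datetime})
print(df1)
```

Each group's date2 should land on the midpoint of its two dates, which makes it easy to check the result by eye.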
