Groupby/aggregate data frames with non-numeric types
I have the following situation:
import itertools
import numpy as np
import pandas as pd

date_range = pd.date_range('20180101', '20180105')
date_list = list(itertools.chain.from_iterable(itertools.repeat(date, 2) for date in date_range))
num_list = np.random.randint(1, 100, size=10)
date2 = ['2018-12-31'] * 10
df = pd.DataFrame({'date1': date_list, 'numbers': num_list, 'date2': date2})
This gives the following data frame:
       date1       date2  numbers
0 2018-01-01  2018-12-31       38
1 2018-01-01  2018-12-31        2
2 2018-01-02  2018-12-31        8
3 2018-01-02  2018-12-31       51
4 2018-01-03  2018-12-31       16
5 2018-01-03  2018-12-31       22
6 2018-01-04  2018-12-31       43
7 2018-01-04  2018-12-31       76
8 2018-01-05  2018-12-31       47
9 2018-01-05  2018-12-31       50
I want to get a new data frame that a) is grouped by date1, b) sums the values in the numbers column for each date1, and c) keeps the date2 value (we can assume it is the same within each date1 group or, in this case, across the whole data frame).
I can do the following to implement a+b, but if I try to include something like 'date2':'mean' in the aggregation dictionary, it does not work and raises DataError: No numeric types to aggregate:
df.groupby(['date1'],as_index=False).agg({'numbers':'sum'})
Any suggestions?
Solution
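The error in the question is consistent with date2 holding plain strings, since it was built from a list of '2018-12-31' literals, so there is nothing numeric for 'mean' to work on. A minimal sketch of how one might confirm this, using a shortened, hand-written copy of the frame (the exact exception type and message can differ between pandas versions, so only the dtypes are checked here):

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Shortened stand-in for the frame in the question (values are illustrative).
df = pd.DataFrame({
    'date1': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02']),
    'numbers': [38, 2, 8, 51],
    'date2': ['2018-12-31'] * 4,  # plain strings, as in the question
})

# date2 is neither numeric nor datetime, so a numeric mean has no input.
date2_is_numeric = is_numeric_dtype(df['date2'])
date2_is_datetime = is_datetime64_any_dtype(df['date2'])
print(df.dtypes)
print(date2_is_numeric, date2_is_datetime)

# Converting the strings gives a real datetime column to work with.
converted = pd.to_datetime(df['date2'])
converted_is_datetime = is_datetime64_any_dtype(converted)
print(converted_is_datetime)
```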
If date2 is the same for each group, it seems that you need to group by both columns:
df.groupby(['date1', 'date2'],as_index=False).agg({'numbers':'sum'})
Or aggregate date2 with first:
df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':'first'})
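Both variants rely on date2 being constant within each date1 group. The sketch below, on a small hand-written frame (the values are illustrative, not the random ones from the question), first checks that assumption with nunique and then confirms that the two approaches produce the same rows:

```python
import pandas as pd

# Small fixed frame so the result is reproducible (illustrative values).
df = pd.DataFrame({
    'date1': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02',
                             '2018-01-02', '2018-01-03', '2018-01-03']),
    'numbers': [38, 2, 8, 51, 16, 22],
    'date2': ['2018-12-31'] * 6,
})

# Assumption check: every date1 group should hold exactly one distinct date2.
max_distinct = df.groupby('date1')['date2'].nunique().max()

# Variant 1: add date2 to the grouping keys.
by_keys = df.groupby(['date1', 'date2'], as_index=False).agg({'numbers': 'sum'})

# Variant 2: group by date1 only and keep the first date2 of each group.
by_first = df.groupby(['date1'], as_index=False).agg({'numbers': 'sum', 'date2': 'first'})

# Compare row contents after putting the columns in the same order.
cols = ['date1', 'date2', 'numbers']
same = by_keys[cols].values.tolist() == by_first[cols].values.tolist()
print(by_first)
print(max_distinct, same)
```

If max_distinct were greater than 1, the first variant would return one row per (date1, date2) pair, while the second would silently keep only the first date2 in each group.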
But if you need the mean of the datetimes, it's a bit more complicated:
df['date2'] = pd.to_datetime(df['date2'])
f = lambda x: pd.to_datetime(x.values.astype(np.int64).mean())
df1 = df.groupby(['date1'],as_index=False).agg({'numbers':'sum','date2':f})
print (df1)
       date1  numbers      date2
0 2018-01-01      159 2018-12-31
1 2018-01-02      104 2018-12-31
2 2018-01-03       75 2018-12-31
3 2018-01-04       98 2018-12-31
4 2018-01-05      184 2018-12-31
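To see the datetime mean actually average something, here is a hedged variation on the same idea with date2 values that differ inside a group. The helper name mean_datetime and the sample values are assumptions for illustration. It casts explicitly to nanosecond resolution before converting to integers, in case the column is stored at another resolution, and note that averaging through floats can lose sub-microsecond precision for arbitrary timestamps:

```python
import numpy as np
import pandas as pd

# Illustrative frame: the first group has two different date2 values.
df = pd.DataFrame({
    'date1': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02']),
    'numbers': [38, 2, 8, 51],
    'date2': pd.to_datetime(['2018-12-30', '2019-01-01', '2018-12-31', '2018-12-31']),
})

def mean_datetime(s):
    """Average a datetime Series via its integer nanosecond representation."""
    # Force nanosecond resolution so the integers are nanoseconds since the epoch.
    ns = s.values.astype('datetime64[ns]').astype(np.int64)
    # pd.Timestamp interprets a plain integer as nanoseconds since the epoch.
    return pd.Timestamp(int(ns.mean()))

df1 = df.groupby(['date1'], as_index=False).agg({'numbers': 'sum', 'date2': mean_datetime})
print(df1)
# The midpoint of 2018-12-30 and 2019-01-01 is 2018-12-31,
# so both groups end up with date2 == 2018-12-31.
```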