Pandas: How do I subset (and sum) the top N observations in a subcategory?
In a pandas data frame like this:
year month passengers
0 1949 January 112
1 1949 February 118
2 1949 March 132
3 1949 April 129
4 1949 May 121
5 1949 June 135
.
.
.
137 1960 June 535
138 1960 July 622
139 1960 August 606
140 1960 September 508
141 1960 October 461
142 1960 November 390
143 1960 December 432
How do I divide (and count) the maximum 3 months per year for passengers?
You can copy the exact same data frame from the seaborn dataset:
import pandas as pd
import seaborn as sns
df = sns.load_dataset('flights')
df
Here’s what I tried :
According to this Suggestion in post comment I found that after reindexing the dataframe, I can use nlargest()
to subset the dataframe:
df = df.set_index(['year', 'month'])
df2 = df.groupby(level=0)['passengers'].nlargest(3)
df2
Output:
year year month
1949 1949 July 148
August 148
September 136
1950 1950 July 170
August 170
But for some reason, the year index is duplicated and I still need to re-index
, group the dataframe by year and sum the results. This is starting to get confusing, so isn’t there a better way?
Here’s what a simple copy-paste is all about:
import pandas as pd
import seaborn as sns
df = sns.load_dataset('flights')
df = df.set_index(['year', 'month'])
df2 = df.groupby(level=0)['passengers'].nlargest(3)
This is the shape of the desired output:
# Sum of top 3 months for each year (no index other than default pandas dataframe index)
year sum
0 1949 600 (the sum is made up)
1 1950 600
.
.
.
10 1960 600
Thanks for any suggestions!
Edit: System Information:
python 3.6.0
Pandas 0.19.2
Windows 7
Solution
I think < a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.apply.html" rel="noreferrer noopener nofollow" > GroupBy.apply
, Series.nlargest
and sum
:
df = sns.load_dataset('flights')
df2 = df.groupby('year')['passengers'].apply(lambda x: x.nlargest(3).sum()).reset_index()
print (df2)
year passengers
0 1949 432
1 1950 498
2 1951 582
3 1952 690
4 1953 779
5 1954 859
6 1955 1026
7 1956 1192
8 1957 1354
9 1958 1431
10 1959 1579
11 1960 1763