How do I get the highest value row after grouping two columns and getting the value count in Pandas Dataframe?
I use the following line of code to group by topic and count the categories within each group:
df.groupby('topic')['category'].value_counts()
I get the following output:
topic   category
topic1  Entertainment    1303
        Science           462
        Sports            351
        Economy           270
        Business          161
        Technology         92
        Education          40
        Politics           18
        Environment         5
topic2  Politics          134
        Economy           133
        Entertainment     110
        Sports             69
        Business           68
        Science            45
        Technology         22
        Education           7
        Environment         2
topic3  Entertainment    1370
        Sports            533
        Economy           485
        Science           335
        Business          207
        Politics          180
        Education         108
        Technology         97
        Environment        12
I want to get the topmost row for each topic (which is the most common category) as follows:
topic   category
topic1  Entertainment    1303
topic2  Politics          134
topic3  Entertainment    1370
Solution
In pandas, value_counts sorts its result by count in descending order by default, so the first row of each group is already the most common category. All you need to do is take that first row from each group, which can be done by applying a function:
def top_value_count(x):
    # value_counts() sorts by count (descending), so head(1) is the most common category
    return x.value_counts().head(1)

df.groupby('topic')['category'].apply(top_value_count)
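To try this without the original data, here is a minimal sketch on a made-up DataFrame. Only the column names topic and category come from the question; the rows are invented for illustration.

```python
import pandas as pd

# Made-up sample data: column names match the question, the rows are invented.
df = pd.DataFrame({
    "topic": ["topic1"] * 4 + ["topic2"] * 3,
    "category": ["Sports", "Sports", "Sports", "Science",
                 "Politics", "Politics", "Economy"],
})

def top_value_count(x):
    # value_counts() sorts by count (descending), so head(1) keeps the most common value
    return x.value_counts().head(1)

# Result is a Series indexed by (topic, category) holding the top count per topic
result = df.groupby("topic")["category"].apply(top_value_count)
print(result)
```

With this sample, the result should contain one row per topic: Sports for topic1 and Politics for topic2.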
Change the 1 in head(1) to another number to return more values for each topic.
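As an alternative that avoids apply, the grouped counts from the question can be regrouped on the topic level and trimmed with head. This is a sketch on invented data, not part of the original answer; the variable n is a hypothetical parameter for how many categories to keep per topic.

```python
import pandas as pd

# Made-up sample data: column names match the question, the rows are invented.
df = pd.DataFrame({
    "topic": ["topic1"] * 6 + ["topic2"] * 3,
    "category": ["Sports", "Sports", "Sports", "Science", "Science", "Economy",
                 "Politics", "Politics", "Economy"],
})

n = 2  # hypothetical parameter: number of top categories to keep per topic

# Same call as in the question: counts sorted descending within each topic
counts = df.groupby("topic")["category"].value_counts()

# Regroup on the first index level (topic) and keep the first n rows of each group
top_n = counts.groupby(level=0).head(n)
print(top_n)
```

Setting n = 1 gives the single most common category per topic, the same rows the apply-based function selects. If two categories tie for a position, which one comes first depends on how value_counts orders the tie, so neither approach should be relied on to break ties in a particular way.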