Python – How do I get the highest value row after grouping two columns and getting the value count in Pandas Dataframe?

I use the following line of code to group by one column and count the values of a second column:

df.groupby('topic')['category'].value_counts()

I get the following output:

topic                 category     

topic1            Entertainment    1303
                  Science           462
                  Sports            351
                  Economy           270
                  Business          161
                  Technology         92
                  Education          40
                  Politics           18
                  Environment         5

topic2            Politics          134
                  Economy           133
                  Entertainment     110
                  Sports             69
                  Business           68
                  Science            45
                  Technology         22
                  Education           7
                  Environment         2

topic3            Entertainment    1370
                  Sports            533
                  Economy           485
                  Science           335
                  Business          207
                  Politics          180
                  Education         108
                  Technology         97
                  Environment        12

I want to get the topmost row for each topic (which is the most common category) as follows:

topic                 category     

topic1            Entertainment    1303
topic2            Politics          134
topic3            Entertainment    1370

Solution

In pandas, value_counts sorts the counts in descending order by default, so the first entry in each group is already the most common category. All you need to do is take that first entry from each group, which can be done by applying a function:

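# value_counts() sorts counts in descending order, so head(1) is the most common value
def top_value_count(x):
    return x.value_counts().head(1)

# Apply per topic; the result is indexed by (topic, category)
df.groupby('topic')['category'].apply(top_value_count)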

Change the 1 passed to head to another number n to return the n most common categories for each topic.
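For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The small DataFrame is made-up sample data (not the asker's), and the optional n parameter is added here purely for illustration. It also shows a possible alternative that avoids apply by reusing the grouped value_counts result from the question and keeping the first row per topic.

```python
import pandas as pd

# Hypothetical sample data, standing in for the asker's DataFrame
df = pd.DataFrame({
    "topic": ["topic1", "topic1", "topic1",
              "topic2", "topic2", "topic2", "topic2"],
    "category": ["Sports", "Science", "Sports",
                 "Politics", "Economy", "Politics", "Politics"],
})

def top_value_count(x, n=1):
    # value_counts() sorts descending, so the first n rows are the most common
    return x.value_counts().head(n)

# Approach from the answer: apply the helper to each topic
top = df.groupby("topic")["category"].apply(top_value_count)
print(top)

# Alternative sketch: count once, then keep the first row of each topic
counts = df.groupby("topic")["category"].value_counts()
top_alt = counts.groupby(level="topic").head(1)
print(top_alt)
```

Both approaches should give one row per topic, indexed by topic and category. If two categories tie for first place, only one of them is returned.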
