Python – How to group by the values of a list column in a pandas DataFrame

I have a pandas DataFrame of movies like this:

id  name  genre    release_year
1   A     [a,b,c]  2017
2   B     [b,c]    2017
3   C     [a,c]    2010
4   D     [d,c]    2010
....

I want to count the movies per release year and per individual value in the genre list.
My expected output is:

year  genre  number_of_movies
2017  a      1
2017  b      2
2017  c      2
2010  a      1
2010  c      2
...

Can someone help me achieve this goal?

Solution

You can create a new DataFrame with the constructor, reshape it with stack, and then count with groupby and size:

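import pandas as pd

df1 = (pd.DataFrame(df['genre'].values.tolist(), index=df['release_year'].values)
         .stack()                           # one row per (movie, genre) pair
         .reset_index(level=1, drop=True)   # drop the list-position level
         .rename_axis('release_year')       # name the index so it becomes a column
         .reset_index(name='genre')
         .groupby(['release_year','genre'])
         .size()
         .reset_index(name='number_of_movies'))

print(df1)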
   release_year genre  number_of_movies
0          2010     a                 1
1          2010     c                 2
2          2010     d                 1
3          2017     a                 1
4          2017     b                 2
5          2017     c                 2
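
As an alternative to the constructor-and-stack approach above, here is a minimal self-contained sketch that uses DataFrame.explode (available in pandas 0.25 and later). It is a different technique from the answer's, offered only as an illustration. The sample frame is rebuilt from the four rows shown in the question, and the name counts is hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the four rows shown in the question (an assumption;
# the real frame has more rows).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "genre": [["a", "b", "c"], ["b", "c"], ["a", "c"], ["d", "c"]],
    "release_year": [2017, 2017, 2010, 2010],
})

# explode() gives each list element its own row and repeats release_year,
# so a plain groupby/size can count movies per (year, genre) pair.
counts = (
    df[["release_year", "genre"]]
    .explode("genre")
    .groupby(["release_year", "genre"])
    .size()
    .reset_index(name="number_of_movies")
)

print(counts)
```

On the sample rows this should give the same counts as df1 above. The explode version avoids building an intermediate wide frame, which may be easier to read when the lists have very different lengths.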
