Python – Is it possible to get the groupby style count without collapsing the Pandas DataFrame?

Is it possible to get the groupby style count without collapsing the Pandas DataFrame?… here is a solution to the problem.

Is it possible to get the groupby style count without collapsing the Pandas DataFrame?

I

have a DataFrame with 9 columns and I’m trying to add a column based on the first 3 columns of unique value counts (e.g. columns A, B, and C, which must match to count as unique values, but the rest of the columns may be different. I tried to do this like I would with groupby :

df = pd. DataFrame(resultsFile500.groupby(['chr','start','end']).size().reset_index().rename(columns={0:'count'}))

This returns a DataFrame with 5 columns, and the count is what I want. However, I also need the values in the original data frame, so what I’ve been trying to do is somehow make these count values a column in the original df. Therefore, this means that if two rows in the chr, start, and end columns have the same value, the counts column is 2 in both rows, but they are not collapsed into one row. Is there an easy solution here that I’m missing or do I need to crack something together?

Solution

You can use .transform to get non-folding behavior:

>>> df
   a  b  c  d  e
0  3  4  1  3  0
1  3  1  4  3  0
2  4  3  3  2  1
3  3  4  1  4  0
4  0  4  3  3  2
5  1  2  0  4  1
6  3  1  4  2  1
7  0  4  3  4  0
8  1  3  0  1  1
9  3  4  1  2  1
>>> df.groupby(['a','b','c']).transform('count')
   d  e
0  3  3
1  2  2
2  1  1
3  3  3
4  2  2
5  1  1
6  2  2
7  2  2
8  1  1
9  3  3
>>>

Note that I have to select any column from the .transform result and then just do the following:

>>> df['unique_count'] = df.groupby(['a','b','c']).transform('count')['d']
>>> df
   a  b  c  d  e  unique_count
0  3  4  1  3  0             3
1  3  1  4  3  0             2
2  4  3  3  2  1             1
3  3  4  1  4  0             3
4  0  4  3  3  2             2
5  1  2  0  4  1             1
6  3  1  4  2  1             2
7  0  4  3  4  0             2
8  1  3  0  1  1             1
9  3  4  1  2  1             3

Related Problems and Solutions