Is it possible to get the groupby style count without collapsing the Pandas DataFrame?
I have a DataFrame with 9 columns and I'm trying to add a column of unique value counts based on the first 3 columns (e.g. columns A, B, and C must all match to count as a duplicate, but the remaining columns may differ). I tried to do this the way I would with groupby:
df = pd.DataFrame(resultsFile500.groupby(['chr','start','end']).size().reset_index().rename(columns={0:'count'}))
This returns a DataFrame with 4 columns, and the count is what I want. However, I also need the values from the original DataFrame, so what I've been trying to do is somehow make these count values a column in the original df. That is, if two rows have the same values in the chr, start, and end columns, the count column should be 2 in both rows, but the rows should not be collapsed into one. Is there an easy solution here that I'm missing, or do I need to hack something together?
Solution
You can use .transform to get the non-collapsing behavior:
>>> df
a b c d e
0 3 4 1 3 0
1 3 1 4 3 0
2 4 3 3 2 1
3 3 4 1 4 0
4 0 4 3 3 2
5 1 2 0 4 1
6 3 1 4 2 1
7 0 4 3 4 0
8 1 3 0 1 1
9 3 4 1 2 1
>>> df.groupby(['a','b','c']).transform('count')
d e
0 3 3
1 2 2
2 1 1
3 3 3
4 2 2
5 1 1
6 2 2
7 2 2
8 1 1
9 3 3
Note that every column of the .transform result holds the same count, so you can select any one of them and assign it back:
>>> df['unique_count'] = df.groupby(['a','b','c']).transform('count')['d']
>>> df
a b c d e unique_count
0 3 4 1 3 0 3
1 3 1 4 3 0 2
2 4 3 3 2 1 1
3 3 4 1 4 0 3
4 0 4 3 3 2 2
5 1 2 0 4 1 1
6 3 1 4 2 1 2
7 0 4 3 4 0 2
8 1 3 0 1 1 1
9 3 4 1 2 1 3
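As a side note, you can skip the step of picking an arbitrary column from the .transform result by selecting a single column before transforming and using 'size' (which counts rows per group, including NaN values, whereas 'count' counts only non-null values). A minimal sketch with made-up data:

```python
import pandas as pd

# Small example frame; rows 0 and 3 share the same (a, b, c) key
df = pd.DataFrame({
    'a': [3, 3, 4, 3],
    'b': [4, 1, 3, 4],
    'c': [1, 4, 3, 1],
    'd': [3, 3, 2, 4],
})

# Select one column first so transform returns a Series aligned
# to the original index; 'size' gives the group size per row
df['unique_count'] = df.groupby(['a', 'b', 'c'])['a'].transform('size')

print(df)
```

Here rows 0 and 3 get a count of 2 and the other rows get 1, without collapsing the frame.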