Python – Concatenate columns with the same id in a Pandas DataFrame

I have a DataFrame named weather with the following structure:

    STATION     DATE        ELEM    VALUE
0   US1MNCV0008 20170101    PRCP    0
1   US1MNCV0008 20170101    SNOW    0
2   US1MISW0005 20170101    PRCP    0
3   US1MISW0005 20170101    SNOW    0
4   US1MISW0005 20170101    SNWD    0

I would like to combine the rows for each station and date combination into a single line, to get the following:

    STATION     DATE        ELEM  VALUE ELEM  VALUE ELEM  VALUE
0   US1MNCV0008 20170101    PRCP  0     SNOW  0
1   US1MISW0005 20170101    PRCP  0     SNOW  0     SNWD  0

I tried to achieve this with:

weather.groupby(['STATION'], as_index=False).agg(lambda x: x.tolist())

But this produces a list in each cell, which is not what I want. How should I do the aggregation?

Solution

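You can use the following (here df is the weather DataFrame from the question):

# Number the rows within each (STATION, DATE) group, then pivot on that counter
df = (df.set_index(['STATION','DATE', df.groupby(['STATION','DATE']).cumcount()])
        .unstack()
        .sort_index(axis=1, level=1))
# Flatten the two-level column labels, e.g. ('ELEM', 0) -> 'ELEM_0'
df.columns = ['{}_{}'.format(i, j) for i, j in df.columns]
df = df.reset_index()
print(df)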
       STATION      DATE ELEM_0  VALUE_0 ELEM_1  VALUE_1 ELEM_2  VALUE_2
0  US1MISW0005  20170101   PRCP      0.0   SNOW      0.0   SNWD      0.0
1  US1MNCV0008  20170101   PRCP      0.0   SNOW      0.0    NaN      NaN

Explanation:

  1. Use cumcount to number the rows within each STATION and DATE group
  2. Create a MultiIndex with set_index
  3. Reshape with unstack
  4. Flatten the MultiIndex in the columns
  5. Convert the index back to columns with reset_index
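
The steps above can be sketched one at a time, which makes the intermediate objects easier to inspect. This is a minimal illustration, not the only way to write it: the sample data is reconstructed from the question, and the names position, indexed, wide and result are chosen here for readability and do not appear in the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
weather = pd.DataFrame({
    "STATION": ["US1MNCV0008", "US1MNCV0008",
                "US1MISW0005", "US1MISW0005", "US1MISW0005"],
    "DATE": [20170101] * 5,
    "ELEM": ["PRCP", "SNOW", "PRCP", "SNOW", "SNWD"],
    "VALUE": [0, 0, 0, 0, 0],
})

# Step 1: number the rows within each (STATION, DATE) group: 0, 1, 2, ...
position = weather.groupby(["STATION", "DATE"]).cumcount()

# Step 2: build a three-level MultiIndex (STATION, DATE, position).
indexed = weather.set_index(["STATION", "DATE", position])

# Step 3: move the position level into the columns, which gives two-level
# column labels such as ("ELEM", 0), then order them by position.
wide = indexed.unstack().sort_index(axis=1, level=1)

# Step 4: flatten the two-level column labels into "ELEM_0", "VALUE_0", ...
wide.columns = [f"{name}_{pos}" for name, pos in wide.columns]

# Step 5: turn STATION and DATE back into ordinary columns.
result = wide.reset_index()
print(result)
```

Groups with fewer rows than the largest group are padded with NaN in the trailing columns, as in the output shown above.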

Alternatively, use GroupBy.apply to create a DataFrame for each group; the remaining steps are the same as above:

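import pandas as pd

# Starting again from the original df: rebuild each group with a fresh 0..n-1 index.
# Note the double brackets: selecting several columns requires a list.
df = (df.groupby(['STATION','DATE'])[['ELEM','VALUE']]
       .apply(lambda x: pd.DataFrame(x.values, columns=x.columns))
       .unstack()
       .sort_index(axis=1, level=1))
df.columns = ['{}_{}'.format(i, j) for i, j in df.columns]
df = df.reset_index()
print(df)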
       STATION      DATE ELEM_0 VALUE_0 ELEM_1 VALUE_1 ELEM_2 VALUE_2
0  US1MISW0005  20170101   PRCP       0   SNOW       0   SNWD       0
1  US1MNCV0008  20170101   PRCP       0   SNOW       0    NaN     NaN
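
The two outputs differ slightly in how VALUE is printed (0.0 versus 0). A likely reason is that the apply variant goes through x.values, which for mixed string and integer columns is an object array, whereas unstack on the integer column upcasts to float to hold NaN. The sketch below runs both variants side by side to compare the dtypes; the sample data is reconstructed from the question, and the helper flatten and the names via_cumcount and via_apply are hypothetical, introduced only for this comparison.

```python
import pandas as pd

# Sample data reconstructed from the question.
weather = pd.DataFrame({
    "STATION": ["US1MNCV0008", "US1MNCV0008",
                "US1MISW0005", "US1MISW0005", "US1MISW0005"],
    "DATE": [20170101] * 5,
    "ELEM": ["PRCP", "SNOW", "PRCP", "SNOW", "SNWD"],
    "VALUE": [0, 0, 0, 0, 0],
})
keys = ["STATION", "DATE"]


def flatten(wide):
    """Hypothetical helper: order, flatten and reset a two-level column frame."""
    wide = wide.sort_index(axis=1, level=1)
    wide.columns = [f"{name}_{pos}" for name, pos in wide.columns]
    return wide.reset_index()


# Variant 1: cumcount + set_index + unstack.
via_cumcount = flatten(
    weather.set_index(keys + [weather.groupby(keys).cumcount()]).unstack()
)

# Variant 2: GroupBy.apply, rebuilding each group from its raw values.
via_apply = flatten(
    weather.groupby(keys)[["ELEM", "VALUE"]]
    .apply(lambda g: pd.DataFrame(g.values, columns=g.columns))
    .unstack()
)

# Same column layout, but the VALUE columns may carry different dtypes.
print(list(via_cumcount.columns) == list(via_apply.columns))
print(via_cumcount["VALUE_2"].dtype, via_apply["VALUE_2"].dtype)
```

If numeric VALUE columns are needed after the apply variant, converting them afterwards (for example with pd.to_numeric) is one option.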
