Concatenate columns with the same id Pandas DataFrame
I have a DataFrame named weather with the following structure:
STATION DATE ELEM VALUE
0 US1MNCV0008 20170101 PRCP 0
1 US1MNCV0008 20170101 SNOW 0
2 US1MISW0005 20170101 PRCP 0
3 US1MISW0005 20170101 SNOW 0
4 US1MISW0005 20170101 SNWD 0
I would like one row per station/date combination, like this:
STATION DATE ELEM VALUE ELEM VALUE ELEM VALUE
0 US1MNCV0008 20170101 PRCP 0 SNOW 0
1 US1MISW0005 20170101 PRCP 0 SNOW 0 SNWD 0
I’m trying to achieve this by:
weather.groupby(['STATION', 'DATE'], as_index=False).agg(lambda x: x.tolist())
But this collects the values into lists, which is not what I want. How should I do the aggregation?
Solution
You can use:
df = (df.set_index(['STATION','DATE', df.groupby(['STATION','DATE']).cumcount()])
.unstack()
.sort_index(axis=1, level=1))
df.columns = ['{}_{}'.format(i, j) for i, j in df.columns]
df = df.reset_index()
print(df)
STATION DATE ELEM_0 VALUE_0 ELEM_1 VALUE_1 ELEM_2 VALUE_2
0 US1MISW0005 20170101 PRCP 0.0 SNOW 0.0 SNWD 0.0
1 US1MNCV0008 20170101 PRCP 0.0 SNOW 0.0 NaN NaN
Explanation:
- `cumcount` numbers the rows within each `STATION` and `DATE` group
- `set_index` with that counter as a third key creates a `MultiIndex`
- `unstack` reshapes the counter level into columns
- the list comprehension flattens the `MultiIndex` column names (e.g. `('ELEM', 0)` becomes `ELEM_0`)
- `reset_index` converts the index levels back to ordinary columns
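The steps above can be sketched on the sample data from the question (column order of the intermediate counter shown inline):

```python
import pandas as pd

# Sample data from the question
weather = pd.DataFrame({
    'STATION': ['US1MNCV0008', 'US1MNCV0008', 'US1MISW0005',
                'US1MISW0005', 'US1MISW0005'],
    'DATE': [20170101] * 5,
    'ELEM': ['PRCP', 'SNOW', 'PRCP', 'SNOW', 'SNWD'],
    'VALUE': [0, 0, 0, 0, 0],
})

# Step 1: cumcount numbers the rows within each (STATION, DATE) group
counter = weather.groupby(['STATION', 'DATE']).cumcount()
print(counter.tolist())  # [0, 1, 0, 1, 2]

# Step 2: the counter becomes a third index level, so unstack can
# spread each group's rows across columns; sorting by level 1 keeps
# each ELEM next to its VALUE
wide = (weather.set_index(['STATION', 'DATE', counter])
               .unstack()
               .sort_index(axis=1, level=1))

# Step 3: flatten the MultiIndex columns and restore the index columns
wide.columns = ['{}_{}'.format(i, j) for i, j in wide.columns]
wide = wide.reset_index()
print(wide)
```

The missing third element for `US1MNCV0008` shows up as `NaN`, which is why the `VALUE` columns come out as floats in the output above.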
Alternatively, use `GroupBy.apply` to build a new DataFrame for each group; the final reshaping is the same as above:
df = (df.groupby(['STATION','DATE'])[['ELEM','VALUE']]
      .apply(lambda x: pd.DataFrame(x.values, columns=x.columns))
.unstack()
.sort_index(axis=1, level=1))
df.columns = ['{}_{}'.format(i, j) for i, j in df.columns]
df = df.reset_index()
print(df)
STATION DATE ELEM_0 VALUE_0 ELEM_1 VALUE_1 ELEM_2 VALUE_2
0 US1MISW0005 20170101 PRCP 0 SNOW 0 SNWD 0
1 US1MNCV0008 20170101 PRCP 0 SNOW 0 NaN NaN
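For completeness, the same reshape can also be expressed with `DataFrame.pivot` on an explicit counter column. This variant is not in the answer above, just a sketch of an equivalent approach; note that passing a list of columns as `index` to `pivot` requires pandas 1.1 or later:

```python
import pandas as pd

# Same sample data as in the question
weather = pd.DataFrame({
    'STATION': ['US1MNCV0008', 'US1MNCV0008', 'US1MISW0005',
                'US1MISW0005', 'US1MISW0005'],
    'DATE': [20170101] * 5,
    'ELEM': ['PRCP', 'SNOW', 'PRCP', 'SNOW', 'SNWD'],
    'VALUE': [0, 0, 0, 0, 0],
})

# Add a within-group counter, then pivot on it instead of set_index/unstack
weather['g'] = weather.groupby(['STATION', 'DATE']).cumcount()
wide = weather.pivot(index=['STATION', 'DATE'], columns='g',
                     values=['ELEM', 'VALUE'])

# Flatten the columns the same way as in the main solution
wide = wide.sort_index(axis=1, level=1)
wide.columns = ['{}_{}'.format(i, j) for i, j in wide.columns]
wide = wide.reset_index()
print(wide)
```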