Combine rows into a single cell
I currently have a data frame like this (df):
name info
alpha foo,bar
alpha bar,foo
beta foo,bar
beta bar,foo
beta baz,qux
I’m looking to create a data frame like this:
name info
alpha (foo,bar),(bar,foo)
beta (foo,bar),(bar,foo),(baz,qux)
My approach so far is groupby.apply(list). For example:
new_df=df.groupby('name')['info'].apply(list)
However, I can't figure out how to get the output back into the original data frame format (that is, with two columns, as in the example).
I think I need reset_index and unstack? Thanks for any help!
Solution
Try a for loop like the following:
import pandas as pd

uniqnames = df.name.unique()              # get unique names
newdata = []                              # data list for output dataframe
for u in uniqnames:                       # for each unique name
    subdf = df[df.name == u]              # get rows with this unique name
    s = ""
    for i in subdf['info']:
        s += "(" + i + "),"               # wrap each info cell in parentheses and append it
    newdata.append([u, s[:-1]])           # remove trailing comma from infos & add row to data list
newdf = pd.DataFrame(data=newdata, columns=['name', 'info'])
print(newdf)
The output is exactly what is required:
name info
0 alpha (foo,bar),(bar,foo)
1 beta (foo,bar),(bar,foo),(baz,qux)
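Since the question mentions groupby and reset_index, here is a minimal sketch of the same result without an explicit loop. It is not part of the answer above; the sample data is rebuilt from the question, and the helper name wrap_and_join is made up for illustration. It assumes every info value is a string.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

def wrap_and_join(values):
    # Hypothetical helper: wrap each cell in parentheses, then join with commas.
    return ",".join(f"({v})" for v in values)

# sort=False keeps the groups in order of first appearance;
# reset_index() turns the grouped Series back into a two-column DataFrame.
new_df = df.groupby("name", sort=False)["info"].agg(wrap_and_join).reset_index()
print(new_df)
```

Here reset_index alone is enough to get back the two columns name and info; unstack should not be needed, because the grouped result has only a single index level.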