Python – Combine rows into a single cell

I currently have a data frame like this (df):

name    info
alpha   foo,bar
alpha   bar,foo
beta    foo,bar
beta    bar,foo
beta    baz,qux

I’m looking to create a data frame like this:

name    info
alpha   (foo,bar),(bar,foo)
beta    (foo,bar),(bar,foo),(baz,qux)

I've been trying groupby.apply(list), for example:

new_df=df.groupby('name')['info'].apply(list)

However, I can't figure out how to get the output back into the original data frame format (that is, with two columns, as in the example).

I think I need reset_index and unstack? Thanks for any help!

Solution

One option is to build the rows with a for loop:

import pandas as pd

# df is the data frame from the question, with columns 'name' and 'info'
uniqnames = df.name.unique() # get unique names
newdata = []                 # data list for output dataframe
for u in uniqnames:          # for each unique name
    subdf = df[df.name == u] # get rows with this unique name
    s = ""
    for i in subdf['info']:
        s += "("+i+"),"      # wrap each info cell in parentheses and append it
    newdata.append([u, s[:-1]]) # remove trailing comma from infos & add row to data list

newdf = pd.DataFrame(data=newdata, columns=['name','info'])
print(newdf)

The output is exactly what is required:

    name                           info
0  alpha            (foo,bar),(bar,foo)
1   beta  (foo,bar),(bar,foo),(baz,qux)
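
Since the question started from groupby and asked about reset_index, here is a minimal sketch of that route for comparison. It is not part of the original answer. It rebuilds the sample df from the question so it runs on its own, and the lambda that wraps each value in parentheses is just one way to get the requested string format.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the asker's df)
df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

new_df = (
    df.groupby("name")["info"]
      # wrap each cell in parentheses, then join the cells of a group with commas
      .agg(lambda s: ",".join("(" + v + ")" for v in s))
      # turn the 'name' index back into a regular column
      .reset_index()
)
print(new_df)
```

Here reset_index alone restores the two-column layout, so unstack should not be needed. Note that groupby sorts the group keys by default, whereas the loop above keeps names in order of first appearance; for this sample data the two orders are the same.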
