reshape the list of strings as rows
I have a pandas DataFrame like this:
import pandas
df = pandas.DataFrame({
    'Grouping': ["A", "B", "C"],
    'Elements': ['[\"A1\"]', '[\"B1\", \"B2\", \"B3\"]', '[\"C1\", \"C2\"]']
}).set_index('Grouping')
So it looks like this:
          Elements
Grouping
===============================
A         ["A1"]
B         ["B1", "B2", "B3"]
C         ["C1", "C2"]
That is, each cell holds a list encoded as a string. What is a clean way to reshape it into a tidy dataset like this:
          Elements
Grouping
====================
A         A1
B         B1
B         B2
B         B3
C         C1
C         C2
without resorting to for loops? The best I can come up with is:
df1 = pandas.DataFrame()
for index, row in df.iterrows():
    df_temp = pandas.DataFrame({'Elements': row['Elements'].replace("[\"", "").replace("\"]", "").split('\", \"')})
    df_temp['Grouping'] = index
    df1 = pandas.concat([df1, df_temp])
df1.set_index('Grouping', inplace=True)
But it’s ugly.
Solution
You can use .str.extractall():
df.Elements.str.extractall(r'"(.+?)"').reset_index(level="match", drop=True).rename({0: "Elements"}, axis=1)
Result:
         Elements
Grouping
A              A1
B              B1
B              B2
B              B3
C              C1
C              C2
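To make the one-liner easier to follow, here is a sketch that splits it into its three steps and, for comparison, shows one possible alternative that parses each cell with json.loads and then calls explode. The variable names (matches, flat, by_regex, by_json) are illustrative and not part of the original answer. The alternative assumes every cell is valid JSON and that your pandas version provides explode (0.25 or later).

```python
import json

import pandas as pd

df = pd.DataFrame({
    "Grouping": ["A", "B", "C"],
    "Elements": ['["A1"]', '["B1", "B2", "B3"]', '["C1", "C2"]'],
}).set_index("Grouping")

# Step 1: one row per regex match. The result is indexed by
# (Grouping, match) and has a single column named 0, one per capture group.
matches = df["Elements"].str.extractall(r'"(.+?)"')

# Step 2: drop the "match" level so that only Grouping remains as the index.
flat = matches.reset_index(level="match", drop=True)

# Step 3: give the capture-group column a meaningful name.
by_regex = flat.rename(columns={0: "Elements"})

# Alternative (assumption: every cell is valid JSON): parse each string
# into a real list, then expand each list element into its own row.
by_json = df["Elements"].apply(json.loads).explode().to_frame()

print(by_regex)
print(by_json)
```

Both approaches should produce the same rows for this sample data. The regex version works on any text containing double-quoted items but would likely mis-split elements that themselves contain escaped quotes, whereas the JSON version handles those only if the cells really are well-formed JSON.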