Pandas apply when a cell contains a list
I have a DataFrame with a column whose cells contain lists, as follows:
import pandas as pd
df = pd.DataFrame({
    'col_lists': [[1, 2, 3], [5]],
    'col_normal': [8, 9]
})
>>> df
   col_lists  col_normal
0  [1, 2, 3]           8
1        [5]           9
I want to apply a transformation to each element of the lists in col_lists, for example:
df['col_lists'] = df.apply(
    lambda row: [None if (element % 2 == 0) else element for element in row['col_lists']],
    axis=1
)
>>> df
      col_lists  col_normal
0  [1, None, 3]           8
1           [5]           9
For this DataFrame it worked as I expected. However, when I applied the same code to another DataFrame, I got a strange result: for each row, the column contains only the first element of the list:
df2 = pd.DataFrame({
    'col_lists': [[1, 2], [5]],  # length of first list is smaller here
    'col_normal': [8, 9]
})
df2['col_lists'] = df2.apply(
    lambda row: [None if (element % 2 == 0) else element for element in row['col_lists']],
    axis=1
)
>>> df2
   col_lists  col_normal
0        1.0           8
1        5.0           9
I have two questions:
(1) What happened here? Why do I get the correct result for df but not for df2?
(2) How do I correctly apply a transformation to the lists in a DataFrame?
Solution
First, I don't think storing lists in pandas is a good idea.
But if you really need it, try upgrading pandas, as it works fine for me in pandas 0.23.4:
df2['col_lists'] = df2.apply(
    lambda row: [None if (element % 2 == 0) else element for element in row['col_lists']],
    axis=1
)
print(df2)
   col_lists  col_normal
0  [1, None]           8
1        [5]           9
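If upgrading is not an option, one possible alternative is to apply the function to the column itself rather than to whole rows. Series.apply passes each cell's list to the function and stores the returned list as a single cell, so it should not depend on how DataFrame.apply with axis=1 decides the shape of list-like results. A minimal sketch; the helper name mask_even is hypothetical and not part of the question:

```python
import pandas as pd


def mask_even(values):
    """Return a new list with even numbers replaced by None (hypothetical helper)."""
    return [None if v % 2 == 0 else v for v in values]


df2 = pd.DataFrame({
    'col_lists': [[1, 2], [5]],
    'col_normal': [8, 9],
})

# Series.apply calls mask_even once per cell; each returned list
# is kept as one object in the resulting Series.
df2['col_lists'] = df2['col_lists'].apply(mask_even)
print(df2)
```

Since only one column is involved, this also avoids building a row Series for every row, which is what apply with axis=1 does.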