Adding an array to a cell of a pandas DataFrame
I have a DataFrame and I want to create a new column and store an array in each row of that column. I know that to do this I have to change the data type of the column to "object". I tried the following, but it doesn't work:
import pandas
import numpy as np
df = pandas.DataFrame({'a': [1, 2, 3, 4]})
df['b'] = np.nan
df['b'] = df['b'].astype(object)
df.loc[0,'b'] = [[1,2,4,5]]
The error is:
ValueError: Must have equal len keys and value when setting with an ndarray
However, if I convert the data type of the entire DataFrame to "object", it works:
df = pandas.DataFrame({'a': [1, 2, 3, 4]})
df['b'] = np.nan
df = df.astype(object)
df.loc[0,'b'] = [[1,2,4,5]]
So my question is: why do I have to change the data type of the entire DataFrame?
Solution
Use df.at, which sets a single cell, and assign the list itself rather than a list wrapped in another list:
In [12]: df.at[0, 'b'] = [1, 2, 4, 5]
In [13]: df
Out[13]:
   a             b
0  1  [1, 2, 4, 5]
1  2           NaN
2  3           NaN
3  4           NaN
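For reference, here is a minimal self-contained sketch of the same assignment. It assumes the column is cast to object first, as in the question, since assigning a list into a float column may behave differently across pandas versions. The alias pd is a convention and does not appear in the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})

# Create the new column and make it object-typed so a cell can hold a list.
df["b"] = np.nan
df["b"] = df["b"].astype(object)

# .at addresses exactly one cell (row label 0, column "b"),
# so the list is stored as a single value instead of being broadcast.
df.at[0, "b"] = [1, 2, 4, 5]

print(df)
print(df.dtypes)
```

Because .at targets a single cell by label, pandas does not try to align the list's four elements against the selection, which is what the .loc assignment in the question appears to attempt.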
PS Note that once you put a non-scalar value into any cell, the dtype of the corresponding column will be changed to object so that it can hold the non-scalar value:
In [14]: df.dtypes
Out[14]:
a     int64
b    object
dtype: object
PPS In general, storing non-scalar values in cells is a bad idea, because the vast majority of pandas/NumPy methods do not handle such data correctly.
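One possible alternative, sketched here as an assumption rather than as part of the original answer, is to keep one scalar per cell in a separate "long" table keyed by the row label. The names arrays, long and totals are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})

# Hypothetical per-row arrays, keyed by the row label in df.
arrays = {0: [1, 2, 4, 5]}

# One row per array element, so every cell holds a scalar.
long = pd.DataFrame(
    [(row, value) for row, values in arrays.items() for value in values],
    columns=["row", "value"],
)

# Ordinary vectorized operations now work, e.g. a per-row sum.
totals = long.groupby("row")["value"].sum()
print(totals)
```

With this layout the values column keeps a numeric dtype, so grouping, filtering and arithmetic behave as usual, and the result can be joined back to df on the row label when needed.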