Python – Add an array to a Pandas DataFrame

I have a DataFrame and want to create a new column in which each row holds an array. I know that to do this I have to change the data type of the column to “object”. I tried the following, but it doesn’t work:

import pandas
import numpy as np

df = pandas.DataFrame({'a': [1, 2, 3, 4]})
df['b'] = np.nan
df['b'] = df['b'].astype(object)
df.loc[0, 'b'] = [[1, 2, 4, 5]]

The error is:

ValueError: Must have equal len keys and value when setting with an ndarray

However, if I convert the data type of the entire DataFrame to “object”, it works:

df = pandas.DataFrame({'a': [1, 2, 3, 4]})
df['b'] = np.nan
df = df.astype(object)
df.loc[0, 'b'] = [[1, 2, 4, 5]]

So my question is: why do I have to change the data type of the entire DataFrame?

Solution

Try df.at, which accesses a single cell, and assign the list directly:

In [12]: df.at[0,'b'] = [1,2,4,5]

In [13]: df
Out[13]:
   a             b
0  1  [1, 2, 4, 5]
1  2           NaN
2  3           NaN
3  4           NaN

PS Note that a column holding a non-scalar value in any cell has dtype object, since that is the dtype able to contain such a value:

In [14]: df.dtypes
Out[14]:
a     int64
b    object
dtype: object
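
For reference, the question's setup and the answer's assignment can be combined into one self-contained sketch. It assumes column b has already been cast to object, as in the question; the alias pd and the print calls are only for illustration.

```python
import numpy as np
import pandas as pd

# Same setup as in the question: column 'b' starts as NaN and is cast to object.
df = pd.DataFrame({"a": [1, 2, 3, 4]})
df["b"] = np.nan
df["b"] = df["b"].astype(object)

# .at addresses exactly one cell, so the list is stored as a single value
# rather than being matched against several cells.
df.at[0, "b"] = [1, 2, 4, 5]

print(df)
print(df.dtypes)
```

Only column b is cast to object here; column a keeps its integer dtype.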

PPS In general, storing non-scalar values in cells is a bad idea, because the vast majority of pandas/NumPy methods are not designed for such data and fail to handle it correctly.
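
If the array values need to take part in ordinary pandas operations, one possible alternative is a long layout with one scalar per row. The sketch below is not part of the original answer; it uses DataFrame.explode (available in pandas 0.25 and later), and the name long_df is chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A frame like the one in the answer: one cell of 'b' holds a list.
df = pd.DataFrame({"a": [1, 2, 3, 4]})
df["b"] = pd.Series([[1, 2, 4, 5], np.nan, np.nan, np.nan], dtype=object)

# explode() turns each list element into its own row and repeats the
# index label and the other columns; scalar NaN cells stay as single rows.
long_df = df.explode("b")

print(long_df)
```

Each list element now sits in its own cell, at the cost of repeated index labels and repeated values in column a.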
