Python – pd.notnull Strange null checking behavior


This is essentially a rehashing of the content of my answer here.

I’m getting some weird behavior when trying to solve this question, using pd.notnull.

Consider

x = ('A4', np.nan)

I want to check which of these items are null. Using np.isnan directly throws a TypeError (but I’ve found a workaround).

Using pd.notnull does not work either.

>>> pd.notnull(x)
True

It treats the tuple as a single value (rather than as an iterable of values). Converting it to a list and then testing it also gives the wrong answer.

>>> pd.notnull(list(x))
array([ True,  True])

Since the second value is nan, the result I am looking for should be [True, False]. Pre-converting to a Series finally works:

>>> pd.Series(x).notnull()
0     True
1    False
dtype: bool

Therefore, the solution is to convert it to a Series first and then test the values.

Along similar lines, another (admittedly roundabout) solution is to pre-convert to an object-dtype numpy array, on which pd.notnull works directly:

>>> pd.notnull(np.array(x, dtype=object))
array([ True, False])
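
The object-dtype workaround can be wrapped in a small helper. This is only a minimal sketch: the name notnull_elementwise is hypothetical (it is not part of pandas), and it assumes the input is a flat tuple or list.

```python
import numpy as np
import pandas as pd

def notnull_elementwise(values):
    """Hypothetical helper: element-wise null check for a flat tuple or list."""
    # dtype=object keeps np.nan as a float instead of letting numpy
    # coerce it to the string 'nan' alongside the other strings.
    arr = np.array(values, dtype=object)
    return pd.notnull(arr)

x = ('A4', np.nan)
mask = notnull_elementwise(x)
print(mask.tolist())  # [True, False]
```

Because the conversion is explicit, this should not depend on how a given pandas version converts lists internally.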

I imagine that pd.notnull(list(x)) implicitly converts x to an array of strings, rendering NaN as the string “nan”, so it is no longer a null value.

Is that what pd.notnull is doing here? Or is there something else going on behind the scenes that I should be aware of?

Notes

In [156]: pd.__version__
Out[156]: '0.22.0'

Solution

This is the issue associated with this behavior: https://github.com/pandas-dev/pandas/issues/20675.

In short, if the argument passed to notnull is a list, it is converted internally to a numpy array with np.asarray. The bug occurs because, when no dtype is specified, numpy converts np.nan to a string, which pd.isnull does not recognize as a null value:

a = ['A4', np.nan]
np.asarray(a)
# array(['A4', 'nan'], dtype='<U3')

This issue was fixed in version 0.23.0 by calling np.asarray with dtype=object.
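
The difference between the two np.asarray calls can be seen in a short comparison. This is an illustrative sketch using only numpy; the variable names as_str and as_obj are made up for the example, and the exact string width of the inferred dtype may vary between numpy versions, so only the dtype kind is checked.

```python
import numpy as np

a = ['A4', np.nan]

# Without dtype, numpy infers a common string dtype for the mixed input.
as_str = np.asarray(a)
# With dtype=object, each element is kept as the original Python object.
as_obj = np.asarray(a, dtype=object)

print(as_str.dtype.kind)            # 'U' (unicode string)
print(repr(str(as_str[1])))         # 'nan', now an ordinary string
print(as_obj.dtype)                 # object
print(isinstance(as_obj[1], float)) # True, still a float NaN
```

Once NaN has become the string 'nan', a null check has nothing left to detect, which matches the all-True result seen for pd.notnull(list(x)) in 0.22.0.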
