Python – Comparing pd.Series gives unexpected results when the series contains None


I wonder why comparing two identical series with a None value returns False:

pd.Series(['x', 'y', None]) == pd.Series(['x', 'y', None])

0     True
1     True
2    False
dtype: bool

I expected every result to be True. If I compare the underlying arrays of the two series instead, I get the expected result:

pd.Series(['x', 'y', None]).values == pd.Series(['x', 'y', None]).values

array([ True,  True,  True])

Why are two identical series containing None not equal to each other? Am I missing something?

I would expect np.nan to behave this way, because np.nan != np.nan; however, None == None.

Solution

This is by design:

See the warning box: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

This was done quite a while ago to make the behavior of nulls
consistent, in that they don’t compare equal. This puts None and
np.nan on an equal (though not-consistent with python, BUT consistent
with numpy) footing.

So this is not a bug, rather a consequence of straddling two conventions.

I suppose the documentation could be slightly enhanced.
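
As a small illustration of the convention described above, the following sketch shows pandas reporting both None and np.nan as missing, with elementwise == marking neither as equal. The variable names are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# Two series that differ only in which null marker they hold
# (names are illustrative assumptions).
s_none = pd.Series(['x', 'y', None])
s_nan = pd.Series(['x', 'y', np.nan])

# pandas reports both markers as missing values.
none_missing = s_none.isna().tolist()
nan_missing = s_nan.isna().tolist()
print(none_missing)  # [False, False, True]
print(nan_missing)   # [False, False, True]

# Elementwise == never reports a missing value as equal to anything,
# including another missing value in the same position.
none_eq = (s_none == s_none).tolist()
nan_eq = (s_nan == s_nan).tolist()
print(none_eq)  # [True, True, False]
print(nan_eq)   # [True, True, False]
```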

To compare two series so that null values in the same positions count as equal, use pd.Series.equals:

pd.Series(['x', 'y', None]).equals(pd.Series(['x', 'y', None]))  # True
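
Note that equals returns a single True or False for the whole series. If an elementwise result is needed, one possible approach is to combine == with isna() so that positions where both values are missing count as equal. This is an addition for illustration, not part of the original answer, and the names a, b, whole and elementwise are illustrative. It is a minimal sketch that assumes both series share the same index.

```python
import pandas as pd

a = pd.Series(['x', 'y', None])
b = pd.Series(['x', 'y', None])

# Whole-series check: nulls in the same position are treated as equal.
whole = a.equals(b)
print(whole)  # True

# Elementwise check: equal values, or both values missing.
elementwise = (a == b) | (a.isna() & b.isna())
print(elementwise.tolist())  # [True, True, True]
```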
