Comparing pd.Series gives unexpected results when the series contains None
I wonder why comparing two identical series that contain a None value returns False for that element:
pd.Series(['x', 'y', None]) == pd.Series(['x', 'y', None])
0 True
1 True
2 False
dtype: bool
I expected all results to be True. If I take the underlying arrays from the series and compare those, I get the expected result:
pd.Series(['x', 'y', None]).values == pd.Series(['x', 'y', None]).values
array([ True, True, True])
Why are the None values in two identical series not equal to each other? Am I missing something?
I would expect this behavior for np.nan, because np.nan != np.nan; however, None == None.
Solution
This is by design; see the warnings box at http://pandas.pydata.org/pandas-docs/stable/missing_data.html
This was done quite a while ago to make the behavior of nulls consistent, in that they don't compare equal. This puts None and np.nan on an equal footing (not consistent with Python, but consistent with NumPy). So this is not a bug, but rather a consequence of straddling two conventions.
I suppose the documentation could be slightly enhanced.
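As a rough illustration of the two conventions, the sketch below compares None and np.nan first in plain Python and then inside a Series. The variable names are only for illustration, and dtype=object is passed explicitly as an assumption so that None is stored as-is regardless of the pandas version's default string handling.

```python
import numpy as np
import pandas as pd

# Plain Python / NumPy scalars: None equals itself, NaN does not.
print(None == None)      # True
print(np.nan == np.nan)  # False

# Inside a Series, both kinds of null compare as "not equal".
s_none = pd.Series(['x', 'y', None], dtype=object)
s_nan = pd.Series([1.0, 2.0, np.nan])

none_result = (s_none == s_none).tolist()
nan_result = (s_nan == s_nan).tolist()

print(none_result)  # [True, True, False]
print(nan_result)   # [True, True, False]
```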
To have series containing null values compare as equal, use pd.Series.equals:
pd.Series(['x', 'y', None]).equals(pd.Series(['x', 'y', None]))  # True
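Note that equals returns a single bool for the whole series. If an element-by-element result is needed, one possible approach (a sketch, not something from the pandas docs linked above) is to combine the ordinary comparison with a mask of positions where both sides are null. The names a, b, whole_equal and elementwise are illustrative, and dtype=object is again an assumption.

```python
import pandas as pd

a = pd.Series(['x', 'y', None], dtype=object)
b = pd.Series(['x', 'y', None], dtype=object)

# Single True/False for the whole series; nulls in matching positions count as equal.
whole_equal = a.equals(b)

# Element-wise variant: equal values, or null on both sides.
elementwise = ((a == b) | (a.isna() & b.isna())).tolist()

print(whole_equal)   # True
print(elementwise)   # [True, True, True]
```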