Pandas read_excel : nan values forcing others in the same column to be converted to float
Let’s say I want to read the following Excel file:
What I want is a simple solution (preferably a row) that reads excel in order to convert dates to str
(or at least int
), blank values are nan
or nat
or whatever can be pd.isnull
The value detected.
If I use df = pd.read_excel(file_path),
what I get is
df
Out[8]:
001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date 20100203.0 20150605 NaN 20090501.0
delist_date NaN 20170801 NaN NaN
So pandas recognizes blank cells as NaN
, which is great, but annoying that all other values are forced to float64
, even if they are meant to just str
or int
. (EDIT: It seems that if a column, e.g. [1]
column does not have nan
, then other values are not forced to float
.) However, in my case, most of the columns have blank delist_date
because most stocks have an IPO date but have not yet been delisted. )
As far as I know, I tried the dtype=str
keyword arg and it gave me
df
Out[10]:
001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date 20100203 20150605 nan 20090501
delist_date nan 20170801 nan nan
Does it look good? That’s right, the date is now str
, but it’s ridiculous that nan
is now a literal string! For example
df.iloc[1, 0]
Out[12]:
'nan'
This will make me have to add some weird things later, like df.replace.
I
didn’t try using the converter
because it requires specifying the data type column by column, and the actual excel file I’m using is a very long spreadsheet (about 3k columns). I also don’t want to transpose the spreadsheet in Excel itself.
Can someone help? Thanks in advance.
Solution
Use dtype=object as a parameter.
Here’s a good explanation: pandas distinction between str and object types