DataFrame performance warning
I get a performance warning from pandas:
/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py:1471:
PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->block0_values] [items->['int', 'str']]
I’ve read several issues on GitHub and the question here, and they all say it’s because I’m mixing types in one column, but I’m definitely not. A simple example is as follows:
import pandas as pd
df = pd.DataFrame(columns=['int', 'str'])
df = df.append({'int': 0, 'str': '0'}, ignore_index=True)
df = df.append({'int': 1, 'str': '1'}, ignore_index=True)
for _, row in df.iterrows():
    print(type(row['int']), type(row['str']))
# <class 'int'> <class 'str'>
# <class 'int'> <class 'str'>
# however
df.dtypes
# int object
# str object
# dtype: object
# the following causes the warning
df.to_hdf('table.h5', 'table')
What is this about and what can I do?
Solution
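Both columns have dtype object, as df.dtypes shows, because the DataFrame was created empty without any dtypes and appending rows did not change that. The individual values are still Python int and str, but PyTables sees a single object block holding both columns (items->['int', 'str']), infers it as mixed-integer, and falls back to pickling. Where appropriate, you need to convert the affected columns to a numeric dtype.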
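To check which columns are affected before converting anything, a small diagnostic along these lines may help. It is only a sketch: the frame is rebuilt with dtype=object as a stand-in for the empty-frame-plus-append construction in the question (an assumption, chosen so it does not depend on df.append), and the variable names are illustrative.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Assumption: dtype=object reproduces the object columns from the question.
df = pd.DataFrame({'int': [0, 1], 'str': ['0', '1']}, dtype=object)

# Columns that pandas stores as generic Python objects.
object_cols = df.columns[df.dtypes == object].tolist()

# What pandas infers from the values of each object column on its own.
per_column = {col: infer_dtype(df[col]) for col in object_cols}

# What pandas infers when the values of all object columns are pooled,
# which is roughly the situation the warning describes for block0_values.
pooled = infer_dtype(df[object_cols].to_numpy().ravel().tolist())

print(object_cols)
print(per_column)
print(pooled)
```

Each column looks homogeneous on its own, while the pooled values do not, which is consistent with the inferred_type reported in the warning.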
For integers, there are two main ways to achieve this:
# Method 1
df['col'] = df['col'].astype(int)
# Method 2
df['col'] = pd.to_numeric(df['col'], downcast='integer')
This gives the column a dtype that maps directly to a C type, so the data can be stored in HDF5 format (via PyTables) without pickling.
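Applied to the example from the question, the conversion might look like the sketch below. The frame is again rebuilt with dtype=object as an assumed stand-in for the original construction, and the to_hdf call is left commented out because it needs PyTables installed.

```python
import pandas as pd

# Assumption: dtype=object reproduces the object columns from the question.
df = pd.DataFrame({'int': [0, 1], 'str': ['0', '1']}, dtype=object)

# Method 1 from above, applied to the integer column only.
df['int'] = df['int'].astype(int)

# The 'int' column now has a native integer dtype; 'str' stays object,
# but holds only strings.
print(df.dtypes)

# With PyTables installed, writing the frame should no longer need to
# pickle a mixed object block:
# df.to_hdf('table.h5', key='table')
```

The string column is deliberately left alone: once the integers are out of the object block, the remaining values are all strings, which should be storable without pickling.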