PyMongo: NatType ValueError when inserting new collections in bulk
I’m trying to use PyMongo to upload a mixed set of date and text data to a new collection in my remote MongoDB server.
However, because null values are mixed with dates, I get an error that there is a row with a None
value instead of a datetime.datetime() object.
As some background: the raw data is stored in a CSV file, which I’m reading into pandas using pandas.read_csv().
DataFrame()
。 Once I have the data in pandas
, I do some basic cleanup before converting the data to a dictionary list and then upload it into a collection using the standard collection.insert_many()
method.
Initially, the values in each line/document/dictionary are stored as strings. However, before uploading the data, I converted some date columns to datetime objects by calling datetime.datetime.strptime()
on each value. However, not every dictionary populates these date fields. For these dictionaries, I just use
None
instead of the datetime
object.
Then, I
tried to upload the resulting data being a dictionary list with a mix of many NoneType values, and when I called insert_many()
I got it:
ValueError: NaTType does not support utcoffset.
I’m not familiar with utcoffset
, and my attempts at research on it confuse me.
Has anyone ever encountered this issue or have suggestions on how to handle lost datetime data in PyMongo?
Here is my code :
import pandas as pd
import pymongo
source = '/path/to/data'
sampleData = pd.read_csv(source, dtype=str)
Date_Columns = [
'date_a',
'date_b',
'date_c',
'date_d'
]
cleanData = sampleData
for col in Date_Columns:
# Convert the strings to datetime objects for each column.
# If a value is null, then use a None object instead of a datetime.
Strings = sampleData[col].values
Formats = [dt.datetime.strptime(d, '%m/%d/%Y') if isinstance(d, str) else None for d in Strings]
cleanData[col] = Formats
client = pymongo. MongoClient('XX.XX.XX.XX', 99999)
db = client['my_db']
c = db['my_collection']
# Convert the cleaned DataFrame into a list of dictionaries.
Keys = [key for key in sampleData.columns.values]
Data = [dict(zip(Keys, L)) for L in sampleData.values]
c.insert_many(Data)
and the full backtrace:
Traceback (most recent call last):
File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 738, in <module>
setup_db()
File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 679, in setup_db
c.insert_many(Data)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/collection.py", line 753, in insert_many
blk.execute(write_concern, session=session)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 513, in execute
return self.execute_command(generator, write_concern, session)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 338, in execute_command
self.is_retryable, retryable_bulk, s, self)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1196, in _retry_with_session
return func(session, sock_info, retryable)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 333, in retryable_bulk
retryable, full_result)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 285, in _execute_command
self.collection.codec_options, bwc)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1273, in _do_bulk_write_command
namespace, operation, command, docs, check_keys, opts, ctx)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1263, in _do_batched_write_command
namespace, operation, command, docs, check_keys, opts, ctx)
File "pandas/_libs/tslibs/nattype.pyx", line 59, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support utcoffset
Solution
Most machines have clocks set to UTC
, which is ideal. It is an integer value in seconds from a given date (sometime in the 70s, I believe). This means that your process planning is not dependent on local time, including the headache of daylight saving time.
UTC deviates from EAST by 4-5 hours (depending on daylight saving time).
Look at your error, this is a pandas error, pandas.datetime
does not work with datetime.datetime.
Convert it to a datetime
string
with the desired precision. That should avoid this error.