Python – PyMongo: NaTType ValueError when bulk inserting into a new collection


I’m trying to use PyMongo to upload a mixed set of date and text data to a new collection in my remote MongoDB server.

However, because null values are mixed in with the dates, I get an error saying that a row contains a None value instead of a datetime.datetime() object.

As some background: the raw data is stored in a CSV file, which I read into a pandas DataFrame with pandas.read_csv(). Once the data is in pandas, I do some basic cleanup before converting it to a list of dictionaries and uploading it to a collection with the standard collection.insert_many() method.

Initially, the values in each line/document/dictionary are stored as strings. Before uploading the data, though, I convert the date columns to datetime objects by calling datetime.datetime.strptime() on each value. Not every dictionary populates these date fields; for those, I just use None instead of a datetime object, as in the condensed example below.
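As a minimal, self-contained illustration of that conversion step (the column name and values here are made up; my real data comes from the CSV described above):

import datetime as dt
import pandas as pd

# Toy stand-in for one of my date columns; the float('nan') is roughly
# what read_csv leaves behind for a blank cell, and the isinstance check
# turns it into None.
sample = pd.DataFrame({'date_a': ['01/15/2019', float('nan'), '03/02/2019']})

parsed = [
    dt.datetime.strptime(d, '%m/%d/%Y') if isinstance(d, str) else None
    for d in sample['date_a'].values
]
sample['date_a'] = parsed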

Then I tried to upload the resulting data, a list of dictionaries with many NoneType values mixed in, and when I called insert_many() I got this:

ValueError: NaTType does not support utcoffset.

I’m not familiar with utcoffset, and my attempts to research it have only confused me further.

Has anyone encountered this issue before, or does anyone have suggestions on how to handle missing datetime data in PyMongo?

Here is my code:

import datetime as dt
import pandas as pd
import pymongo

source = '/path/to/data'
sampleData = pd.read_csv(source, dtype=str)

Date_Columns = [
    'date_a',
    'date_b',
    'date_c',
    'date_d'
]
cleanData = sampleData
for col in Date_Columns:
    # Convert the strings to datetime objects for each column.
    # If a value is null, then use a None object instead of a datetime.
    Strings = sampleData[col].values
    Formats = [dt.datetime.strptime(d, '%m/%d/%Y') if isinstance(d, str) else None for d in Strings]
    cleanData[col] = Formats

client = pymongo.MongoClient('XX.XX.XX.XX', 99999)
db = client['my_db']
c = db['my_collection']

# Convert the cleaned DataFrame into a list of dictionaries.
Keys = [key for key in sampleData.columns.values]
Data = [dict(zip(Keys, L)) for L in sampleData.values]

c.insert_many(Data)

And here is the full traceback:

Traceback (most recent call last):
  File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 738, in <module>
    setup_db()
  File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 679, in setup_db
    c.insert_many(Data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/collection.py", line 753, in insert_many
    blk.execute(write_concern, session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 513, in execute
    return self.execute_command(generator, write_concern, session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 338, in execute_command
    self.is_retryable, retryable_bulk, s, self)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1196, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 333, in retryable_bulk
    retryable, full_result)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 285, in _execute_command
    self.collection.codec_options, bwc)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1273, in _do_bulk_write_command
    namespace, operation, command, docs, check_keys, opts, ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1263, in _do_batched_write_command
    namespace, operation, command, docs, check_keys, opts, ctx)
  File "pandas/_libs/tslibs/nattype.pyx", line 59, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support utcoffset

Solution

Most machines keep their clocks in UTC, which is ideal: internally, time is just a count of seconds elapsed since a fixed epoch (January 1, 1970, UTC). This means your processing does not depend on local time, which spares you the headache of daylight saving time.

US Eastern time, for example, runs 4-5 hours behind UTC, depending on daylight saving time.
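For reference, a quick Python illustration of the two representations mentioned above (epoch seconds versus a UTC datetime); nothing here is specific to your data:

import datetime as dt
import time

# Seconds elapsed since the Unix epoch (1970-01-01 00:00:00 UTC).
epoch_seconds = time.time()

# The same instant expressed as a timezone-aware UTC datetime.
utc_now = dt.datetime.fromtimestamp(epoch_seconds, tz=dt.timezone.utc)
print(int(epoch_seconds), utc_now.isoformat())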

Looking at your traceback, this is a pandas error: the missing values in your date columns are stored by pandas as NaT ("not a time") rather than None, and NaT does not behave like a datetime.datetime, which is what trips up the encoder. Convert the dates to strings with the precision you need, leaving the missing ones as None; that should avoid this error.
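Here is a rough sketch of one way to apply that, reusing the names from your code (cleanData, Date_Columns, and the collection handle c); to_dict('records') is just a shortcut for the manual zip you already have, and the output format string is an assumption:

import pandas as pd

# Render each date column as a plain string; genuinely missing values
# (NaT) become None, so the BSON encoder never sees a NaT.
for col in Date_Columns:
    cleanData[col] = [
        None if pd.isnull(v) else v.strftime('%m/%d/%Y')
        for v in pd.to_datetime(cleanData[col])
    ]

c.insert_many(cleanData.to_dict('records'))

If you would rather store real dates in MongoDB, the same pd.isnull() check can map NaT to None while leaving valid values alone; a pandas Timestamp is a datetime.datetime subclass, so PyMongo can encode it directly.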
