Python boto3 How do I configure AWS s3select on Parquet?



I’m trying to query Parquet files using the AWS S3 Select feature. According to the documentation it’s supported, but I’ve tried various configurations and none of them work. For each commented-out InputSerialization attempt below, I’ve noted the error it produced. Can anyone tell me how to configure it correctly?

import boto3

S3_BUCKET = 'myBucket'
KEY_LIST = "'0123','6789'"
S3_FILE = 'myFolder/myFile.parquet'

s3 = boto3.client('s3')

r = s3.select_object_content(
        Bucket=S3_BUCKET,
        Key=S3_FILE,
        ExpressionType='SQL',
        Expression="select \"Record\" from s3object s where s.\"Key\" in (" + KEY_LIST + ")",
#        InputSerialization={}, # (MissingRequiredParameter) when calling the SelectObjectContent operation: InputSerialization is required
#        InputSerialization={'CompressionType': { 'NONE' }},    # Invalid type for parameter InputSerialization.CompressionType, value: {'NONE'}, type: <class 'set'>, valid types: <class 'str'>
#        InputSerialization={'Parquet': {}}, # Unknown parameter in InputSerialization: "Parquet", must be one of: CSV, CompressionType, JSON
#        InputSerialization={'CompressionType': { 'Snappy' }},    # Invalid type for parameter InputSerialization.CompressionType, value: {'Snappy'}, type: <class 'set'>, valid types: <class 'str'>

        OutputSerialization={'JSON': {}},
)

# The response payload is an event stream; only 'Records' events carry query output.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
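Separately from the serialization problem, the query string above is built by plain concatenation. S3 Select's IN operator takes a parenthesized list of values, and string literals are single-quoted. The helpers below (`quote_literal` and `build_select_in` are hypothetical names, not part of boto3) build that clause from a Python list instead of a hand-written string. This is a sketch that assumes S3 Select follows the standard SQL convention of escaping a single quote inside a literal by doubling it; column names are assumed to be trusted and are not escaped.

```python
from typing import Iterable


def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling embedded single quotes (SQL style)."""
    return "'" + value.replace("'", "''") + "'"


def build_select_in(select_column: str, filter_column: str, keys: Iterable[str]) -> str:
    """Build an S3 Select query keeping rows whose filter_column is one of keys.

    Column names are wrapped in double quotes so they match case-sensitively,
    as in the original query. They are assumed trusted and are not escaped.
    """
    key_list = ", ".join(quote_literal(k) for k in keys)
    return (
        f'SELECT s."{select_column}" FROM S3Object s '
        f'WHERE s."{filter_column}" IN ({key_list})'
    )
```

For the keys in the question, `build_select_in("Record", "Key", ["0123", "6789"])` produces a string that can be passed directly as `Expression=`.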

Solution

I needed to upgrade my boto3 installation to the latest version. After upgrading to 1.9.7, this works:

InputSerialization={'Parquet': {}},
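For reference, here is a minimal sketch of the full call with that fix in place. The function names `select_parquet` and `iter_json_records` are hypothetical. It assumes JSON output with the default newline record delimiter, and it buffers the `Records` payloads because one record can be split across two events; decoding each chunk on its own, as the question's loop does, can cut a record (or a multi-byte UTF-8 character) in half. The client is passed in as a parameter rather than created inside the function.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_json_records(event_stream: Iterable[Dict[str, Any]]) -> Iterator[Any]:
    """Yield decoded JSON records from a SelectObjectContent event stream.

    Payload bytes are buffered until a complete newline-terminated record
    is available, so records split across events are reassembled.
    """
    pending = b""
    for event in event_stream:
        if "Records" in event:
            pending += event["Records"]["Payload"]
            # Everything before the last newline is complete; keep the tail.
            *complete, pending = pending.split(b"\n")
            for line in complete:
                if line.strip():
                    yield json.loads(line)
        elif "End" in event:
            # 'End' marks the end of the result; other events (Stats, Progress, Cont) are skipped.
            break
    if pending.strip():
        yield json.loads(pending)


def select_parquet(client: Any, bucket: str, key: str, expression: str) -> Iterator[Any]:
    """Run an S3 Select SQL query against a Parquet object and yield JSON records."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"Parquet": {}},  # rejected by older boto3/botocore releases
        OutputSerialization={"JSON": {}},
    )
    return iter_json_records(response["Payload"])
```

In practice `client` would be `boto3.client('s3')`. If the call still fails with "Unknown parameter in InputSerialization: \"Parquet\"", print `boto3.__version__` to confirm which release is actually being imported, and upgrade with `pip install --upgrade boto3`, which should also pull in a matching botocore.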
