Python – boto3 S3 object parsing


I’m trying to write a Python script to process audio data stored on S3.

I fetch the S3 object with this function:

    def grabAudio(filename, directory):
        obj = s3client.get_object(Bucket=bucketname, Key=directory + '/' + filename)
        return obj['Body'].read()

Printing the data with

    print(obj['Body'].read())

produces the correct audio data, so reading from the bucket works.

When I try to use this data in my audio processing library (pydub), it fails:

    audio = AudioSegment.from_wav(grabAudio(filename, bucketname))

    Traceback (most recent call last):
      File "split_audio.py", line 38, in <module>
        audio = AudioSegment.from_wav(grabAudio(filename, bucketname))
      File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 544, in from_wav
        return cls.from_file(file, 'wav', parameters)
      File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 456, in from_file
        file.seek(0)
    AttributeError: 'bytes' object has no attribute 'seek'

What is the format of the object returned from S3? A byte array, I guess? If so, is there a way to parse it as .wav without saving it to disk? I would like to avoid writing to disk.

I'm also open to other audio processing libraries.

Solution

Thanks to Linas for linking similar questions and thanks to Jiaaro for answering.

    import io
    s = io.BytesIO(y['data'])
    AudioSegment.from_file(s).export(x, format='mp3')

Wrapping the bytes in io.BytesIO gives pydub the file-like object it expects, which lets me load the audio into memory directly from the bucket:

    obj = s3client.get_object(Bucket=bucketname, Key=customername + '/' + filename)
    data = io.BytesIO(obj['Body'].read())
    audio = AudioSegment.from_file(data)
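To illustrate why the wrapper matters, here is a minimal sketch that runs without AWS credentials or pydub. It rests on several assumptions: FakeS3Client is a hypothetical stand-in that only mimics the shape of a get_object response, fetch_as_file and make_wav_bytes are illustrative helper names, and the standard-library wave module is swapped in for pydub as the consumer that needs a seekable file object. The bucket and key names are placeholders.

```python
import io
import wave


def make_wav_bytes(n_frames=8000, rate=8000):
    """Build a short silent mono 16-bit WAV entirely in memory (test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


class FakeS3Client:
    """Hypothetical stand-in for an S3 client; mimics the get_object response shape."""

    def __init__(self, payload):
        self._payload = payload

    def get_object(self, Bucket, Key):
        # The real response body is a stream; here a BytesIO plays that role.
        return {"Body": io.BytesIO(self._payload)}


def fetch_as_file(client, bucket, key):
    """Read the whole object into memory and wrap it so callers get seek()/read()."""
    raw = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return io.BytesIO(raw)


client = FakeS3Client(make_wav_bytes())

# .read() yields plain bytes, which have no seek() method.
raw = client.get_object(Bucket="example-bucket", Key="customer/clip.wav")["Body"].read()
print(type(raw).__name__, hasattr(raw, "seek"))

# Wrapped in BytesIO, the same data can be parsed as a file without touching disk.
wav_file = fetch_as_file(client, "example-bucket", "customer/clip.wav")
with wave.open(wav_file, "rb") as w:
    print(w.getnframes(), w.getframerate())
```

With a real boto3 client and pydub installed, the object returned by a helper like fetch_as_file could presumably be passed to AudioSegment.from_file in the same way as the data variable above. Note that this approach holds the entire object in memory, so it suits files of modest size.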
