boto3 S3 object parsing
I’m trying to write a Python script to process audio data stored on S3.
I fetch the S3 object with this function:
def grabAudio(filename, directory):
    obj = s3client.get_object(Bucket=bucketname, Key=directory+'/'+filename)
    return obj['Body'].read()
Printing the body directly:
print(obj['Body'].read())
produces the correct audio data, so the script can read from the bucket without problems.
When I try to use this data in my audio processing library (pydub), it fails:
audio = AudioSegment.from_wav(grabAudio(filename, bucketname))
Traceback (most recent call last):
File "split_audio.py", line 38, in <module>
audio = AudioSegment.from_wav(grabAudio(filename, bucketname))
File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 544, in from_wav
return cls.from_file(file, 'wav', parameters)
File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 456, in from_file
file.seek(0)
AttributeError: 'bytes' object has no attribute 'seek'
What type is the object body that comes back from S3? A byte string, I guess? If so, is there a way to load it as .wav audio without saving it to disk first? I'd like to avoid writing to disk.
Also open to other audio processing libraries.
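If pydub isn't a hard requirement, Python's standard-library wave module can also parse WAV data straight from memory: wave.open() accepts a file-like object as well as a filename, so an io.BytesIO wrapping the downloaded bytes works. A minimal sketch, assuming the S3 object is an uncompressed PCM WAV (wave does not decode compressed formats such as MP3). make_test_wav and read_wav_bytes are hypothetical helpers, and the synthetic silent clip stands in for the S3 body so the example runs offline:

```python
import io
import wave


def make_test_wav(num_frames: int = 800, framerate: int = 8000) -> bytes:
    """Build a short silent mono 16-bit WAV in memory (stand-in for S3 bytes)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(framerate)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


def read_wav_bytes(raw: bytes):
    """Parse WAV bytes without touching disk; return (params, frames)."""
    # BytesIO gives wave a seekable file object backed by the bytes
    with wave.open(io.BytesIO(raw), "rb") as w:
        params = w.getparams()
        frames = w.readframes(w.getnframes())
    return params, frames


raw = make_test_wav()  # in the real script this would be obj['Body'].read()
params, frames = read_wav_bytes(raw)
print(params.nchannels, params.framerate, params.nframes)  # 1 8000 800
```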
Solution
Thanks to Linas for linking similar questions and thanks to Jiaaro for answering.
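import io
from pydub import AudioSegment

# y['data'] is a bytes object and x an export target, both defined elsewhere
s = io.BytesIO(y['data'])
AudioSegment.from_file(s).export(x, format='mp3')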
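Why the wrapper fixes the traceback: obj['Body'].read() returns a plain bytes object, which has no file methods, while the traceback shows pydub's from_file() calling file.seek(0). io.BytesIO exposes the same bytes through a seekable, file-like interface held entirely in memory. A small stdlib-only check; the byte string is a placeholder, not real audio:

```python
import io

raw = b"RIFF....WAVEfmt "  # placeholder bytes standing in for obj['Body'].read()

# A plain bytes object is just data with no file methods, hence
# "AttributeError: 'bytes' object has no attribute 'seek'".
print(hasattr(raw, "seek"))  # False

# io.BytesIO wraps the same bytes in a seekable in-memory file object.
f = io.BytesIO(raw)
print(hasattr(f, "seek"))  # True
f.seek(0)
print(f.read(4))  # b'RIFF'
```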
Applying the same idea to my script lets me load the audio into memory directly from the bucket:
import io
from pydub import AudioSegment

obj = s3client.get_object(Bucket=bucketname, Key=customername+'/'+filename)
data = io.BytesIO(obj['Body'].read())  # seekable in-memory file, no disk write
audio = AudioSegment.from_file(data)
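If this comes up in several places, the download-and-wrap step can be factored into a helper. A minimal sketch: s3_object_as_file and _FakeS3Client are hypothetical names, client is assumed to be a boto3 S3 client (or anything exposing the same get_object(Bucket=..., Key=...) call returning a dict with a readable 'Body'), and the fake client exists only so the example runs without AWS credentials:

```python
import io


def s3_object_as_file(client, bucket: str, key: str) -> io.BytesIO:
    """Download an S3 object fully into memory and return it as a seekable file.

    Assumes `client` behaves like a boto3 S3 client's get_object().
    """
    response = client.get_object(Bucket=bucket, Key=key)
    # The response body is a one-pass stream; copy its bytes into a seekable buffer.
    return io.BytesIO(response["Body"].read())


class _FakeS3Client:
    """Hypothetical stand-in for a boto3 client, for trying the helper offline."""

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(b"RIFF fake wav bytes")}


buf = s3_object_as_file(_FakeS3Client(), "my-bucket", "customer/file.wav")
print(buf.read(4))  # b'RIFF'
```

With the real client, this would be called as AudioSegment.from_file(s3_object_as_file(s3client, bucketname, customername + '/' + filename)). The whole object is held in memory, which is fine for typical audio clips but worth keeping in mind for very large files.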