How do I read a file from S3 in EMR?
I want to read a file from S3 in my EMR Hadoop job. I’m using the custom JAR option.
I tried two solutions:
org.apache.hadoop.fs.S3FileSystem: throws a NullPointerException.
com.amazonaws.services.s3.AmazonS3Client: throws an exception saying “Access Denied”.
What I don’t understand is that I started the job from the console, so I should obviously have the necessary permissions. However, the AWS_*_KEY keys are missing from the environment variables (System.getenv()) available to the mapper.
I’m sure I’m doing something wrong, just not sure what.
Solution
It may be a little late, but you need to use the InstanceProfileCredentialsProvider for the AmazonS3Client.
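The provider fetches temporary credentials from the EC2 instance metadata of the cluster node, so the job does not depend on AWS_*_KEY environment variables. A minimal sketch of what this could look like with the AWS SDK for Java 1.x is below. The class name S3ReadSketch and the bucket and key arguments are hypothetical, and it assumes the role attached to the cluster's instances is allowed to read the object.

```java
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class S3ReadSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical inputs: the bucket name and object key to read.
        String bucket = args[0];
        String key = args[1];

        // Credentials are taken from the instance profile of the node the
        // code runs on, not from environment variables.
        // Note: both constructors are deprecated in newer 1.x releases,
        // which offer AmazonS3ClientBuilder instead.
        AmazonS3 s3 = new AmazonS3Client(new InstanceProfileCredentialsProvider());

        // Closing the reader also closes the underlying S3 object stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                s3.getObject(bucket, key).getObjectContent(),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Inside a mapper, the same client construction would typically go in setup() so that the client is created once per task rather than once per record.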