Java – How do I read a file from s3 in EMR?

I want to read a file from S3 in my EMR Hadoop job. I’m using the custom JAR option.

I tried two solutions:

  • org.apache.hadoop.fs.S3FileSystem: Throws a NullPointerException.
  • com.amazonaws.services.s3.AmazonS3Client: Throws an exception saying “Access Denied”.

What I don't understand is this: I started the job from the console, so I should obviously have the necessary permissions. However, the AWS_*_KEY variables are missing from the environment (System.getenv()) available to the mapper.

I’m sure I’m doing something wrong, just not sure what.

Solution

It may be a little late, but…
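Use InstanceProfileCredentialsProvider as the credentials provider for AmazonS3Client. The EC2 instances in an EMR cluster run under an instance profile (an IAM role), and this provider reads temporary credentials for that role from the instance metadata service, so the mapper does not need any AWS_*_KEY environment variables.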
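
Something along these lines should work; treat it as a rough sketch rather than tested EMR code. It assumes the AWS SDK for Java 1.x is on the job's classpath and that the cluster's instance profile role is allowed to read the object. The class name, the splitS3Uri helper, and the bucket and key in main are made up for illustration.

```java
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3Object;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, for illustration only.
class S3ReadSketch {

    // Builds a client whose credentials come from the EC2 instance profile,
    // not from environment variables or a properties file.
    static AmazonS3 newClient() {
        return new AmazonS3Client(new InstanceProfileCredentialsProvider());
    }

    // Hypothetical helper: splits "s3://bucket/some/key" into {bucket, key}.
    static String[] splitS3Uri(String s3Uri) {
        URI uri = URI.create(s3Uri);
        String bucket = uri.getHost();
        String key = uri.getPath().substring(1); // drop the leading '/'
        return new String[] {bucket, key};
    }

    // Reads the first line of a text object from S3.
    static String readFirstLine(AmazonS3 s3, String bucket, String key) throws IOException {
        S3Object object = s3.getObject(bucket, key);
        // Closing the reader also closes the underlying S3 stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(object.getObjectContent(), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder location; replace with a real bucket and key.
        String[] parts = splitS3Uri("s3://my-bucket/path/to/file.txt");
        System.out.println(readFirstLine(newClient(), parts[0], parts[1]));
    }
}
```

In a mapper you would typically create the client once in setup() and reuse it for every record. Newer 1.x SDK releases deprecate these two constructors in favor of AmazonS3ClientBuilder and InstanceProfileCredentialsProvider.getInstance(), but the idea is the same.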
