Java – Nutch on EMR: problem reading from S3

Hello, I’m trying to run Apache Nutch 1.2 on Amazon’s EMR.
To do this, I specified an input directory from S3. I get the following error:

Fetcher: java.lang.IllegalArgumentException:
    This file system object (hdfs://ip-11-202-55-144.ec2.internal:9000)
    does not support access to the request path 
    You possibly called FileSystem.get(conf) when you should have called
    FileSystem.get(uri, conf) to obtain a file system supporting your path.

I understand the difference between FileSystem.get(uri, conf) and FileSystem.get(conf). If I were writing this myself, I would call FileSystem.get(uri, conf), but I’m trying to use the existing Nutch code.


I asked about this and was told that hadoop-site.xml needed to be modified to include the following properties: fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey. I updated these properties in core-site.xml (hadoop-site.xml doesn’t exist), but it made no difference. Does anyone have any other ideas?
Thanks for your help.



Try specifying S3 as the default file system in your Hadoop site configuration (core-site.xml in your setup).

This tells Nutch that S3 should be used by default.
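
The original snippet did not survive on this page, so the following is only a sketch of what such a setting might look like. It assumes the Hadoop 0.20-era property name fs.default.name and a hypothetical bucket called your-bucket-name. The URI scheme is also an assumption: on EMR, s3:// usually refers to the native S3 file system, while in stock Apache Hadoop of that generation the native one is s3n:// and reads its keys from fs.s3n.* properties instead.

```xml
<!-- core-site.xml (or hadoop-site.xml on older Hadoop versions) -->
<!-- Assumption: "your-bucket-name" is a placeholder, not a real bucket -->
<property>
  <name>fs.default.name</name>
  <value>s3://your-bucket-name</value>
</property>
```

With this in place, FileSystem.get(conf) returns an S3-backed file system, which is why code that never passes a URI can still resolve S3 paths. Note that every unqualified path in the job then resolves against S3 as well, not HDFS.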



The access key properties (fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey) are required only if your S3 objects need authentication (in S3, an object can be accessible to all users or only to authenticated users).
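
As an illustration, the two credential properties named in the question could be declared as below. The values are placeholders to be replaced with your own keys, and the fs.s3.* prefix assumes the s3:// scheme; if your paths use s3n://, the corresponding names are fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey.

```xml
<!-- Only needed when the bucket or objects are not publicly readable -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value> <!-- placeholder -->
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value> <!-- placeholder -->
</property>
```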
