Java – Spring Boot YARN does not run in Hadoop 2.8.0 and clients cannot access DataNode

Spring Boot YARN does not run on Hadoop 2.8.0 and the client cannot access the DataNode; here is a solution to the problem.


I’m trying to run the Spring Boot YARN sample (https://spring.io/guides/gs/yarn-basic/) on Windows. In application.yml, I changed fsUri and resourceManagerHost to point to my VM host 192.168....
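The relevant part of application.yml looks roughly like this (a sketch following the guide's layout; the address is a placeholder for my VM's IP, and the application directory matches the path in the log below):

spring:
    hadoop:
        fsUri: hdfs://192.168.x.x:9000
        resourceManagerHost: 192.168.x.x
    yarn:
        appName: gs-yarn-basic
        applicationDir: /app/gs-yarn-basic/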
But when I try to run the application, this exception appears:

DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1508)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1284)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
[2017-05-27 19:59:49.570] boot - 7728  INFO [Thread-5] --- DFSClient: Abandoning BP-646365587-10.0.2.15-1495898351938:blk_1073741830_1006
[2017-05-27 19:59:49.602] boot - 7728  INFO [Thread-5] --- DFSClient: Excluding datanode DatanodeInfoWithStorage[10.0.2.15:50010,DS-f909ec7a-8374-4cdd-9cfc-0e778810d98c,DISK]
[2017-05-27 19:59:49.647] boot - 7728  WARN [Thread-5] --- DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /app/gs-yarn-basic/gs-yarn-basic-container-0.1.0.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

This means that the DataNode cannot be accessed from my host. For this reason, I added the following to hdfs-site.xml:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>

But it still throws the exception.
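To rule out Spring Boot itself, the same write can be attempted with the bare HDFS client. Below is a minimal sketch (the NameNode address is a placeholder, and it assumes the hadoop-client 2.8.0 dependency on the classpath); if it also times out in createBlockOutputStream, the problem is the cluster/network configuration rather than the sample:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; substitute your VM's actual IP or hostname.
        conf.set("fs.defaultFS", "hdfs://192.168.x.x:9000");
        // Ask the client to connect to DataNodes by hostname instead of the
        // internal IP they register with the NameNode (client-side setting).
        conf.setBoolean("dfs.client.use.datanode.hostname", true);

        // Write a small file; this forces a block pipeline to a DataNode.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/write-test.txt"))) {
            out.writeUTF("hello hdfs");
        }
        System.out.println("Write succeeded: the DataNode is reachable.");
    }
}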

I have Hadoop 2.8.0 running on my virtual machine. These are the site configuration files:

core-site.xml

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://0.0.0.0:9000</value>
   </property>

</configuration>

hdfs-site.xml

<configuration>
   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>/usr/local/hadoop/hadoop-2.8.0/data/namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>/usr/local/hadoop/hadoop-2.8.0/data/datanode</value>
   </property>
   <property>
       <name>dfs.permissions.enabled</name>
       <value>false</value>
   </property>
   <property>
       <name>dfs.client.use.datanode.hostname</name>
       <value>true</value>
       <description>Whether clients should use datanode hostnames when
          connecting to datanodes.
       </description>
   </property>
</configuration>

mapred-site.xml

<configuration>    
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>
    <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>99</value>
    </property>
</configuration>

Solution

Your core-site.xml should point to the NameNode's address, but it currently points to 0.0.0.0, which means "all local interfaces on this machine". A remote client cannot use that value, because it does not identify which machine actually hosts the NameNode.

There should be only one NameNode in the Hadoop cluster, and clients need its concrete address.

Replacing 0.0.0.0 with the NameNode's IP address or hostname should solve the problem you're experiencing.
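For example, the corrected core-site.xml would look like this (the address is a placeholder; substitute the VM's actual IP or hostname, and make sure the Windows client can resolve and reach it):

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://192.168.x.x:9000</value>
   </property>
</configuration>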
