Java – Hadoop “Spill Failed” exception on an EC2 instance with a 420GB instance store

I am using Hadoop 2.3.0, installed as a single-node cluster (pseudo-distributed mode) on a CentOS 6.4 Amazon EC2 instance with 420GB of instance storage and 7.5GB of RAM. My understanding is that the “Spill Failed” exception occurs only when a node runs out of disk space, yet after running map/reduce jobs for only a short time (and writing nowhere near 420GB of data) I get the following exception.

I should mention that I moved the Hadoop installation from the 8GB EBS volume where I originally installed it to the 420GB instance store volume on the same node, and updated the $HADOOP_HOME environment variable and the related properties to point to the instance store volume accordingly; Hadoop 2.3.0 is now fully contained on the 420GB drive.
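For reference, the relocation amounted to something like the following; the mount point and directory names here are illustrative, not values from the original setup:

    # Assumption: the 420GB instance store is mounted at /mnt and the
    # Hadoop distribution was copied there from the old EBS location.
    export HADOOP_HOME=/mnt/hadoop-2.3.0
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin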

But I still see the exception below. Is there anything other than disk space that can cause the “Spill Failed” exception?

2014-02-28 15:35:07,630 ERROR [IPC Server handler 12 on 58189] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1393591821307_0013_m_000000_0 - exited : 
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)

2014-02-28 15:35:07,604 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Spill failed
2014-02-28 15:35:07,605 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)

Solution

I was able to fix this by setting hadoop.tmp.dir to a directory on the instance store volume. By default it points to /tmp/hadoop-${user.name} on the EBS-backed root volume, and the map-side spill files are written beneath it (the NodeManager’s local directories default to ${hadoop.tmp.dir}/nm-local-dir), so it was the small root volume that filled up, not the 420GB instance store.
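A minimal sketch of the fix, assuming the instance store is mounted at /mnt (the actual mount point and directory were not given above):

    # Create a scratch directory on the 420GB instance store volume.
    mkdir -p /mnt/hadoop/tmp

    # Point hadoop.tmp.dir at it by adding this property to
    # $HADOOP_HOME/etc/hadoop/core-site.xml:
    #
    #   <property>
    #     <name>hadoop.tmp.dir</name>
    #     <value>/mnt/hadoop/tmp</value>
    #   </property>

    # Restart the daemons so the new local directories take effect.
    $HADOOP_HOME/sbin/stop-yarn.sh
    $HADOOP_HOME/sbin/stop-dfs.sh
    $HADOOP_HOME/sbin/start-dfs.sh
    $HADOOP_HOME/sbin/start-yarn.sh

Note that hadoop.tmp.dir also anchors the default HDFS directories (dfs.namenode.name.dir and dfs.datanode.data.dir both default to paths under it), so on an existing cluster you may need to set those properties explicitly or reformat the NameNode after moving it.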
