A JAVA_HOME error occurred when upgrading to Spark 1.3.0

A JAVA_HOME error occurred when upgrading to Spark 1.3.0 … here is a solution to the problem.


I’m trying to upgrade a Spark project written in Scala from Spark 1.2.1 to 1.3.0, so I changed my build.sbt as follows:

-libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
+libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
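
For context, a minimal build.sbt consistent with the jar name used below might look like the following. Only the spark-core line comes from the post; the project name, version, and Scala version are inferred from the jar path target/scala-2.11/myapp-assembly-1.2.jar and are otherwise assumptions.

// Minimal build.sbt sketch; only the spark-core dependency is from the post above.
// name, version, and scalaVersion are inferred from the assembly jar's path.
name := "myapp"

version := "1.2"

scalaVersion := "2.11.6"

// "provided" keeps Spark itself out of the assembly jar, since the cluster supplies it.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"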

Then I build an assembly jar and submit it:

HADOOP_CONF_DIR=/etc/hadoop/conf \
    spark-submit \
    --driver-class-path=/etc/hbase/conf \
    --conf spark.hadoop.validateOutputSpecs=false \
    --conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --deploy-mode=cluster \
    --master=yarn \
    --class=TestObject \
    --num-executors=54 \
    target/scala-2.11/myapp-assembly-1.2.jar
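
For reference, the assembly jar passed to spark-submit above would typically be built beforehand with the sbt-assembly plugin, assuming that plugin is already configured in the project:

# Assumes the sbt-assembly plugin is set up in project/plugins.sbt.
sbt clean assembly
# Produces target/scala-2.11/myapp-assembly-1.2.jar given the name and version settings.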

The job submission fails, and the terminal shows the following exception:

15/03/19 10:30:07 INFO yarn.Client: 
15/03/19 10:20:03 INFO yarn.Client: 
     client token: N/A
     diagnostics: Application application_1420225286501_4698 failed 2 times due to AM 
     Container for appattempt_1420225286501_4698_000002 exited with  exitCode: 127 
     due to: Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Finally, I went to check the web interface of the YARN application master (since the job shows up there, I know it at least gets that far), and the only logs it shows are these:

    Log Type: stderr
    Log Length: 61
    /bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory

    Log Type: stdout
    Log Length: 0

I’m not sure how to interpret this. Is {{JAVA_HOME}} a literal (braces included) that is somehow ending up in a launch script? Is this coming from a worker node or from the driver? What can I do to experiment and troubleshoot?

I did set JAVA_HOME in the Hadoop config files on all nodes in the cluster:

% grep JAVA_HOME /etc/hadoop/conf/*.sh
/etc/hadoop/conf/hadoop-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
/etc/hadoop/conf/yarn-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31

Has this behavior changed between 1.2.1 and 1.3.0? With 1.2.1 and no other changes, the job completes normally.

[Note: I originally posted this on the Spark mailing list; if/when I find a solution, I’ll update both places.]

Solution

Have you tried setting JAVA_HOME in the etc/hadoop/yarn-env.sh file? The JAVA_HOME environment variable from your login shell may not be visible to the YARN container that runs the job.

I have seen cases before where an environment variable set in .bashrc on a node was not picked up by the YARN worker spawned on the cluster.
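
One way to rule that out, sketched below rather than taken from the answer above, is to pass JAVA_HOME to the containers explicitly through Spark's per-application environment settings (spark.yarn.appMasterEnv.* for the application master and spark.executorEnv.* for the executors). The JDK path is the one from the question's grep output; everything else repeats the original submit command.

# Sketch only: the original submit command with JAVA_HOME passed explicitly
# to the YARN application master and executors. Adjust the JDK path to your cluster.
HADOOP_CONF_DIR=/etc/hadoop/conf \
    spark-submit \
    --driver-class-path=/etc/hbase/conf \
    --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk64/jdk1.6.0_31 \
    --conf spark.executorEnv.JAVA_HOME=/usr/jdk64/jdk1.6.0_31 \
    --conf spark.hadoop.validateOutputSpecs=false \
    --conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --deploy-mode=cluster \
    --master=yarn \
    --class=TestObject \
    --num-executors=54 \
    target/scala-2.11/myapp-assembly-1.2.jar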

The error may not be related to the version upgrade, but to the YARN environment configuration.
