Java – Oozie job failed Mapr 6.x


I’m trying to submit a Spark job through Oozie in yarn-client mode.
When I run the Spark job outside of Oozie, it works fine. But when I submit it as an Oozie job, it keeps failing with the following error:

Exception in thread "main" java.lang.IllegalStateException: basedir job.jar/lib does not exist.
    at org.apache.tools.ant.DirectoryScanner.scan(DirectoryScanner.java:871)
    at org.apache.spark.classpath.ClasspathFilter$$anonfun$resolveClasspath$1.apply(ClasspathFilter.scala:47)
    at org.apache.spark.classpath.ClasspathFilter$$anonfun$resolveClasspath$1.apply(ClasspathFilter.scala:44)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
    at org.apache.spark.classpath.ClasspathFilter$.resolveClasspath(ClasspathFilter.scala:44)
    at org.apache.spark.classpath.ClasspathFilter$.main(ClasspathFilter.scala:31)
    at org.apache.spark.classpath.ClasspathFilter.main(ClasspathFilter.scala)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.deploy.SparkSubmitArguments.handleUnknown(SparkSubmitArguments.scala:465)
    at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:178)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 5 more
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

At first, I thought the HDFS-related dependencies were not being loaded, so I added the Hadoop dependencies to the classpath and resubmitted the job. It still failed.

Later, I built an uber jar for my app and tried to run that. The result was the same.

If I run the same job in a MapR 5.x environment, the Oozie job completes without any issues. The same job fails in the MapR 6.x environment.

Has anyone had the same problem? Thanks for your help.

Here are some important details:

MapR version: 6.0.1
Spark version: 2.2.1
Oozie version: 4.3.0
Hadoop version: 2.7.0

Solution

I was finally able to fix this.

The problem is in mapr-spark.env.sh.

There, MAPR_HADOOP_CLASSPATH is set from the script /opt/mapr/spark/spark-2.2.1/bin/mapr-classpath.sh.

I changed it to use the output of the hadoop classpath command instead, i.e. MAPR_HADOOP_CLASSPATH=`hadoop classpath` (backticks for command substitution, not plain quotes). With this change the Hadoop libraries (especially the HDFS ones) are loaded correctly and the Oozie job runs successfully.
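A rough sketch of that change, assuming both values are command substitutions (backticks) in the actual file. The variable name and script path are the ones quoted above. has_hadoop_common is a hypothetical helper added only for illustration: it checks that the new classpath mentions the Hadoop common libraries, which is where org.apache.hadoop.fs.FSDataInputStream lives.

```shell
# Sketch only. has_hadoop_common is a hypothetical helper, not part of
# MapR, Spark, or Oozie.

# Before (value reported above, assumed to be a command substitution):
#   MAPR_HADOOP_CLASSPATH=`/opt/mapr/spark/spark-2.2.1/bin/mapr-classpath.sh`
# After:
#   MAPR_HADOOP_CLASSPATH=`hadoop classpath`

# Returns 0 if the colon-separated classpath in $1 has an entry that looks
# like the Hadoop common libraries (a hadoop-common jar or a
# .../hadoop/common/ directory), and 1 otherwise.
has_hadoop_common() {
  case "$1" in
    *hadoop-common*|*hadoop/common*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query the cluster when the hadoop CLI is actually on the PATH.
if command -v hadoop >/dev/null 2>&1; then
  if has_hadoop_common "$(hadoop classpath)"; then
    echo "hadoop classpath includes the Hadoop common libraries"
  else
    echo "hadoop classpath does not appear to include the Hadoop common libraries"
  fi
fi
```

Running this on the node where the Oozie launcher executes is one quick way to confirm that the classpath Spark will pick up contains the Hadoop jars before resubmitting the workflow.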
