Java – Overriding the cluster’s Spark libraries during spark-submit


Our application’s Hadoop cluster has Spark 1.5 installed. However, due to specific requirements, our jobs were developed against Spark 2.0.2. When I submit a job to YARN, I try to override the cluster’s Spark libraries with the --jars option, but the Scala library jar is still not picked up, and the job throws an error saying:

ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at org.apache.spark.sql.SparkSession$Builder.config(SparkSession.scala:713)
    at org.apache.spark.sql.SparkSession$Builder.appName(SparkSession.scala:704)

Any ideas on how to override the cluster’s libraries during spark-submit?

The shell command I used to submit the job is as follows.

spark-submit \
  --jars test.jar,spark-core_2.11-2.0.2.jar,spark-sql_2.11-2.0.2.jar,spark-catalyst_2.11-2.0.2.jar,scala-library-2.11.0.jar \
  --class Application \
  --master yarn \
  --deploy-mode cluster \
  --queue xxx \
  xxx.jar \
  <params>

Solution

It’s simple: YARN doesn’t care which version of Spark you’re running. It executes the jars provided by the YARN client, i.e. by spark-submit, which packages your application jar together with its own Spark libraries. So the Spark version that actually runs on the cluster is the one shipped by the spark-submit you invoke, not the one installed on the cluster.
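As a quick check, you can see which spark-submit is on your PATH and which Spark and Scala versions that client bundles:

which spark-submit
spark-submit --version    # the version banner also reports the bundled Scala version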

To deploy Spark 2.0.2 instead of the provided 1.5, simply install Spark 2.0.2 on the host you submit from, for example in your home directory, set the YARN_CONF_DIR environment variable to point at your Hadoop configuration, and then use that installation’s spark-submit.
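
Here is a minimal sketch of those steps, assuming a prebuilt Spark 2.0.2 package and reusing the example names from the question; the archive name and the Hadoop config path are illustrative and will vary by environment:

# Unpack a prebuilt Spark 2.0.2 (matching your Hadoop version) under $HOME
tar -xzf spark-2.0.2-bin-hadoop2.6.tgz -C "$HOME"

# Point the client at the cluster's Hadoop configuration (path is an example)
export YARN_CONF_DIR=/etc/hadoop/conf

# Submit with the Spark 2.0.2 client; it ships its own Spark and Scala 2.11
# jars to YARN, so the --jars overrides from the question are no longer needed
"$HOME"/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
  --class Application \
  --master yarn \
  --deploy-mode cluster \
  --queue xxx \
  xxx.jar \
  <params>

Because the 2.0.2 client ships a matching Scala 2.11 scala-library to YARN, the NoSuchMethodError caused by the version mismatch disappears.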
