Apache Spark: NullPointerException on broadcast variables (YARN cluster mode)
I have a simple Spark application in which I am trying to broadcast a String[] variable on a YARN cluster. But every time I try to access the broadcast variable's value inside a task, I get null. It would be very helpful if you could suggest what I am doing wrong here.
My code is as follows:
import java.io.Serializable;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TestApp implements Serializable {

    private static final Logger logger = LoggerFactory.getLogger(TestApp.class);

    // Broadcast variable declared as a static, class-level field
    static Broadcast<String[]> mongoConnectionString;

    public static void main(String[] args) {
        String hdfsBaseURL = args[0];
        SparkConf sparkConf = new SparkConf().setAppName(Constants.appName);
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
        mongoConnectionString = javaSparkContext.broadcast(args);
        JavaSQLContext javaSQLContext = new JavaSQLContext(javaSparkContext);
        JavaSchemaRDD javaSchemaRDD =
                javaSQLContext.jsonFile(hdfsBaseURL + Constants.hdfsInputDirectoryPath);
        if (javaSchemaRDD != null) {
            javaSchemaRDD.registerTempTable("LogAction");
            javaSchemaRDD.cache();
            JavaSchemaRDD pageSchemaRDD = javaSQLContext.sql(SqlConstants.getLogActionPage);
            pageSchemaRDD.foreach(new Test());
        }
    }

    private static class Test implements VoidFunction<Row> {
        private static final long serialVersionUID = 1L;

        public void call(Row t) throws Exception {
            // NullPointerException here: mongoConnectionString is null on the executors
            logger.info("mongoConnectionString " + mongoConnectionString.value());
        }
    }
}
Solution
This happens because your broadcast variable is a class-level (static) field. Only the driver runs main, so the assignment made there never reaches the workers: when the class is loaded on an executor, the static field is not re-initialized and remains null, which is why mongoConnectionString.value() throws a NullPointerException inside the task. The solution is to pass the Broadcast handle into the function that needs it, for example through its constructor, so that it is serialized along with the task closure. The same applies to accumulators.
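For instance, here is a minimal sketch of the corrected pattern, keeping the same (old) Spark Java API as the question. The Broadcast handle is handed to the function through its constructor instead of being read from a static field, so it travels with the serialized closure to the executors:

    private static class Test implements VoidFunction<Row> {
        private static final long serialVersionUID = 1L;

        // Captured when the driver serializes this function,
        // so it is available on the executors
        private final Broadcast<String[]> mongoConnectionString;

        Test(Broadcast<String[]> mongoConnectionString) {
            this.mongoConnectionString = mongoConnectionString;
        }

        public void call(Row t) throws Exception {
            logger.info("mongoConnectionString " + mongoConnectionString.value());
        }
    }

and in main the handle is passed in at the call site:

    pageSchemaRDD.foreach(new Test(mongoConnectionString));

This works because Broadcast itself is serializable and designed to be shipped this way: only the small handle travels with the closure, while the broadcast data is fetched locally on each executor when value() is called.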