Java – Example of parallelizing calls to SparkContext in Java

Example of parallelizing calls to SparkContext in Java… here is a solution to the problem.

Example of parallelizing calls to SparkContext in Java

I started using Spark and ran into problems trying to implement a simple example of the map function. The problem is the definition of “parallelization” in the new version of Spark. Can someone share an example of how to use it because the following method will go wrong due to insufficient parameters.

Spark version: 2.3.2
Java:1.8

SparkSession session = SparkSession.builder().appName("Compute Square of Numbers").config("spark.master","local").getOrCreate();
SparkContext context = session.sparkContext();
List<Integer> seqNumList = IntStream.rangeClosed(10, 20).boxed().collect(Collectors.toList());
JavaRDD<Integer> numRDD = context.parallelize(seqNumList, 2);

Compile-time error message: The method requires 3 parameters

I don’t understand what the third parameter should look like? According to the documentation, it should be

scala.reflect.ClassTag<T>

But how to define or use it?

Please don’t recommend using JavaSparkContext, because I’m wondering how to use the generic SparkContext to use this approach.

Quote: https://spark.apache.org/docs/2.2.1/api/java/org/apache/spark/SparkContext.html#parallelize-scala.collection.Seq-int-scala.reflect.ClassTag-

Solution

This is the code that eventually worked for me. It’s not the best way to achieve results, but for me it’s a way to explore the API

SparkSession session = SparkSession.builder().appName(“Compute Square of Numbers”)
.config(“spark.master”, “local”).getOrCreate();

SparkContext context = session.sparkContext();

List<Integer> seqNumList = IntStream.rangeClosed(10, 20).boxed().collect(Collectors.toList());

RDD<Integer> numRDD = context
        .parallelize(JavaConverters.asScalaIteratorConverter(seqNumList.iterator()).asScala()
                .toSeq(), 2, scala.reflect.ClassTag$. MODULE$.apply(Integer.class));

numRDD.toJavaRDD().foreach(x -> System.out.println(x));
session.stop();

Related Problems and Solutions