Java – Apache Spark : TaskResultLost (result lost from block manager) Error On cluster

Apache Spark : TaskResultLost (result lost from block manager) Error On cluster… here is a solution to the problem.


I have a Spark standalone cluster with 3 slave nodes on VirtualBox. My code is in Java, and it handles my small input datasets (about 100MB in total) very well.

I set my virtual machine RAM to 16GB, but when I run my code on a large input file (about 2GB), I get this error after hours of processing in my reduce section:

Job aborted due to stage failure: Total size of serialized results of 4 tasks (4.3GB) is bigger than spark.driver.maxResultSize

I edited spark-defaults.conf and allocated more capacity (2GB, then 4GB) to spark.driver.maxResultSize. It didn’t help; the same error appeared.

Now I’m trying 8GB for spark.driver.maxResultSize, and my spark.driver.memory is the same size as the RAM (16GB). But I get this error:

TaskResultLost (result lost from block manager)

Any comments on this?

I don’t know whether the problem is caused by the large maxResultSize setting or by collecting RDDs in the code. I’ve also included the mapper part of the code for better understanding:


JavaRDD<Boolean[][][]> fragPQ = uData.map(new Function<String, Boolean[][][]>() {
        public Boolean[][][] call(String s) {
            // Each input record produces a 2 x 11000 x 11000 array of boxed Booleans,
            // all initialized to true
            Boolean[][][] PQArr = new Boolean[2][][];
            PQArr[0] = new Boolean[11000][];
            PQArr[1] = new Boolean[11000][];
            for (int i = 0; i < 11000; i++) {
                PQArr[0][i] = new Boolean[11000];
                PQArr[1][i] = new Boolean[11000];
                for (int j = 0; j < 11000; j++) {
                    PQArr[0][i][j] = true;
                    PQArr[1][i][j] = true;
                }
            }
            return PQArr;
        }
    });
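
For scale, a rough back-of-the-envelope estimate of my own (not from the original post): each call to this mapper allocates 2 × 11000 × 11000 = 242,000,000 boxed Boolean entries per input record, so even a handful of records produces task results far too large to ship back to the driver, which roughly lines up with the 4.3GB reported for just 4 tasks. A quick sketch of that arithmetic:

public class ResultSizeEstimate {
    public static void main(String[] args) {
        // Per-record element count from the mapper above: 2 x 11000 x 11000
        long entriesPerRecord = 2L * 11000 * 11000;               // 242,000,000 entries
        // With compressed oops a reference costs about 4 bytes, so the array
        // references alone (ignoring the per-array object headers) already
        // take roughly a gigabyte of heap per record.
        double refGigabytes = entriesPerRecord * 4 / 1e9;
        System.out.printf("entries per record: %,d%n", entriesPerRecord);
        System.out.printf("reference overhead alone: ~%.2f GB per record%n", refGigabytes);
    }
}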

Solution

Usually, this error indicates that you are collecting or transferring a large amount of data to the driver. Never do this. You need to rethink your application logic so that only small results ever come back to the driver; a sketch of that pattern follows.
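
Here is a minimal sketch of that pattern (the input and output paths and the toy length computation are placeholders of my own, not from the question): aggregate or write results out on the executors, and only bring small summaries back to the driver.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AvoidDriverCollect {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("AvoidDriverCollect");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Placeholder input; stands in for the asker's uData RDD
        JavaRDD<String> uData = sc.textFile("hdfs:///input/udata.txt");

        // Placeholder per-record transformation
        JavaRDD<Integer> lengths = uData.map(s -> s.length());

        // Anti-pattern: collect() ships every element to the driver, which is
        // exactly what spark.driver.maxResultSize is there to guard against.
        // List<Integer> everything = lengths.collect();

        // Better: reduce on the executors so only one small value comes back...
        int total = lengths.reduce((a, b) -> a + b);

        // ...or keep the full result distributed and write it from the executors.
        lengths.saveAsTextFile("hdfs:///output/lengths");

        System.out.println("total length = " + total);
        sc.stop();
    }
}

The commented-out collect() is the anti-pattern that blows past spark.driver.maxResultSize, while reduce() returns a single value and saveAsTextFile() writes the distributed result directly from the executors.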

Also, you don’t need to modify spark-defaults.conf to set this property. Instead, you can specify such application-specific properties through the --conf option of spark-shell or spark-submit, depending on how you run it.
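
For example (a sketch only; the class name and the 4g value are illustrative assumptions, and raising the limit does not remove the need to avoid huge driver-side results): on the command line this looks like spark-submit --conf spark.driver.maxResultSize=4g ..., and if you construct the SparkConf yourself in Java, the same property can be set before the context is created:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SubmitConfSketch {
    public static void main(String[] args) {
        // Same effect as passing --conf spark.driver.maxResultSize=4g to spark-submit
        // (4g is only an example value; the real fix is to stop collecting huge results)
        SparkConf conf = new SparkConf()
                .setAppName("SubmitConfSketch")
                .set("spark.driver.maxResultSize", "4g");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... build and run the job here ...
        sc.stop();
    }
}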
