Spark (Kafka) streaming memory issues

I’m testing my first Spark Streaming pipeline, which processes messages from Kafka. However, after several test runs, I get the following error message:
There is insufficient memory for the Java Runtime Environment to continue.

My test data is very small, so this shouldn’t happen. After looking at the running processes, I suspect that previously submitted Spark jobs were never completely removed.


I usually submit jobs like this (I’m using Spark 2.2.1):
/usr/local/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 ~/script/to/

Then I stop it with Ctrl+C.

The last few lines of the script are as follows:



After I changed the way I submit the Spark Streaming job (command below), I still have the same issue: memory isn’t freed after killing the job. Only Hadoop and Spark are running on those 4 EC2 nodes.

/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --py-files ~/ --master spark://<master_IP>:7077 --deploy-mode client  ~/


When you press Ctrl-C, only the submitter process is interrupted and the job itself continues to run. Eventually the system has too little free memory left to start a new JVM.
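
To check whether earlier submissions are still alive, you can look for leftover JVMs on the machine you submit from. The following is only a rough sketch, assuming the JDK's `jps` tool is on the PATH and that the leftover processes show up under the main class name `SparkSubmit`. The helper names `find_jvm_pids` and `kill_leftover_jobs` are made up for this example.

```python
import os
import signal
import subprocess


def find_jvm_pids(jps_output, main_class="SparkSubmit"):
    """Return the PIDs whose main-class column in `jps` output equals main_class."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # A regular jps line looks like "<pid> <MainClass>"; skip anything else.
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == main_class:
            pids.append(int(parts[0]))
    return pids


def kill_leftover_jobs(main_class="SparkSubmit"):
    """Send SIGTERM to every matching JVM owned by the current user."""
    # Assumption: `jps` (shipped with the JDK) is installed and on the PATH.
    output = subprocess.check_output(["jps"]).decode()
    pids = find_jvm_pids(output, main_class)
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Calling `kill_leftover_jobs()` on the submitting machine would terminate every `SparkSubmit` JVM it finds, so run `jps` by hand first and make sure nothing you still need is in that list.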

Also, even if you restart the cluster, all previously running jobs will be started again.

Read up on how to stop a running Spark application properly.
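
One common pattern for a clean stop is to let the driver script shut itself down instead of interrupting it from the terminal. Below is a minimal sketch, not the only way to do it: the script polls for a marker file and, once the file exists, asks the `StreamingContext` to stop gracefully. The function name `run_until_marker` and the marker path are hypothetical, and `ssc` is assumed to be the `StreamingContext` your script already builds. Spark also has a `spark.streaming.stopGracefullyOnShutdown` setting that may be worth a look.

```python
import os


def run_until_marker(ssc, marker_path, poll_seconds=10):
    """Start ssc, then stop it gracefully once marker_path exists."""
    ssc.start()
    while True:
        # awaitTerminationOrTimeout returns True if the context has already stopped.
        if ssc.awaitTerminationOrTimeout(poll_seconds):
            return "terminated"
        if os.path.exists(marker_path):
            # Let in-flight batches finish, then shut down the SparkContext as well.
            # Note the spelling of the PySpark keyword: stopGraceFully.
            ssc.stop(stopSparkContext=True, stopGraceFully=True)
            return "stopped"
```

In the script this would replace a bare `ssc.start()` / `ssc.awaitTermination()` pair, for example `run_until_marker(ssc, "/tmp/stop_streaming_job")`. Creating that (hypothetical) file on the driver machine then ends the job without Ctrl-C.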

