Spark (Kafka) streaming memory issues
I'm testing my first Spark Streaming pipeline, which processes messages from Kafka. However, after several test runs, I get the following error:

There is insufficient memory for the Java Runtime Environment to continue.
I usually submit the job like this:
/usr/local/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 ~/script/to/spark_streaming.py
Then I stop it with Ctrl+C.
The last few lines of the script are as follows:
After changing the way I submit the Spark Streaming job (command below), I'm still seeing the same issue: memory isn't freed after I kill the job. Spark is the only thing running on those 4 EC2 nodes.
/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --py-files ~/config.py --master spark://<master_IP>:7077 --deploy-mode client ~/spark_kafka.py
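If the goal is for the job to release its resources when the driver is stopped, one option is to enable graceful shutdown and block on the StreamingContext. This is a minimal sketch, assuming Spark 2.x with the spark-streaming-kafka-0-8 integration used above; the app name, topic, and broker address are placeholders, not values from the original script:

```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# stopGracefullyOnShutdown lets a SIGTERM (e.g. `kill <driver_pid>`) finish
# in-flight batches and release executors instead of leaving orphaned JVMs.
conf = (SparkConf()
        .setAppName("kafka-stream")
        .set("spark.streaming.stopGracefullyOnShutdown", "true"))
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=10)

# Topic and broker are placeholder values for this sketch.
stream = KafkaUtils.createDirectStream(
    ssc, ["my_topic"], {"metadata.broker.list": "<broker_IP>:9092"})
stream.pprint()

ssc.start()
ssc.awaitTermination()  # stop the driver with `kill`, not Ctrl+C in the terminal
```

With this in place, stopping the driver process cleanly (rather than interrupting the terminal) gives Spark a chance to tear down its executors.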
When you press Ctrl+C, only the submitter process is interrupted; the job itself keeps running on the cluster. Eventually your system runs out of memory to start a new JVM.

Also, even if you restart the cluster, all previously running jobs will be restarted.
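To confirm this and clean up, one approach (a sketch; the exact process names depend on deploy mode, and the PID and driver ID below are hypothetical) is to list the leftover Spark JVMs on each node and kill them:

```shell
# List running JVMs with their main classes; Spark leftovers typically show up
# as DriverWrapper, CoarseGrainedExecutorBackend, or your application's class.
jps -lm

# Stop a leftover process by PID (hypothetical PID shown).
kill 12345

# For drivers launched with --deploy-mode cluster, the standalone master can
# kill them by driver ID (visible in the master UI at http://<master_IP>:8080):
# spark-submit --master spark://<master_IP>:7077 --kill <driver_ID>
```

Checking `jps` on the workers after stopping a job is a quick way to see whether executors were actually released.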