Python – The version of Python used by Apache Spark


Which version of Python (2 or 3) does Apache Spark support?
If it supports both versions, are there any performance considerations for using Python 2 or 3 with Apache Spark?

Solution

Since at least Spark 1.2.1, if neither PYSPARK_PYTHON nor PYSPARK_DRIVER_PYTHON is set, the default Python version is 2.7 (see bin/pyspark).
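To override that default, you can export the two environment variables before launching pyspark. A minimal sketch; the interpreter path (`python3`) is illustrative and should match your installation:

```shell
# Point both the driver and the workers at a specific Python interpreter.
# Without these, Spark 1.x falls back to the default "python" (2.7).
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
```

After exporting, start `bin/pyspark` from the same shell so the variables are inherited.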

Python 3 has been supported since Spark 1.4.0 (see SPARK-4897 and the Spark 1.4.0 release notes).
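To confirm which interpreter your driver script actually picked up (useful when a cluster defaults to 2.7 but you expected 3), a minimal check:

```python
import sys

# Report the interpreter version the current process is running under.
# On Spark 1.x this is 2.7 unless PYSPARK_PYTHON points elsewhere.
major, minor = sys.version_info[:2]
print("Running under Python %d.%d" % (major, minor))
```

Running the same check inside a worker task (e.g. via a small RDD job) reveals the executor-side interpreter, which must match the driver's major version.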

Choosing one over the other should depend on your requirements. If you are unsure, reading Should I use Python 2 or Python 3 for my development activity? is probably wise. Beyond that, this is likely too broad and subjective a topic for SO.
