Python – Apache Spark: Error while starting PySpark


On a CentOS machine with Python 2.6.6 and Apache Spark 1.2.1, the following error occurs when running ./pyspark.

It seems to be a problem with Python, but I can't figure out what it is:

15/06/18 08:11:16 INFO spark.SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/shell.py", line 45, in <module>
    sc = SparkContext(appName="PySparkShell", pyFiles=add_files)
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/context.py", line 105, in __init__
    conf, jsc)
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/context.py", line 157, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
  File "/usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4/python/pyspark/accumulators.py", line 269, in _start_update_server
    server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
  File "/usr/lib64/python2.6/SocketServer.py", line 402, in __init__
    self.server_bind()
  File "/usr/lib64/python2.6/SocketServer.py", line 413, in server_bind
    self.socket.bind(self.server_address)
  File "<string>", line 1, in bind
socket.gaierror: [Errno -2] Name or service not known
>>> 15/06/18 08:11:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/06/18 08:11:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

Solution

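The traceback shows that PySpark fails while starting its accumulator server, which binds a socket to ("localhost", 0). The error socket.gaierror: [Errno -2] Name or service not known means the hostname localhost could not be resolved. Check your /etc/hosts file; if there is no entry for localhost, adding one should fix the problem.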

For example:

[ip] [hostname] localhost
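To confirm the diagnosis, you can check both the hosts file and the name lookup itself from Python. The sketch below is a minimal illustration using only the standard library and runs on Python 2.6 as well as Python 3. The helper names hosts_has_localhost() and resolves() are hypothetical, and resolves() only approximates the lookup that socket.bind() performs for ("localhost", 0):

```python
import socket


def hosts_has_localhost(text):
    """Return True if hosts-file text maps some address to the name 'localhost'."""
    for line in text.splitlines():
        # Drop comments; blank and comment-only lines leave no fields
        fields = line.split("#", 1)[0].split()
        # Format: <ip> <hostname> [<alias> ...]; names start at the second field
        names = [name.lower() for name in fields[1:]]
        if "localhost" in names:
            return True
    return False


def resolves(host):
    """Return True if `host` resolves, False on socket.gaierror.

    A failed lookup here corresponds to the
    "[Errno -2] Name or service not known" error in the traceback.
    """
    try:
        socket.getaddrinfo(host, 0)
        return True
    except socket.gaierror:
        return False
```

For example, hosts_has_localhost(open("/etc/hosts").read()) reports whether the entry exists, and resolves("localhost") reports whether the lookup PySpark depends on actually succeeds.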

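If you cannot change the hosts entry on the server, edit line 269 of python/pyspark/accumulators.py (under the Spark installation directory, here /usr/lib/spark_1.2.1/spark-1.2.1-bin-hadoop2.4) so that it uses a hostname that is listed in the hosts file:

    server = AccumulatorServer(("[server hostname in hosts file]", 0), _UpdateRequestHandler)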
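The traceback shows AccumulatorServer being constructed through SocketServer's TCPServer.__init__, which calls server_bind() on the (host, port) tuple. The sketch below reproduces just that step so you can check that a candidate hostname binds before editing accumulators.py. It is an assumption-laden illustration, not PySpark code: start_on() is a hypothetical helper, and the standard BaseRequestHandler stands in for _UpdateRequestHandler.

```python
try:
    import socketserver  # Python 3
except ImportError:
    import SocketServer as socketserver  # Python 2, as in the traceback


def start_on(host):
    """Bind a TCP server to (host, 0) and return (server, port).

    Port 0 lets the OS pick any free port, as in accumulators.py, so only
    the hostname matters; socket.gaierror is raised if it cannot be resolved.
    """
    # BaseRequestHandler is a do-nothing placeholder for _UpdateRequestHandler
    server = socketserver.TCPServer((host, 0), socketserver.BaseRequestHandler)
    return server, server.server_address[1]
```

For example, server, port = start_on("myhost") followed by server.server_close() succeeds only if the name resolves; "myhost" is a placeholder for the hostname from your hosts file.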
