Python – PyHive, SASL, and Python 3.5

PyHive, SASL, and Python 3.5: here is a solution to the problem.

PyHive, SASL, and Python 3.5

I tried setting up the Hive connection as described here: How to Access Hive via Python?, using hive.Connection with Python 3.5.2 (installed on a Cloudera Linux BDA), but the SASL package seems to be causing problems. I read on a forum that SASL is only compatible with Python 2.7. Is that correct? What am I missing/doing wrong?

from pyhive import hive
conn = hive.Connection(host="myserver", port=10000)
import pandas as pd
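
As a quick diagnostic (not part of the original post), it can help to check whether the C-backed sasl package imports at all under your interpreter. PyHive itself runs on Python 3; it is the sasl extension build that is the common failure point:

```python
import sys


def sasl_available():
    """Return True if the C-backed `sasl` package can be imported."""
    try:
        import sasl  # noqa: F401  # the package PyHive's SASL transport relies on
        return True
    except ImportError:
        return False


# Prints your interpreter version and whether sasl imports cleanly
print(sys.version_info[:2], sasl_available())
```

If this prints False, the problem is the sasl install (or missing Cyrus SASL system libraries), not PyHive itself.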

Error message

TTransportException                       Traceback (most recent call last)
<ipython-input> in <module>()
      1 from pyhive import hive
      2 #conn = hive.Connection(host="myserver", port=10000)
----> 3 conn = hive.Connection(host="myserver")
      4 import pandas as pd

/opt/anaconda3/lib/python3.5/site-packages/pyhive/hive.py in __init__(self, host, port, username, database, auth, configuration)
    102
    103         try:
--> 104             self._transport.open()
    105             open_session_req = ttypes.TOpenSessionReq(
    106                 client_protocol=protocol_version,

/opt/anaconda3/lib/python3.5/site-packages/thrift_sasl/__init__.py in open(self)
     70         if not ret:
     71             raise TTransportException(type=TTransportException.NOT_OPEN,
---> 72                 message=("Could not start SASL: %s" % self.sasl.getError()))
     73
     74         # Send initial response

TTransportException: TTransportException(message="Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'", type=1)

Solution

We (or rather, the IT team) found a solution.

First, the Python packages thrift (to version 0.10.0) and PyHive (to version 0.3.0) were upgraded; I don't know why we were not already on the latest versions.
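
To confirm what is actually installed, a generic check (not from the original post) that works on Python 3.5 uses pkg_resources from setuptools:

```python
import pkg_resources  # ships with setuptools; available on Python 3.5


def installed_version(name):
    """Return the installed version string for a distribution, or None."""
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


# The fix described here used thrift 0.10.0 and PyHive 0.3.0
for pkg in ("thrift", "PyHive"):
    print(pkg, installed_version(pkg))
```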

Second, we added the following property:

<property>
  <name>hive.server2.authentication</name>
  <value>NOSASL</value>
</property>

to the following Hive configuration parameters in Cloudera Manager:

HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml (the client-side snippet is required for HUE to keep working)

from pyhive import hive
conn = hive.Connection(host="myserver", auth='NOSASL')
import pandas as pd
import sys

df = pd.read_sql("SELECT * FROM my_table", conn) 
print(sys.getsizeof(df))
df.head()
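
One side note on the snippet above: sys.getsizeof on a DataFrame only reports the container's shallow size. To gauge how much memory a query result really takes, DataFrame.memory_usage(deep=True) is more accurate. A small standalone sketch with made-up data (no Hive connection needed):

```python
import pandas as pd

# Hypothetical stand-in for a query result
df = pd.DataFrame({"a": range(1000), "b": ["x"] * 1000})

# deep=True also counts the payload of object (string) columns,
# which sys.getsizeof(df) would miss
total_bytes = int(df.memory_usage(deep=True).sum())
print(total_bytes)
```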

This works without problems or errors.

All the best,
Tom
