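[Original] A roundup of problems hit when connecting to a Thrift Server from Python to run SQL
Scenario: combining Python with an existing product; an exploratory study of Python
Environment: CentOS 7
0. First make sure Python and PyHive are installed. The connection code is: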
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function

from pyhive import hive
from TCLIService.ttypes import TOperationState


def pyhiveexesql(sql):
    """Run sql on the Thrift server asynchronously and print logs and rows."""
    cursor = None
    try:
        cursor = hive.connect(host='10.19.12.20', port=10015, username='xxx').cursor()
        # Older PyHive releases spell this keyword `async`; it was renamed to
        # `async_` because `async` is a reserved word in Python 3.7+.
        cursor.execute(sql, async_=True)
        status = cursor.poll().operationState
        while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
            # Print the server-side logs while the query is still running.
            for message in cursor.fetch_logs():
                print(message)
            # If needed, an asynchronous query can be cancelled at any time with:
            # cursor.cancel()
            status = cursor.poll().operationState
        print(cursor.fetchall())
    except Exception as e:
        print(e)
    finally:
        # cursor is still None if the connection itself failed.
        if cursor is not None:
            cursor.close()


if __name__ == '__main__':
    pyhiveexesql('SELECT * FROM my_awesome_data LIMIT 10')
Problem 1: ImportError: No module named sasl
Solution: sasl is missing, so install it. Run: sudo pip install sasl. That immediately triggered the second problem.
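Problem 2: Error:command 'gcc' failed with exit status 1
Solution: Posts online say sasl needs some prerequisite packages before it can be installed. I installed a few of them, but the error did not budge. After a full day of frantic searching I finally found a fix:
Run: sudo yum install libffi-devel;sudo yum install libgsasl-devel;sudo yum install libmemcached-devel;
OK, that resolved problem 1, but then problem 3 appeared: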
Problem 3: ImportError: No module named thrift_sasl
Solution: sudo yum -y install easy_install; sudo easy_install thrift; sudo pip install thrift_sasl;
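The import errors in problems 1 and 3 only show up one at a time, each after the previous one is fixed. A rough sketch of a check that reports every missing module in one pass is given here. The helper name missing_modules and the list REQUIRED_MODULES are made up for illustration; the list simply collects the module names that appear in this post.

```python
import importlib

# Module names taken from the ImportError messages in this post, plus pyhive.
# This list is an assumption for illustration; adjust it to your own setup.
REQUIRED_MODULES = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def missing_modules(names):
    """Return the names that cannot be imported, in the order given."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: %s" % ", ".join(missing))
    else:
        print("All required modules can be imported.")
```

Running it before the connection script shows which of the packages still need to be installed, instead of discovering them one failure at a time.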
OK, that resolved problems 2 and 3, but then problem 4 appeared:
Problem 4: pyhive.exc.OperationalError: TFetchResultsResp(status=TStatus(errorCode=0, errorMessage=u'Expected state FINISHED, but found ERROR'
Details:
[hfb@192 ~]$ python Desktop/pyhive4.py
Traceback (most recent call last):
File "Desktop/pyhive4.py", line 31, in <module>
print cursor.fetchall()
File "/usr/lib/python2.7/site-packages/pyhive/common.py", line 145, in fetchall
one = self.fetchone()
File "/usr/lib/python2.7/site-packages/pyhive/common.py", line 105, in fetchone
self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
File "/usr/lib/python2.7/site-packages/pyhive/common.py", line 45, in _fetch_while
self._fetch_more()
File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 318, in _fetch_more
_check_status(response)
File "/usr/lib/python2.7/site-packages/pyhive/hive.py", line 421, in _check_status
raise OperationalError(response)
pyhive.exc.OperationalError: TFetchResultsResp(status=TStatus(errorCode=0, errorMessage=u'Expected state FINISHED, but found ERROR', sqlState=None, infoMessages=[u'*org.apache.hive.service.cli.HiveSQLException:Expected state FINISHED, but found ERROR:15:14', u'org.apache.hive.service.cli.operation.Operation:assertState:Operation.java:161', u'org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:getNextRowSet:SparkExecuteStatementOperation.scala:107', u'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:220', u'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:685', u'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454', u'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672', u'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553', u'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538', u'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', u'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', u'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', u'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', u'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', u'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', u'java.lang.Thread:run:Thread.java:748'], statusCode=3), results=None, hasMoreRows=None)
Solution: the table did not exist. Ha, the error message really does not make that obvious.
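One way to get a clearer message is to look at the final operation state before calling fetchall(). The sketch below is only an illustration: ensure_finished is a hypothetical helper, and it assumes the object returned by PyHive's cursor.poll() exposes operationState and, when the server supplies one, errorMessage. Whether the server actually fills in errorMessage depends on the Thrift server in use.

```python
def ensure_finished(poll_response, finished_state):
    """Raise RuntimeError unless poll_response reports finished_state.

    poll_response is assumed to expose operationState and, optionally,
    errorMessage, like the object returned by PyHive's cursor.poll().
    """
    state = poll_response.operationState
    if state == finished_state:
        return
    # errorMessage may be absent or None; fall back to a generic note.
    message = getattr(poll_response, "errorMessage", None)
    raise RuntimeError(
        "query ended in state %r: %s"
        % (state, message or "no error message returned by the server")
    )


# Intended use in the connection script, right before cursor.fetchall():
#   ensure_finished(cursor.poll(), TOperationState.FINISHED_STATE)
```

With this check in place, a failed query stops at the polling step with the server's own message, if it sent one, rather than failing later inside fetchall() with "Expected state FINISHED, but found ERROR".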
The end.

