Method 1: use the pyhive library
As shown in the figure above, we need four external packages (pyhive, thrift, sasl, and thrift_sasl, the ones imported in the test code below).
I ran into quite a few errors along the way and solved them one by one:
1. Connection issue: thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
2. Installing sasl failed with "Microsoft Visual C++ 14.0 is required. Get it with 'Microsoft Visual C++ Build Tools'". Solved by installing the Build Tools.
3. Hit:
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
Fix: pass the parameter auth='NOSASL' to the connection.
4. Some of these packages would not install cleanly, so I forced them in with PyCharm's Alt+Enter quick-fix install.
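Error 1 above ("TSocket read 0 bytes") usually means HiveServer2 is not reachable at the given host and port, or the transport/auth mode does not match the server's configuration. Before digging into auth settings, a minimal sketch to confirm the port is even open (the host and port below are the placeholders used in this post):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with this post's HiveServer2 address:
# port_open('192.168.154.201', 10000)
```

If this returns False, the problem is networking or the service itself, not the Python client libraries.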
Finally, the test code:

```python
from pyhive import hive
import thrift
import sasl
import thrift_sasl

conn = hive.Connection(host='192.168.154.201', port=10000,
                       database='default', auth='NOSASL')
cursor = conn.cursor()
cursor.execute('select * from a1 limit 10')
for result in cursor.fetchall():
    print(result)
```
Method 2: use the impyla library

```shell
pip install thrift-sasl==0.2.1
pip install sasl
pip install impyla
```
The test code is as follows:

```python
from impala.dbapi import connect

conn = connect(host='192.168.154.201', port=10000, database='default')
cursor = conn.cursor()
cursor.execute('select * from a1 limit 10')
for result in cursor.fetchall():
    print(result)
```
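Both pyhive and impyla expose standard Python DB-API connections and cursors, so the fetch loop above can be generalized. A small sketch that turns any DB-API result set into a list of dicts keyed by column name (the `a1` table is just this post's example; the test uses sqlite3 purely as a stand-in DB-API backend):

```python
def rows_as_dicts(cursor):
    """Map each fetched row to {column_name: value} using cursor.description."""
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# e.g. with the impyla cursor above:
# cursor.execute('select * from a1 limit 10')
# print(rows_as_dicts(cursor))
```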
Method 3: use the ibis library

```python
# 1. Browse data on HDFS
from ibis import hdfs_connect

hdfs = hdfs_connect(host='xxx.xxx.xxx.xxx', port=50070)
hdfs.ls('/')
hdfs.ls('/apps/hive/warehouse/ai.db/tmp_ys_sku_season_tag')
hdfs.get('/apps/hive/warehouse/ai.db/tmp_ys_sku_season_tag/000000_0', 'parquet_dir')
```
```python
# 2. Query data into a Python DataFrame
from ibis.impala.api import connect

ImpalaClient = connect('192.168.154.201', 10000, database='default')
lists = ImpalaClient.list_databases()
print(lists)
isExist = ImpalaClient.exists_table('a1')
# Execute raw SQL:
# if isExist:
#     sql = 'set mapreduce.job.queuename=A'
#     ImpalaClient.raw_sql(sql)
# Export the SQL result to a Python DataFrame
requete = ImpalaClient.sql('select * from a1 limit 10')
df = requete.execute(limit=None)
print(type(df))
print(df)
```
Result:
Official API docs: https://docs.ibis-project.org/api.html#impala-client
Once the result is a DataFrame, pandas and numpy really do let you do a great deal with it.
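For instance, the DataFrame returned by `requete.execute()` can be filtered and aggregated directly with pandas. A toy stand-in DataFrame is used here, since the columns of the post's `a1` table are not shown:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by requete.execute()
df = pd.DataFrame({'sku': ['a', 'b', 'a', 'c'],
                   'qty': [10, 5, 7, 3]})

# Aggregate with pandas just like any other DataFrame
totals = df.groupby('sku')['qty'].sum()
print(totals['a'])  # 10 + 7 = 17
```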