Method 1: use the pyhive library
As shown in the figure above, we need four external packages (pyhive, thrift, sasl, and thrift_sasl, the ones imported in the test code below).
I ran into quite a few errors along the way and solved them one by one:
1. Connection issue: thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
2. Installing sasl failed with "Microsoft Visual C++ 14.0 is required. Get it with 'Microsoft Visual C++ Build Tools'". Solved by installing the Build Tools.
3. Hit:
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
Fix: pass the parameter auth='NOSASL' to the connection.
4. Some of these packages would not install cleanly, so I forced them in with PyCharm's Alt+Enter quick-fix install.
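Error 1 above ("TSocket read 0 bytes") usually means HiveServer2 is not reachable at the given host and port, or the transport/auth mode does not match the server's configuration. Before digging into auth settings, a minimal sketch to confirm the port is even open (the host and port below are the placeholders used in this post):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with this post's HiveServer2 address:
# port_open('192.168.154.201', 10000)
```

If this returns False, the problem is networking or the service itself, not the Python client libraries.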
Finally, the test code:

```python
from pyhive import hive
import thrift
import sasl
import thrift_sasl

conn = hive.Connection(host='192.168.154.201', port=10000,
                       database='default', auth='NOSASL')
cursor = conn.cursor()
cursor.execute('select * from a1 limit 10')
for result in cursor.fetchall():
    print(result)
```
Method 2: use the impyla library

```shell
pip install thrift-sasl==0.2.1
pip install sasl
pip install impyla
```
The test code is as follows:

```python
from impala.dbapi import connect

conn = connect(host='192.168.154.201', port=10000, database='default')
cursor = conn.cursor()
cursor.execute('select * from a1 limit 10')
for result in cursor.fetchall():
    print(result)
```
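Both pyhive and impyla expose standard Python DB-API connections and cursors, so the fetch loop above can be generalized. A small sketch that turns any DB-API result set into a list of dicts keyed by column name (the `a1` table is just this post's example; the test uses sqlite3 purely as a stand-in DB-API backend):

```python
def rows_as_dicts(cursor):
    """Map each fetched row to {column_name: value} using cursor.description."""
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# e.g. with the impyla cursor above:
# cursor.execute('select * from a1 limit 10')
# print(rows_as_dicts(cursor))
```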
Method 3: use the ibis library

```python
# 1. Browse data on HDFS
from ibis import hdfs_connect

hdfs = hdfs_connect(host='xxx.xxx.xxx.xxx', port=50070)
hdfs.ls('/')
hdfs.ls('/apps/hive/warehouse/ai.db/tmp_ys_sku_season_tag')
hdfs.get('/apps/hive/warehouse/ai.db/tmp_ys_sku_season_tag/000000_0', 'parquet_dir')
```
```python
# 2. Query data into a Python DataFrame
from ibis.impala.api import connect

ImpalaClient = connect('192.168.154.201', 10000, database='default')
lists = ImpalaClient.list_databases()
print(lists)
isExist = ImpalaClient.exists_table('a1')
# Execute raw SQL:
# if isExist:
#     sql = 'set mapreduce.job.queuename=A'
#     ImpalaClient.raw_sql(sql)
# Export the SQL result to a Python DataFrame
requete = ImpalaClient.sql('select * from a1 limit 10')
df = requete.execute(limit=None)
print(type(df))
print(df)
```
Result:
Official API docs: https://docs.ibis-project.org/api.html#impala-client
Once the result is a DataFrame, pandas and numpy really do let you do a great deal with it.
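For instance, the DataFrame returned by `requete.execute()` can be filtered and aggregated directly with pandas. A toy stand-in DataFrame is used here, since the columns of the post's `a1` table are not shown:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by requete.execute()
df = pd.DataFrame({'sku': ['a', 'b', 'a', 'c'],
                   'qty': [10, 5, 7, 3]})

# Aggregate with pandas just like any other DataFrame
totals = df.groupby('sku')['qty'].sum()
print(totals['a'])  # 10 + 7 = 17
```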