[Data Analysis] Bulk-reading Elasticsearch with pandasticsearch


1. Git repository

https://github.com/onesuper/pandasticsearch

2. Create a connection

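from pandasticsearch import DataFrame

# Credentials are bytes so that they work with the patched client.py in step 3.
username = b'xxxx'
password = b'xxxx'

df = DataFrame.from_es(url='http://IP:9200',  # replace IP with the Elasticsearch host
                       index='xxxx',
                       username=username,
                       password=password,
                       doc_type='xxxx',
                       compat=5)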
[Note] In testing, Python 3 runs into an encoding problem:
TypeError: a bytes-like object is required, not 'str'
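
The error can be reproduced without Elasticsearch at all. The snippet below is a minimal sketch, using made-up placeholder credentials, of what happens when a str is passed to base64.b64encode on Python 3:

```python
import base64

# Hypothetical placeholder credentials, only for illustration.
username = "user"
password = "secret"

try:
    # On Python 3 the formatted value is a str, which b64encode rejects.
    base64.b64encode("%s:%s" % (username, password))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

On Python 2 the same call succeeds because str is already a byte string there, which is presumably why the library code was written this way.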

3. Patch the source

In ~/anaconda3/lib/python3.7/site-packages/pandasticsearch/client.py, find:

    if username is not None and password is not None:
        base64creds = base64.b64encode('%s:%s' % (username,password))
        req.add_header("Authorization", "Basic %s" % base64creds)

Change it to:

    if username is not None and password is not None:
        base64creds = bytes.decode(base64.b64encode(b'%s:%s' % (username,password)))
        req.add_header("Authorization", "Basic %s" % base64creds)
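
The patch above only works when username and password are bytes, because b'%s:%s' cannot format str values. As a rough sketch of a more tolerant variant, the helper below accepts either str or bytes. The name basic_auth_header is hypothetical and not part of pandasticsearch, and UTF-8 is an assumed encoding for the credentials:

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value from str or bytes."""
    # Normalize both values to bytes (UTF-8 is an assumption, not a library rule).
    if isinstance(username, str):
        username = username.encode("utf-8")
    if isinstance(password, str):
        password = password.encode("utf-8")
    # b64encode takes bytes and returns bytes; decode to get a str header value.
    token = base64.b64encode(username + b":" + password)
    return "Basic %s" % token.decode("ascii")


print(basic_auth_header("user", "secret"))
print(basic_auth_header(b"user", b"secret"))
```

Both calls print the same header value, so the caller no longer has to remember to pass bytes.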

4. Query data in bulk

limit() fetches the first 200,000 records, and to_pandas() converts the result into a pandas DataFrame.

pd_df = df.limit(200000).to_pandas()

