Batch Reading Data from Elasticsearch with Python


1. Connecting to Elasticsearch from Python

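There are several ways to connect to Elasticsearch from Python. First install the client library:

pip3 install elasticsearch

The snippets below follow the 7.x client. The 8.x client requires every host to include a scheme and port, for example http://127.0.0.1:9200.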

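from elasticsearch import Elasticsearch


es = Elasticsearch()  # connect to the local Elasticsearch (localhost:9200) by default
es = Elasticsearch(["127.0.0.1:9200"])  # connect to port 9200 on localhost
es = Elasticsearch(["192.168.1.10", "192.168.1.11", "192.168.1.12"],  # connect to a cluster: list each node's IP address
                   sniff_on_start=True,  # sniff the cluster for its node list at startup
                   sniff_on_connection_fail=True,  # re-sniff (refresh the node list) when a node stops responding
                   sniff_timeout=60)  # timeout for the sniff request, in seconds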
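If you reuse bare host strings like the ones above with the 8.x client, they need to become full URLs. Here is a minimal sketch under that assumption. normalize_hosts is a hypothetical helper, not part of elasticsearch-py. It adds the http:// scheme and the default port 9200 when they are missing, and it handles IPv4 addresses and hostnames only, not IPv6 literals.

```python
from typing import List

DEFAULT_PORT = 9200  # Elasticsearch's default HTTP port


def normalize_hosts(hosts: List[str], scheme: str = "http") -> List[str]:
    """Hypothetical helper: turn 'ip' or 'ip:port' strings into full URLs.

    Only IPv4 addresses and hostnames are handled; IPv6 literals are not.
    """
    urls = []
    for host in hosts:
        if "://" in host:
            urls.append(host)  # already a full URL, keep it unchanged
            continue
        if ":" not in host:
            host = f"{host}:{DEFAULT_PORT}"  # no port given, use the default
        urls.append(f"{scheme}://{host}")
    return urls


print(normalize_hosts(["192.168.1.10", "192.168.1.11", "192.168.1.12"]))
# ['http://192.168.1.10:9200', 'http://192.168.1.11:9200', 'http://192.168.1.12:9200']
```

You can pass the resulting list straight to the Elasticsearch constructor as its hosts argument.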

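Ignoring status codes

In the 7.x client, you pass the status codes to ignore to each request through the ignore parameter, not to the constructor:

es = Elasticsearch(['127.0.0.1:9200'])
es.indices.create(index='test-index', ignore=400)  # ignore a 400 response (e.g. the index already exists); 'test-index' is an example name
es.indices.delete(index='test-index', ignore=[400, 405, 502])  # ignore several status codes, given as a list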
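Ignoring a status code does not hide the response. It only stops the client from raising an exception, and the error body is returned instead. The sketch below is a simplified model of that decision and roughly mirrors the check the 7.x connection class makes. should_raise is a hypothetical name, not the library's actual code.

```python
from typing import Iterable, Optional, Union


def should_raise(status: int, ignore: Optional[Union[int, Iterable[int]]] = None) -> bool:
    """Hypothetical model: decide whether a response status should raise an error."""
    if ignore is None:
        ignored = set()
    elif isinstance(ignore, int):
        ignored = {ignore}  # a single code is treated like a one-element list
    else:
        ignored = set(ignore)
    # 2xx responses never raise; any other status raises unless it is ignored
    return not (200 <= status < 300) and status not in ignored


print(should_raise(400, ignore=400))  # False
print(should_raise(404, ignore=[400, 405, 502]))  # True
```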

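2. Batch Reading Data from Elasticsearch with Python

A single search returns one page of hits, and from + size cannot exceed 10,000 by default (index.max_result_window). To read every matching document, use the scroll API instead. The first search opens a scroll context, and each es.scroll call returns the next page until a page comes back empty.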

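from elasticsearch import Elasticsearch

es = Elasticsearch()

query_json = {
    "query": {
        "match_all": {}  # match all documents
    }
}
page_num = 100  # number of documents fetched per request

# The first search opens a scroll context that is kept alive for 5 minutes.
# "my_index" is a placeholder; use the name of your own index.
query = es.search(index="my_index", body=query_json, scroll='5m', size=page_num)

results = query['hits']['hits']  # first page of hits
total = query['hits']['total']  # total hits reported by the first response
if isinstance(total, dict):  # Elasticsearch 7+ returns {"value": ..., "relation": ...}
    total = total['value']
scroll_id = query['_scroll_id']  # cursor used to page through the remaining hits

while True:
    # pass scroll on every call to extend the context's keep-alive
    page = es.scroll(scroll_id=scroll_id, scroll='5m')
    hits = page['hits']['hits']
    if not hits:  # an empty page means every hit has been read
        break
    results += hits
    scroll_id = page['_scroll_id']  # the scroll ID can change between calls

es.clear_scroll(scroll_id=scroll_id)  # free the scroll context on the server

alist = []
for key in results:
    es_data_dict = key["_source"]["word"]  # "word" is a field in the author's documents
    # print(es_data_dict)
    alist.append(es_data_dict)
print(len(alist), "of", total)  # documents read vs. reported total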
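You can wrap the same scroll loop in a reusable generator. This is a minimal sketch. It assumes a 7.x-style client that exposes search, scroll and clear_scroll with the keyword arguments used above. scroll_all and FakeClient are hypothetical names. FakeClient is an in-memory stand-in so the sketch runs without a cluster.

```python
from typing import Any, Dict, Iterator, List


def scroll_all(client: Any, index: str, query: Dict[str, Any],
               page_size: int = 100, keep_alive: str = "5m") -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query`, fetching one scroll page at a time."""
    resp = client.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = resp.get("_scroll_id")
    try:
        hits = resp["hits"]["hits"]
        while hits:  # an empty page means the scroll is exhausted
            yield from hits
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)  # the ID may change
            hits = resp["hits"]["hits"]
    finally:
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)  # release server-side context


class FakeClient:
    """Hypothetical in-memory stand-in for Elasticsearch, for demonstration only."""

    def __init__(self, docs: List[Dict[str, Any]]):
        self.docs = docs
        self.pos = 0
        self.size = 0
        self.cleared = False

    def _page(self) -> Dict[str, Any]:
        page = self.docs[self.pos:self.pos + self.size]
        self.pos += self.size
        return {"_scroll_id": "fake-scroll-id",
                "hits": {"hits": [{"_source": d} for d in page]}}

    def search(self, index, body, scroll, size):
        self.pos, self.size = 0, size  # start a new "scroll" from the beginning
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


client = FakeClient([{"word": f"w{i}"} for i in range(250)])
words = [hit["_source"]["word"]
         for hit in scroll_all(client, "my_index", {"query": {"match_all": {}}})]
print(len(words))  # 250
```

The cleanup sits in a finally block, so the scroll context is also cleared if the caller stops iterating early and the generator is closed. In practice, elasticsearch-py already provides elasticsearch.helpers.scan, which wraps the same search/scroll/clear_scroll loop.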

Reference: https://blog.csdn.net/fwj_ntu/article/details/87863788?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
