Preface
We use the search API all the time. By default it returns 10 hits, and the from and size parameters let us change the number of hits returned and paginate. Sometimes, however, a large result set is needed, and then scan and scroll are required. Used together, they retrieve huge numbers of results from Elasticsearch efficiently, without paying the cost of deep pagination.
For details see: https://es.xiaoleilu.com/060_Distributed_Search/20_Scan_and_scroll.html
Unlike the link above, this post describes the Python implementation.
Data
The index hz contains 29,999 documents in total. The bulk-import code is available at:
http://blog.csdn.net/xsdxs/article/details/72849796
Code Examples
ES client code:
# -*- coding: utf-8 -*-
import elasticsearch

ES_SERVERS = [{'host': 'localhost', 'port': 9200}]

es_client = elasticsearch.Elasticsearch(hosts=ES_SERVERS)
Search code using the search API:
# -*- coding: utf-8 -*-
from es_client import es_client


def search(search_offset, search_size):
    es_search_options = set_search_optional()
    es_result = get_search_result(es_search_options, search_offset, search_size)
    final_result = get_result_list(es_result)
    return final_result


def get_result_list(es_result):
    final_result = []
    result_items = es_result['hits']['hits']
    for item in result_items:
        final_result.append(item['_source'])
    return final_result


def get_search_result(es_search_options, search_offset, search_size,
                      index='hz', doc_type='xyd'):
    es_result = es_client.search(
        index=index,
        doc_type=doc_type,
        body=es_search_options,
        from_=search_offset,
        size=search_size
    )
    return es_result


def set_search_optional():
    # search options
    es_search_options = {
        "query": {
            "match_all": {}
        }
    }
    return es_search_options


if __name__ == '__main__':
    final_results = search(0, 1000)
    print(len(final_results))
Everything seems fine, and 1000 is printed as expected. But now change the requirement: retrieve 20,000 of the documents.
if __name__ == '__main__':
    final_results = search(0, 20000)
This raises the following error:
elasticsearch.exceptions.TransportError: TransportError(500, u'search_phase_execution_exception', u'Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter.')
Explanation: by default the search API can return at most 10,000 hits (from + size must not exceed index.max_result_window), which is why this request fails.
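As the error message itself notes, this limit can be raised by changing the index.max_result_window index-level setting. A sketch of that workaround (the value 50000 is only an illustration; raising the window increases heap usage per request, which is exactly why the message recommends the scroll API for large result sets):

```json
PUT hz/_settings
{
  "index": {
    "max_result_window": 50000
  }
}
```

For a one-off export this can be acceptable, but for routinely pulling tens of thousands of documents, scan and scroll below is the right tool.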
Without further ado, here is the implementation based on scan and scroll:
# -*- coding: utf-8 -*-
from es_client import es_client
from elasticsearch import helpers


def search():
    es_search_options = set_search_optional()
    es_result = get_search_result(es_search_options)
    final_result = get_result_list(es_result)
    return final_result


def get_result_list(es_result):
    final_result = []
    for item in es_result:
        final_result.append(item['_source'])
    return final_result


def get_search_result(es_search_options, scroll='5m', index='hz',
                      doc_type='xyd', timeout='1m'):
    es_result = helpers.scan(
        client=es_client,
        query=es_search_options,
        scroll=scroll,
        index=index,
        doc_type=doc_type,
        timeout=timeout
    )
    return es_result


def set_search_optional():
    # search options
    es_search_options = {
        "query": {
            "match_all": {}
        }
    }
    return es_search_options


if __name__ == '__main__':
    final_results = search()
    print(len(final_results))
The output shows that all 29,999 documents were retrieved.
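One caveat: helpers.scan returns a lazy generator, so collecting every hit into a list, as get_result_list does, gives up its main advantage. For truly large result sets it is better to consume hits one at a time. A minimal sketch of that streaming pattern, using a stand-in generator instead of helpers.scan so it runs without a live cluster (fake_scan and stream_sources are illustrative names, not part of the elasticsearch library):

```python
def fake_scan(n):
    """Stand-in for helpers.scan: lazily yields hit dicts shaped like ES results."""
    for i in range(n):
        yield {'_source': {'id': i}}


def stream_sources(hits):
    """Yield only the _source payload of each hit, one at a time."""
    for hit in hits:
        yield hit['_source']


# Process hits without ever materializing the full result set in memory.
count = sum(1 for _ in stream_sources(fake_scan(29999)))
print(count)  # 29999
```

With a real cluster, replacing fake_scan(29999) with the helpers.scan call from get_search_result keeps memory usage flat regardless of how many documents the index holds.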