How to Retrieve All Data from Elasticsearch in One Query: Using Scroll


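By default an Elasticsearch query returns only 10 hits. Setting from and size gives you pagination (see Appendix 3), but only while from + size <= 10,000: index.max_result_window defaults to 10,000, and from + size must not exceed it. To retrieve every result you therefore have to use Scroll, which fetches the results one batch at a time over multiple requests.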
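
To make the limit concrete, here is a minimal sketch of the arithmetic in plain Java. The class and method names (ResultWindow, fitsWindow, lastReachablePage) are hypothetical helpers made up for illustration and are not part of the Elasticsearch API; only the default of 10,000 for index.max_result_window comes from the text above.

```java
// Hypothetical helper for illustration only; not part of any Elasticsearch client.
class ResultWindow {
    // Default value of index.max_result_window, as stated in the text.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // A from/size request is accepted only while from + size stays within the window.
    public static boolean fitsWindow(int from, int size, int maxResultWindow) {
        return (long) from + size <= maxResultWindow;
    }

    // Zero-based index of the last full page reachable with a fixed, positive page size
    // (-1 if not even the first page fits).
    public static int lastReachablePage(int pageSize, int maxResultWindow) {
        return maxResultWindow / pageSize - 1;
    }

    public static void main(String[] args) {
        int pageSize = 1000;
        System.out.println(fitsWindow(9_000, pageSize, DEFAULT_MAX_RESULT_WINDOW));  // true
        System.out.println(fitsWindow(10_000, pageSize, DEFAULT_MAX_RESULT_WINDOW)); // false
        System.out.println(lastReachablePage(pageSize, DEFAULT_MAX_RESULT_WINDOW));  // 9
    }
}
```

With a page size of 1000, only pages 0 through 9 can be read with from + size under the default setting; anything beyond that needs Scroll.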

Please credit the source when reposting: https://www.cnblogs.com/NaughtyCat/p/how-to-search-all-results-once-in-es.html

  • Scroll works like a cursor in a traditional database. A code snippet:
		// Open the scroll: each request returns at most SEARCH_HITS_SIZE hits,
		// and the search context is kept alive for 60,000 ms between requests.
		SearchResponse scrollResp = client.prepareSearch(availableIndices)
				.setTypes(type)
				.setScroll(new TimeValue(60000))
				.setQuery(boolQueryBuilder)
				.setSize(SEARCH_HITS_SIZE).get();
		// Scroll until no hits are returned
		do {
			for (SearchHit hit : scrollResp.getHits().getHits()) {
				tmpJsonList.add((JSONObject) JSONValue.parse(hit.getSourceAsString()));
			}
			jsonList.addAll(tmpJsonList);
			tmpJsonList.clear();
			// Fetch the next batch with the latest scroll ID, renewing the keep-alive
			scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
		} while (scrollResp.getHits().getHits().length != 0);

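The time passed to setScroll() is how long Elasticsearch keeps the search context alive between scroll requests, so it must be long enough to process one batch of setSize() hits. In the code above the timeout is 1 minute (search for "Scroll context" for details). With a large data set the timeout has to be longer: for my roughly 2 billion documents it had to be at least 3 minutes, otherwise an exception is thrown: ElasticSearch: SearchContextMissingException[No search context found for id ...].

Every response carries a scroll ID, available through scrollResp.getScrollId(); pass the most recently returned ID to the next request.

[Figure: screenshot of a scroll ID]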
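
The relationship between the keep-alive and the time spent on each batch can be illustrated with a toy model. The sketch below is plain Java and is not Elasticsearch code: ToyScroll, drain, and the simulated clock are all hypothetical names invented for this example. It only assumes what is described above, namely that each scroll request renews the keep-alive and that a request arriving after the keep-alive has elapsed fails.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a scroll cursor; NOT Elasticsearch code. All names are hypothetical.
class ScrollKeepAliveDemo {

    static class ToyScroll {
        private final List<Integer> data;
        private final int batchSize;
        private int position = 0;
        private long expiresAtMillis;

        ToyScroll(List<Integer> data, int batchSize, long nowMillis, long keepAliveMillis) {
            this.data = data;
            this.batchSize = batchSize;
            this.expiresAtMillis = nowMillis + keepAliveMillis;
        }

        // Returns the next batch and renews the keep-alive, or fails if the context expired.
        List<Integer> next(long nowMillis, long keepAliveMillis) {
            if (nowMillis > expiresAtMillis) {
                throw new IllegalStateException("No search context found");
            }
            int end = Math.min(position + batchSize, data.size());
            List<Integer> batch = new ArrayList<>(data.subList(position, end));
            position = end;
            expiresAtMillis = nowMillis + keepAliveMillis;
            return batch;
        }
    }

    // Drains the cursor, pretending each batch takes processingMillis to handle.
    // Returns the number of items read; throws if a batch takes longer than the keep-alive.
    public static int drain(int totalItems, int batchSize, long keepAliveMillis, long processingMillis) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < totalItems; i++) {
            data.add(i);
        }
        long clock = 0; // simulated clock, in milliseconds
        ToyScroll scroll = new ToyScroll(data, batchSize, clock, keepAliveMillis);
        int read = 0;
        List<Integer> batch = scroll.next(clock, keepAliveMillis);
        while (!batch.isEmpty()) {
            read += batch.size();
            clock += processingMillis; // time spent processing this batch
            batch = scroll.next(clock, keepAliveMillis);
        }
        return read;
    }

    public static void main(String[] args) {
        // 30 s per batch with a 60 s keep-alive: every item is read.
        System.out.println(drain(2500, 1000, 60_000, 30_000)); // 2500
        // 90 s per batch with a 60 s keep-alive: the context expires before the second request.
        try {
            drain(2500, 1000, 60_000, 90_000);
        } catch (IllegalStateException e) {
            System.out.println("expired: " + e.getMessage());
        }
    }
}
```

In this model the keep-alive only has to cover one batch, not the whole export, which is why raising the timeout (or lowering the batch size) is enough to avoid the exception.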

 

  • A code snippet that reads in a loop with from + size (author: CoderBaby); it only works while from + size stays within index.max_result_window:
            int index = 0;
            do {
                tmpJsonList.clear();
                // Advance to the next page: from = index * SEARCH_HITS_SIZE
                srb.setFrom(Math.multiplyExact(index, SEARCH_HITS_SIZE));
                index++;
                MultiSearchResponse.Item[] items = sr.get().getResponses();
                for (MultiSearchResponse.Item item : items) {
                    SearchResponse response = item.getResponse();
                    SearchHit[] hits = response.getHits().getHits();
                    for (SearchHit hit : hits) {
                        tmpJsonList.add((JSONObject) JSONValue.parse(hit.getSourceAsString()));
                    }
                }
                jsonList.addAll(tmpJsonList);
                // An empty page means every result has been read
            } while (tmpJsonList.size() > 0);

Here SEARCH_HITS_SIZE = 1000, and srb is a query that combines multiple conditions. The setup code that precedes the loop is:

            // Require every collected condition to match (logical AND)
            queryBuilders.forEach(query -> {
                boolQueryBuilder.must(query);
            });

            MultiSearchRequestBuilder sr = client.prepareMultiSearch();
            SearchRequestBuilder srb = client.prepareSearch()
                    .setTypes(type)
                    .setIndices(availableIndices)
                    .setQuery(boolQueryBuilder)
                    .setSize(SEARCH_HITS_SIZE);
            sr.add(srb);

The query conditions are built as follows (use QueryBuilders to pick term, range, match, and so on as needed):

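        // Exact-match (term) condition on l7p, when one was supplied
        if (!StringUtil.isEmpty(l7p)) {
            queryBuilders.add(QueryBuilders.termQuery(Event.FIELD_L7P, l7p));
        }

        // A start time but no end time was supplied: open-ended range from startTime
        if (!StringUtil.isEmpty(startTime) && StringUtil.isEmpty(endTime)) {
            queryBuilders.add(QueryBuilders.rangeQuery(Event.FIELD_START_TIME).from(startTime));
        }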

 

Appendix:

1) Using scroll in Java: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

2) Scroll: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/search-request-scroll.html

3) From and size: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-from-size

 


