Elasticsearch Aggregations API


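Introduction: the aggregations framework computes aggregated data over the documents matched by a search query. The syntax is defined as follows:

"aggregations" : {                                      // may be abbreviated to "aggs"
    "<aggregation_name>" : {                            // aggregation name, a unique identifier
        "<aggregation_type>" : {                        // aggregation type
            <aggregation_body>                          // aggregation body: which fields to aggregate and how
        }
        [,"meta" : {  [<meta_data_body>] } ]?           // optional metadata
        [,"aggregations" : { [<sub_aggregation>]+ } ]?  // sub-aggregations nested inside this aggregation
    }
    [,"<aggregation_name_2>" : { ... } ]*               // further aggregations
}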
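To make the nesting in the syntax above concrete, here is a minimal plain-Java sketch that assembles a request body from nested maps and renders it as JSON. It does not use the Elasticsearch client; the aggregation names `by_ip` and `latest` and the helper methods are hypothetical, and the renderer skips string escaping, so treat it as an illustration of the shape only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Renders nested maps as JSON to show the shape of an "aggs" request body. */
final class AggsBodySketch {

    /** Minimal JSON rendering for maps, numbers and strings (no escaping; illustration only). */
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(",");
                }
                sb.append("\"").append(e.getKey()).append("\":").append(toJson(e.getValue()));
                first = false;
            }
            return sb.append("}").toString();
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value + "\"";
    }

    /** Builds { name: { type: body, "aggs": subAggs } }; subAggs may be null. */
    static Map<String, Object> agg(String name, String type,
                                   Map<String, Object> body, Map<String, Object> subAggs) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put(type, body);
        if (subAggs != null) {
            inner.put("aggs", subAggs);
        }
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put(name, inner);
        return outer;
    }

    /** Builds the common { "field": fieldName } aggregation body. */
    static Map<String, Object> field(String fieldName) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("field", fieldName);
        return m;
    }

    /** A terms aggregation on "data.ip" with a max sub-aggregation on "timestamp". */
    static String example() {
        Map<String, Object> sub = agg("latest", "max", field("timestamp"), null);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("aggs", agg("by_ip", "terms", field("data.ip"), sub));
        return toJson(root);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The sub-aggregation sits next to the aggregation type, inside the named aggregation, which is the part of the syntax that is easiest to get wrong by hand.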

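I. Metric Aggregations: compute statistics over the documents in a bucket

  1. Top Hits: returns the top matching documents, comparable to LIMIT in MySQL

    A. URL: POST /index/_search?size=0

    B. Request parameters

      from: offset of the first hit to return;

      size: maximum number of hits to return, default 3;

      sort: how the hits are sorted; by default they are sorted by score.

    C. Kibana query

    D. Java implementation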

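TopHitsAggregationBuilder aggregationBuilder = AggregationBuilders.topHits("top_hits").sort("time", SortOrder.DESC).size(1);

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  TopHits topHits = aggregations.get("top_hits");
}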
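The effect of `from`, `size` and `sort` can be pictured with a small plain-Java sketch. It is not the Elasticsearch client; the `Doc` class and its `time` field are hypothetical stand-ins for indexed documents, and the sort is fixed to newest first to match `.sort("time", SortOrder.DESC)` in the snippet above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Plain-Java illustration of top_hits semantics: sort, skip `from`, keep `size`. */
final class TopHitsSketch {

    /** Hypothetical stand-in for an indexed document with a sortable "time" field. */
    static final class Doc {
        final String id;
        final long time;

        Doc(String id, long time) {
            this.id = id;
            this.time = time;
        }
    }

    /** Returns at most `size` documents, newest first, starting at offset `from`. */
    static List<Doc> topHits(List<Doc> docs, int from, int size) {
        return docs.stream()
                .sorted(Comparator.comparingLong((Doc d) -> d.time).reversed())
                .skip(from)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc("a", 10), new Doc("b", 30), new Doc("c", 20));
        // from = 0, size = 1 keeps only the newest document
        System.out.println(topHits(docs, 0, 1).get(0).id);
    }
}
```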

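  2. Cardinality: approximate count of the distinct values of a field, comparable to COUNT(DISTINCT field) in MySQL

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field whose distinct values are counted;

      script: a script that produces the values.

    C. Kibana query

    D. Java implementation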

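CardinalityAggregationBuilder aggregationBuilder = AggregationBuilders.cardinality("cardinality").field("cid");

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  Cardinality cardinality = aggregations.get("cardinality");
  long count = cardinality.getValue();
}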

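  3. Max: returns the maximum value of a field

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field to take the maximum of;

      script: a script that produces the values.

    C. Kibana query

    D. Java implementation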

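MaxAggregationBuilder aggregationBuilder = AggregationBuilders.max("max").field("timestamp");

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  ParsedMax max = aggregations.get("max");
  String timestamp = max.getValueAsString();
}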

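  4. Min: returns the minimum value of a field

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field to take the minimum of;

      script: a script that produces the values.

    C. Kibana query

    D. Java implementation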

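MinAggregationBuilder aggregationBuilder = AggregationBuilders.min("min").field("timestamp");

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  ParsedMin min = aggregations.get("min");
  String timestamp = min.getValueAsString();
}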

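  5. Sum: returns the sum of the values of a field

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field to sum;

      script: a script that produces the values.

    C. Kibana query

    D. Java implementation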

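SumAggregationBuilder aggregationBuilder = AggregationBuilders.sum("sum").field("low");

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  // Look the result up by the aggregation name ("sum"), not by the field name ("low")
  Sum sum = aggregations.get("sum");
  double low = sum.getValue();
}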

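  6. Avg: returns the average of the values of a field

    A. URL:

    B. Request parameters

      script: a script that produces the values

    C. Kibana query

    D. Java implementation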

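  7. Stats: summary statistics in a single aggregation: count, max, min, sum and avg

    A. URL:

    B. Request parameters

      script: a script that produces the values

    C. Kibana query

    D. Java implementation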
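As an illustration of what a stats result contains, the following plain-Java sketch computes the same five numbers with the JDK's `DoubleSummaryStatistics`. It is not the Elasticsearch client call; the class name `StatsSketch` and the sample values are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

/** Plain-Java illustration of the five numbers a stats aggregation reports. */
final class StatsSketch {

    /** Computes count, min, max, sum and average over the given field values. */
    static DoubleSummaryStatistics stats(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(2, 4, 9);
        System.out.println("count=" + s.getCount());   // count=3
        System.out.println("min=" + s.getMin());       // min=2.0
        System.out.println("max=" + s.getMax());       // max=9.0
        System.out.println("sum=" + s.getSum());       // sum=15.0
        System.out.println("avg=" + s.getAverage());   // avg=5.0
    }
}
```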

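  8. Value Count: counts the values of a field; duplicate values are counted each time

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field whose values are counted;

      script: a script that produces the values.

    C. Kibana query

    D. Java implementation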

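ValueCountAggregationBuilder aggregationBuilder = AggregationBuilders.count("count").field("cid");

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  ValueCount valueCount = aggregations.get("count");
  long count = valueCount.getValue();
}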
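The difference between Value Count and Cardinality is easy to see on a small list. The sketch below is plain Java with hypothetical sample values for the `cid` field; note that it counts distinct values exactly, whereas the Elasticsearch cardinality aggregation returns an approximation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Plain-Java contrast between value_count (all values) and cardinality (distinct values). */
final class CountSketch {

    /** Counts every value, duplicates included, like value_count. */
    static long valueCount(List<String> values) {
        return values.size();
    }

    /** Counts distinct values exactly; Elasticsearch's cardinality is approximate. */
    static long distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        List<String> cids = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(valueCount(cids));    // 5
        System.out.println(distinctCount(cids)); // 3
    }
}
```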

 

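II. Bucket Aggregations: group documents into buckets, each holding the documents that meet a given criterion

  1. Terms: groups documents by the values of a field and counts each group, comparable to GROUP BY in MySQL; the counts it returns can be approximate

    A. URL: GET /index/_search

    B. Request parameters

      field: name of the field to group by; only a single field is supported;

      size: number of buckets to return, default 10; a larger size gives more accurate results at a higher cost;

      order: how the returned buckets are sorted;

      script: a script, needed only to group by two fields at once; it performs poorly and is best avoided.

    C. Kibana query

    D. Java implementation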

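// Alternative: group by two fields through a script (slow, best avoided)
// Script script = new Script("doc['data.srcip'].value + '_' + doc['data.dstip'].value");
// TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").script(script).size(Integer.MAX_VALUE);

TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").field("data.ip").size(Integer.MAX_VALUE);

// Assumed wiring: attach the aggregation to the request
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  Terms terms = aggregations.get("terms");
}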
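Conceptually, a terms aggregation is a group-by-and-count followed by a cut at `size`. The plain-Java sketch below shows that idea on an in-memory list; it is not the Elasticsearch client, the sample IP values are hypothetical, and ties are broken by key as an assumption of this sketch. On a single list the counts are exact; the approximation mentioned above arises in Elasticsearch because each shard reports only its own top terms.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Plain-Java illustration of terms semantics: group, count, order by count, keep `size`. */
final class TermsSketch {

    /** Returns the `size` most frequent values with their counts, most frequent first. */
    static LinkedHashMap<String, Long> terms(List<String> values, int size) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> ips = Arrays.asList(
                "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2", "10.0.0.1");
        System.out.println(terms(ips, 2));
    }
}
```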

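  2. Filter: narrows the documents matched by the query to those that also match a filter

    A. URL: POST /index/_search?size=0

    B. Request parameters: any query DSL clause

    C. Kibana query

    D. Java implementation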

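FilterAggregationBuilder aggregationBuilder = AggregationBuilders.filter("filter", QueryBuilders.termsQuery("rule", new String[]{"login", "auth", "cca"}));

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  Filter filter = aggregations.get("filter");
}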

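  3. Range: counts documents per value range; each range includes its from value and excludes its to value

    A. URL: GET /index/_search

    B. Request parameters

      field: name of the field the ranges apply to;

      to value1: the range from * to value1, excluding value1;

      from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;

      from value2: the range from value2 to *, including value2.

    C. Kibana query

    D. Java implementation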

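// Three buckets keyed "1", "2", "3": * to 6, 6 to 11, 11 to *
RangeAggregationBuilder aggregationBuilder = AggregationBuilders.range("range").field("level").addUnboundedTo("1", 6).addRange("2", 6, 11).addUnboundedFrom("3", 11);

// Assumed wiring: attach the aggregation to the request
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  Range range = aggregations.get("range");
}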
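The inclusive-from, exclusive-to rule decides where boundary values land. A minimal plain-Java sketch, using the same three ranges as the builder above (the class and method names are hypothetical, and this is not the Elasticsearch client):

```java
/** Plain-Java illustration of range bucket boundaries: from is inclusive, to is exclusive. */
final class RangeSketch {

    /** True when value falls in [from, to); a null bound means unbounded on that side. */
    static boolean inRange(double value, Double from, Double to) {
        return (from == null || value >= from) && (to == null || value < to);
    }

    /** Maps a level to bucket "1" (* to 6), "2" (6 to 11) or "3" (11 to *). */
    static String bucketKey(double level) {
        if (inRange(level, null, 6.0)) {
            return "1";
        }
        if (inRange(level, 6.0, 11.0)) {
            return "2";
        }
        return "3";
    }

    public static void main(String[] args) {
        System.out.println(bucketKey(5));   // 1
        System.out.println(bucketKey(6));   // 2: 6 is excluded from "* to 6"
        System.out.println(bucketKey(11));  // 3: 11 is excluded from "6 to 11"
    }
}
```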

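  4. Date histogram: builds a histogram of document counts per time interval; works on date and date range fields

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the date field;

      format: date format of the bucket keys;

      calendar_interval: calendar-aware interval such as 1d or 1M; only single units are allowed, so a multiple such as 2d must use fixed_interval instead;

      fixed_interval: fixed-length interval, for example 1000ms;

      min_doc_count: minimum document count; buckets with fewer documents are left out of the response.

    C. Kibana query

    D. Java implementation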

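DateHistogramAggregationBuilder aggregationBuilder = AggregationBuilders.dateHistogram("date_histogram")
  .field("timestamp")
  .format("yyyy-MM-dd")
  .calendarInterval(new DateHistogramInterval("1d"))
  .minDocCount(1);

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  ParsedDateHistogram histogram = aggregations.get("date_histogram");
}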
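To show how a one-day interval and `min_doc_count` interact, here is a plain-Java sketch that buckets timestamps per UTC day. It is an illustration under assumptions, not the Elasticsearch implementation: the class name and sample timestamps are hypothetical, and the gap filling for a minimum of 0 only covers days between the first and last non-empty bucket.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Plain-Java illustration of a one-day date histogram with min_doc_count. */
final class DateHistogramSketch {

    /** Counts timestamps per UTC day and applies the minimum document count. */
    static SortedMap<LocalDate, Long> perDay(List<Instant> timestamps, long minDocCount) {
        SortedMap<LocalDate, Long> buckets = new TreeMap<>();
        for (Instant ts : timestamps) {
            LocalDate day = ts.atZone(ZoneOffset.UTC).toLocalDate();
            buckets.merge(day, 1L, Long::sum);
        }
        if (minDocCount <= 0) {
            // A minimum of 0 keeps empty days between the first and last bucket
            if (!buckets.isEmpty()) {
                LocalDate last = buckets.lastKey();
                for (LocalDate d = buckets.firstKey(); d.isBefore(last); d = d.plusDays(1)) {
                    buckets.putIfAbsent(d, 0L);
                }
            }
        } else {
            // Drop buckets that hold fewer documents than the minimum
            buckets.values().removeIf(count -> count < minDocCount);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Instant> ts = Arrays.asList(
                Instant.parse("2020-02-03T01:00:00Z"),
                Instant.parse("2020-02-03T23:00:00Z"),
                Instant.parse("2020-02-05T12:00:00Z"));
        System.out.println(perDay(ts, 1)); // two buckets: 2020-02-03 and 2020-02-05
        System.out.println(perDay(ts, 0)); // three buckets, 2020-02-04 with count 0
    }
}
```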

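  5. Date range: counts documents per range of date values

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the date field the ranges apply to;

      format: date format of the range bounds;

      to value1: the range from * to value1, excluding value1;

      from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;

      from value2: the range from value2 to *, including value2.

    C. Kibana query

    D. Java implementation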

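// Open-ended ranges use addUnboundedTo/addUnboundedFrom; the two-argument addRange takes (from, to), not (key, bound)
DateRangeAggregationBuilder dateRangeAggregationBuilder = AggregationBuilders.dateRange("day_range")
  .field("day")
  .format("yyyy-MM-dd")
  .addUnboundedTo("1", "2020-02-03")
  .addRange("2", "2020-02-03", "2020-03-10")
  .addUnboundedFrom("3", "2020-03-10");

// Assumed wiring: attach the aggregation to the request; size(0) mirrors ?size=0 in the URL
SearchRequest searchRequest = new SearchRequest("index")
  .source(new SearchSourceBuilder().size(0).aggregation(dateRangeAggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing
if (aggregations != null) {
  ParsedDateRange dateRange = aggregations.get("day_range");
}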

 

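III. Pipeline Aggregations: operate on the output of other aggregations rather than on document sets, comparable to paginating after a GROUP BY in a database

  1. Bucket Sort: sorts the buckets of its parent multi-bucket aggregation

    A. URL: POST /sales/_search?size=0

    B. Request parameters

      from: buckets at positions before this value are truncated, default 0; for pagination it should be a multiple of size;

      size: number of buckets to return; defaults to all buckets of the parent aggregation;

      sort: the sort definition; several fields may be given.

    C. Kibana query

    D. Java implementation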
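The pagination use of `from` and `size` can be sketched in plain Java as follows. This is not the Elasticsearch client; the `Bucket` class and the `fromForPage` helper are hypothetical, and sorting by document count in descending order is an assumption of the example rather than a default of Bucket Sort.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Plain-Java illustration of bucket_sort used for pagination over buckets. */
final class BucketSortSketch {

    /** Hypothetical stand-in for one bucket of the parent aggregation. */
    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    /** Offset of a zero-based page; always a multiple of size. */
    static int fromForPage(int page, int size) {
        return page * size;
    }

    /** Sorts buckets by document count (descending), drops the first `from`, keeps `size`. */
    static List<Bucket> bucketSort(List<Bucket> buckets, int from, int size) {
        return buckets.stream()
                .sorted(Comparator.comparingLong((Bucket b) -> b.docCount).reversed())
                .skip(from)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
                new Bucket("a", 5), new Bucket("b", 9), new Bucket("c", 1),
                new Bucket("d", 7), new Bucket("e", 3));
        // Second page (index 1) with two buckets per page
        for (Bucket b : bucketSort(buckets, fromForPage(1, 2), 2)) {
            System.out.println(b.key + " " + b.docCount);
        }
    }
}
```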

 

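See also: the official Elasticsearch documentation on Aggregations

    and the official Elasticsearch documentation on the aggregations Java API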

