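Introduction: the aggregations framework returns aggregated data computed over the documents matched by a search query. Its syntax is defined as follows:
"aggregations" : {                                        // may be abbreviated to "aggs"
    "<aggregation_name>" : {                              // aggregation name, a unique identifier
        "<aggregation_type>" : {                          // aggregation type
            <aggregation_body>                            // aggregation body: which fields to aggregate on
        }
        [,"meta" : { [<meta_data_body>] } ]?              // metadata
        [,"aggregations" : { [<sub_aggregation>]+ } ]?    // sub-aggregations nested inside this aggregation
    }
    [,"<aggregation_name_2>" : { ... } ]*                 // further named aggregations
}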
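I. Metric Aggregations: compute statistics over the documents in a bucket
1. Top Hits: returns the top matching documents, comparable to LIMIT in MySQL
A. URL: POST /index/_search?size=0
B. Request parameters
from: offset of the first hit to return;
size: maximum number of matching hits to return, default 3;
sort: how the matching hits are sorted; by default they are sorted by score.
C. Kibana query
D. Java implementation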
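TopHitsAggregationBuilder aggregationBuilder = AggregationBuilders.topHits("top_hits")
        .sort("time", SortOrder.DESC)
        .size(1);
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    TopHits topHits = aggregations.get("top_hits");
}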
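To make the interplay of from, size and sort concrete, here is a minimal sketch in plain JDK Java. It is not Elasticsearch client code: the class TopHitsSketch and its method are hypothetical names, and the sketch assumes the hits are represented only by their time value and sorted in descending order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-JDK model of top_hits paging; hypothetical helper, not the Elasticsearch API.
final class TopHitsSketch {

    // Sorts the values in descending order, skips `from` entries and keeps at most `size`.
    static List<Long> topHits(List<Long> times, int from, int size) {
        return times.stream()
                .sorted(Comparator.reverseOrder())
                .skip(from)
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> times = List.of(30L, 10L, 50L, 20L, 40L);
        System.out.println(topHits(times, 0, 1)); // [50]
        System.out.println(topHits(times, 1, 3)); // [40, 30, 20]
    }
}
```

With from = 0 and size = 1 only the newest entry survives, which mirrors the sort("time", SortOrder.DESC).size(1) call in the snippet above.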
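2. Cardinality: counts the distinct values of a field, comparable to COUNT(DISTINCT field) in MySQL; the count is approximate
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to deduplicate on;
script: a script.
C. Kibana query
D. Java implementation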
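CardinalityAggregationBuilder aggregationBuilder = AggregationBuilders.cardinality("cardinality").field("cid");
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    Cardinality cardinality = aggregations.get("cardinality");
    long count = cardinality.getValue();
}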
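3. Max: returns the maximum value of the specified field
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field whose maximum is computed;
script: a script.
C. Kibana query
D. Java implementation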
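MaxAggregationBuilder aggregationBuilder = AggregationBuilders.max("max").field("timestamp");
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    ParsedMax max = aggregations.get("max");
    String timestamp = max.getValueAsString();
}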
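4. Min: returns the minimum value of the specified field
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field whose minimum is computed;
script: a script.
C. Kibana query
D. Java implementation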
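MinAggregationBuilder aggregationBuilder = AggregationBuilders.min("min").field("timestamp");
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    ParsedMin min = aggregations.get("min");
    String timestamp = min.getValueAsString();
}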
5. Sum: sums the values of the specified field
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to sum;
script: a script.
C. Kibana query
D. Java implementation
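SumAggregationBuilder aggregationBuilder = AggregationBuilders.sum("sum").field("low");
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    // Look the result up by the aggregation name ("sum"), not by the field name ("low").
    Sum sum = aggregations.get("sum");
    double low = sum.getValue();
}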
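6. Avg: computes the mean of the specified field
A. URL:
B. Request parameters
field: name of the field to average;
script: a script.
C. Kibana query
D. Java implementation
7. Stats: summary statistics, covering Max, Min, Sum and Avg
A. URL:
B. Request parameters
field: name of the field to summarize;
script: a script.
C. Kibana query
D. Java implementation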
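As an illustration of what a Stats result bundles together, the following plain JDK sketch computes the same four figures (plus a count) over an in-memory list. It is a model of the semantics only, not Elasticsearch client code; StatsSketch and the sample values are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Plain-JDK model of the figures a stats aggregation reports; hypothetical helper.
final class StatsSketch {

    // Computes count, min, max, sum and average in a single pass over the values.
    static DoubleSummaryStatistics stats(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(List.of(2.0, 4.0, 9.0));
        System.out.println(s.getMin());     // 2.0
        System.out.println(s.getMax());     // 9.0
        System.out.println(s.getSum());     // 15.0
        System.out.println(s.getAverage()); // 5.0
    }
}
```

Avg on its own corresponds to the getAverage() figure; Stats is the cheaper choice when several of these values are needed from the same field.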
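8. Value Count: counts the values of a field; duplicates are still counted
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to count;
script: a script.
C. Kibana query
D. Java implementation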
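ValueCountAggregationBuilder aggregationBuilder = AggregationBuilders.count("count").field("cid");
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    ValueCount valueCount = aggregations.get("count");
    long count = valueCount.getValue();
}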
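The difference between Value Count and Cardinality is easiest to see side by side. The sketch below is plain JDK Java with hypothetical names (CountSketch and the sample cid values), not Elasticsearch client code. It computes an exact distinct count, whereas the real Cardinality aggregation returns an approximation.

```java
import java.util.HashSet;
import java.util.List;

// Plain-JDK model contrasting value_count with cardinality; hypothetical helper.
final class CountSketch {

    // Counts every value, duplicates included.
    static long valueCount(List<String> values) {
        return values.size();
    }

    // Counts distinct values only (exact here; approximate in Elasticsearch).
    static long cardinality(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        List<String> cids = List.of("c1", "c2", "c1", "c3", "c1");
        System.out.println(valueCount(cids));  // 5
        System.out.println(cardinality(cids)); // 3
    }
}
```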
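II. Bucket Aggregations: each bucket is a set of documents that meet a given criterion
1. Terms: groups documents by the specified field and counts each group, comparable to GROUP BY in MySQL; the document counts can be approximate
A. URL: GET /index/_search
B. Request parameters
field: name of the field to group by; only a single field is supported;
size: number of buckets to return, default 10; the larger the size, the more accurate the result, but the higher the cost;
order: how the returned buckets are sorted;
script: a script, needed only when grouping by two fields; it performs poorly and is best avoided.
C. Kibana query
D. Java implementation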
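// Alternative: group by two fields with a script (slow, see the script parameter above).
// Script script = new Script("doc['data.srcip'].value + '_' + doc['data.dstip'].value");
// TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").script(script).size(Integer.MAX_VALUE);
TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms")
        .field("data.ip")
        .size(Integer.MAX_VALUE);
// Attach the aggregation to the request.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    Terms terms = aggregations.get("terms");
}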
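What Terms computes can be sketched as a group-and-count followed by a top-N cut. The following is plain JDK Java with hypothetical names (TermsSketch, sample values), not Elasticsearch client code. It assumes the default ordering by descending document count, breaks ties by key, and runs on a single in-memory list, so its counts are exact, unlike a sharded index.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-JDK model of a terms aggregation; hypothetical helper.
final class TermsSketch {

    // Groups equal values, counts each group and keeps the `size` largest groups.
    static Map<String, Long> terms(List<String> values, int size) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        Map<String, Long> top = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .forEach(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }

    public static void main(String[] args) {
        List<String> ips = List.of("a", "b", "a", "c", "b", "a");
        System.out.println(terms(ips, 2)); // {a=3, b=2}
    }
}
```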
2. Filter: narrows the documents matched by the query with an additional filter
A. URL: POST /index/_search?size=0
B. Request parameters: see the query DSL
C. Kibana query
D. Java implementation
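FilterAggregationBuilder aggregationBuilder = AggregationBuilders.filter("filter",
        QueryBuilders.termsQuery("rule", new String[]{"login", "auth", "cca"}));
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to handle the index not existing.
if (aggregations != null) {
    Filter filter = aggregations.get("filter");
}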
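3. Range: counts documents in the specified ranges; note that each range includes the from value and excludes the to value
A. URL: GET /index/_search
B. Request parameters
field: name of the field the ranges apply to;
to value1: the range from * to value1, excluding value1;
from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;
from value2: the range from value2 to *, including value2.
C. Kibana query
D. Java implementation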
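RangeAggregationBuilder aggregationBuilder = AggregationBuilders.range("range")
        .field("level")
        .addUnboundedTo("1", 6)      // * to 6, 6 excluded
        .addRange("2", 6, 11)        // 6 to 11, 6 included, 11 excluded
        .addUnboundedFrom("3", 11);  // 11 to *, 11 included
// Attach the aggregation to the request.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to handle the index not existing.
if (aggregations != null) {
    Range range = aggregations.get("range");
}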
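The boundary rule (from inclusive, to exclusive) is the part most easily misread, so here is a small plain JDK sketch of it. It reuses the boundaries 6 and 11 and the keys "1", "2", "3" from the snippet above. RangeSketch itself is a hypothetical helper, not Elasticsearch client code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK model of range bucketing with from-inclusive, to-exclusive bounds.
final class RangeSketch {

    // Returns the key of the bucket a value falls into.
    static String bucketOf(double level) {
        if (level < 6) return "1";   // * to 6, 6 excluded
        if (level < 11) return "2";  // 6 to 11, 6 included, 11 excluded
        return "3";                  // 11 to *, 11 included
    }

    // Counts how many values land in each bucket; empty buckets stay at 0.
    static Map<String, Long> range(List<Double> levels) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("1", 0L);
        counts.put("2", 0L);
        counts.put("3", 0L);
        for (double level : levels) {
            counts.merge(bucketOf(level), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 6.0 lands in bucket "2" and 11.0 in bucket "3".
        System.out.println(range(List.of(1.0, 6.0, 10.0, 11.0, 20.0))); // {1=1, 2=2, 3=2}
    }
}
```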
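4. Date histogram: builds a histogram over date values; applies to date and date range fields
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the date field;
format: date format;
calendar_interval: calendar interval, for example 1d; only single units are accepted, so 2d is not valid here;
fixed_interval: fixed interval, for example 1000ms;
min_doc_count: minimum document count; buckets with fewer documents are omitted from the response.
C. Kibana query
D. Java implementation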
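DateHistogramAggregationBuilder aggregationBuilder = AggregationBuilders.dateHistogram("date_histogram")
        .field("timestamp")
        .format("yyyy-MM-dd")
        .calendarInterval(new DateHistogramInterval("1d"))
        .minDocCount(1);
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    ParsedDateHistogram histogram = aggregations.get("date_histogram");
}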
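The effect of a one-day calendar interval combined with min_doc_count can be modelled in plain JDK Java as follows. This is a sketch under stated assumptions, not Elasticsearch client code: DateHistogramSketch is a hypothetical helper, timestamps are epoch milliseconds, days are cut in UTC, and empty days are never generated (so only minDocCount values of 1 or more are meaningful here).

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-JDK model of a daily date histogram; hypothetical helper.
final class DateHistogramSketch {

    // Buckets timestamps by UTC calendar day, then drops buckets below minDocCount.
    static Map<String, Long> dailyHistogram(List<Long> epochMillis, long minDocCount) {
        Map<String, Long> buckets = new TreeMap<>();
        for (long millis : epochMillis) {
            LocalDate day = Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
            // LocalDate.toString() yields the yyyy-MM-dd form used as the bucket key.
            buckets.merge(day.toString(), 1L, Long::sum);
        }
        buckets.values().removeIf(count -> count < minDocCount);
        return buckets;
    }

    public static void main(String[] args) {
        List<Long> timestamps = List.of(
                Instant.parse("2020-02-03T00:00:00Z").toEpochMilli(),
                Instant.parse("2020-02-03T01:00:00Z").toEpochMilli(),
                Instant.parse("2020-02-04T00:00:00Z").toEpochMilli());
        System.out.println(dailyHistogram(timestamps, 1)); // {2020-02-03=2, 2020-02-04=1}
        System.out.println(dailyHistogram(timestamps, 2)); // {2020-02-03=2}
    }
}
```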
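5. Date range: counts documents in the specified ranges of date values
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the date field the ranges apply to;
format: date format;
to value1: the range from * to value1, excluding value1;
from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;
from value2: the range from value2 to *, including value2.
C. Kibana query
D. Java implementation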
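DateRangeAggregationBuilder dateRangeAggregationBuilder = AggregationBuilders.dateRange("day_range")
        .field("day")
        .format("yyyy-MM-dd")
        // The two-argument addRange(from, to) would treat "1" and "3" as dates, so the open-ended ranges use the unbounded variants.
        .addUnboundedTo("1", "2020-02-03")
        .addRange("2", "2020-02-03", "2020-03-10")
        .addUnboundedFrom("3", "2020-03-10");
// Attach the aggregation to the request; size(0) skips the regular hits, as in the URL above.
SearchRequest searchRequest = new SearchRequest("index")
        .source(new SearchSourceBuilder().size(0).aggregation(dateRangeAggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Null check, mainly to guard against the index not existing.
if (aggregations != null) {
    ParsedDateRange dateRange = aggregations.get("day_range");
}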
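III. Pipeline Aggregations: operate on the output of other aggregations rather than on document sets, similar to paginating after a GROUP BY in a database
1. Bucket Sort: sorts the buckets of its parent multi-bucket aggregation
A. URL: POST /sales/_search?size=0
B. Request parameters
from: buckets at positions before this value are truncated, default 0; note that for pagination it must be an integer multiple of size;
size: number of buckets to return; defaults to all buckets of the parent aggregation;
sort: the sort definition; multiple fields may be given.
C. Kibana query:
D. Java implementation: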
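As a rough model of how from, size and sort page through the parent aggregation's buckets, consider the plain JDK sketch below. It is not Elasticsearch client code: BucketSortSketch, the month keys and the totals are hypothetical, and the sketch assumes a single sort key (the bucket value, descending).

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-JDK model of bucket_sort paging over already computed buckets; hypothetical helper.
final class BucketSortSketch {

    // Sorts buckets by value descending, skips `from` buckets and keeps at most `size`.
    static Map<String, Double> bucketSort(Map<String, Double> buckets, int from, int size) {
        Map<String, Double> page = new LinkedHashMap<>();
        buckets.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .skip(from)
                .limit(size)
                .forEach(e -> page.put(e.getKey(), e.getValue()));
        return page;
    }

    public static void main(String[] args) {
        Map<String, Double> totals = Map.of("jan", 100.0, "feb", 300.0, "mar", 200.0, "apr", 50.0);
        System.out.println(bucketSort(totals, 0, 2)); // {feb=300.0, mar=200.0}
        System.out.println(bucketSort(totals, 2, 2)); // {jan=100.0, apr=50.0}
    }
}
```

The second call shows the pagination rule from the parameter list: with size = 2, the second page starts at from = 2, an integer multiple of size.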