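I recently picked up a number of new Elasticsearch query techniques from a colleague, so here is a summary.
Sum Bucket Aggregation
Use case: get the sum of a given metric across all buckets produced by a grouping.
For example: group by some field, then get the total number of documents across the top 1000 buckets.
You could do this the clumsy way: declare a variable, loop over the buckets, read each count and add them up. That works but is inelegant; since ES provides a built-in for this, just call it.
Reference: https://xiaoxiami.gitbook.io/elasticsearch/ji-chu/36aggregationsju-he-fen-679029/363guan-dao-ju-540828-pipeline-aggregations/zong-he-tong-ju-540828-sum-bucket-aggregation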
Example 1, DSL:
"aggs": {
  "all": {
    "terms": {
      "field": "topics",
      "size": 5
    }
  },
  "sum": {
    "sum_bucket": {
      "buckets_path": "all>_count"
    }
  }
}
Result:
"aggregations": {
  "all": {
    "doc_count_error_upper_bound": 11656,
    "sum_other_doc_count": 2575137,
    "buckets": [
      {
        "key": "xx",
        "doc_count": 129636
      },
      {
        "key": "xxx",
        "doc_count": 41586
      },
      {
        "key": "xxxx",
        "doc_count": 39196
      },
      {
        "key": "xxxxx",
        "doc_count": 38775
      },
      {
        "key": "xxxxxx",
        "doc_count": 23163
      }
    ]
  },
  "sum": {
    "value": 272356
  }
}
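The value of sum is the total of the doc_count values of the returned buckets (the top 5 here); documents counted under sum_other_doc_count are not included.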
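As a quick sanity check on the response above, the following plain-Java sketch adds up the five doc_count values by hand. It is essentially the client-side loop that sum_bucket makes unnecessary. The class and method names are hypothetical and not part of any Elasticsearch client; the counts are copied from the sample response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any Elasticsearch client library.
final class DocCountSum {

    // Adds up the doc_count of every bucket in the map, which is the figure
    // sum_bucket reports for buckets_path "all>_count".
    static long sumDocCounts(Map<String, Long> docCountByKey) {
        long total = 0;
        for (long count : docCountByKey.values()) {
            total += count;
        }
        return total;
    }

    // Bucket keys and doc_count values copied from the sample response.
    static Map<String, Long> sampleBuckets() {
        Map<String, Long> buckets = new LinkedHashMap<>();
        buckets.put("xx", 129636L);
        buckets.put("xxx", 41586L);
        buckets.put("xxxx", 39196L);
        buckets.put("xxxxx", 38775L);
        buckets.put("xxxxxx", 23163L);
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(sumDocCounts(sampleBuckets())); // prints 272356
    }
}
```

The printed total matches the "sum" value in the response, which is a cheap way to confirm that the buckets_path points at the intended aggregation.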
Java, using rest-high-level-client:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .query(new MatchAllQueryBuilder())
        .size(0) // aggregations only, no hits needed
        .timeout(TimeValue.timeValueMillis(120000));
// Group by "topics" and keep the top 5 buckets
TermsAggregationBuilder terms = AggregationBuilders.terms("all").field("topics").size(5);
// Sibling pipeline aggregation: sum the doc_count of every bucket in "all"
SumBucketPipelineAggregationBuilder sumBucket =
        new SumBucketPipelineAggregationBuilder("sum", "all>_count");
sourceBuilder.aggregation(terms).aggregation(sumBucket);
SearchRequest request = new SearchRequest(xxIndex)
        .types(xxType)
        .source(sourceBuilder);
// Note: newer client versions take a RequestOptions argument in search(...)
SearchResponse response = esClient.getClient().search(request);
Map<String, Aggregation> map = response.getAggregations().getAsMap();
double sum = ((ParsedSimpleValue) map.get("sum")).value();
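Besides the count, other numeric metrics can be summed the same way. For example, sum a field within each bucket, then take the total of those sums across all buckets.
Example 2, DSL:
"aggs": {
  "all": {
    "terms": {
      "field": "topics",
      "size": 5
    },
    "aggs": {
      "friends_cnt": {
        "sum": {
          "field": "friends_cnt"
        }
      }
    }
  },
  "sum": {
    "sum_bucket": {
      "buckets_path": "all>friends_cnt"
    }
  }
}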
Result:
"aggregations": {
  "all": {
    "doc_count_error_upper_bound": 11656,
    "sum_other_doc_count": 2575137,
    "buckets": [
      {
        "key": "xx",
        "doc_count": 129636,
        "friends_cnt": {
          "value": 55291503
        }
      },
      {
        "key": "xxx",
        "doc_count": 41586,
        "friends_cnt": {
          "value": 21381248
        }
      },
      {
        "key": "xxxx",
        "doc_count": 39196,
        "friends_cnt": {
          "value": 14668921
        }
      },
      {
        "key": "xxxxx",
        "doc_count": 38775,
        "friends_cnt": {
          "value": 19805247
        }
      },
      {
        "key": "xxxxxx",
        "doc_count": 23163,
        "friends_cnt": {
          "value": 10268415
        }
      }
    ]
  },
  "sum": {
    "value": 121415334
  }
}
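To make the role of buckets_path more concrete, here is a toy model in plain Java of what the two examples compute: the name after ">" selects either the special "_count" (the bucket's doc_count) or a named single-value sub-aggregation, and the selected values are added across buckets. This is only an illustration under simplifying assumptions (one path level, no gap handling), not how Elasticsearch implements it; the class, the Bucket type and the sumBucket method are all hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model for illustration only; not Elasticsearch's implementation.
final class BucketsPathSketch {

    // One terms bucket: its doc_count plus single-value sub-aggregation results by name.
    static final class Bucket {
        final long docCount;
        final Map<String, Double> metrics;

        Bucket(long docCount, Map<String, Double> metrics) {
            this.docCount = docCount;
            this.metrics = metrics;
        }
    }

    // Sums the value named after '>' in bucketsPath across all buckets.
    // "_count" selects doc_count; any other name selects that sub-aggregation.
    // Toy choice: an unknown metric name raises an exception.
    static double sumBucket(List<Bucket> buckets, String bucketsPath) {
        String metric = bucketsPath.substring(bucketsPath.indexOf('>') + 1);
        double total = 0;
        for (Bucket bucket : buckets) {
            if ("_count".equals(metric)) {
                total += bucket.docCount;
            } else {
                Double value = bucket.metrics.get(metric);
                if (value == null) {
                    throw new IllegalArgumentException("unknown metric: " + metric);
                }
                total += value;
            }
        }
        return total;
    }

    private static Bucket bucket(long docCount, double friendsCnt) {
        return new Bucket(docCount, Collections.singletonMap("friends_cnt", friendsCnt));
    }

    // doc_count and friends_cnt values copied from the Example 2 response.
    static List<Bucket> sampleBuckets() {
        return Arrays.asList(
                bucket(129636, 55291503),
                bucket(41586, 21381248),
                bucket(39196, 14668921),
                bucket(38775, 19805247),
                bucket(23163, 10268415));
    }

    public static void main(String[] args) {
        List<Bucket> buckets = sampleBuckets();
        System.out.println((long) sumBucket(buckets, "all>friends_cnt")); // prints 121415334
        System.out.println((long) sumBucket(buckets, "all>_count"));      // prints 272356
    }
}
```

Both printed totals agree with the "sum" values in the two sample responses, so the only difference between Example 1 and Example 2 is which per-bucket value the path selects.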
In Java, only the first aggregation changes: add a sub-aggregation to it, then replace "_count" in the sum_bucket path with the sub-aggregation's name.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .query(new MatchAllQueryBuilder())
        .size(0) // aggregations only, no hits needed
        .timeout(TimeValue.timeValueMillis(120000));
// Group by "topics" and sum friends_cnt inside each bucket
TermsAggregationBuilder terms = AggregationBuilders.terms("all").field("topics").size(5)
        .subAggregation(new SumAggregationBuilder("friends_cnt").field("friends_cnt"));
// Sum the per-bucket friends_cnt values across all buckets in "all"
SumBucketPipelineAggregationBuilder sumBucket =
        new SumBucketPipelineAggregationBuilder("sum", "all>friends_cnt");
sourceBuilder.aggregation(terms).aggregation(sumBucket);
SearchRequest request = new SearchRequest(xxIndex)
        .types(xxType)
        .source(sourceBuilder);
SearchResponse response = esClient.getClient().search(request);
Map<String, Aggregation> map = response.getAggregations().getAsMap();
double sum = ((ParsedSimpleValue) map.get("sum")).value();
return Double.toString(sum);