Aggregation request format:
"aggregations" : { "<aggregation_name>" : { "<aggregation_type>" : { <aggregation_body> } [,"meta" : { [<meta_data_body>] } ]? [,"aggregations" : { [<sub_aggregation>]+ } ]? } [,"<aggregation_name_2>" : { ... } ]* }
For example:
# "aggs" is the aggregation keyword, "max_age" is the aggregation name, "max" is the aggregation type, and { "field": "age" } is the body.
GET test_index/doc/_search { "size": 0, "aggs": { "max_age": { "max": { "field": "age" } } } }
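As a rough illustration of the grammar above, the request body can be assembled as a plain Python dict and serialized to JSON. The helper name `build_aggregation` is hypothetical and not part of any Elasticsearch client; the field names are taken from the examples in these notes.

```python
import json


def build_aggregation(name, agg_type, body, sub_aggs=None, meta=None):
    """Return {name: {agg_type: body, ...}}, mirroring the grammar above."""
    definition = {agg_type: body}
    if meta:
        definition["meta"] = meta
    if sub_aggs:
        # Sub-aggregations are nested under the "aggregations" key.
        definition["aggregations"] = sub_aggs
    return {name: definition}


# A metric aggregation nested inside a bucket aggregation.
avg_age = build_aggregation("avg_age", "avg", {"field": "age"})
by_state = build_aggregation(
    "state", "terms", {"field": "state.keyword"}, sub_aggs=avg_age
)
request_body = {"size": 0, "aggregations": by_state}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted after a `GET <index>/_search` line in the console.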
There are four kinds of aggregation:
metrics: metric aggregations
bucket: bucket aggregations
pipeline: pipeline aggregations
matrix: matrix aggregations
SELECT COUNT(color) FROM table GROUP BY color # GROUP BY does the bucketing (bucket) work; COUNT is the metric (metrics).
- Metrics
Single-value metrics
- min: returns the minimum value of a numeric field
GET test_index/_search { "size": 0, "aggs": { "min_age": { "min": { "field": "age" } } } } ---> "aggregations": { "min_age": { "value": 10 } }
- max
GET test_index/_search { "size": 0, "aggs": { "max_age": { "max": { "field": "age" } } } } ---------> "aggregations": { "max_age": { "value": 50 } }
- avg
GET test_index/_search { "size": 0, "aggs": { "avg_age": { "avg": { "field": "age" } } } } -------------> "aggregations": { "avg_age": { "value": 24.666666666666668 } }
- sum: returns the sum of a numeric field
GET test_index/_search { "size": 0, "aggs": { "sum_age": { "sum": { "field": "age" } } } } --------------> "aggregations": { "sum_age": { "value": 148 } }
- cardinality: the number of distinct values, similar to COUNT(DISTINCT ...) in SQL (the result is approximate)
GET test_index/_search { "size": 0, "aggs": { "cardinality_age": { "cardinality": { "field": "age" } } } } -----------> "aggregations": { "cardinality_age": { "value": 5 } }
- value_count: counts the number of values
GET test_index/_search { "size": 0, "aggs": { "count_age": { "value_count": { "field": "name" } } } }
- Several aggregations can be requested at once. Each top-level key under "aggs" is an aggregation name, and the key inside it is the aggregation type:
GET syslog-2018.07.13/_search { "size": 0, "aggs": { "min_facility": { "min": { "field": "facility" } }, "max_facility": { "max": { "field": "facility" } }, "avg_facility": { "avg": { "field": "facility" } }, "sum_facility": { "sum": { "field": "facility" } } } }
Multi-value metrics
- stats: returns a set of statistics for a numeric field: min, max, sum, count, avg
GET test_index/_search { "size": 0, "aggs": { "stats_age": { "stats": { "field": "age" } } } } ---> "aggregations": { "stats_age": { "count": 6, "min": 10, "max": 50, "avg": 24.666666666666668, "sum": 148 } }
- extended_stats: adds variance, standard deviation, and related values on top of stats
GET test_index/_search { "size": 0, "aggs": { "age": { "extended_stats": { "field": "age" }}}} -----------> "aggregations": { "age": { "count": 6, "min": 10, "max": 50, "avg": 24.666666666666668, "sum": 148, "sum_of_squares": 4560, "variance": 151.55555555555557, "std_deviation": 12.310790208412925, "std_deviation_bounds": { "upper": 49.28824708349252, "lower": 0.04508624984081777 }}}
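The numbers in the extended_stats response can be reproduced by hand. The sketch below assumes the six `age` values in test_index are 10, 18, 22, 24, 24, 50 (inferred from the count, sum, and terms output in these notes), uses the population variance, and takes the bounds as avg ± 2 standard deviations. The function name `extended_stats` is only illustrative.

```python
import math

# Assumed sample data, inferred from the outputs shown in these notes.
ages = [10, 18, 22, 24, 24, 50]


def extended_stats(values, sigma=2):
    """Compute stats plus variance, standard deviation, and bounds."""
    count = len(values)
    total = sum(values)
    avg = total / count
    # Population variance: mean of squared deviations from the average.
    variance = sum((v - avg) ** 2 for v in values) / count
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {
            "upper": avg + sigma * std,
            "lower": avg - sigma * std,
        },
    }


print(extended_stats(ages))
```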
- percentiles: percentile statistics, used to understand how the data is distributed
# If "percents" is omitted, it defaults to [1,5,25,50,75,95,99].
GET test_index/_search { "size": 0, "aggs": { "age": { "percentiles": { "field": "age", "percents": [1,5,25,50,75,95,99] } } } } ---> "aggregations": { "age": { "values": { "1.0": 10, "5.0": 10, "25.0": 18, "50.0": 23, "75.0": 24, "95.0": 50, "99.0": 50 } } }
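Steps to compute the p-th percentile:
Step 1: Sort the raw data in ascending order.
Step 2: Compute the index i = n × p% (n is the count, p% is the percentile).
Step 3: 1) If i is not an integer, round i up; the value at that position is the p-th percentile. 2) If i is an integer, the p-th percentile is the average of the i-th and (i+1)-th values.
Note that Elasticsearch computes percentiles with an approximation algorithm, so on large data sets its results may differ slightly from this exact method.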
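The steps above can be sketched in a few lines of Python. This is the textbook method described here, not Elasticsearch's internal algorithm; the sample `ages` are assumed from the outputs in these notes, and `percentile` is an illustrative name. Integer arithmetic is used for the index to avoid floating-point rounding.

```python
# Assumed sample data, inferred from the outputs shown in these notes.
ages = [10, 18, 22, 24, 24, 50]


def percentile(values, p):
    """Return the p-th percentile (p is an integer from 0 to 100)."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)
    scaled = n * p                    # Step 2: i = scaled / 100
    if scaled % 100 != 0:
        i = scaled // 100 + 1         # Step 3.1: i is not an integer, round up
        return ordered[i - 1]
    i = scaled // 100                 # Step 3.2: i is an integer
    if i == 0:
        return ordered[0]             # p = 0: no i-th value, use the minimum
    if i == n:
        return ordered[-1]            # p = 100: no (i+1)-th value, use the maximum
    return (ordered[i - 1] + ordered[i]) / 2


for p in [1, 5, 25, 50, 75, 95, 99]:
    print(p, percentile(ages, p))
```

On this small data set the results agree with the response shown for the percentiles aggregation.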
- percentile_ranks: the percentile rank at which a given value of the field falls
GET test_index/_search { "size": 0, "aggs": { "agg": { "percentile_ranks": { "field": "age", "values": [24,50] }}}} -----------> "aggregations": { "agg": { "values": { "24.0": 66.66666666666666, "50.0": 100 }}}
- top_hits: typically used after bucketing to fetch the top matching documents in each bucket
# Options: from, size, sort
# Group by host, then take the most recent document in each group
GET syslog-2018.07.12/_search { "size": 0, "aggs": { "host": { "terms": { "field": "host" }, "aggs": { "sort_date": { "top_hits": { "size": 1, "sort": [ { "@timestamp": { "order": "desc" } } ], "_source": { "includes": ["host","message"] } } } } } } }
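- Bucket: assigns documents to different buckets according to a rule, for categorized analysis
- terms: one bucket per unique value; returns each value and its document count (doc_count). For a text field, the buckets are built from the analyzed tokens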
# By default the top ten buckets (in sort order) are returned; use "size" to change this.
GET test_index/_search { "size": 0, "aggs": { "agg": { "terms": { "field": "age", "size": 5 } } } } ---> "buckets": [ { "key": 24, "doc_count": 2 }, { "key": 10, "doc_count": 1 }, { "key": 18, "doc_count": 1 }, { "key": 22, "doc_count": 1 }, { "key": 50, "doc_count": 1 } ]
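Conceptually, a terms aggregation is a GROUP BY with a count. A minimal in-memory sketch, assuming the same six `age` values as elsewhere in these notes and breaking ties by ascending key (which matches the output above); `terms_buckets` is an illustrative name:

```python
from collections import Counter

# Assumed sample data, inferred from the outputs shown in these notes.
ages = [10, 18, 22, 24, 24, 50]


def terms_buckets(values, size=10):
    """One bucket per unique value, ordered by doc_count descending."""
    counts = Counter(values)
    # Sort by count descending, then by key ascending to break ties.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


print(terms_buckets(ages, size=5))
```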
- range: defines buckets by explicit numeric ranges
GET test_index/_search { "size": 0, "aggs": { "agg": { "range": { "field": "age", "ranges": [ { "from": 20, "to": 30 } ] } } } } ---> "aggregations": { "agg": { "buckets": [ { "key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 3 } ] } }
# Example 2: a custom key can be set for each range
GET syslog-2018.07.13/_search { "size": 0, "aggs": { "priority_range": { "range": { "field": "priority", "ranges": [ { "key": "<50", "to": 50 }, { "from": 50, "to": 80 }, { "key": ">80", "from": 80 } ] } } } } ---> "aggregations": { "priority_range": { "buckets": [ { "key": "<50", "to": 50, "doc_count": 1990 }, { "key": "50.0-80.0", "from": 50, "to": 80, "doc_count": 31674 }, { "key": ">80", "from": 80, "doc_count": 5828 } ] } }
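In a range aggregation, "from" is inclusive and "to" is exclusive. A small sketch of that rule, using the assumed `age` sample from these notes; `range_buckets` and the default key formatting are illustrative and only mimic the keys seen in the output above.

```python
# Assumed sample data, inferred from the outputs shown in these notes.
ages = [10, 18, 22, 24, 24, 50]


def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    def fmt(bound):
        return "*" if bound is None else str(float(bound))

    buckets = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        # Use the custom key when given, otherwise build "from-to".
        key = r.get("key", f"{fmt(low)}-{fmt(high)}")
        buckets.append({"key": key, "doc_count": count})
    return buckets


print(range_buckets(ages, [{"key": "<20", "to": 20}, {"from": 20, "to": 30}, {"from": 30}]))
```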
- date_range
# Unlike range, date_range accepts date math expressions such as +1h or -1d in "from" and "to", and "format" sets the date format used in the response.
GET syslog-2018.07*/_search { "size": 0, "aggs": { "timestamp_range": { "date_range": { "field": "@timestamp", "format": "yyyy/MM/dd", "ranges": [ { "from": "now-10d/d", "to": "now-5d/d" }, { "from": "now-5d/d" } ] } } } } ---> "aggregations": { "timestamp_range": { "buckets": [ { "key": "2018/07/03-2018/07/08", "from": 1530576000000, "from_as_string": "2018/07/03", "to": 1531008000000, "to_as_string": "2018/07/08", "doc_count": 739175 }, { "key": "2018/07/08-*", "from": 1531008000000, "from_as_string": "2018/07/08", "doc_count": 760635 } ] } }
- histogram: buckets numeric values at a fixed interval
# "interval" sets the bucket width; "extended_bounds" sets the range of values the buckets should cover.
GET test_index/_search { "size": 0, "aggs": { "age": { "histogram": { "field": "age", "interval": 10, "extended_bounds": { "min": 0, "max": 50 } } } } } ---> "buckets": [ { "key": 10, "doc_count": 2 }, { "key": 20, "doc_count": 3 }, { "key": 30, "doc_count": 0 }, { "key": 40, "doc_count": 0 }, { "key": 50, "doc_count": 1 } ]
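Each value lands in the bucket whose key is the value rounded down to a multiple of the interval. The sketch below assumes integer values and the same `age` sample as elsewhere in these notes; it fills empty buckets between the smallest and largest key but does not model "extended_bounds". `histogram_buckets` is an illustrative name.

```python
# Assumed sample data, inferred from the outputs shown in these notes.
ages = [10, 18, 22, 24, 24, 50]


def histogram_buckets(values, interval):
    """Bucket integer values by floor(value / interval) * interval."""
    counts = {}
    for v in values:
        key = (v // interval) * interval
        counts[key] = counts.get(key, 0) + 1
    if not counts:
        return []
    # Emit every bucket between the smallest and largest key, including empty ones.
    return [
        {"key": key, "doc_count": counts.get(key, 0)}
        for key in range(min(counts), max(counts) + interval, interval)
    ]


print(histogram_buckets(ages, 10))
```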
- date_histogram: date histogram
# "format" sets the date format used in the response; "interval" can be a year, month, day, hour, or minute.
GET syslog-2018.07.1*/_search { "size": 0, "aggs": { "range": { "date_histogram": { "field": "@timestamp", "format": "yyyy/MM/dd", "interval": "day" } } } } ---> "aggregations": { "range": { "buckets": [ { "key_as_string": "2018/07/10", "key": 1531180800000, "doc_count": 146354 }, { "key_as_string": "2018/07/11", "key": 1531267200000, "doc_count": 143784 }, { "key_as_string": "2018/07/12", "key": 1531353600000, "doc_count": 143137 }, { "key_as_string": "2018/07/13", "key": 1531440000000, "doc_count": 43206 } ] } }
- filter: applies a filter condition to an aggregation
# The filter runs first (salary >= 8000); the sub-aggregation avg_age is then computed over the matching documents.
GET test_index/_search { "size": 0, "aggs": { "salary": { "filter": { "range": { "salary": { "gte": 8000 } } }, "aggs": { "avg_age": { "avg": { "field": "age" } } } } } } ---> "aggregations": { "salary": { "doc_count": 4, "avg_age": { "value": 23.25 } } }
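- filters: one bucket per named filter
# "count_debug" is the aggregation name and "filters" is the keyword. "error" and "warnings" are filter names: the first uses a match query for documents whose body field contains "error", the second uses a term query for documents whose body field contains "warning".
GET /logs/_search { "aggs": { "count_debug": { "filters": { "filters": { "error": { "match": { "body": "error" } }, "warnings": { "term": { "body": "warning" } } } } } } } ---> "buckets": { "error": { "doc_count": 1 }, "warnings": { "doc_count": 2 } }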
- nested: aggregation over nested-type fields
# The mapping declares "man" as a nested field with the sub-fields "age" and "name".
PUT test_index { "mappings": { "doc": { "properties": { "man": { "type": "nested", "properties": { "age": { "type": "integer" }, "name": { "type": "text" } } } } } } }
PUT test_index/doc/1 { "man": [ { "name": "alice white", "age": 34 }, { "name": "peter brown", "age": 26 } ] }
# "nested" is the keyword and "path" names the nested field; the sub-aggregation refers to the sub-field as "man.age".
GET test_index/_search { "size": 0, "aggs": { "man": { "nested": { "path": "man" }, "aggs": { "avg_age": { "avg": { "field": "man.age" } } } } } }
Nesting aggregations (sub-aggregations)
- bucket+bucket

# The outer terms aggregation "state" buckets documents by state; the nested range aggregation "range_age" then buckets each state's documents by age.
GET bank/_search { "size": 0, "aggs": { "state": { "terms": { "field": "state.keyword" }, "aggs": { "range_age": { "range": { "field": "age", "ranges": [ { "from": 20, "to": 30 } ] } } } } } } ---> "aggregations": { "state": { "doc_count_error_upper_bound": 20, "sum_other_doc_count": 770, "buckets": [ { "key": "ID", "doc_count": 27, "range_age": { "buckets": [ { "key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 9 } ] } }, { "key": "TX", "doc_count": 27, "range_age": { "buckets": [ { "key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 17 } ] } }, { "key": "AL", "doc_count": 25, "range_age": { "buckets": [ { "key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 12 } ] } }, ...
- bucket+metrics

# The terms aggregation "state" (a bucket aggregation) groups documents by state; the nested avg aggregation "avg_age" (a metric aggregation) computes the average age in each bucket.
GET bank/_search { "size": 0, "aggs": { "state": { "terms": { "field": "state.keyword" }, "aggs": { "avg_age": { "avg": { "field": "age" } } } } } } ---> "aggregations": { "state": { "doc_count_error_upper_bound": 20, "sum_other_doc_count": 770, "buckets": [ { "key": "ID", "doc_count": 27, "avg_age": { "value": 31.59259259259259 } }, { "key": "TX", "doc_count": 27, "avg_age": { "value": 28.77777777777778 } }, { "key": "AL", "doc_count": 25, "avg_age": { "value": 29.16 } }, { "key": "MD", "doc_count": 25, "avg_age": { "value": 31.04 } }, { "key": "TN", "doc_count": 23, "avg_age": { "value": 30.91304347826087 } }, { "key": "MA", "doc_count": 21, "avg_age": { "value": 27.761904761904763 } }, { "key": "NC", "doc_count": 21, "avg_age": { "value": 31.333333333333332 } }
Scope of an aggregation:
- filter: sets a filter condition for one aggregation only, without changing the overall query
# The filter condition only applies to the host_priority_little aggregation, not to the host aggregation
GET syslog-2018.07.13/_search { "size": 0, "aggs": { "host_priority_little": { "filter": { "range": { "priority": { "to": 50 } } }, "aggs": { "host": { "terms": { "field": "host", "size": 2 } } } }, "host": { "terms": { "field": "host", "size": 2 } } } } ---> "aggregations": { "host_priority_little": { "doc_count": 2530, "host": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 196, "buckets": [ { "key": "10.10.14.16", "doc_count": 2108 }, { "key": "10.10.12.171", "doc_count": 226 } ] } }, "host": { "doc_count_error_upper_bound": 593, "sum_other_doc_count": 37640, "buckets": [ { "key": "10.10.14.248", "doc_count": 7198 }, { "key": "10.10.14.4", "doc_count": 6494 } ] } }
- post_filter: filters the returned documents without affecting the aggregations
# post_filter only changes the hit count ("total": 106); the aggregation ignores the post_filter condition.
GET syslog-2018.06.13/_search { "size": 0, "aggs": { "host": { "terms": { "field": "host" } } }, "post_filter": { "range": { "priority": { "gte": 100 } } } } ---> "hits": { "total": 106, "max_score": 0, "hits": [] }, "aggregations": { "host": { "doc_count_error_upper_bound": 430, "sum_other_doc_count": 53134, "buckets": [ { "key": "10.10.14.248", "doc_count": 19590 }, { "key": "172.16.10.37", "doc_count": 17625 }, ...
# With "query" instead, the condition determines the hit count and also applies to the aggregation, which runs over the documents matched by the query.
GET syslog-2018.06.13/_search { "size": 0, "aggs": { "host": { "terms": { "field": "host" } } }, "query": { "range": { "priority": { "gte": 100 } } } } ---> "hits": { "total": 106, "max_score": 0, "hits": [] }, "aggregations": { "host": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "172.16.10.253", "doc_count": 106 } ] } }
- global: ignores the query's filter conditions and aggregates over all documents
GET test_index/_search { "size": 0, "query": { "match": { "name": "lin" } }, "aggs": { "lin_age_avg": { "avg": { "field": "age" } }, "all":{ "global": {}, "aggs": { "avg_age": { "avg": { "field": "age" }}}}}} ----------------------------> "aggregations": { "all": { "doc_count": 4, "avg_age": { "value": 25.25 } }, "lin_age_avg": { "value": 24.5 } } }
Sorting aggregation results:
- Sorting by built-in keys
- _count: sorts by document count (doc_count). If no order is given, buckets are sorted by _count in descending order by default
- _key: sorts by key value
# With no "order", the default is _count descending
GET test_index/_search { "size": 0, "aggs": { "age": { "terms": { "field": "salary" } } } } ---> "aggregations": { "age": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": 8000, "doc_count": 3 }, { "key": 5000, "doc_count": 2 }, { "key": 4000, "doc_count": 1 }, { "key": 9000, "doc_count": 1 } ] } }
# Using the "order" keyword with _key to sort by key in ascending order
GET test_index/_search { "size": 0, "aggs": { "age": { "terms": { "field": "salary", "order": { "_key": "asc" } } } } } ---> "aggregations": { "age": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": 4000, "doc_count": 1 }, { "key": 5000, "doc_count": 2 }, { "key": 8000, "doc_count": 3 }, { "key": 9000, "doc_count": 1 } ] } }
- Buckets can also be sorted by the result of a sub-aggregation
# Sort by the avg_age sub-aggregation in ascending order
GET test_index/_search { "size": 0, "aggs": { "salary": { "terms": { "field": "salary", "order": { "avg_age": "asc" } }, "aggs": { "avg_age": { "avg": { "field": "age" } } } } } } ---> "buckets": [ { "key": 9000, "doc_count": 1, "avg_age": { "value": 21 } }, { "key": 4000, "doc_count": 1, "avg_age": { "value": 22 } }, { "key": 8000, "doc_count": 3, "avg_age": { "value": 24 } }, { "key": 5000, "doc_count": 2, "avg_age": { "value": 25.5 } } ]
# For a multi-value metric, name the value to sort by with a dot, e.g. "stats_age.avg" for the avg inside stats
GET test_index/_search { "size": 0, "aggs": { "salary": { "terms": { "field": "salary", "order": { "stats_age.avg": "asc" } }, "aggs": { "stats_age": { "stats": { "field": "age" } } } } } } ---> "buckets": [ { "key": 9000, "doc_count": 1, "stats_age": { "count": 1, "min": 21, "max": 21, "avg": 21, "sum": 21 } }, { "key": 4000, "doc_count": 1, "stats_age": { "count": 1, "min": 22, "max": 22, "avg": 22, "sum": 22 } }, ... ]
# Example 2: when the metric sits inside another sub-aggregation, join the aggregation names with ">", e.g. "filter_age>stats_age.sum". Here filter_age keeps only documents with age > 21.
GET test_index/_search { "size": 0, "aggs": { "salary": { "terms": { "field": "salary", "order": { "filter_age>stats_age.sum": "asc" } }, "aggs": { "filter_age": { "filter": { "range": { "age": { "gt": 21 } } }, "aggs": { "stats_age": { "stats": { "field": "age" } } } } } } } } ---> "buckets": [ { "key": 9000, "doc_count": 1, "filter_age": { "doc_count": 0, "stats_age": { "count": 0, "min": null, "max": null, "avg": null, "sum": null } } }, { "key": 4000, "doc_count": 1, "filter_age": { "doc_count": 1, "stats_age": { "count": 1, "min": 22, "max": 22, "avg": 22, "sum": 22 } } }, { "key": 5000, "doc_count": 2, "filter_age": { "doc_count": 2, "stats_age": { "count": 2, "min": 23, "max": 28, "avg": 25.5, "sum": 51 } } }, { "key": 8000, "doc_count": 3, "filter_age": { "doc_count": 3, "stats_age": { "count": 3, "min": 22, "max": 26, "avg": 24, "sum": 72 } } } ]
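The order paths used above ("avg_age", "stats_age.avg", "filter_age>stats_age.sum") follow one pattern: ">" steps into a sub-aggregation and "." picks one value of a multi-value metric. A rough sketch of how such a path could be resolved against bucket results; `resolve_path` and `sort_buckets` are hypothetical helpers, and the bucket values are taken from the responses above.

```python
# Bucket results copied from the responses shown in these notes (trimmed).
buckets = [
    {"key": 8000, "avg_age": {"value": 24}, "stats_age": {"avg": 24, "sum": 72}},
    {"key": 5000, "avg_age": {"value": 25.5}, "stats_age": {"avg": 25.5, "sum": 51}},
    {"key": 4000, "avg_age": {"value": 22}, "stats_age": {"avg": 22, "sum": 22}},
    {"key": 9000, "avg_age": {"value": 21}, "stats_age": {"avg": 21, "sum": 21}},
]


def resolve_path(bucket, path):
    """Follow 'agg1>agg2.metric' inside one bucket and return the number."""
    *parents, last = path.split(">")
    node = bucket
    for name in parents:
        node = node[name]              # step into a sub-aggregation
    name, _, metric = last.partition(".")
    node = node[name]
    # Single-value metrics expose "value"; multi-value metrics need a name.
    return node[metric] if metric else node["value"]


def sort_buckets(items, path, order="asc"):
    return sorted(items, key=lambda b: resolve_path(b, path), reverse=(order == "desc"))


print([b["key"] for b in sort_buckets(buckets, "avg_age")])
print([b["key"] for b in sort_buckets(buckets, "stats_age.sum", "desc")])
```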
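- Pipeline: pipeline aggregations fall into two classes, depending on where their output is placed:
- parent: the result is embedded in the existing aggregation result
- derivative: computes the derivative of a specified metric in the parent histogram (or date_histogram)
- cumulative_sum: computes the cumulative sum of a specified metric in the parent histogram (or date_histogram)
- moving_avg: slides a window across the data and emits the average of the values in that window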
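To make the three parent pipeline aggregations concrete, here is a small sketch over a list of per-bucket metric values. The input numbers are made up for illustration, and the moving average here is a plain trailing window that includes the current bucket, which is an assumption and may not match the exact window semantics of a given Elasticsearch version.

```python
# Hypothetical per-bucket metric values from a parent histogram.
values = [10, 30, 20, 40]


def derivative(series):
    """Difference between each bucket and the previous one (first has none)."""
    return [None] + [b - a for a, b in zip(series, series[1:])]


def cumulative_sum(series):
    """Running total up to and including each bucket."""
    totals, running = [], 0
    for v in series:
        running += v
        totals.append(running)
    return totals


def moving_average(series, window):
    """Average of the last `window` values, including the current bucket."""
    result = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


print(derivative(values))
print(cumulative_sum(values))
print(moving_average(values, 2))
```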
- sibling: the result sits at the same level as the existing aggregation result
- max_bucket / min_bucket / avg_bucket / sum_bucket
{ "max_bucket": { "buckets_path": "the_sum" } }
# "state" buckets documents by state and "avg_age" computes the average age in each bucket. "max_state_age" is a pipeline aggregation at the same level as "state"; its buckets_path "state>avg_age" means the avg_age inside state, so it returns the largest per-state average.
GET bank/_search { "size": 0, "aggs": { "state": { "terms": { "field": "state.keyword" }, "aggs": { "avg_age": { "avg": { "field": "age" } } } }, "max_state_age": { "max_bucket": { "buckets_path": "state>avg_age" } } } } ---> "aggregations": { "state": { "doc_count_error_upper_bound": 20, "sum_other_doc_count": 770, "buckets": [ { "key": "MA", "doc_count": 21, "avg_age": { "value": 27.761904761904763 } }, { "key": "NC", "doc_count": 21, "avg_age": { "value": 31.333333333333332 } }, { "key": "ND", "doc_count": 21, "avg_age": { "value": 31.238095238095237 } } ] }, "max_state_age": { "value": 31.59259259259259, "keys": [ "ID" ] } }
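A sibling pipeline aggregation such as max_bucket simply reduces the per-bucket values of another aggregation. A minimal sketch using the per-state averages shown earlier in these notes (only the states listed there, so this is a subset of the real data); `max_bucket` and `min_bucket` here are plain Python functions named after the aggregations.

```python
# Per-state avg_age values copied from the bucket+metrics response above.
avg_age_by_state = {
    "ID": 31.59259259259259,
    "TX": 28.77777777777778,
    "AL": 29.16,
    "MD": 31.04,
    "TN": 30.91304347826087,
    "MA": 27.761904761904763,
    "NC": 31.333333333333332,
}


def max_bucket(values_by_key):
    """Largest bucket value plus the key(s) of the bucket(s) holding it."""
    top = max(values_by_key.values())
    return {"value": top, "keys": [k for k, v in values_by_key.items() if v == top]}


def min_bucket(values_by_key):
    """Smallest bucket value plus the key(s) of the bucket(s) holding it."""
    low = min(values_by_key.values())
    return {"value": low, "keys": [k for k, v in values_by_key.items() if v == low]}


print(max_bucket(avg_age_by_state))
print(min_bucket(avg_age_by_state))
```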
SQL statement: SELECT COUNT(DISTINCT mac,ip) FROM test
Corresponding Elasticsearch request:
# Bucket by beat.name first, count the distinct clientip values inside each bucket with cardinality, then add up the per-bucket counts with sum_bucket.
GET nginx-access-log-2018.07.24/_search { "size": 0, "aggs": { "beat": { "terms": { "field": "beat.name" }, "aggs": { "ip": { "cardinality": { "field": "clientip" } } } }, "sum_beat_ip": { "sum_bucket": { "buckets_path": "beat>ip" } } } }
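The idea behind this translation is that counting distinct (group, ip) pairs equals summing the distinct-ip count of each group. The sketch below checks that on made-up rows; the host names and addresses are hypothetical, and the real request only sums the buckets that the terms aggregation returns, with cardinality being approximate.

```python
from collections import defaultdict

# Hypothetical (beat name, client ip) rows.
rows = [
    ("web-1", "10.0.0.1"),
    ("web-1", "10.0.0.1"),
    ("web-1", "10.0.0.2"),
    ("web-2", "10.0.0.1"),
]


def sum_of_bucket_cardinalities(pairs):
    """terms on the first column, cardinality on the second, then sum_bucket."""
    distinct_per_bucket = defaultdict(set)
    for group, ip in pairs:
        distinct_per_bucket[group].add(ip)
    return sum(len(ips) for ips in distinct_per_bucket.values())


def count_distinct_pairs(pairs):
    """Equivalent of COUNT(DISTINCT group, ip)."""
    return len(set(pairs))


print(sum_of_bucket_cardinalities(rows), count_distinct_pairs(rows))
```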
- stats_bucket / extended_stats_bucket
GET bank/_search { "size": 0, "aggs": { "state": { "terms": { "field": "state.keyword" }, "aggs": { "min_age": { "min": { "field": "age" } } } }, "stats_state_age":{ "stats_bucket": { "buckets_path": "state>min_age" } } } } ---------------> "aggregations": { "state": { "doc_count_error_upper_bound": 20, "sum_other_doc_count": 770, "buckets": [ { "key": "ID", "doc_count": 27, "min_age": { "value": 21 } }, { "key": "MD", "doc_count": 25, "min_age": { "value": 20 } }, { "key": "ND", "doc_count": 21, "min_age": { "value": 21 } }, { "key": "ME", "doc_count": 20, "min_age": { "value": 21 } } ] }, "stats_state_age": { "count": 10, "min": 20, "max": 22, "avg": 20.6, "sum": 206 } } }
{ "stats_bucket": { "buckets_path": "the_sum" } }
- percentiles_bucket
GET bank/_search { "size": 0, "aggs": { "state": { "terms": { "field": "state.keyword" }, "aggs": { "avg_age": { "avg": { "field": "age" } } } }, "percen_state_age":{ "percentiles_bucket":{ "buckets_path":"state>avg_age" } } } } --------> "aggregations": { "state": { "doc_count_error_upper_bound": 20, "sum_other_doc_count": 770, "buckets": [ { "key": "ID", "doc_count": 27, "avg_age": { "value": 31.59259259259259 } }, { "key": "TX", "doc_count": 27, "avg_age": { "value": 28.77777777777778 } },.................. ] }, "percen_state_age": { "values": { "1.0": 27.761904761904763, "5.0": 27.761904761904763, "25.0": 28.77777777777778, "50.0": 30.91304347826087, "75.0": 31.238095238095237, "95.0": 31.59259259259259, "99.0": 31.59259259259259 } } }
- Matrix