Aggregation Queries on Elasticsearch Data

This article is a repost. 2021-11-07 09:18 · Elasticsearch

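The aggregation framework produces aggregated data based on a search query. Aggregation is an important feature of any database, and Elasticsearch, being both a search engine and a data store, provides powerful aggregation and analytics capabilities. An aggregation groups the documents matched by the query into buckets and computes values over them, somewhat like SQL's GROUP BY combined with aggregate functions. Aggregations can be nested to build complex analyses (a bucket aggregation can contain sub-aggregations).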

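The values an aggregation works on can come from a field or from the result of a script. Aggregations are defined in the request body under the aggregations node:

"aggregations" : {                                  // can be shortened to "aggs"
    "<aggregation_name>" : {                        // name of the aggregation
        "<aggregation_type>" : {                    // type of the aggregation
            <aggregation_body>                      // aggregation body: which field(s) to aggregate, plus options
        }
        [,"meta" : {  [<meta_data_body>] } ]?               // optional metadata
        [,"aggregations" : { [<sub_aggregation>]+ } ]?      // optional sub-aggregations nested inside this one
    }
    [,"<aggregation_name_2>" : { ... } ]*                   // further aggregations at the same level
}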
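The template is plain nested JSON. As a minimal sketch, a request body with one aggregation and an optional sub-aggregation can be assembled in Python and serialized with the standard json module. The helper build_agg and the aggregation names are hypothetical illustrations, not part of any Elasticsearch client.

```python
import json


def build_agg(agg_type, body, meta=None, sub_aggs=None):
    """Return the value of one "<aggregation_name>" entry from the template above."""
    agg = {agg_type: body}          # "<aggregation_type>": { <aggregation_body> }
    if meta:
        agg["meta"] = meta          # optional metadata, echoed back in the response
    if sub_aggs:
        agg["aggs"] = sub_aggs      # optional sub-aggregations (short form of "aggregations")
    return agg


# Hypothetical example: group by job, then take the maximum salary in each group.
request_body = {
    "size": 0,
    "aggs": {
        "group_by_job": build_agg(
            "terms",
            {"field": "job"},
            sub_aggs={"max_sal": build_agg("max", {"field": "sal"})},
        )
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted after a POST employee/_search line in the console.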

1. Data preparation

(1) Create the employee index

PUT employee
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      },
      "job": {
        "type": "keyword"
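      },
      "age": {
        "type": "integer"
      },
      "sal": {
        "type": "integer"
      },
      "gender": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 3,   // number of primary shards
      "number_of_replicas": 2  // replicas per primary; on a single-node cluster they stay unassigned (yellow status)
    }
  }
}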

(2) Insert the data

POST employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}
# The _bulk body must end with a newline character

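About the data: each document is an employee record. name is the employee's name, job is the job type, age is the employee's age, sal is the salary, and gender is the employee's gender.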
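The _bulk body is newline-delimited JSON: one action line and one source line per document, with a trailing newline at the end. A minimal Python sketch that rebuilds the same body from column lists; the helper name to_bulk_ndjson is hypothetical.

```python
import json

NAMES = ["Bob", "Rod", "Gaving", "King", "Jonhson", "Douge", "cutting", "Bona",
         "Shyon", "James", "Golsling", "Lily", "Jack", "Rose", "Will", "smith"]
JOBS = "java html java dba dba java dba html dba html java java html java dba java".split()
AGES = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]
SALS = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
        19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]
GENDERS = ("male female male female male female male female "
           "female male female male female female male male").split()


def to_bulk_ndjson(docs):
    """Serialize docs as a _bulk body: an index action line, then the document source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": i + 1, "name": n, "job": j, "age": a, "sal": s, "gender": g}
    for i, (n, j, a, s, g) in enumerate(zip(NAMES, JOBS, AGES, SALS, GENDERS))
]
body = to_bulk_ndjson(docs)
print(body, end="")
```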

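Metrics aggregations

A metrics aggregation computes numeric values over the documents (for example, the maximum, minimum, sum, or average of a field across all documents). Its output is a set of statistics about the documents rather than the documents themselves.

The values are computed from a specific field or from values generated by a script. Numeric metrics aggregations that output a single value are called single-value numeric metrics; those that output several values (for example, stats) are called multi-value numeric metrics. This classification applies only to numeric metrics aggregations.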

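Max, min, sum, avg

Max aggregation: returns the maximum of a value across the aggregated documents. The value can come from a specific numeric field or be computed by a script.

Min aggregation: returns the minimum. Otherwise the same as max.

Sum aggregation: returns the sum. Otherwise the same as max.

Avg aggregation: returns the average. Otherwise the same as max.

POST employee/_search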
{
    "size": 0,
    "aggs": {
        "max_sal": {
            "max": { "field": "sal"}
        }
    }
}

Response:
{
    "took": 40,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "max_sal": {
            "value": 23000
        }
    }
}

POST employee/_search
{
    "size": 0,
    "aggs": {
        "min_sal": {
            "min": { "field": "sal"}
        }
    }
}

Response:
{
    "took": 40,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "min_sal": {
            "value": 2000
        }
    }
}

POST employee/_search
{
    "size": 0,
    "aggs": {
        "sum_sal": {
            "sum": { "field": "sal"}
        }
    }
}

Response:
{
    "took": 17,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "sum_sal": {
            "value": 212500
        }
    }
}

POST employee/_search
{
    "size": 0,
    "aggs": {
        "avg_sal": {
            "avg": { "field": "sal"}
        }
    }
}

Response:
{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "avg_sal": {
            "value": 13281.25
        }
    }
}
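The four responses can be cross-checked locally. This sketch just applies Python built-ins to the sal values from the sample data; it mirrors what max, min, sum, and avg return here but says nothing about how Elasticsearch combines per-shard results.

```python
# Salaries of the 16 sample employees, in _id order.
SALS = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
        19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

results = {
    "max_sal": max(SALS),
    "min_sal": min(SALS),
    "sum_sal": sum(SALS),
    "avg_sal": sum(SALS) / len(SALS),
}
print(results)
# {'max_sal': 23000, 'min_sal': 2000, 'sum_sal': 212500, 'avg_sal': 13281.25}
```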

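Value count

Value count aggregation: counts how many values (from a specific field or computed by a script) the aggregated documents contain. It is typically used together with other single-value aggregations; for example, when computing the average of a field you may also want to know how many values that average was computed from.

POST employee/_search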
{
    "size": 0,
    "aggs": {
        "age_count": {
            "value_count": { "field": "age"}
        }
    }
}
Response:
{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_count": {
            "value": 16
        }
    }
}

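Distinct count (cardinality)

Cardinality aggregation: a single-value metrics aggregation that counts the distinct values of a field (or of a script), i.e. a deduplicated count, comparable to COUNT(DISTINCT ...) in SQL. The count is approximate (Elasticsearch uses the HyperLogLog++ algorithm), but for low cardinalities like the ones here it is close to exact.

POST employee/_search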
{
    "size": 0,
    "aggs": {
        "age_count": {
            "cardinality": {
                "field": "age"
            }
        },
        "job_count": {
            "cardinality": {
                "field": "job"
            }
        }
    }
}
Response:
{
    "took": 32,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "job_count": {
            "value": 3
        },
        "age_count": {
            "value": 14
        }
    }
}
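The difference between value_count and cardinality shows up clearly on the sample ages, where 24 and 32 each occur twice. A minimal local sketch using plain Python lists and sets (not Elasticsearch's approximate algorithm):

```python
AGES = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]
JOBS = "java html java dba dba java dba html dba html java java html java dba java".split()

age_value_count = len(AGES)        # value_count: every value counts, duplicates included
age_cardinality = len(set(AGES))   # cardinality: distinct values only
job_cardinality = len(set(JOBS))

print(age_value_count, age_cardinality, job_cardinality)  # 16 14 3
```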

Stats

Stats aggregation: a multi-value metrics aggregation that computes five statistics over a value (a numeric field or a script): min, max, sum, count, and avg.

POST employee/_search
{
    "size": 0,
    "aggs": {
        "age_stats": {
            "stats": {
                "field": "age"
            }
        }
    }
}
Response:
{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_stats": {
            "count": 16,
            "min": 18,
            "max": 41,
            "avg": 27.75,
            "sum": 444
        }
    }
}

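Extended stats

Extended stats aggregation: a multi-value aggregation that returns four more kinds of results than stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations (std_deviation_bounds).

POST employee/_search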
{
    "size": 0,
    "aggs": {
        "age_stats": {
            "extended_stats": {
                "field": "age"
            }
        }
    }
}
Response:
{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_stats": {
            "count": 16,
            "min": 18,
            "max": 41,
            "avg": 27.75,
            "sum": 444,
            "sum_of_squares": 13006,
            "variance": 42.8125,
            "variance_population": 42.8125,
            "variance_sampling": 45.666666666666664,
            "std_deviation": 6.5431261641512,
            "std_deviation_population": 6.5431261641512,
            "std_deviation_sampling": 6.757711644237764,
            "std_deviation_bounds": {
                "upper": 40.8362523283024,
                "lower": 14.6637476716976,
                "upper_population": 40.8362523283024,
                "lower_population": 14.6637476716976,
                "upper_sampling": 41.26542328847553,
                "lower_sampling": 14.234576711524472
            }
        }
    }
}
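The extra fields follow from textbook formulas: "variance" and "std_deviation" are the population versions (divide by n), the "_sampling" variants divide by n - 1, and the bounds are avg plus/minus sigma standard deviations (sigma defaults to 2 in Elasticsearch). A minimal sketch that reproduces the response above from the sample ages; the function name extended_stats is illustrative only.

```python
import math

AGES = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]


def extended_stats(values, sigma=2.0):
    """Population and sample statistics, as reported by the extended_stats response above."""
    n = len(values)
    total = sum(values)
    avg = total / n
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / n - avg * avg          # population variance
    variance_sampling = variance * n / (n - 1)         # sample variance (n - 1 denominator)
    std = math.sqrt(variance)
    return {
        "count": n, "min": min(values), "max": max(values), "avg": avg, "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "variance_sampling": variance_sampling,
        "std_deviation": std,
        "std_deviation_sampling": math.sqrt(variance_sampling),
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


print(extended_stats(AGES))
```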

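Percentiles

Percentiles aggregation: a multi-value aggregation. Conceptually, it orders the values of a field (or script) from smallest to largest and returns, for each requested percentage, the value below which that percentage of the matching documents falls. By default it returns the 1st, 5th, 25th, 50th, 75th, 95th, and 99th percentiles. The results are approximate (computed with the TDigest algorithm).

POST employee/_search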
{
    "size": 0,
    "aggs": {
        "age_percents": {
            "percentiles": {
                "field": "age"
            }
        }
    }
}
Response:
{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_percents": {
            "values": {
                "1.0": 18,
                "5.0": 18.6,
                "25.0": 22.5,
                "50.0": 26.5, // 50% of the documents have age <= 26.5; equivalently, documents with age <= 26.5 make up 50% of all hits
                "75.0": 32,
                "95.0": 40.099999999999994,
                "99.0": 41
            }
        }
    }
}

Specifying the percentiles

POST employee/_search
{
    "size": 0,
    "aggs": {
        "age_percents": {
            "percentiles": {
                "field": "age",
                "percents": [95,99,99.9]
            }
        }
    }
}
Response:
{
    "took": 18,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_percents": {
            "values": {
                "95.0": 40.099999999999994,
                "99.0": 41,
                "99.9": 41
            }
        }
    }
}
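The percentiles above are TDigest estimates. For this small data set they happen to coincide with a simple rule: sort the ages, place the i-th value (0-based) at position i + 0.5 on a 0..n scale, interpolate linearly, and clamp at both ends. The sketch below implements only that rule (midpoint_percentile is a hypothetical helper). It reproduces the numbers shown here but is not Elasticsearch's algorithm and can diverge on larger data.

```python
AGES = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]


def midpoint_percentile(values, percent):
    """Interpolate between sorted values, placing the i-th one (0-based) at position i + 0.5 out of n."""
    xs = sorted(values)
    n = len(xs)
    pos = percent / 100 * n
    if pos <= 0.5:              # below the first midpoint: clamp to the minimum
        return float(xs[0])
    if pos >= n - 0.5:          # above the last midpoint: clamp to the maximum
        return float(xs[-1])
    i = int(pos - 0.5)          # index of the sorted value just below pos
    frac = pos - (i + 0.5)
    return xs[i] + frac * (xs[i + 1] - xs[i])


for p in (1, 5, 25, 50, 75, 95, 99, 99.9):
    print(p, midpoint_percentile(AGES, p))
```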

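Percentile ranks

Percentile ranks aggregation: answers the reverse question. For example, to find what percentage of documents have an age below 25 and below 30, use:

POST employee/_search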
{
    "size": 0,
    "aggs": {
        "gge_perc_rank": {
            "percentile_ranks": {
                "field": "age",
                "values": [25,30]
            }
        }
    }
}
Response:
{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "gge_perc_rank": {
            "values": {  // 43.75% of the documents have an age below 25, and 62.5% have an age below 30
                "25.0": 43.75,
                "30.0": 62.5
            }
        }
    }
}
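Because 25 and 30 fall between observed ages, the response equals the plain share of ages at or below each threshold. A sketch of that exact computation (percentile_rank is a hypothetical helper; Elasticsearch itself estimates ranks with TDigest, so on larger data sets the numbers can differ slightly):

```python
AGES = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]


def percentile_rank(values, threshold):
    """Percentage of values at or below threshold."""
    return 100 * sum(1 for v in values if v <= threshold) / len(values)


print({t: percentile_rank(AGES, t) for t in (25, 30)})  # {25: 43.75, 30: 62.5}
```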

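Top Hits

Top hits aggregation: returns the top n documents of each group, similar to taking the first n rows per group after a GROUP BY in SQL. It keeps track of the most relevant documents being aggregated and is usually used as a sub-aggregation, so that the best-matching documents are collected per bucket. It is one of the more commonly used aggregations. With match_all every document scores 1, so which documents come back is effectively arbitrary; add a sort to top_hits to control it.

POST employee/_search
{
    "size":0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "group_by_job": {
            "terms": {
                "field": "job",
                "size": 1  // number of buckets to return
            },
            "aggs": {
                "top_tag_hits": {
                    "top_hits": {
                        "size": 1   // maximum number of documents returned per bucket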
                    }
                }
            }
        }
    }
}
Response:
{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "group_by_job": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 9,
            "buckets": [
                {
                    "key": "java",
                    "doc_count": 7,
                    "top_tag_hits": {
                        "hits": {
                            "total": {
                                "value": 7,
                                "relation": "eq"
                            },
                            "max_score": 1,
                            "hits": [
                                {
                                    "_index": "employee",
                                    "_type": "_doc",
                                    "_id": "3",
                                    "_score": 1,
                                    "_source": {
                                        "id": 3,
                                        "name": "Gaving",
                                        "job": "java",
                                        "age": 24,
                                        "sal": 12000,
                                        "gender": "male"
                                    }
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
}

Geo Bounds Aggregation

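Geo bounds aggregation: computes the bounding box (top-left and bottom-right corners) that encloses all values of a geo_point field. The region index and its location field used here are separate from the employee data above.

POST region/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "viewport": {
      "geo_bounds": {
        "field": "location",
        "wrap_longitude": true // whether the bounding box may overlap the international date line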
      }
    }
  }
}

Geo Centroid Aggregation

Geo centroid aggregation: computes the weighted centroid of all coordinates of a geo_point field.

POST region/_search
{
    "query" : {
        "match" : { "crime" : "burglary" }
    },
    "aggs" : {
        "centroid" : {
            "geo_centroid" : {
                "field" : "location" 
            }
        }
    }
}
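Both geo aggregations reduce to simple arithmetic over the points. A simplified sketch, assuming three hypothetical (lat, lon) points and a box that does not cross the date line (so wrap_longitude plays no role); real results can differ in the last decimals because Elasticsearch stores coordinates in an encoded form.

```python
# Hypothetical points as (lat, lon) pairs; the region index's real data is not shown in the article.
POINTS = [(48.86, 2.35), (52.37, 4.90), (51.51, -0.13)]


def geo_bounds(points):
    """Smallest box enclosing all points (no date-line wrapping)."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


def geo_centroid(points):
    """Mean latitude and mean longitude of the points."""
    n = len(points)
    return {
        "lat": sum(lat for lat, _ in points) / n,
        "lon": sum(lon for _, lon in points) / n,
    }


print(geo_bounds(POINTS))
print(geo_centroid(POINTS))
```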

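Bucket aggregations

A bucket aggregation groups documents, similar to GROUP BY in SQL: documents that share some characteristic go into the same bucket, and the output is a set of buckets, each containing a number of documents (one bucket corresponds to one group).

Each bucket aggregation has a key source (a field or a script) plus criteria for assigning documents to buckets. When the aggregation runs, each document is checked against the criteria, and if it matches one, it falls into that bucket.

Bucket aggregations do not compute metrics themselves; they group documents according to the criteria in the request (for example {"from": 0, "to": 100}). In addition, each bucket reports how many documents it contains.

Bucket aggregations can contain sub-aggregations (metrics aggregations cannot contain sub-aggregations but can be used as sub-aggregations). A sub-aggregation is applied to every bucket produced by its parent.

Depending on the aggregation, the output can be a single bucket, several buckets (multi-bucket), or a number of buckets determined dynamically from the data (for example, the terms aggregation).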

Terms Aggregation

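Terms aggregation: builds one bucket per unique term in a field and counts the documents in each bucket. By default, buckets are sorted by document count in descending order. It is a multi-bucket aggregation. Because each shard only returns its own top terms, the document counts can be approximate when not all buckets are returned (controlled by size).

POST employee/_search
{
    "size": 0, // return no hits; typical for pure statistics and aggregation requests
    "aggs": {
        "age_terms": {
            "terms": {
                "field": "job", // field to group by
                "size": 10, // maximum number of buckets to return (default 10)
                "order": {"_count": "asc"}, // sort by document count; use { "_key": "asc" } to sort by term instead
                "min_doc_count": 1,  // only return buckets with at least this many documents
                "include": ".*dba.*",  // keep only terms matching this regular expression
                "exclude": "html.*",  // drop terms matching this regular expression
                "missing": "N/A" // bucket documents that have no job value under this term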
            }
        }
    }
}
Response:
{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "dba",
                    "doc_count": 5
                }
            ]
        }
    }
}
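The filtering and ordering options above can be mimicked locally to check the response. A minimal sketch over the sample jobs; terms_agg is a hypothetical helper, Python's re module stands in for Lucene regular expressions (equivalent for these simple patterns), and the key tie-break is this sketch's own choice.

```python
import re
from collections import Counter

JOBS = "java html java dba dba java dba html dba html java java html java dba java".split()


def terms_agg(values, size=10, order="desc", min_doc_count=1, include=None, exclude=None):
    """Count documents per unique term, filter, sort by count, and keep the first `size` buckets."""
    buckets = []
    for term, count in Counter(values).items():
        if count < min_doc_count:
            continue
        # The include/exclude pattern must match the whole term.
        if include is not None and re.fullmatch(include, term) is None:
            continue
        if exclude is not None and re.fullmatch(exclude, term) is not None:
            continue
        buckets.append({"key": term, "doc_count": count})
    sign = -1 if order == "desc" else 1
    # Sort by document count; equal counts fall back to the key.
    buckets.sort(key=lambda b: (sign * b["doc_count"], b["key"]))
    return buckets[:size]


print(terms_agg(JOBS, order="asc", include=".*dba.*", exclude="html.*"))
# [{'key': 'dba', 'doc_count': 5}]
print(terms_agg(JOBS))
# [{'key': 'java', 'doc_count': 7}, {'key': 'dba', 'doc_count': 5}, {'key': 'html', 'doc_count': 4}]
```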

Setting how many buckets each shard returns

POST employee/_search
{
    "size": 0,
    "aggs": {
        "age_terms": {
            "terms": {
                "field": "job",
                "size": 10,
                "shard_size": 20, // buckets returned by each shard; default is size on a single-shard index, size * 1.5 + 10 with multiple shards
                "show_term_doc_count_error": true  // show the worst-case count error for each bucket
            }
        }
    }
}

Response:
{
    "took": 15,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
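    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0, // maximum possible error in the document counts
            "sum_other_doc_count": 0, // number of documents in terms that were not returned
            "buckets": [  // by default, the top 10 buckets ordered by document count, highest first
                {
                    "key": "java",  // 7 documents have job = java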
                    "doc_count": 7,
                    "doc_count_error_upper_bound": 0
                },
                {
                    "key": "dba", // 5 documents have job = dba
                    "doc_count": 5,
                    "doc_count_error_upper_bound": 0
                },
                {
                    "key": "html",
                    "doc_count": 4,
                    "doc_count_error_upper_bound": 0
                }
            ]
        }
    }
}
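Why the counts can be inaccurate and why a larger shard_size helps: each shard returns only its local top shard_size terms, and the coordinating node sums those partial lists. The sketch below simulates that merge with three hypothetical shards (the counts are made up for illustration, not taken from the employee index) and leaves out the error-bound bookkeeping Elasticsearch also performs.

```python
from collections import Counter

# Hypothetical per-shard term counts. True totals: a = 13, b = 16.
SHARDS = [
    {"a": 6, "b": 5},
    {"a": 6, "b": 5},
    {"b": 6, "a": 1},
]


def distributed_terms(shards, size, shard_size):
    """Each shard ships only its local top `shard_size` terms; the coordinator sums and re-ranks them."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(Counter(shard).most_common(shard_size)))
    return merged.most_common(size)


print(distributed_terms(SHARDS, size=1, shard_size=1))  # [('a', 12)]: wrong winner and wrong count
print(distributed_terms(SHARDS, size=1, shard_size=2))  # [('b', 16)]: every term reached the coordinator
```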

Filter Aggregation

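Filter aggregation: a single-bucket aggregation that narrows the current document set to the documents matching one condition; its sub-aggregations then run on that bucket.

POST employee/_search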
{
    "size": 0,
    "aggs": {
        "args_term": {
            "filter": {
                "match": {
                    "job": "java"
                }
            },
            "aggs": {
                "avg_age": {
                    "avg": {
                        "field": "age"
                    }
                }
            }
        }
    }
}

Response:
{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "args_term": {
            "doc_count": 7,
            "avg_age": {
                "value": 30
            }
        }
    }
}
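The response can be reproduced by hand: filter first, then average inside the single bucket. A minimal local sketch over the sample data (plain Python, not the Elasticsearch API):

```python
JOBS = "java html java dba dba java dba html dba html java java html java dba java".split()
AGES = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]

# filter: keep only the documents with job == "java" (one bucket) ...
java_ages = [age for job, age in zip(JOBS, AGES) if job == "java"]
# ... then run the avg sub-aggregation inside that bucket.
args_term = {"doc_count": len(java_ages), "avg_age": {"value": sum(java_ages) / len(java_ages)}}
print(args_term)  # {'doc_count': 7, 'avg_age': {'value': 30.0}}
```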

Filters Aggregation

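Filters aggregation: a multi-bucket aggregation that defines one bucket per filter. Each bucket contains every document matching its filter, so a document can fall into several buckets; documents are filtered first and then aggregated.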
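For a concrete picture, here is a sketch that writes a filters aggregation body as a Python dict (two named buckets using term queries; the names job_or_gender, java, and female are illustrative) and counts each bucket locally on the sample data. The local helper filters_agg is hypothetical and only understands term clauses.

```python
JOBS = "java html java dba dba java dba html dba html java java html java dba java".split()
GENDERS = ("male female male female male female male female "
           "female male female male female female male male").split()

# A filters aggregation with two named buckets.
request_body = {
    "size": 0,
    "aggs": {
        "job_or_gender": {
            "filters": {
                "filters": {
                    "java": {"term": {"job": "java"}},
                    "female": {"term": {"gender": "female"}},
                }
            }
        }
    },
}

docs = [{"job": j, "gender": g} for j, g in zip(JOBS, GENDERS)]


def filters_agg(docs, filters):
    """Count the matches of each named filter independently (only term clauses are handled here)."""
    buckets = {}
    for name, clause in filters.items():
        field, value = next(iter(clause["term"].items()))
        buckets[name] = {"doc_count": sum(1 for d in docs if d[field] == value)}
    return buckets


buckets = filters_agg(docs, request_body["aggs"]["job_or_gender"]["filters"]["filters"])
print(buckets)  # {'java': {'doc_count': 7}, 'female': {'doc_count': 8}}
```

The two counts add up to 15 although only 12 employees match either filter: the 3 female Java developers are counted in both buckets.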

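Range aggregation

Range aggregation: groups documents into buckets by ranges of a value (a field or a script). Each range includes its from value and excludes its to value (a half-open interval [from, to)). It is a multi-bucket aggregation.

POST employee/_search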
{
    "size": 0,
    "aggs": {
        "age_range": {
            "range": {
                "field": "age",
                "ranges": [
                    {
                        "to": 25
                    },
                    {
                        "from": 25,
                        "to": 35
                    },
                    {
                        "from": 35
                    }
                ]
            },
            "aggs": {
                "bmax": {
                    "max": {
                        "field": "sal"
                    }
                }
            }
        }
    }
}

Response:
{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_range": {
            "buckets": [
                {
                    "key": "*-25.0",
                    "to": 25,
                    "doc_count": 7,
                    "bmax": {
                        "value": 22000
                    }
                },
                {
                    "key": "25.0-35.0",
                    "from": 25,
                    "to": 35,
                    "doc_count": 6,
                    "bmax": {
                        "value": 23000
                    }
                },
                {
                    "key": "35.0-*",
                    "from": 35,
                    "doc_count": 3,
                    "bmax": {
                        "value": 20000
                    }
                }
            ]
        }
    }
}
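The half-open bucketing and the max sub-aggregation can be checked locally. A minimal sketch over the sample ages and salaries; range_agg and bound_label are hypothetical helpers, and the key format imitates the response above.

```python
AGES = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]
SALS = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
        19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

RANGES = [{"to": 25}, {"from": 25, "to": 35}, {"from": 35}]


def bound_label(r, name):
    return str(float(r[name])) if name in r else "*"


def range_agg(values, sub_values, ranges):
    """Half-open [from, to) buckets over `values`, with a max sub-aggregation over `sub_values`."""
    buckets = []
    for r in ranges:
        lo = r.get("from", float("-inf"))
        hi = r.get("to", float("inf"))
        matched = [s for v, s in zip(values, sub_values) if lo <= v < hi]
        buckets.append({
            "key": f"{bound_label(r, 'from')}-{bound_label(r, 'to')}",
            "doc_count": len(matched),
            "bmax": max(matched) if matched else None,
        })
    return buckets


for bucket in range_agg(AGES, SALS, RANGES):
    print(bucket)
```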

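Date range aggregation

Date range aggregation: a multi-bucket aggregation that groups documents by ranges of a date field. The range endpoints can be written as date math expressions (for example now-10M/M: ten months ago, rounded down to the start of that month). As with range, from is included and to is excluded.

POST employee/_search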
{
    "size": 0,
    "aggs": {
        "range": {
            "date_range": {
                "field": "date",
                "format": "MM-yyy",
                "ranges": [
                    {
                        "to": "now-10M/M"
                    },
                    {
                        "from": "now-10M/M"
                    }
                ]
            }
        }
    }
}
Response (the employee mapping defines no date field, so both buckets are empty):
{
    "took": 19,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "range": {
            "buckets": [
                {
                    "key": "*-01-2021",
                    "to": 1609459200000,
                    "to_as_string": "01-2021",
                    "doc_count": 0
                },
                {
                    "key": "01-2021-*",
                    "from": 1609459200000,
                    "from_as_string": "01-2021",
                    "doc_count": 0
                }
            ]
        }
    }
}
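The boundary 1609459200000 in the response is 2021-01-01T00:00:00Z, which is what now-10M/M yields if "now" fell in November 2021. The sketch below evaluates that expression under the assumption that the request ran around the article's timestamp, in UTC; now_minus_months_rounded is a hypothetical helper, not an Elasticsearch API.

```python
from datetime import datetime, timezone


def now_minus_months_rounded(now, months):
    """Evaluate now-<months>M/M: subtract whole months, then round down to the first day of that month."""
    total = now.year * 12 + (now.month - 1) - months
    year, month0 = divmod(total, 12)
    return datetime(year, month0 + 1, 1, tzinfo=now.tzinfo)


# Assumed "now": the article's timestamp, interpreted as UTC.
now = datetime(2021, 11, 7, 9, 18, tzinfo=timezone.utc)
boundary = now_minus_months_rounded(now, 10)
print(boundary.isoformat(), int(boundary.timestamp() * 1000))
# 2021-01-01T00:00:00+00:00 1609459200000
```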

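Histogram aggregations

1. Histogram aggregation: builds buckets dynamically from a numeric field using a fixed interval. It is a multi-bucket aggregation.

POST employee/_search
{
    "size": 0,
    "aggs": {
        "prices": {
            "histogram": {
                "field": "sal",  // field to bucket; must be numeric
                "interval": 50,  // bucket width
                "min_doc_count": 1,  // only return buckets with at least this many documents
                "extended_bounds": { // force buckets to cover this range; only has an effect when min_doc_count is 0
                    "min": 0,
                    "max": 500
                },
                "order": {
                    "_count": "desc" // sort buckets; if the histogram has a direct metrics sub-aggregation, buckets can also be sorted by its result
                },
                "keyed": true, // return buckets as an object keyed by bucket key instead of the default array
                "missing": 0 // value used for documents that have no sal field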
            }
        }
    }
}
Response:
{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 16,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "prices": {
            "buckets": {
                "23000.0": {
                    "key": 23000,
                    "doc_count": 2
                },
                "2000.0": {
                    "key": 2000,
                    "doc_count": 1
                },
                "3000.0": {
                    "key": 3000,
                    "doc_count": 1
                },
                "4500.0": {
                    "key": 4500,
                    "doc_count": 1
                },
                "6000.0": {
                    "key": 6000,
                    "doc_count": 1
                },
                "7000.0": {
                    "key": 7000,
                    "doc_count": 1
                },
                "8000.0": {
                    "key": 8000,
                    "doc_count": 1
                },
                "12000.0": {
                    "key": 12000,
                    "doc_count": 1
                },
                "14000.0": {
                    "key": 14000,
                    "doc_count": 1
                },
                "15000.0": {
                    "key": 15000,
                    "doc_count": 1
                },
                "16000.0": {
                    "key": 16000,
                    "doc_count": 1
                },
                "18000.0": {
                    "key": 18000,
                    "doc_count": 1
                },
                "19000.0": {
                    "key": 19000,
                    "doc_count": 1
                },
                "20000.0": {
                    "key": 20000,
                    "doc_count": 1
                },
                "22000.0": {
                    "key": 22000,
                    "doc_count": 1
                }
            }
        }
    }
}
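Every salary is a multiple of 50, so each distinct value lands in its own 50-wide bucket, which is why the response lists 15 buckets. A sketch using the bucket-key formula floor((value - offset) / interval) * interval + offset; histogram is a hypothetical helper, and ordering equal counts by ascending key follows what the response shows.

```python
import math
from collections import Counter

SALS = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
        19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]


def histogram(values, interval, offset=0, min_doc_count=1):
    """Bucket key = floor((value - offset) / interval) * interval + offset."""
    counts = Counter(math.floor((v - offset) / interval) * interval + offset for v in values)
    kept = [(key, n) for key, n in counts.items() if n >= min_doc_count]
    # "order": {"_count": "desc"}; equal counts are ordered by key ascending.
    kept.sort(key=lambda kn: (-kn[1], kn[0]))
    # "keyed": true -> an object keyed by the stringified bucket key.
    return {str(float(key)): n for key, n in kept}


buckets = histogram(SALS, interval=50)
print(buckets)
```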

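2. Date histogram aggregation

Date histogram aggregation: groups documents by date intervals. Calendar-aware intervals are set with calendar_interval and must be a single unit: minute, hour, day, week, month, quarter, or year. Fixed intervals are set with fixed_interval as a multiple of ms, s, m, h, or d (for example 90m); fractional values such as 1.5h are not accepted. Like the date range example, this request assumes a date field that the employee index does not have.

POST employee/_search
{
    "size": 0,
    "aggs": {
        "articles_over_time": {
            "date_histogram": {
                "field": "date",
                "calendar_interval": "month",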
                "format": "yyyy-MM-dd",
                "time_zone": "+08:00"
            }
        }
    }
}
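The time_zone setting decides which calendar month a timestamp falls into. A sketch with hypothetical UTC timestamps (the employee data has none), bucketing by the start of the month in +08:00; month_bucket is a hypothetical helper, and the key is formatted like the request's "yyyy-MM-dd".

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # the request's "time_zone": "+08:00"


def month_bucket(ts, tz=TZ):
    """Start of the calendar month containing ts, evaluated in time zone tz."""
    local = ts.astimezone(tz)
    return local.replace(day=1, hour=0, minute=0, second=0, microsecond=0).strftime("%Y-%m-%d")


# Hypothetical timestamps, in UTC.
TIMESTAMPS = [
    datetime(2021, 10, 15, 12, 0, tzinfo=timezone.utc),
    datetime(2021, 10, 31, 20, 0, tzinfo=timezone.utc),  # already 2021-11-01 04:00 at +08:00
    datetime(2021, 11, 5, 8, 0, tzinfo=timezone.utc),
]
counts = Counter(month_bucket(ts) for ts in TIMESTAMPS)
print(sorted(counts.items()))  # [('2021-10-01', 1), ('2021-11-01', 2)]
```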

Missing Aggregation

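Missing aggregation: a single-bucket aggregation that collects every document with no value for the given field. In the sample data every employee has an age, so this bucket would be empty.

POST employee/_search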
{
    "size": 0,
    "aggs": {
        "account_without_a_age": {
            "missing": {
                "field": "age"
            }
        }
    }
}

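IP range aggregation

IP range aggregation: groups documents by ranges of IP addresses. It works like the range aggregation, except that the field must be of the ip type (which holds IPv4 or IPv6 addresses). It is a multi-bucket aggregation. The ip field used here is not part of the employee mapping.

POST employee/_search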
{
    "size": 0,
    "aggs": {
        "ip_ranges": {
            "ip_range": {
                "field": "ip",
                "ranges": [
                    {
                        "to": "10.0.0.5"
                    },
                    {
                        "from": "10.0.0.5"
                    }
                ]
            }
        }
    }
}
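As with range, each IP range includes from and excludes to, so 10.0.0.5 itself lands in the second bucket. A sketch using Python's standard ipaddress module; ip_range_buckets is a hypothetical helper that assumes all addresses are the same IP version, and the keys imitate the "*-10.0.0.5" style of range keys.

```python
import ipaddress

RANGES = [{"to": "10.0.0.5"}, {"from": "10.0.0.5"}]


def ip_range_buckets(ip, ranges):
    """Return the keys of every [from, to) range that contains ip."""
    addr = ipaddress.ip_address(ip)
    keys = []
    for r in ranges:
        lo = ipaddress.ip_address(r["from"]) if "from" in r else None
        hi = ipaddress.ip_address(r["to"]) if "to" in r else None
        if (lo is None or addr >= lo) and (hi is None or addr < hi):
            keys.append(f'{r.get("from", "*")}-{r.get("to", "*")}')
    return keys


for ip in ("10.0.0.4", "10.0.0.5", "192.168.1.1"):
    print(ip, ip_range_buckets(ip, RANGES))
```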

Nested Aggregation

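Nested aggregation: a single-bucket aggregation for fields of the nested data type. It gathers the nested objects into one bucket so that further aggregations can be run on them.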

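Matrix aggregations

Matrix aggregations: this functionality is experimental and may be changed or removed completely in a future release.

This family of aggregations operates on multiple fields and produces a matrix result from the values extracted from the requested document fields. Unlike metrics and bucket aggregations, it does not yet support scripting.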

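Pipeline aggregations

Pipeline aggregations work on the output of other aggregations (buckets, or metrics of buckets) rather than on documents; they are a post-processing step applied to the buckets. Their purpose is to add useful information to the output.

Pipeline aggregations cannot have sub-aggregations, but some types can be chained (for example, computing the derivative of a derivative).

Pipeline aggregations fall roughly into two families:

parent: takes the output of its parent aggregation as input and processes it further. It usually does not create new buckets but enriches the parent's existing buckets.
sibling: takes the output of a sibling aggregation as input and can compute a new aggregation at the same level.

Pipeline aggregations specify the metric they operate on with the buckets_path parameter. buckets_path syntax:

Aggregation separator = ">", separates parent and child aggregations, e.g. "my_bucket>my_stats.avg"
Metric separator = ".", selects a specific metric of an aggregation
Aggregation name = <name of the aggregation>, the name of an aggregation
Metric = <name of the metric>, the name of a metric
Full path = agg_name[> agg_name]*[. metrics], combines the above into a complete path
Special value = "_count", the document count of the input

Special cases:

If the aggregation or metric name contains a dot, use bracket syntax, e.g. "buckets_path": "my_percentile[99.9]"
If the input contains empty buckets (buckets with no documents), use the gap_policy parameter; valid values are skip and insert_zeros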
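To make the buckets_path rules concrete, the sketch below parses the path forms listed above and then mimics a sibling avg_bucket pipeline with "buckets_path": "group_by_job>max_sal" over a terms-by-job plus max-salary result computed locally. The helper names and the aggregation names group_by_job and max_sal are hypothetical; the parser is a simplified reading of the syntax, not Elasticsearch's own implementation.

```python
import re

JOBS = "java html java dba dba java dba html dba html java java html java dba java".split()
SALS = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
        19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

# Last path step: an aggregation name, optionally followed by ".metric" or "[key]".
_LAST_STEP = re.compile(r"(?P<agg>[^.\[\]>]+)(?:\.(?P<metric>[^\[\]]+)|\[(?P<key>[^\]]+)\])?")


def parse_buckets_path(path):
    """Split a buckets_path into (aggregation names, metric or None)."""
    *parents, last = path.split(">")
    m = _LAST_STEP.fullmatch(last)
    if m is None:
        raise ValueError(f"unsupported buckets_path: {path!r}")
    return parents + [m.group("agg")], m.group("metric") or m.group("key")


def max_sal_per_job(jobs, sals):
    """terms(job) with a max(sal) sub-aggregation: job -> highest salary."""
    buckets = {}
    for job, sal in zip(jobs, sals):
        buckets[job] = max(sal, buckets.get(job, sal))
    return buckets


def avg_bucket(bucket_values):
    """Sibling pipeline: average one metric across all buckets, skipping gaps (gap_policy skip)."""
    present = [v for v in bucket_values if v is not None]
    return sum(present) / len(present)


per_job = max_sal_per_job(JOBS, SALS)
print(parse_buckets_path("group_by_job>max_sal"))  # (['group_by_job', 'max_sal'], None)
print(per_job, avg_bucket(per_job.values()))
```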

Reference

https://blog.csdn.net/alex_xfboy/article/details/86100037
