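Elasticsearch Aggregation Functions
Preface
Aggregation functions will be familiar to most readers, and Elasticsearch does nothing unusual with them, so this chapter is relatively simple. You only need to remember: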
- avg
- max
- min
- sum
and how each one is used. Let's start with the average.
Preparing the data
PUT zhifou/doc/1
{
"name":"顧老二",
"age":30,
"from": "gu",
"desc": "皮膚黑、武器長、性格直",
"tags": ["黑", "長", "直"]
}
PUT zhifou/doc/2
{
"name":"大娘子",
"age":18,
"from":"sheng",
"desc":"膚白貌美,嬌憨可愛",
"tags":["白", "富","美"]
}
PUT zhifou/doc/3
{
"name":"龍套偏房",
"age":22,
"from":"gu",
"desc":"mmp,沒怎么看,不知道怎么形容",
"tags":["造數據", "真","難"]
}
PUT zhifou/doc/4
{
"name":"石頭",
"age":29,
"from":"gu",
"desc":"粗中有細,狐假虎威",
"tags":["粗", "大","猛"]
}
PUT zhifou/doc/5
{
"name":"魏行首",
"age":25,
"from":"廣雲台",
"desc":"仿佛兮若輕雲之蔽月,飄飄兮若流風之回雪,mmp,最后竟然沒有嫁給顧老二!",
"tags":["閉月","羞花"]
}
avg
The requirement is to find the average age of the people whose from is gu.
GET zhifou/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"_source": ["name", "age"]
}
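In the example above, the query first matches the documents whose from is gu. The average is then computed over those matches. This is where the aggregation comes in: its syntax is wrapped in aggs, and my_avg is a name we choose for the result, under which the computed average is returned. Which field is aggregated? age. Which metric is computed on it? avg, the average age.
The response is as follows: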
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "石頭",
"age" : 29
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"name" : "顧老二",
"age" : 30
}
},
{
"_index" : "zhifou",
"_type" : "doc",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"name" : "龍套偏房",
"age" : 22
}
}
]
},
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
}
In the response above, the average appears at the end of the result: 27.
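To make the order of operations concrete, here is a minimal in-memory sketch in Python of what the request above computes. It does not talk to Elasticsearch. The docs list mirrors the five sample records (only the fields used here), and avg_age is a hypothetical helper name introduced for illustration.

```python
# Sample documents mirroring the five records indexed earlier
# (only the fields needed for this illustration).
docs = [
    {"name": "顧老二", "age": 30, "from": "gu"},
    {"name": "大娘子", "age": 18, "from": "sheng"},
    {"name": "龍套偏房", "age": 22, "from": "gu"},
    {"name": "石頭", "age": 29, "from": "gu"},
    {"name": "魏行首", "age": 25, "from": "廣雲台"},
]


def avg_age(documents, origin):
    """Hypothetical helper: filter first (the query), then average (the aggregation)."""
    matched = [d for d in documents if d["from"] == origin]
    if not matched:
        return None  # nothing matched, so there is nothing to average
    return sum(d["age"] for d in matched) / len(matched)


my_avg = avg_age(docs, "gu")
print(my_avg)  # 27.0
```

The three matching ages are 30, 22, and 29, so the average is 81 / 3 = 27.0, the same value reported under my_avg in the response.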
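Although _source already filters the returned fields, that is still not enough. What if I don't want to see the matching documents at all, only the average? Don't forget size!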
GET zhifou/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
},
"size": 0,
"_source": ["name", "age"]
}
In the example above, we only add size to the original query. size sets how many hits are returned; setting it to 0 returns no hits.
The result is as follows:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_avg" : {
"value" : 27.0
}
}
}
In the result, total under hits is 3, which means three documents matched the query. The average, returned at the end, is 27.
max
So how do we find the maximum?
GET zhifou/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_max": {
"max": {
"field": "age"
}
}
},
"size": 0
}
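In the example above, we only replace avg with max in the aggregation (the result name is changed to my_max to match).
The response is as follows: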
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_max" : {
"value" : 30.0
}
}
}
The response shows that the maximum age is 30.
min
And how do we find the minimum?
GET zhifou/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_min": {
"min": {
"field": "age"
}
}
},
"size": 0
}
The minimum is computed with min.
The response is as follows:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_min" : {
"value" : 22.0
}
}
}
In the response, the minimum age is 22.
sum
And what if we want to know the sum of their ages?
GET zhifou/doc/_search
{
"query": {
"match": {
"from": "gu"
}
},
"aggs": {
"my_sum": {
"sum": {
"field": "age"
}
}
},
"size": 0
}
In the example above, the sum is computed with sum.
The response is as follows:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_sum" : {
"value" : 81.0
}
}
}
The response shows that the ages add up to 81.
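The max, min, and sum requests above differ only in the metric name, so their results can be cross-checked together. The following is a small Python sketch, not Elasticsearch code; ages_from_gu is an assumed variable holding the ages of the three documents whose from is gu.

```python
# Ages of the three documents matched by the query (from is gu).
ages_from_gu = [30, 22, 29]

# Each key mirrors the result name used in the corresponding request.
# Values are converted to float because the responses report them that way.
results = {
    "my_max": float(max(ages_from_gu)),
    "my_min": float(min(ages_from_gu)),
    "my_sum": float(sum(ages_from_gu)),
}

for name, value in results.items():
    print(name, value)
```

The printed values are 30.0, 22.0, and 81.0, matching the three responses.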
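Grouped queries
Now I want to look at everyone's age range, grouped into 15~20, 20~25, and 25~30, and compute the average age of each group.
Breaking the requirement down, we should first build the groups.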
GET zhifou/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
}
}
}
}
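In the example above, under the custom name age_group in aggs, range does the grouping. field says to group by age, and ranges lists the groups, each bounded by from and to. We define three groups to match the requirement.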
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"age_group" : {
"buckets" : [
{
"key" : "15.0-20.0",
"from" : 15.0,
"to" : 20.0,
"doc_count" : 1
},
{
"key" : "20.0-25.0",
"from" : 20.0,
"to" : 25.0,
"doc_count" : 1
},
{
"key" : "25.0-30.0",
"from" : 25.0,
"to" : 30.0,
"doc_count" : 2
}
]
}
}
}
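The response shows the three groups. doc_count is the number of documents in each group. The three groups together contain 4 documents. The remaining document has an age of 30 and falls outside every group, because each range includes its from value but excludes its to value.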
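The boundary behaviour is easy to get wrong, so here is a minimal Python sketch of the same bucketing done in memory. It assumes the half-open rule for each range (from included, to excluded); bucket_counts is a hypothetical helper name, and the key format simply imitates the keys seen in the response.

```python
# Ages of all five sample documents.
ages = [30, 18, 22, 29, 25]

# The three ranges from the request, as (from, to) pairs.
ranges = [(15, 20), (20, 25), (25, 30)]


def bucket_counts(values, bounds):
    """Hypothetical helper: count the values in each half-open range [lo, hi)."""
    counts = {}
    for lo, hi in bounds:
        key = f"{float(lo)}-{float(hi)}"  # e.g. "15.0-20.0"
        counts[key] = sum(1 for v in values if lo <= v < hi)
    return counts


counts = bucket_counts(ages, ranges)
print(counts)                 # {'15.0-20.0': 1, '20.0-25.0': 1, '25.0-30.0': 2}
print(sum(counts.values()))   # 4
```

Age 25 lands in the 25~30 group rather than 20~25, and age 30 lands nowhere, which is why only 4 of the 5 documents are counted. Under this rule, covering age 30 would need a range whose to is greater than 30.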
Next, we compute the average age within each group.
GET zhifou/doc/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"from": 15,
"to": 20
},
{
"from": 20,
"to": 25
},
{
"from": 25,
"to": 30
}
]
},
"aggs": {
"my_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
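In the example above, we nest another aggs inside the age_group aggregation, alongside range, to compute the average of age for each group.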
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"age_group" : {
"buckets" : [
{
"key" : "15.0-20.0",
"from" : 15.0,
"to" : 20.0,
"doc_count" : 1,
"my_avg" : {
"value" : 18.0
}
},
{
"key" : "20.0-25.0",
"from" : 20.0,
"to" : 25.0,
"doc_count" : 1,
"my_avg" : {
"value" : 22.0
}
},
{
"key" : "25.0-30.0",
"from" : 25.0,
"to" : 30.0,
"doc_count" : 2,
"my_avg" : {
"value" : 27.0
}
}
]
}
}
}
In the result, the average age of each group is clearly visible in the value of my_avg.
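The nested aggregation can be pictured as "group first, then run the metric inside each group". Below is a minimal Python sketch of that idea over the sample ages. It is an in-memory illustration, not Elasticsearch code; bucket_averages is a hypothetical helper name, and each range is assumed to include from and exclude to.

```python
# Ages of all five sample documents.
ages = [30, 18, 22, 29, 25]

# The three ranges from the request, as (from, to) pairs.
ranges = [(15, 20), (20, 25), (25, 30)]


def bucket_averages(values, bounds):
    """Hypothetical helper: group values into [lo, hi) ranges, then average each group."""
    buckets = []
    for lo, hi in bounds:
        members = [v for v in values if lo <= v < hi]
        avg = sum(members) / len(members) if members else None
        buckets.append({
            "key": f"{float(lo)}-{float(hi)}",
            "doc_count": len(members),
            "my_avg": avg,
        })
    return buckets


buckets = bucket_averages(ages, ranges)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["my_avg"])
```

The averages come out as 18.0, 22.0, and 27.0 (the last from ages 29 and 25), matching the my_avg values in the response.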
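Note: an aggregation always operates on the documents matched by the query; the query selects the documents first, and the aggregation is then computed over them.
Summary:
- avg: average
- max: maximum
- min: minimum
- sum: sum
Corrections are welcome. That's all.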