elasticsearch之聚合函數


elasticsearch之聚合函數

前言

聚合函數大家都不陌生,elasticsearch中也沒玩出新花樣,所以,這一章相對簡單,只需要記得:

  • avg
  • max
  • min
  • sum

以及各自的用法即可。先來看求平均。

准備數據

PUT zhifou/doc/1
{
  "name":"顧老二",
  "age":30,
  "from": "gu",
  "desc": "皮膚黑、武器長、性格直",
  "tags": ["黑", "長", "直"]
}

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"膚白貌美,嬌憨可愛",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龍套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp,沒怎么看,不知道怎么形容",
  "tags":["造數據", "真","難"]
}


PUT zhifou/doc/4
{
  "name":"石頭",
  "age":29,
  "from":"gu",
  "desc":"粗中有細,狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"廣雲台",
  "desc":"仿佛兮若輕雲之蔽月,飄飄兮若流風之回雪,mmp,最后竟然沒有嫁給顧老二!",
  "tags":["閉月","羞花"]
}

avg

現在的需求是查詢fromgu的人的平均年齡。

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

上例中,首先匹配查詢fromgu的數據。在此基礎上做查詢平均值的操作,這里就用到了聚合函數,其語法被封裝在aggs中,而my_avg則是為查詢結果起個別名,封裝了計算出的平均值。那么,要以什么屬性作為條件呢?是age年齡,查年齡的什么呢?是avg,查平均年齡。

返回結果如下:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石頭",
          "age" : 29
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顧老二",
          "age" : 30
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龍套偏房",
          "age" : 22
        }
      }
    ]
  },
  "aggregations" : {
    "my_avg" : {
      "value" : 27.0
    }
  }
}

上例中,在查詢結果的最后是平均值信息,可以看到是27歲。

雖然我們已經使用_source對字段做了過濾,但是還不夠。我不想看都有哪些數據,只想看平均值怎么辦?別忘了size!

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "size": 0, 
  "_source": ["name", "age"]
}

上例中,只需要在原來的查詢基礎上,增加一個size就可以了,輸出幾條結果,我們寫上0,就是輸出0條查詢結果。

查詢結果如下:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_avg" : {
      "value" : 27.0
    }
  }
}

查詢結果中,我們看hits下的total值是3,說明有三條符合結果的數據。最后面返回平均值是27。

max

那怎么查最大值呢?

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_max": {
      "max": {
        "field": "age"
      }
    }
  },
  "size": 0
}

上例中,只需要在查詢條件中將avg替換成max即可。

返回結果如下:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_max" : {
      "value" : 30.0
    }
  }
}

在返回的結果中,可以看到年齡最大的是30歲。

min

那怎么查最小值呢?

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_min": {
      "min": {
        "field": "age"
      }
    }
  },
  "size": 0
}

最小值則用min表示。

返回結果如下:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_min" : {
      "value" : 22.0
    }
  }
}

返回結果中,年齡最小的是22歲。

sum

那么,要是想知道它們的年齡總和是多少怎么辦呢?

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": {
        "field": "age"
      }
    }
  },
  "size": 0
}

上例中,求和用sum表示。

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_sum" : {
      "value" : 81.0
    }
  }
}

從返回的結果可以發現,年齡總和是81歲。

分組查詢

現在我想要查詢所有人的年齡段,並且按照15~20,20~25,25~30分組,並且算出每組的平均年齡。

分析需求,首先我們應該先把分組做出來。

GET zhifou/doc/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      }
    }
  }
}

上例中,在aggs的自定義別名age_group中,使用range來做分組,field是以age為分組,分組使用ranges來做,fromto是范圍,我們根據需求做出三組。

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_group" : {
      "buckets" : [
        {
          "key" : "15.0-20.0",
          "from" : 15.0,
          "to" : 20.0,
          "doc_count" : 1
        },
        {
          "key" : "20.0-25.0",
          "from" : 20.0,
          "to" : 25.0,
          "doc_count" : 1
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 2
        }
      ]
    }
  }
}

返回的結果中可以看到,已經拿到了三個分組。doc_count為該組內有幾條數據,此次共分為三組,查詢出4條內容。還有一條數據的age屬性值是30,不在分組的范圍內!

那么接下來,我們就要對每個小組內的數據做平均年齡處理。

GET zhifou/doc/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

上例中,在分組下面,我們使用aggsage做平均數處理,這樣就可以了。

{
 "took" : 1,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "skipped" : 0,
   "failed" : 0
 },
 "hits" : {
   "total" : 5,
   "max_score" : 0.0,
   "hits" : [ ]
 },
 "aggregations" : {
   "age_group" : {
     "buckets" : [
       {
         "key" : "15.0-20.0",
         "from" : 15.0,
         "to" : 20.0,
         "doc_count" : 1,
         "my_avg" : {
           "value" : 18.0
         }
       },
       {
         "key" : "20.0-25.0",
         "from" : 20.0,
         "to" : 25.0,
         "doc_count" : 1,
         "my_avg" : {
           "value" : 22.0
         }
       },
       {
         "key" : "25.0-30.0",
         "from" : 25.0,
         "to" : 30.0,
         "doc_count" : 2,
         "my_avg" : {
           "value" : 27.0
         }
       }
     ]
   }
 }
}

在結果中,我們可以清晰的看到每組的平均年齡(my_avgvalue中)。

注意:聚合函數的使用,一定是先查出結果,然后對結果使用聚合函數做處理

小結:

  • avg:求平均
  • max:最大值
  • min:最小值
  • sum:求和

歡迎斧正,that's all

 
 
 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM