Common ES Queries and Aggregations

Reposted on 2018-11-29.

Original article: http://blog.51cto.com/xpleaf/2307572

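Notes

Based on ES 5.4 and ES 5.6. These are the queries I use most often at work (where I actually go through the Java API). For the complete list, see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search.html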

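1. Queries

A quick start comes first; the queries listed after it are the ones I use most (in my working environment, at least). Queries I rarely use are left out.

1.1 Quick start

1.1.1 Match all documents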

GET index/type/_search
{
  "query": { "match_all": {} }
}

or, without a request body:

GET index/type/_search

1.1.2 Pagination (using term as the example)

GET index/type/_search
{
  "from": 0,
  "size": 100,
  "query": { "term": { "area": "GuangZhou" } }
}

1.1.3 Returning only selected fields (using term as the example)

GET index/type/_search
{
  "_source": ["hobby", "name"],
  "query": { "term": { "area": "GuangZhou" } }
}

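1.1.4 Sorting (using term as the example)

Sorting on one or more fields; when several are listed, each later field breaks the ties left by the earlier ones:

GET index/type/_search
{
  "query": { "term": { "area": "GuangZhou" } },
  "sort": [
    { "user_id": { "order": "asc" } },
    { "salary": { "order": "desc" } }
  ]
}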

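1.2 Full-text queries

The queried field is indexed and analyzed; before the query runs, the field's analyzer (or its search analyzer) is applied to the query string.

1.2.1 match query

{
  "query": {
    "match": {
      "content": { "query": "里皮恆大", "operator": "and" }
    }
  }
}

operator defaults to or. If "里皮恆大" is analyzed into the two terms "里皮" and "恆大", a document is returned as soon as content contains either one. With operator set to and, only documents that contain both terms are returned.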
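The difference between the two operator values can be mimicked in a few lines of Python. This is only an illustration of the matching rule, not how ES evaluates the query: the function name match_terms and the hand-written term lists are assumptions, and analysis is taken as already done.

```python
def match_terms(doc_terms, query_terms, operator="or"):
    """Mimic the match rule on already-analyzed terms (illustration only)."""
    doc_set = set(doc_terms)
    present = [term in doc_set for term in query_terms]
    # "or": any query term may appear; "and": every query term must appear.
    return all(present) if operator == "and" else any(present)


query_terms = ["里皮", "恆大"]      # assumed output of the analyzer
only_one = ["里皮", "回歸"]         # made-up document containing one term
both = ["恆大", "宣布", "里皮"]     # made-up document containing both terms

print(match_terms(only_one, query_terms))          # default "or"
print(match_terms(only_one, query_terms, "and"))
print(match_terms(both, query_terms, "and"))
```

The first and third calls report a match; the second does not, because "恆大" is missing from that document.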

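1.2.2 match_phrase query

A document is returned only if it meets both conditions:

(1) every term produced by analyzing the query string appears in the field
(2) the terms appear in the field in the same order as in the query (and, with the default slop of 0, directly next to each other)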
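The two conditions can be sketched in Python as a search for the query terms as one contiguous run. The function name phrase_matches and the term lists are assumptions for illustration; the sketch assumes the default slop of 0 and works on terms that have already been analyzed.

```python
def phrase_matches(doc_terms, query_terms):
    """True if query_terms occur in doc_terms adjacent and in the same order."""
    n = len(query_terms)
    return any(
        doc_terms[i:i + n] == query_terms
        for i in range(len(doc_terms) - n + 1)
    )


query_terms = ["里皮", "恆大"]
print(phrase_matches(["里皮", "恆大", "奪冠"], query_terms))   # in order, adjacent
print(phrase_matches(["恆大", "里皮"], query_terms))           # wrong order
print(phrase_matches(["里皮", "離開", "恆大"], query_terms))   # not adjacent
```

Only the first document matches. The other two contain both terms, so a match query with operator and would still find them.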

{
  "query": {
    "match_phrase": { "content": "里皮恆大" }
  }
}

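1.3 Term-level queries

Term-level queries match exactly against the terms stored in the inverted index. They are typically used for structured data such as numbers, dates and enumerated values.

1.3.1 term query

{
  "query": {
    "term": { "postdate": "2015-12-10 00:41:00" }
  }
}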

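1.3.2 terms query

An extended version of term: several values can be given for the field, such as the postdate field queried above.

{
  "query": {
    "terms": {
      "postdate": [ "2015-12-10 00:41:00", "2016-02-01 01:39:00" ]
    }
  }
}

The values inside [] are always combined with OR; there is no option to switch them to AND. For a field that holds a single value, AND would make no sense anyway, because one exact value cannot equal two different values at once. (For a field that holds an array of values, requiring several values together takes a bool query with one term clause per value under must.)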

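1.3.3 range query

Matches documents whose numeric, date or string field falls within a given range. Each range clause applies to a single field; it cannot span several fields.

Numeric:

{
  "query": {
    "range": {
      "reply": { "gte": 245, "lte": 250 }
    }
  }
}

The supported operators are:

gt: greater than, gte: greater than or equal to, lt: less than, lte: less than or equal to

Date:

{
  "query": {
    "range": {
      "postdate": {
        "gte": "2016-09-01 00:00:00",
        "lte": "2016-09-30 23:59:59",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

format can be omitted as long as the date strings already match the format defined in the field's mapping.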

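1.3.4 exists query

Returns documents in which the given field has at least one non-null value, that is, the field "has a value" (the tables below spell out what that means).

{
  "query": {
    "exists": { "field": "user" }
  }
}

The following is based on the explanation in 《從Lucene到Elasticsearch：全文檢索實戰》.

These documents match the query above:

Document	Explanation
{"user":"jane"}	user field present, value not null
{"user":""}	user field present; an empty string still counts as a value
{"user":"-"}	user field present, value not null
{"user":["jane"]}	user field present, value not null
{"user":["jane",null]}	user field present; one non-null value is enough

These documents do not match:

Document	Explanation
{"user":null}	user field present, but the value is null
{"user":[]}	user field present, but the array holds no values
{"user":[null]}	user field present, but the only value is null
{"foo":"bar"}	no user field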
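The rule behind the two tables can be written down as a small Python predicate. This is a sketch of the rule as the tables describe it, not of how ES implements exists; the name field_exists is an assumption, and JSON null is represented by Python's None.

```python
def field_exists(doc, field):
    """Mimic the exists rule: at least one non-null value in the field."""
    value = doc.get(field)                      # a missing field yields None
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)   # "" and "-" are not null


matching = [
    {"user": "jane"}, {"user": ""}, {"user": "-"},
    {"user": ["jane"]}, {"user": ["jane", None]},
]
not_matching = [{"user": None}, {"user": []}, {"user": [None]}, {"foo": "bar"}]

print(all(field_exists(d, "user") for d in matching))      # every row of table 1
print(any(field_exists(d, "user") for d in not_matching))  # any row of table 2
```

The first line prints True and the second False, which agrees with both tables row by row.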

1.3.5 ids query

Finds documents with the given ids.

{
  "query": {
    "ids": { "type": "news", "values": ["2101"] }
  }
}

type is optional, and several ids can be given in the values array.

{
  "query": {
    "ids": { "values": [ "2101", "2301" ] }
  }
}

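1.4 Compound queries

1.4.1 bool query

The projects I work on use ES for aggregation, statistics and classification, and they constantly need complex multi-condition queries, so in practice I use bool query a great deal. Because the number of conditions is not known in advance, my approach is to wrap everything in one large outer bool query. (In the project itself this is done through the Java API.)

bool query can combine any number of simpler queries. The clauses relate to one another as follows:

Clause	Meaning
must	The document must match the queries under must; equivalent to logical AND
should	The document may or may not match the queries under should; equivalent to logical OR
must_not	The opposite of must: documents matching these queries are not returned
filter	Like must, only documents matching the queries under filter are returned, but filter does not contribute to the score; it only filters

An example:

{
  "query": {
    "bool": {
      "must": { "match": { "content": "里皮" } },
      "must_not": { "match": { "content": "中超" } }
    }
  }
}

Note that within one bool, each of the keys must, must_not, should and filter can appear only once.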

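Suppose several must conditions are needed, for example matching both "里皮" and "恆大" while deliberately keeping the two keywords as separate clauses (a single must with one match and operator set to and would achieve the same thing). In that case, give must an array containing one match object per condition:

{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "里皮" } },
        { "match": { "content": "恆大" } }
      ]
    }
  },
  "sort": [ { "id": { "order": "desc" } } ]
}

The entries in the must array can themselves be bool queries, which allows more complex conditions.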
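When the number of conditions is only known at runtime, the array form of must is convenient to generate. Below is a minimal Python sketch that builds the same request body as a plain dict. The helper name bool_query and its parameter names are my own assumptions (filter_ carries a trailing underscore only to avoid shadowing Python's built-in filter); it is not the Java API used in the project.

```python
import json


def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Build a bool query, leaving out any clause that has no conditions."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {"bool": {name: list(qs) for name, qs in clauses.items() if qs}}


# Conditions collected at runtime; here, the two match clauses from above.
conditions = [{"match": {"content": word}} for word in ["里皮", "恆大"]]
body = {
    "size": 1,
    "query": bool_query(must=conditions),
    "sort": [{"id": {"order": "desc"}}],
}
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because bool_query returns an ordinary query object, its result can be placed inside another call, for example bool_query(must=[bool_query(should=conditions)]), to nest bool queries.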

Apart from size, the query above is equivalent to:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": { "query": "里皮恆大", "operator": "and" }
        }
      }
    }
  },
  "sort": [ { "id": { "order": "desc" } } ]
}

1.5 Nested queries

First create the following index:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested",
          "properties": {
            "first": { "type": "keyword" },
            "last": { "type": "keyword" }
          }
        },
        "group": { "type": "keyword" }
      }
    }
  }
}

Add some data:

PUT my_index/my_type/1
{
  "group": "GuangZhou",
  "user": [
    { "first": "John", "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}

PUT my_index/my_type/2
{
  "group": "QingYuan",
  "user": [
    { "first": "Li", "last": "Wang" },
    { "first": "Yonghao", "last": "Ye" }
  ]
}

Queries:

A simpler query:

{
  "query": {
    "nested": {
      "path": "user",
      "query": { "term": { "user.first": "John" } }
    }
  }
}

A more complex query:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "user",
            "query": { "term": { "user.first": { "value": "Li" } } }
          }
        },
        {
          "nested": {
            "path": "user",
            "query": { "term": { "user.last": { "value": "Wang" } } }
          }
        }
      ]
    }
  }
}

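1.6 Addendum: querying array fields

Create an index:

PUT my_index2
{
  "mappings": {
    "my_type2": {
      "properties": {
        "message": { "type": "text" },
        "keywords": { "type": "keyword" }
      }
    }
  }
}

Add some data:

PUT /my_index2/my_type2/1
{
  "message": "keywords test1",
  "keywords": ["美女", "動漫", "電影"]
}

PUT /my_index2/my_type2/2
{
  "message": "keywords test2",
  "keywords": ["電影", "美妝", "廣告"]
}

Search:

{
  "query": {
    "term": { "keywords": "廣告" }
  }
}

Note 1: Pay attention to the field type. keywords is mapped as keyword, so a term query matches it exactly. If it were mapped as text, the outcome would depend on the analyzer: with an analyzer that keeps "廣告" as one term, the search would still find it; with the default analyzer, which splits Chinese text into single characters, it would not. This is easy to overlook.

Note 2: Bucket aggregations also work on array fields. Each element of the array is grouped as a value of its own; the array is not grouped as a whole. The data above can be used to try this. The field type must not be text, though, or the aggregation fails.

Note 3: Following from the above, a plain array is a good fit for tag-like data such as this example: map the field as keyword rather than text, and search it with exact matching.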
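To make the second note concrete, here is a pure-Python imitation of how a terms bucket aggregation treats an array field. It only counts values in memory and does not talk to ES; the function name terms_buckets is an assumption, and the two documents are the ones indexed above.

```python
from collections import Counter

docs = [
    {"message": "keywords test1", "keywords": ["美女", "動漫", "電影"]},
    {"message": "keywords test2", "keywords": ["電影", "美妝", "廣告"]},
]


def terms_buckets(docs, field):
    """Count documents per distinct value, treating arrays element by element."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))   # a document counts once per distinct value
    return dict(counts)


buckets = terms_buckets(docs, "keywords")
print(buckets["電影"], buckets["廣告"], len(buckets))
```

"電影" appears in both documents and so gets a count of 2, while each of the other four tags gets a bucket of its own with a count of 1.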

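1.7 Scroll queries

Fetching something like 100,000 documents in a single request performs very poorly. The usual alternative is a scroll query, which retrieves the data batch by batch until everything has been fetched and processed. The scroll id returned by ES can be thought of as a handle for this particular search: each time it is sent back, ES returns the next batch, until the time window expires.

A scroll search fetches one batch, then the next, and so on until all the data has been returned. On the first request ES keeps a snapshot of the index as it was at that moment, and every later batch is served from that snapshot; changes made to the data in the meantime are not visible. Every scroll request also carries a scroll parameter that sets a time window. Each request only needs to complete within that window; the scroll id, and the snapshot behind it, are valid only for that long.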

GET spnews/news/_search?scroll=1m
{
  "query": { "match_all": {} },
  "size": 10,
  "_source": ["id"]
}

GET _search/scroll
{
  "scroll": "1m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAADShFmpBMjJJY2F2U242RFU5UlAzUzA4MWcAAAAAAAA0oBZqQTIySWNhdlNuNkRVOVJQM1MwODFnAAAAAAAANJ8WakEyMkljYXZTbjZEVTlSUDNTMDgxZw=="
}

2. Aggregations

2.1 Metric aggregations

These correspond to MySQL's aggregate functions.

max

{
  "size": 0,
  "aggs": {
    "max_id": { "max": { "field": "id" } }
  }
}

If size is not set to 0, the response contains the matching documents (10 by default) in addition to the aggregation result.

min

{
  "size": 0,
  "aggs": {
    "min_id": { "min": { "field": "id" } }
  }
}

avg

{
  "size": 0,
  "aggs": {
    "avg_id": { "avg": { "field": "id" } }
  }
}

sum

{
  "size": 0,
  "aggs": {
    "sum_id": { "sum": { "field": "id" } }
  }
}

stats

{
  "size": 0,
  "aggs": {
    "stats_id": { "stats": { "field": "id" } }
  }
}

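2.2 Bucket aggregations

These correspond to MySQL's group by. Do not try to run a bucket aggregation on a text field in ES; it fails (fielddata is disabled on text fields by default).

Terms

A grouped query: documents are bucketed by the value of a field.

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": { "size": 100, "field": "vtype", "min_doc_count": 1 }
    }
  }
}

Metric aggregations can be nested inside a bucket aggregation, much like running max, min, avg, sum or stats after a group by in MySQL:

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": { "field": "vtype" },
      "aggs": {
        "stats_follower": { "stats": { "field": "realFollowerCount" } }
      }
    }
  }
}

Filter

Equivalent to filtering rows with a where clause in MySQL and then running max, min, avg, sum or stats on the result.

{
  "size": 0,
  "aggs": {
    "gender_1_follower": {
      "filter": { "term": { "gender": 1 } },
      "aggs": {
        "stats_follower": { "stats": { "field": "realFollowerCount" } }
      }
    }
  }
}

The aggregation above computes the stats of realFollowerCount over the documents whose gender is 1.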
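For readers who think in terms of plain code rather than SQL, the Filter aggregation above amounts to "filter first, then summarize". The Python sketch below does that in memory. The three documents are made up for illustration, the helper name stats is an assumption, and the handling of an empty input is a simplification rather than a statement about what ES returns.

```python
def stats(values):
    """Return count, min, max, avg and sum, the five figures stats reports."""
    values = list(values)
    if not values:
        # Simplification: nothing to summarize.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


# Made-up sample documents.
docs = [
    {"gender": 1, "realFollowerCount": 100},
    {"gender": 2, "realFollowerCount": 400},
    {"gender": 1, "realFollowerCount": 300},
]

# The filter step (gender == 1) followed by the metric step (stats).
gender_1 = stats(d["realFollowerCount"] for d in docs if d["gender"] == 1)
print(gender_1)
```

For the sample data this yields a count of 2, a minimum of 100, a maximum of 300, an average of 200.0 and a sum of 400.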

Filters

Builds on Filter: several filters are given, and the metric aggregation is computed separately for the documents matched by each one.

{
  "size": 0,
  "aggs": {
    "gender_1_2_follower": {
      "filters": {
        "filters": [
          { "term": { "gender": 1 } },
          { "term": { "gender": 2 } }
        ]
      },
      "aggs": {
        "stats_follower": { "stats": { "field": "realFollowerCount" } }
      }
    }
  }
}

Range

{
  "size": 0,
  "aggs": {
    "follower_ranges": {
      "range": {
        "field": "realFollowerCount",
        "ranges": [
          { "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000, "to": 1500 },
          { "from": 1500, "to": 2000 },
          { "from": 2000 }
        ]
      }
    }
  }
}

to: less than (exclusive), from: greater than or equal to (inclusive)
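The inclusive from and exclusive to matter at the boundaries, where a value belongs to exactly one of two neighbouring ranges. A small Python sketch of the rule, using the same ranges as the request above (the function name matching_ranges is an assumption):

```python
RANGES = [
    {"to": 500},
    {"from": 500, "to": 1000},
    {"from": 1000, "to": 1500},
    {"from": 1500, "to": 2000},
    {"from": 2000},
]


def matching_ranges(value, ranges):
    """Return every range whose [from, to) interval contains value."""
    return [
        r for r in ranges
        if ("from" not in r or value >= r["from"])   # from is inclusive
        and ("to" not in r or value < r["to"])       # to is exclusive
    ]


print(matching_ranges(499, RANGES))
print(matching_ranges(500, RANGES))
print(matching_ranges(2000, RANGES))
```

A value of 499 falls into the open-ended first range, 500 falls into the 500 to 1000 range rather than the first one, and 2000 falls into the last range.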

Date Range

Works like Range above, except that the field is a date and the range boundaries are dates too.
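No request is shown for Date Range, so here is a hedged sketch of what one could look like, built as a Python dict. It assumes the date_range aggregation with its field, format and ranges parameters; the helper name date_range_agg, the aggregation name postdate_ranges, the postdate field and the two boundary dates are all assumptions borrowed from the earlier range query example.

```python
import json


def date_range_agg(field, boundaries, fmt="yyyy-MM-dd HH:mm:ss"):
    """Split the timeline into consecutive ranges at the sorted boundaries.

    Needs at least one boundary.
    """
    ranges = [{"to": boundaries[0]}]
    ranges += [{"from": a, "to": b} for a, b in zip(boundaries, boundaries[1:])]
    ranges.append({"from": boundaries[-1]})
    return {"date_range": {"field": field, "format": fmt, "ranges": ranges}}


body = {
    "size": 0,
    "aggs": {
        "postdate_ranges": date_range_agg(
            "postdate", ["2015-01-01 00:00:00", "2016-01-01 00:00:00"]
        )
    },
}
print(json.dumps(body, indent=2))
```

With two boundaries this produces three ranges: everything before 2015, the year 2015, and everything from 2016 onwards.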

Date Histogram Aggregation

This one is extremely useful: it groups data by year, month or day.
Index the following documents:

DELETE my_blog

PUT my_blog
{
  "mappings": {
    "article": {
      "properties": {
        "title": { "type": "text" },
        "postdate": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}

PUT my_blog/article/1
{
  "title": "Elasticsearch in Action",
  "postdate": "2014-09-23 23:34:12"
}

PUT my_blog/article/2
{
  "title": "Spark in Action",
  "postdate": "2015-09-13 14:12:22"
}

PUT my_blog/article/3
{
  "title": "Hadoop in Action",
  "postdate": "2016-08-23 23:12:22"
}

Aggregate by year:

GET my_blog/article/_search
{
  "size": 0,
  "aggs": {
    "agg_year": {
      "date_histogram": {
        "field": "postdate",
        "interval": "year",
        "order": { "_key": "asc" }
      }
    }
  }
}

Response:

{
  "took": 18,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": 0, "hits": [] },
  "aggregations": {
    "agg_year": {
      "buckets": [
        { "key_as_string": "2014-01-01 00:00:00", "key": 1388534400000, "doc_count": 1 },
        { "key_as_string": "2015-01-01 00:00:00", "key": 1420070400000, "doc_count": 1 },
        { "key_as_string": "2016-01-01 00:00:00", "key": 1451606400000, "doc_count": 1 }
      ]
    }
  }
}

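Aggregate by month:

GET my_blog/article/_search
{
  "size": 0,
  "aggs": {
    "agg_month": {
      "date_histogram": {
        "field": "postdate",
        "interval": "month",
        "order": { "_key": "asc" }
      }
    }
  }
}

With this aggregation, every month from the earliest to the latest document gets a bucket, whether or not it contains any documents (min_doc_count defaults to 0 for this aggregation).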
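To get a feel for how many empty buckets this produces, the Python sketch below imitates the bucketing for the three sample articles. It is an in-memory illustration only, and it assumes that gaps are filled from the earliest to the latest month that holds a document, which is my understanding of the default min_doc_count of 0; the function name month_buckets is an assumption.

```python
from datetime import datetime


def month_buckets(postdates, fmt="%Y-%m-%d %H:%M:%S"):
    """Count documents per month, keeping the empty months in between."""
    months = [
        (d.year, d.month)
        for d in (datetime.strptime(p, fmt) for p in postdates)
    ]
    year, month = min(months)
    last = max(months)
    buckets = {}
    while (year, month) <= last:
        buckets[f"{year:04d}-{month:02d}"] = months.count((year, month))
        # Step to the next calendar month, rolling over after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


postdates = ["2014-09-23 23:34:12", "2015-09-13 14:12:22", "2016-08-23 23:12:22"]
buckets = month_buckets(postdates)
print(len(buckets), sum(buckets.values()))
```

Under that assumption the three articles produce 24 monthly buckets, from 2014-09 to 2016-08, of which only 3 contain a document. Setting min_doc_count to 1 in the request is the usual way to leave the empty ones out.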

Aggregate by day:

GET my_blog/article/_search
{
  "size": 0,
  "aggs": {
    "agg_day": {
      "date_histogram": {
        "field": "postdate",
        "interval": "day",
        "order": { "_key": "asc" }
      }
    }
  }
}

With this aggregation, every day from the earliest to the latest document gets a bucket, whether or not it contains any documents.
