Simple ES queries

This article is a repost (see the original). 2018-06-01 18:39

Case 1

1. Search posts by user ID, hidden flag, post ID, and post date

(1) Insert some test post data

POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }

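// Note: with ES 5.6.3, adding this data may report an error (a version issue). Ignore it and run one of the queries below to check whether the documents can be found.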
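The _bulk body is newline-delimited JSON: each action line (here {"index": {"_id": ...}}) is followed by the document source on its own line, and every line, including the last, must end with a newline. As a minimal sketch using only the Python standard library, the body above could be generated like this. The function name build_bulk_body is hypothetical, and the HTTP call itself is left out.

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON _bulk body from (doc_id, source) pairs.

    Each document becomes two lines: an "index" action line and the
    document source. The body ends with a trailing newline, as _bulk requires.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

# The four test posts from the request above.
posts = [
    (1, {"articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": False, "postDate": "2017-01-01"}),
    (2, {"articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": False, "postDate": "2017-01-02"}),
    (3, {"articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": False, "postDate": "2017-01-01"}),
    (4, {"articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": True, "postDate": "2017-01-02"}),
]

body = build_bulk_body(posts)
print(body)
```

The body would be sent as POST /forum/article/_bulk. The ES docs list application/x-ndjson as the content type for this endpoint.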

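We start with just four fields. Because ES stores JSON documents, it is very extensible and flexible: if business needs grow and documents need more fields later, we can add them at any time. A relational database such as MySQL is different. Adding columns to an existing table is a real pain: you have to run complicated ALTER TABLE statements, and the change may also affect application code.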

GET /forum/_mapping/article

Response:
{
  "forum": {
    "mappings": {
      "article": {
        "properties": {
          "articleID": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "hidden": {
            "type": "boolean"
          },
          "postDate": {
            "type": "date"
          },
          "userID": {
            "type": "long"
          }
        }
      }
    }
  }
}

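In ES 5.2, a string field that is mapped dynamically gets type=text and two fields by default:
- The field itself, e.g. articleID, which is analyzed.
- field.keyword, e.g. articleID.keyword, which is not analyzed. It has "ignore_above": 256, so only values of up to 256 characters are indexed in it. Longer values are skipped, not truncated.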
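To make the ignore_above rule concrete, here is a toy Python model of what the keyword sub-field indexes for one string value. It only illustrates the rule. keyword_terms is a hypothetical helper, not an ES API, and it models just the keyword sub-field (the value still stays in _source and in the analyzed field).

```python
IGNORE_ABOVE = 256  # same value as "ignore_above": 256 in the mapping above

def keyword_terms(value, ignore_above=IGNORE_ABOVE):
    """Terms a keyword sub-field would index for one string (toy model).

    The whole string becomes a single term, unless it is longer than
    ignore_above characters, in which case nothing is indexed for it.
    """
    if len(value) > ignore_above:
        return []
    return [value]

short_id = "XHDK-A-1293-#fJ3"
long_id = "X" * 300

print(keyword_terms(short_id))  # -> ['XHDK-A-1293-#fJ3'] (one un-analyzed term)
print(keyword_terms(long_id))   # -> [] (skipped, not truncated)
```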

(2) Search posts by user ID

GET /forum/article/_search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "userID" : 1
                }
            }
        }
    }
}

term filter/query: the search text is not analyzed. It is looked up in the inverted index as-is, so whatever you type is exactly what gets matched.

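For example, if the search text were analyzed, "hello world" --> "hello" and "world", and each term would be looked up in the inverted index separately.
With term, "hello world" --> "hello world": the inverted index is searched for the single term "hello world".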

(3) Search posts that are not hidden

GET /forum/article/_search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "hidden" : false
                }
            }
        }
    }
}

(4) Search posts by post date

GET /forum/article/_search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "postDate" : "2017-01-01"
                }
            }
        }
    }
}

(5) Search posts by post ID

GET /forum/article/_search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "articleID" : "XHDK-A-1293-#fJ3"
                }
            }
        }
    }
}

Response (no hits):

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Searching the articleID.keyword sub-field instead:

GET /forum/article/_search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "articleID.keyword" : "XHDK-A-1293-#fJ3"
                }
            }
        }
    }
}

Response (one hit):

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01"
        }
      }
    ]
  }
}

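articleID.keyword is a sub-field that recent ES versions create automatically, and it is not analyzed. So each articleID value is indexed twice:
- In the field itself, where it is analyzed and the resulting terms go into the inverted index.
- In articleID.keyword, where it is not analyzed and the whole string (up to 256 characters) goes into the inverted index as a single term.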

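So for a term filter on a text field, you can use the built-in field.keyword sub-field. One catch: by default it only indexes values of up to 256 characters. Where possible, it is still better to create the mapping yourself and make the field not_analyzed. In recent ES versions you no longer specify not_analyzed; you simply set type=keyword.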

(6) Inspect the analysis

GET /forum/_analyze
{
  "field": "articleID",
  "text": "XHDK-A-1293-#fJ3"
}

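A text field is analyzed by default. When the inverted index is built, every articleID is split into terms. Afterwards the original articleID no longer exists in the inverted index; only the individual terms do.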

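A term query does not analyze the search text: XHDK-A-1293-#fJ3 --> XHDK-A-1293-#fJ3. But when articleID was indexed, XHDK-A-1293-#fJ3 --> xhdk, a, 1293, fj3. The exact string is not in the index, so the term query finds nothing.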
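To see why the first articleID query returned no hits, here is a rough Python approximation. The regex tokenizer is an assumption: it splits on any non-alphanumeric character and lowercases. That is enough to reproduce the analysis output above for this kind of ID, but it is not the real Unicode-aware standard analyzer. Both helper names are hypothetical.

```python
import re

def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer: split on anything that is
    not an ASCII letter or digit, then lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def term_matches(indexed_terms, query_value):
    """A term query does not analyze its input: the value must equal one
    of the indexed terms exactly."""
    return query_value in indexed_terms

article_id = "XHDK-A-1293-#fJ3"

text_field_terms = approx_standard_analyze(article_id)  # analyzed "articleID"
keyword_field_terms = [article_id]                        # "articleID.keyword"

print(text_field_terms)                                # -> ['xhdk', 'a', '1293', 'fj3']
print(term_matches(text_field_terms, article_id))     # -> False, like the first query
print(term_matches(keyword_field_terms, article_id))  # -> True, like the .keyword query
```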

(7) Rebuild the index

DELETE /forum
PUT /forum
{
  "mappings": {
    "article": {
      "properties": {
        "articleID": {
          "type": "keyword"
        }
      }
    }
  }
}
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }

(8) Search by post ID (and post date) again

GET /forum/article/_search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "articleID" : "XHDK-A-1293-#fJ3"
                }
            }
        }
    }
}

2. Summary

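(1) term filter: searches by exact value. Numbers, booleans, and dates support this out of the box.
(2) For strings, the field must be indexed as not_analyzed (type keyword in ES 5.x) for term queries to work.
(3) Equivalent to a single WHERE condition in SQL.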

Case 2

1. Search for posts whose post date is 2017-01-01 or whose post ID is XHDK-A-1293-#fJ3, where the post date must never be 2017-01-02

select *
from forum.article
where (post_date='2017-01-01' or article_id='XHDK-A-1293-#fJ3')
and post_date!='2017-01-02'

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {"term": { "postDate": "2017-01-01" }},
            {"term": {"articleID": "XHDK-A-1293-#fJ3"}}
          ],
          "must_not": {
            "term": {
              "postDate": "2017-01-02"
            }
          }
        }
      }
    }
  }
}

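must: must match. should: matching any one of them is enough. must_not: must not match. filter: must match, but does not contribute to the relevance score.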
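The clause semantics can be written as a plain predicate. Below is a minimal Python sketch over the four test posts, assuming articleID is an exact value as it is after the reindex. term and bool_matches are hypothetical helpers. The should handling is simplified to a single rule: when there is no must or filter clause, at least one should clause has to match, which is the case for this query.

```python
posts = [
    {"id": 1, "articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": False, "postDate": "2017-01-01"},
    {"id": 2, "articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": False, "postDate": "2017-01-02"},
    {"id": 3, "articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": False, "postDate": "2017-01-01"},
    {"id": 4, "articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": True, "postDate": "2017-01-02"},
]

def term(field, value):
    """Exact-value test, like a term filter on a non-analyzed field."""
    return lambda doc: doc.get(field) == value

def bool_matches(doc, must=(), should=(), must_not=(), filter_=()):
    """Toy bool semantics (simplified; ignores minimum_should_match)."""
    if not all(c(doc) for c in must):       # must: every clause must match
        return False
    if not all(c(doc) for c in filter_):    # filter: must match, no scoring
        return False
    if any(c(doc) for c in must_not):       # must_not: no clause may match
        return False
    if should and not must and not filter_:
        return any(c(doc) for c in should)  # should-only: at least one match
    return True                             # otherwise should is optional

hits = [
    d["id"] for d in posts
    if bool_matches(
        d,
        should=[term("postDate", "2017-01-01"), term("articleID", "XHDK-A-1293-#fJ3")],
        must_not=[term("postDate", "2017-01-02")],
    )
]
print(hits)  # -> [1, 3]
```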

2. Search for posts whose ID is XHDK-A-1293-#fJ3, or whose ID is JODL-X-1937-#pV7 and whose post date is 2017-01-01

select *
from forum.article
where article_id='XHDK-A-1293-#fJ3'
or (article_id='JODL-X-1937-#pV7' and post_date='2017-01-01')

GET /forum/article/_search 
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {
              "term": {
                "articleID": "XHDK-A-1293-#fJ3"
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term":{
                      "articleID": "JODL-X-1937-#pV7"
                    }
                  },
                  {
                    "term": {
                      "postDate": "2017-01-01"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
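Because a bool clause can itself be a bool query, evaluation is naturally recursive. The following Python sketch interprets a tiny subset of the DSL written as nested dicts: term queries, plus bool with must, should, and must_not. The dict mirrors the request above. evaluate and as_list are hypothetical helpers, not part of any client library. The code uses the same simplified rule that a bool with should clauses and no must requires at least one should match.

```python
posts = [
    {"id": 1, "articleID": "XHDK-A-1293-#fJ3", "postDate": "2017-01-01"},
    {"id": 2, "articleID": "KDKE-B-9947-#kL5", "postDate": "2017-01-02"},
    {"id": 3, "articleID": "JODL-X-1937-#pV7", "postDate": "2017-01-01"},
    {"id": 4, "articleID": "QQPX-R-3956-#aD8", "postDate": "2017-01-02"},
]

def as_list(clauses):
    # The DSL accepts either one clause object or a list of them.
    return clauses if isinstance(clauses, list) else [clauses]

def evaluate(query, doc):
    """Recursively evaluate a tiny DSL subset (term, bool) against one doc."""
    if "term" in query:
        field, value = next(iter(query["term"].items()))
        return doc.get(field) == value
    if "bool" in query:
        b = query["bool"]
        must = as_list(b.get("must", []))
        should = as_list(b.get("should", []))
        must_not = as_list(b.get("must_not", []))
        if not all(evaluate(q, doc) for q in must):
            return False
        if any(evaluate(q, doc) for q in must_not):
            return False
        if should and not must:
            # Simplified rule: with no must clause, at least one should must match.
            return any(evaluate(q, doc) for q in should)
        return True
    raise ValueError(f"unsupported query: {query}")

# Same shape as the nested request above.
query = {
    "bool": {
        "should": [
            {"term": {"articleID": "XHDK-A-1293-#fJ3"}},
            {"bool": {"must": [
                {"term": {"articleID": "JODL-X-1937-#pV7"}},
                {"term": {"postDate": "2017-01-01"}},
            ]}},
        ]
    }
}

hits = [d["id"] for d in posts if evaluate(query, d)]
print(hits)  # -> [1, 3]
```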

3. Summary

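(1) bool: must, must_not, and should combine several filter conditions.
(2) bool queries can be nested.
(3) Equivalent to combining several AND / OR conditions in SQL. Once you have learned the search syntax, you can reproduce many common SQL conditions.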

Case 3

term: {"field": "value"}
terms: {"field": ["value1", "value2"]}

terms corresponds to IN in SQL:

select * from tbl where col in ("value1", "value2")

1. Add a tag field to the posts

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"tag" : ["java", "hadoop"]} }
{ "update": { "_id": "2"} }
{ "doc" : {"tag" : ["java"]} }
{ "update": { "_id": "3"} }
{ "doc" : {"tag" : ["hadoop"]} }
{ "update": { "_id": "4"} }
{ "doc" : {"tag" : ["java", "elasticsearch"]} }

2. Search for posts whose articleID is KDKE-B-9947-#kL5 or QQPX-R-3956-#aD8, and for posts whose tags include java

GET /forum/article/_search 
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "articleID": [
            "KDKE-B-9947-#kL5",
            "QQPX-R-3956-#aD8"
          ]
        }
      }
    }
  }
}
GET /forum/article/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "terms" : { 
                    "tag" : ["java"]
                }
            }
        }
    }
}

Response to the tag query:

{
"took": 2,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ]
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ]
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ]
        }
      }
    ]
}
}
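The terms query matches a document when any of the document's values equals any of the listed values. That is also why an array field such as tag matches as soon as one element matches. Here is a toy Python version over the test data (articleID and tag only). terms_match is a hypothetical helper.

```python
posts = {
    1: {"articleID": "XHDK-A-1293-#fJ3", "tag": ["java", "hadoop"]},
    2: {"articleID": "KDKE-B-9947-#kL5", "tag": ["java"]},
    3: {"articleID": "JODL-X-1937-#pV7", "tag": ["hadoop"]},
    4: {"articleID": "QQPX-R-3956-#aD8", "tag": ["java", "elasticsearch"]},
}

def terms_match(doc, field, values):
    """Toy terms filter: true if any value of the field is in `values`.

    Single values and arrays are treated alike, since each array element
    is indexed as a separate value.
    """
    field_values = doc.get(field, [])
    if not isinstance(field_values, list):
        field_values = [field_values]
    return any(v in values for v in field_values)

by_id = sorted(i for i, d in posts.items()
               if terms_match(d, "articleID", ["KDKE-B-9947-#kL5", "QQPX-R-3956-#aD8"]))
by_tag = sorted(i for i, d in posts.items() if terms_match(d, "tag", ["java"]))
print(by_id)   # -> [2, 4]
print(by_tag)  # -> [1, 2, 4], the three hits in the response above
```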

3. Refine the results: find posts whose only tag is java

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"tag_cnt" : 2} }
{ "update": { "_id": "2"} }
{ "doc" : {"tag_cnt" : 1} }
{ "update": { "_id": "3"} }
{ "doc" : {"tag_cnt" : 1} }
{ "update": { "_id": "4"} }
{ "doc" : {"tag_cnt" : 2} }
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "tag_cnt": 1
              }
            },
            {
              "terms": {
                "tag": ["java"]
              }
            }
          ]
        }
      }
    }
  }
}

tag_cnt stores the number of tags. A post tagged ["java", "hadoop", "elasticsearch"], for example, would have tag_cnt 3.
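terms alone cannot express "contains only java", because one matching element is enough. Also requiring tag_cnt == 1 turns it into an exact match. Here is a toy Python check of the bool query above. only_tag is a hypothetical helper. As in the _bulk updates above, tag_cnt is maintained by whoever writes the documents; ES does not compute it here.

```python
posts = {
    1: {"tag": ["java", "hadoop"], "tag_cnt": 2},
    2: {"tag": ["java"], "tag_cnt": 1},
    3: {"tag": ["hadoop"], "tag_cnt": 1},
    4: {"tag": ["java", "elasticsearch"], "tag_cnt": 2},
}

def only_tag(doc, wanted):
    # bool must: term tag_cnt == 1 AND terms tag contains one of `wanted`
    return doc["tag_cnt"] == 1 and any(t in wanted for t in doc["tag"])

hits = [i for i, d in posts.items() if only_tag(d, ["java"])]
print(hits)  # -> [2]
```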

4. Summary

(1) terms: multi-value search.
(2) Refining the results of a terms search (here with the tag_cnt count field).
(3) Equivalent to IN in SQL.

Case 4

1. Add a view count field to the posts

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"view_cnt" : 30} }
{ "update": { "_id": "2"} }
{ "doc" : {"view_cnt" : 50} }
{ "update": { "_id": "3"} }
{ "doc" : {"view_cnt" : 100} }
{ "update": { "_id": "4"} }
{ "doc" : {"view_cnt" : 80} }

2. Search for posts with between 30 and 60 views (both bounds excluded)

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "view_cnt": {
            "gt": 30,
            "lt": 60
          }
        }
      }
    }
  }
}

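gt and lt exclude the bounds. To include them, use gte (greater than or equal) and lte (less than or equal).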
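A toy Python range check over the view counts above shows the difference between exclusive and inclusive bounds. in_range is a hypothetical helper, not an ES API.

```python
view_cnt = {1: 30, 2: 50, 3: 100, 4: 80}

def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Toy range filter: every bound that is given must hold."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True

exclusive = [i for i, v in view_cnt.items() if in_range(v, gt=30, lt=60)]
inclusive = [i for i, v in view_cnt.items() if in_range(v, gte=30, lte=60)]
print(exclusive)  # -> [2]     (30 is excluded by gt)
print(inclusive)  # -> [1, 2]  (30 is included by gte)
```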

3. Search for posts published in the last month

POST /forum/article/_bulk
{ "index": { "_id": 5 }}
{ "articleID" : "DHJK-B-1395-#Ky5", "userID" : 3, "hidden": false, "postDate": "2017-03-01", "tag": ["elasticsearch"], "tag_cnt": 1, "view_cnt": 10 }
GET /forum/article/_search 
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "postDate": {
            "gt": "2017-03-10||-30d"
          }
        }
      }
    }
  }
}
GET /forum/article/_search 
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "postDate": {
            "gt": "now-30d"
          }
        }
      }
    }
  }
}
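"2017-03-10||-30d" is ES date math: an anchor date, then ||, then an offset. It resolves to 2017-03-10 minus 30 days. "now-30d" applies the same offset to the current time. The Python check below uses datetime and only mirrors the arithmetic, not ES's date-math parser.

```python
from datetime import date, timedelta

anchor = date(2017, 3, 10)
lower_bound = anchor - timedelta(days=30)  # "2017-03-10||-30d"
print(lower_bound)  # -> 2017-02-08

post_dates = {1: "2017-01-01", 2: "2017-01-02", 3: "2017-01-01",
              4: "2017-01-02", 5: "2017-03-01"}
# "gt": strictly after the lower bound
hits = [i for i, d in post_dates.items() if date.fromisoformat(d) > lower_bound]
print(hits)  # -> [5]
```

With "now-30d" the result depends on when the query runs. Against this 2017 test data it would match nothing if run today.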

4. Summary

(1) range: like BETWEEN in SQL, or conditions such as >= and <=.
(2) range performs range filtering.

Case 5

1. Add a title field to the posts

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"title" : "this is java and elasticsearch blog"} }
{ "update": { "_id": "2"} }
{ "doc" : {"title" : "this is java blog"} }
{ "update": { "_id": "3"} }
{ "doc" : {"title" : "this is elasticsearch blog"} }
{ "update": { "_id": "4"} }
{ "doc" : {"title" : "this is java, elasticsearch, hadoop blog"} }
{ "update": { "_id": "5"} }
{ "doc" : {"title" : "this is spark blog"} }

2. Search for blogs whose title contains java or elasticsearch

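This differs from the earlier term queries: instead of searching for an exact value, it performs full-text search.
The match query is responsible for full-text search. If the field being searched is not_analyzed (keyword), a match query behaves just like a term query.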

GET /forum/article/_search
{
    "query": {
        "match": {
            "title": "java elasticsearch"
        }
    }
}

3. Search for blogs whose title contains both java and elasticsearch

The first step in controlling precision is the and operator. If all search keywords must match, set operator to and. This gives a result that a plain match query cannot.

GET /forum/article/_search
{
    "query": {
        "match": {
            "title": {
                "query": "java elasticsearch",
                "operator": "and"
            }
        }
    }
}
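A toy Python model of the two match variants over the five titles. The analyzer is an assumption: a crude stand-in that lowercases and keeps runs of ASCII letters and digits, which is enough for these titles. toy_match is a hypothetical helper, and there is no scoring here, only which documents match.

```python
import re

titles = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}

def analyze(text):
    # Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return set(re.findall(r"[0-9a-z]+", text.lower()))

def toy_match(text, query, operator="or"):
    """Toy match query: analyze both sides, then OR / AND the query terms."""
    doc_terms = analyze(text)
    query_terms = analyze(query)
    if operator == "and":
        return query_terms <= doc_terms    # every query term present
    return bool(query_terms & doc_terms)   # at least one query term present

or_hits = [i for i, t in titles.items() if toy_match(t, "java elasticsearch")]
and_hits = [i for i, t in titles.items() if toy_match(t, "java elasticsearch", operator="and")]
print(or_hits)   # -> [1, 2, 3, 4]
print(and_hits)  # -> [1, 4]
```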

4. Search for blogs containing at least 3 of the 4 keywords java, elasticsearch, spark, hadoop

The second step in controlling precision is to require that a minimum number of the keywords match before a document is returned.

GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": "75%"
      }
    }
  }
}
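For a percentage, ES computes that share of the optional clauses and rounds down, so "75%" of 4 keywords means 3 must match. The Python sketch below models that rule and applies it to the titles. The analyzer is the same crude assumption as in a plain match (lowercase, split on non-alphanumerics), and required_matches and match_min are hypothetical helpers.

```python
import re

titles = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}

def analyze(text):
    # Crude stand-in for the standard analyzer.
    return set(re.findall(r"[0-9a-z]+", text.lower()))

def required_matches(num_terms, spec):
    """Resolve minimum_should_match given as an int or a positive "N%".

    Percentages are rounded down (integer division), e.g. "75%" of 4 -> 3.
    """
    if isinstance(spec, str) and spec.endswith("%"):
        return num_terms * int(spec[:-1]) // 100
    return int(spec)

def match_min(text, query, spec):
    query_terms = analyze(query)
    needed = required_matches(len(query_terms), spec)
    return len(query_terms & analyze(text)) >= needed

hits = [i for i, t in titles.items()
        if match_min(t, "java elasticsearch spark hadoop", "75%")]
print(required_matches(4, "75%"))  # -> 3
print(hits)                        # -> [4]
```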

5. Combine several conditions on title with bool

GET /forum/article/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "java" }},
      "must_not": { "match": { "title": "spark"  }},
      "should": [
                  { "match": { "title": "hadoop" }},
                  { "match": { "title": "elasticsearch"   }}
      ]
    }
  }
}

6. How the relevance score is computed when bool combines several conditions

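Roughly: the scores of the matching must and should clauses are added together, then divided by the total number of must and should clauses.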

Rank 1: contains java and all the should keywords, hadoop and elasticsearch
Rank 2: contains java and the should keyword elasticsearch
Rank 3: contains java but none of the should keywords

should clauses affect the relevance score.

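must ensures that a document contains the keyword, and the must condition also contributes to the document's relevance score.
On top of must, should clauses are optional. But the more of them a document matches, the higher its relevance score.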
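The ordering can be reproduced with a deliberately crude model that counts how many should clauses each matching document satisfies. This is a strong simplification and an assumption, not how ES scores: real scores come from BM25 and depend on term statistics and field length. It only shows why more should matches means a higher rank. It yields 4, 1, 2, the same order as the response in this section. toy_bool_rank is a hypothetical helper.

```python
import re

titles = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}

def terms(text):
    # Crude stand-in for the standard analyzer.
    return set(re.findall(r"[0-9a-z]+", text.lower()))

def toy_bool_rank(must, must_not, should):
    """Filter by must / must_not, then rank by number of should matches."""
    results = []
    for doc_id, title in titles.items():
        t = terms(title)
        if not all(w in t for w in must) or any(w in t for w in must_not):
            continue
        results.append((sum(w in t for w in should), doc_id))
    # More should matches first; ties broken by doc id for determinism.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]

ranking = toy_bool_rank(must=["java"], must_not=["spark"],
                        should=["hadoop", "elasticsearch"])
print(ranking)  # -> [4, 1, 2]
```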

{
"took": 6,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 3,
    "max_score": 1.3375794,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1.3375794,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.53484553,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.19856805,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog"
        }
      }
    ]
}
}

7. Search for java, hadoop, spark, elasticsearch, requiring at least 3 of the keywords

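By default, a document does not have to match any should clause. In the search above, for example, "this is java blog" matches none of the should conditions.
There is one exception: if there is no must clause, at least one should clause has to match.
In the search below, for example, there are 4 should clauses, and by default matching any one of them is enough for a document to be returned.

But you can control precisely how many of the 4 should clauses must match before a document is returned.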

GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "java" }},
        { "match": { "title": "elasticsearch"   }},
        { "match": { "title": "hadoop"   }},
        { "match": { "title": "spark"   }}
      ],
      "minimum_should_match": 3 
    }
  }
}

Summary

1. Full-text search on several values can be done in two ways: a match query, or bool with should clauses.
2. Controlling precision: the and operator, and minimum_should_match.

Case 6

1. How a plain match is converted to term + should

{
    "match": { "title": "java elasticsearch"}
}

When a match query like the one above searches for several values, ES internally converts it into a bool query.

bool should, with one term query per search term:

{
    "bool": {
        "should": [
            { "term": { "title": "java" }},
            { "term": { "title": "elasticsearch" }}
        ]
    }
}

2. How an and match is converted to term + must

{
    "match": {
        "title": {
            "query":    "java elasticsearch",
            "operator": "and"
        }
    }
}

{
    "bool": {
        "must": [
            { "term": { "title": "java" }},
            { "term": { "title": "elasticsearch" }}
        ]
    }
}

3. How minimum_should_match is converted

{
    "match": {
        "title": {
            "query":                "java elasticsearch hadoop spark",
            "minimum_should_match": "75%"
        }
    }
}

{
    "bool": {
        "should": [
            { "term": { "title": "java" }},
            { "term": { "title": "elasticsearch" }},
            { "term": { "title": "hadoop" }},
            { "term": { "title": "spark" }}
        ],
        "minimum_should_match": 3
    }
}

Why cover both ways of doing multi-value search? Because under the hood they are the same thing: match query --> bool + term.
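The three conversions above can be expressed as one small rewrite function. This Python sketch builds the bool/term dicts shown above from a match query's parameters. Assumptions: rewrite_match and analyze are hypothetical, and the "analyzer" simply lowercases and splits on non-alphanumerics, whereas ES uses the field's configured analyzer. Percentages are rounded down, as ES does for minimum_should_match.

```python
import re

def analyze(text):
    # Crude stand-in for the field's analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[0-9a-z]+", text.lower())

def rewrite_match(field, query, operator="or", minimum_should_match=None):
    """Toy rewrite of a match query into bool + term clauses."""
    clauses = [{"term": {field: tok}} for tok in analyze(query)]
    if operator == "and":
        return {"bool": {"must": clauses}}          # and match -> term + must
    bool_query = {"should": clauses}                # plain match -> term + should
    if minimum_should_match is not None:
        spec = minimum_should_match
        if isinstance(spec, str) and spec.endswith("%"):
            spec = len(clauses) * int(spec[:-1]) // 100  # "75%" of 4 -> 3
        bool_query["minimum_should_match"] = int(spec)
    return {"bool": bool_query}

print(rewrite_match("title", "java elasticsearch hadoop spark",
                    minimum_should_match="75%"))
```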

Case 7

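Requirement: search for posts whose title contains java. If the title also contains hadoop or elasticsearch, rank those posts higher. And if one post contains java and hadoop while another contains java and elasticsearch, the hadoop post should rank above the elasticsearch one.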

Key concept: boost, the weight of a search condition. Raising a condition's boost means that documents matching it get a higher relevance score than documents matching a condition with a lower weight, so they are returned first.

By default every search condition has the same weight, 1.

GET /forum/article/_search 
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "java"
          }
        }
      ],
      "should": [
        {
          "match": {
            "title": {
              "query": "hadoop",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "elasticsearch"
            }
          }
        }
      ]
    }
  }
}
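Here is a toy model of how boost shifts the ranking for the requirement stated at the start of this case. Each matched should clause contributes its boost (1 by default), and hadoop gets a larger boost than elasticsearch. This is an assumption-laden simplification: in ES the boost multiplies a clause's BM25-based score rather than replacing it, so only the ordering idea carries over. toy_boosted_rank and the two extra titles in `pair` are hypothetical.

```python
import re

def terms(text):
    # Crude stand-in for the standard analyzer.
    return set(re.findall(r"[0-9a-z]+", text.lower()))

def toy_boosted_rank(titles, must, should_boosts):
    """Keep docs matching all `must` words; score = sum of matched should boosts."""
    scored = []
    for doc_id, title in titles.items():
        t = terms(title)
        if not all(w in t for w in must):
            continue
        score = sum(boost for w, boost in should_boosts.items() if w in t)
        scored.append((score, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first
    return [doc_id for _, doc_id in scored]

boosts = {"hadoop": 5, "elasticsearch": 1}

# Two hypothetical posts (not in the test data): hadoop should outrank elasticsearch.
pair = {"a": "this is java elasticsearch blog", "b": "this is java hadoop blog"}
print(toy_boosted_rank(pair, ["java"], boosts))  # -> ['b', 'a']

forum_titles = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}
print(toy_boosted_rank(forum_titles, ["java"], boosts))  # -> [4, 1, 2]
```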

