【ES】match_phrase and regexp

Reposted article · 2017-06-12 19:34 · elasticsearch

When I first started with ES, I couldn't tell match_phrase and regexp apart, so many queries returned results I didn't expect. This post sorts them out.

regexp: operates on individual terms

match_phrase: operates on the relative positions of multiple terms

What either query returns depends heavily on how the analyzer splits the text into terms.

For example, take two strings, "HELLO-world" and "hello.WORLD", stored in a field named title.

For "HELLO-world", consider the two queries below. The second one matches; the first does not.

{ "regexp": { "title": "hello-w.*" }} 
{ "match_phrase": { "title": "hello world" }}

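Running the text through the analyzer shows that HELLO-world is split into two terms, hello and world (the <ALPHANUM> token type is consistent with the default standard analyzer):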

GET /<index>/_analyze
{
    "field": "title",
    "text": "HELLO-world"
}
---------------------------
{
  "tokens" : [
    {
      "token" : "hello",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "world",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

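First, the terms contain no uppercase letters: the analyzer lowercases every token. Second, the "-" character is gone, because the tokenizer treats it as a word boundary and discards it.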
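To make the two effects above concrete, here is a toy Python approximation of what the analyzer did to the two strings in this article. It is not the real standard analyzer, which implements the full Unicode word-break rules. The function name `toy_analyze` and its splitting rule are assumptions for illustration only: letters and digits form tokens, any other character is a boundary, a "." survives only with a letter on each side, and everything is lowercased.

```python
import re

def toy_analyze(text):
    """Hypothetical stand-in for the standard analyzer (illustration only).

    Returns the lowercased tokens; a token's list index is its position.
    """
    tokens = []
    # Any character other than an ASCII letter, digit, or '.' is a boundary
    # and is dropped, which is what happens to the '-' in "HELLO-world".
    for chunk in re.split(r"[^A-Za-z0-9.]+", text):
        # A '.' is also a boundary unless it has a letter on both sides.
        for part in re.split(r"(?<![A-Za-z])\.|\.(?![A-Za-z])", chunk):
            if part:
                tokens.append(part.lower())  # the lowercasing step
    return tokens

print(toy_analyze("HELLO-world"))  # ['hello', 'world']
print(toy_analyze("hello.WORLD"))  # ['hello.world']
```

For these two inputs the toy reproduces the `_analyze` output shown in this article; for other inputs it may diverge from Elasticsearch.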

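regexp works on individual terms: the pattern is not analyzed, it is tested against each term on its own, and it must match the whole term. Neither hello nor world matches hello-w.*, so there is no match.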
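A minimal Python model of that per-term behaviour, assuming Python's `re` syntax agrees with Lucene's regexp syntax for simple patterns like these. `regexp_matches` is a hypothetical helper, and the second pattern, `w.*`, is an extra example not taken from the article:

```python
import re

def regexp_matches(terms, pattern):
    """Toy model of a regexp query over one document's indexed terms.

    The pattern is used as-is (it is not analyzed) and must match a whole
    term, so re.fullmatch is used rather than re.search.
    """
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(term) for term in terms)

# Terms from the _analyze output for "HELLO-world".
terms = ["hello", "world"]

print(regexp_matches(terms, r"hello-w.*"))  # False: no single term contains "hello-w"
print(regexp_matches(terms, r"w.*"))        # True: the term "world" matches on its own
```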

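match_phrase works on multiple terms. The query text "hello world" is itself analyzed into the two terms hello and world. Both terms appear among the terms of title, and their relative positions fit (world directly follows hello, at positions 0 and 1), so the query matches.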
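The position check can be sketched in a few lines of Python. This is a toy model of match_phrase with the default slop of 0, not Elasticsearch's implementation. `match_phrase` here is a hypothetical function; it assumes a term's list index is its position (no gaps from removed stopwords), and the reversed-order case is an extra example not from the article:

```python
def match_phrase(doc_terms, query_terms):
    """Toy model of match_phrase with slop 0.

    doc_terms: the field's terms; a term's list index is its position.
    query_terms: the query text after analysis.
    """
    if not query_terms:
        return False  # simplification: an empty phrase matches nothing
    n = len(query_terms)
    # Match if the query terms occupy consecutive positions, in order.
    return any(
        doc_terms[start:start + n] == query_terms
        for start in range(len(doc_terms) - n + 1)
    )

query = ["hello", "world"]  # "hello world" after analysis

print(match_phrase(["hello", "world"], query))  # True: positions 0 and 1
print(match_phrase(["hello.world"], query))     # False: neither term exists
print(match_phrase(["world", "hello"], query))  # False: wrong order
```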

Now look at "hello.WORLD":

{ "regexp": { "title": "hello\\.w.*" }} 
{ "match_phrase": { "title": "hello world" }}

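This time the first query matches and the second does not. (In the JSON body, \\ is an escaped backslash, so the pattern Lucene receives is hello\.w.*, where \. matches a literal period.)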

The tokenization explains why:

GET /<index>/_analyze
{
    "field": "title",
    "text": "hello.WORLD"
}
-------------------------------
{
  "tokens" : [
    {
      "token" : "hello.world",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

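Here is the gotcha: the "." does not split the text, so the whole string becomes the single term "hello.world". The standard tokenizer follows the Unicode word-boundary rules (UAX #29), under which a period between two letters does not end a word.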

match_phrase looks for the terms hello and world, finds neither, and so does not match.

regexp, by contrast, finds a term that satisfies the regular expression (hello.world), so it matches.
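Both steps on the regexp side, JSON unescaping and then a whole-term match, can be checked with a short Python sketch. It assumes Python's `re` treats this pattern the way Lucene's regexp engine does, and uses `re.fullmatch` to model Lucene's anchored matching. The "hello-world" string in the last check is a hypothetical extra input:

```python
import json
import re

# The regexp value exactly as it appears in the JSON query body.
json_string = r'"hello\\.w.*"'

# JSON decoding turns \\ into a single backslash.
pattern = json.loads(json_string)
print(pattern)  # hello\.w.*

# The pattern must match a whole term.
print(any(re.fullmatch(pattern, t) for t in ["hello.world"]))     # True
print(any(re.fullmatch(pattern, t) for t in ["hello", "world"]))  # False
# The escaped '.' is literal, so a different separator does not match.
print(re.fullmatch(pattern, "hello-world") is None)  # True
```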

How ES analyzes text is extremely important: it has a major impact on query results!
