Understanding term query and match query in Elasticsearch 5.x

This article is a repost of the original, posted 2017-08-28 16:00.

http://blog.csdn.net/yangwenbo214/article/details/54142786

I. Basics

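Preface: term query and match query involve quite a few concepts, such as analyzers, mappings and the inverted index. Using an example from the official documentation, I describe my understanding of them here.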

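In ES 5.x the string type was split into text and keyword. A text field is analyzed: the whole string is broken into individual terms according to the analyzer's rules (with the default analyzer, those terms are lowercased). A keyword field is not analyzed, much like a not_analyzed string in ES 2.3.
When string data is PUT into Elasticsearch without an explicit mapping, the field is mapped as text by default.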

NOTE: The default analyzer is the standard analyzer. "Quick Brown Fox!" is broken into [quick, brown, fox], and those terms are written to the inverted index.
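To make the NOTE concrete, the sketch below approximates the standard analyzer in Python. It is an assumption-laden stand-in, not Elasticsearch code: the real standard tokenizer follows Unicode word-boundary rules, whereas standard_analyze (a hypothetical name) simply takes runs of word characters and lowercases them, which is enough for the short ASCII strings used in this article.

```python
import re
from typing import List


def standard_analyze(text: str) -> List[str]:
    """Hypothetical approximation of the standard analyzer.

    Splits the input into runs of word characters and lowercases each one.
    The real standard tokenizer uses Unicode word-boundary rules; this is
    only close enough for simple ASCII examples like the ones in this article.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


print(standard_analyze("Quick Brown Fox!"))  # ['quick', 'brown', 'fox']
```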

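A term query looks for the exact term in the inverted index; it knows nothing about the analyzer. This kind of query suits keyword, numeric and date fields.
A match query is aware of the analyzer and knows how the field was analyzed.
In summary:
- a term query searches the inverted index for the exact term it is given;
- a match query first analyzes the query text (with the field's analyzer) and then searches the inverted index for the resulting terms.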
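The two rules above can be modeled with a tiny in-memory inverted index. This is a toy sketch under stated assumptions, not how Elasticsearch or Lucene implement these queries: analyze, build_index, term_query and match_query are hypothetical names, the analyzer is a rough regex approximation of the standard analyzer, and relevance scoring is ignored, so only which documents match is modeled.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    # Hypothetical approximation of the standard analyzer: word runs, lowercased.
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Inverted index for an analyzed (text) field: term -> ids of docs containing it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return dict(index)


def term_query(index: Dict[str, Set[str]], value: str) -> Set[str]:
    # term: the value is looked up verbatim; no analysis is applied.
    return set(index.get(value, set()))


def match_query(index: Dict[str, Set[str]], query_text: str) -> Set[str]:
    # match: analyze the query text first, then combine per-term hits with OR,
    # mirroring the match query's default "or" operator.
    hits: Set[str] = set()
    for term in analyze(query_text):
        hits |= index.get(term, set())
    return hits


index = build_index({"a": "Quick Brown Fox!"})
print(term_query(index, "Quick"))   # set()
print(term_query(index, "quick"))   # {'a'}
print(match_query(index, "Quick"))  # {'a'}
```

The only difference between the two query functions is whether the input passes through analyze before the lookup, which is the distinction the rest of this article tests.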

II. Test (1)

Prepare the data:
POST /termtest/termtype/1
{
  "content":"Name"
}


POST /termtest/termtype/2
{
  "content":"name city"
}

Check that the data was indexed:
GET /termtest/_search
{
  "query":
  {
    "match_all": {}
  }
}


Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "2",
        "_score": 1,
        "_source": {
          "content": "name city"
        }
      },
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "1",
        "_score": 1,
        "_source": {
          "content": "Name"
        }
      }
    ]
  }
}


As shown above, both documents were indexed. The content field here is of type text, which means its value was analyzed by default.
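Assuming the same kind of regex approximation of the standard analyzer (a simplification, not Elasticsearch's implementation), the inverted index for the content field after indexing the two documents looks roughly like this. The capitalized Name from document 1 survives only as the lowercase term name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    # Hypothetical approximation of the standard analyzer: word runs, lowercased.
    return [token.lower() for token in re.findall(r"\w+", text)]


# The two documents indexed above, keyed by _id.
docs = {"1": "Name", "2": "name city"}

# Inverted index of the analyzed "content" field: term -> set of _ids.
inverted: Dict[str, Set[str]] = defaultdict(set)
for doc_id, content in docs.items():
    for term in analyze(content):
        inverted[term].add(doc_id)

for term in sorted(inverted):
    print(term, sorted(inverted[term]))
# city ['2']
# name ['1', '2']
```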

Run the following query:
POST /termtest/_search
{
  "query":{
    "term":{
      "content":"Name"
    }
  }
}




Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}




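Analysis: content was analyzed by the default standard analyzer, which lowercased every letter before the terms were stored in the inverted index for searching. A term query is an exact lookup, so it needs the term Name, with a capital N, to exist in the index. It does not, so the result is empty.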
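The empty result can be reproduced with a toy model. The analyzer is again a hypothetical regex approximation of the standard analyzer, and term_query is an illustrative helper, not an Elasticsearch API: it checks the unanalyzed value against the terms indexed for each document. The second call is an extrapolation from the same logic rather than a query run in this article; it shows that the lowercase name is the term actually stored.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    # Hypothetical approximation of the standard analyzer: word runs, lowercased.
    return [token.lower() for token in re.findall(r"\w+", text)]


docs = {"1": "Name", "2": "name city"}


def term_query(value: str) -> List[str]:
    # A term query compares the value, unanalyzed, against each doc's indexed terms.
    return sorted(doc_id for doc_id, content in docs.items()
                  if value in analyze(content))


print(term_query("Name"))  # []
print(term_query("name"))  # ['1', '2']
```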

POST /termtest/_search
{
  "query":{
    "match":{
      "content":"Name"
    }
  }
}


Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "content": "Name"
        }
      },
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "content": "name city"
        }
      }
    ]
  }
}




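Analysis: reason (1): content was analyzed by the default standard analyzer, which lowercased every letter before the terms were stored in the inverted index for searching.
Reason (2): the match query first analyzes its query text, turning "Name" into the term name, and then matches that term against the terms in the inverted index. Both documents contain name, so both are returned.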
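The two steps in this analysis can be sketched the same way. This is a simplified model with a hypothetical match_query helper: it approximates the standard analyzer with a regex, ignores scoring, and treats a document as matching if it shares any term with the analyzed query, which mirrors the match query's default or operator.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    # Hypothetical approximation of the standard analyzer: word runs, lowercased.
    return [token.lower() for token in re.findall(r"\w+", text)]


docs = {"1": "Name", "2": "name city"}


def match_query(query_text: str) -> List[str]:
    # Step 1: analyze the query text with the same analyzer as the field.
    query_terms = set(analyze(query_text))
    # Step 2: a document matches if it shares any term with the query (default OR).
    return sorted(doc_id for doc_id, content in docs.items()
                  if query_terms & set(analyze(content)))


print(analyze("Name"))      # ['name']
print(match_query("Name"))  # ['1', '2']
```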

III. Test (2)

The following example comes from the official documentation:
1. Index the data

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_text": {
          "type":  "text" 
        },
        "exact_value": {
          "type":  "keyword" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "full_text":   "Quick Foxes!", 
  "exact_value": "Quick Foxes!"  
}




The mapping (the field types) is created first, and then the document is indexed.

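full_text: mapped as text, so it is analyzed.
exact_value: mapped as keyword, so it is not analyzed.
full_text: the standard analyzer splits it into the terms [quick, foxes], which are stored in the inverted index.
exact_value: only the single term [Quick Foxes!] is stored in the inverted index.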
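The difference between the two fields comes down to what each one writes to the index. In the sketch below, analyze_text and analyze_keyword are hypothetical helpers: the first is a regex approximation of the standard analyzer, and the second models a keyword field with no normalizer configured, which keeps the whole value as a single, unchanged term.

```python
import re
from typing import List


def analyze_text(value: str) -> List[str]:
    # "text" field: hypothetical approximation of the standard analyzer.
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value: str) -> List[str]:
    # "keyword" field (no normalizer): the whole value is one term, unchanged.
    return [value]


doc = {"full_text": "Quick Foxes!", "exact_value": "Quick Foxes!"}
print(analyze_text(doc["full_text"]))       # ['quick', 'foxes']
print(analyze_keyword(doc["exact_value"]))  # ['Quick Foxes!']
```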

Run the following query:
GET my_index/my_type/_search
{
  "query": {
    "term": {
      "exact_value": "Quick Foxes!" 
    }
  }
}


Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "full_text": "Quick Foxes!",
          "exact_value": "Quick Foxes!"
        }
      }
    ]
  }
}


exact_value contains the exact term Quick Foxes!, so the document is found.
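A minimal model of this lookup, assuming a keyword field with no normalizer: the stored term is the whole string, and a term query compares the unanalyzed value against it. The term_query helper is hypothetical, and the second call (a lowercase value) is an extrapolation not run in the original article; it illustrates that the comparison is exact, including case and punctuation.

```python
from typing import Set

# Hypothetical model: the keyword field exact_value stores one term, unchanged.
exact_value_terms: Set[str] = {"Quick Foxes!"}


def term_query(terms: Set[str], value: str) -> bool:
    # term query: verbatim comparison, no analysis of the value.
    return value in terms


print(term_query(exact_value_terms, "Quick Foxes!"))  # True
print(term_query(exact_value_terms, "quick foxes!"))  # False
```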

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "full_text": "Quick Foxes!" 
    }
  }
}
Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

full_text was analyzed, so the inverted index holds only quick and foxes. There is no term Quick Foxes!, so nothing matches.
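The miss can be reproduced with the same kind of toy model. The full_text terms below come from a hypothetical regex approximation of the standard analyzer, and term_query is an illustrative helper rather than an Elasticsearch API.

```python
import re
from typing import Set

# Terms stored for full_text, using a hypothetical regex approximation of the
# standard analyzer (word runs, lowercased).
full_text_terms: Set[str] = {t.lower() for t in re.findall(r"\w+", "Quick Foxes!")}


def term_query(terms: Set[str], value: str) -> bool:
    # term query: the value is not analyzed, so it must equal a stored term.
    return value in terms


print(sorted(full_text_terms))                      # ['foxes', 'quick']
print(term_query(full_text_terms, "Quick Foxes!"))  # False
```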

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "full_text": "foxes" 
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "full_text": "Quick Foxes!",
          "exact_value": "Quick Foxes!"
        }
      }
    ]
  }
}



full_text was analyzed, so the inverted index holds only quick and foxes, which is why the term query for foxes succeeds.
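Continuing the toy model (a hypothetical regex approximation of the standard analyzer, scoring ignored), checking a few candidate values shows which term queries on full_text would succeed. The Foxes case is an extrapolation rather than a query from the article: because a term query does not analyze its value, the capitalized form is not found.

```python
import re
from typing import Set

full_text_terms: Set[str] = {t.lower() for t in re.findall(r"\w+", "Quick Foxes!")}

for value in ["foxes", "quick", "Foxes"]:
    # term query semantics: exact, case-sensitive lookup of the unanalyzed value.
    print(value, value in full_text_terms)
# foxes True
# quick True
# Foxes False
```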

GET my_index/my_type/_search
{
  "query": {
    "match": {
      "full_text": "Quick Foxes!" 
    }
  }
}

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.51623213,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.51623213,
        "_source": {
          "full_text": "Quick Foxes!",
          "exact_value": "Quick Foxes!"
        }
      }
    ]
  }
}


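The match query first analyzes its own query string: "Quick Foxes!" is analyzed into quick and foxes. It then looks those terms up in the inverted index. full_text is a text field that was analyzed into the same terms, quick and foxes, so the document matches.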
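Finally, the match query can be modeled by adding one step in front of the lookup: analyze the query string first. As in a term-query model, analyze is a hypothetical regex approximation of the standard analyzer, and match_query is an illustrative helper that ignores scoring and uses the default or semantics (any shared term is enough).

```python
import re
from typing import List, Set


def analyze(text: str) -> List[str]:
    # Hypothetical approximation of the standard analyzer: word runs, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


full_text_terms: Set[str] = set(analyze("Quick Foxes!"))


def match_query(terms: Set[str], query_text: str) -> bool:
    # match: analyze the query string first, then succeed if any resulting term
    # is stored for the field (default operator "or").
    return any(term in terms for term in analyze(query_text))


print(analyze("Quick Foxes!"))                      # ['quick', 'foxes']
print(match_query(full_text_terms, "Quick Foxes!"))  # True
```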

Reference: http://blog.csdn.net/yangwenbo214/article/details/54142786

