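Elasticsearch 5.0: Understanding term query and match query
1. Basics
Preface: term query and match query touch on quite a few things, such as analyzers, mappings, and the inverted index. Working from an example in the official documentation, I'll share my own understanding of them.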
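- In ES 5.x the string type is split into text and keyword. A text field is analyzed: the whole string is broken into individual terms according to the analyzer's rules (lowercased, with the default analyzer). keyword behaves like not_analyzed in ES 2.3.
By default, string data put into Elasticsearch is mapped as text.
NOTE: The default analyzer is the standard analyzer. "Quick Brown Fox!" is broken into [quick, brown, fox] and written to the inverted index.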
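- A term query looks in the inverted index for the exact term; it knows nothing about the analyzer. This kind of query suits keyword, numeric, and date fields.
- A match query is aware of the analyzer and understands how the field was analyzed.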
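In summary:
- term query looks up the exact term in the inverted index
- match query first analyzes the query string with the field's analyzer, then searches the inverted index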
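To make the NOTE on analysis concrete, here is a minimal Python sketch of what ends up in the inverted index. It is not Elasticsearch code: `standard_like_analyze` and `build_inverted_index` are hypothetical helper names, and the regex is only a rough stand-in for the real standard analyzer (which uses Unicode text segmentation). The two sample documents are the ones used in Test (1).

```python
import re
from collections import defaultdict


def standard_like_analyze(text):
    """Rough approximation of the standard analyzer:
    split into word tokens, drop punctuation, lowercase everything."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in standard_like_analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Assumption: ids 1 and 2 mirror the documents indexed in Test (1).
docs = {1: "Name", 2: "name city"}
index = build_inverted_index(docs)

print(standard_like_analyze("Quick Brown Fox!"))  # ['quick', 'brown', 'fox']
print(index)  # {'name': [1, 2], 'city': [2]}
```

Note that the capitalized form Name never appears as a key in the index; only the lowercased term does. That is the detail the tests in this post hinge on.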
2. Test (1)
- Prepare the data:
POST /termtest/termtype/1
{
"content":"Name"
}
POST /termtest/termtype/2
{
"content":"name city"
}
- Check that the data was imported:
GET /termtest/_search
{
"query":
{
"match_all": {}
}
}
- Result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "termtest",
"_type": "termtype",
"_id": "2",
"_score": 1,
"_source": {
"content": "name city"
}
},
{
"_index": "termtest",
"_type": "termtype",
"_id": "1",
"_score": 1,
"_source": {
"content": "Name"
}
}
]
}
}
As shown above, the data has been imported. The string field here is of type text, so it was analyzed by default.
- Run the following query:
POST /termtest/_search
{
"query":{
"term":{
"content":"Name"
}
}
}
- Result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
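Analysis: the field was analyzed by the default standard analyzer, so all uppercase letters were converted to lowercase before being stored in the inverted index for searching. A term query is an exact lookup, so it searches for the capitalized term Name, which the index does not contain. The result is therefore empty.
- Now run the same search as a match query: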
POST /termtest/_search
{
"query":{
"match":{
"content":"Name"
}
}
}
- Result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.2876821,
"hits": [
{
"_index": "termtest",
"_type": "termtype",
"_id": "1",
"_score": 0.2876821,
"_source": {
"content": "Name"
}
},
{
"_index": "termtest",
"_type": "termtype",
"_id": "2",
"_score": 0.25811607,
"_source": {
"content": "name city"
}
}
]
}
}
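Analysis: Reason (1): the field was analyzed by the default standard analyzer, so all uppercase letters were lowercased before being stored in the inverted index.
Reason (2): the match query first analyzes the query string, producing the term "name", and then matches it against the terms in the inverted index. Both documents contain name, so both are returned.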
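The two results of Test (1) can be replayed with a small Python model. This is only a sketch under simplifying assumptions: `analyze`, `term_query`, and `match_query` are hypothetical helpers rather than Elasticsearch APIs, the regex merely approximates the standard analyzer, and scoring is ignored entirely.

```python
import re


def analyze(text):
    # Approximation of the standard analyzer: word tokens, lowercased.
    return [token.lower() for token in re.findall(r"\w+", text)]


# The two documents indexed in Test (1).
docs = {"1": "Name", "2": "name city"}

# Index time: a text field stores the analyzed terms of each document.
indexed_terms = {doc_id: set(analyze(text)) for doc_id, text in docs.items()}


def term_query(value):
    # The query value is used as-is; no analysis is applied to it.
    return sorted(d for d, terms in indexed_terms.items() if value in terms)


def match_query(query_string):
    # The query string is analyzed first; one shared term is enough
    # for a hit (the default OR behaviour of a match query).
    wanted = set(analyze(query_string))
    return sorted(d for d, terms in indexed_terms.items() if wanted & terms)


print(term_query("Name"))   # []
print(term_query("name"))   # ['1', '2']
print(match_query("Name"))  # ['1', '2']
```

The only difference between the two functions is whether the query value passes through `analyze` before the lookup, which is the whole distinction between term and match in this test.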
3. Test (2)
The following is the example from the official documentation.
- Import the data:
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"full_text": {
"type": "text"
},
"exact_value": {
"type": "keyword"
}
}
}
}
}
PUT my_index/my_type/1
{
"full_text": "Quick Foxes!",
"exact_value": "Quick Foxes!"
}
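The mapping is defined first, then the data is imported.
- full_text: mapped as text, so it is analyzed
- exact_value: mapped as keyword, so it is not analyzed
- full_text: the standard analyzer produces the terms [quick, foxes], which are stored in the inverted index
- exact_value: only the single term [Quick Foxes!] is stored in the inverted index
- Run the following queries: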
GET my_index/my_type/_search
{
"query": {
"term": {
"exact_value": "Quick Foxes!"
}
}
}
Result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.2876821,
"_source": {
"full_text": "Quick Foxes!",
"exact_value": "Quick Foxes!"
}
}
]
}
}
exact_value contains the exact term Quick Foxes!, so the document is found.
GET my_index/my_type/_search
{
"query": {
"term": {
"full_text": "Quick Foxes!"
}
}
}
Result:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
full_text was analyzed, so the inverted index contains only quick and foxes. There is no term Quick Foxes!, so the query finds nothing.
GET my_index/my_type/_search
{
"query": {
"term": {
"full_text": "foxes"
}
}
}
Result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.25811607,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.25811607,
"_source": {
"full_text": "Quick Foxes!",
"exact_value": "Quick Foxes!"
}
}
]
}
}
full_text was analyzed and the inverted index contains only quick and foxes, so a term query for foxes succeeds.
GET my_index/my_type/_search
{
"query": {
"match": {
"full_text": "Quick Foxes!"
}
}
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.51623213,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.51623213,
"_source": {
"full_text": "Quick Foxes!",
"exact_value": "Quick Foxes!"
}
}
]
}
}
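A match query first analyzes its own query string, so "Quick Foxes!" is first broken into quick and foxes, which are then looked up in the inverted index. Here full_text is a text field whose value was indexed as quick and foxes, so the query matches.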
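The four queries of Test (2) can be summarized in one Python sketch that gives each field its own analysis step. Everything here is illustrative: `standard_like`, `keyword_like`, `term_hits`, and `match_hits` are hypothetical names, the text analysis is a regex approximation, and a keyword field is modelled simply as storing the whole value as a single term.

```python
import re


def standard_like(text):
    # text fields: word tokens, lowercased (approximation of the standard analyzer)
    return [token.lower() for token in re.findall(r"\w+", text)]


def keyword_like(text):
    # keyword fields: the whole value is kept as one unmodified term
    return [text]


# Mirrors the mapping in Test (2): one text field, one keyword field.
FIELD_ANALYZERS = {"full_text": standard_like, "exact_value": keyword_like}

doc = {"full_text": "Quick Foxes!", "exact_value": "Quick Foxes!"}
indexed = {field: set(FIELD_ANALYZERS[field](value)) for field, value in doc.items()}


def term_hits(field, value):
    # term query: look up the value exactly as given
    return value in indexed[field]


def match_hits(field, query_string):
    # match query: analyze the query string with the field's analyzer first
    return any(term in indexed[field] for term in FIELD_ANALYZERS[field](query_string))


print(term_hits("exact_value", "Quick Foxes!"))  # True
print(term_hits("full_text", "Quick Foxes!"))    # False
print(term_hits("full_text", "foxes"))           # True
print(match_hits("full_text", "Quick Foxes!"))   # True
```

The four printed values line up with the four responses above: one hit, no hits, one hit, one hit.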