Mapping in ElasticSearch

Reposted article, 2021-02-25 10:12. Category: ElasticSearch notes

WeChat official account: 碼農充電站pro
Homepage: https://codeshellme.github.io

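1. Mapping in ES

A Mapping in ES is comparable to a table definition in a traditional database. It serves the following purposes:

Defines the names of the fields in an index.
Defines the types of the fields in an index, such as string or number.
Defines whether an inverted index is built for each field.

A Mapping is defined for a Type within an index:

Every document in ES is stored in a Type of an index.
Before ES 6.0, an index could have multiple Types, so an index could have multiple Mappings.
From ES 6.0 on, an index can have only one Type, and from ES 7.0 on Types are deprecated (the only Type is _doc), so an index corresponds to exactly one Mapping.

The Mapping of an index can be retrieved with the following syntax: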

GET index_name/_mapping

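2. Mapping parameters for ES fields

A field's mapping accepts many parameters, including:

analyzer: specifies the analyzer; only supported on text fields.
enabled: if set to false, the data is only stored and cannot be searched or aggregated (it remains available in _source). It can only be set on the top-level mapping and on object fields.
- Defaults to true.
index: whether an inverted index is built for the field.
- If set to false, no inverted index is built (saving space) and the field cannot be searched, but it can still be aggregated, and the data still appears in _source.
- Defaults to true.
norms: whether the field supports scoring.
- If a field is only used for filtering and aggregation and never needs relevance scoring, set this to false to save space.
- Defaults to true.
doc_values: if you are sure you never need to sort or aggregate on the field, or access its value from a script, set this to false to save disk space.
- Defaults to true.
fielddata: set to true to enable sorting and aggregation on text fields.
- Defaults to false.
store: defaults to false; the data is kept in _source.
- By default, field values are indexed so that they can be searched, but they are not stored separately. The field can be queried, but its original value cannot be retrieved on its own (it is only available as part of _source).
- In some cases storing a field makes sense. For example, given a document with a title, a date and a very large content field, you may want to retrieve only the title and the date without extracting them from a large _source.
boost: boosts the field's relevance score.
coerce: whether automatic type conversion is enabled, for example string to number.
- Enabled by default.
dynamic: controls automatic mapping updates; the values are true, false and strict.
eager_global_ordinals
fields: multi-fields.
- Gives a field several sub-fields, so that the same field can be indexed in several different ways.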
copy_to
format
ignore_above
ignore_malformed
index_options
index_phrases
index_prefixes
meta
normalizer
null_value: defines the value that replaces an explicit null.
position_increment_gap
properties
search_analyzer
similarity
term_vector

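2.1 The fields parameter

The fields parameter gives a field several sub-fields, so that the same field can be indexed in several different ways.

Example 1:

PUT index_name
{
  "mappings": {         # mappings definition
    "properties": {     # properties, fixed keyword
      "city": {         # field name
        "type": "text", # the city field is of type text
        "fields": {     # multi-fields, fixed keyword
          "raw": {      # sub-field name
            "type":  "keyword"  # sub-field type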
          }
        }
      }
    }
  }
}

Example 2:

PUT index_name
{
  "mappings": {
    "properties": {
      "title": {               # field name
        "type": "text",        # field type
        "analyzer": "english", # field analyzer
        "fields": {            # multi-fields, fixed keyword
          "std": {             # sub-field name
            "type": "text",    # sub-field type
            "analyzer": "standard"  # sub-field analyzer
          }
        }
      }
    }
  }
}

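3. Data types of ES fields

ES fields can have the following data types:

Simple types
- Numeric
- Boolean
- Date
- Text
- Keyword
- Binary
- and others
Complex types
- Object
- Arrays
- Nested: an object data type.
- Join: defines a parent/child relationship between documents in the same index.
Special types

The text and keyword types

String data can be mapped as text or keyword. text data is analyzed (split into terms), while keyword data is not.

The array type

ES has no dedicated array type, but any field can hold multiple values of the same type, for example:

["one", "two"] # an array of strings
[1, 2]         # an array of integers
[1, [ 2, 3 ]]   # equivalent to [ 1, 2, 3 ]
[{ "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }] # an array of objects

When you inspect such a field in the Mapping, its type is the type of the array's elements, not an array type.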

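3.1 The Nested type

Nested is an object type that preserves the relationship between the sub-fields of each object.

1. Why the Nested type is needed

Suppose we have data with the following structure: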

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[ # actors is an array whose elements are objects
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

After indexing this document into ES, run the following query:

# Query movie information
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"actors.first_name": "Keanu"}},
        {"match": {"actors.last_name": "Hopper"}}
      ]
    }
  }
}

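With this query we are looking for data where first_name=Keanu and last_name=Hopper, so the document with id 1 that we just indexed should not match.

Yet when the query is run in ES, the document with id 1 is returned. Why?

The reason is that for data structured like the actors field, ES does not take object boundaries into account.

Internally, ES stores the document with id 1 like this: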

"title":"Speed"
"actors.first_name":["Keanu","Dennis"]
"actors.last_name":["Reeves","Hopper"]

So the data is not stored the way we imagined.
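The flattening described above can be reproduced outside ES. The following is a minimal Python sketch, not ES code: the helper names `flatten`, `matches_flat` and `matches_nested` are made up for illustration, and exact value comparison stands in for a real `match` query (no analysis is applied). It shows why the flattened form matches Keanu/Hopper while per-object matching does not.

```python
from collections import defaultdict

def flatten(doc, prefix=""):
    """Flatten objects and arrays of objects into dotted field name -> list of values."""
    flat = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)

def matches_flat(doc, conditions):
    """Object-style matching: each condition is checked against the flattened value lists."""
    flat = flatten(doc)
    return all(value in flat.get(field, []) for field, value in conditions.items())

def matches_nested(doc, path, conditions):
    """Nested-style matching: all conditions must hold inside one single object."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in doc.get(path, [])
    )

movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

print(flatten(movie))
# Flattened matching ignores which actor each value came from.
print(matches_flat(movie, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"}))
# Per-object matching keeps the object boundary.
print(matches_nested(movie, "actors", {"first_name": "Keanu", "last_name": "Hopper"}))
```

The first check succeeds and the second fails, which mirrors the difference between the default object mapping and the Nested type.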

The mappings that ES generates by default for data with this structure (the document with id 1) look like this:

{
  "my_movies" : {
    "mappings" : {
      "properties" : {
        "actors" : {           # actors contains its own nested properties block
          "properties" : {
            "first_name" : {   # type definition of first_name
              "type" : "text",
              "fields" : {
                "keyword" : {"type" : "keyword", "ignore_above" : 256}
              }
            },
            "last_name" : {    # type definition of last_name
              "type" : "text",
              "fields" : {
                "keyword" : {"type" : "keyword", "ignore_above" : 256}
              }
            }
          }
        }, # end actors
        "title" : {  
          "type" : "text",
          "fields" : {
            "keyword" : {"type" : "keyword", "ignore_above" : 256}
          }
        }
      }
    }
  }
}

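So how can we truly express an array of objects? This is where the Nested type comes in.

2. Using the Nested type

The Nested type allows each object in an array of objects to be indexed independently, as a unit.

We define the following mappings for the my_movies index: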

DELETE my_movies
PUT my_movies
{
  "mappings" : {
    "properties" : {
      "actors" : {
        "type": "nested",  # map actors as the nested type
        "properties" : {   # each object in the actors array is now indexed as a unit
          "first_name" : {"type" : "keyword"},
          "last_name" : {"type" : "keyword"}
        }},
      "title" : {
        "type" : "text",
        "fields" : {"keyword":{"type":"keyword","ignore_above":256}}
      }
    }
  }
}

After indexing the document again, the following search no longer returns any data:

# Query movie information
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"actors.first_name": "Keanu"}},
        {"match": {"actors.last_name": "Hopper"}}
      ]
    }
  }
}

However, the following query returns nothing either:

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"actors.first_name": "Keanu"}},
        {"match": {"actors.last_name": "Reeves"}}
      ]
    }
  }
}

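3. Searching the Nested type

This is because Nested data has to be queried with a nested query, as shown below. With Keanu and Hopper the result is correctly empty; replacing Hopper with Reeves matches the document with id 1.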

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {          # nested query
            "path": "actors",  # path of the nested field
            "query": {         # the query to run inside each nested object
              "bool": {
                "must": [
                  {"match": {"actors.first_name": "Keanu"}},
                  {"match": {"actors.last_name": "Hopper"}}
                ]
              }
            }
          } # end nested
        }
      ] # end must
    } # end bool
  }
}

4. Aggregating on the Nested type

An example of aggregating Nested data:

# Nested Aggregation
POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {            # custom aggregation name
      "nested": {          # nested aggregation
        "path": "actors"   # path of the nested field
      },
      "aggs": {            # sub-aggregation
        "actor_name": {    # custom sub-aggregation name
          "terms": {       # terms aggregation
            "field": "actors.first_name",  # sub-field name
            "size": 10
          }
        }
      }
    }
  }
}

An ordinary aggregation does not work on nested fields:

POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {     # custom aggregation name
      "terms": {    # terms aggregation
        "field": "actors.first_name",
        "size": 10
      }
    }
  }
}

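3.2 The Join type

Nested objects are stored together with their root document, so every update requires reindexing the whole document (the root object and all nested objects).

The Join data type (similar to a Join in a relational database) defines a parent/child relationship between documents in the same index.

By maintaining a parent/child relationship, the Join type separates the two objects. Its advantage is:

The parent and child documents are two completely independent documents, so updating the parent does not affect its children, and updating a child does not affect the parent.

[Figure: comparison of the pros and cons of the Nested type and the Join (Parent/Child) type; image not included.]

1. Defining a Join type

The syntax for defining a Join type is as follows: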

DELETE my_blogs

# Define the Parent/Child mapping
PUT my_blogs
{
  "mappings": {
    "properties": {
      "blog_comments_relation": {  # field name
        "type": "join",            # join type
        "relations": {             # defines the parent/child relation
          "blog": "comment"        # blog is the parent, comment is the child
        }
      },
      "content": {
        "type": "text"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
}

2. Indexing Join data

First index two parent documents:

# Index blog1
PUT my_blogs/_doc/blog1
{
  "title":"Learning Elasticsearch",
  "content":"learning ELK @ geektime",
  "blog_comments_relation":{
    "name":"blog"  # name blog marks a parent document
  }
}

# Index blog2
PUT my_blogs/_doc/blog2
{
  "title":"Learning Hadoop",
  "content":"learning Hadoop",
  "blog_comments_relation":{
    "name":"blog" # name blog marks a parent document
  }
}

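Index the child documents:

Note that the routing value is the id of the parent document;
this ensures that parent and child documents are indexed on the same shard, which join queries require.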
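Why the routing value decides the shard can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shard is chosen as hash(routing) modulo the number of primary shards, `zlib.crc32` is a stand-in for the hash function ES really uses, and `NUM_SHARDS` and `shard_for` are hypothetical names.

```python
import zlib

NUM_SHARDS = 5  # hypothetical number of primary shards

def shard_for(routing: str, num_primary_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the routing value (crc32 is only a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

# (document id, routing value). Without an explicit routing parameter
# the routing value defaults to the document's own id.
docs = [
    ("blog2", "blog2"),      # parent, routed by its own id
    ("comment2", "blog2"),   # child, routed by the parent id
    ("comment3", "blog2"),   # child, routed by the parent id
]

placement = {doc_id: shard_for(routing) for doc_id, routing in docs}
print(placement)

# Looking up comment3 without routing would hash "comment3" instead,
# which may point to a different shard than the one holding the document.
print(shard_for("comment3"), shard_for("blog2"))
```

Because all three documents share the routing value blog2, they always land on the same shard, whatever the hash function is.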

# Index comment1
PUT my_blogs/_doc/comment1?routing=blog1 # routing is the parent document id
{                                        # so parent and child land on the same shard
  "comment":"I am learning ELK",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",  # name comment marks a child document
    "parent":"blog1"   # id of the parent document this child belongs to
  }
}

# Index comment2
PUT my_blogs/_doc/comment2?routing=blog2 # routing is the parent document id
{                                        # so parent and child land on the same shard
  "comment":"I like Hadoop!!!!!",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment", # name comment marks a child document
    "parent":"blog2"  # id of the parent document this child belongs to
  }
}

# Index comment3
PUT my_blogs/_doc/comment3?routing=blog2 # routing is the parent document id
{                                        # so parent and child land on the same shard
  "comment":"Hello Hadoop",
  "username":"Bob",
  "blog_comments_relation":{
    "name":"comment", # name comment marks a child document
    "parent":"blog2"  # id of the parent document this child belongs to
  }
}

3. The parent_id query

Fetching a parent document by its id with an ordinary request does not return any information about its child documents:

GET my_blogs/_doc/blog2

To get the child documents, use a parent_id query:

POST my_blogs/_search
{
  "query": {
    "parent_id": {        # parent_id query
      "type": "comment",  # comment is the child type, i.e. we want child documents
      "id": "blog2"       # id of the parent document
    }                     # returns all comments of blog2
  }
}

4. The has_child query

A has_child query returns parent documents based on information in their child documents.

POST my_blogs/_search
{
  "query": {
    "has_child": {       # has_child query
      "type": "comment", # child type; the query below is matched against comment child documents
      "query" : {
          "match": {"username" : "Jack"}
      }                  # matches in child documents, returns all related parent documents
    }
  }
}

5. The has_parent query

A has_parent query returns child documents based on information in their parent document.

POST my_blogs/_search
{
  "query": {
    "has_parent": {          # has_parent query
      "parent_type": "blog", # parent type; the query below is matched against blog parent documents
      "query" : {
          "match": {"title" : "Learning Hadoop"}
      }                      # matches in parent documents, returns all related child documents
    }
  }
}

6. Fetching a child document by its id

An ordinary request cannot find it:

GET my_blogs/_doc/comment3

The routing parameter must be supplied with the parent document id:

GET my_blogs/_doc/comment3?routing=blog2

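7. Updating a child document

Updating a child document does not affect the parent document.

Example:

# The URI specifies the child document id; the routing parameter carries the parent document id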
PUT my_blogs/_doc/comment3?routing=blog2
{
    "comment": "Hello Hadoop??",
    "blog_comments_relation": {
      "name": "comment",
      "parent": "blog2"
    }
}

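4. Dynamic Mapping in ES

Dynamic Mapping in ES means the following:

When a new document is written and the index does not exist, ES creates the index automatically.
With dynamic Mapping we do not need to define a Mapping; ES infers the field types from the document's content.
The inference is sometimes wrong and does not match our expectations, for example for geo-location data.

[Figure: the rules ES uses to detect field types automatically; image not included.]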
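The commonly documented detection rules can be approximated with a short Python sketch. It is a simplification, not ES's implementation: `infer_type` is a hypothetical helper, the `ISO_DATE` regular expression is only a rough stand-in for ES's configurable date detection, and numeric detection of strings (off by default) is left out.

```python
import re

# Rough stand-in for date detection; the real check uses configurable date formats.
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)

def infer_type(value):
    """Return the field type dynamic mapping would roughly pick for a JSON value."""
    if value is None:
        return None                      # null: no field is added
    if isinstance(value, bool):          # test bool first, bool is an int subclass in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        first = next((item for item in value if item is not None), None)
        return infer_type(first)         # arrays take the type of their first non-null element
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return "date"
        return "text + keyword sub-field"
    raise TypeError(f"unsupported JSON value: {value!r}")

doc = {
    "content": "I like Elasticsearch",
    "time": "2019-01-01T00:00:00",
    "userid": 1,
    "location": "31.2,121.5",            # a geo point written as a string
}
for field, value in doc.items():
    print(field, "->", infer_type(value))
```

The location value is detected as plain text rather than as a geo point, which is the kind of wrong inference mentioned above; such fields need an explicit mapping.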

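5. Changing the type of a document field

Whether a field type can be changed depends on two cases:

For new fields:
- If mappings._doc.dynamic is true, the Mappings are updated automatically when a new field is written.
- If mappings._doc.dynamic is false, the Mappings are not updated when a new field is written; no inverted index is built for the new field, but its data still appears in _source.
- If mappings._doc.dynamic is strict, writing a document with a new field fails.
For existing fields:
- The field type can no longer be changed, because changing it would make the already indexed data unsearchable.
- To change a field type, rebuild the index with Reindex.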
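The three behaviours for new fields can be simulated with a small Python sketch. It only models the rules listed above; `index_document` and `StrictMappingError` are hypothetical names and not part of any ES client.

```python
class StrictMappingError(Exception):
    """Hypothetical error standing in for the rejection under dynamic = strict."""

def index_document(mapped_fields, doc, dynamic):
    """Simulate how the dynamic setting treats fields that are missing from the mapping.

    Returns (fields in the mapping afterwards, fields kept in _source).
    """
    new_fields = [name for name in doc if name not in mapped_fields]
    if new_fields and dynamic == "strict":
        raise StrictMappingError(f"unmapped fields rejected: {new_fields}")
    mapping_after = set(mapped_fields)
    if dynamic is True:
        mapping_after.update(new_fields)   # the mapping is extended automatically
    # With dynamic false the mapping stays untouched,
    # but _source still holds the whole document.
    return mapping_after, set(doc)

mapped = {"title"}
doc = {"title": "Speed", "year": 1994}

print(index_document(mapped, doc, True))
print(index_document(mapped, doc, False))
try:
    index_document(mapped, doc, "strict")
except StrictMappingError as err:
    print("rejected:", err)
```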

dynamic has 3 possible values. Use the following API to change it:

PUT index_name/_mapping
{
  "dynamic": false  # one of true, false or "strict"
}

The Mapping of an index can be retrieved with the following syntax:

GET index_name/_mapping

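6. Custom Mapping

The syntax for a custom Mapping is:

PUT index_name
{
  "mappings" : {
    # definitions
  }
}

A handy trick for writing a custom Mapping:

Create a temporary index and write some test data to it.
Retrieve that index's Mapping, modify it, and use it to create the new index.
Delete the temporary index.

Mappings accept many parameters; see the Elasticsearch reference documentation for the full list.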

6.1 Mappings for an embedded object

Suppose we want to index data with the following structure into ES:

PUT blog/_doc/1
{
  "content":"I like Elasticsearch",
  "time":"2019-01-01T00:00:00",
  "user": { # an object
    "userid":1,
    "username":"Jack",
    "city":"Shanghai"
  }
}

Here the user field is an object.

The mappings for data with this structure should be defined like this:

PUT /blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "time": {
        "type": "date"
      },
      "user": {  # user contains its own nested properties block
        "properties": {
          "city": {
            "type": "text"
          },
          "userid": {
            "type": "long"
          },
          "username": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

6.2 Mappings for an array of objects

Suppose we want to index data with the following structure into ES:

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[ # actors is an array whose elements are objects
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

Here the actors field is an array whose elements are objects.

The mappings for data with this structure should be defined like this:

PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {         # the actors field
        "properties": {   # its own properties block
          "first_name": {"type": "keyword"},
          "last_name": {"type": "keyword"}
        }
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

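7. Controlling whether a field is indexed

The index setting of a field controls whether the field can be searched.

index has two values, true and false; the default is true.

When index is false, ES does not build an inverted index for the field (saving space), and the field cannot be searched (searching it raises an error).

The syntax is: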

PUT index_name
{
    "mappings" : {          # fixed keyword
      "properties" : {      # fixed keyword
        "firstName" : {     # field name
          "type" : "text"
        },
        "lastName" : {      # field name
          "type" : "text"
        },
        "mobile" : {        # field name
          "type" : "text",
          "index": false    # not indexed, so not searchable
        }
      }
    }
}

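8. Controlling the content of inverted index entries

The index_options setting controls what is recorded in the inverted index. It has 4 values:

docs: records only the document id
freqs: records the document id and the term frequency
positions: records the document id, the term frequency and the term position
offsets: records the document id, the term frequency, the term position and the character offsets

For Text fields index_options defaults to positions; for other field types it defaults to docs.

Note: the default of index_options may differ between ES versions; check the documentation for your version.

The more an inverted index entry records, the more space it takes, and the more analysis work ES does for the field.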
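What each level adds to an entry can be made concrete with a toy inverted index in Python. This is a sketch for illustration only: `build_index` is a hypothetical helper, and lowercasing plus a `\w+` regular expression stands in for a real analyzer.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]  # each level includes the previous ones

def build_index(docs, index_options="positions"):
    """Build a toy inverted index: term -> {doc_id: recorded details}."""
    depth = LEVELS.index(index_options)
    index = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            entry = index.setdefault(match.group(), {}).setdefault(doc_id, {})
            if depth >= 1:
                entry["freq"] = entry.get("freq", 0) + 1
            if depth >= 2:
                entry.setdefault("positions", []).append(position)
            if depth >= 3:
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return index

docs = {1: "quick brown fox", 2: "quick quick dog"}

for level in LEVELS:
    print(level, build_index(docs, level)["quick"])
```

With docs only the document ids are kept; each further level stores more per term and document, which is where the extra space goes.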

The syntax is:

PUT index_name
{
  "mappings": {                      # fixed keyword
    "properties": {                  # fixed keyword
      "text": {                      # field name
        "type": "text",              # field data type
        "index_options": "offsets"   # index_options value
      }
    }
  }
}

9. Making null values searchable

By default, null and the empty array [] cannot be searched. Take the following two documents:

PUT my_index/_doc/1
{
  "status_code": null
}

PUT my_index/_doc/2
{
  "status_code": [] 
}

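To make an explicit null searchable, set the null_value parameter in the mapping (the mapping has to exist before the documents are indexed). An empty array contains no explicit null, so it is not replaced by null_value:

PUT my_index
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword",    # null_value must have the same type as the field; text fields do not support it
        "null_value": "NULL"  # with null_value set to NULL, explicit nulls can be searched as NULL
      }
    }
  }
}

Note that null_value must have the same data type as the field and is not supported on text fields. With null_value set to NULL, documents with an explicit null can be found by searching for NULL:

GET my_index/_search?q=status_code:NULL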

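10. Index templates

An Index Template defines rules that automatically generate the Mappings and Settings of new indices.

Index templates have the following properties:

A template only takes effect when an index is created; modifying a template does not affect existing indices.
Multiple index templates can be defined; their settings are merged together.
The order value controls the merge process.

When an index is created and several templates match, the merge rules are:

Start with ES's default mappings and settings.
Apply the templates with a lower order value.
Apply the templates with a higher order value, which override those with a lower order.
Apply the mappings and settings specified by the user in the create request; these have the highest priority and override everything before.

Where templates set the same field differently, the later value overrides the earlier one; settings for different fields are accumulated.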
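The merge order above can be sketched in Python as a deep merge applied in sequence. This is a model of the described rules rather than ES's implementation: `deep_merge` and `resolve` are hypothetical helpers, and `fnmatch` glob matching stands in for index pattern matching.

```python
import fnmatch
from copy import deepcopy

def deep_merge(base, override):
    """Merge override into base: same keys are overwritten, different keys accumulate."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result

def resolve(index_name, templates, request_body=None, defaults=None):
    """Apply defaults, then matching templates by ascending order, then the create request."""
    config = deepcopy(defaults or {})
    matching = [
        t for t in templates
        if any(fnmatch.fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        config = deep_merge(config, {
            "settings": template.get("settings", {}),
            "mappings": template.get("mappings", {}),
        })
    return deep_merge(config, request_body or {})

templates = [
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1},
     "mappings": {"_source": {"enabled": False}}},
    {"index_patterns": ["te*"], "order": 1,
     "settings": {"number_of_shards": 1},
     "mappings": {"_source": {"enabled": True}}},
]

print(resolve("test", templates))   # both templates match, order 1 wins for _source
print(resolve("logs", templates))   # only the "*" template matches
print(resolve("test", templates, {"mappings": {"_source": {"enabled": False}}}))
```

The last call shows that values given in the create request override whatever the templates produced.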

An index template example:

PUT _template/template_1  # template_1 is the custom name of the index template
{
  "index_patterns": ["te*", "bar*"], # index name patterns the template applies to
  "settings": {                      # settings
    "number_of_shards": 1
  },
  "mappings": {                      # mappings
    "_source": {
      "enabled": false
    },
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "EEE MMM dd HH:mm:ss Z yyyy"
      }
    }
  }
}

Multiple index templates:

PUT /_template/template_1
{
    "index_patterns" : ["*"],
    "order" : 0,
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_source" : { "enabled" : false }
    }
}

PUT /_template/template_2
{
    "index_patterns" : ["te*"],
    "order" : 1,
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_source" : { "enabled" : true }
    }
}

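11. Dynamic templates

A Dynamic Template controls how dynamically added fields in a given index are mapped, for example based on their detected data type or their field name.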
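How such rules are applied can be sketched in Python. The list below has the same shape as the `dynamic_templates` array of a mapping, with the `match_mapping_type`, `match` and `mapping` keys; the template names, the helpers `detected_type` and `mapping_for`, and the use of `fnmatch` for the name pattern are assumptions made for this illustration. Templates are checked in order and the first match wins.

```python
import fnmatch

# Same shape as the "dynamic_templates" array in a mapping; the names are made up.
dynamic_templates = [
    {"ids_as_keyword": {
        "match": "*_id",
        "match_mapping_type": "string",
        "mapping": {"type": "keyword"}}},
    {"strings_as_text": {
        "match_mapping_type": "string",
        "mapping": {"type": "text"}}},
    {"longs_as_integer": {
        "match_mapping_type": "long",
        "mapping": {"type": "integer"}}},
]

def detected_type(value):
    """Type detected from the JSON value, before any template is applied (simplified)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    return "string"

def mapping_for(field_name, value, templates=dynamic_templates):
    """Return (template name, mapping) of the first matching template, or (None, None)."""
    dtype = detected_type(value)
    for entry in templates:
        (name, rule), = entry.items()
        if "match_mapping_type" in rule and rule["match_mapping_type"] != dtype:
            continue
        if "match" in rule and not fnmatch.fnmatchcase(field_name, rule["match"]):
            continue
        return name, rule["mapping"]
    return None, None   # no template matched, default dynamic mapping applies

print(mapping_for("user_id", "u-42"))
print(mapping_for("title", "Speed"))
print(mapping_for("count", 3))
print(mapping_for("ratio", 1.5))
```

Putting the more specific template (ids_as_keyword) before the general one matters here, because a string field ending in _id would otherwise be caught by strings_as_text.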

(End of this section.)
