Elastic Stack Notes (5): Elasticsearch 5.6 Mappings

Reposted article, originally published 2018-06-08 23:06. Tags: Elasticsearch / Mappings / ELK

Blog: http://www.moonxy.com

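1. Preface

Relational databases are familiar to most of us, and Elasticsearch can also be viewed as a kind of database, so its concepts are often compared with relational ones:

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

In this analogy, an Elasticsearch index corresponds to a database, a type to a table, a mapping to a table schema, a document to a row, and a field to a column.

Elasticsearch nevertheless has characteristics of its own:

Elasticsearch has no transactions in the traditional sense;

Elasticsearch is a document-oriented data store;

Elasticsearch 5.6 itself ships without authentication or authorization; in this version those features come from the separately installed X-Pack Security extension.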

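2. Mappings

To treat date fields as dates, numeric fields as numbers, and string fields as either full-text or exact-value strings, Elasticsearch needs to know what type of data each field contains. This type and field information is stored in the mapping. When creating an index you can define each field's type and related attributes up front, much like defining column attributes in a database. The documentation links below all refer to version 6.2, the latest official release at the time of writing.

Official Elasticsearch documentation: Elasticsearch Reference

2.1 Field datatypes

Field datatypes documentation: Field datatypes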

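Core datatypes

String

string

text and keyword

text: for full-text search. The value is analyzed (split into terms) before it is indexed.

keyword: for exact values, well suited to filtering, sorting, and aggregations. The value is not analyzed and is indexed as a single term, so it matches only on the whole value, through term-level queries (exact term, and also prefix, wildcard, or fuzzy).

Elasticsearch 1.x and 2.x had a single string type; from 5.x on it is split into text and keyword.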
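The difference between text and keyword can be sketched in plain Python. This is only an illustration: analyze_as_text is a toy stand-in (lowercase, split on anything that is not an ASCII letter or digit), not the real standard analyzer, and all function names here are made up for the example.

```python
import re


def analyze_as_text(value):
    """Toy analyzer (an assumption, not the real standard analyzer):
    lowercase the value and split it on non-alphanumeric ASCII characters."""
    return [term for term in re.split(r"[^0-9a-z]+", value.lower()) if term]


def analyze_as_keyword(value):
    """A keyword field keeps the whole value as one single term."""
    return [value]


def term_matches(indexed_terms, query_term):
    """A term-level lookup: the query term must equal an indexed term."""
    return query_term in indexed_terms


title = "Quick Brown Fox"
text_terms = analyze_as_text(title)
keyword_terms = analyze_as_keyword(title)

print(text_terms)                                        # ['quick', 'brown', 'fox']
print(term_matches(text_terms, "fox"))                   # True
print(term_matches(keyword_terms, "fox"))                # False
print(term_matches(keyword_terms, "Quick Brown Fox"))    # True
```

The keyword version only matches when the query equals the complete original value, which is why it suits sorting and aggregations, while the text version matches on individual words.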

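Numeric

Numeric datatypes

long, integer, short, byte, double, float, half_float, scaled_float

Date

Date datatype

date

JSON has no date type, so a date in Elasticsearch can be any of the following:

A string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30".

A long number representing milliseconds-since-the-epoch.

An integer representing seconds-since-the-epoch (this needs epoch_second in the field's format, because the default format reads numbers as milliseconds).

Date formats can be customized. If no format is specified, the default is:

"strict_date_optional_time||epoch_millis"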

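The "||" in a date format means "try each format in turn". A rough Python emulation of the default format is sketched below. It is an approximation under stated assumptions: parse_strict_date covers only the yyyy-MM-dd and yyyy-MM-ddTHH:mm:ss subset of strict_date_optional_time, it is less strict than Elasticsearch about zero padding, and every function name is invented for this example.

```python
from datetime import datetime, timezone


def parse_strict_date(value):
    """Approximate strict_date_optional_time (subset only, an assumption)."""
    if not isinstance(value, str):
        raise ValueError("not a date string")
    for pattern in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unsupported date string: %r" % value)


def parse_epoch_millis(value):
    """Interpret the value as milliseconds since the epoch."""
    millis = int(value)  # raises ValueError for strings such as "2015/01/01"
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def parse_date(value, parsers=(parse_strict_date, parse_epoch_millis)):
    """Try each parser in order, like format_a||format_b."""
    for parser in parsers:
        try:
            return parser(value)
        except (ValueError, TypeError):
            continue
    raise ValueError("no format matched: %r" % (value,))


print(parse_date("2015-01-01"))
print(parse_date(1420070400000))
```

With these two parsers, "2015/01/01 12:10:30" is rejected, which mirrors the fact that such a string needs a custom format (for example "yyyy/MM/dd HH:mm:ss") declared in the mapping.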
Boolean

Boolean datatype

boolean

Accepts true and false.

Binary

Binary datatype

binary

Range

Range datatypes

integer_range, float_range, long_range, double_range, date_range

Complex datatypes

Array

Array datatype

Array support does not require a dedicated type: any field can hold zero or more values, as long as they all share the same datatype.

Object

Object datatype

object, for single JSON objects

Nested

Nested datatype

nested, for arrays of JSON objects
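Why a separate nested type exists for arrays of objects can be shown with a small sketch. With the plain object type, an array of objects is flattened into multi-valued fields, so the link between values of the same object is lost. The code below imitates that flattening in Python; the function names and sample data are illustrative only.

```python
def flatten_object_array(field, objects):
    """Imitate how an array of objects is flattened under the object type:
    each inner key becomes one multi-valued field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def object_match(flat, first, last):
    """Matching on flattened fields: the two values may come from different objects."""
    return first in flat["user.first"] and last in flat["user.last"]


def nested_match(objects, first, last):
    """Matching per object, which is what the nested type preserves."""
    return any(o["first"] == first and o["last"] == last for o in objects)


users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
flat = flatten_object_array("user", users)

print(flat)  # {'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}
print(object_match(flat, "Alice", "Smith"))    # True, although no such user exists
print(nested_match(users, "Alice", "Smith"))   # False
```

If queries must respect the boundaries of each object in the array, the field should be mapped as nested rather than object.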

Geo datatypes

Geo-point

Geo-point datatype

geo_point, for latitude/longitude points

Geo-shape

Geo-Shape datatype

geo_shape, for complex shapes such as polygons
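A geo_point value can be written in several shapes, and the coordinate order differs between them: an object with lat and lon keys, a "lat,lon" string, or a [lon, lat] array. The helper below is a hypothetical sketch that normalizes these three shapes to a (lat, lon) tuple; geohash strings, which are also accepted, are deliberately left out.

```python
def normalize_geo_point(value):
    """Return (lat, lon) from three of the accepted geo_point shapes.
    Geohash strings are not handled in this sketch."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        lat, lon = value.split(",")       # string form is "lat,lon"
        return float(lat), float(lon)
    if isinstance(value, (list, tuple)):
        lon, lat = value                  # array form is [lon, lat]
        return float(lat), float(lon)
    raise TypeError("unsupported geo_point value: %r" % (value,))


print(normalize_geo_point({"lat": 41.12, "lon": -71.34}))
print(normalize_geo_point("41.12,-71.34"))
print(normalize_geo_point([-71.34, 41.12]))
```

All three calls describe the same point; the array form is the one that most often causes swapped coordinates.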

Specialised datatypes

IP

IP datatype

ip, for IPv4 and IPv6 addresses

Completion

Completion datatype

completion, to provide auto-complete suggestions

Token count

Token count datatype

token_count, to count the number of tokens in a string

murmur3 (plugin)

mapper-murmur3

murmur3, to compute hashes of values at index time and store them in the index. The type is provided by the mapper-murmur3 plugin; it hashes field values, not the index itself.

Percolator

Percolator type

percolator, which accepts (and stores) queries written in the query DSL

Join

join datatype

join, which defines parent/child relations for documents within the same index

Multi-fields

It is often useful to index the same field in different ways for different purposes. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a text field with the standard analyzer, the english analyzer, and the french analyzer.

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
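A multi-field mapping can be sketched as a request body built in Python. The type name article and the title field are borrowed from the example later in these notes; the sub-field names raw and english are arbitrary choices for this illustration, and sub_field_paths is a made-up helper.

```python
import json

# Request body for creating an index whose title field is indexed three ways:
# as analyzed text, as an exact keyword, and with the english analyzer.
mapping_body = {
    "mappings": {
        "article": {
            "properties": {
                "title": {
                    "type": "text",
                    "fields": {
                        "raw": {"type": "keyword"},
                        "english": {"type": "text", "analyzer": "english"},
                    },
                }
            }
        }
    }
}


def sub_field_paths(field_name, field_mapping):
    """List the paths under which the sub-fields are addressed in queries."""
    return [f"{field_name}.{sub}" for sub in field_mapping.get("fields", {})]


title_mapping = mapping_body["mappings"]["article"]["properties"]["title"]
print(sub_field_paths("title", title_mapping))  # ['title.raw', 'title.english']
print(json.dumps(mapping_body, indent=2))
```

A query would then search title for full-text matches and sort or aggregate on title.raw.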

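2.2 Meta-fields

Meta-fields are the fields in a mapping that describe the document itself. Broadly, they fall into identity meta-fields, document source meta-fields, indexing meta-fields, the routing meta-field, and other (custom) meta-fields.

Meta-fields documentation: Meta-Fields

Meta-fields are used to customize how a document's associated metadata is treated. Examples include the document's _index, _type, _id, and _source fields.

Identity meta-fields

_index

The index to which the document belongs.

_uid

A composite field consisting of the _type and the _id.

_type

The document's mapping type.

_id

The document's ID.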
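How _uid relates to _type and _id can be sketched in a few lines. In 5.x the composite value has the form type#id; the helper names below are invented for the example, and split_uid assumes the type name itself contains no "#".

```python
def make_uid(doc_type, doc_id):
    """Compose a _uid value in the type#id form."""
    return f"{doc_type}#{doc_id}"


def split_uid(uid):
    """Split on the first '#'; the id part may itself contain '#'."""
    doc_type, _, doc_id = uid.partition("#")
    return doc_type, doc_id


uid = make_uid("article", "1")
print(uid)             # article#1
print(split_uid(uid))  # ('article', '1')
```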

Document source meta-fields

_source

The original JSON representing the body of the document.

_size

The size of the _source field in bytes, provided by the mapper-size plugin.

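Indexing meta-fields

_all

A catch-all field that indexes the values of all other fields. The 6.2 documentation cited here lists it as disabled by default; in 5.x, the version these notes cover, it is enabled by default.

The _all field concatenates the values of all other fields into one large string, separated by spaces. It is analyzed and indexed, but not stored.

_field_names

All fields in the document which contain non-null values.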
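Both indexing meta-fields can be imitated on a flat document. The sketch below is a simplification: it handles only top-level scalar fields, the function names are invented, and the real _all value is additionally run through an analyzer before indexing.

```python
def build_all_field(doc):
    """Concatenate all non-null field values with spaces, like _all."""
    return " ".join(str(value) for value in doc.values() if value is not None)


def build_field_names(doc):
    """Collect the names of fields that hold a non-null value, like _field_names."""
    return [name for name, value in doc.items() if value is not None]


doc = {
    "first_name": "John",
    "last_name": "Smith",
    "date_of_birth": "1970-10-24",
    "nickname": None,
}

print(build_all_field(doc))    # John Smith 1970-10-24
print(build_field_names(doc))  # ['first_name', 'last_name', 'date_of_birth']
```

Because the concatenated string is analyzed as text, a date such as 1970-10-24 ends up searchable in _all only as string terms, not as a date.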

Routing meta-field

_routing

A custom routing value which routes a document to a particular shard.
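Routing follows the formula shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document's _id. The sketch below uses zlib.crc32 purely as a stand-in hash; Elasticsearch uses its own Murmur3-based hash, so the shard numbers printed here will not match a real cluster. The function names are hypothetical.

```python
import zlib


def routing_for(doc_id, custom_routing=None):
    """The routing value defaults to the document id unless one is supplied."""
    return custom_routing if custom_routing is not None else doc_id


def pick_shard(routing, num_primary_shards):
    """shard_num = hash(_routing) % num_primary_shards.
    crc32 is a stand-in (assumption); the real hash function differs."""
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % num_primary_shards


# Two documents with different ids but the same custom routing value
# land on the same shard.
shard_a = pick_shard(routing_for("1", custom_routing="user1"), 5)
shard_b = pick_shard(routing_for("2", custom_routing="user1"), 5)
print(shard_a == shard_b)  # True
```

The formula also shows why the number of primary shards cannot simply be changed after index creation: the modulus would change and existing documents would no longer be found on the expected shard.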

Other meta-field

_meta

Application specific metadata, typically used to attach custom metadata to a mapping.

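2.3 Mapping parameters

Elasticsearch provides a rich set of mapping parameters for configuring how fields are mapped. Common features such as a field's analyzer, boost, date format, and similarity (scoring) model are all configured through mapping parameters.

Mapping parameters documentation: Mapping parameters

Taking the analyzer parameter as an example, the analyzer is specified when the index is created: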

PUT forum
{
    "mappings": {
        "article": {
            "properties": {
                "id": {
                    "type": "text"
                },
                "title": {
                    "type": "text"
                },
                "postdate": {
                    "type": "date"
                },
                "content": {
                    "type": "text",
                    "analyzer": "ik_max_word"
                }
            }
        }
    }
}

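The analyzer parameter sets the analyzer for a text field. It applies at both index time and search time (unless a separate search_analyzer is set). The standard analyzer is used by default, and third-party analyzers such as the IK analyzer can be specified: ik_smart performs coarse-grained segmentation, while ik_max_word performs the finest-grained segmentation.

Taking the index parameter as another example, index controls whether a field is indexed; a field that is not indexed cannot be searched. Its value is either true or false.

In Elasticsearch 1.x and 2.x, "index": "not_analyzed" meant that a string field was indexed without analysis. In 5.x "not_analyzed" is deprecated and index accepts only true or false. The string type is split into keyword and text; to skip analysis, use keyword, described in the documentation as "the keyword field for not_analyzed exact string values".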
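The migration from the old string type can be sketched as a small conversion function. This is a hypothetical helper, not an official tool: analyzed (or missing) index becomes text and not_analyzed becomes keyword, as described above, while mapping "index": "no" to a non-indexed keyword is a choice made for this sketch.

```python
def upgrade_string_field(legacy):
    """Convert a 1.x/2.x string field definition to 5.x text/keyword."""
    if legacy.get("type") != "string":
        return dict(legacy)  # nothing to upgrade

    index = legacy.get("index", "analyzed")
    if index == "analyzed":
        upgraded = {"type": "text"}
    elif index == "not_analyzed":
        upgraded = {"type": "keyword"}
    elif index == "no":
        # Assumption for this sketch: keep the value but do not index it.
        upgraded = {"type": "keyword", "index": False}
    else:
        raise ValueError("unknown legacy index value: %r" % (index,))

    # Carry over the remaining parameters unchanged.
    for key, value in legacy.items():
        if key not in ("type", "index"):
            upgraded[key] = value
    return upgraded


print(upgrade_string_field({"type": "string", "index": "not_analyzed"}))
print(upgrade_string_field({"type": "string", "analyzer": "ik_max_word"}))
```

Parameters that only make sense for one of the two new types (an analyzer on a keyword field, for instance) would still need manual review after such a conversion.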

The _analyze API shows how a piece of text is tokenized:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中國人"
}

The response is:

{
  "tokens": [
    {
      "token": "中國人",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中國",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "國人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "人",
      "start_offset": 2,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 3
    }
  ]
}
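The start_offset and end_offset values in the response can be checked against the input text with a short script. The token data below is copied from the response above; the check relies on Python string indices coinciding with the reported offsets, which holds here because every character is in the Basic Multilingual Plane.

```python
text = "中國人"

# Tokens copied from the _analyze response: (token, start_offset, end_offset).
tokens = [
    ("中國人", 0, 3),
    ("中國", 0, 2),
    ("國人", 1, 3),
    ("人", 2, 3),
]


def offsets_consistent(source, token_list):
    """Each token should equal the slice of the source text it points at."""
    return all(source[start:end] == token for token, start, end in token_list)


print(offsets_consistent(text, tokens))  # True
```

The overlapping slices make the behaviour of ik_max_word visible: it emits every word it can find in the text, including words contained inside longer ones.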
