Elasticsearch Quick Start (2): Storing Data in ES

Reposted article, 2017-08-17 11:26. Tag: elastic-search

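Preface

Before getting hands-on, it helps to understand a few basic concepts.

Recommended reading
"Elasticsearch: The Definitive Guide" is available at https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html and gives a very detailed and comprehensive explanation.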

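Some ES concepts

index

Comparable to a database in MySQL.

type

Comparable to a table in MySQL. (Mapping types were deprecated in Elasticsearch 6.x and removed in later versions; the examples in this article use 5.x.)

document

Comparable to a row (one record) in MySQL.

field

Comparable to a column in MySQL.

node

A single server, identified by a name.

cluster

One or more nodes organized together.

shard

The ability to split a dataset into smaller pieces, which allows horizontal partitioning and capacity scaling. Multiple shards can serve requests in parallel, improving performance and throughput.

replica

A copy of the data; if one node fails, the other nodes can take over.

inverted index

An inverted index maps each term to the documents that contain it. See https://www.elastic.co/guide/cn/elasticsearch/guide/current/inverted-index.html.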
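To make the inverted index idea concrete, here is a minimal Python sketch. It is only an illustration: the `tokenize` function is a naive stand-in for a real analyzer, and the two sample sentences and all function names are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive stand-in for an analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "The quick brown fox",
    2: "Quick brown dogs",
}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 2]
print(index["fox"])    # [1]
```

Looking up a term is then a single dictionary access, which is why searching by term is fast even when there are many documents.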

Indices & types

Basic index operations

Creating an index

An index can be created with the following request:

PUT job
{
  "settings":{
    "index":{
      "number_of_shards":5,
      "number_of_replicas":1
    }
  }
}

{
  "acknowledged": true,
  "shards_acknowledged": true
}
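The requests in this article use the console shorthand (method, path, JSON body). Outside the console, each one is an ordinary HTTP request. The following Python sketch shows one possible translation using only the standard library. The address `http://localhost:9200` and the helper name `build_request` are assumptions for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local instance on the default port


def build_request(method, path, body=None):
    """Turn a console line such as `PUT job` plus its JSON body into an HTTP request object."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{path.lstrip('/')}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Equivalent of the `PUT job` request above.
create_job = build_request(
    "PUT",
    "job",
    {"settings": {"index": {"number_of_shards": 5, "number_of_replicas": 1}}},
)
print(create_job.get_method(), create_job.full_url)

# To actually send it (requires a running server):
# with urllib.request.urlopen(create_job) as resp:
#     print(json.load(resp))
```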

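Elasticsearch uses shards to distribute data across the cluster. A shard is a container for data: documents are stored in shards, and shards are allocated to the nodes in the cluster.
When the cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that the data stays evenly distributed.

A shard is either a primary shard or a replica shard. Every document in an index belongs to exactly one primary shard, so the number of primary shards determines the maximum amount of data the index can hold.

A replica shard is simply a copy of a primary shard. Replicas serve as redundant copies that protect against data loss on hardware failure, and they also serve read operations such as searching and retrieving documents.

In the example above, the index has 5 primary shards and 1 replica of each.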
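As a rough illustration of how a document ends up in exactly one primary shard, the sketch below applies the routing rule described in the Definitive Guide, shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. `zlib.crc32` is only a stand-in hash chosen for this example (Elasticsearch uses its own hash function), so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Return the primary shard number for a routing value (by default the document id).

    zlib.crc32 is a stand-in hash for this sketch, not the one Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(number_of_shards, number_of_replicas):
    """Primary shards plus one full set of copies per replica."""
    return number_of_shards * (1 + number_of_replicas)


for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", pick_shard(doc_id, 5))

print(total_shard_copies(5, 1))  # 10
```

With 5 primaries and 1 replica the cluster holds 10 shard copies in total; raising number_of_replicas to 2 would make it 15.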

Viewing index information

GET job

This returns information about the job index:

{
  "job": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1502342603160",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "LGalsb3eRKeGb5SbWCxO8w",
        "version": {
          "created": "5010199"
        },
        "provided_name": "job"
      }
    }
  }
}

A single section can also be requested on its own:

GET job/_settings

This returns only the settings of the job index:

{
  "job": {
    "settings": {
      "index": {
        "creation_date": "1502342603160",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "LGalsb3eRKeGb5SbWCxO8w",
        "version": {
          "created": "5010199"
        },
        "provided_name": "job"
      }
    }
  }
}

Modifying index settings

For example, to change the number of replicas to 2:

PUT job/_settings
{
  "number_of_replicas":2
}

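Mappings

When creating an index, we can define a mapping up front that specifies each field and its data type, so that ES can handle the data appropriately. Taking an article store as an example, an article's keyword field should be treated as a whole term, whereas its body field must be tokenized by a Chinese analyzer.

A mapping tells ES these per-field rules.

For more detail, see https://www.elastic.co/guide/cn/elasticsearch/guide/current/mapping-intro.html

Data types

Elasticsearch supports the following types:

String: text, keyword (note: versions before 5 had a string type, which 5.x replaces with text and keyword)
Numeric: byte, short, integer, long, float, double
Boolean: boolean
Date: date
Complex types: such as object and nested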

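Viewing the mapping

Run

GET job/_mapping

to see all mappings under the job index.

Default mapping

If no mapping has been defined when data is stored in an index, ES automatically adds field types based on the actual data (dynamic mapping).
For example, insert a document with the following request: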

PUT job/type1/1
{
  "title":"abc",
  "words":123,
  "date":"2017-01-01",
  "isok":true
}

Then view the mapping; the result is:

{
  "job": {
    "mappings": {
      "type1": {
        "properties": {
          "date": {
            "type": "date"
          },
          "isok": {
            "type": "boolean"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "words": {
            "type": "long"
          }
        }
      }
    }
  }
}

As shown, ES automatically mapped each field according to the type of its value.
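The inference seen in that mapping can be approximated in a few lines of Python. This is a simplified sketch, not how ES implements it: the function names are made up, date detection only accepts plain yyyy-MM-dd strings, and arrays and nested objects are not handled.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_date(value):
    """Accept only valid yyyy-MM-dd strings; real date detection supports more formats."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_field_mapping(value):
    """Guess a field mapping from a single JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if looks_like_date(value):
            return {"type": "date"}
        # Strings become analyzed text with an exact-match keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")


# The same document that was inserted into job/type1/1 above.
doc = {"title": "abc", "words": 123, "date": "2017-01-01", "isok": True}
mapping = {field: infer_field_mapping(value) for field, value in doc.items()}
print(mapping["words"])  # {'type': 'long'}
print(mapping["date"])   # {'type': 'date'}
```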

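Defining a mapping

A mapping can be defined when the index is created. The format is the same as the response returned when viewing a mapping, shown above. If the job index from the earlier examples still exists, delete it first (DELETE job), since an existing index cannot be created again.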

PUT job
{
  "mappings":{
    "type2":{
      "properties":{
        "title":{
          "type":"keyword"
        },
        "salary":{
          "type":"integer"
        },
        "desc":{
          "type":"text",
          "analyzer": "ik_max_word"
        },
        "date":{
          "type":"date",
          "format":"yyyy-MM-dd"
        }
      }
    }
  }
}

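Note that the desc field above specifies an analyzer, ik_max_word, which is a custom (not built-in) analyzer. The es-rtf distribution ships with two IK analyzers installed by default, ik_smart and ik_max_word; the difference is that the latter splits text into more terms.
Fields of type text are analyzed (tokenized) and then indexed, whereas keyword fields are not analyzed.

Automatic conversion

Once the index and mapping exist, field values in inserted documents are automatically converted to the type declared in the mapping. For example, inserting "123" into an integer field triggers an attempt to convert the string to a number. If the conversion fails, an error is returned and the document is not inserted.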
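The conversion rule can be pictured with a small Python model. This is a simplified sketch under stated assumptions, not the actual ES logic: the function name is made up, and it only accepts integers and integer-looking strings, rejecting everything else.

```python
import re

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of the 32-bit integer type
INTEGER_TEXT = re.compile(r"[+-]?\d+")


def coerce_to_integer(value):
    """Convert a JSON value to an int for an integer field, or raise ValueError."""
    if isinstance(value, bool):
        # This sketch rejects booleans even though bool is a subclass of int in Python.
        raise ValueError(f"cannot use boolean {value!r} as an integer")
    if isinstance(value, str):
        if not INTEGER_TEXT.fullmatch(value):
            raise ValueError(f"cannot parse {value!r} as an integer")
        value = int(value)
    if not isinstance(value, int):
        raise ValueError(f"unsupported value {value!r}")
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"{value} is out of range for an integer field")
    return value


print(coerce_to_integer("123"))  # 123
try:
    coerce_to_integer("abc")
except ValueError as err:
    print("rejected:", err)
```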

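Documents

A "document" is what would be called a record. Documents can be created, deleted, and modified.

Inserting a document

The document id can be specified explicitly, i.e. PUT index_name/type_name/id.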

PUT job/type2/1
{
  "title":"Python工程師",
  "salary":1000,
  "desc":"1. 參與devops相關系統開發，包括雲資源管理平台，cmdb平台、資源申請流程、基礎支撐平台開發；2. 參與公司業務系統及自動化運維平台的開發；3. 積累並規范化系統開發的最佳實踐並文檔化；4. 完善並遵守團隊的編碼規范，編寫高質量、結構清晰、易讀、易維護的代碼。",
  "date":"2017-08-08"
}

Response:

{
  "_index": "job",
  "_type": "type2",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

The id can also be omitted, in which case one is assigned automatically. Note that this requires POST.

POST job/type2/
{
  "title":"Python工程師2",
  "salary":1000,
  "desc":"1. 參與devops相關系統開發，包括雲資源管理平台，cmdb平台、資源申請流程、基礎支撐平台開發；2. 參與公司業務系統及自動化運維平台的開發；3. 積累並規范化系統開發的最佳實踐並文檔化；4. 完善並遵守團隊的編碼規范，編寫高質量、結構清晰、易讀、易維護的代碼。",
  "date":"2017-08-08"
}

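Retrieving a document

Use a GET request:

GET job/type2/1

This returns the document. The response below was captured after the updates described later in this article, hence _version 3 and the changed field values: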

{
  "_index": "job",
  "_type": "type2",
  "_id": "1",
  "_version": 3,
  "found": true,
  "_source": {
    "title": "Java",
    "salary": 2000,
    "desc": "易維護的代碼",
    "date": "2017-08-08"
  }
}

It is also possible to retrieve only some of the fields in _source:

GET job/type2/1?_source=title,salary

{
  "_index": "job",
  "_type": "type2",
  "_id": "1",
  "_version": 3,
  "found": true,
  "_source": {
    "title": "Java",
    "salary": 2000
  }
}

Updating a document

One way is a full overwrite with PUT: the old document is deleted and replaced by the new one.

PUT job/type2/1
{
  "title":"Java",
  "salary":1400,
  "desc":"易維護的代碼",
  "date":"2017-08-08"
}

The other way is a POST to the _update endpoint, which modifies only some of the fields.

POST job/type2/1/_update
{
  "doc":{
    "salary":2000
  }
}
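The difference between the two approaches, and the way _version grows with each write, can be modeled with a small in-memory store. This is a toy sketch: the class `TinyStore` is invented for illustration, it merges only top-level fields, and it ignores details of the real update API such as nested objects and no-op detection.

```python
class TinyStore:
    """In-memory stand-in for one type in one index: id -> (version, source)."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, source):
        """Like PUT index/type/id: store the whole document, replacing any previous one."""
        version = self._docs[doc_id][0] + 1 if doc_id in self._docs else 1
        self._docs[doc_id] = (version, dict(source))
        return version

    def update(self, doc_id, partial):
        """Like POST index/type/id/_update with a "doc" body: merge fields into the source."""
        version, source = self._docs[doc_id]
        merged = {**source, **partial}
        self._docs[doc_id] = (version + 1, merged)
        return version + 1

    def get(self, doc_id):
        """Return the stored document together with its current version."""
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "_source": dict(source)}


store = TinyStore()
# 1) initial insert, 2) full overwrite with PUT, 3) partial update of salary
store.index("1", {"title": "Python工程師", "salary": 1000, "date": "2017-08-08"})
store.index("1", {"title": "Java", "salary": 1400, "desc": "易維護的代碼", "date": "2017-08-08"})
store.update("1", {"salary": 2000})
print(store.get("1"))
```

After these three writes the model reports _version 3 with title "Java" and salary 2000, which is consistent with the GET response shown earlier in this article.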

Deleting a document

A document can be deleted with DELETE:

DELETE job/type2/1

mget: retrieving multiple documents

See https://www.elastic.co/guide/cn/elasticsearch/guide/current/_Retrieving_Multiple_Documents.html
Combining lookups into one request reduces the number of round trips and improves efficiency.

GET _mget
{
   "docs" : [
      {
         "_index" : "job",
         "_type" :  "type2",
         "_id" :    1
      },
      {
         "_index" : "job",
         "_type" :  "type2",
         "_id" :    2,
         "_source": "salary"
      }
   ]
}

The response contains one entry per requested document; here document 1 is found and document 2 is not:

{
  "docs": [
    {
      "_index": "job",
      "_type": "type2",
      "_id": "1",
      "_version": 3,
      "found": true,
      "_source": {
        "title": "Java",
        "salary": 2000,
        "desc": "易維護的代碼",
        "date": "2017-08-08"
      }
    },
    {
      "_index": "job",
      "_type": "type2",
      "_id": "2",
      "found": false
    }
  ]
}

A shorthand is also available. For example, when the index and type are the same for all documents, looking up two ids can be written as:

GET job/type2/_mget
{
  "ids":["1", "2"]
}

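bulk: batch operations

The bulk API allows multiple create, index, update, or delete requests to be carried out in a single step.

For details, see https://www.elastic.co/guide/cn/elasticsearch/guide/current/bulk.html

The request body of a bulk request has a special format:

{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n ...

Each request normally takes two lines: the first names the action and its metadata, and the second carries the data for the action. A delete request has only one line.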

POST _bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }} 
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }

The response lists the processing status of each request.

{
   "took": 4,
   "errors": false, 
   "items": [
      {  "delete": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "123",
            "_version": 2,
            "status":   200,
            "found":    true
      }},
      {  "create": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "123",
            "_version": 3,
            "status":   201
      }},
      {  "create": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "EiwfApScQiiy7TIKFxRCTw",
            "_version": 1,
            "status":   201
      }},
      {  "update": {
            "_index":   "website",
            "_type":    "blog",
            "_id":      "123",
            "_version": 4,
            "status":   200
      }}
   ]
}
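Because the bulk format is line-based, it is easy to get wrong when assembled by hand. The following Python sketch shows one way to generate the request body from a list of operations and to pick out failed items from a response shaped like the one above. The helper names `build_bulk_body` and `failed_items` are made up for this example, and treating any status of 400 or above as a failure is an assumption of the sketch.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into the newline-delimited bulk format.

    `source` is None for actions without a request body line, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


def failed_items(response):
    """Return (action, result) pairs whose status code is 400 or above."""
    failures = []
    for item in response["items"]:
        for action, result in item.items():
            if result["status"] >= 400:
                failures.append((action, result))
    return failures


# The same four operations as in the bulk request above.
meta = {"_index": "website", "_type": "blog", "_id": "123"}
body = build_bulk_body([
    ("delete", meta, None),
    ("create", meta, {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"}, {"title": "My second blog post"}),
    ("update", meta, {"doc": {"title": "My updated blog post"}}),
])
print(body, end="")
```

The generated body has seven lines: one for the delete and two for each of the other three actions.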

With the operations above, data can be written into ES in an organized way. The next article will cover searching and querying.
