Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html
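1. Adding Documents
1.1 Specifying the Document ID
PUT blog/_doc/1
{
  "title": "1、VMware Workstation虛擬機軟件安裝圖解",
  "author": "chengyuqiang",
  "content": "1、VMware Workstation虛擬機軟件安裝圖解...",
  "url": "http://x.co/6nc81"
}
Elasticsearch returns a JSON response:
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 0,
  "_primary_term" : 2
}
Response fields:
- _index: the name of the index the document belongs to
- _type: the name of the document's type
- _id: the document ID
- _version: the document version, incremented each time the document changes
- result: created, meaning a new document was created (it is updated when an existing document is replaced)
- _shards: information about the replication process of the index operation:
  - total: the number of shard copies (primary and replica shards) the index operation should be executed on
  - successful: the number of shard copies on which the index operation succeeded
  - failed: the number of shard copies on which it failed; replication-related errors are reported here when the operation fails on a replica shard
- _seq_no and _primary_term: the sequence number and primary term that Elasticsearch assigned to this write operation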
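As a sketch of how a client might read these fields, the Python snippet below parses the response shown above with the standard json module and reports whether the write reached every shard copy. The helper name summarize_write is hypothetical. A successful count lower than total, as here (1 of 2), typically means the replica copy could not be written. One common case is a single-node cluster where the replica shard is unassigned.

```python
import json

# The index response from section 1.1.
RESPONSE = """
{ "_index" : "blog", "_type" : "_doc", "_id" : "1", "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 0, "_primary_term" : 2 }
"""


def summarize_write(response_text):
    """Summarize an index/update/delete response (hypothetical helper)."""
    resp = json.loads(response_text)
    shards = resp["_shards"]
    return {
        "id": resp["_id"],
        "version": resp["_version"],
        "result": resp["result"],
        # True only if every shard copy (primary plus replicas) applied the write.
        "all_copies_written": shards["successful"] == shards["total"],
        "failed_copies": shards["failed"],
    }


print(summarize_write(RESPONSE))
```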
1.2 Without Specifying the Document ID
If you omit the ID when adding a document, Elasticsearch generates one automatically as a string. Note that this requires the POST method rather than PUT.
POST blog/_doc
{
  "title": "2、Linux服務器安裝圖解",
  "author": "chengyuqiang",
  "content": "2、Linux服務器安裝圖解解...",
  "url": "http://x.co/6nc82"
}
The response contains the generated _id:
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "5P2-O2gBNSQY7o-KMw2P",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 1,
  "_primary_term" : 1
}
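The PUT-versus-POST rule can be captured in a small helper. This is a hypothetical sketch: build_index_request is not part of any Elasticsearch client. It returns the Console-style method, path, and body for indexing a document. It uses PUT when an explicit ID is given and POST when Elasticsearch should generate the ID.

```python
import json


def build_index_request(index, doc, doc_id=None):
    """Return (method, path, body) for indexing `doc` (hypothetical helper).

    With an explicit ID we use PUT <index>/_doc/<id>; without one we must use
    POST <index>/_doc so that Elasticsearch generates the ID.
    """
    body = json.dumps(doc, ensure_ascii=False)
    if doc_id is None:
        return "POST", f"{index}/_doc", body
    return "PUT", f"{index}/_doc/{doc_id}", body


doc = {"title": "2、Linux服務器安裝圖解", "author": "chengyuqiang"}
print(build_index_request("blog", doc))
print(build_index_request("blog", doc, doc_id="1")[:2])  # ('PUT', 'blog/_doc/1')
```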
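2. Retrieving Documents
2.1 Getting a Document by ID
GET blog/_doc/1
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "1、VMware Workstation虛擬機軟件安裝圖解",
    "author" : "chengyuqiang",
    "content" : "1、VMware Workstation虛擬機軟件安裝圖解...",
    "url" : "http://x.co/6nc81"
  }
}
Response fields:
- found is true, meaning the document was found
- _source contains the document's content
2.2 When the Document Does Not Exist
GET blog/_doc/2
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "2",
  "found" : false
}
A found value of false means the requested document does not exist.
2.3 Checking Whether a Document Exists
HEAD returns only a status code: 200 if the document exists and 404 if it does not.
HEAD blog/_doc/1
200 - OK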
3. Updating Documents
3.1 Replacing Document 1: Removing author and Changing content
Indexing a document under an existing ID replaces the whole document, so any field left out of the new body (here author) is removed.
PUT blog/_doc/1
{
  "title": "1、VMware Workstation虛擬機軟件安裝圖解",
  "content": "下載得到VMware-workstation-full-15.0.2-10952284.exe可執行文件...",
  "url": "http://x.co/6nc81"
}
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 1,
  "_primary_term" : 1
}
_version is now 2, and result is updated.
Retrieve the document again:
GET blog/_doc/1
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "title" : "1、VMware Workstation虛擬機軟件安裝圖解",
    "content" : "下載得到VMware-workstation-full-15.0.2-10952284.exe可執行文件...",
    "url" : "http://x.co/6nc81"
  }
}
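3.2 Preventing an Add from Overwriting an Existing Document with _create
To make sure that adding a document never overwrites an existing one, use _create. The request succeeds only if no document with that ID exists yet. (In Elasticsearch 7.x the typeless form PUT blog/_create/1 is preferred; the typed path used here is deprecated.)
PUT blog/_doc/1/_create
{
  "title": "1、VMware Workstation虛擬機軟件安裝圖解",
  "content": "下載得到VMware-workstation-full-15.0.2-10952284.exe可執行文件...",
  "url": "http://x.co/6nc81"
}
The document already exists, so the add fails with a 409 version conflict:
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[_doc][1]: version conflict, document already exists (current version [2])",
        "index_uuid": "GqC2fSqPS06GRfTLmh1TLg",
        "shard": "1",
        "index": "blog"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[_doc][1]: version conflict, document already exists (current version [2])",
    "index_uuid": "GqC2fSqPS06GRfTLmh1TLg",
    "shard": "1",
    "index": "blog"
  },
  "status": 409
}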
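To make the difference between a plain PUT (index) and _create concrete, here is a toy in-memory model in Python. It is an illustrative sketch, not Elasticsearch code, and the class and exception names are hypothetical. In the model, index overwrites an existing document and bumps its version. Create instead refuses with a 409-style conflict when the ID already exists.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's 409 version_conflict_engine_exception."""


class ToyIndex:
    def __init__(self):
        self.docs = {}  # id -> (version, source)

    def index(self, doc_id, source):
        """Like PUT <index>/_doc/<id>: create, or replace and bump the version."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, dict(source))
        return ("created" if version == 1 else "updated"), version

    def create(self, doc_id, source):
        """Like PUT <index>/_doc/<id>/_create: fail if the ID already exists."""
        if doc_id in self.docs:
            current = self.docs[doc_id][0]
            raise VersionConflict(
                f"[{doc_id}]: version conflict, document already exists "
                f"(current version [{current}])"
            )
        return self.index(doc_id, source)


blog = ToyIndex()
print(blog.index("1", {"title": "v1"}))  # ('created', 1)
print(blog.index("1", {"title": "v2"}))  # ('updated', 2)
try:
    blog.create("1", {"title": "v3"})
except VersionConflict as err:
    print("409:", err)
```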
3.3 Updating a Specific Field
Update a specific field with a script. ctx is the execution context object available to the script. The script reads the document's source through ctx._source and then modifies its content field. (In 7.x the typeless form of this endpoint is POST blog/_update/1.)
POST blog/_doc/1/_update
{
  "script": {
    "source": "ctx._source.content=\"從官網下載VMware-workstation,雙擊可執行文件進行安裝...\""
  }
}
The response is:
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 2,
  "_primary_term" : 1
}
Retrieving the document again with GET blog/_doc/1 returns:
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "title" : "1、VMware Workstation虛擬機軟件安裝圖解",
    "content" : "從官網下載VMware-workstation,雙擊可執行文件進行安裝...",
    "url" : "http://x.co/6nc81"
  }
}
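Hand-escaping quotes inside the script source, as above, is error-prone. A common alternative is to pass the new value through the script's params object, which Painless exposes as params. The Python sketch below only builds and prints such a request body with json.dumps; it does not contact a cluster. The helper name build_field_update is hypothetical, while script.source and script.params are standard fields of an update request's script.

```python
import json


def build_field_update(field, value):
    """Build an _update body that sets `field` via a parameterized Painless script.

    Hypothetical helper. The value travels in script.params, so no quotes need
    escaping inside the script source. `field` must be a plain identifier,
    because it is pasted into the source text.
    """
    return {
        "script": {
            "source": f"ctx._source.{field} = params.value",
            "params": {"value": value},
        }
    }


body = build_field_update("content", "從官網下載VMware-workstation,雙擊可執行文件進行安裝...")
print("POST blog/_doc/1/_update")
print(json.dumps(body, ensure_ascii=False, indent=2))
```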
3.4 Adding a Field
POST blog/_doc/1/_update
{
  "script": {
    "source": "ctx._source.author=\"chengyuqiang\""
  }
}
Retrieving the document again with GET blog/_doc/1 shows the new author field:
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "title" : "1、VMware Workstation虛擬機軟件安裝圖解",
    "content" : "從官網下載VMware-workstation,雙擊可執行文件進行安裝...",
    "url" : "http://x.co/6nc81",
    "author" : "chengyuqiang"
  }
}
3.5 Removing a Field
POST blog/_doc/1/_update
{
  "script": {
    "source": "ctx._source.remove(\"url\")"
  }
}
Retrieving the document again with GET blog/_doc/1 shows that url is gone:
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 5,
  "found" : true,
  "_source" : {
    "title" : "1、VMware Workstation虛擬機軟件安裝圖解",
    "content" : "從官網下載VMware-workstation,雙擊可執行文件進行安裝...",
    "author" : "chengyuqiang"
  }
}
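The three scripts in sections 3.3 to 3.5 act on ctx._source, which behaves like a map from field names to values. As a rough Python analogy, the sketch below uses a plain dict to stand in for ctx._source; this is not Painless. It applies the same three changes to document 1 as it stood after section 3.1. The result is the _source returned by the last GET, and the version count matches the _version shown there.

```python
# _source of document 1 after the full replacement in section 3.1.
source = {
    "title": "1、VMware Workstation虛擬機軟件安裝圖解",
    "content": "下載得到VMware-workstation-full-15.0.2-10952284.exe可執行文件...",
    "url": "http://x.co/6nc81",
}
version = 2

# 3.3: ctx._source.content = "..."  (overwrite an existing field)
source["content"] = "從官網下載VMware-workstation,雙擊可執行文件進行安裝..."
version += 1
# 3.4: ctx._source.author = "chengyuqiang"  (assigning a new key adds the field)
source["author"] = "chengyuqiang"
version += 1
# 3.5: ctx._source.remove("url")  (removing the key deletes the field)
source.pop("url")
version += 1

print(version, source)
```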
4. Deleting Documents
DELETE blog/_doc/1
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 6,
  "result" : "deleted",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 6,
  "_primary_term" : 1
}
Checking again whether the document exists with HEAD blog/_doc/1 now returns 404 - Not Found.
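5. Bulk Operations
Production systems routinely handle massive amounts of data. When the number of documents is that large, operating on them one at a time is impractical. Fortunately, Elasticsearch provides a bulk mechanism. Just as mget (section 6) retrieves multiple documents in a single request, the Bulk API performs bulk indexing, deleting, updating, and so on. In other words, it lets you issue multiple create, index, update, or delete requests in a single step.
The bulk request body is formatted slightly differently from other request bodies: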
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...
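This format is essentially a stream of valid single-line JSON documents joined by newline characters (\n). Note the following points:
- Every line must end with a newline (\n), including the last line. The newlines act as markers that efficiently separate the lines.
- Lines must not contain unescaped newlines, because these would interfere with parsing. In particular, the JSON must not be pretty-printed.
- The action/metadata line specifies which action to perform on which document. The metadata should specify the _index, _type, and _id of the document being indexed, created, updated, or deleted.
- The request body line consists of the document's _source itself, that is, the fields and values the document contains. It is required for index and create operations. An update action also takes a body line, for example a partial document under doc as in section 5.2. A delete takes no body line.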
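Because each action and each body must sit on its own line, a bulk body is easiest to generate programmatically. Below is a minimal Python sketch using only the standard library; the helper name build_bulk_body is hypothetical. It serializes (action, metadata, body) tuples into the format above. json.dumps produces single-line JSON and escapes any newlines inside strings. A delete action gets no body line, and the result ends with the mandatory final newline.

```python
import json


def build_bulk_body(operations):
    """Serialize bulk operations to newline-delimited JSON (hypothetical helper).

    `operations` is an iterable of (action, metadata, body) tuples, where action is
    one of "index", "create", "update", "delete" and body is None for "delete".
    """
    lines = []
    for action, metadata, body in operations:
        # Without indent=, json.dumps emits no raw newlines, so each object stays on one line.
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if action == "delete":
            continue  # delete has no request body line
        if body is None:
            raise ValueError(f"{action} requires a request body line")
        lines.append(json.dumps(body, ensure_ascii=False))
    # Every line, including the last one, must end with "\n".
    return "".join(line + "\n" for line in lines)


ops = [
    ("delete", {"_index": "blog", "_type": "_doc", "_id": "1"}, None),
    ("update", {"_index": "blog", "_type": "_doc", "_id": "3"}, {"doc": {"title": "Xshell教程"}}),
]
print(build_bulk_body(ops), end="")
```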
5.1 Bulk Import
POST /_bulk
{ "create": { "_index": "blog", "_type": "_doc", "_id": "1" }}
{ "title": "1、VMware Workstation虛擬機軟件安裝圖解", "author": "chengyuqiang", "content": "官網下載VMware-workstation,雙擊可執行文件進行安裝", "url": "http://x.co/6nc81" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "2" }}
{ "title": "2、Linux服務器安裝圖解", "author": "chengyuqiang", "content": "VMware模擬Linux服務器安裝圖解", "url": "http://x.co/6nc82" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "3" }}
{ "title": "3、Xshell 6 個人版安裝與遠程操作連接服務器", "author": "chengyuqiang", "content": "Xshell 6 個人版安裝與遠程操作連接服務器...", "url": "http://x.co/6nc84" }
The Elasticsearch response contains an items array that lists the result of each request, in the same order as the requests:
{
  "took" : 132,
  "errors" : false,
  "items" : [
    { "create" : { "_index" : "blog", "_type" : "_doc", "_id" : "1", "_version" : 7, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 7, "_primary_term" : 1, "status" : 201 } },
    { "create" : { "_index" : "blog", "_type" : "_doc", "_id" : "2", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 8, "_primary_term" : 1, "status" : 201 } },
    { "create" : { "_index" : "blog", "_type" : "_doc", "_id" : "3", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 0, "_primary_term" : 1, "status" : 201 } }
  ]
}
5.2 Mixed Bulk Operations: Delete, Update, and Add
The delete action has no body line. The update action's body line uses doc to supply the fields to change.
POST /_bulk
{ "delete": { "_index": "blog", "_type": "_doc", "_id": "1" }}
{ "update": { "_index": "blog", "_type": "_doc", "_id": "3", "retry_on_conflict" : 3 }}
{ "doc" : {"title" : "Xshell教程"} }
{ "index": { "_index": "blog", "_type": "_doc", "_id": "4" }}
{ "title": "4、CentOS 7.x基本設置", "author": "chengyuqiang", "content": "CentOS 7.x基本設置", "url": "http://x.co/6nc85" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "5" }}
{ "title": "5、圖解Linux下JDK安裝與環境變量配置", "author": "chengyuqiang", "content": "圖解JDK安裝配置", "url": "http://x.co/6nc86" }
retry_on_conflict sets how many times the update is retried if a version conflict occurs. In version 7.0, the retry_on_conflict parameter replaced the earlier _retry_on_conflict.
{
  "took" : 125,
  "errors" : false,
  "items" : [
    { "delete" : { "_index" : "blog", "_type" : "_doc", "_id" : "1", "_version" : 2, "result" : "deleted", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 3, "_primary_term" : 1, "status" : 200 } },
    { "update" : { "_index" : "blog", "_type" : "_doc", "_id" : "3", "_version" : 2, "result" : "updated", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 4, "_primary_term" : 1, "status" : 200 } },
    { "index" : { "_index" : "blog", "_type" : "_doc", "_id" : "4", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 1, "_primary_term" : 1, "status" : 201 } },
    { "create" : { "_index" : "blog", "_type" : "_doc", "_id" : "5", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 5, "_primary_term" : 1, "status" : 201 } }
  ]
}
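A bulk request can succeed as a whole while individual actions fail. A client should therefore check the top-level errors flag and each item's status. Below is a minimal Python sketch that collects the failed items from a bulk response; the helper name failed_bulk_items is hypothetical. The sample response is invented and abbreviated (real failed items also carry an error object). It shows what re-running the create for ID 2 would look like, since that document still exists.

```python
import json


def failed_bulk_items(response_text):
    """Return (action, _id, status) for each bulk item with an HTTP status >= 300."""
    resp = json.loads(response_text)
    if not resp["errors"]:
        return []  # Elasticsearch reports that no item failed
    failures = []
    for item in resp["items"]:
        # Each item is a one-key object such as {"create": {...}}.
        action, result = next(iter(item.items()))
        if result["status"] >= 300:
            failures.append((action, result["_id"], result["status"]))
    return failures


# Hypothetical, abbreviated response: the create for ID 2 conflicts (409).
SAMPLE = '{"took": 3, "errors": true, "items": [{"create": {"_index": "blog", "_id": "2", "status": 409}}, {"index": {"_index": "blog", "_id": "4", "status": 200}}]}'
print(failed_bulk_items(SAMPLE))  # [('create', '2', 409)]
```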
6. Multi-Get (mget)
GET blog/_doc/_mget
{ "ids" : ["1", "2", "3"] }
The document with ID 1 has been deleted, so it is not found:
{
  "docs" : [
    { "_index" : "blog", "_type" : "_doc", "_id" : "1", "found" : false },
    {
      "_index" : "blog", "_type" : "_doc", "_id" : "2", "_version" : 1, "found" : true,
      "_source" : { "title" : "2、Linux服務器安裝圖解", "author" : "chengyuqiang", "content" : "VMware模擬Linux服務器安裝圖解", "url" : "http://x.co/6nc82" }
    },
    {
      "_index" : "blog", "_type" : "_doc", "_id" : "3", "_version" : 2, "found" : true,
      "_source" : { "title" : "Xshell教程", "author" : "chengyuqiang", "content" : "Xshell 6 個人版安裝與遠程操作連接服務器...", "url" : "http://x.co/6nc84" }
    }
  ]
}
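In an mget response, each entry of docs reports found on its own, so a missing ID does not fail the whole request. Below is a small Python sketch that separates found documents from missing IDs; the helper name split_mget is hypothetical. It is applied to an abbreviated copy of the response above.

```python
import json


def split_mget(response_text):
    """Return ({_id: _source} for found docs, [_id of missing docs])."""
    found, missing = {}, []
    for doc in json.loads(response_text)["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


# Abbreviated from the response above (only the fields this helper reads).
RESPONSE = """{"docs": [
  {"_id": "1", "found": false},
  {"_id": "2", "found": true, "_source": {"title": "2、Linux服務器安裝圖解"}},
  {"_id": "3", "found": true, "_source": {"title": "Xshell教程"}}
]}"""
found, missing = split_mget(RESPONSE)
print(sorted(found), missing)  # ['2', '3'] ['1']
```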
7. Simple Search
This section briefly introduces simple document searches; later chapters cover searching in detail.
7.1 Term Query
GET blog/_search
{ "query": { "term": { "title": "centos" } } }
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 0.71023846,
    "hits" : [
      {
        "_index" : "blog", "_type" : "_doc", "_id" : "4", "_score" : 0.71023846,
        "_source" : { "title" : "4、CentOS 7.x基本設置", "author" : "chengyuqiang", "content" : "CentOS 7.x基本設置", "url" : "http://x.co/6nc85" }
      }
    ]
  }
}
GET blog/_search
{ "query": { "term": { "title": "遠程" } } }
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 0, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  }
}
GET blog/_search
{ "query": { "term": { "title": "程" } } }
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 1.3486402,
    "hits" : [
      {
        "_index" : "blog", "_type" : "_doc", "_id" : "3", "_score" : 1.3486402,
        "_source" : { "title" : "Xshell教程", "author" : "chengyuqiang", "content" : "Xshell 6 個人版安裝與遠程操作連接服務器...", "url" : "http://x.co/6nc84" }
      }
    ]
  }
}
These results follow from how title was analyzed at index time. With the default standard analyzer, "CentOS" is lowercased to the term centos, and Chinese text is split into single-character terms. A term query does not analyze its input. As a result, centos matches the indexed term, and 遠程 matches nothing because no two-character term was indexed. 程 matches the single-character term produced from "Xshell教程".
7.2 Match Query
Unlike the exact term query, a match query first analyzes the query text with the field's analyzer. By default, it then returns every document in which at least one of the resulting terms occurs. Here 遠程 is split into 遠 and 程, and the document whose title contains 程 is returned.
GET blog/_search
{ "query": { "match": { "title": { "query": "遠程" } } } }
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 1.3486402,
    "hits" : [
      {
        "_index" : "blog", "_type" : "_doc", "_id" : "3", "_score" : 1.3486402,
        "_source" : { "title" : "Xshell教程", "author" : "chengyuqiang", "content" : "Xshell 6 個人版安裝與遠程操作連接服務器...", "url" : "http://x.co/6nc84" }
      }
    ]
  }
}
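The following Python sketch imitates the term and match behavior with a deliberately simplified analyzer. The analyzer lowercases text, keeps runs of ASCII letters and digits as words, and emits each CJK ideograph as its own term. That is roughly what the standard analyzer does for these titles, but it is an approximation for illustration, not Lucene's actual tokenizer. The titles are documents 2 to 5 as they stood after section 5.2. term_query looks up the query string unchanged, while match_query analyzes the query text first and then ORs the resulting terms.

```python
import re

TITLES = {
    "2": "2、Linux服務器安裝圖解",
    "3": "Xshell教程",
    "4": "4、CentOS 7.x基本設置",
    "5": "5、圖解Linux下JDK安裝與環境變量配置",
}

# Simplified stand-in for the standard analyzer: ASCII words/numbers, or single CJK ideographs.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text):
    return TOKEN.findall(text.lower())


# Inverted index: term -> set of document IDs containing it.
INDEX = {}
for doc_id, title in TITLES.items():
    for term in analyze(title):
        INDEX.setdefault(term, set()).add(doc_id)


def term_query(value):
    """Exact lookup: the query value itself is NOT analyzed."""
    return set(INDEX.get(value, ()))


def match_query(text):
    """Analyze the query text, then match documents containing ANY resulting term."""
    hits = set()
    for term in analyze(text):
        hits |= INDEX.get(term, set())
    return hits


print(term_query("centos"), term_query("遠程"), term_query("程"))
print(match_query("遠程"))
```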
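8. Routing
When you index a document (index used as a verb: building the inverted index for the document), it is stored on one primary shard of the index and then copied to that shard's replicas. The primary shard can live on any data node, not necessarily the master node.
How does Elasticsearch know which shard a document belongs to? When you create a new document, how does it decide whether to store it on shard 1 or shard 2?
To answer this, we need to understand Elasticsearch's routing mechanism.
Put simply, Elasticsearch hashes each document's routing value, so documents with the same routing value always go to the same primary shard. The shard is computed as follows: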
shard = hash(routing) % number_of_primary_shards
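Notes on the algorithm:
- routing is a string. By default it is the document's _id, but it can be customized. The routing string is passed through a hash function to produce a number. That number is divided by the number of primary shards, giving a remainder in the range [0, number_of_primary_shards - 1], and this remainder identifies the shard that holds the document.
- As introduced earlier, the number of primary shards is specified when the index is created and cannot be changed afterward. If the number of primary shards changed, all previously computed routing results would become invalid, and documents could no longer be found.
- The algorithm generally spreads documents evenly across all shards, so it avoids uneven data distribution (data skew).
- By default the routing value is the document's _id. This holds whether you specify the ID when creating the document or let Elasticsearch generate it. A GET, UPDATE, or DELETE by ID can therefore always compute the target shard. A search, however, does not know in advance which documents will match. Elasticsearch must therefore broadcast a search request to every shard of the index.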
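Below is a minimal Python sketch of the formula above. Elasticsearch actually hashes the routing string with Murmur3, and recent versions add a routing-shards factor. Here zlib.crc32 is swapped in as a stand-in hash purely for illustration, so the shard numbers will not match a real cluster. What carries over is the behavior. The same routing value always yields the same shard, every result lies in [0, number_of_primary_shards - 1], and changing the shard count sends many documents to different shards.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with crc32 as a stand-in hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


ids = [str(i) for i in range(1, 1001)]

# Count how many of 1000 IDs land on each of 5 primary shards.
counts = [0] * 5
for doc_id in ids:
    counts[shard_for(doc_id, 5)] += 1
print("per-shard counts:", counts)

# If the index had 6 primaries instead of 5, lookups computed with the new count
# would go to a different shard for many of the same documents.
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in ids)
print(f"{moved} of {len(ids)} documents would change shard")
```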
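Suppose we have an index with 10 shards. A search request is executed on the cluster roughly as follows:
- The search request is sent to one node.
- The node that receives the request (the coordinating node) broadcasts the query to every shard of the index. Each shard may be served by either its primary or a replica copy.
- Each shard executes the search query and returns its results.
- The coordinating node merges and sorts the results and returns them to the user.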
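The broadcast-and-merge flow above can be sketched in Python as follows. This is a simplified model of the query step only, not Elasticsearch's implementation; fetching each hit's _source afterward is omitted. Each shard returns its own top hits sorted by score. The coordinating node merges the per-shard lists with heapq.merge and keeps the overall top size hits. The shard contents and scores are made up for illustration.

```python
import heapq
from itertools import islice


def shard_search(shard_hits, size):
    """One shard: sort its own matching docs by score and return its top `size` hits."""
    return sorted(((score, doc_id) for doc_id, score in shard_hits.items()), reverse=True)[:size]


def coordinate_search(shards, size=3):
    """Coordinating node: query every shard, then merge the per-shard results by score."""
    per_shard = [shard_search(hits, size) for hits in shards]
    # Each per-shard list is already sorted by descending score, so a k-way merge suffices.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(islice(merged, size))


# Hypothetical data: two shards, each mapping matching doc ID -> relevance score.
shards = [
    {"a": 0.9, "b": 0.2, "c": 0.5},
    {"d": 0.7, "e": 0.95},
]
print(coordinate_search(shards, size=3))  # [(0.95, 'e'), (0.9, 'a'), (0.7, 'd')]
```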
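Once you understand routing, you can specify a routing value when creating a certain class of documents, so that Elasticsearch knows which shard to use when handling them. For example, you could store all books of one category on one shard. A search over that category can then skip the other shards, provided the search request carries the same routing value, and this also avoids merging results from multiple shards. Routing gives Elasticsearch the information it needs to decide which shards to use for storage and for queries, and the same routing value always maps to the same shard. In short, a user-supplied routing value enables targeted storage and targeted search.
All document APIs (GET, INDEX, DELETE, BULK, UPDATE, MGET) accept a routing parameter that customizes the mapping from documents to shards. It is passed like any other URL parameter: url?name=value.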
PUT blog/_doc/1?routing=hardon
{
  "title": "1、VMware安裝",
  "author": "hadron",
  "content": "VMware Workstation虛擬機軟件安裝圖解...",
  "url": "http://x.co/6nc81"
}
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 12,
  "_primary_term" : 1
}
GET blog/_doc/1?routing=hardon
{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_routing" : "hardon",
  "found" : true,
  "_source" : {
    "title" : "1、VMware安裝",
    "author" : "hadron",
    "content" : "VMware Workstation虛擬機軟件安裝圖解...",
    "url" : "http://x.co/6nc81"
  }
}
A document indexed with a custom routing value must be retrieved with the same routing value. Otherwise the GET may be sent to a different shard and report found: false.
Note: custom routing values can lead to uneven data distribution. Suppose user hadron has hundreds of thousands of documents, while most other users have only a few to a few dozen. The shard holding hadron's documents then becomes much larger than the others.
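The sketch below shows why a custom routing value can skew shard sizes. It routes documents by their owner's name, so all of one heavy user's documents pile onto a single shard, whereas routing by _id spreads them out. It reuses crc32 as a stand-in for Elasticsearch's Murmur3 hash, so the exact shard numbers are illustrative only. The user names and document counts are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(routing, number_of_primary_shards):
    # Stand-in hash; Elasticsearch itself uses Murmur3 on the routing string.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


SHARDS = 5
# Hypothetical workload: one heavy user and a few light users, as (owner, doc_id) pairs.
docs = [("hadron", f"h{i}") for i in range(10_000)]
docs += [(user, f"{user}{i}") for user in ("alice", "bob", "carol") for i in range(20)]

by_user = Counter(shard_for(user, SHARDS) for user, _ in docs)    # routing=<owner>
by_id = Counter(shard_for(doc_id, SHARDS) for _, doc_id in docs)  # default: routing=_id
print("routing=user:", sorted(by_user.items()))
print("routing=_id: ", sorted(by_id.items()))
```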
9. Version Control
References
https://www.elastic.co/guide/en/elasticsearch/guide/2.x/version-control.html
https://www.elastic.co/guide/en/elasticsearch/guide/2.x/optimistic-concurrency-control.html
https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/version-control.html
https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/optimistic-concurrency-control.html
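Create an index with one primary shard and one replica, then create a document in it:
PUT website
{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    }
  }
}
PUT /website/_doc/1/_create
{
  "title": "My first blog entry",
  "text": "Just trying this out..."
}
GET website/_doc/1
{
  "_index" : "website",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "My first blog entry",
    "text" : "Just trying this out..."
  }
}
The next request updates the document only if its current version is still 1 (optimistic concurrency control). If another request has changed the document in the meantime, this one fails with a version conflict instead of overwriting that change.
PUT website/_doc/1?version=1
{
  "title": "My first blog entry",
  "text": "Starting to get the hang of this..."
}
{
  "_index" : "website",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 1,
  "_primary_term" : 1
}
Note: this follows the 2.x guide referenced above. Elasticsearch 7.0 and later no longer accept the version parameter for optimistic concurrency control with internal versioning. They expect the if_seq_no and if_primary_term parameters instead, set to the _seq_no and _primary_term returned when the document was last read or written.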
Elasticsearch can also use version numbers maintained outside of it, for example in another database, when you add version_type=external. Elasticsearch then no longer checks that the current _version equals the version in the request. Instead, it checks that the current _version is less than the version specified. If the request succeeds, the external version number is stored as the document's new _version.
For example, to create a new blog post with external version number 5:
PUT /website/_doc/2?version=5&version_type=external
{
  "title": "My first external blog entry",
  "text": "Starting to get the hang of this..."
}
The response shows that the current _version is 5:
{
  "_index" : "website",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 5,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 2,
  "_primary_term" : 1
}
Now update the document, specifying a new version number of 10:
PUT /website/_doc/2?version=10&version_type=external
{
  "title": "My first external blog entry",
  "text": "This is a piece of cake..."
}
The request succeeds and sets the current _version to 10:
{
  "_index" : "website",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 10,
  "result" : "updated",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 3,
  "_primary_term" : 1
}
If you rerun this request, it fails with the same kind of conflict error we saw earlier. The specified external version number is not greater than Elasticsearch's current version:
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[_doc][2]: version conflict, current version [10] is higher or equal to the one provided [10]",
        "index_uuid": "5616aEUkQ7yvQIYUDyLudg",
        "shard": "0",
        "index": "website"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[_doc][2]: version conflict, current version [10] is higher or equal to the one provided [10]",
    "index_uuid": "5616aEUkQ7yvQIYUDyLudg",
    "shard": "0",
    "index": "website"
  },
  "status": 409
}
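The two version checks can be summarized in a small Python model. It is a sketch of the rules described in the referenced guide, not Elasticsearch source code, and the function and exception names are hypothetical. With internal versioning, the supplied version must equal the current one, and the stored version is then incremented. With version_type=external, the supplied version must be strictly greater than the current one and is stored as-is.

```python
class VersionConflict(Exception):
    """Stands in for the 409 version_conflict_engine_exception."""


def next_version(current, supplied, version_type="internal"):
    """Return the document's new _version, or raise VersionConflict.

    current  -- the stored _version, or None if the document does not exist yet
    supplied -- the ?version= value sent with the request
    """
    if version_type == "internal":
        # Internal versioning: the request must name exactly the current version.
        if current is None or current != supplied:
            raise VersionConflict(f"current version [{current}] differs from the one provided [{supplied}]")
        return current + 1
    if version_type == "external":
        # External versioning: the supplied version must be strictly greater.
        if current is not None and current >= supplied:
            raise VersionConflict(
                f"current version [{current}] is higher or equal to the one provided [{supplied}]"
            )
        return supplied
    raise ValueError(f"unsupported version_type: {version_type}")


print(next_version(1, 1))                              # internal: 1 -> 2
print(next_version(None, 5, version_type="external"))  # new document stored at 5
print(next_version(5, 10, version_type="external"))    # 5 -> 10
try:
    next_version(10, 10, version_type="external")      # rerun: conflict
except VersionConflict as err:
    print("409:", err)
```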
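10. Refresh
10.1 Refreshing Immediately So the Document Is Visible
These requests create documents and immediately refresh the index, so the documents become visible to search. A bare ?refresh is equivalent to refresh=true.
DELETE test
PUT test/_doc/1?refresh
{"message": "測試文檔1"}
PUT test/_doc/2?refresh=true
{"message": "測試文檔2"}
10.2 No Refresh
These requests create documents without doing anything to make them visible to search; refresh=false is the default. The documents become searchable after the next scheduled refresh (controlled by index.refresh_interval, 1s by default).
PUT test/_doc/3
{"message": "測試文檔3"}
PUT test/_doc/4?refresh=false
{"message": "測試文檔4"}
10.3 Waiting Until the Change Is Visible
refresh=wait_for does not force a refresh. Instead, it waits until a refresh (for example, the next scheduled one) has made the change visible to search, and only then responds.
PUT test/_doc/5?refresh=wait_for
{"message": "測試文檔5"}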
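To make the three options concrete, here is a toy Python model of near-real-time search. It is not a model of Elasticsearch internals, and the class and method names are hypothetical. Writes land in a buffer that searches cannot see, and a refresh moves the buffer into the searchable view. refresh=true forces that step immediately, and refresh=false leaves it to the next scheduled refresh. refresh=wait_for returns only after some refresh has exposed the write. This model simulates that wait by running the next scheduled refresh before returning.

```python
class ToyShard:
    """Toy model of refresh semantics: buffered writes are invisible until a refresh."""

    def __init__(self):
        self.buffer = {}      # indexed but not yet searchable
        self.searchable = {}  # visible to search

    def refresh(self):
        """Make everything written so far visible to search."""
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def scheduled_refresh(self):
        """Stands in for the periodic refresh driven by index.refresh_interval."""
        self.refresh()

    def index(self, doc_id, source, refresh="false"):
        self.buffer[doc_id] = source
        if refresh in ("true", ""):  # ?refresh=true or a bare ?refresh
            self.refresh()
        elif refresh == "wait_for":
            # Simulated wait: run the scheduled refresh that a real request would wait for.
            self.scheduled_refresh()
        # refresh == "false": return immediately; visibility comes with a later refresh.

    def search(self):
        return sorted(self.searchable)


shard = ToyShard()
shard.index("1", {"message": "測試文檔1"}, refresh="")
shard.index("3", {"message": "測試文檔3"})
print(shard.search())  # ['1']  (3 is still buffered)
shard.index("5", {"message": "測試文檔5"}, refresh="wait_for")
print(shard.search())  # ['1', '3', '5']
```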