ES 26 - How Elasticsearch Partially Updates Documents (Using partial update)

Reposted from the original article. 2019-02-14 19:06. Tags: Elasticsearch / partial update / incremental update / partial modification / concurrency control / 06 - Elasticsearch

1 What is partial update
2 Partial update via scripts
3 Concurrency control for partial update
- 3.1 How to control it
- 3.2 How retry works
Copyright notice

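1 What is partial update

1.1 How full replacement of a document works

Syntax for full replacement: PUT index/type/1. If no document with id=1 exists, it is created; if one exists, it is replaced entirely by the new document.

Full replacement is relatively inefficient, because the whole document has to be sent even when only one field changes. Partial update addresses this: the request carries only the fields to be modified, and ES merges them into the existing document, so the client does not have to resend all of the data.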

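1.2 How a specific field gets modified

(1) Based on the user's request, retrieve the document to be modified;

(2) in memory, merge the fields submitted by the user into that document to build the new document, then issue an internal PUT request within ES;

(3) mark the old document being replaced as deleted;

(4) finally, index the newly built document.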
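The four steps above can be sketched as a toy Python model. `ToyShard`, `merge_partial` and the entry layout are hypothetical and only mimic the idea that an update never edits a stored document in place: the old copy is flagged as deleted and a merged copy is indexed as a new version. The recursive merge of nested objects follows the documented behavior of the `doc` form of `_update`; everything else (integer versions, a list standing in for the index) is a simplification.

```python
import copy


def merge_partial(source, partial):
    """Return a new dict: nested objects are merged, other values are overwritten."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


class ToyShard:
    """Hypothetical stand-in for a shard; a stored source is never rewritten."""

    def __init__(self):
        # Each entry: {"id", "source", "version", "deleted"}; old entries are only flagged.
        self.entries = []

    def _live(self, doc_id):
        for entry in self.entries:
            if entry["id"] == doc_id and not entry["deleted"]:
                return entry
        return None

    def index(self, doc_id, source):
        """Full index (PUT): replaces any live document with the same id."""
        old = self._live(doc_id)
        version = 1
        if old is not None:
            old["deleted"] = True  # step (3): mark the old document as deleted
            version = old["version"] + 1
        # step (4): store the complete new document as a fresh entry
        self.entries.append({"id": doc_id, "source": source,
                             "version": version, "deleted": False})
        return version

    def get(self, doc_id):
        entry = self._live(doc_id)
        return None if entry is None else copy.deepcopy(entry["source"])

    def partial_update(self, doc_id, partial):
        current = self._live(doc_id)  # step (1): fetch the document to modify
        if current is None:
            raise KeyError(f"[{doc_id}]: document missing")
        # step (2): build the new full document in memory
        new_source = merge_partial(current["source"], partial)
        return self.index(doc_id, new_source)  # steps (3) and (4)
```

For example, indexing `{"name": "shou feng", "sex": "male", "age": 20}` as id `"1"` and then calling `partial_update("1", {"age": 21})` returns version 2 and leaves two entries: the original flagged as deleted and a live copy with `age` 21.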

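1.3 Advantages of partial update

(1) Retrieval, modification and write-back all happen inside the same shard, which avoids network transfer overhead.

It removes the need for this cumbersome sequence: fetch the document from a particular shard -> return it to the client's memory -> modify it there -> send the modified document back to the original shard -> reindex it. Skipping this sequence noticeably improves performance.

(2) The interval between reading and modifying the document is shorter, which effectively reduces concurrency conflicts.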

1.4 Using partial update

Usage: perform an incremental update through the _update endpoint:

```
// Add test data:
PUT employee/developer/1
{
    "name": "shou feng", 
    "sex": "male",
    "age": 20
}

// Modify a specific field with partial update:
POST employee/developer/1/_update
{
    "doc": {
        "age": 21
    }
}

// Response:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}

// Fetch the document: age has changed from 20 to 21.
GET employee/developer/1
```

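Without _update, the request overwrites the original document outright, so part of its data is lost:

```
// Without _update:
POST employee/developer/1
{
    "doc": {
        "age": 22
    }
}

// Fetch it again: document 1 now contains only a "doc" object field (holding age: 22); name, sex and the top-level age are gone:
GET employee/developer/1
```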
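Why the second request loses data becomes clear when comparing what each endpoint does with the request body. In this minimal Python sketch (the function names are hypothetical, not ES APIs), a plain index request stores the whole body as `_source`, while `_update` merges only the contents of `doc`. The merge here is top-level only for brevity; ES also merges nested objects.

```python
def index_document(store, doc_id, body):
    """Like POST/PUT index/type/id: the whole body becomes the new _source."""
    store[doc_id] = dict(body)


def update_document(store, doc_id, body):
    """Like POST index/type/id/_update: merge body["doc"] into the existing _source."""
    if doc_id not in store:
        raise KeyError(f"[{doc_id}]: document missing")
    store[doc_id] = {**store[doc_id], **body["doc"]}  # top-level merge only


store = {}
index_document(store, "1", {"name": "shou feng", "sex": "male", "age": 20})
update_document(store, "1", {"doc": {"age": 21}})
print(store["1"])  # {'name': 'shou feng', 'sex': 'male', 'age': 21}
index_document(store, "1", {"doc": {"age": 22}})
print(store["1"])  # {'doc': {'age': 22}}
```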

2 Partial update via scripts

ES supports scripting: complex operations can be implemented with external Groovy script files (deprecated) or with inline painless scripts.

2.1 Modifying a document with an inline painless script

Insert a document:

```
PUT employee/developer/1
{
    "name": "shou feng", 
    "age": 20,
    "salary": 10000
}
```

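Run the script. Here we use the lighter, shorter inline painless form, in which the script is given directly as a string:

```
POST employee/developer/1/_update    // send a POST request to perform a partial update
{
    "script": "ctx._source.salary+=500"    // increment salary by 500
}
```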

Check the result:

```
GET employee/developer/1

// Result:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 5,
    "found": true,
    "_source": {
        "name": "shou feng",
        "age": 20,
        "salary": 10500    // incremented by 500
    }
}
```
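Conceptually, an update script receives a context object whose `_source` entry holds a copy of the current document; whatever the script leaves in `ctx._source` is reindexed as the next version. The Python sketch below mimics that flow. `run_update_script` and the function standing in for `ctx._source.salary+=500` are hypothetical, not Elasticsearch APIs, and the sketch assumes the stored document was at version 4 before the update, which matches the `_version: 5` shown above.

```python
import copy


def run_update_script(source, version, script):
    """Give the script a ctx holding a copy of _source, then reindex the result as version + 1."""
    ctx = {"_source": copy.deepcopy(source)}
    script(ctx)  # the script mutates ctx["_source"] in place
    return ctx["_source"], version + 1


def add_500_to_salary(ctx):
    """Python stand-in for the painless source ctx._source.salary+=500."""
    ctx["_source"]["salary"] += 500


doc = {"name": "shou feng", "age": 20, "salary": 10000}
new_doc, new_version = run_update_script(doc, 4, add_500_to_salary)
print(new_doc["salary"], new_version)  # 10500 5
```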

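2.2 Modifying a document with an external Groovy script

Note: Groovy scripts are no longer supported starting with ES 6.0. The demo here uses ES 5.6.10; running it on 6.x throws the following exception:

```
"type": "illegal_argument_exception",
"reason": "script_lang not supported [groovy]"
```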

Put the script file under ${ES_HOME}/config/scripts, named xxx.groovy, with the following content:

ctx._source.salary+=bonus, which adds the value of the bonus parameter to salary. The script file looks like this:

```
[root@localhost scripts]# pwd
/data/elk-5.6.10/es-node/config/scripts
[root@localhost scripts]# cat change_salary.groovy 
ctx._source.salary+=bonus
[root@localhost scripts]#
```

Modify the document:

```
POST employee/developer/1/_update
{
    "script": {
        "lang": "groovy", 
        "file": "change_salary",
        "params": {
            "bonus": 500
        }
    }
}

// Response:
#! Deprecation: [groovy] scripts are deprecated, use [painless] scripts instead
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 6,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}
```

Check the result:

```
GET employee/developer/1
// Result:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 6,
    "found": true,
    "_source": {
        "name": "shou feng",
        "age": 20,
        "salary": 11000
    }
}
```

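Notes:
When the external Groovy script runs, ES warns that Groovy scripts are deprecated and recommends painless instead. Painless is a lightweight scripting language with Groovy-like syntax, so short expressions such as ctx._source.salary+=bonus can be written inline (in painless the parameter is read as params.bonus).
By ES 5.6, painless is already the default scripting language. For details on scripting, see the post: ES 27 - Painless scripting in practice with Elasticsearch.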
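Passing the bonus through `params` instead of hardcoding it lets one script be reused with different values, and in ES it avoids recompiling the script every time the value changes. The sketch below is a hypothetical Python analogue of `change_salary` with explicit params; applied to the salary of 10500 from section 2.1 with a bonus of 500, it yields 11000.

```python
def change_salary(ctx, params):
    """Python analogue of the painless form: ctx._source.salary += params.bonus"""
    ctx["_source"]["salary"] += params["bonus"]


ctx = {"_source": {"name": "shou feng", "age": 20, "salary": 10500}}
change_salary(ctx, {"bonus": 500})
print(ctx["_source"]["salary"])  # 11000
change_salary(ctx, {"bonus": -1000})  # same script, different parameter value
print(ctx["_source"]["salary"])  # 10000
```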

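2.3 Upserting a document with an inline painless script

(First delete the document with id=1: DELETE employee/developer/1.) Suppose we don't know that document 1 has been deleted, and try to add "level": 1 to it: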
```
POST employee/developer/1/_update
{
    "doc": {
        "level": 1
    }
}
```

The request fails with a 404 (document missing) error:

```
{
    "error": {
        "root_cause": [
            {
                "type": "document_missing_exception",
                "reason": "[developer][1]: document missing",
                "index_uuid": "rT6tChP2QISaVd2OzdCEMA",
                "shard": "3",
                "index": "employee"
            }
        ],
        "type": "document_missing_exception",
        "reason": "[developer][1]: document missing",
        "index_uuid": "rT6tChP2QISaVd2OzdCEMA",
        "shard": "3",
        "index": "employee"
    },
    "status": 404
}
```

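Use upsert instead: if the specified document does not exist, the content under upsert is indexed as the initial document; if it exists, the partial update in doc or script is executed:
```
POST employee/developer/1/_update
{
    "script": "ctx._source.level+=1",
    "upsert": {
        "name": "heal",
        "age": 20
    }
}
```
The response now shows "result" : "created", meaning a new document was created. The script is not run when the document is created from upsert, so the new document contains only name and age; set "scripted_upsert": true if the script should also run on creation.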
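The upsert decision fits in a few lines. In this hypothetical Python model (`update_with_upsert` is not an ES API), a missing document is created from the `upsert` body without running the script, and an existing one is handed to the script; the returned strings mirror the `result` field of the response.

```python
import copy


def update_with_upsert(store, doc_id, script, upsert):
    """Create the document from `upsert` if it is missing, otherwise run `script` on it."""
    if doc_id not in store:
        store[doc_id] = copy.deepcopy(upsert)  # the script is not executed on creation
        return "created"
    ctx = {"_source": copy.deepcopy(store[doc_id])}
    script(ctx)
    store[doc_id] = ctx["_source"]
    return "updated"


def increment_level(ctx):
    """Python stand-in for ctx._source.level+=1"""
    ctx["_source"]["level"] += 1


store = {}
print(update_with_upsert(store, "1", increment_level, {"name": "heal", "age": 20}))  # created
store["1"]["level"] = 1  # suppose a later request adds a level field
print(update_with_upsert(store, "1", increment_level, {"name": "heal", "age": 20}))  # updated
print(store["1"])  # {'name': 'heal', 'age': 20, 'level': 2}
```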

2.4 Deleting a document with an external Groovy script

Note: ES 5.6.10 is used here.
Script path: ${ES_HOME}/config/scripts/delete_doc.groovy
Script content: ctx.op = ctx._source.age == age ? 'delete' : 'none'

Example:

```
POST employee/developer/1/_update
{
    "script": {
        "lang": "groovy", 
        "file": "delete_doc",
        "params": {
            "age": 20    // delete the document if its age is 20
        }
    }
}
```

Response:

```
#! Deprecation: [groovy] scripts are deprecated, use [painless] scripts instead
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 13,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}
```

Check whether the document was deleted:

```
GET employee/developer/1
// Response: the document was deleted successfully:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "found": false
}
```
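The delete script works because the update API inspects `ctx.op` after the script runs: `'delete'` removes the document, and `'none'` leaves it untouched and is reported as a `noop` result. Below is a hypothetical Python model of that dispatch, with the `delete_doc` condition written as a function taking `params`; in this model any other `op` value is simply treated as a normal reindex.

```python
import copy


def scripted_update(store, doc_id, script, params):
    """Run `script`, then act on ctx["op"]: 'delete', 'none', or (in this model) reindex."""
    if doc_id not in store:
        raise KeyError(f"[{doc_id}]: document missing")
    ctx = {"_source": copy.deepcopy(store[doc_id]), "op": "index"}
    script(ctx, params)
    if ctx["op"] == "delete":
        del store[doc_id]
        return "deleted"
    if ctx["op"] == "none":
        return "noop"
    store[doc_id] = ctx["_source"]
    return "updated"


def delete_doc(ctx, params):
    """Python analogue of: ctx.op = ctx._source.age == age ? 'delete' : 'none'"""
    ctx["op"] = "delete" if ctx["_source"]["age"] == params["age"] else "none"


store = {"1": {"name": "shou feng", "age": 20}}
print(scripted_update(store, "1", delete_doc, {"age": 30}))  # noop
print(scripted_update(store, "1", delete_doc, {"age": 20}))  # deleted
print("1" in store)  # False
```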

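3 Concurrency control for partial update

Internally, partial update also relies on optimistic locking (version checks) for concurrency control.
For more on concurrency control, see the post: Concurrency control strategies in Elasticsearch.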

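3.1 How to control it

```
POST index/type/id/_update?retry_on_conflict=5
POST index/type/id/_update?version=5
```

retry_on_conflict sets how many times the update is retried automatically after a version conflict; version makes the update succeed only if the document's current version is 5. The two cannot be combined in the same request.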

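3.2 How retry works

retry_on_conflict: the number of times to retry after a conflict occurs (0 by default).

(1) Clients A and B fetch the same document at almost the same time, together with its _version; assume _version=1 at this point;

(2) client A modifies part of the document and writes the change to the index;

(3) when writing, Elasticsearch compares the version of the document submitted by client A (still 1) with the version of the stored document (also 1). Since they match, it performs the write and bumps the version to _version=2;

(4) client B also modifies part of the document but writes back a little later. The same check as in step (3) runs: ES finds that B's document has version 1 while the stored document has version 2, so a conflict occurs and this partial update fails;

(5) after the failure, steps (1) to (3) are repeated, re-fetching the latest document and version, at most as many times as the value of retry_on_conflict. If every retry still conflicts, the request fails with a 409 version conflict error.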
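Steps (1) to (5) amount to optimistic locking with a bounded retry loop. The Python sketch below is a hypothetical single-process model, not Elasticsearch code: `VersionedStore.compare_and_set` plays the role of the version check in step (3), and the `before_write` hook (also hypothetical) lets a simulated client A write between client B's read and write.

```python
class VersionConflict(Exception):
    """Raised when the version read earlier is no longer current (ES answers with 409)."""


class VersionedStore:
    """Hypothetical single-document store with an internal version counter."""

    def __init__(self, source):
        self.source = dict(source)
        self.version = 1

    def get(self):
        return dict(self.source), self.version

    def compare_and_set(self, new_source, expected_version):
        # Step (3): write only if the version read earlier is still the current one.
        if expected_version != self.version:
            raise VersionConflict(
                f"version conflict: current [{self.version}], expected [{expected_version}]")
        self.source = dict(new_source)
        self.version += 1
        return self.version


def partial_update(store, change, retry_on_conflict=0, before_write=None):
    """Read, modify, then conditionally write; retry up to retry_on_conflict times."""
    attempt = 0
    while True:
        source, version = store.get()  # step (1): read the document and its version
        new_source = change(source)    # modify the copy in memory
        if before_write is not None:
            before_write(attempt)      # simulation hook: another client may write here
        try:
            return store.compare_and_set(new_source, version)
        except VersionConflict:
            if attempt >= retry_on_conflict:
                raise                  # retries exhausted: report the conflict
            attempt += 1               # step (5): start again from step (1)


def add_salary(amount):
    return lambda src: {**src, "salary": src["salary"] + amount}


def racing_writer(store):
    """Hook in which client A adds 1000 between client B's first read and write."""
    def hook(attempt):
        if attempt == 0:
            src, ver = store.get()
            store.compare_and_set(add_salary(1000)(src), ver)
    return hook


store = VersionedStore({"salary": 10000})
print(partial_update(store, add_salary(500), retry_on_conflict=1,
                     before_write=racing_writer(store)))  # 3
print(store.source)  # {'salary': 11500}
```

With `retry_on_conflict=0`, the same race makes client B's update raise `VersionConflict`, leaving only client A's change (salary 11000, version 2).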

Copyright notice

Author: 馬瘦風 (Ma Shoufeng) (https://healchow.com)

Source: 馬瘦風's blog on cnblogs (https://www.cnblogs.com/shoufeng)


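The copyright of this article belongs to the author. Reposting is welcome, but [a link to the original article must be shown in a prominent position on the page]; otherwise the author reserves the right to pursue legal liability.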
