Using bulk to perform batch operations in Elasticsearch

Reposted article, 2019-02-21 13:50, Elasticsearch

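1. Batch queries

Multi Get: batch retrieval

The Multi Get API retrieves a set of documents in one request, each identified by index name, type name, and document id. The documents may come from the same index or from different indices.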

GET /_mget
{
  "docs":[
     {
        "_index": "lib",
        "_type": "user",
        "_id": "1"
     },
     {
        "_index": "lib",
        "_type": "user",
        "_id": "2"
     },
     {
        "_index": "lib",
        "_type": "user",
        "_id": "3"
     }
  ]
}

// Specific fields can be requested
GET /_mget
{
  "docs":[
     {
        "_index": "lib",
        "_type": "user",
        "_id": "1",
        "_source": "interests"
     },
     {
        "_index": "lib",
        "_type": "user",
        "_id": "2",
        "_source": ["age","interests"]
     }
  ]
}

// Fetch different documents of the same type in the same index
GET /lib/user/_mget
{
  "docs":[
     {
        "_id": "1"
     },
     {
        "_type": "user", // if an index or type is given here, it must match the one in the request URL; otherwise an error is returned.
        "_id": "2"
     }
  ]
}
// The following shorter form also works
GET /lib/user/_mget
{
  "ids":["1","2"]
}
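
The request bodies above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper name build_mget_body is hypothetical and not part of any Elasticsearch client, and the sketch only builds the JSON body without sending it.

```python
import json


def build_mget_body(ids, index=None, doc_type=None, source=None):
    """Build a Multi Get request body (hypothetical helper, not a client API).

    With no index, type, or source filter, emit the short {"ids": [...]} form,
    which is only valid when index and type are given in the URL, as in
    GET /lib/user/_mget. Otherwise emit one entry per document under "docs".
    """
    if index is None and doc_type is None and source is None:
        return {"ids": [str(i) for i in ids]}
    docs = []
    for doc_id in ids:
        entry = {"_id": str(doc_id)}
        if index is not None:
            entry["_index"] = index
        if doc_type is not None:
            entry["_type"] = doc_type
        if source is not None:
            # Either a single field name or a list of field names.
            entry["_source"] = source
        docs.append(entry)
    return {"docs": docs}


if __name__ == "__main__":
    print(json.dumps(build_mget_body([1, 2])))
    print(json.dumps(build_mget_body([1, 2], "lib", "user", ["age", "interests"])))
```

The first call produces the short ids form; the second produces the longer docs form with a per-document field filter.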

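Bulk: batch operations

(1) For example, create a new file named requests under $ES_HOME. (The name requests follows the official documentation.) On Linux it makes no difference whether the file has an extension.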

[hadoop@djt002 elasticsearch-2.4.3]$ vi requests
 
{"index":{"_index":"my_store","_type":"my_index","_id":"11"}}
{"price":10,"productID":"1111"}
{"index":{"_index":"my_store","_type":"my_index","_id":"12"}}
{"price":20,"productID":"1112"}
{"index":{"_index":"my_store","_type":"my_index","_id":"13"}}
{"price":30,"productID":"1113"}
{"index":{"_index":"my_store","_type":"my_index","_id":"14"}}
{"price":40,"productID":"1114"}

(2) Run the command

  curl -XPUT '192.168.80.200:9200/_bulk' --data-binary @requests

or

  curl -XPOST '192.168.80.200:9200/_bulk' --data-binary @requests
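
The same request can be prepared from Python. This is a minimal sketch with the standard library only; build_bulk_request is a hypothetical helper name, and the host value is taken from the curl example. It builds the request object without sending it.

```python
import urllib.request


def build_bulk_request(host, payload):
    """Build (but do not send) an HTTP request for the _bulk endpoint.

    `host` is "address:port"; `payload` is the raw bytes of the bulk file.
    The helper name and signature are assumptions made for illustration.
    """
    if not payload.endswith(b"\n"):
        # The bulk body must be terminated by a newline.
        payload += b"\n"
    return urllib.request.Request(
        url=f"http://{host}/_bulk",
        data=payload,
        # Elasticsearch 6.0 and later require an explicit content type.
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

To actually send it, read the requests file in binary mode, pass its contents to build_bulk_request("192.168.80.200:9200", data), and hand the result to urllib.request.urlopen.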

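Format of a bulk request:
{action:{metadata}}\n
{requestbody}\n (the request body)

action: the operation to perform, one of create (create the document only if it does not already exist), update (update a document), index (create a new document or replace an existing one), delete (delete a document).
Difference between create and index: if the document already exists, create fails with a "document already exists" error, whereas index succeeds and replaces it.
metadata: identifies the target of the operation; specify the document's _index, _type, and _id.
Example: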

{"delete":{"_index":"lib","_type":"user","_id":"1"}}
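
As a rough illustration of the format, the sketch below serializes a list of operations into a bulk payload, one JSON object per line. The function name to_bulk_lines and the tuple layout are assumptions made for this example, not part of Elasticsearch.

```python
import json

ACTIONS = {"create", "update", "index", "delete"}
BODYLESS_ACTIONS = {"delete"}  # delete has a metadata line only


def to_bulk_lines(operations):
    """Serialize (action, metadata, body) tuples into a bulk payload string.

    Each operation becomes an action/metadata line followed, except for
    delete, by a request body line. The payload ends with a newline.
    """
    lines = []
    for action, metadata, body in operations:
        if action not in ACTIONS:
            raise ValueError(f"unknown bulk action: {action}")
        lines.append(json.dumps({action: metadata}, separators=(",", ":")))
        if action in BODYLESS_ACTIONS:
            if body is not None:
                raise ValueError("delete takes no request body")
        else:
            if body is None:
                raise ValueError(f"{action} requires a request body")
            lines.append(json.dumps(body, separators=(",", ":")))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_bulk_lines([
        ("delete", {"_index": "lib", "_type": "user", "_id": "1"}, None),
        ("index", {"_index": "my_store", "_type": "my_index", "_id": "11"},
         {"price": 10, "productID": "1111"}),
    ]), end="")
```

Keeping each JSON object on a single line matters: a pretty-printed body that spans several lines would break the line-pair structure.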

Bulk indexing

POST /lib2/books/_bulk
{"index":{"_id":1}}  // action: metadata
{"title":"Java","price":"55"}  // request body
{"index":{"_id":2}}
{"title":"Html5","price":"45"}
{"index":{"_id":3}}
{"title":"Php","price":"35"}
{"index":{"_id":4}}
{"title":"Python","price":"50"}

// Response
{
  "took": 60,
  "errors": false, // whether any operation failed; when true, the failing items carry the error details
  "items": [
     // details of each document that was operated on
     {
        "index":{
           "_index": "lib2",
           "_type": "books",
           "_id": "1",
           "_version": 1,
           "result": "created", // outcome of the operation
           "_shards": {
              "total": 1,
              "successful": 1,
              "failed": 0
           },
           "_seq_no": 0,
           "_primary_term": 1,
           "status": 201
        }
    },
    ...
  ]
}
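
A bulk request can succeed as a whole while individual operations fail, so the response should be inspected item by item. The top-level flag in a bulk response is named errors. The sketch below, with a hypothetical helper name failed_items, collects the failed operations from an already-decoded response dictionary.

```python
def failed_items(response):
    """Return (action, _id, status, error) for every failed bulk item.

    `response` is the decoded JSON body of a bulk response. The helper
    name is an assumption made for illustration.
    """
    if not response.get("errors"):
        return []
    failures = []
    for item in response.get("items", []):
        # Each item is a single-key dict: {"index": {...}}, {"create": {...}}, ...
        for action, result in item.items():
            if "error" in result or result.get("status", 0) >= 300:
                failures.append(
                    (action, result.get("_id"), result.get("status"), result.get("error"))
                )
    return failures


if __name__ == "__main__":
    sample = {
        "took": 3,
        "errors": True,
        "items": [
            {"index": {"_id": "1", "status": 201}},
            {"create": {"_id": "2", "status": 409,
                        "error": {"type": "version_conflict_engine_exception"}}},
        ],
    }
    for failure in failed_items(sample):
        print(failure)
```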

Bulk delete
A delete action in a bulk request takes no request body.

POST /lib/books/_bulk
{"delete":{"_index":"lib","_type":"books","_id":"4"}} // a delete action takes no request body
{"create":{"_index":"tt","_type":"ttt","_id":"100"}}
{"name":"lisi"} // request body
{"index":{"_index":"tt","_type":"ttt"}} // no _id given, so Elasticsearch generates one automatically
{"name":"zhaosi"} // request body
{"update":{"_index":"lib","_type":"books","_id":"4"}} // an update action must include _id; it fails if the document does not exist
{"doc":{"price":58}} // request body

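How much data can a single bulk request handle?
A bulk request is loaded into memory before it is processed, so its size is limited. There is no single optimal size: it depends on your hardware, the size and complexity of your documents, and your indexing and search load.

A common recommendation is 1,000 to 5,000 documents, or 5 to 15 MB, per request. By default a request cannot exceed 100 MB; this limit can be changed in the Elasticsearch configuration file (elasticsearch.yml under $ES_HOME/config) through the http.max_content_length setting. The bulk thread pool size is configured as the number of cores + 1.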
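
One way to stay within such limits is to split a large stream of operations into several payloads. The following is a minimal sketch; chunk_bulk, its parameter names, and its default limits are assumptions chosen to match the recommendation above, not values prescribed by Elasticsearch.

```python
def chunk_bulk(pairs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Group (action_line, body_line_or_None) pairs into bulk payloads.

    A new payload starts when adding the next operation would exceed
    max_docs operations or max_bytes bytes (UTF-8, newlines included).
    A single operation larger than max_bytes is still emitted on its own.
    """
    chunk, size, count = [], 0, 0
    for action_line, body_line in pairs:
        op = action_line + "\n"
        if body_line is not None:
            op += body_line + "\n"
        op_size = len(op.encode("utf-8"))
        if chunk and (count + 1 > max_docs or size + op_size > max_bytes):
            yield "".join(chunk)
            chunk, size, count = [], 0, 0
        chunk.append(op)
        size += op_size
        count += 1
    if chunk:
        yield "".join(chunk)


if __name__ == "__main__":
    demo = [('{"index":{"_id":"%d"}}' % i, '{"n":%d}' % i) for i in range(5)]
    for payload in chunk_bulk(demo, max_docs=2):
        print(repr(payload))
```

Because the function is a generator, each payload can be sent and discarded before the next one is built, which keeps client-side memory use bounded.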

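Why the bulk API uses this JSON format
Format of a bulk request:
{action:{metadata}}\n
{requestbody}\n (the request body)

The payload does not have to be converted into a JSON object: it is split on newlines, so no extra copy of the JSON text is kept in memory.
For each pair of JSON lines, the metadata is read and the document is routed.
The corresponding JSON is sent directly to the target node.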
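
The line-oriented processing described above can be sketched as follows. This is an illustration only, not Elasticsearch's actual implementation: route_bulk_lines is a hypothetical name, crc32 stands in for the real routing hash, and every operation is assumed to carry an explicit _id.

```python
import json
import zlib
from collections import defaultdict


def route_bulk_lines(payload, num_shards):
    """Group raw bulk lines per shard without re-serializing the bodies.

    Only the small action/metadata line is parsed; the body line is
    forwarded as the original text. crc32 is a stand-in for the real
    routing hash, and an explicit _id is assumed on every operation.
    """
    per_shard = defaultdict(list)
    lines = iter(payload.splitlines())
    for action_line in lines:
        if not action_line.strip():
            continue
        # The action line is a single-key object: {"index": {...metadata...}}
        (action, meta), = json.loads(action_line).items()
        shard = zlib.crc32(str(meta["_id"]).encode("utf-8")) % num_shards
        per_shard[shard].append(action_line)
        if action != "delete":
            # Raw body line, passed through untouched.
            per_shard[shard].append(next(lines))
    return dict(per_shard)
```

The point of the sketch is that the body lines are never parsed or copied into an intermediate object structure; they are only moved to the list for their shard.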
Why not use a format like the following instead?

[{"action":{},"data":{}}]
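This format is more readable, but it is harder to process internally: it uses more memory and adds overhead on the Java virtual machine:

Parsing the JSON array into a JSONArray object requires a copy of the JSON text in memory, plus the JSONArray object itself.
Each JSON element of the array has to be parsed, and the document in each request has to be routed.
For the requests routed to the same shard, a request array has to be created.
That request array then has to be serialized.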
