Rebuilding an Elasticsearch Index in Real Time

This article is a repost. 2021-12-07 17:14 [Database]

1. Rebuilding an index in real time

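In a production environment, the type of an existing field cannot be changed in place. To change a field, create a new index with the new mapping, query the data out of the old index in batches, and write it into the new index with the bulk API.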

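For the batch queries, the scroll API is recommended, and the data can be reindexed concurrently from multiple threads. For example, each scroll can cover the data for one range of dates and be handed to a single thread.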
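The per-thread split by date can be sketched in Python as below. This is only an illustration under assumptions: the date field name `created_at` and the callback `reindex_slice` are hypothetical (the sample documents in this article only have a `title` field), and the callback stands in for whatever code runs the scroll and bulk calls for one slice.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def date_slices(start, end, days_per_slice):
    """Split [start, end) into consecutive half-open date ranges."""
    if days_per_slice <= 0:
        raise ValueError("days_per_slice must be positive")
    slices = []
    cursor = start
    step = timedelta(days=days_per_slice)
    while cursor < end:
        upper = min(cursor + step, end)
        slices.append((cursor, upper))
        cursor = upper
    return slices


def slice_query(field, lower, upper):
    """Build a search body with a range query for one slice (gte lower, lt upper)."""
    return {
        "query": {
            "range": {
                field: {"gte": lower.isoformat(), "lt": upper.isoformat()}
            }
        }
    }


def reindex_in_parallel(slices, reindex_slice, max_workers=4):
    """Run reindex_slice(lower, upper) for every slice on a thread pool.

    reindex_slice is a hypothetical callback supplied by the caller.
    Results are returned in the same order as the slices.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(reindex_slice, lo, hi) for lo, hi in slices]
        # result() re-raises any exception raised in the worker thread.
        return [future.result() for future in futures]
```

Half-open ranges are used so that a document whose date falls exactly on a boundary is picked up by one slice only.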

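(1) At first, data is inserted relying on dynamic mapping. Some of the values happen to look like dates, such as 2019-09-10, so the title field is automatically mapped as the date type, although it should really be a string type.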

First, insert the following documents:

PUT /my_index/_doc/1
{ "title": "2019-09-10" }

PUT /my_index/_doc/2
{ "title": "2019-09-11" }

(2) Later, when a document whose title is an ordinary string is added to the index, the request fails:

PUT /my_index/_doc/3
{ "title": "my first article" }

Error response:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [title] of type [date] in document with id '3'. Preview of field's value: 'my first article'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [title] of type [date] in document with id '3'. Preview of field's value: 'my first article'",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to parse date field [my first article] with format [strict_date_optional_time||epoch_millis]",
      "caused_by": {
        "type": "date_time_parse_exception",
        "reason": "Failed to parse with all enclosed parsers"
      }
    }
  },
  "status": 400
}

(3) Changing the type of title at this point is not possible:

PUT /my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text"
    }
  }
}

Error response:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [title] of different type, current_type [date], merged_type [text]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [title] of different type, current_type [date], merged_type [text]"
  },
  "status": 400
}

(4) The only option now is to reindex: create a new index, query the data out of the old index, and import it into the new index.

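(5) Suppose the old index is named old_index and the new one new_index, and a Java application is already working against old_index. Would we have to stop the Java application, change the index it uses to new_index, and then restart it? That would take the application offline and reduce availability.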

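(6) Instead, give the Java application an alias that points at the old index. The application works with prod_index for now, which at this point actually points at the old my_index.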

PUT /my_index/_alias/prod_index

(7) Check the aliases; my_index now has the alias prod_index.

GET my_index/_alias

(8) Create a new index in which title is mapped as a string type (text):

PUT /my_index_new
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}

(9) Use the scroll API to query the data out in batches:

GET /my_index/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 1
}

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAARUMWQWx5bzRmTW9TeUNpNmVvN0E2dF9YQQ==",
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2019-09-10"
        }
      }
    ]
  }
}

(9) Use the bulk API to write the batch returned by the scroll into the new index:

POST /_bulk
{"index":{"_index":"my_index_new","_id":"1"}}
{"title":"2019-09-10"}
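Building the bulk body by hand is error-prone, so here is a minimal Python sketch that turns the hits of one scroll response into the newline-delimited body expected by POST /_bulk. The function name `hits_to_bulk_body` is made up for this illustration; it only builds the string and does not send anything to Elasticsearch.

```python
import json


def hits_to_bulk_body(hits, target_index):
    """Turn search hits into a newline-delimited body for POST /_bulk.

    Each hit becomes two lines: an index action that keeps the original
    _id, followed by the document source.
    """
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action, separators=(",", ":")))
        lines.append(json.dumps(hit["_source"], separators=(",", ":")))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


hits = [{"_index": "my_index", "_id": "1", "_source": {"title": "2019-09-10"}}]
print(hits_to_bulk_body(hits, "my_index_new"), end="")
```

Keeping the original `_id` means that re-running a batch overwrites the same documents in the new index instead of duplicating them.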

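(10) Repeat the scroll query and the bulk write: fetch batch after batch, passing the _scroll_id from the previous response to the scroll API to get the next batch, and write each batch into the new index with the bulk API.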
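The loop in this step can be sketched in Python as follows. To keep the example free of any particular client library, the three callables `start_scroll`, `next_scroll` and `write_batch` are hypothetical placeholders that the caller would implement with real HTTP or client calls; the sketch only shows the control flow.

```python
def copy_all_batches(start_scroll, next_scroll, write_batch):
    """Drain a scroll and hand every non-empty batch of hits to write_batch.

    start_scroll()        -> first search response as a dict
    next_scroll(scroll_id) -> next response for that scroll id
    write_batch(hits)     -> writes one batch, for example via the bulk API
    Returns the number of documents passed to write_batch.
    """
    copied = 0
    response = start_scroll()
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            # An empty page means the scroll is exhausted.
            return copied
        write_batch(hits)
        copied += len(hits)
        # Always pass on the most recent scroll id from the last response.
        response = next_scroll(response["_scroll_id"])
```

The loop stops on the first empty page rather than comparing against the reported total, so it does not depend on how the total is reported.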

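(11) Switch the alias prod_index from my_index to my_index_new. The Java application then uses the data in the new index directly through the alias. It does not need to be stopped, no code change has to be committed, and availability stays high.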
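For scripted switches, the request body can be generated rather than typed. The helper below is a small Python sketch; the name `alias_swap_body` is invented for this example, and it only builds the body for POST /_aliases. Both actions travel in one request, which Elasticsearch applies atomically, so the alias never points at nothing in between.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the POST /_aliases body that moves alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(json.dumps(alias_swap_body("prod_index", "my_index", "my_index_new"), indent=2))
```

The same switch as a console request: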

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my_index", "alias": "prod_index" } },
    { "add": { "index": "my_index_new", "alias": "prod_index" } }
  ]
}

(12) Query through the prod_index alias directly to check that it works:

GET prod_index/_search

The query now returns data from the new index my_index_new:

{
  "took" : 1117,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index_new",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "2019-09-10"
        }
      }
    ]
  }
}

2. Summary

Use an alias to switch indices in a way that is transparent to the client:

PUT /my_index_v1/_alias/my_index

The client operates on my_index.

Run the reindex; once it completes, switch the alias from v1 to v2:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my_index_v1", "alias": "my_index" } },
    { "add": { "index": "my_index_v2", "alias": "my_index" } }
  ]
}
