Option 1

Find the indices in the `red` state

curl -X GET "http://172.xxx.xxx.174:9288/_cat/indices?v="

red    open   index                          5   1    3058268        97588      2.6gb          1.3gb

An index in the red state cannot serve requests reliably. It means at least one primary shard has not been allocated to any machine.
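On a cluster with many indices, the red ones can be filtered out of the listing rather than spotted by eye. The following is a minimal sketch, assuming the default `_cat/indices` column order (health, status, index, ...). The `red_indices` helper and the second sample row (`other`) are made up for illustration.

```shell
#!/bin/sh
# Print the names of indices whose health column is "red".
# Expects `_cat/indices` output on stdin.
red_indices() {
    awk '$1 == "red" { print $3 }'
}

# Sample rows in the same layout as the output above.
# The "other" index is hypothetical and only there to show the filtering.
sample='health status index  pri rep docs.count docs.deleted store.size pri.store.size
red    open   index    5   1    3058268        97588      2.6gb          1.3gb
green  open   other    5   1        100            0      1.0mb        512.0kb'

printf '%s\n' "$sample" | red_indices

# Against a live cluster (host and port as used in this article):
# curl -s "http://172.xxx.xxx.174:9288/_cat/indices?v" | red_indices
```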

Find the `UNASSIGNED` shards

_cat/shards shows how the shards are allocated.

curl -X GET "http://172.xxx.xxx.174:9288/_cat/shards"

index                            shard prirep state        docs   store   ip             node         
index                      1    p     STARTED     764505 338.6mb 172.xxx.xxx.174 Calypso      
index                      1    r     STARTED     764505 338.6mb 172.xxx.xxx.89  Savage Steel
index                      2    p     STARTED     763750 336.6mb 172.xxx.xxx.174 Calypso      
index                      2    r     STARTED     763750 336.6mb 172.xxx.xxx.88  Temugin      
index                      3    p     STARTED     764537 340.2mb 172.xxx.xxx.89  Savage Steel
index                      3    r     STARTED     764537 340.2mb 172.xxx.xxx.88  Temugin      
index                      4    p     STARTED     765476 339.3mb 172.xxx.xxx.89  Savage Steel
index                      4    r     STARTED     765476 339.3mb 172.xxx.xxx.88  Temugin      
index                      0    p     UNASSIGNED                                             
index                      0    r     UNASSIGNED

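Primary shard 0 and replica shard 0 of `index` are both UNASSIGNED, meaning neither has been allocated to a machine. Because a primary shard is unallocated, the index is red.
The ip column shows three machines, with addresses ending in 174, 89, and 88. There are 10 shard copies in total, which matches the Elasticsearch settings index.number_of_shards: 5 and index.number_of_replicas: 1. Ten shards can be spread across three machines as 3, 3, and 4. Machines 88 and 89 already hold three shards each while 174 holds only two, so the unassigned primary shard can go to machine 174.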
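The per-machine tally can also be done by script. This is a rough sketch, assuming the listing is requested with `?h=index,shard,prirep,state,ip` so that the IP is always the fifth field and node names containing spaces do not shift the columns. The `shards_per_ip` helper is a name invented here.

```shell
#!/bin/sh
# Count STARTED shard copies per machine IP, fewest first.
# Expects `_cat/shards?h=index,shard,prirep,state,ip` output on stdin.
shards_per_ip() {
    awk '$4 == "STARTED" { count[$5]++ }
         END { for (ip in count) print count[ip], ip }' | sort -n
}

# Same allocation as the listing above, reduced to the five columns.
sample='index 1 p STARTED 172.xxx.xxx.174
index 1 r STARTED 172.xxx.xxx.89
index 2 p STARTED 172.xxx.xxx.174
index 2 r STARTED 172.xxx.xxx.88
index 3 p STARTED 172.xxx.xxx.89
index 3 r STARTED 172.xxx.xxx.88
index 4 p STARTED 172.xxx.xxx.89
index 4 r STARTED 172.xxx.xxx.88
index 0 p UNASSIGNED
index 0 r UNASSIGNED'

# Expected shard copies: primaries * (1 + replicas) = 5 * (1 + 1) = 10
echo "expected copies: $((5 * (1 + 1)))"

# The first line printed is the machine holding the fewest shards.
printf '%s\n' "$sample" | shards_per_ip

# Against a live cluster:
# curl -s "http://172.xxx.xxx.174:9288/_cat/shards?h=index,shard,prirep,state,ip" | shards_per_ip
```

With the sample data, the machine ending in 174 comes first with two shards, which is why it is the natural target for the unassigned primary.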

Find the machine's `id`

Look up the node id of machine 174. It is needed later when allocating the primary shard.

curl -X GET "http://172.xxx.xxx.174:9288/_nodes/process?v="
{
  "cluster_name": "es2.3.2-titan-cl",
  "nodes": {
    "Leivp0laTYSqvMVm49SulQ": {
      "name": "Calypso",
      "transport_address": "172.xxx.xxx.174:9388",
      "host": "172.xxx.xxx.174",
      "ip": "172.xxx.xxx.174",
      "version": "2.3.2",
      "build": "b9e4a6a",
      "http_address": "172.xxx.xxx.174:9288",
      "process": {
        "refresh_interval_in_millis": 1000,
        "id": 32130,
        "mlockall": false
      }
    },
    "EafIS3ByRrm4g-14KmY_wg": {
      "name": "Savage Steel",
      "transport_address": "172.xxx.xxx.89:9388",
      "host": "172.xxx.xxx.89",
      "ip": "172.xxx.xxx.89",
      "version": "2.3.2",
      "build": "b9e4a6a",
      "http_address": "172.xxx.xxx.89:9288",
      "process": {
        "refresh_interval_in_millis": 1000,
        "id": 7560,
        "mlockall": false
      }
    },
    "tojQ9EiXS0m6ZP16N7Ug3A": {
      "name": "Temugin",
      "transport_address": "172.xxx.xxx.88:9388",
      "host": "172.xxx.xxx.88",
      "ip": "172.xxx.xxx.88",
      "version": "2.3.2",
      "build": "b9e4a6a",
      "http_address": "172.xxx.xxx.88:9288",
      "process": {
        "refresh_interval_in_millis": 1000,
        "id": 47701,
        "mlockall": false
      }
    }
  }
}

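The node id of machine 174 is Leivp0laTYSqvMVm49SulQ.

For simplicity, the primary shard could also be placed directly on the master machine. However, concentrating too many shards on one machine hurts performance and raises the risk of data loss if that machine goes down, so it is better to allocate based on how shards are currently distributed. The current master can be looked up like this: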

curl -X GET "http://172.xxx.xxx.174:9288/_cat/master?v="
id                     host          ip            node         
EafIS3ByRrm4g-14KmY_wg 172.xxx.xxx.89 172.xxx.xxx.89 Savage Steel

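Allocate the `UNASSIGNED` shard to a machine

Only a primary shard in the UNASSIGNED state can be allocated this way. Trying to allocate a shard that is not UNASSIGNED, for example shard 1, produces the following error.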

curl -X POST -d '{
    "commands" : [ {
      "allocate" : {
          "index" : "index",
          "shard" : 1,
          "node" : "EafIS3ByRrm4g-14KmY_wg",
          "allow_primary" : true
      }
    }]
}' "http://172.xxx.xxx.174:9288/_cluster/reroute"

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[Savage Steel][172.xxx.xxx.89:9388][cluster:admin/reroute]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[allocate] failed to find [index][1] on the list of unassigned shards"
  },
  "status": 400
}
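To avoid running into the error above, a script can confirm that the shard really is an unassigned primary before building the reroute request. The sketch below assumes default `_cat/shards` column order. `primary_unassigned` and `reroute_body` are hypothetical helper names, and the request body mirrors the one used in this article (Elasticsearch 2.3.2).

```shell
#!/bin/sh
# Succeed (exit 0) only if the given primary shard is UNASSIGNED.
# Expects `_cat/shards` output on stdin (index, shard, prirep, state, ...).
primary_unassigned() {
    awk -v idx="$1" -v num="$2" '
        $1 == idx && $2 == num && $3 == "p" && $4 == "UNASSIGNED" { found = 1 }
        END { exit !found }'
}

# Print the reroute request body: index name, shard number, target node id.
reroute_body() {
    printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":true}}]}\n' \
        "$1" "$2" "$3"
}

# A few rows taken from the shard listing earlier in the article.
sample='index 1 p STARTED 764505 338.6mb 172.xxx.xxx.174 Calypso
index 0 p UNASSIGNED
index 0 r UNASSIGNED'

if printf '%s\n' "$sample" | primary_unassigned index 0; then
    # Only print the body here; sending it is left to the operator.
    reroute_body index 0 "Leivp0laTYSqvMVm49SulQ"
else
    echo "shard is not an unassigned primary; skipping reroute" >&2
fi
```

Printing the body instead of posting it keeps the destructive step manual, which seems prudent given that allow_primary can discard data.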

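Allocate shard 0 of `index` to one of the machines. Use the allow_primary parameter of _cluster/reroute with care, because it can cause data loss. See the official documentation for this API for details.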

curl -X POST -d '{
    "commands" : [ {
      "allocate" : {
          "index" : "index",
          "shard" : 0,
          "node" : "Leivp0laTYSqvMVm49SulQ",
          "allow_primary" : true
      }
    }]
}' "http://172.xxx.xxx.174:9288/_cluster/reroute"

{
  "acknowledged": true,
  .........
  "index": {
    "shards": {
      "0": [
        {
          "state": "INITIALIZING",
          "primary": true,
          "node": "Leivp0laTYSqvMVm49SulQ",
          "relocating_node": null,
          "shard": 0,
          "index": "index",
          "version": 1,
          "allocation_id": {
            "id": "wk5q0CryQpmworGFalfWQQ"
          },
          "unassigned_info": {
            "reason": "INDEX_CREATED",
            "at": "2017-03-23T12:27:33.405Z",
            "details": "force allocation from previous reason INDEX_REOPENED, null"
          }
        },
        {
          "state": "UNASSIGNED",
          "primary": false,
          "node": null,
          "relocating_node": null,
          "shard": 0,
          "index": "index",
          "version": 1,
          "unassigned_info": {
            "reason": "INDEX_REOPENED",
            "at": "2017-03-23T11:56:25.568Z"
          }
        }
      ]
      }
    }
    .............
}

The output above shows only the key parts. The primary shard is now in the INITIALIZING state. Check the index status again:

curl -X GET "http://172.xxx.xxx.174:9288/_cat/indices?v="

green  open   index                          5   1    3058268        97588      2.6gb          1.3gb

The index status is now green, and the index is back in normal use.
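The index does not turn green the instant the reroute is acknowledged, so it can help to poll until it does. This is a sketch only: `BASE_URL`, the retry count, and the two-second sleep are assumptions, and `index_health` and `wait_for_green` are names invented for this example.

```shell
#!/bin/sh
# BASE_URL is an assumption; it defaults to the host used in this article.
BASE_URL=${BASE_URL:-http://172.xxx.xxx.174:9288}

# Print the health (first column) of one index from `_cat/indices` output.
index_health() {
    awk -v idx="$1" '$3 == idx { print $1 }'
}

# Poll until the index reports green, or give up after the given attempts.
wait_for_green() {
    idx=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        health=$(curl -s "$BASE_URL/_cat/indices/$idx" | index_health "$idx")
        [ "$health" = "green" ] && return 0
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}

# Parsing check against the row shown above (no network needed).
sample='green  open   index   5   1    3058268        97588      2.6gb          1.3gb'
printf '%s\n' "$sample" | index_health index

# Live usage: wait_for_green index 30
# The cluster health API can also block server-side:
# curl -s "$BASE_URL/_cluster/health/index?wait_for_status=green&timeout=30s"
```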

The steps above draw on the article "ELASTICSEARCH幾個問題的解決" (Solutions to Several Elasticsearch Problems).

Option 2

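A cluster most likely turns red because one of its machines went down before part of the data had finished replicating. Bringing that machine back up and letting it sync with the existing cluster therefore restores the cluster.
Alternatively, add an empty machine to the existing cluster. Shards are rebalanced automatically, and as long as no data has been lost, this also returns the cluster to a normal state.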
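While the restarted or newly added machine syncs, a summary of shard states shows how far recovery has progressed. A small sketch, assuming the listing is requested with `?h=state` so that each line holds only the state. `shard_state_counts` is a hypothetical helper name.

```shell
#!/bin/sh
# Count shards per state (STARTED, INITIALIZING, RELOCATING, UNASSIGNED).
# Expects `_cat/shards?h=state` output on stdin, one state per line.
shard_state_counts() {
    awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# States matching the red cluster shown earlier: 8 started, 2 unassigned.
sample='STARTED
STARTED
STARTED
STARTED
STARTED
STARTED
STARTED
STARTED
UNASSIGNED
UNASSIGNED'

printf '%s\n' "$sample" | shard_state_counts

# Against a live cluster:
# curl -s "http://172.xxx.xxx.174:9288/_cat/shards?h=state" | shard_state_counts
```

Recovery is complete once STARTED is the only state left in the summary.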

Feel free to repost, but please include a link to this article. Thank you.
2017.3.24 12:15

