ETCD Node Failure Recovery

Reposted article, 2018-01-15 00:01. Tags: ETCD, etcd

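In our microservices group I'm mainly responsible for building the configuration center, which uses ETCD. On our internal network we ran a three-node ETCD cluster, but all three nodes were deployed on the same machine. Later the machine ran short of resources and the system killed the ETCD processes outright, so all three nodes of our internal cluster went down. At first I assumed I could just start them again one by one. However, after restarting them with their previous data-dir settings, the three nodes failed to handshake with each other: each node's data directory still held its own copy of the member list and data, and that prevented the handshake from succeeding.

After some searching online, I realized this amounted to disaster recovery. The solution is to take one of the nodes from before the crash and start it alone as a single-node cluster, then add new nodes to that cluster one at a time and wait for the data to sync to them. In fact, when ETCD nodes are deployed on separate machines, the cluster stays available as long as fewer than half of the nodes fail. Since every node failed here, the recovery was effectively a restore from backup data.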
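The "fewer than half" rule comes from the majority quorum that etcd's Raft consensus needs: a cluster of n members must have floor(n/2)+1 of them alive to keep working. The bash helpers below (quorum and fault_tolerance are hypothetical names, not part of etcd) simply work through that arithmetic:

```shell
#!/usr/bin/env bash
# Majority quorum needed by an etcd (Raft) cluster of n members: floor(n/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a quorum still survives: floor((n-1)/2).
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
done
```

For the three-node setup here, two members must be up, so the cluster tolerates one failure; with all three down there was no quorum left, hence the restore-style recovery.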

Below are the exact commands used for the recovery:

# Start machine-2 alone on its surviving data (data.etcd2) as a new one-member cluster.
# --force-new-cluster rewrites the membership to this single node, so drop it when restarting machine-2 later.
etcd --data-dir=data.etcd2 --name machine-2 --initial-advertise-peer-urls http://127.0.0.1:12380 --listen-peer-urls http://127.0.0.1:12380 --advertise-client-urls http://10.1.41.52:12379 --listen-client-urls http://10.1.41.52:12379,http://127.0.0.1:2379 --initial-cluster machine-2=http://127.0.0.1:12380 --initial-cluster-token token-token --initial-cluster-state new --force-new-cluster >> /var/log/etcd/machine-2 2>&1 &

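# Wipe machine-3's stale data, register it with the running cluster, then start it.
rm -rf data.etcd3

# etcdctl reaches machine-2 through the default endpoint 127.0.0.1:2379; only machine-2 should be listed.
etcdctl member list
# v2 etcdctl syntax; with ETCDCTL_API=3 use: etcdctl member add machine-3 --peer-urls=http://127.0.0.1:22380
etcdctl member add machine-3 http://127.0.0.1:22380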

etcd --data-dir=data.etcd3 --name machine-3 --initial-advertise-peer-urls http://127.0.0.1:22380 --listen-peer-urls http://127.0.0.1:22380 --advertise-client-urls http://10.1.41.52:22379 --listen-client-urls http://10.1.41.52:22379 --initial-cluster machine-2=http://127.0.0.1:12380,machine-3=http://127.0.0.1:22380 --initial-cluster-state existing --initial-cluster-token token-token >> /var/log/etcd/machine-3 2>&1 &

# Same procedure for machine-1.
rm -rf data.etcd1

# machine-2 and machine-3 should now both be listed.
etcdctl member list

etcdctl member add machine-1 http://127.0.0.1:2380

etcd --data-dir=data.etcd1 --name machine-1 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://127.0.0.1:2380 --advertise-client-urls http://10.1.41.52:2379 --listen-client-urls http://10.1.41.52:2379 --initial-cluster machine-2=http://127.0.0.1:12380,machine-3=http://127.0.0.1:22380,machine-1=http://127.0.0.1:2380 --initial-cluster-state existing --initial-cluster-token token-token >> /var/log/etcd/machine-1 2>&1 &
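The commands above follow one pattern per rejoining member: clear its data directory, register it with etcdctl member add, then start it with --initial-cluster-state existing and an --initial-cluster list holding every member added so far plus itself. The dry-run bash sketch below only prints those commands for machine-3 and machine-1 instead of running them. It assumes machine-2 is already up as the seed; the JOINERS array and HOST_IP variable are illustrative names, not etcd settings.

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) the rejoin commands for each member.
# Assumes machine-2 is already running as a one-member cluster.
set -euo pipefail

HOST_IP=10.1.41.52                         # client-facing address used in this article
# Each entry: name|peer port|client port, in the order members rejoin.
JOINERS=("machine-3|22380|22379" "machine-1|2380|2379")

cluster="machine-2=http://127.0.0.1:12380"  # seed member's name=peerURL

for entry in "${JOINERS[@]}"; do
  IFS='|' read -r name peer_port client_port <<< "$entry"
  peer_url="http://127.0.0.1:${peer_port}"
  client_url="http://${HOST_IP}:${client_port}"
  cluster="${cluster},${name}=${peer_url}"  # grow --initial-cluster by this member
  data_dir="data.etcd${name#machine-}"      # machine-3 -> data.etcd3
  echo "rm -rf ${data_dir}"
  echo "etcdctl member add ${name} ${peer_url}"
  # Log redirection and backgrounding (>> ... 2>&1 &) are omitted here.
  echo "etcd --data-dir=${data_dir} --name ${name}" \
       "--initial-advertise-peer-urls ${peer_url} --listen-peer-urls ${peer_url}" \
       "--advertise-client-urls ${client_url} --listen-client-urls ${client_url}" \
       "--initial-cluster ${cluster} --initial-cluster-state existing" \
       "--initial-cluster-token token-token"
done
```

Printing instead of executing makes it easy to review the generated --initial-cluster values before touching a real cluster. If you do run the output, add the log redirection and trailing & from the original commands, otherwise the first etcd start blocks the rest.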

Three nodes were deployed in total, with data directories data.etcd1, data.etcd2 and data.etcd3. I deleted data.etcd1 and data.etcd3, kept data.etcd2, and used data.etcd2 as the basis for the recovery.
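The rm -rf steps throw away the other two members' data for good. If you would rather keep it in case the chosen seed turns out not to hold the most up-to-date copy, a cautious variant is to rename those directories instead of deleting them. A minimal bash sketch; the set_aside_data_dirs helper and the .bak-timestamp suffix are assumptions, not part of the original procedure:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename discarded etcd data directories instead of
# deleting them, leaving the seed directory untouched.
set -euo pipefail

# Usage: set_aside_data_dirs SEED_DIR DIR_TO_DISCARD...
set_aside_data_dirs() {
  local seed=$1; shift
  # Refuse to move anything if the seed directory is missing.
  if [ ! -d "$seed" ]; then
    echo "seed directory $seed not found" >&2
    return 1
  fi
  local stamp dir
  stamp=$(date +%Y%m%d%H%M%S)
  for dir in "$@"; do
    if [ "$dir" = "$seed" ]; then
      continue                        # never move the seed itself
    fi
    if [ -d "$dir" ]; then
      mv "$dir" "${dir}.bak-${stamp}" # rename rather than delete
      echo "moved $dir -> ${dir}.bak-${stamp}"
    fi
  done
}

# For this article's layout, run from the directory holding the data dirs:
# set_aside_data_dirs data.etcd2 data.etcd1 data.etcd3
```

The renamed directories can be removed once the rebuilt cluster has been checked.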

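I was in a hurry to restore service at the time, so I didn't save the error messages. I'm writing this down for now and will add the detailed errors once I've reproduced the failure.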
