A collection of notes on handling minor etcd failures


Handling a failure caused by lost data files:

 

etcdctl ${ep} endpoint health status
{"level":"warn","ts":"2021-05-20T13:58:58.712+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-3f460aae-8ddc-4996-a726-a4aae4691573/172.21.130.169:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.21.130.169:2379: connect: connection refused\""}
https://172.21.130.168:2379 is healthy: successfully committed proposal: took = 13.430258ms
https://172.28.17.85:2379 is healthy: successfully committed proposal: took = 15.020918ms
https://172.21.130.169:2379 is unhealthy: failed to commit proposal: context deadline exceeded
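
With more endpoints, the unhealthy ones are easy to miss among the warning lines. The following is a minimal sketch, not part of the original notes, that filters the health output down to the failing endpoints. The function name unhealthy_endpoints is hypothetical, and it assumes result lines keep the "<endpoint> is unhealthy: <reason>" shape shown above.

```shell
# Print only the endpoints that `etcdctl endpoint health` reported as unhealthy.
# Reads the command's output on stdin; JSON warning lines are ignored because
# their second and third words are not "is unhealthy:".
unhealthy_endpoints() {
  awk '$2 == "is" && $3 == "unhealthy:" { print $1 }'
}
```

A possible invocation is `etcdctl ${ep} endpoint health 2>&1 | unhealthy_endpoints`; the `2>&1` is there because part of the output may be written to stderr.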

 

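However, simply restarting the service does not work. The cluster state is set to new, which tells etcd to bootstrap a new member. The member ID it computes is the same value as before, but the member list still shows that member as started, so the startup is rejected.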

 

 

[root@master ~]# etcdctl ${ep}  member list -w table
+------------------+---------+--------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  |  NAME  |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+--------+-----------------------------+-----------------------------+------------+
| 4c978cbca553cd70 | started | etcd-1 | https://172.21.130.169:2380 | https://172.21.130.169:2379 |      false |
| 568fd04cf936e056 | started | etcd-3 |   https://172.28.17.85:2380 |   https://172.28.17.85:2379 |      false |
| cc0bba643b3d8ce1 | started | etcd-2 | https://172.21.130.168:2380 | https://172.21.130.168:2379 |      false |
+------------------+---------+--------+-----------------------------+-----------------------------+------------+

 

 

 

May 20 14:00:28 master etcd: {"level":"fatal","ts":"2021-05-20T14:00:28.992+0800","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"member 4c978cbca553cd70 has already been bootstrapped","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.16/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.16/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.16/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
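
The fatal log names member 4c978cbca553cd70 as already bootstrapped, which matches the ID still listed by the cluster. A small sketch for confirming this from a script follows. It assumes the default comma-separated output of `etcdctl member list` (without `-w table`), and the function name member_registered is hypothetical.

```shell
# Succeed (exit 0) if the given member ID appears in the member list.
# $1    = member ID in hex, as printed in the etcd log
# stdin = output of `etcdctl member list` in its default format, e.g.
#         4c978cbca553cd70, started, etcd-1, https://..., https://..., false
member_registered() {
  awk -F', ' -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}
```

For example, `etcdctl ${ep} member list | member_registered 4c978cbca553cd70 && echo "still registered"`. If the ID is still registered, bootstrapping that node with the state set to new will be rejected as shown in the log.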

 

 

Solutions:

 

1. Change the parameter to ETCD_INITIAL_CLUSTER_STATE="existing" and restart the service. That is all.

 

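2. Delete the data on every node and restart all nodes. This amounts to rebuilding the cluster, so it is not realistic in production unless you have an etcd snapshot, and even then the data from the recovery window is lost. Not recommended.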

 

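3. Remove the member and add it again. This also recovers the node, but it is just as much work (a matter of personal preference).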
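
Option 1 can be sketched as a small helper that rewrites the setting in the configuration file. This is only an illustration: the function name set_cluster_state is hypothetical, and it assumes the file uses one KEY="value" assignment per line, like an environment file.

```shell
# Set ETCD_INITIAL_CLUSTER_STATE in an etcd environment-style config file.
# $1 = path to the config file, $2 = desired state ("new" or "existing")
set_cluster_state() {
  scs_conf="$1"
  scs_state="$2"
  # Keep a copy of the original file before touching it.
  cp -p "$scs_conf" "$scs_conf.bak" || return 1
  if grep -q '^ETCD_INITIAL_CLUSTER_STATE=' "$scs_conf"; then
    # Replace the existing assignment, leaving every other line untouched.
    scs_tmp=$(mktemp) || return 1
    sed "s|^ETCD_INITIAL_CLUSTER_STATE=.*|ETCD_INITIAL_CLUSTER_STATE=\"$scs_state\"|" \
      "$scs_conf" > "$scs_tmp" && cat "$scs_tmp" > "$scs_conf"
    rm -f "$scs_tmp"
  else
    # The key is missing, so append it.
    printf 'ETCD_INITIAL_CLUSTER_STATE="%s"\n' "$scs_state" >> "$scs_conf"
  fi
}
```

On the failed node this might be used as `set_cluster_state /var/local/etcd/cfg/etcd.conf existing && systemctl restart etcd`, where the systemd unit name etcd is an assumption. For option 3, the corresponding commands are `etcdctl member remove 4c978cbca553cd70` followed by `etcdctl member add etcd-1 --peer-urls=https://172.21.130.169:2380`, after which the node also has to start with the state set to existing.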

 

 

etcd disaster recovery issues

 

This is where the data directory lives:

 

[root@master etcd]# pwd && tree /var/local/etcd
.
├── bin
├── cfg
│   └── etcd.conf
├── data
└── ssl
    ├── ca-key.pem
    ├── ca.pem
    ├── client-key.pem
    ├── client.pem
    ├── member-key.pem
    ├── member.pem
    ├── server-key.pem
    └── server.pem

4 directories, 9 files

 

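Even an empty directory will not work. In this environment, if the directory passed to --data-dir does not exist when the command runs, it is created automatically; if it already exists, the command refuses to run.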

[root@master etcd]# etcdctl snapshot restore 2021-05-20.db --data-dir="/var/local/etcd/data"
Error: data-dir "/var/local/etcd/data" exists

 

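If you still want the data in the previous location, remember to delete the folder that holds the data first, that is, the data directory. This is what the earlier disaster recovery article meant by saying that a restore creates a new etcd data directory.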

 

[root@master ~]# tree /var/local/etcd/ && etcdctl snapshot restore 2021-05-20.db --data-dir="/var/local/etcd/data"
/var/local/etcd/
├── bin
├── cfg
│   └── etcd.conf
└── ssl
    ├── ca-key.pem
    ├── ca.pem
    ├── client-key.pem
    ├── client.pem
    ├── member-key.pem
    ├── member.pem
    ├── server-key.pem
    └── server.pem

3 directories, 9 files
{"level":"info","ts":1621503919.9106126,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"2021-05-20.db","wal-dir":"/var/local/etcd/data/member/wal","data-dir":"/var/local/etcd/data","snap-dir":"/var/local/etcd/data/member/snap"}
{"level":"info","ts":1621503919.9167044,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1621503919.9224617,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"2021-05-20.db","wal-dir":"/var/local/etcd/data/member/wal","data-dir":"/var/local/etcd/data","snap-dir":"/var/local/etcd/data/member/snap"}
[root@master ~]#
[root@master ~]# tree /var/local/etcd
/var/local/etcd
├── bin
├── cfg
│   └── etcd.conf
├── data
│   └── member
│       ├── snap
│       │   ├── 0000000000000001-0000000000000001.snap
│       │   └── db
│       └── wal
│           └── 0000000000000000-0000000000000000.wal
└── ssl
    ├── ca-key.pem
    ├── ca.pem
    ├── client-key.pem
    ├── client.pem
    ├── member-key.pem
    ├── member.pem
    ├── server-key.pem
    └── server.pem

7 directories, 12 files
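
Instead of deleting the old data directory outright, it can be moved aside so the previous contents remain available if the restore goes wrong. This is a variation on the step above, not what the original notes did. A minimal sketch, with the hypothetical function name move_data_dir_aside:

```shell
# Move an existing data directory out of the way so that
# `etcdctl snapshot restore --data-dir=...` can create it from scratch.
# $1 = data directory path. Prints the backup path if something was moved,
# prints nothing if the directory did not exist.
move_data_dir_aside() {
  mda_dir="$1"
  if [ -e "$mda_dir" ]; then
    # A timestamp suffix keeps repeated backups from colliding.
    mda_backup="$mda_dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$mda_dir" "$mda_backup" && printf '%s\n' "$mda_backup"
  fi
}
```

With etcd stopped on the node, this could be run as `move_data_dir_aside /var/local/etcd/data` immediately before the restore command shown above.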

 

 

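Parameters that do not match the rest of the cluster configuration (for example the member name or the peer URLs) can also cause startup to fail.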
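
One place where such a mismatch can arise is the restore itself. The restore log above shows the member being added with the peer URL http://localhost:2380, which is the default when no membership flags are passed, while the cluster members advertise https peer URLs. The sketch below passes the membership flags explicitly. The function name restore_member and the ETCDCTL override variable are hypothetical, and the values in the usage example are assumptions that must match the node's real etcd configuration.

```shell
# Restore a snapshot with explicit membership flags.
# $1 = snapshot file          $2 = member name
# $3 = this member's peer URL $4 = full initial-cluster string
# $5 = cluster token          $6 = data directory (must not exist yet)
# ETCDCTL can be set to another binary (or to `echo` for a dry run).
restore_member() {
  "${ETCDCTL:-etcdctl}" snapshot restore "$1" \
    --name="$2" \
    --initial-advertise-peer-urls="$3" \
    --initial-cluster="$4" \
    --initial-cluster-token="$5" \
    --data-dir="$6"
}
```

For the etcd-1 node in the member list above, a call might look like `restore_member 2021-05-20.db etcd-1 https://172.21.130.169:2380 "etcd-1=https://172.21.130.169:2380,etcd-2=https://172.21.130.168:2380,etcd-3=https://172.28.17.85:2380" etcd-cluster /var/local/etcd/data`. The token etcd-cluster is an assumption; use whatever value the cluster was actually configured with. Prefixing the call with `ETCDCTL=echo` prints the arguments instead of running the restore.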

 

 

