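When you run a production-grade Kubernetes cluster, regular backups are a precaution you should never skip, no matter how stable the cluster has been.
All of a Kubernetes cluster's state is stored in ETCD, so to keep your production environment safe we recommend backing up ETCD regularly.
1. How to back up the ETCD data of an Alibaba Cloud Container Service for Kubernetes cluster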
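ETCD keeps three replicas, one on each master, and they stay in sync. You therefore only need to run the backup on one master node.
Before running the commands below, make sure kube-apiserver is running on that machine. The backup command reads the ETCD endpoints from the kube-apiserver command line:
ps -ef|grep kube-apiserver
root      2063  2047  1 Jan05 ?        00:41:01 kube-apiserver
Run the backup command: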
export ETCD_SERVERS=$(ps -ef|grep apiserver|grep -Eo "etcd-servers=.*2379"|awk -F= '{print $NF}')
mkdir -p /var/lib/etcd_backup/
ETCDCTL_API=3 etcdctl --endpoints=$ETCD_SERVERS \
  --cacert=/var/lib/etcd/cert/ca.pem \
  --cert=/var/lib/etcd/cert/etcd-client.pem \
  --key=/var/lib/etcd/cert/etcd-client-key.pem \
  snapshot save /var/lib/etcd_backup/backup_$(date "+%Y%m%d%H%M%S").db
Snapshot saved at /var/lib/etcd_backup/backup_20180107172459.db
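For scheduled backups, you can wrap the same step in a small script. The sketch below assumes the ACK certificate paths used above. Its function names (etcd_servers_from_cmdline, first_endpoint, backup_file, run_backup) are made up for illustration. The script reads --etcd-servers from the running kube-apiserver and sends the snapshot request to the first endpoint only. Newer etcdctl releases reject a snapshot save that lists several endpoints, and any single member is enough because all three hold the same replicated data.

```shell
#!/usr/bin/env bash
# Sketch of the backup step. Assumes the ACK certificate layout shown above;
# all function names are illustrative, not part of any tool.

CERT_DIR=/var/lib/etcd/cert
BACKUP_DIR=/var/lib/etcd_backup

# Print the value of --etcd-servers from a kube-apiserver command line,
# e.g. "https://10.0.0.1:2379,https://10.0.0.2:2379".
etcd_servers_from_cmdline() {
  printf '%s\n' "$1" | grep -Eo -- '--etcd-servers=[^ ]+' | head -n 1 | cut -d= -f2-
}

# Print only the first endpoint of a comma-separated list.
first_endpoint() {
  printf '%s\n' "${1%%,*}"
}

# Print a timestamped snapshot path, e.g. /var/lib/etcd_backup/backup_20180107172459.db.
backup_file() {
  printf '%s/backup_%s.db\n' "$BACKUP_DIR" "$(date '+%Y%m%d%H%M%S')"
}

# Take a snapshot from a single ETCD member.
run_backup() {
  local cmdline servers endpoint
  # The [k] trick stops grep from matching its own process line.
  cmdline=$(ps -eo args | grep '[k]ube-apiserver' | head -n 1)
  servers=$(etcd_servers_from_cmdline "$cmdline")
  endpoint=$(first_endpoint "$servers")
  if [ -z "$endpoint" ]; then
    echo "kube-apiserver not running or --etcd-servers not found" >&2
    return 1
  fi
  mkdir -p "$BACKUP_DIR"
  ETCDCTL_API=3 etcdctl --endpoints="$endpoint" \
    --cacert="$CERT_DIR/ca.pem" \
    --cert="$CERT_DIR/etcd-client.pem" \
    --key="$CERT_DIR/etcd-client-key.pem" \
    snapshot save "$(backup_file)"
}
```

To use it, source the file on one master and call run_backup, for example from a cron job.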
When the command finishes, the snapshot is in /var/lib/etcd_backup:
[root@iZwz95q64qi83o88y9lq4cZ etcd_backup]# cd /var/lib/etcd_backup/
[root@iZwz95q64qi83o88y9lq4cZ etcd_backup]# ls
backup_20180107172459.db
[root@iZwz95q64qi83o88y9lq4cZ etcd_backup]# du -sh backup_20180107172459.db
8.0M    backup_20180107172459.db
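A missing or zero-byte file means the snapshot was not written. Check the file before relying on the backup. The following is a minimal sketch; check_snapshot and the VERIFY_WITH_ETCDCTL switch are hypothetical names. With the switch set to 1, it also runs etcdctl snapshot status, which opens the file locally and reports its hash, revision, total keys and size.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a snapshot file produced by "etcdctl snapshot save".
# VERIFY_WITH_ETCDCTL is a hypothetical opt-in switch for this example.

check_snapshot() {
  local f=$1
  # -s is true only for a file that exists and is larger than zero bytes.
  if [ ! -s "$f" ]; then
    echo "snapshot $f is missing or empty" >&2
    return 1
  fi
  # Optional deeper check: etcdctl reads the file locally and prints a summary table.
  if [ "${VERIFY_WITH_ETCDCTL:-0}" = 1 ]; then
    ETCDCTL_API=3 etcdctl --write-out=table snapshot status "$f" || return 1
  fi
  echo "ok: $f ($(du -h "$f" | cut -f1))"
}
```

Example: VERIFY_WITH_ETCDCTL=1 check_snapshot /var/lib/etcd_backup/backup_20180107172459.db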
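2. Restoring the Kubernetes cluster from the ETCD backup
2.1 First, stop kube-apiserver on each of the three master nodes. kube-apiserver runs as a static Pod, so moving its manifest out of /etc/kubernetes/manifests makes the kubelet stop it: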
mkdir -p /etc/kubernetes/manifests-backups
mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests-backups/
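Most restore steps are repeated on all three masters. If you drive them from one machine, a loop over SSH saves retyping. This sketch assumes password-less root SSH to each master. master1 to master3 are the placeholder host names from the scp step in 2.6; MASTERS, SSH and on_each_master are illustrative names. The loop returns non-zero at the first host that fails, so a calling script can stop before it moves on to the next step.

```shell
#!/usr/bin/env bash
# Sketch: run one shell command on every master over SSH, stopping at the
# first failure. Host names are placeholders; SSH can be overridden (e.g. for tests).
MASTERS=${MASTERS:-"master1 master2 master3"}
SSH=${SSH:-ssh}

on_each_master() {
  local host
  for host in $MASTERS; do
    echo "== $host" >&2
    # $SSH is left unquoted so it may hold a command plus options.
    $SSH "root@$host" "$1" || { echo "failed on $host" >&2; return 1; }
  done
}

# Example: step 2.1 on all three masters.
# on_each_master 'mkdir -p /etc/kubernetes/manifests-backups && mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests-backups/'
```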
2.2 Make sure kube-apiserver has stopped. The following command counts the remaining kube-apiserver processes and should print 0:
ps -ef|grep kube-api|grep -v grep |wc -l
0
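After the manifest is moved, the kubelet may take a few seconds to notice and stop the Pod. During that window the check in 2.2 can still print a non-zero count. A polling helper saves you from re-running the check by hand, and the same helper works for the ETCD check in 2.4. This is a sketch with hypothetical function names. Note the grep -v grep filter: it removes only the grep process itself. Filtering out the search term instead (for example grep -v etcd after grep etcd) would discard every line and always report 0.

```shell
#!/usr/bin/env bash
# Sketch: count matching processes and wait until none are left.

# Count lines of a "ps -ef" listing on stdin that contain PATTERN,
# excluding the grep process that performs the search.
count_procs() {
  grep -- "$1" | grep -v grep | wc -l | tr -d ' '
}

# Poll every 2 seconds until no process matches PATTERN; give up after
# TIMEOUT seconds (default 60) and return 1.
wait_until_gone() {
  local pattern=$1 timeout=${2:-60} waited=0
  while [ "$(ps -ef | count_procs "$pattern")" -ne 0 ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$pattern still running after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}

# Usage on a master:
#   wait_until_gone kube-api 60   # step 2.2
#   wait_until_gone etcd 60       # step 2.4
```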
2.3 Stop the ETCD service on each of the three master nodes:
service etcd stop
2.4 Make sure ETCD has stopped. This command should also print 0:
ps -ef|grep etcd|grep -v grep|wc -l
0
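2.5 On each master, move the existing ETCD data directory aside. etcdctl snapshot restore refuses to write into a data directory that already exists: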
mv /var/lib/etcd/data.etcd /var/lib/etcd/data.etcd_bak
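The fixed name data.etcd_bak can cause trouble on a second restore. If data.etcd_bak already exists from an earlier attempt, mv moves data.etcd into it as data.etcd_bak/data.etcd instead of renaming it. A timestamped name avoids this. The helper below is a hypothetical sketch of that idea.

```shell
#!/usr/bin/env bash
# Sketch: rename a data directory to a unique, timestamped name and print the
# new path. Safe to run twice: a missing source directory is not an error.
move_aside() {
  local dir=$1 dest
  dest="${dir}_bak_$(date '+%Y%m%d%H%M%S')"
  if [ ! -d "$dir" ]; then
    echo "$dir does not exist, nothing to move" >&2
    return 0
  fi
  if [ -e "$dest" ]; then
    echo "$dest already exists, refusing to overwrite" >&2
    return 1
  fi
  mv "$dir" "$dest" && echo "$dest"
}

# Usage on each master (step 2.5):
#   move_aside /var/lib/etcd/data.etcd
```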
2.6 Restore the data on each node. First, copy the backup to every master node. The commands below assume the backup is at /var/lib/etcd_backup/backup_20180107172459.db and that /var/lib/etcd_backup/ already exists on each master:
scp /var/lib/etcd_backup/backup_20180107172459.db root@master1:/var/lib/etcd_backup/
scp /var/lib/etcd_backup/backup_20180107172459.db root@master2:/var/lib/etcd_backup/
scp /var/lib/etcd_backup/backup_20180107172459.db root@master3:/var/lib/etcd_backup/
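Every member has to be restored from the same snapshot file; otherwise the members start from different data. Comparing checksums after the copy is a cheap safeguard. The sketch below assumes sha256sum is installed on all masters and reuses the placeholder host names. remote_sums, same_snapshot_everywhere, SNAPSHOT, MASTERS and SSH are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: verify that every master holds an identical copy of the snapshot.
SNAPSHOT=${SNAPSHOT:-/var/lib/etcd_backup/backup_20180107172459.db}
MASTERS=${MASTERS:-"master1 master2 master3"}
SSH=${SSH:-ssh}

# Print "<host> <sha256>" for the snapshot on each master
# (the hash field is empty if the remote command fails).
remote_sums() {
  local host
  for host in $MASTERS; do
    printf '%s %s\n' "$host" "$($SSH "root@$host" "sha256sum $SNAPSHOT" | cut -d' ' -f1)"
  done
}

# Succeed only if every remote hash equals the hash of the local copy.
same_snapshot_everywhere() {
  local want
  want=$(sha256sum "$SNAPSHOT" | cut -d' ' -f1)
  [ -n "$want" ] || { echo "cannot read $SNAPSHOT" >&2; return 1; }
  remote_sums | awk -v want="$want" '
    $2 != want { print "checksum mismatch on " $1 > "/dev/stderr"; bad = 1 }
    END { exit bad }'
}
```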
Then run the restore command on each master. It does four things:
- reads the member name, initial cluster, cluster token and advertised peer URLs from the ETCD systemd unit
- restores the snapshot into /var/lib/etcd/data.etcd
- hands the new directory to the etcd user
set -x
export ETCD_NAME=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "name.*-name-[0-9].*--client"|awk '{print $2}')
export ETCD_CLUSTER=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "initial-cluster.*--initial"|awk '{print $2}')
export ETCD_INITIAL_CLUSTER_TOKEN=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "initial-cluster-token.*"|awk '{print $2}')
export ETCD_INITIAL_ADVERTISE_PEER_URLS=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "initial-advertise-peer-urls.*--listen-peer"|awk '{print $2}')
ETCDCTL_API=3 etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-client.pem --key=/var/lib/etcd/cert/etcd-client-key.pem \
  snapshot restore /var/lib/etcd_backup/backup_20180107172459.db \
  --name $ETCD_NAME \
  --data-dir /var/lib/etcd/data.etcd \
  --initial-cluster $ETCD_CLUSTER \
  --initial-cluster-token $ETCD_INITIAL_CLUSTER_TOKEN \
  --initial-advertise-peer-urls $ETCD_INITIAL_ADVERTISE_PEER_URLS
chown -R etcd:etcd /var/lib/etcd/data.etcd
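The grep -Eo patterns above depend on the order of flags in the ExecStart line. For example, the --initial-cluster value is only found if another --initial... flag follows it. If your unit file orders the flags differently, a token-based lookup is less fragile. The function below is a hypothetical sketch. It splits the command line on whitespace and prints the value that follows a given flag, accepting both "--flag value" and "--flag=value". It assumes ExecStart sits on a single line in the unit file.

```shell
#!/usr/bin/env bash
# Sketch: print the value of --NAME from a command line, whether written as
# "--NAME value" or "--NAME=value". Prints nothing if the flag is absent.
flag_value() {
  local name=$1 cmdline=$2
  printf '%s\n' "$cmdline" | tr -s ' \t' '\n\n' | awk -v f="--$name" '
    take { print; exit }                  # token right after "--NAME"
    $0 == f { take = 1; next }            # exact flag, value is the next token
    index($0, f "=") == 1 { print substr($0, length(f) + 2); exit }'
}

# Usage (hypothetical):
#   execstart=$(grep ExecStart /usr/lib/systemd/system/etcd.service)
#   export ETCD_NAME=$(flag_value name "$execstart")
#   export ETCD_CLUSTER=$(flag_value initial-cluster "$execstart")
```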
2.7 Start ETCD on each of the three master nodes and confirm with the service command that it started successfully:
# service etcd start
# service etcd status
2.8 Check ETCD's health. The endpoint list is read from the kube-apiserver manifest that was moved to /etc/kubernetes/manifests-backups in step 2.1:
# export ETCD_SERVERS=$(cat /etc/kubernetes/manifests-backups/kube-apiserver.yaml |grep etcd-server|awk -F= '{print $2}')
# ETCDCTL_API=3 etcdctl endpoint health --endpoints=$ETCD_SERVERS --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-client.pem --key=/var/lib/etcd/cert/etcd-client-key.pem
https://192.168.250.198:2379 is healthy: successfully committed proposal: took = 2.238886ms
https://192.168.250.196:2379 is healthy: successfully committed proposal: took = 3.390819ms
https://192.168.250.197:2379 is healthy: successfully committed proposal: took = 2.925103ms
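To script step 2.9, you can gate the kube-apiserver restore on this health output. The sketch below is a hypothetical helper. It counts the lines that report "is healthy" and succeeds only when the count matches the expected number of members (three here). Unhealthy endpoints may be reported on stderr, hence the 2>&1 in the usage comment.

```shell
#!/usr/bin/env bash
# Sketch: read "etcdctl endpoint health" output on stdin and succeed only if
# exactly EXPECTED endpoints (default 3) report "is healthy".
all_healthy() {
  local expected=${1:-3} healthy
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps going.
  healthy=$(grep -c ' is healthy' || true)
  [ "$healthy" -eq "$expected" ]
}

# Usage (hypothetical), after exporting ETCD_SERVERS as in step 2.8:
#   ETCDCTL_API=3 etcdctl endpoint health --endpoints=$ETCD_SERVERS \
#     --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-client.pem \
#     --key=/var/lib/etcd/cert/etcd-client-key.pem 2>&1 | all_healthy 3 \
#     && mv /etc/kubernetes/manifests-backups/kube-apiserver.yaml /etc/kubernetes/manifests/
```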
2.9 If ETCD is healthy, restore kube-apiserver on each master by moving its manifest back:
# mv /etc/kubernetes/manifests-backups/kube-apiserver.yaml /etc/kubernetes/manifests/
2.10 Check whether the cluster has recovered. The output below shows that the cluster is running normally again and that the previously deployed application is still there.
# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health": "true"}
etcd-2               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}
# kubectl get no
NAME                                 STATUS    ROLES     AGE       VERSION
cn-shenzhen.i-wz90xxpi51k2u51t5y0p   Ready     master    44d       v1.8.4
cn-shenzhen.i-wz93236e8pccdscwz3ha   Ready     master    44d       v1.8.4
cn-shenzhen.i-wz953xx6qnlzdi6vo2aa   Ready     <none>    44d       v1.8.4
cn-shenzhen.i-wz953xx6qnlzdi6vo2ab   Ready     <none>    44d       v1.8.4
# kubectl get deploy
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx     1         1         1            1           23d
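You can also automate the node check in 2.10, for example at the end of a restore script. The following is a minimal sketch with a hypothetical function name. It skips the header of the kubectl get nodes output and fails if no nodes are listed. It also fails if any node's STATUS is anything other than exactly Ready, so a cordoned node reported as Ready,SchedulingDisabled counts as a failure.

```shell
#!/usr/bin/env bash
# Sketch: read "kubectl get nodes" output on stdin; succeed only if at least one
# node is listed and every node's STATUS column is exactly "Ready".
all_nodes_ready() {
  awk 'NR > 1 {
         n++
         if ($2 != "Ready") { print "not ready: " $1 > "/dev/stderr"; bad = 1 }
       }
       END { exit (bad || n == 0) }'
}

# Usage: kubectl get nodes | all_nodes_ready && echo "all nodes Ready"
```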
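Summary:
Backing up Kubernetes mainly means backing up ETCD. When restoring, the order of operations matters most: stop kube-apiserver, stop ETCD, restore the data, start ETCD, then start kube-apiserver.
Note: this approach only applies to backing up and restoring the metadata of the same cluster. It does not apply to migrating between different clusters.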