Background:
By default, the certificates of a kubeadm-deployed Kubernetes cluster are valid for one year. If they are not renewed in time, the control-plane nodes become unavailable as soon as the certificates expire, so you should either upgrade the Kubernetes cluster version or renew the certificates before they expire (see "Renewing Kubernetes certificates with kubeadm (certificates not yet expired)") to avoid the cluster going down. This article walks through re-issuing Kubernetes certificates with kubeadm after they have already expired. The Kubernetes version used here is 1.18.6.
Note: production environments generally do not upgrade the Kubernetes version, so another option is to patch the certificate validity period in the kubeadm source and rebuild kubeadm, so that the certificates it issues last as long as you want (see "Extending kubeadm-issued certificates to 10 years").
Steps:
1. Check the certificate expiration dates
kubeadm alpha certs check-expiration
This command is only available in Kubernetes 1.15 and later. On older versions where the kubeadm command is unavailable, you can still check whether a certificate has expired with openssl:
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep ' Not '
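Building on the openssl check above, the remaining validity can also be computed as a number of days, which is handier for scripting an alert. A minimal sketch (assumes GNU date); to keep it self-contained it generates a throwaway self-signed certificate, so on a real control plane point CRT at /etc/kubernetes/pki/apiserver.crt instead:

```shell
# Generate a throwaway cert so the snippet runs anywhere (assumption:
# on a real node you would set CRT=/etc/kubernetes/pki/apiserver.crt).
CRT=$(mktemp)
KEY=$(mktemp)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" \
    -days 365 -keyout "$KEY" -out "$CRT" 2>/dev/null
# notAfter looks like "Mar  8 04:57:00 2023 GMT"; GNU date can parse it
end=$(openssl x509 -in "$CRT" -noout -enddate | cut -d= -f2)
days_left=$(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
echo "days until expiry: $days_left"
```

`openssl x509 -checkend <seconds>` offers the same yes/no answer directly and is useful in cron jobs.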
2. Back up the certificates
Back up the existing certificates:
cp -r /etc/kubernetes/ /tmp/backup/
Back up the etcd data:
cp -r /var/lib/etcd /tmp/etcd-backup/
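The two backup commands above can be combined into one timestamped script, so repeated runs never overwrite an earlier backup. A sketch; the source paths match the commands above, and the /tmp destination is an assumption:

```shell
SRC_K8S=/etc/kubernetes
SRC_ETCD=/var/lib/etcd
BACKUP="/tmp/k8s-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP"
for d in "$SRC_K8S" "$SRC_ETCD"; do
    # -a preserves ownership and permissions, which matter for these files
    [ -d "$d" ] && cp -a "$d" "$BACKUP/" || echo "skipping $d (not present)"
done
echo "backup written to $BACKUP"
```

If anything goes wrong in the later steps, restore by copying the directories back out of $BACKUP.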
3. Delete the old certificates
Delete the certificates under /etc/kubernetes/pki that you want to regenerate, for example:
sudo rm -rf /etc/kubernetes/pki/apiserver.key
4. Regenerate the certificates
The certificates are regenerated with the kubeadm alpha certs renew command; run kubeadm alpha certs renew -h to list the available subcommands (one per certificate, plus all).
Regenerate all certificates:
kubeadm alpha certs renew all
Regenerate the certificate for a single component:
kubeadm alpha certs renew apiserver
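Step 4 can be wrapped in one guarded script that renews everything and then immediately re-checks expiration, so a failed renewal is noticed at once. A sketch; the guard makes it a harmless no-op on machines without kubeadm:

```shell
if command -v kubeadm >/dev/null 2>&1; then
    kubeadm alpha certs renew all \
        && kubeadm alpha certs check-expiration \
        || echo "renew failed (must run as root on a control-plane node)"
    done_renew=yes
else
    echo "kubeadm not installed; run 'kubeadm alpha certs renew all' on the control plane"
    done_renew=no
fi
```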
5. Regenerate the kubeconfig files
Back up the old config files:
mv /etc/kubernetes/*.conf /tmp/
Generate new ones with the kubeadm init phase kubeconfig command:
kubeadm init phase kubeconfig -h
Usage:
  kubeadm init phase kubeconfig [flags]
  kubeadm init phase kubeconfig [command]

Available Commands:
  admin               Generates a kubeconfig file for the admin to use and for kubeadm itself
  all                 Generates all kubeconfig files
  controller-manager  Generates a kubeconfig file for the controller manager to use
  kubelet             Generates a kubeconfig file for the kubelet to use *only* for cluster bootstrapping purposes
  scheduler           Generates a kubeconfig file for the scheduler to use
Regenerate all kubeconfig files:
kubeadm init phase kubeconfig all
Regenerate a single kubeconfig file:
# regenerate the admin kubeconfig
kubeadm init phase kubeconfig admin
# regenerate the kubelet kubeconfig
kubeadm init phase kubeconfig kubelet
6. Follow-up steps
After the certificates and kubeconfig files have been renewed, a few follow-up steps are needed for the changes to take effect, mainly restarting the kubelet and updating the admin config.
- Restart the kubelet
systemctl restart kubelet
- Update the admin config
Copy the newly generated admin.conf over the config file in the ~/.kube directory:
cp /etc/kubernetes/admin.conf ~/.kube/config
With that, the whole cluster can communicate normally again. The procedure comes down to two commands: kubeadm alpha certs renew and kubeadm init phase kubeconfig. Many write-ups found online use commands that no longer work because of version differences, so always check a command's -h output before running it to avoid mistakes.
Note: if you have multiple master nodes, you must either sync the certificates to the other masters or run all of the steps above on every master node.
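The sync for a multi-master cluster can be sketched as a small loop that copies the renewed PKI directory to the other masters. MASTERS holds hypothetical hostnames; with the default DRY_RUN=1 the script only prints the scp commands instead of running them:

```shell
MASTERS="${MASTERS:-master2 master3}"   # hypothetical control-plane hosts
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print only, 0 = actually copy
for host in $MASTERS; do
    cmd="scp -r /etc/kubernetes/pki root@$host:/etc/kubernetes/"
    echo "$cmd"
    [ "$DRY_RUN" = "1" ] || $cmd
done
```

After copying, repeat step 6 (restart the kubelet, update the admin config) on each of those masters.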
7. Other issues
After re-issuing the Kubernetes certificates you may run into the following problems.
1) After re-issuing, the expiration check shows all etcd-related certificates as missing
[root@master1 ~]# kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration
W0308 13:01:12.196352   31105 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Mar 08, 2023 04:58 UTC   364d                                    no
apiserver                  Mar 08, 2023 04:57 UTC   364d            ca                      no
!MISSING! apiserver-etcd-client
apiserver-kubelet-client   Mar 08, 2023 04:57 UTC   364d            ca                      no
controller-manager.conf    Mar 08, 2023 04:58 UTC   364d                                    no
!MISSING! etcd-healthcheck-client
!MISSING! etcd-peer
!MISSING! etcd-server
front-proxy-client         Mar 08, 2023 04:57 UTC   364d            front-proxy-ca          no
scheduler.conf             Mar 08, 2023 04:58 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Feb 24, 2031 01:55 UTC   8y              no
!MISSING! etcd-ca
front-proxy-ca          Feb 24, 2031 01:55 UTC   8y              no
In addition, the kube-apiserver service fails to start; its log shows:
... desc = "transport: Error while dialing dial tcp 192.168.249.141:2379: connect: connection refused". Reconnecting...
W0308 04:59:47.532589       1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://192.168.249.141:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 192.168.249.141:2379: connect: connection refused". Reconnecting...
W0308 04:59:48.850564       1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://192.168.249.141:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 192.168.249.141:2379: connect: connection refused". Reconnecting...
...
The etcd service log shows that this control node cannot pull the etcd image:
[root@master1 manifests]# journalctl -f -u etcd
-- Logs begin at Tue 2022-03-08 13:04:12 CST. --
Mar 08 13:09:41 master1 systemd[1]: Stopped etcd docker wrapper.
Mar 08 13:09:41 master1 systemd[1]: Starting etcd docker wrapper...
Mar 08 13:09:42 master1 docker[5857]: Error: No such container: etcd1
Mar 08 13:09:42 master1 systemd[1]: Started etcd docker wrapper.
Mar 08 13:09:42 master1 etcd[5878]: Unable to find image '192.168.249.143/cloudbases/etcd:v3.3.12' locally
Mar 08 13:09:42 master1 etcd[5878]: /usr/bin/docker: Error response from daemon: Get https://192.168.249.143/v2/: dial tcp 192.168.249.143:443: connect: connection refused.
Mar 08 13:09:42 master1 etcd[5878]: See '/usr/bin/docker run --help'.
Mar 08 13:09:42 master1 systemd[1]: etcd.service: main process exited, code=exited, status=125/n/a
Mar 08 13:09:42 master1 systemd[1]: Unit etcd.service entered failed state.
Mar 08 13:09:42 master1 systemd[1]: etcd.service failed.
After the etcd image was pulled onto the control node, all services recovered. (Note that in this cluster etcd runs outside kubeadm, as a systemd-managed docker wrapper; that is also why kubeadm reports the etcd certificates as !MISSING!, since kubeadm does not manage the certificates of an external etcd.)
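For clusters where etcd IS managed by kubeadm (stacked etcd, an assumption that does not hold for this article's cluster), genuinely missing etcd certificates can be regenerated one phase at a time. A guarded sketch that is harmless where kubeadm is absent:

```shell
# Each phase name is a real `kubeadm init phase certs` subcommand in v1.18;
# on an external-etcd setup kubeadm does not manage these certs at all.
for phase in etcd-ca etcd-server etcd-peer etcd-healthcheck-client apiserver-etcd-client; do
    if command -v kubeadm >/dev/null 2>&1; then
        kubeadm init phase certs "$phase" || echo "failed: $phase (needs root)"
    else
        echo "would run: kubeadm init phase certs $phase"
    fi
done
```

Run etcd-ca first, since the other etcd certificates are signed by it.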