1. The problem
I wanted to create a Pod on a Kubernetes cluster (test environment) on short notice, and found that the long-unused k8s cluster no longer worked, failing with:
kubectl get ns
The connection to the server 10.21.4.113:6443 was refused - did you specify the right host or port?
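Before digging further, it is worth confirming that the port in the error message really is closed. A minimal sketch using bash's built-in /dev/tcp (host and port taken from the error above):

```shell
# port_open HOST PORT: return 0 if a TCP connection can be opened within 2s
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Check the apiserver endpoint from the error message:
if port_open 10.21.4.113 6443; then
    echo "apiserver port is reachable"
else
    echo "apiserver port is refused or unreachable"
fi
```

A refused connection fails immediately; a filtered one times out after 2 seconds.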
My first thought was that kube-apiserver had a problem, so I ran systemctl status kubelet -l to look for the cause:
systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 二 2021-12-07 15:20:44 CST; 35s ago
Docs: https://kubernetes.io/docs/
Main PID: 28356 (kubelet)
Tasks: 55
Memory: 48.5M
CGroup: /system.slice/kubelet.service
└─28356 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kub...
12月 07 15:21:18 k8smaster kubelet[28356]: E1207 15:21:18.985804 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.086034 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.186265 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.286503 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.386722 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.486971 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.587204 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.687460 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.787686 28356 kubelet.go:2263] node "k8smaster" not found
12月 07 15:21:19 k8smaster kubelet[28356]: E1207 15:21:19.888025 28356 kubelet.go:2263] node "k8smaster" not found
Based on these errors, I figured one of two components was at fault: either kube-apiserver or etcd.
2. Fixing the problem
Since my test k8s cluster was installed with kubeadm, I used docker ps and docker logs to inspect the kube-apiserver and etcd containers and see where the problem actually was.
The logs showed that the cluster's certificates had expired; renewing them fixes the issue.
The certificates kubeadm issues for the cluster components (including the kubelet client certificate) are valid for 1 year by default. Once the cluster has been running for a year, you start seeing certificate has expired or is not yet valid errors, the nodes can no longer communicate with the master, and after a restart k8s will not come back up.
Regenerating the certificates
Verify whether a certificate has expired:
openssl x509 -noout -text -in /etc/kubernetes/pki/apiserver.crt
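To get an overview of every certificate at once, the same check can be wrapped in a small loop. A sketch, assuming the default kubeadm layout under /etc/kubernetes/pki:

```shell
# print_cert_expiry DIR: print the notAfter date of every *.crt file in DIR
print_cert_expiry() {
    local crt
    for crt in "$1"/*.crt; do
        [ -f "$crt" ] || continue
        printf '%-60s %s\n' "$crt" "$(openssl x509 -noout -enddate -in "$crt")"
    done
}

# Default kubeadm locations for the control-plane and etcd certificates:
print_cert_expiry /etc/kubernetes/pki
print_cert_expiry /etc/kubernetes/pki/etcd
```

Any line whose notAfter date is in the past identifies an expired certificate.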
On versions 1.14 and later, you can check all expiry dates with this command:
kubeadm alpha certs check-expiration
Certificates from a kubeadm install default to 1 year. Note that the original certificate files must still be on the server for the renewal to work; otherwise, everything is regenerated from scratch and the cluster may be unrecoverable.
First, back up the existing config and certificates:
cp -rp /etc/kubernetes /etc/kubernetes.bak
If the kubeadm config file can no longer be found, generate a default one first and then edit it yourself:
kubeadm config print init-defaults > kubeadm.yaml
Then adjust it to your actual environment. The main fields to change are kubernetesVersion, advertiseAddress, imageRepository, and serviceSubnet:
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.21.4.113
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8smaster
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.17.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Once edited, renew all certificates using this config:
kubeadm alpha certs renew all --config=/data/kubeadm.yaml
After extending the certificates, the kubeconfig files need to be regenerated as well.
# Note: back up the old kubeconfig files with a move (or delete them) before regenerating
mv /etc/kubernetes/*.conf /data/kubeconfback/
kubeadm init phase kubeconfig all --config=/data/kubeadm.yaml
Then restart the kube-apiserver, etcd, scheduler, and controller-manager containers:
docker ps | grep -v pause | grep -E "etcd|scheduler|controller|apiserver" | awk '{print $1}' | awk '{print "docker","restart",$1}' | bash
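The double awk in that one-liner is redundant; the same selection reads more easily as a small filter plus xargs. A sketch, equivalent in effect:

```shell
# control_plane_ids: read `docker ps` output on stdin and print the container
# IDs of the control-plane containers, skipping the pause sandbox containers
control_plane_ids() {
    grep -v pause | grep -E "etcd|scheduler|controller|apiserver" | awk '{print $1}'
}

# Restart them all in one pass (guarded so the sketch is a no-op without docker):
if command -v docker >/dev/null; then
    docker ps | control_plane_ids | xargs -r docker restart
fi
```

xargs -r skips the docker restart call entirely if no containers match, instead of failing on empty input.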
Or simply restart kubelet:
systemctl restart kubelet
If, after restarting kubelet, you hit error: You must be logged in to the server (Unauthorized), you also need to refresh your credentials (since the components were issued new certificates):
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
For a non-root user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Finally, check the cluster status:
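The apiserver can take a short while to come back after the restart, so it helps to poll rather than retry by hand. A sketch with a small retry helper (the retry count and delay are arbitrary choices):

```shell
# wait_ready RETRIES DELAY CMD...: run CMD until it succeeds, making up to
# RETRIES attempts and sleeping DELAY seconds between attempts
wait_ready() {
    local tries=$1 delay=$2 i
    shift 2
    for i in $(seq 1 "$tries"); do
        "$@" >/dev/null 2>&1 && return 0
        sleep "$delay"
    done
    return 1
}

# Poll the apiserver, then inspect the cluster (guarded if kubectl is absent):
if command -v kubectl >/dev/null; then
    wait_ready 10 3 kubectl get ns && kubectl get nodes -o wide
fi
```

Once kubectl get ns succeeds again, the original "connection refused" symptom is resolved.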