Problem 1:
Nov 20 06:13:44 production-elk kubelet: I1120 06:13:44.919624 9429 state_mem.go:36] [cpumanager] initializing new in-memory state store
Nov 20 06:13:44 production-elk kubelet: E1120 06:13:44.919737 9429 container_manager_linux.go:291] failed to initialize cpu manager: could not initialize checkpoint manager: could not restore state from checkpoint: checkpoint is corrupted
Nov 20 06:13:44 production-elk kubelet: Please drain this node and delete the CPU manager checkpoint file "/var/lib/kubelet/cpu_manager_state" before restarting Kubelet.
Nov 20 06:13:44 production-elk kubelet: F1120 06:13:44.919751 9429 server.go:262] failed to run Kubelet: could not initialize checkpoint manager: could not restore state from checkpoint: checkpoint is corrupted
Nov 20 06:13:44 production-elk kubelet: Please drain this node and delete the CPU manager checkpoint file "/var/lib/kubelet/cpu_manager_state" before restarting Kubelet.
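Solution: delete the corrupted checkpoint file named in the log, then restart kubelet. The log message also recommends draining the node first.
rm -rf /var/lib/kubelet/cpu_manager_state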
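
The fix above can also be done a little more cautiously. The following is only a sketch: `remove_cpu_checkpoint` is a hypothetical helper name, not part of kubelet, and it moves the checkpoint aside instead of deleting it so the corrupted file can still be inspected later. It assumes it is run as root on the affected node.

```shell
# Hypothetical helper (not part of kubelet): move a corrupted CPU manager
# checkpoint out of the way instead of deleting it outright.
# Argument 1 (optional): path of the checkpoint file.
remove_cpu_checkpoint() {
  state_file="${1:-/var/lib/kubelet/cpu_manager_state}"
  if [ ! -f "$state_file" ]; then
    echo "no checkpoint file at $state_file" >&2
    return 1
  fi
  # Keep a timestamped copy next to the original location.
  backup="$state_file.corrupted.$(date +%Y%m%d%H%M%S)"
  mv "$state_file" "$backup" && echo "$backup"
}

# Suggested order on the affected node (<node-name> is a placeholder):
#   kubectl drain <node-name> --ignore-daemonsets
#   sudo systemctl stop kubelet
#   remove_cpu_checkpoint
#   sudo systemctl start kubelet
#   kubectl uncordon <node-name>
```

On success the function prints the path of the renamed file; when there is no checkpoint file it returns a non-zero status and leaves everything untouched.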
Problem 2:
Checking component status with kubectl get cs shows kube-scheduler and kube-controller-manager as Unhealthy:
$ kubectl get cs
NAME                 STATUS      MESSAGE                                                                                     ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0               Healthy     {"health":"true"}
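Solution:
Kubernetes version: v1.20.1
Check whether the kube-scheduler and kube-controller-manager configuration disables the insecure port. The health check above probes ports 10251 and 10252 over plain HTTP, so a disabled insecure port shows up as "connection refused".
Configuration file paths: /etc/kubernetes/manifests/kube-scheduler.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml
Taking controller-manager as an example: remove the --port=0 setting from its manifest (and likewise from the kube-scheduler manifest), then restart kubelet:
sudo systemctl restart kubelet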
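
The manifest edit can be scripted. This is a minimal sketch, not an official procedure: `strip_port_zero` is a hypothetical helper, and the default backup directory is an assumption. It assumes the flag sits on its own list line (for example `    - --port=0`), which is how kubeadm-generated manifests are usually laid out.

```shell
# Hypothetical helper: remove the "- --port=0" argument from a static Pod manifest.
# The backup is written OUTSIDE the manifests directory, because kubelet may
# try to load additional files that appear in that directory.
# Argument 1: manifest path. Argument 2 (optional): backup directory (assumed default).
strip_port_zero() {
  manifest="$1"
  backup_dir="${2:-$HOME/manifest-backups}"
  backup="$backup_dir/$(basename "$manifest").orig"
  mkdir -p "$backup_dir" || return 1
  cp -p "$manifest" "$backup" || return 1
  # Copy every line back except the one that sets --port=0.
  grep -v '^[[:space:]]*-[[:space:]]*--port=0[[:space:]]*$' "$backup" > "$manifest"
}

# Example (run as root):
#   strip_port_zero /etc/kubernetes/manifests/kube-controller-manager.yaml
#   strip_port_zero /etc/kubernetes/manifests/kube-scheduler.yaml
#   systemctl restart kubelet
```

To see whether the flag is present at all before changing anything, `grep -n -- '--port=0' /etc/kubernetes/manifests/*.yaml` is enough.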
After the restart, check the component status again; both components should now report Healthy:
$ kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
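
The status check can be turned into a small filter for use in scripts. The sketch below assumes the column layout shown above (component name first, status second); `unhealthy_components` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get cs` output on stdin and print the
# name of every component whose STATUS column is not "Healthy".
unhealthy_components() {
  # NR > 1 skips the header row; $1 is NAME, $2 is STATUS.
  awk 'NR > 1 && $2 != "Healthy" { print $1 }'
}

# Example:
#   kubectl get cs | unhealthy_components
```

An empty result means every listed component reported Healthy.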
Problem 3:
[root@test ~]# kubectl get nodes
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
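Solution:
After running kubeadm reset to clear the existing cluster configuration, you also need to delete the old kubeconfig directory with rm -rf $HOME/.kube, because it still holds the CA certificate of the previous cluster. Then run the following commands again: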
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
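
To confirm that a stale kubeconfig is really the cause, the CA embedded in the kubeconfig can be compared with the cluster's current CA file. This is a rough sketch under several assumptions: `kubeconfig_ca_matches` is a hypothetical helper, the kubeconfig stores the CA inline as `certificate-authority-data`, the cluster CA lives at the kubeadm default path /etc/kubernetes/pki/ca.crt, and the embedded data is byte-identical to that file.

```shell
# Hypothetical helper: succeed (exit 0) if the CA embedded in a kubeconfig
# matches the given CA certificate file, fail otherwise.
# Returns 2 when the kubeconfig has no inline certificate-authority-data.
kubeconfig_ca_matches() {
  kubeconfig="${1:-$HOME/.kube/config}"
  ca_file="${2:-/etc/kubernetes/pki/ca.crt}"
  # Take the first certificate-authority-data value found in the file.
  ca_data=$(awk '$1 == "certificate-authority-data:" { print $2; exit }' "$kubeconfig")
  [ -n "$ca_data" ] || return 2
  # Decode the base64 payload and compare it byte for byte with the CA file.
  printf '%s' "$ca_data" | base64 -d | cmp -s - "$ca_file"
}

# Example:
#   kubeconfig_ca_matches || echo "kubeconfig does not match the current cluster CA"
```

A mismatch after kubeadm reset followed by a new kubeadm init would be consistent with the x509 error above, since the new cluster gets a freshly generated CA.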