Handling Kubernetes Initialization Failures

Kubernetes Networking with Calico

https://www.projectcalico.org/

 

 

Failure:

Warning  Unhealthy  4m36s  kubelet            Readiness probe failed: 2021-05-06 06:23:16.868 [INFO][135] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.21.130.168,172.28.17.85
Warning  Unhealthy  4m26s  kubelet  Readiness probe failed: 2021-05-06 06:23:26.885 [INFO][174] confd/health.go 180: Number of node(s) with BGP peering established = 0

 

Check the node's Calico status:

calicoctl node status 
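If calicoctl is not available on the node, a rough equivalent check (assuming the calico-node pods carry the usual k8s-app=calico-node label from the upstream manifest) is:

kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
kubectl describe pod -n kube-system <calico-node-pod-name>   # shows readiness probe events like the ones quoted above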

Edit the Calico YAML manifest and add the following configuration item (an environment variable on the calico-node container):

- name: IP_AUTODETECTION_METHOD
  value: ""

 

The IP_AUTODETECTION_METHOD setting defaults to first-found. In this mode Calico uses the first valid network interface it detects; although Docker networks and localhost are excluded, it can still pick the wrong interface in a complex network environment. In this incident, Calico on the master chose a non-primary NIC.

 

To handle this, IP_AUTODETECTION_METHOD also supports two other settings: can-reach=DESTINATION and interface=INTERFACE-REGEX.

 

can-reach=DESTINATION: Calico uses the node's routing table to determine the source IP address that would be used to reach the given destination IP or domain name.

interface=INTERFACE-REGEX: Calico uses the first IP address found on an interface whose name matches the regular expression. The order in which interfaces and addresses are enumerated depends on the system. The match uses Go (golang) regular expression syntax.
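For example, the env entry added above might be filled in with one of the following values (the interface regex and destination address here are illustrative placeholders, not taken from the original incident):

- name: IP_AUTODETECTION_METHOD
  value: "interface=eth.*"

or

- name: IP_AUTODETECTION_METHOD
  value: "can-reach=172.21.130.200"

After editing, re-apply the manifest (e.g. kubectl apply -f calico.yaml) so the calico-node pods restart with the new detection method.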

cat >> /etc/sysctl.d/k8s.conf <<EOF
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
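These settings only take effect once they are loaded; a typical follow-up step (standard sysctl usage, added here for completeness) is:

sysctl --system   # reloads all files under /etc/sysctl.d, including the new k8s.conf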

Initialization:

kubeadm init \
  --apiserver-advertise-address=172.21.130.169 \
  --image-repository=registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.20.5 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --ignore-preflight-errors=all

Explanation of the initialization process:

  1. [preflight] kubeadm runs pre-initialization checks.
  2. [kubelet-start] Generates the kubelet configuration file "/var/lib/kubelet/config.yaml".
  3. [certificates] Generates the various tokens and certificates.
  4. [kubeconfig] Generates the kubeconfig files that the kubelet needs in order to communicate with the Master.
  5. [control-plane] Installs the Master components, pulling their Docker images from the specified registry.
  6. [bootstraptoken] Generates the bootstrap token; note it down, as it is used later when adding nodes to the cluster with kubeadm join.
  7. [addons] Installs the add-on components kube-proxy and kube-dns (CoreDNS in recent versions).
  8. The Kubernetes Master initializes successfully, and the output explains how to configure kubectl access to the cluster for a regular user (see the example after this list).
  9. Shows how to install the Pod network (see the example after this list).
  10. Shows how to register other nodes into the cluster.
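The typical follow-up commands for steps 8 and 9 look like this (the kubectl configuration lines are the standard ones printed by kubeadm init; the Calico manifest path is illustrative and should match the version you actually deploy):

# Configure kubectl for a regular user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install the Pod network (Calico in this article)
kubectl apply -f calico.yaml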

 

Registry mirror (image acceleration):

cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["https://qpu51nc6.mirror.aliyuncs.com"]
}
EOF
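Docker has to be restarted for the mirror configuration to take effect (standard practice, added here for completeness):

systemctl daemon-reload
systemctl restart docker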

Failure: cleaning up a node before retrying the initialization:

swapoff -a && kubeadm reset  && systemctl daemon-reload && systemctl restart kubelet  && iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
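Before re-running kubeadm init or kubeadm join, it is worth verifying the cleanup (these checks are my addition, not part of the original notes):

free -h                     # the Swap line should show 0 after swapoff -a
systemctl status kubelet    # kubelet keeps restarting until the node is (re)initialized, which is expected here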

One cause of high-availability (HA) initialization failure:

[root@master1 docker]#   kubeadm join 172.21.130.200:6443 --token abcdef.0123456789abcdef \
>     --discovery-token-ca-cert-hash sha256:299a76c8350a5333f322101230be1aad673822a840eaea7788975ce002c56452 \
>     --control-plane --certificate-key 3359262ac1063c05c30aff1b2fb602fb73300871909222538e14ec4345bc3ea5
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master master1 master2 node1 node2] and IPs [10.0.0.1 172.28.17.85 172.21.130.200 172.21.130.169 172.21.130.168 172.28.17.86 172.28.17.87]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [172.28.17.85 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [172.28.17.85 127.0.0.1 ::1]
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
^C
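When a control-plane join hangs at the kubelet health check like this, the kubelet's own logs usually reveal the underlying error (a standard troubleshooting step, not part of the original transcript):

journalctl -u kubelet --no-pager | tail -n 50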

If Docker's cgroup driver was changed to systemd and Kubernetes was initialized with that setting, then Docker on any node that joins later must also use the systemd cgroup driver; otherwise the problem above occurs. The failure is caused by an inconsistent cgroup driver (the mechanism used for resource limiting/isolation) between the cluster and the joining node.
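To check which cgroup driver Docker is currently using on a node before joining (this verification command is my addition):

docker info 2>/dev/null | grep -i "cgroup driver"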

[root@master1 docker]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0510 22:41:17.209172   18233 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get corresponding node: nodes "master1" not found
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0510 22:41:18.170829   18233 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[root@master1 docker]# cat > /etc/docker/daemon.json <<EOF
> {
>     "exec-opts": ["native.cgroupdriver=systemd"], >     "log-driver": "json-file",
>     "log-opts": {
>     "max-size": "100m"
>     },
>     "storage-driver": "overlay2",
>     "registry-mirrors":[
>         "https://kfwkfulq.mirror.aliyuncs.com",
>         "https://2lqq34jg.mirror.aliyuncs.com",
>         "https://pee6w651.mirror.aliyuncs.com",
>         "http://hub-mirror.c.163.com",
>         "https://docker.mirrors.ustc.edu.cn",
>         "https://registry.docker-cn.com"
>     ]
> }
> EOF
[root@master1 docker]# 
[root@master1 docker]# 
[root@master1 docker]# # Restart docker.
[root@master1 docker]# systemctl daemon-reload
[root@master1 docker]# systemctl restart docker
[root@master1 docker]#   kubeadm join 172.21.130.200:6443 --token abcdef.0123456789abcdef     --discovery-token-ca-cert-hash sha256:299a76c8350a5333f322101230be1aad673822a840eaea7788975ce002c56452     --control-plane --certificate-key 3359262ac1063c05c30aff1b2fb602fb73300871909222538e14ec4345bc3ea5
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master master1 master2 node1 node2] and IPs [10.0.0.1 172.28.17.85 172.21.130.200 172.21.130.169 172.21.130.168 172.28.17.86 172.28.17.87]
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [172.28.17.85 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [172.28.17.85 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node master1 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

        mkdir -p $HOME/.kube
        sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
        sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

 

