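Calico troubleshooting
Almost everyone runs into this problem. The official step-by-step guide is too simplistic: it does not put the IP_AUTODETECTION_METHOD parameter, which controls how Calico detects the node IP, into calico.yaml. Without it, Calico uses the first network interface it finds, which is often the wrong one. The nodes then register with an address their peers cannot reach, and BGP fails to come up. A typical symptom is that BGP fails on the master while the workers start normally, although in the example below both calico-node pods are affected.
Problem description
After installing the Calico network component on the k8s cluster, list the pods: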
[ansible@k8s-cp calico]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   calico-node-jm74b                1/2     Running   0          7m16s
kube-system   calico-node-xk4fg                1/2     Running   0          2m5s
kube-system   coredns-7b47b4c577-447cn         1/1     Running   0          8m27s
kube-system   coredns-7b47b4c577-svm5v         1/1     Running   0          8m27s
kube-system   etcd-k8s-cp                      1/1     Running   0          7m51s
kube-system   kube-apiserver-k8s-cp            1/1     Running   0          8m1s
kube-system   kube-controller-manager-k8s-cp   1/1     Running   0          8m4s
kube-system   kube-proxy-nzmhh                 1/1     Running   0          8m27s
kube-system   kube-proxy-pjbp8                 1/1     Running   0          2m5s
kube-system   kube-scheduler-k8s-cp            1/1     Running   0          7m43s
After waiting a few minutes, the READY value of pods calico-node-jm74b and calico-node-xk4fg is still 1/2.
Looking at the details of pod calico-node-xk4fg shows the following error:
Warning Unhealthy 11s (x19 over 3m11s) kubelet, k8s-agent-1 Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 172.18.0.1
With this problem present, the pod network is broken, so application containers deployed afterwards cannot be reached normally.
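To pull that readiness message out of the pod's events, something like the following may help. It is only a sketch: `show_bird_failures` is a hypothetical helper name, and it assumes `kubectl` is already configured for the cluster and that the calico-node pods live in the `kube-system` namespace, as in the listing above.

```shell
# show_bird_failures POD
# Hypothetical helper: print only the lines of `kubectl describe pod` that
# mention the BIRD/BGP readiness failure.
# Assumes the pod is in the kube-system namespace.
show_bird_failures() {
  kubectl -n kube-system describe pod "$1" |
    grep -E 'BIRD is not ready|BGP not established'
}

# Example (needs cluster access):
#   show_bird_failures calico-node-xk4fg
```

If the function prints nothing, the pod has no such events and the cause of the 1/2 READY state is probably something else.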
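Solution
When a host has several network interfaces, Calico can auto-detect the wrong one, which leaves the nodes unable to reach each other. Specifying the interface (NIC name) explicitly fixes the problem.
Edit calico.yaml
Below
    - name: CLUSTER_TYPE
      value: "k8s,bgp"
add these two lines
    - name: IP_AUTODETECTION_METHOD
      value: "interface=enp0s3"
enp0s3 is the NIC name on my machine; replace it with the interface that carries your cluster traffic.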
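The manual edit above could also be scripted. The following is a minimal sketch, not an official Calico tool: `add_autodetect` is a hypothetical helper that assumes the manifest contains a `- name: CLUSTER_TYPE` entry immediately followed by its `value:` line, and it reuses that entry's indentation for the new lines. It does not check whether IP_AUTODETECTION_METHOD is already present, so run it once on an unmodified file.

```shell
# add_autodetect FILE IFACE
# Hypothetical helper: print FILE to stdout with an IP_AUTODETECTION_METHOD
# entry inserted right after the CLUSTER_TYPE name/value pair.
add_autodetect() {
  awk -v iface="$2" '
    { print }                          # echo every original line unchanged
    /- name: CLUSTER_TYPE/ {
      indent = $0
      sub(/- name:.*/, "", indent)     # keep only the leading whitespace
      pending = 1
      next
    }
    pending && /value:/ {              # the value line was just printed
      print indent "- name: IP_AUTODETECTION_METHOD"
      print indent "  value: \"interface=" iface "\""
      pending = 0
    }
  ' "$1"
}

# Example: write the patched manifest to a new file and review it first.
#   add_autodetect calico.yaml enp0s3 > calico-patched.yaml
```

Writing to a separate file rather than editing in place makes it easy to diff the result against the original before applying it.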
Redeploy the Calico network component, and the READY value changes to 2/2:
[ansible@k8s-cp calico]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   calico-node-jm74b                2/2     Running   0          15m
kube-system   calico-node-xk4fg                2/2     Running   0          9m51s
kube-system   coredns-7b47b4c577-447cn         1/1     Running   0          16m
kube-system   coredns-7b47b4c577-svm5v         1/1     Running   0          16m
kube-system   etcd-k8s-cp                      1/1     Running   0          15m
kube-system   kube-apiserver-k8s-cp            1/1     Running   0          15m
kube-system   kube-controller-manager-k8s-cp   1/1     Running   0          15m
kube-system   kube-proxy-nzmhh                 1/1     Running   0          16m
kube-system   kube-proxy-pjbp8                 1/1     Running   0          9m51s
kube-system   kube-proxy-wgz2c                 1/1     Running   0          114s
kube-system   kube-scheduler-k8s-cp            1/1     Running   0          15m
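The post does not show the redeploy command itself. One plausible way to do it is sketched below; `redeploy_calico` is a hypothetical helper, and the DaemonSet name `calico-node` in the `kube-system` namespace is an assumption inferred from the pod names in the listings, so check it against your own manifest.

```shell
# redeploy_calico MANIFEST
# Hypothetical helper: re-apply the edited manifest, then wait until the
# calico-node DaemonSet (name and namespace assumed) has finished rolling out.
redeploy_calico() {
  kubectl apply -f "$1" &&
    kubectl -n kube-system rollout status daemonset/calico-node
}

# Example (needs cluster access):
#   redeploy_calico calico.yaml
```

Once the rollout completes, `kubectl get pod --all-namespaces` should show the calico-node pods at 2/2 as in the listing above.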