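Deployment background
The company has one server in Hangzhou on which the k8s master node was already deployed. A second server was later added in Shanghai, and this deployment joins it to the k8s cluster as a worker node.
After the server joined the cluster, the calico-node pod on it stayed in the Running state but never became Ready.
Describing the pod showed the following events: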
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45s default-scheduler Successfully assigned kube-system/calico-node-pkbkv to k8s-node2
Normal Started 45s kubelet Started container install-cni
Normal Pulled 45s kubelet Container image "docker.io/calico/cni:v3.20.0" already present on machine
Normal Started 45s kubelet Started container upgrade-ipam
Normal Pulled 45s kubelet Container image "docker.io/calico/cni:v3.20.0" already present on machine
Normal Created 45s kubelet Created container install-cni
Normal Created 45s kubelet Created container upgrade-ipam
Normal Started 44s kubelet Started container flexvol-driver
Normal Pulled 44s kubelet Container image "docker.io/calico/pod2daemon-flexvol:v3.20.0" already present on machine
Normal Created 44s kubelet Created container flexvol-driver
Normal Pulled 43s kubelet Container image "docker.io/calico/node:v3.20.0" already present on machine
Normal Created 43s kubelet Created container calico-node
Normal Started 43s kubelet Started container calico-node
Warning Unhealthy 40s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 30s kubelet Readiness probe failed: 2021-09-15 02:36:49.282 [INFO][417] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.6.120,172.17.6.121,172.17.6.122
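The events above can be reproduced by describing the affected pod. Below is a minimal sketch wrapped in two hypothetical helper functions. The kube-system namespace and the pod name come from the output above; the k8s-app=calico-node label is an assumption based on the stock calico.yaml manifest and may differ in a customized install.

```shell
# Hypothetical helpers; the function names are illustrative only.

# List the calico-node pods together with the node each one runs on.
# Assumes the stock manifest's label k8s-app=calico-node.
list_calico_nodes() {
  kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
}

# Show the details and recent events of one calico-node pod.
# $1: pod name, e.g. calico-node-pkbkv
describe_calico_pod() {
  kubectl -n kube-system describe pod "$1"
}
```

For example, `describe_calico_pod calico-node-pkbkv` prints the event list, including any failed readiness probes.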
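After some searching online, I found the cause: when no IP autodetection method is configured, calico defaults to first-found. On a server with two network cards and two IPs, calico can end up using the IP address of the other card, which makes the network unreachable.
The IP_AUTODETECTION_METHOD setting defaults to first-found. In this mode calico uses the first valid interface it finds. Although it skips docker networks, localhost and the like, it can still choose wrongly in a complex network environment. In this incident, calico on master1 selected the other card, enp13s0f1, and the IP configured on that card is an internal IP.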
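One way to confirm which address was picked is to compare what calico recorded for a node with the addresses actually configured on the host. This is a sketch under an assumption: calico runs with the Kubernetes datastore, where calico-node stores its detected address in the node annotation projectcalico.org/IPv4Address. The helper names are hypothetical.

```shell
# Print the IPv4 address calico-node autodetected for a node.
# Assumption: Kubernetes datastore, where the address is kept in the
# node annotation projectcalico.org/IPv4Address.
# $1: node name, e.g. master1
calico_detected_ip() {
  kubectl get node "$1" \
    -o 'jsonpath={.metadata.annotations.projectcalico\.org/IPv4Address}'
}

# List every IPv4 address configured on the local host, one per line,
# so the detected address can be matched to an interface such as enp13s0f1.
host_ipv4_addrs() {
  ip -4 -o addr show
}
```

If the value printed by `calico_detected_ip` belongs to an interface the other nodes cannot route to, BGP peering cannot be established from that node.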
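Having found the cause, I edited calico's yaml file and added an env entry that switches to the can-reach method, which makes calico use the local address from which the given destination is reachable. The entry goes under spec.template.spec.containers[0].env, which is the calico-node container: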
- name: IP_AUTODETECTION_METHOD
  value: can-reach=www.baidu.com
After editing the yml file, recreate the calico resources with the following command:
kubectl replace -f calico.yaml
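Replacing the manifest changes the DaemonSet's pod template, so the calico-node pods are restarted one by one. The sketch below waits for that rollout to finish. It assumes the DaemonSet is named calico-node in kube-system, as in the stock manifest, and that it uses the RollingUpdate strategy; the helper name and the default timeout are illustrative.

```shell
# Wait until the calico-node DaemonSet has finished rolling out.
# $1: optional timeout (default 120s, an arbitrary illustrative value)
wait_for_calico_rollout() {
  kubectl -n kube-system rollout status daemonset/calico-node \
    --timeout="${1:-120s}"
}
```

Calling `wait_for_calico_rollout 300s` gives the pods up to five minutes; a non-zero exit status means the rollout did not complete in time.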
After the change, the network worked normally.
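As an alternative to editing the yaml file, the same variable can be set directly on the running DaemonSet with kubectl set env. This is a sketch and not the method used above; the helper name is hypothetical, and the DaemonSet name calico-node is assumed from the stock manifest.

```shell
# Set the autodetection method on the live calico-node DaemonSet.
# $1: method string, e.g. can-reach=www.baidu.com or 'interface=eth.*'
set_calico_autodetect() {
  kubectl -n kube-system set env daemonset/calico-node \
    "IP_AUTODETECTION_METHOD=$1"
}
```

For example, `set_calico_autodetect can-reach=www.baidu.com` matches the yaml change above. Note that this only changes the live object: calico.yaml on disk stays unchanged, so a later `kubectl replace -f calico.yaml` would revert the setting unless the file is updated as well.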