Pitfalls when deploying k8s across subnets and with multiple NICs


Deployment background

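The company has one server in Hangzhou, on which the k8s master node had already been deployed. Later a new server was added in Shanghai, and this time it was joined to the k8s cluster as a worker node.
After it joined the cluster, the calico-node pod on the new node stayed in the Running state but never became Ready.
Checking the pod details with kubectl describe showed the following errors: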

  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  45s   default-scheduler  Successfully assigned kube-system/calico-node-pkbkv to k8s-node2
  Normal   Started    45s   kubelet            Started container install-cni
  Normal   Pulled     45s   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Started    45s   kubelet            Started container upgrade-ipam
  Normal   Pulled     45s   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Created    45s   kubelet            Created container install-cni
  Normal   Created    45s   kubelet            Created container upgrade-ipam
  Normal   Started    44s   kubelet            Started container flexvol-driver
  Normal   Pulled     44s   kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.20.0" already present on machine
  Normal   Created    44s   kubelet            Created container flexvol-driver
  Normal   Pulled     43s   kubelet            Container image "docker.io/calico/node:v3.20.0" already present on machine
  Normal   Created    43s   kubelet            Created container calico-node
  Normal   Started    43s   kubelet            Started container calico-node
  Warning  Unhealthy  40s   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  30s   kubelet            Readiness probe failed: 2021-09-15 02:36:49.282 [INFO][417] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.6.120,172.17.6.121,172.17.6.122

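After some searching online, I found the cause: when no IP autodetection method is configured, calico defaults to first-found. On a server with two NICs and two IPs, calico can end up using the IP of the other NIC, which leaves the nodes unable to reach each other.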

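The IP_AUTODETECTION_METHOD setting defaults to first-found. In this mode calico uses the first valid IP address on the first valid interface it finds. It does skip interfaces such as the docker bridge and loopback, but in a complex network environment it can still pick the wrong one. In this incident, calico on master1 chose the other NIC, enp13s0f1, whose IP is an internal (private) address that the Shanghai node cannot reach, so BGP peering could not be established.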
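To make the first-found behaviour concrete, here is a simplified POSIX shell emulation of the idea. It is not Calico's code: the exclusion patterns are an assumed subset of the interfaces Calico skips, and the input format (one "interface address" pair per line, in the order the OS lists them) is made up for illustration.

```shell
#!/bin/sh
# Hypothetical emulation of the idea behind calico's "first-found" method.
# stdin: one "<interface> <ipv4-address>" pair per line, in OS enumeration order.
# stdout: the first address whose interface and address are not excluded.
# The exclusion patterns are an ASSUMED subset, not calico's real list.
first_found() {
  awk '
    $1 ~ /^(lo|docker.*|cali.*|tunl.*|veth.*|virbr.*|flannel.*)$/ { next }  # skip "local" interfaces
    $2 ~ /^127\./ || $2 ~ /^169\.254\./ { next }                           # skip loopback and link-local
    { print $2; exit }                                                     # first remaining address wins
  '
}
```

For example, if the internal NIC enp13s0f1 (with a hypothetical address 10.0.0.5) is listed before the public NIC, the function prints 10.0.0.5, which is exactly the kind of wrong choice described above: nothing in first-found knows which address the other site can actually reach.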

Having found the cause, I edited the calico YAML file and added an env entry to the calico-node container of the calico-node DaemonSet, i.e. under spec.template.spec.containers[0].env:

            - name: IP_AUTODETECTION_METHOD
              value: can-reach=www.baidu.com
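can-reach picks whichever local IP the node's routing table would use to reach the given destination, so here each node ends up on the NIC that carries its route to the internet. Before changing the DaemonSet you can check that choice on a node with ip route get, whose output includes a "src" field. The small helper below is a hypothetical sketch that extracts that field; note that ip route get expects an IP address rather than a hostname, so use the other site's public IP or a resolved address of www.baidu.com.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -4 route get <dest>` output on stdin and
# print the value after "src", i.e. the local address the kernel would use
# to reach <dest>. A diagnostic aid only, not calico's implementation.
route_src() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "src") { print $(i + 1); exit }
  }'
}

# Usage on a node (requires iproute2; <peer-ip> is a placeholder):
#   ip -4 route get <peer-ip> | route_src
```

If the printed address is the public IP that the other site can reach, can-reach should select the same address.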

After editing the YAML file, replace the calico resources with the updated manifest:

kubectl replace -f calico.yaml

After the change, the network worked normally.

