Part One: Background
The cluster has three nodes (1 master, 2 workers). The business service is exposed as a NodePort; when accessing it via nodeIP:port, one node IP turns out to be unreachable while the other two work fine.
HOST_1=192.168.86.188 --master
HOST_2=192.168.86.189
HOST_3=192.168.86.190
# curl http://192.168.86.188:30038/targets    ----no response on 188, the problem node
Checked from 189 and 190: ---OK
# curl http://192.168.86.189:30038/targets    ---ok
# kubectl get svc -nprometheus
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus            ClusterIP   10.96.167.153   <none>        9090/TCP         138m
prometheus-nodeport   NodePort    10.109.216.20   <none>        9090:30038/TCP   138m    ---the service IPs
# kubectl get pods -nprometheus -owide
NAME           READY   STATUS    RESTARTS   AGE    IP           NODE      NOMINATED NODE
prometheus-0   2/2     Running   0          143m   10.244.2.6   host189   <none>
Part Two: Locating the Problem
1. On host188, the problem node, trace the iptables rules:
[root@host188 ~]# iptables -t nat -L KUBE-NODEPORTS
KUBE-MARK-MASQ  tcp  --  anywhere  anywhere  /* prometheus/prometheus-nodeport:http */ tcp dpt:30038
KUBE-SVC-SO4PW3GA7SPS7IHM  tcp  --  anywhere  anywhere  /* prometheus/prometheus-nodeport:http */ tcp dpt:30038

Chain KUBE-SVC-SO4PW3GA7SPS7IHM (2 references)
target                     prot opt source      destination
KUBE-SEP-5GEH5OENN4FETTOQ  all  --  anywhere    anywhere

Chain KUBE-SEP-5GEH5OENN4FETTOQ (1 references)
target          prot opt source       destination
KUBE-MARK-MASQ  all  --  10.244.2.6   anywhere
DNAT            tcp  --  anywhere     anywhere     tcp to:10.244.2.6:9090
【Analysis】: A packet arriving on port 30038 is first marked by KUBE-MARK-MASQ (so that it will be masqueraded, i.e. source-NAT'ed, on its way out), then handed to the SVC chain, which represents the cluster-level service.
The SVC chain load-balances across the SEP chains (each representing one endpoint); here there is only a single pod, so everything goes to it, and the SEP chain DNATs the packet to 10.244.2.6:9090.
Finally, the rewritten packet enters the routing table for a route lookup.
Note: # iptables -t nat -S vs. # iptables -t nat -L
The two differ: the former prints the rules in their original, more complete form, so it is best to use both together.
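As an aside, here is one way to walk the NodePort chain by hand with -S (the chain names are hashed per service, so the ones below are from this cluster and will differ in yours; output shown roughly):
# iptables -t nat -S KUBE-NODEPORTS | grep 30038      ###which SVC chain does port 30038 jump to?
-A KUBE-NODEPORTS -p tcp -m comment --comment "prometheus/prometheus-nodeport:http" -m tcp --dport 30038 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "prometheus/prometheus-nodeport:http" -m tcp --dport 30038 -j KUBE-SVC-SO4PW3GA7SPS7IHM
# iptables -t nat -S KUBE-SVC-SO4PW3GA7SPS7IHM        ###the service chain picks an endpoint (SEP) chain
# iptables -t nat -S KUBE-SEP-5GEH5OENN4FETTOQ        ###the endpoint chain holds the actual DNAT target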
2. Tracing the routes
[root@host188 ~]# route -n    ----routing table on the problem node
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.86.1    0.0.0.0         UG    0      0        0 ens160
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      10.244.1.0      255.255.255.0   UG    0      0        0 flannel.1    ###route to the pods on host190 -- OK
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens160
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.86.0    0.0.0.0         255.255.255.0   U     0      0        0 ens160
[root@host190 ~]# route -n    ----routing table on a healthy node
Kernel IP routing table
Destination     Gateway         Genmask           Flags Metric Ref    Use Iface
0.0.0.0         192.168.86.1    0.0.0.0           UG    0      0        0 ens160
10.244.0.0      10.244.0.0      255.255.255.0     UG    0      0        0 flannel.1    ###route to the pod subnet on host188
10.244.1.0      0.0.0.0         255.255.255.0     U     0      0        0 cni0
10.244.2.0      10.244.2.0      255.255.255.0     UG    0      0        0 flannel.1    ###route to the pod subnet on host189
10.244.57.0     0.0.0.0         255.255.255.192   U     0      0        0 *
169.254.0.0     0.0.0.0         255.255.0.0       U     1002   0        0 ens160
172.17.0.0      0.0.0.0         255.255.0.0       U     0      0        0 docker0
192.168.86.0    0.0.0.0         255.255.255.0     U     0      0        0 ens160
【Analysis】: The routing table on 188 has no route for the 10.244.2.0 subnet, i.e. the route to the pods on node 189 -- and the target pod is deployed exactly on host189. So who is responsible for adding that route?
From how the k8s network (flannel) works: Kubernetes assigns each node in the cluster its own pod subnet and records it in etcd; flannel learns the subnets assigned to the other nodes by reading etcd (not necessarily directly -- possibly via the API server) and then installs routes into the routing table: one route to reach the pods on each remote node. So the problem can come from two directions: 1) the information in etcd is damaged; 2) flannel fails to add the route. (A quick kubectl-based view of the assignments is sketched below.)
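The same assignments can be checked from the Kubernetes side before touching etcd (a sketch, assuming the standard kube subnet manager, which mirrors the lease into the node's spec and annotations):
# kubectl get nodes -o custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR    ###the pod subnet Kubernetes assigned to each node
# kubectl get node host189 -o jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip}'    ###the public-ip flannel recorded for that node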
wxy's aside:
Actually, when first looking at the routing tables, ip route on node 188 showed no entry for that subnet at all, while route -n did show a 10.244.2.0 entry -- but its Iface was *, not the expected flannel.1.
Deleting that route then failed with "No such process..."; it later turned out the netmask was wrong, as we will see below.
3. Checking the information in etcd
HOST_1=192.168.86.188
HOST_2=192.168.86.189
HOST_3=192.168.86.190
ENDPOINTS=$HOST_1:2379,$HOST_2:2379,$HOST_3:2379
TLS_CERT=" --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/client.pem --key=/etc/etcd/pki/client-key.pem"
# docker exec etcd /bin/sh -c "export ETCDCTL_API=3 && etcdctl --endpoints=$ENDPOINTS $TLS_CERT get /registry/minions/host189"
/registry/minions/host189
Result:
...the following is flannel's subnet lease and the matching public-ip (the host's address)...
"flannel.alpha.coreos.com/public-ip 192.168.86.189bB 10.244.2.0/24 ¿!
....
【Analysis】: Checking every node's entry shows that the subnets and their public-ips are all present and correct, so the problem is not in etcd. (wxy: etcd had been restarted before, so I had been worrying all along that it was the culprit; this was a small relief.)
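For reference, the node keys can be enumerated first, so the check above can be repeated for every node (same endpoints and TLS variables as before):
# docker exec etcd /bin/sh -c "export ETCDCTL_API=3 && etcdctl --endpoints=$ENDPOINTS $TLS_CERT get --prefix --keys-only /registry/minions/"
/registry/minions/host188
/registry/minions/host189
/registry/minions/host190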
4. The flannel side of the problem
1) First, check the flannel-related entries that go with the routing table. Below are the neighbors of the flannel.1 interface; with the vxlan backend these are PERMANENT entries programmed by flanneld rather than learned via ARP:
[root@host188 ~]# ip neigh show dev flannel.1    ###problem node: only one neighbor found, host190
10.244.1.0 lladdr 76:50:75:23:dd:f8 PERMANENT

[root@host189 ~]# ip neigh show dev flannel.1    ###OK node 1: two neighbors found
10.244.1.0 lladdr 76:50:75:23:dd:f8 PERMANENT
10.244.0.0 lladdr 5a:d6:46:ab:ea:17 PERMANENT

[root@host188 net.d]# bridge fdb show            ###problem node: the forwarding (FDB) table
01:00:5e:00:00:01 dev ens160 self permanent
01:00:5e:00:00:01 dev docker0 self permanent
b6:88:ae:4a:ac:42 dev docker0 vlan 1 master docker0 permanent
76:50:75:23:dd:f8 dev flannel.1 dst 192.168.86.190 self permanent
------------
[root@host190 ~]# ip neigh show dev flannel.1    ###OK node 2: also two neighbors
10.244.2.0 lladdr 92:e7:b6:4e:e5:73 PERMANENT
10.244.0.0 lladdr 5a:d6:46:ab:ea:17 PERMANENT
【Analysis】: This corroborates the missing route once more: flanneld installs these neighbor entries together with the route, so on 188 the missing route comes with a missing neighbor for the same subnet. Either way, it is time to look at flannel itself.
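For context: for every remote node, flanneld's vxlan backend programs three kernel entries together -- a route, a PERMANENT neighbor entry, and a VXLAN FDB entry. The manual equivalent on host188 for host189's subnet would look roughly like this (a sketch only, with addresses and MACs taken from the output above; flanneld owns these entries and you should normally not add them by hand):
# ip route add 10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink                       ###the route flannel failed to add
# ip neigh replace 10.244.2.0 lladdr 92:e7:b6:4e:e5:73 dev flannel.1 nud permanent     ###static ARP entry for the peer VTEP
# bridge fdb append 92:e7:b6:4e:e5:73 dev flannel.1 dst 192.168.86.189                 ###FDB entry: which host to tunnel to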
2) Next, check the flannel plugin itself: its log reports a failure to add the route. The direct cause is found.
# kubectl logs -f -nkube-system kube-flannel-ds-fvj6h
I1028 01:59:01.891322       1 main.go:475] Determining IP address of default interface    ----will there be a matching delete on shutdown? (see step 3)
I1028 01:59:01.891599       1 main.go:488] Using interface with name ens160 and address 192.168.86.188
I1028 01:59:01.891622       1 main.go:505] Defaulting external address to interface address (192.168.86.188)
I1028 01:59:01.996901       1 kube.go:131] Waiting 10m0s for node controller to sync
I1028 01:59:01.996901       1 kube.go:294] Starting kube subnet manager
I1028 01:59:02.997083       1 kube.go:138] Node controller sync successful
I1028 01:59:02.997115       1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - host188
I1028 01:59:02.997131       1 main.go:238] Installing signal handlers
I1028 01:59:02.997229       1 main.go:353] Found network config - Backend type: vxlan
I1028 01:59:02.997297       1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I1028 01:59:02.997619       1 main.go:300] Wrote subnet file to /run/flannel/subnet.env
I1028 01:59:02.997630       1 main.go:304] Running backend.
I1028 01:59:02.997636       1 main.go:322] Waiting for all goroutines to exit
I1028 01:59:02.997650       1 vxlan_network.go:60] watching for new subnet leases
E1028 01:59:02.997781       1 vxlan_network.go:158] failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument    ###the key line: flannel reports that adding the route failed
3) Going further, to verify flannel's effect on the routes, I modified the DaemonSet and added a node affinity rule so that flannel can no longer be scheduled onto the problem node 188, as follows (a patch-based shortcut is sketched after the log):
Change the DaemonSet's affinity so that this node no longer runs a flannel pod, then watch whether the route entries get deleted:
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: flannel
      tier: node
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: flannel
        tier: node
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: NotIn
                values:
                - host188

# kubectl logs -f -nkube-system kube-flannel-ds-fvj6h
...
I1028 01:59:02.997650       1 vxlan_network.go:60] watching for new subnet leases
E1028 01:59:02.997781       1 vxlan_network.go:158] failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument
I1028 05:31:48.554667       1 main.go:337] shutdownHandler sent cancel signal...
E1028 05:31:48.554782       1 vxlan_network.go:183] DelARP failed: no such file or directory
E1028 05:31:48.554812       1 vxlan_network.go:187] DelFDB failed: no such file or directory
E1028 05:31:48.554835       1 vxlan_network.go:191] failed to delete vxlanRoute (10.244.2.0/24 -> 10.244.2.0): no such process
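Incidentally, the same affinity change can be applied without editing the full manifest, e.g. with a strategic-merge patch (a sketch; the DaemonSet name is inferred from the pod name above, adjust for your cluster):
kubectl -n kube-system patch daemonset kube-flannel-ds --patch '
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: NotIn
                values:
                - host188'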
【Analysis】: So on shutdown flannel does try to tear down the related route entries -- but why does the delete fail, and with "no such process" of all things? That is clearly an error bubbling up from the kernel (ESRCH, which is what route deletion returns when no matching route exists).
The flannel binary itself can hardly be the problem; the kernel's current state is simply refusing the operation. Looking at the routing table again, more carefully this time: the netmask on that 10.244.2.0 entry is not the expected /24.
So which process left this route entry behind? At this point there was a real trail: some residue had been left on this node...
Combine that with blog posts mentioning that flannel's "invalid argument" can be caused by a conflict with an interface address, plus this leftover entry, and it suddenly felt like something had been overlooked.
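The "no such process" part is easy to reproduce in isolation, under the assumption that it is purely a prefix-length mismatch (hypothetical, unused subnet below):
# ip route add 10.244.99.0/26 dev lo       ###plant a /26 route
# ip route del 10.244.99.0/24              ###RTNETLINK answers: No such process  (ESRCH: no such /24 route exists)
# ip route del 10.244.99.0/26              ###exact prefix length -> succeeds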
Note: deleting the route
# route -n
Kernel IP routing table
Destination     Gateway         Genmask           Flags Metric Ref    Use Iface
...
10.244.2.0      0.0.0.0         255.255.255.192   U     0      0        0 *     ----there you are
Manual deletion:
# ip route del 10.244.2.0/24                  --not ok
# ip route del 10.244.2.0/255.255.255.192     --ok, deleted successfully
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.86.1    0.0.0.0         UG    0      0        0 ens160
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens160
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.86.0    0.0.0.0         255.255.255.0   U     0      0        0 ens160
4) A full check of the interfaces
# ifconfig -a    ###note the -a flag -- only with it does this interface show up. The other nodes also have the interface, but their addresses do not collide, as shown below
[root@host188 etc]# ifconfig -a
....
tunl0: flags=128<NOARP>  mtu 1440
        inet 10.244.2.0  netmask 255.255.255.255    ###this address is exactly a subnet in use by the current cluster network -- hence the conflict
        tunnel   txqueuelen 1000  (IPIP Tunnel)
        RX packets 9552860  bytes 9661438904 (8.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10309373  bytes 2978060504 (2.7 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@host189 ~]# ifconfig -a
...
tunl0: flags=128<NOARP>  mtu 1440
        inet 10.244.188.0  netmask 255.255.255.255
        tunnel   txqueuelen 1000  (IPIP Tunnel)
        RX packets 11031772  bytes 8680651091 (8.0 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10557451  bytes 9723264443 (9.0 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@host190 ~]# ifconfig -a
...
tunl0: flags=128<NOARP>  mtu 1440
        inet 10.244.57.0  netmask 255.255.255.255
        tunnel   txqueuelen 1000  (IPIP Tunnel)
        RX packets 720412  bytes 48975635 (46.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 701770  bytes 5701518725 (5.3 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
【Analysis】: After confirming with ll, this IPIP Tunnel is what calico-type networking uses under k8s 1.18, not anything in the current setup; this test environment is frequently switched back and forth with k8s 1.18 installs, hence the residue.
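A quick way to spot this kind of residue in other environments (a sketch; tunl0 is the default device the ipip kernel module creates):
# ip -d link show tunl0       ###-d prints tunnel details (ipip, ttl, ...)
# ip addr show tunl0          ###does its address fall inside the cluster's pod CIDR?
# lsmod | grep '^ipip'        ###is the ipip module still loaded?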
Part Three: Fixing the Problem
[root@host188 etc]# ifconfig tunl0 down
[root@host188 etc]# ip tunnel del tunl0
delete tunnel "tunl0" failed: Operation not permitted
[root@host188 etc]# ip tunnel show
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
[root@host188 etc]# lsmod
Module                  Size  Used by
...
ipip                   13465  0        ---the loaded ipip module is what keeps tunl0 from being deleted
[root@host188 etc]# rmmod ipip         ----unload the kernel module
[root@host188 etc]# ip tunnel show     ---now the deletion has gone through
[root@host188 etc]# route -n    ---with flannel running on 188 again, the route is now correctly installed
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.86.1    0.0.0.0         UG    0      0        0 ens160
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      10.244.1.0      255.255.255.0   UG    0      0        0 flannel.1
10.244.2.0      10.244.2.0      255.255.255.0   UG    0      0        0 flannel.1    ----there it is
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens160
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.86.0    0.0.0.0         255.255.255.0   U     0      0        0 ens160
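At this point the original symptom should be gone as well; worth confirming with the very first check:
# curl http://192.168.86.188:30038/targets    ---expected to answer from the problem node's IP now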
And last of all, reload the ipip module:
# modprobe ipip
# lsmod
Module                  Size  Used by
ipip                   13465  0
#tunl0 is also auto-created at this point, but it has not claimed an IP
tunl0: flags=128<NOARP>  mtu 1480
        tunnel   txqueuelen 1000  (IPIP Tunnel)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
============================================END=========================================================================