open /run/flannel/subnet.env: no such file or directory
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9a5eade3c13f1eeeb000df80e942ed22e59d2c532def6f1f281fd2ebefdcfa2c" network for pod "mcw01dep-nginx1-69bc6f5957-lzpdv": networkPlugin cni failed to set up pod "mcw01dep-nginx1-69bc6f5957-lzpdv_default" network: open /run/flannel/subnet.env: no such file or directory
Fix: copy /run/flannel/subnet.env from a working node to the affected node: scp /run/flannel/subnet.env 10.0.0.6:/run/flannel/
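For reference, the subnet.env that flannel writes normally looks roughly like the following (values are illustrative only; they must match the cluster's pod CIDR, and flannel regenerates the file itself once the kube-flannel pod is healthy on the node):

# illustrative example of /run/flannel/subnet.env; real values come from your flannel config
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Copying the file from another node, as above, is a stopgap; the underlying cause is usually that the flannel pod is not running (or keeps crashing) on that node.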
Issue 1: checking the network status reports "RTNETLINK answers: File exists" — how to fix it
CentOS7 Failed to start LSB: Bring up/down networking
How to fix the "RTNETLINK answers: File exists" error
https://blog.csdn.net/u010719917/article/details/79423180
chkconfig --level 35 network on
chkconfig --level 0123456 NetworkManager off
service NetworkManager stop
service network stop
service network start
If that still does not work, try rebooting the system.
service network start fails with "RTNETLINK answers: File exists", or
/etc/init.d/network start fails with "RTNETLINK answers: File exists"
(The two are equivalent; the former simply runs the latter.)
On CentOS this failure comes from a conflict between the two services that bring up the network:
/etc/init.d/network and
/etc/init.d/NetworkManager.
Fundamentally the conflict is caused by NetworkManager (NM); disabling NetworkManager and rebooting resolves it.
1. Switch to the root account and use chkconfig to check whether the network and NetworkManager services are configured to start at boot;
=====
In the end, just running the following three commands fixed it:
service NetworkManager stop
service network stop
service network start
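On CentOS 7 the same steps can be expressed with systemd directly; a minimal sketch, assuming you really do want to hand the interfaces back to the legacy network service:

systemctl stop NetworkManager       # stop NetworkManager for this boot
systemctl disable NetworkManager    # keep it from starting at boot
systemctl restart network           # bring the interfaces back up via the legacy network service
systemctl enable network            # have the legacy network service start at boot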
Issue 2: pinging an external address returns "Destination Host Unreachable" from the internal IP
Troubleshooting process
[root@mcw7 ~]$ ping www.baidu.com PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data. From bogon (172.16.1.137) icmp_seq=1 Destination Host Unreachable 查看能通外網的路由表 [root@mcw8 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 10.0.0.2 0.0.0.0 UG 100 0 0 ens33 0.0.0.0 172.16.1.2 0.0.0.0 UG 101 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33 10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0 172.16.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens37 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 查看不通外網的路由表,發現缺少了一條關於10.0.0.2的路由, 應該加一條如上的路由試試0.0.0.0 10.0.0.2 0.0.0.0 UG 100 0 0 ens33 [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 加錯路由了,刪除 route add -host 10.0.0.137 gw 10.0.0.2 [root@mcw7 ~]$ route add -host 10.0.0.137 gw 10.0.0.2 [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 10.0.0.137 10.0.0.2 255.255.255.255 UGH 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 刪除路由 -host后面的ip,在路由的第一列,目標地址。我這里應該填0.0.0.0。目標地址是任意的,指定gw是10.0.0.2 [root@mcw7 ~]$ route del -host 10.0.0.137 dev ens33 [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 [root@mcw7 ~]$ -host是指去往的目的主機,這里子網掩碼應該設置為0.0.0.0,需要手動刪除重建。旗幟貌似多了H,不知道干嘛的 [root@mcw7 ~]$ route add -host 0.0.0.0 gw 10.0.0.2 [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 10.0.0.2 255.255.255.255 UGH 0 0 0 ens33 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 刪除指定目的主機,指定網卡接口 [root@mcw7 ~]$ route del -host 0.0.0.0 dev ens33 [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 看提示信息,指定掩碼用netmask,genmask和mask和是255的那種,取補集 [root@mcw7 ~]$ route add -host 0.0.0.0 MASK 0.0.0.0 gw 10.0.0.2 Usage: inet_route [-vF] del {-host|-net} Target[/prefix] [gw Gw] [metric M] [[dev] If] inet_route [-vF] add {-host|-net} Target[/prefix] [gw Gw] [metric M] [netmask N] [mss Mss] [window W] [irtt I] [mod] [dyn] [reinstate] [[dev] If] inet_route [-vF] add {-host|-net} Target[/prefix] [metric M] reject inet_route [-FC] flush NOT supported [root@mcw7 ~]$ [root@mcw7 ~]$ [root@mcw7 ~]$ route add -host 0.0.0.0 netmask 0.0.0.0 gw 10.0.0.2 [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 10.0.0.2 255.255.255.255 UGH 0 0 0 ens33 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 
ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 [root@mcw7 ~]$ 再次刪除 route del 指定目的主機,指定接口 [root@mcw7 ~]$ route del -host 0.0.0.0 dev ens33 [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37
What I tried before the real fix
刪除默認網關 [root@mcw7 ~]$ route del -host 0.0.0.0 dev ens33 SIOCDELRT: No such process [root@mcw7 ~]$ [root@mcw7 ~]$ route del -host 0.0.0.0 dev ens37 SIOCDELRT: No such process [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 10.0.0.2 0.0.0.0 UG 100 0 0 ens33 0.0.0.0 172.16.1.2 0.0.0.0 UG 101 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33 10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0 172.16.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens37 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 [root@mcw7 ~]$ route del -host 0.0.0.0 SIOCDELRT: No such process [root@mcw7 ~]$ route del default [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.16.1.2 0.0.0.0 UG 100 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33 10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0 172.16.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens37 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 [root@mcw7 ~]$ route del default [root@mcw7 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33 10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0 172.16.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens37 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 [root@mcw7 ~]$
The real fix for this issue
Reference: https://www.cnblogs.com/skgoo/p/13559964.html
mcw8上ping不能到外網,顯示包來自服務器內網ip。 [root@mcw8 ~]$ ping www.baidu.com PING www.a.shifen.com (39.156.66.14) 56(84) bytes of data. From mcw8 (172.16.1.138) icmp_seq=1 Destination Host Unreachable mcw9上能ping通外網,顯示包來着外網百度ip [root@mcw9 ~]$ ping www.baidu.com PING www.a.shifen.com (39.156.66.18) 56(84) bytes of data. 64 bytes from 39.156.66.18 (39.156.66.18): icmp_seq=1 ttl=128 time=43.2 ms 查看mcw9正常網關,是有10.0.0.2的網關ip [root@mcw9 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 10.0.0.2 0.0.0.0 UG 100 0 0 ens33 0.0.0.0 172.16.1.2 0.0.0.0 UG 101 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33 172.16.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens37 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 查看mcw8異常網絡的路由,沒有外網的網關10.0.0.2。 [root@mcw8 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 給mcw8添加默認網關,上面之前添加各種路由,結果genmask都不對,不能變成0.0.0.0。而使用如下命令,才實現了 Destination是0.0.0.0,Gateway是10.0.0.2,Genmask是0.0.0.0 ,Flags是UG,Iface是ens33。然后才成功訪問外網 [root@mcw8 ~]$ route add default gw 10.0.0.2 [root@mcw8 ~]$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 10.0.0.2 0.0.0.0 UG 0 0 0 ens33 0.0.0.0 172.16.1.2 0.0.0.0 UG 0 0 0 ens37 10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 ens33 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 ens37 172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens37 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 在mcw8上可以正常訪問外網了 [root@mcw8 ~]$ ping www.baidu.com PING www.a.shifen.com (39.156.66.14) 56(84) bytes of data. 64 bytes from 39.156.66.14 (39.156.66.14): icmp_seq=1 ttl=128 time=23.5 ms 64 bytes from 39.156.66.14 (39.156.66.14): icmp_seq=2 ttl=128 time=36.7 ms
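One caveat about the fix above: a default route added with route add default gw 10.0.0.2 does not survive a reboot. A sketch of the equivalent iproute2 command plus one way to persist it on CentOS 7, assuming (as in the transcripts) the interface is ens33 and the gateway is 10.0.0.2:

# one-off, equivalent to "route add default gw 10.0.0.2"
ip route add default via 10.0.0.2 dev ens33
# persist it by declaring the gateway in the interface config, then restart networking
echo "GATEWAY=10.0.0.2" >> /etc/sysconfig/network-scripts/ifcfg-ens33
systemctl restart network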
Because the network went down after the second flannel deployment, the manifest URL could not be reached (the DNS lookup reports the domain as blocked), but I had saved the file's contents earlier. So I simply pasted the saved contents into a local file and applied it directly, as follows.
https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
[machangwei@mcw7 ~]$ ls
mcw.txt  mm.yml  scripts  tools
[machangwei@mcw7 ~]$ kubectl apply -f mm.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
Because I had forgotten the join command that kubeadm init printed, I ran kubeadm reset and then kubeadm init again, and the containers that used to exist were all gone.
Troubleshooting, plus exporting and importing iptables rules
[root@mcw7 ~]$ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES [root@mcw7 ~]$ docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES [root@mcw7 ~]$ 重設然后重初始化后,網絡也沒有的 [root@mcw7 ~]$ docker ps|grep kube-flannel [root@mcw7 ~]$ 進入普通用戶重新部署網絡報錯 [machangwei@mcw7 ~]$ ls mcw.txt mm.yml scripts tools [machangwei@mcw7 ~]$ kubectl apply -f mm.yml Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes") [machangwei@mcw7 ~]$ 查詢之前重設的信息。發現說不能清除CNI的信息 [root@mcw7 ~]$ echo y|kubeadm reset [reset] Reading configuration from the cluster... [reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml' [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted. [reset] Are you sure you want to proceed? [y/N]: [preflight] Running pre-flight checks [reset] Stopping the kubelet service [reset] Unmounting mounted directories in "/var/lib/kubelet" [reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki] [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf] [reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni] The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually by using the "iptables" command. If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables. The reset process does not clean your kubeconfig files and you must remove them manually. Please, check the contents of the $HOME/.kube/config file. 移除文件不管用 [root@mcw7 ~]$ mv /etc/cni/net.d /etc/cni/net.dbak [root@mcw7 ~]$ ipvsadm --clear -bash: ipvsadm: command not found 查看了一大堆,不知道咋弄 [root@mcw7 ~]$ iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes health check service ports */ KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */ KUBE-FIREWALL all -- anywhere anywhere ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED ACCEPT all -- anywhere anywhere INPUT_direct all -- anywhere anywhere INPUT_ZONES_SOURCE all -- anywhere anywhere 既然無法清除,那么直接從其它機子導出導入一份規則 導出: [root@mcw9 ~]$ iptables-save > /root/iptables_beifen.txt [root@mcw9 ~]$ cat iptables_beifen.txt # Generated by iptables-save v1.4.21 on Fri Jan 7 23:05:39 2022 *filter :INPUT ACCEPT [1676:135745] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [896:67997] :DOCKER - [0:0] :DOCKER-ISOLATION-STAGE-1 - [0:0] :DOCKER-ISOLATION-STAGE-2 - [0:0] :DOCKER-USER - [0:0] -A FORWARD -j DOCKER-USER -A FORWARD -j DOCKER-ISOLATION-STAGE-1 -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A FORWARD -o docker0 -j DOCKER -A FORWARD -i docker0 ! -o docker0 -j ACCEPT -A FORWARD -i docker0 -o docker0 -j ACCEPT -A DOCKER-ISOLATION-STAGE-1 -i docker0 ! 
-o docker0 -j DOCKER-ISOLATION-STAGE-2 -A DOCKER-ISOLATION-STAGE-1 -j RETURN -A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP -A DOCKER-ISOLATION-STAGE-2 -j RETURN -A DOCKER-USER -j RETURN COMMIT # Completed on Fri Jan 7 23:05:39 2022 # Generated by iptables-save v1.4.21 on Fri Jan 7 23:05:39 2022 *nat :PREROUTING ACCEPT [32:2470] :INPUT ACCEPT [32:2470] :OUTPUT ACCEPT [8:528] :POSTROUTING ACCEPT [8:528] :DOCKER - [0:0] -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE -A DOCKER -i docker0 -j RETURN COMMIT # Completed on Fri Jan 7 23:05:39 2022 [root@mcw9 ~]$ cat mcw7上導入規則。出錯,文件有問題。第一行注釋一下吧 [root@mcw7 ~]$ iptables-restore</root/daoru.txt iptables-restore: line 1 failed [root@mcw7 ~]$ cat daoru.txt #命令 ptables-save v1.4.21 on Fri Jan 7 23:05:39 2022 *filter :INPUT ACCEPT [1676:135745] 導入,防火牆規則一致了 https://blog.csdn.net/jake_tian/article/details/102548306 [root@mcw7 ~]$ iptables-restore</root/daoru.txt [root@mcw7 ~]$ iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination DOCKER-USER all -- anywhere anywhere DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED DOCKER all -- anywhere anywhere ACCEPT all -- anywhere anywhere ACCEPT all -- anywhere anywhere Chain OUTPUT (policy ACCEPT) target prot opt source destination Chain DOCKER (1 references) target prot opt source destination Chain DOCKER-ISOLATION-STAGE-1 (1 references) target prot opt source destination DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-ISOLATION-STAGE-2 (1 references) target prot opt source destination DROP all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-USER (1 references) target prot opt source destination RETURN all -- anywhere anywhere ============ 再次執行,試一試 重試 [root@mcw7 ~]$ echo y|kubeadm reset 再看防火牆,貌似是沒有變化 [root@mcw7 ~]$ iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination DOCKER-USER all -- anywhere anywhere DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED DOCKER all -- anywhere anywhere ACCEPT all -- anywhere anywhere ACCEPT all -- anywhere anywhere Chain OUTPUT (policy ACCEPT) target prot opt source destination Chain DOCKER (1 references) target prot opt source destination Chain DOCKER-ISOLATION-STAGE-1 (1 references) target prot opt source destination DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-ISOLATION-STAGE-2 (1 references) target prot opt source destination DROP all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-USER (1 references) target prot opt source destination RETURN all -- anywhere anywhere [root@mcw7 ~]$ 重新初始化后 [root@mcw7 ~]$ kubeadm init --apiserver-advertise-address 10.0.0.137 --pod-network-cidr=10.244.0.0/24 --image-repository=registry.aliyuncs.com/google_containers kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u \ --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462 [root@mcw7 ~]$ iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes health check service ports */ KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate 
NEW /* kubernetes externally-visible service portals */ KUBE-FIREWALL all -- anywhere anywhere Chain FORWARD (policy ACCEPT) target prot opt source destination KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */ KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */ KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */ DOCKER-USER all -- anywhere anywhere DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED DOCKER all -- anywhere anywhere ACCEPT all -- anywhere anywhere ACCEPT all -- anywhere anywhere Chain OUTPUT (policy ACCEPT) target prot opt source destination KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */ KUBE-FIREWALL all -- anywhere anywhere Chain DOCKER (1 references) target prot opt source destination Chain DOCKER-ISOLATION-STAGE-1 (1 references) target prot opt source destination DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-ISOLATION-STAGE-2 (1 references) target prot opt source destination DROP all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-USER (1 references) target prot opt source destination RETURN all -- anywhere anywhere Chain KUBE-EXTERNAL-SERVICES (2 references) target prot opt source destination Chain KUBE-FIREWALL (2 references) target prot opt source destination DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000 你可能無法記住做題的步驟,但是你能根據筆記把題很快做出來,還有把握保證是對的 你可能無法記住部署的步驟,執行的每一個命令,但是你能根據自己以前的筆記很快做出來 原來這個問題跟防火牆沒有關系。 [machangwei@mcw7 ~]$ kubectl apply -f mm.yml Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes") [machangwei@mcw7 ~]$ [machangwei@mcw7 ~]$ kubectl get nodes Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
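Even though the x509 error above turned out to have nothing to do with the firewall, for the iptables cleanup itself there is a simpler route than importing another machine's rules: follow the hint printed by kubeadm reset and flush the tables on the spot. A rough sketch; note this wipes every iptables rule on the host, so only use it where that is acceptable, and restart docker afterwards so it can recreate its own chains:

iptables -F              # flush all rules in the filter table
iptables -X              # delete user-defined chains (KUBE-*, DOCKER-* and so on)
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
systemctl restart docker # docker rebuilds the DOCKER chains it needs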
The real solution:
Reconfigure kubectl for the regular user as follows, because the old configuration was invalidated by the reset and re-init.
[machangwei@mcw7 ~]$ ls -a
.  ..  .bash_history  .bash_logout  .bash_profile  .bashrc  .kube  mcw.txt  mm.yml  scripts  tools  .viminfo
[machangwei@mcw7 ~]$ mv .kube kubebak
[machangwei@mcw7 ~]$ mkdir -p $HOME/.kube
[machangwei@mcw7 ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[machangwei@mcw7 ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
[machangwei@mcw7 ~]$ kubectl get node
NAME   STATUS     ROLES                  AGE   VERSION
mcw7   NotReady   control-plane,master   10m   v1.23.1
Redeploy the network
[machangwei@mcw7 ~]$ ls
kubebak  mcw.txt  mm.yml  scripts  tools
[machangwei@mcw7 ~]$ kubectl apply -f mm.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
At this point, check the firewall again with iptables -L
[root@mcw7 ~]$ iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes health check service ports */ KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */ KUBE-FIREWALL all -- anywhere anywhere Chain FORWARD (policy ACCEPT) target prot opt source destination KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */ KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */ KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */ DOCKER-USER all -- anywhere anywhere DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED DOCKER all -- anywhere anywhere ACCEPT all -- anywhere anywhere ACCEPT all -- anywhere anywhere ACCEPT all -- mcw7/16 anywhere ACCEPT all -- anywhere mcw7/16 Chain OUTPUT (policy ACCEPT) target prot opt source destination KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */ KUBE-FIREWALL all -- anywhere anywhere Chain DOCKER (1 references) target prot opt source destination Chain DOCKER-ISOLATION-STAGE-1 (1 references) target prot opt source destination DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-ISOLATION-STAGE-2 (1 references) target prot opt source destination DROP all -- anywhere anywhere RETURN all -- anywhere anywhere Chain DOCKER-USER (1 references) target prot opt source destination RETURN all -- anywhere anywhere Chain KUBE-EXTERNAL-SERVICES (2 references) target prot opt source destination Chain KUBE-FIREWALL (2 references) target prot opt source destination DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000 ^C 看規則,應該用下面的才是合適的 [root@mcw7 ~]$ iptables-save # Generated by iptables-save v1.4.21 on Sat Jan 8 07:35:11 2022 *nat :PREROUTING ACCEPT [372:18270] :INPUT ACCEPT [0:0] :OUTPUT ACCEPT [239:14302] :POSTROUTING ACCEPT [239:14302] :DOCKER - [0:0] :KUBE-KUBELET-CANARY - [0:0] :KUBE-MARK-DROP - [0:0] :KUBE-MARK-MASQ - [0:0] :KUBE-NODEPORTS - [0:0] :KUBE-POSTROUTING - [0:0] :KUBE-PROXY-CANARY - [0:0] :KUBE-SEP-6E7XQMQ4RAYOWTTM - [0:0] :KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0] :KUBE-SEP-N4G2XR5TDX7PQE7P - [0:0] :KUBE-SEP-XOVE7RWZIDAMLO2S - [0:0] :KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0] :KUBE-SEP-ZP3FB6NMPNCO4VBJ - [0:0] :KUBE-SEP-ZXMNUKOKXUTL2MK2 - [0:0] :KUBE-SERVICES - [0:0] :KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0] :KUBE-SVC-JD5MR3NA4I4DYORP - [0:0] :KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0] :KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0] -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER -A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE -A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN -A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/24 -j RETURN -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE -A DOCKER -i docker0 -j RETURN -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000 -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000 -A KUBE-POSTROUTING -m mark ! 
--mark 0x4000/0x4000 -j RETURN -A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0 -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE -A KUBE-SEP-6E7XQMQ4RAYOWTTM -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ -A KUBE-SEP-6E7XQMQ4RAYOWTTM -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.3:53 -A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ -A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.2:53 -A KUBE-SEP-N4G2XR5TDX7PQE7P -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ -A KUBE-SEP-N4G2XR5TDX7PQE7P -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.2:9153 -A KUBE-SEP-XOVE7RWZIDAMLO2S -s 10.0.0.137/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ -A KUBE-SEP-XOVE7RWZIDAMLO2S -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 10.0.0.137:6443 -A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ -A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.2:53 -A KUBE-SEP-ZP3FB6NMPNCO4VBJ -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ -A KUBE-SEP-ZP3FB6NMPNCO4VBJ -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.3:9153 -A KUBE-SEP-ZXMNUKOKXUTL2MK2 -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ -A KUBE-SEP-ZXMNUKOKXUTL2MK2 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.3:53 -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU -A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4 -A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP -A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS -A KUBE-SVC-ERIFXISQEP7F7OF4 ! -s 10.244.0.0/24 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ -A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-IT2ZTR26TO4XFPTO -A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-ZXMNUKOKXUTL2MK2 -A KUBE-SVC-JD5MR3NA4I4DYORP ! 
-s 10.244.0.0/24 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ -A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-N4G2XR5TDX7PQE7P -A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-SEP-ZP3FB6NMPNCO4VBJ -A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.244.0.0/24 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ -A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-XOVE7RWZIDAMLO2S -A KUBE-SVC-TCOU7JCQXEZGVUNU ! -s 10.244.0.0/24 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ -A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YIL6JZP7A3QYXJU2 -A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-6E7XQMQ4RAYOWTTM COMMIT # Completed on Sat Jan 8 07:35:11 2022 # Generated by iptables-save v1.4.21 on Sat Jan 8 07:35:11 2022 *mangle :PREROUTING ACCEPT [376111:67516258] :INPUT ACCEPT [369347:67204288] :FORWARD ACCEPT [6764:311970] :OUTPUT ACCEPT [369958:67425919] :POSTROUTING ACCEPT [371215:67488646] :FORWARD_direct - [0:0] :INPUT_direct - [0:0] :KUBE-KUBELET-CANARY - [0:0] :KUBE-PROXY-CANARY - [0:0] :OUTPUT_direct - [0:0] :POSTROUTING_direct - [0:0] :PREROUTING_ZONES - [0:0] :PREROUTING_ZONES_SOURCE - [0:0] :PREROUTING_direct - [0:0] :PRE_docker - [0:0] :PRE_docker_allow - [0:0] :PRE_docker_deny - [0:0] :PRE_docker_log - [0:0] :PRE_public - [0:0] :PRE_public_allow - [0:0] :PRE_public_deny - [0:0] :PRE_public_log - [0:0] -A PREROUTING -j PREROUTING_direct -A PREROUTING -j PREROUTING_ZONES_SOURCE -A PREROUTING -j PREROUTING_ZONES -A INPUT -j INPUT_direct -A FORWARD -j FORWARD_direct -A OUTPUT -j OUTPUT_direct -A POSTROUTING -j POSTROUTING_direct -A PREROUTING_ZONES -i ens33 -g PRE_public -A PREROUTING_ZONES -i docker0 -j PRE_docker -A PREROUTING_ZONES -i ens37 -g PRE_public -A PREROUTING_ZONES -g PRE_public -A PRE_docker -j PRE_docker_log -A PRE_docker -j PRE_docker_deny -A PRE_docker -j PRE_docker_allow -A PRE_public -j PRE_public_log -A PRE_public -j PRE_public_deny -A PRE_public -j PRE_public_allow COMMIT # Completed on Sat Jan 8 07:35:11 2022 # Generated by iptables-save v1.4.21 on Sat Jan 8 07:35:11 2022 *security :INPUT ACCEPT [591940:133664590] :FORWARD ACCEPT [1257:62727] :OUTPUT ACCEPT [596315:107591486] :FORWARD_direct - [0:0] :INPUT_direct - [0:0] :OUTPUT_direct - [0:0] -A INPUT -j INPUT_direct -A FORWARD -j FORWARD_direct -A OUTPUT -j OUTPUT_direct COMMIT # Completed on Sat Jan 8 07:35:11 2022 # Generated by iptables-save v1.4.21 on Sat Jan 8 07:35:11 2022 *raw :PREROUTING ACCEPT [376111:67516258] :OUTPUT ACCEPT [369958:67425919] :OUTPUT_direct - [0:0] :PREROUTING_ZONES - [0:0] :PREROUTING_ZONES_SOURCE - [0:0] :PREROUTING_direct - [0:0] :PRE_docker - [0:0] :PRE_docker_allow - [0:0] :PRE_docker_deny - [0:0] :PRE_docker_log - [0:0] :PRE_public - [0:0] :PRE_public_allow - [0:0] :PRE_public_deny - [0:0] :PRE_public_log - [0:0] -A PREROUTING -j PREROUTING_direct -A PREROUTING -j PREROUTING_ZONES_SOURCE -A PREROUTING -j PREROUTING_ZONES -A OUTPUT -j OUTPUT_direct -A PREROUTING_ZONES -i ens33 -g PRE_public -A PREROUTING_ZONES -i docker0 -j PRE_docker -A PREROUTING_ZONES -i ens37 -g PRE_public -A 
PREROUTING_ZONES -g PRE_public -A PRE_docker -j PRE_docker_log -A PRE_docker -j PRE_docker_deny -A PRE_docker -j PRE_docker_allow -A PRE_public -j PRE_public_log -A PRE_public -j PRE_public_deny -A PRE_public -j PRE_public_allow COMMIT # Completed on Sat Jan 8 07:35:11 2022 # Generated by iptables-save v1.4.21 on Sat Jan 8 07:35:11 2022 *filter :INPUT ACCEPT [14882:2406600] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [15254:2447569] :DOCKER - [0:0] :DOCKER-ISOLATION-STAGE-1 - [0:0] :DOCKER-ISOLATION-STAGE-2 - [0:0] :DOCKER-USER - [0:0] :KUBE-EXTERNAL-SERVICES - [0:0] :KUBE-FIREWALL - [0:0] :KUBE-FORWARD - [0:0] :KUBE-KUBELET-CANARY - [0:0] :KUBE-NODEPORTS - [0:0] :KUBE-PROXY-CANARY - [0:0] :KUBE-SERVICES - [0:0] -A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS -A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES -A INPUT -j KUBE-FIREWALL -A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD -A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES -A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES -A FORWARD -j DOCKER-USER -A FORWARD -j DOCKER-ISOLATION-STAGE-1 -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A FORWARD -o docker0 -j DOCKER -A FORWARD -i docker0 ! -o docker0 -j ACCEPT -A FORWARD -i docker0 -o docker0 -j ACCEPT -A FORWARD -s 10.244.0.0/16 -j ACCEPT -A FORWARD -d 10.244.0.0/16 -j ACCEPT -A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES -A OUTPUT -j KUBE-FIREWALL -A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2 -A DOCKER-ISOLATION-STAGE-1 -j RETURN -A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP -A DOCKER-ISOLATION-STAGE-2 -j RETURN -A DOCKER-USER -j RETURN -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP -A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT COMMIT # Completed on Sat Jan 8 07:35:11 2022 [root@mcw7 ~]$ 回頭研究忘記加入集群的命令,如何重新生成,以及是否對已加入集群的節點是否產生影響
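On the question noted at the end of the transcript (how to regenerate a forgotten join command, and whether doing so affects nodes that already joined): kubeadm can print a fresh join command at any time, and creating a new token should not disturb nodes that have already joined. A sketch, run on the master:

# prints a complete "kubeadm join <apiserver> --token ... --discovery-token-ca-cert-hash ..." line
kubeadm token create --print-join-command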
Fixing "/proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1"
Re-joining the node produces warnings, and we should take them seriously. For example, enable docker at boot: if the VM is not configured to start docker on boot and the VM ever reboots, the containers will be down.
[root@mcw8 ~]$ kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u \ > --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462 [preflight] Running pre-flight checks [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service' [WARNING Hostname]: hostname "mcw8" could not be reached [WARNING Hostname]: hostname "mcw8": lookup mcw8 on 10.0.0.2:53: no such host error execution phase preflight: [preflight] Some fatal errors occurred: [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1 [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...` To see the stack trace of this error execute with --v=5 or higher 解決方法 [root@mcw8 ~]$ echo "1" >/proc/sys/net/bridge/bridge-nf-call-iptables [root@mcw8 ~]$ kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462[preflight] Running pre-flight checks [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service' [WARNING Hostname]: hostname "mcw8" could not be reached [WARNING Hostname]: hostname "mcw8": lookup mcw8 on 10.0.0.2:53: no such host ^C [root@mcw8 ~]$ echo y|kubeadm reset 如上之后,還是不行,加不到mcw7 master節點,之前記得mcw8和mcw9兩個node是沒有部署k8s網絡的,現在部署一下再試試。配置普通用戶kubectl,然后 [machangwei@mcw7 ~]$ scp mm.yml 10.0.0.138:/home/machangwei/ machangwei@10.0.0.138's password: mm.yml 100% 5412 8.5MB/s 00:00 [machangwei@mcw7 ~]$ scp mm.yml 10.0.0.139:/home/machangwei/ machangwei@10.0.0.139's password: mm.yml 但是節點是不需要配置普通用戶的kubectl的,因為缺少文件的 [root@mcw8 ~]$ su - machangwei [machangwei@mcw8 ~]$ mkdir -p $HOME/.kube [machangwei@mcw8 ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config cp: cannot stat ‘/etc/kubernetes/admin.conf’: No such file or directory [machangwei@mcw8 ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config chown: cannot access ‘/home/machangwei/.kube/config’: No such file or directory 加入集群一直卡住 ,加一個--V=2的參數,打印詳情 [root@mcw8 ~]$ kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462 --v=2 I0108 00:54:46.002913 32058 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName I0108 00:54:46.068584 32058 initconfiguration.go:117] detected and using CRI socket: /var/run/dockershim.sock [preflight] Running pre-flight checks I0108 00:54:46.068919 32058 preflight.go:92] [preflight] Running general checks 發現報錯信息 I0108 00:54:46.849380 32058 checks.go:620] validating kubelet version I0108 00:54:46.927861 32058 checks.go:133] validating if the "kubelet" service is enabled and active I0108 00:54:46.938910 32058 checks.go:206] validating availability of port 10250 I0108 00:54:46.960668 32058 checks.go:283] validating the existence of file /etc/kubernetes/pki/ca.crt I0108 00:54:46.960707 32058 checks.go:433] validating if the connectivity type is via proxy or direct I0108 00:54:46.960795 32058 join.go:530] [preflight] Discovering cluster-info I0108 00:54:46.960846 32058 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "10.0.0.137:6443" I0108 00:54:46.997909 32058 token.go:118] [discovery] Requesting info from "10.0.0.137:6443" again to validate TLS against the pinned public key I0108 00:54:47.003864 32058 token.go:217] [discovery] Failed 
to request cluster-info, will try again: Get "https://10.0.0.137:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": x509: certificate has expired or is not yet valid: current time 2022-01-08T00:54:47+08:00 is before 2022-01-07T23:18:44Z 時間不一致,將mcw8改到錯誤時間前。mcw7也改了before 2022-01-07T23:18:44Z。然后錯誤已經變成別的了 [root@mcw8 ~]$ date -s "2022-1-7 23:10:00" Fri Jan 7 23:10:00 CST 2022
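A side note on the bridge-nf fix in the transcript above: the echo into /proc only lasts until the next reboot. To persist it (together with the usual ip_forward setting), a sysctl drop-in roughly like the following can be used; the file name is arbitrary:

modprobe br_netfilter                      # the bridge sysctls exist only when this module is loaded
cat <<'EOF' > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system                            # reload every sysctl configuration file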
After winding the clocks back as above, the error changed to the following timeout: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I0108 01:27:42.577217 32662 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.137:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
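Winding the clock back by hand (date -s) is fragile; the x509 "certificate has expired or is not yet valid" error is really clock skew between master and node, and keeping both synced to the same time source avoids it. A sketch using chrony, which ships with CentOS 7 (ntpdate against any reachable NTP server would do as well):

systemctl enable --now chronyd   # start the NTP client and enable it at boot
chronyc makestep                 # step the clock immediately instead of slewing slowly
date                             # confirm master and nodes now agree on the time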
The k8s system containers would not stay up; they kept stopping with the error below. Deleting all the stopped containers (several passes) fixed it. Re-joining the cluster then reported an error: refused.
rpc error: code = Unknown desc = failed to create a sandbox for pod \"coredns-6d8c4cb4d-8l99d\": Error response from daemon: Conflict. The container name \"/k8s_POD_coredns-6d8c4cb4d-8l99d_kube-system_e030f426-3e8e-46fe-9e05-6c42a332f650_2\" is already in use by container \"b2dbcdd338ab4b2c35d5386e50e7e116fd41f26a0053a84ec3f1329e09d454a4\". You have to remove (or rename) that container to be able to reuse that name." pod="kube-system/coredns-6d8c4cb4d-8l99d"
[root@mcw8 ~]$ docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 2edd274fd7b5 e6ea68648f0c "/opt/bin/flanneld -…" 7 seconds ago Exited (1) 5 seconds ago k8s_kube-flannel_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_2 5b1715be012d quay.io/coreos/flannel "cp -f /etc/kube-fla…" 28 seconds ago Exited (0) 27 seconds ago k8s_install-cni_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_0 7beb96ed15be rancher/mirrored-flannelcni-flannel-cni-plugin "cp -f /flannel /opt…" About a minute ago Exited (0) About a minute ago k8s_install-cni-plugin_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_0 4e998fdfce3e registry.aliyuncs.com/google_containers/kube-proxy "/usr/local/bin/kube…" 2 minutes ago Up 2 minutes k8s_kube-proxy_kube-proxy-5p7dn_kube-system_92b1b38a-f6fa-4308-93fb-8045d2bae63f_0 fed18476d9a3 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_0 ebc2403e3052 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-proxy-5p7dn_kube-system_92b1b38a-f6fa-4308-93fb-8045d2bae63f_0 已經好了 [machangwei@mcw7 ~]$ kubectl get nodes NAME STATUS ROLES AGE VERSION mcw7 Ready control-plane,master 7m22s v1.23.1 mcw8 Ready <none> 4m51s v1.23.1 mcw9 Ready <none> 3m45s v1.23.1 [machangwei@mcw7 ~]$ 每個加進集群部署好的節點,都有三個容器。加進集群的命令是訪問主節點apiserver服務。然后就開始拉取鏡像部署節點上的容器了 k8s_kube-proxy_kube- k8s_POD_kube-proxy-n k8s_POD_kube-flannel
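For the "container name ... is already in use" conflict, the stale exited pod containers have to be removed before kubelet can recreate them. A one-line sketch of the repeated cleanup described above (it removes only exited containers, and may need to be run a couple of times while kubelet races to recreate sandboxes):

docker ps -aq -f status=exited | xargs -r docker rm   # delete every exited container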
Pod statuses: ContainerCreating, ErrImagePull, ImagePullBackOff
[machangwei@mcw7 ~]$ kubectl get pod NAME READY STATUS RESTARTS AGE mcw01dep-nginx-5dd785954d-d2kwp 0/1 ContainerCreating 0 9m7s mcw01dep-nginx-5dd785954d-szdjd 0/1 ErrImagePull 0 9m7s mcw01dep-nginx-5dd785954d-v9x8j 0/1 ErrImagePull 0 9m7s [machangwei@mcw7 ~]$ [machangwei@mcw7 ~]$ kubectl get pod NAME READY STATUS RESTARTS AGE mcw01dep-nginx-5dd785954d-d2kwp 0/1 ContainerCreating 0 9m15s mcw01dep-nginx-5dd785954d-szdjd 0/1 ImagePullBackOff 0 9m15s mcw01dep-nginx-5dd785954d-v9x8j 0/1 ImagePullBackOff 0 9m15s node上的容器都刪除,但是主節點pod還是刪不掉了,強制刪除 [machangwei@mcw7 ~]$ kubectl get pod NAME READY STATUS RESTARTS AGE mcw01dep-nginx-5dd785954d-v9x8j 0/1 Terminating 0 33m [machangwei@mcw7 ~]$ kubectl delete pod mcw01dep-nginx-5dd785954d-v9x8j --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "mcw01dep-nginx-5dd785954d-v9x8j" force deleted [machangwei@mcw7 ~]$ kubectl get pod No resources found in default namespace. 拉取鏡像無效???容器都起來了 [machangwei@mcw7 ~]$ kubectl get pod NAME READY STATUS RESTARTS AGE mcw01dep-nginx-5dd785954d-65zd4 0/1 ContainerCreating 0 118s mcw01dep-nginx-5dd785954d-hfw2k 0/1 ContainerCreating 0 118s mcw01dep-nginx-5dd785954d-qxzpl 0/1 ContainerCreating 0 118s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 112s default-scheduler Successfully assigned default/mcw01dep-nginx-5dd785954d-65zd4 to mcw8 Normal Pulling <invalid> kubelet Pulling image "nginx" 去node節點查看,原來起的是k8s_POD_mcw01dep-nginx這個,不是k8s_mcw01dep-nginx 既然主節點查看pod信息,拉取Nginx的年齡是無效 ,那么去node節點mcw8上直接手動拉取鏡像 [root@mcw8 ~]$ docker pull nginx #鏡像手動拉取成功 Status: Downloaded newer image for nginx:latest docker.io/library/nginx:latest 再次查看pod詳情 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 7m21s default-scheduler Successfully assigned default/mcw01dep-nginx-5dd785954d-65zd4 to mcw8 Normal Pulling <invalid> kubelet Pulling image "nginx" 看到第一行顯示調度,也就是每個容器都有個同名的POD容器,那是個調度。來自默認調度,消息里還能看到pod部署到哪個節點了, 多次查看,我已經將mcw8節點拉取了鏡像,但是它沒認出來,也沒有重新拉取,既然如此,我刪掉pod,讓它自動重建pod,從mcw8節點本地拉取鏡像 查看pod,帶有命名空間的顯示年齡是無效的,也就是mcw8和9的網絡存在問題,這個是不是要重新生成呢?這個網絡是節點加入到集群時創建的 [machangwei@mcw7 ~]$ kubectl get pod --all-namespaces -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-system kube-flannel-ds-tvz9q 0/1 CrashLoopBackOff 102 (<invalid> ago) 8h 10.0.0.138 mcw8 <none> <none> kube-system kube-flannel-ds-v28gj 1/1 Running 102 (<invalid> ago) 8h 10.0.0.139 mcw9 <none> <none>
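The ErrImagePull/ImagePullBackOff here came down to slow pulls from Docker Hub, which the manual docker pull on the node worked around. Another option is to configure a registry mirror on every node; a sketch, where the mirror URL is only a placeholder you would replace with one you trust, and any existing /etc/docker/daemon.json should be merged rather than overwritten:

cat <<'EOF' > /etc/docker/daemon.json
{
  "registry-mirrors": ["https://registry.docker-cn.com"]
}
EOF
systemctl restart docker   # restart the daemon so the mirror takes effect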
Deleting a k8s system pod requires specifying its namespace
[machangwei@mcw7 ~]$ kubectl get pod --all-namespaces -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-system kube-flannel-ds-tvz9q 0/1 CrashLoopBackOff 103 (<invalid> ago) 8h 10.0.0.138 mcw8 <none> <none> kube-system kube-flannel-ds-v28gj 0/1 CrashLoopBackOff 102 (<invalid> ago) 8h 10.0.0.139 mcw9 <none> <none> kube-system kube-flannel-ds-vjfkz 1/1 Running 0 8h 10.0.0.137 mcw7 <none> <none> [machangwei@mcw7 ~]$ kubectl delete pod kube-flannel-ds-tvz9q Error from server (NotFound): pods "kube-flannel-ds-tvz9q" not found [machangwei@mcw7 ~]$ kubectl delete pod kube-flannel-ds-tvz9q --namespace=kube-system pod "kube-flannel-ds-tvz9q" deleted [machangwei@mcw7 ~]$ kubectl delete pod kube-flannel-ds-v28gj --namespace=kube-system pod "kube-flannel-ds-v28gj" deleted [machangwei@mcw7 ~]$ kubectl get pod --all-namespaces -o wide #沒啥變化,還是無效的 NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-system kube-flannel-ds-gr7ck 0/1 CrashLoopBackOff 1 (<invalid> ago) 21s 10.0.0.138 mcw8 <none> <none> kube-system kube-flannel-ds-m6qgl 1/1 Running 1 (<invalid> ago) 6s 10.0.0.139 mcw9 <none> <none> kube-system kube-flannel-ds-vjfkz 1/1 Running 0 8h 10.0.0.137 mcw7 <none> <non
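When a flannel pod sits in CrashLoopBackOff like this, its own logs usually say why; a sketch (the pod name is taken from the listing above):

kubectl -n kube-system logs kube-flannel-ds-gr7ck              # output of the current attempt
kubectl -n kube-system logs kube-flannel-ds-gr7ck --previous   # output of the last crashed attempt
kubectl -n kube-system describe pod kube-flannel-ds-gr7ck      # events, restart counts, reasons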
Cloned VMs kept running into all kinds of container problems; VMs created from scratch did not have these issues.
After recreating three fresh virtual machines, I hit the following problem during deployment: coredns stayed Pending.
[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   coredns-6d8c4cb4d-nsv4x   0/1     Pending   0          8m59s
kube-system   coredns-6d8c4cb4d-t7hr6   0/1     Pending   0          8m59s
Troubleshooting process:
查看錯誤信息: [machangwei@mcwk8s-master ~]$ kubectl describe pod coredns-6d8c4cb4d-nsv4x -namespace=kube-system Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 21s (x7 over 7m9s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate. 解決方案: 默認 k8s 不允許往 master 節點裝東西,強行設置下允許:kubectl taint nodes --all node-role.kubernetes.io/master- [machangwei@mcwk8s-master ~]$ kubectl get nodes #查看節點,主節點未准備。執行如下命令,讓主節點也作為一個node NAME STATUS ROLES AGE VERSION mcwk8s-master NotReady control-plane,master 16m v1.23.1 [machangwei@mcwk8s-master ~]$ kubectl taint nodes --all node-role.kubernetes.io/master- node/mcwk8s-master untainted [machangwei@mcwk8s-master ~]$ pod描述里有; Tolerations: CriticalAddonsOnly op=Exists node-role.kubernetes.io/control-plane:NoSchedule node-role.kubernetes.io/master:NoSchedule node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s 允許master節點部署pod,使用命令如下: kubectl taint nodes --all node-role.kubernetes.io/master- 禁止master部署pod kubectl taint nodes k8s node-role.kubernetes.io/master=true:NoSchedule Jan 9 11:51:52 mcw10 kubelet: I0109 11:51:52.636701 25612 cni.go:240] "Unable to update cni config" err="no networks found in /etc/cni/net.d" Jan 9 11:51:53 mcw10 kubelet: E0109 11:51:53.909336 25612 kubelet.go:2347] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized" Jan 9 11:51:57 mcw10 kubelet: I0109 11:51:57.637836 25612 cni.go:240] "Unable to update cni config" err="no networks found in /etc/cni/net.d" [machangwei@mcwk8s-master ~]$ kubectl get nodes NAME STATUS ROLES AGE VERSION mcwk8s-master NotReady control-plane,master 43m v1.23.1 [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system coredns-6d8c4cb4d-t24gx 0/1 Pending 0 18m kube-system coredns-6d8c4cb4d-t7hr6 0/1 Pending 0 42m
It turned out to have little to do with the taint fix above; the real cause was that the pod network had not been deployed. Once I deployed the network, the two DNS pods became Ready.
如下: [machangwei@mcwk8s-master ~]$ kubectl apply -f mm.yml #部署網絡 Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+ podsecuritypolicy.policy/psp.flannel.unprivileged created clusterrole.rbac.authorization.k8s.io/flannel created clusterrolebinding.rbac.authorization.k8s.io/flannel created serviceaccount/flannel created configmap/kube-flannel-cfg created daemonset.apps/kube-flannel-ds created [machangwei@mcwk8s-master ~]$ kubectl get nodes #查看節點還沒有好 NAME STATUS ROLES AGE VERSION mcwk8s-master NotReady control-plane,master 45m v1.23.1 [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces #查看dns pod沒有好,查看flannel初始化還沒有好 NAMESPACE NAME READY STATUS RESTARTS AGE kube-system coredns-6d8c4cb4d-t24gx 0/1 Pending 0 20m kube-system coredns-6d8c4cb4d-t7hr6 0/1 Pending 0 44m kube-system kube-flannel-ds-w8v9s 0/1 Init:0/2 0 14s [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces #再次查看拉取鏡像失敗 NAMESPACE NAME READY STATUS RESTARTS AGE kube-system coredns-6d8c4cb4d-t24gx 0/1 Pending 0 20m kube-system coredns-6d8c4cb4d-t7hr6 0/1 Pending 0 45m kube-system kube-flannel-ds-w8v9s 0/1 Init:ErrImagePull 0 45s [machangwei@mcwk8s-master ~]$ kubectl describe pod kube-flannel-ds-w8v9s --namespace=kube-system #查看描述信息 Warning Failed 4m26s kubelet Error: ErrImagePull #一直是拉取鏡像失敗,查看網絡沒有問題的 Warning Failed 4m25s kubelet Error: ImagePullBackOff #三分鍾才拉取鏡像成功 Normal BackOff 4m25s kubelet Back-off pulling image "quay.io/coreos/flannel:v0.15.1" Normal Pulling 4m15s (x2 over 4m45s) kubelet Pulling image "quay.io/coreos/flannel:v0.15.1" Normal Pulled 3m36s kubelet Successfully pulled image "quay.io/coreos/flannel:v0.15.1" in 39.090145025s Normal Created 3m35s kubelet Created container install-cni Normal Started 3m35s kubelet Started container install-cni Normal Pulled 3m35s kubelet Container image "quay.io/coreos/flannel:v0.15.1" already present on machine Normal Created 3m35s kubelet Created container kube-flannel Normal Started 3m34s kubelet Started container kube-flannel 再次查看節點,已經是ready了,也就是說部署好網絡,coredns才好,master節點作為一個node才ready [machangwei@mcwk8s-master ~]$ kubectl get nodes NAME STATUS ROLES AGE VERSION mcwk8s-master Ready control-plane,master 57m v1.23.1 [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system coredns-6d8c4cb4d-t24gx 1/1 Running 0 32m kube-system coredns-6d8c4cb4d-t7hr6 1/1 Running 0 56m kube-system etcd-mcwk8s-master 1/1 Running 0 57m kube-system kube-apiserver-mcwk8s-master 1/1 Running 0 57m kube-system kube-controller-manager-mcwk8s-master 1/1 Running 0 57m kube-system kube-flannel-ds-w8v9s 1/1 Running 0 12m kube-system kube-proxy-nvw6m 1/1 Running 0 56m kube-system kube-scheduler-mcwk8s-master 1/1 Running 0 57m
After running the join command on node1, two flannel pods that are not Ready show up on the master
They are the nodes' network pods; this does not seem to affect usage, so there is no impact for now.
[root@mcwk8s-node1 ~]$ kubeadm join 10.0.0.140:6443 --token 8yficm.352yz89c44mqk4y6 \ > --discovery-token-ca-cert-hash sha256:bcd36381d3de0adb7e05a12f688eee4043833290ebd39366fc47dd5233c552bf master上多出兩個沒有ready的pod,說明是node上的沒有部署好這個網絡pod呢 [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system kube-flannel-ds-75npz 0/1 Init:1/2 0 99s kube-system kube-flannel-ds-lpmxf 0/1 Init:1/2 0 111s kube-system kube-flannel-ds-w8v9s 1/1 Running 0 16m [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-system kube-flannel-ds-75npz 0/1 CrashLoopBackOff 4 (50s ago) 4m37s 10.0.0.141 mcwk8s-node1 <none> <none> kube-system kube-flannel-ds-lpmxf 0/1 Init:ImagePullBackOff 0 4m49s 10.0.0.142 mcwk8s-node2 <none> <none> kube-system kube-flannel-ds-w8v9s 1/1 Running 0 19m 10.0.0.140 mcwk8s-master <none> <none> 查看nodes狀態,現在已經有一個是ready了 [machangwei@mcwk8s-master ~]$ kubectl get nodes NAME STATUS ROLES AGE VERSION mcwk8s-master Ready control-plane,master 65m v1.23.1 mcwk8s-node1 Ready <none> 5m22s v1.23.1 mcwk8s-node2 NotReady <none> 5m35s v1.23.1 此時查看pod情況,雖然node已經ready了,但是網絡的pod的狀態,顯示還是有點問題的 [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-system kube-flannel-ds-75npz 0/1 CrashLoopBackOff 5 (44s ago) 6m5s 10.0.0.141 mcwk8s-node1 <none> <none> kube-system kube-flannel-ds-lpmxf 0/1 Init:ImagePullBackOff 0 6m17s 10.0.0.142 mcwk8s-node2 <none> <none> kube-system kube-flannel-ds-w8v9s 1/1 Running 0 21m 10.0.0.140 mcwk8s-master <none> <none> 描述pod信息,查看CrashLoopBackOff這個狀態,好像是重啟容器失敗,容器已經存在了 Normal Created 5m10s (x4 over 5m59s) kubelet Created container kube-flannel Normal Started 5m10s (x4 over 5m58s) kubelet Started container kube-flannel Warning BackOff 4m54s (x5 over 5m52s) kubelet Back-off restarting failed container Normal Pulled 2m52s (x6 over 5m59s) kubelet Container image "quay.io/coreos/flannel:v0.15.1" already present on machine 描述pod信息,查看Init:ImagePullBackOff這個狀態,是鏡像拉取存在問題 Warning Failed 23s (x4 over 5m42s) kubelet Failed to pull image "quay.io/coreos/flannel:v0.15.1": rpc error: code = Unknown desc = context canceled Warning Failed 23s (x4 over 5m42s) kubelet Error: ErrImagePull
Exporting and importing images
Recommendation: choose the command pair based on the scenario.
If you only need to back up images, save and load are enough.
If the container's contents have changed since it started and you need to back that up, use export and import.
Example: docker save -o nginx.tar nginx:latest, or docker save > nginx.tar nginx:latest. Both -o and > write to a file; nginx.tar is the target file and nginx:latest is the source image (name:tag).
Example: docker load -i nginx.tar, or docker load < nginx.tar. Both -i and < read from a file. This restores the image together with its metadata, including tag information.
Example: docker export -o nginx-test.tar nginx-test. Here -o writes to a file; nginx-test.tar is the target file and nginx-test is the source container (name).
docker import nginx-test.tar nginx:imp, or cat nginx-test.tar | docker import - nginx:imp.
Differences:
The tar file produced by export is slightly smaller than the one produced by save.
export works on a container, while save works on an image.
Because of the second point, a file produced by export loses all image history when it is imported back (the per-layer information; see how Dockerfile layering works), so rollback is not possible; save works from the image, so every layer is preserved on import. In the referenced post, nginx:latest is the save/load copy and nginx:imp is the export/import copy.
Source: https://blog.csdn.net/ncdx111/article/details/79878098
Resolving the Init:ImagePullBackOff status
查看node2上沒有flannel鏡像 [root@mcwk8s-node2 ~]$ docker images REPOSITORY TAG IMAGE ID CREATED SIZE registry.aliyuncs.com/google_containers/kube-proxy v1.23.1 b46c42588d51 3 weeks ago 112MB rancher/mirrored-flannelcni-flannel-cni-plugin v1.0.0 cd5235cd7dc2 2 months ago 9.03MB registry.aliyuncs.com/google_containers/pause 3.6 6270bb605e12 4 months ago 683kB 去主節點上導出一份鏡像然后上傳到node2上 [root@mcwk8s-master ~]$ docker images REPOSITORY TAG IMAGE ID CREATED SIZE quay.io/coreos/flannel v0.15.1 e6ea68648f0c 8 weeks ago 69.5MB [root@mcwk8s-master ~]$ docker save quay.io/coreos/flannel >mcwflanel-image.tar.gz [root@mcwk8s-master ~]$ ls anaconda-ks.cfg jiarujiqun.txt mcwflanel-image.tar.gz [root@mcwk8s-master ~]$ scp mcwflanel-image.tar.gz 10.0.0.142:/root/ node2上導入鏡像成功 [root@mcwk8s-node2 ~]$ docker images REPOSITORY TAG IMAGE ID CREATED SIZE registry.aliyuncs.com/google_containers/kube-proxy v1.23.1 b46c42588d51 3 weeks ago 112MB rancher/mirrored-flannelcni-flannel-cni-plugin v1.0.0 cd5235cd7dc2 2 months ago 9.03MB registry.aliyuncs.com/google_containers/pause 3.6 6270bb605e12 4 months ago 683kB [root@mcwk8s-node2 ~]$ ls anaconda-ks.cfg [root@mcwk8s-node2 ~]$ ls anaconda-ks.cfg mcwflanel-image.tar.gz [root@mcwk8s-node2 ~]$ docker load < mcwflanel-image.tar.gz ab9ef8fb7abb: Loading layer [==================================================>] 2.747MB/2.747MB 2ad3602f224f: Loading layer [==================================================>] 49.46MB/49.46MB 54089bc26b6b: Loading layer [==================================================>] 5.12kB/5.12kB 8c5368be4bdf: Loading layer [==================================================>] 9.216kB/9.216kB 5c32c759eea2: Loading layer [==================================================>] 7.68kB/7.68kB Loaded image: quay.io/coreos/flannel:v0.15.1 [root@mcwk8s-node2 ~]$ docker images REPOSITORY TAG IMAGE ID CREATED SIZE registry.aliyuncs.com/google_containers/kube-proxy v1.23.1 b46c42588d51 3 weeks ago 112MB quay.io/coreos/flannel v0.15.1 e6ea68648f0c 8 weeks ago 69.5MB rancher/mirrored-flannelcni-flannel-cni-plugin v1.0.0 cd5235cd7dc2 2 months ago 9.03MB registry.aliyuncs.com/google_containers/pause 3.6 6270bb605e12 4 months ago 683kB 主節點上查看pod狀態,已經變化了,變成CrashLoopBackOff。重啟了很多次 [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system kube-flannel-ds-75npz 0/1 CrashLoopBackOff 9 (4m47s ago) 28m kube-system kube-flannel-ds-lpmxf 0/1 CrashLoopBackOff 4 (74s ago) 28m kube-system kube-flannel-ds-w8v9s 1/1 Running 0 43m
The describe output shows the container keeps failing to restart. Resolving the CrashLoopBackOff problem
[machangwei@mcwk8s-master ~]$ kubectl describe pod kube-flannel-ds-lpmxf --namespace=kube-system Warning BackOff 3m25s (x20 over 7m48s) kubelet Back-off restarting failed container 雖然節點上的這兩個一直不是ready,但是node狀態已經是ready了,先不管了,部署一個應用驗證一下 [machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system kube-flannel-ds-75npz 0/1 CrashLoopBackOff 12 (114s ago) 41m kube-system kube-flannel-ds-lpmxf 0/1 CrashLoopBackOff 8 (3m46s ago) 41m [machangwei@mcwk8s-master ~]$ kubectl get nodes NAME STATUS ROLES AGE VERSION mcwk8s-master Ready control-plane,master 100m v1.23.1 mcwk8s-node1 Ready <none> 40m v1.23.1 mcwk8s-node2 Ready <none> 41m v1.23.1 查看環境是否安裝好了,已經沒問題可以部署應用了 [machangwei@mcwk8s-master ~]$ kubectl get deployment NAME READY UP-TO-DATE AVAILABLE AGE mcw01dep-nginx 1/1 1 1 5m58s mcw02dep-nginx 1/2 2 1 71s [machangwei@mcwk8s-master ~]$ kubectl get pod NAME READY STATUS RESTARTS AGE mcw01dep-nginx-5dd785954d-z7s8m 1/1 Running 0 7m21s mcw02dep-nginx-5b8b58857-7mlmh 1/1 Running 0 2m34s mcw02dep-nginx-5b8b58857-pvwdd 1/1 Running 0 2m34s 把測試資源刪掉,然后保存一份虛擬機快照,省的k8s環境變化,需要重新部署等,直接恢復快照就行。 [machangwei@mcwk8s-master ~]$ kubectl get pod NAME READY STATUS RESTARTS AGE mcw01dep-nginx-5dd785954d-z7s8m 1/1 Running 0 7m21s mcw02dep-nginx-5b8b58857-7mlmh 1/1 Running 0 2m34s mcw02dep-nginx-5b8b58857-pvwdd 1/1 Running 0 2m34s [machangwei@mcwk8s-master ~]$ [machangwei@mcwk8s-master ~]$ kubectl get deployment NAME READY UP-TO-DATE AVAILABLE AGE mcw01dep-nginx 1/1 1 1 7m39s mcw02dep-nginx 2/2 2 2 2m52s [machangwei@mcwk8s-master ~]$ kubectl delete deployment mcw01dep-nginx mcw02dep-nginx deployment.apps "mcw01dep-nginx" deleted deployment.apps "mcw02dep-nginx" deleted [machangwei@mcwk8s-master ~]$ kubectl get deployment No resources found in default namespace. [machangwei@mcwk8s-master ~]$ [machangwei@mcwk8s-master ~]$ kubectl get pod No resources found in default namespace.
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
The virtual machine was extremely sluggish and hung for a long time.
The system or the network was using too much CPU, which caused a kernel soft lockup. What "soft lockup" means: the bug does not bring the whole system down, but some processes (or kernel threads) get stuck in a certain state (usually inside the kernel), and in many cases this is due to how kernel locks are used.
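Beyond giving the guest more CPU or reducing contention on the host, the kernel watchdog threshold can be raised so that short stalls stop producing these messages; a sketch (the default is 10 seconds, and 20 is just an example value):

sysctl kernel.watchdog_thresh                                        # show the current threshold in seconds
sysctl -w kernel.watchdog_thresh=20                                  # raise it for the running kernel
echo "kernel.watchdog_thresh = 20" > /etc/sysctl.d/90-watchdog.conf  # persist across reboots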