Fixing CoreDNS stuck in CrashLoopBackOff on Kubernetes


First, a record of the pitfalls hit along the way.

1 - View the logs: kubectl logs shows the specific error:

[root@i-F998A4DE ~]# kubectl logs -n kube-system coredns-fb8b8dccf-hhkfm
log is DEPRECATED and will be removed in a future version. Use logs instead.
E1230 03:03:51.298180       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E1230 03:03:51.298180       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-fb8b8dccf-hhkfm.unknownuser.log.ERROR.20201230-030351.1: no such file or directory
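
The line to focus on is `dial tcp 10.96.0.1:443: connect: no route to host`: coredns cannot reach the kube-apiserver through the `kubernetes` Service ClusterIP. A quick way to confirm this from the node, as a sketch (10.96.0.1 assumes the default service CIDR; adjust for your cluster):

# Confirm 10.96.0.1 really is the ClusterIP of the kubernetes Service
kubectl get svc kubernetes -n default

# Probe the apiserver through the Service VIP; any HTTP response means the
# route works, while "no route to host" reproduces the coredns failure
curl -k --connect-timeout 3 https://10.96.0.1:443/version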

2 - Describe the pod: kubectl describe pod gives details that turn out to be not very useful:

[root@i-F998A4DE ~]# kubectl describe po -n kube-system coredns-fb8b8dccf-s2nj9
Name:               coredns-fb8b8dccf-s2nj9
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               master/10.252.37.41
Start Time:         Wed, 30 Dec 2020 10:28:40 +0800
Labels:             k8s-app=kube-dns
                    pod-template-hash=fb8b8dccf
Annotations:        <none>
Status:             Running
IP:                 10.244.0.3
Controlled By:      ReplicaSet/coredns-fb8b8dccf
Containers:
  coredns:
    Container ID:  docker://50bab6b378f236af89bec945083bfe1af293a71f1276c3c8df324cfbe6540a54
    Image:         k8s.gcr.io/coredns:1.3.1
    Image ID:      docker://sha256:eb516548c180f8a6e0235034ccee2428027896af16a509786da13022fe95fe8c
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 30 Dec 2020 10:29:00 +0800
      Finished:     Wed, 30 Dec 2020 10:29:01 +0800
    Ready:          False
    Restart Count:  2
    Limits:
      memory:  170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    Liveness:   http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:  http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-2gw5w (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-2gw5w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-2gw5w
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  38s (x4 over 48s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         30s                default-scheduler  Successfully assigned kube-system/coredns-fb8b8dccf-s2nj9 to master
  Normal   Pulled            11s (x3 over 29s)  kubelet, master    Container image "k8s.gcr.io/coredns:1.3.1" already present on machine
  Normal   Created           11s (x3 over 29s)  kubelet, master    Created container coredns
  Normal   Started           10s (x3 over 28s)  kubelet, master    Started container coredns
  Warning  BackOff           2s (x6 over 26s)   kubelet, master    Back-off restarting failed container
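
Even so, the Events section at the bottom is the one genuinely useful part: the initial FailedScheduling on the master taint, then the repeated BackOff. To pull just those events without the rest of the describe output, a sketch (the pod name is the one from above and will differ on other clusters):

# List only the events attached to this pod
kubectl -n kube-system get events --field-selector involvedObject.name=coredns-fb8b8dccf-s2nj9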

3 - Editing the coredns Deployment configuration also had no effect:

[root@master ~]# kubectl edit deployment coredns -n kube-system
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-12-30T02:28:07Z"
  generation: 3
  labels:
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "6088"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/coredns
  uid: a718d791-4a46-11eb-91a1-d00df998a4de
spec:
  progressDeadlineSeconds: 600
  replicas: 2    # first change this to 0; after k8s finishes updating, change it back to 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: k8s.gcr.io/coredns:1.3.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          procMount: Default
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: Default
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2020-12-30T03:00:38Z"
    lastUpdateTime: "2020-12-30T03:00:38Z"
    message: ReplicaSet "coredns-fb8b8dccf" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-12-30T03:38:49Z"
    lastUpdateTime: "2020-12-30T03:38:49Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  readyReplicas: 2
  replicas: 2    # once spec.replicas above is set to 0 and k8s has updated, this status field needs no change
  updatedReplicas: 2
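
The scale-down/scale-up cycle described in the comment on spec.replicas can also be done without the editor; a sketch of the equivalent commands:

# Scale coredns to 0, wait for the pods to terminate, then scale back to 2
kubectl -n kube-system scale deployment coredns --replicas=0
kubectl -n kube-system get pods -l k8s-app=kube-dns    # repeat until no pods remain
kubectl -n kube-system scale deployment coredns --replicas=2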

4 - Force-deleting the coredns pods had no effect:

[root@i-F998A4DE ~]# kubectl delete po coredns-fb8b8dccf-hhkfm --grace-period=0 --force -n kube-system
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "coredns-fb8b8dccf-hhkfm" force deleted
[root@i-F998A4DE flannel-dashboard]# kubectl delete po coredns-fb8b8dccf-ll2mp --grace-period=0 --force -n kube-system
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "coredns-fb8b8dccf-ll2mp" force deleted
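
This outcome is expected: the pods are owned by a ReplicaSet, so deleting them only prompts the controller to create replacements, which then crash-loop for the same underlying reason. Watching the pods makes this visible, as a sketch:

# The ReplicaSet recreates the deleted pods at once; watch them restart and back off
kubectl -n kube-system get pods -l k8s-app=kube-dns -w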

5 - The kubelet logs likewise show coredns failing:

[root@i-F998A4DE ~]# journalctl -f -u kubelet
-- Logs begin at Tue 2020-12-29 11:56:05 CST. --
Dec 30 11:30:38 master kubelet[20570]: W1230 11:30:38.307384   20570 container.go:409] Failed to create summary reader for "/libcontainer_16449_systemd_test_default.slice": none of the resources are being tracked.
Dec 30 11:30:40 master kubelet[20570]: E1230 11:30:40.356882   20570 pod_workers.go:190] Error syncing pod 2c0dffd5-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"
Dec 30 11:30:41 master kubelet[20570]: E1230 11:30:41.375798   20570 pod_workers.go:190] Error syncing pod 2c0dffd5-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"
Dec 30 11:30:45 master kubelet[20570]: E1230 11:30:45.899200   20570 pod_workers.go:190] Error syncing pod 1ed96c42-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-24sxn_kube-system(1ed96c42-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-24sxn_kube-system(1ed96c42-4a4f-11eb-8c6b-d00df998a4de)"
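
To cut the kubelet journal down to just the coredns entries instead of following the whole stream, a filter along these lines can help (a sketch; the time window is arbitrary):

# Show only coredns-related kubelet entries from the last ten minutes
journalctl -u kubelet --since "10 min ago" --no-pager | grep -i coredns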

6 - The local DNS configuration is fine as well:

[root@i-F998A4DE ~]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 114.114.114.114
nameserver 8.8.8.8
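
Note that /etc/resolv.conf only governs the node's own upstream DNS, not the in-cluster path that is failing here. To confirm the upstream resolvers work independently of the cluster, they can be queried directly, as a sketch (requires bind-utils; baidu.com is just an arbitrary test name):

# Query the node's upstream resolver directly, bypassing cluster DNS entirely
dig @114.114.114.114 baidu.com +short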
 

The final fix

This problem is most likely caused by mangled or stale iptables rules; it can be resolved by running the following commands in order:

[root@master ~]# systemctl stop kubelet
[root@master ~]# systemctl stop docker
[root@master ~]# iptables --flush
[root@master ~]# iptables -t nat --flush
[root@master ~]# systemctl start kubelet
[root@master ~]# systemctl start docker
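
Once kubelet and docker come back up, kube-proxy and docker rebuild their iptables chains from scratch, and the coredns pods should finally reach Running. A quick verification, as a sketch:

# coredns pods should now be Running with READY 1/1
kubectl -n kube-system get pods -l k8s-app=kube-dns

# The logs should show coredns serving instead of "no route to host"
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20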
 
  A side note: the server where this problem occurred is a cloud-platform instance, which needs iptables rules before it can be reached remotely over ssh (22), vnc (3389), or Windows file sharing \\ip (445). If the firewall is on and the access rules were not set with firewall-cmd, flushing the iptables rules will cut off remote access. If the ports were opened with firewall-cmd, however, clearing the iptables rules has no effect on the system's own firewall. Starting with CentOS 7, the iptables service init scripts are deprecated and firewalld takes the place of the iptables service. On RHEL 7, firewalld manages the netfilter subsystem by default, although the underlying commands it invokes are still iptables. firewalld is a front-end controller for iptables, used to implement persistent network traffic rules.
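
To illustrate that last point, a port opened through firewall-cmd is owned by firewalld rather than by hand-written iptables entries; a minimal sketch, reusing the ssh port 22 from the example above:

# Open ssh persistently through firewalld; the rule lives in firewalld's
# own configuration rather than in ad-hoc iptables state
firewall-cmd --permanent --add-port=22/tcp
firewall-cmd --reload
firewall-cmd --list-ports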
 
  This is my first blog post; if you spot any mistakes, corrections are welcome!
 
 