1. etcd installed by the HA script fails to start
Fix: reboot all nodes. This happened three times, and every time the real cause was the machines being overloaded — CPU utilization was around 94% at the time. The script itself is correct; it had nothing to do with the script.
Because of this it is better to split the installation: install the etcd cluster first, reboot all nodes, and only then install the Kubernetes components.
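A small sketch of the checks worth running around that split, assuming passwordless root SSH between the nodes and that etcd runs as a systemd unit named etcd on test1, test2 and test3 (the three etcd endpoints used later in these notes):

# before running the script at all: make sure the hosts are not already CPU-starved
top -bn1 | head -n 5

# after the etcd phase and the reboot of all nodes: confirm etcd came up everywhere
for n in test1 test2 test3; do
    echo "== ${n} =="
    ssh root@${n} 'systemctl is-active etcd'
done
# only continue with the Kubernetes part once every node reports "active"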
2. kube-apiserver error: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Symptom:

[root@test1 ssl]# systemctl status kube-apiserver -l
● kube-apiserver.service - Kubernetes API Server
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-02-06 18:14:58 EST; 1h 3min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 1684 (kube-apiserver)
    Tasks: 16
   Memory: 11.4M
   CGroup: /system.slice/kube-apiserver.service
           └─1684 /opt/k8s/bin/kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --anonymous-auth=false # --experimental-encryption-provider-config=/etc/kubernetes/encryption-config.yaml --advertise-address=192.168.0.91 --bind-address=192.168.0.91 --insecure-port=8080 --authorization-mode=Node,RBAC # --runtime-config=api/all --enable-bootstrap-token-auth --token-auth-file=/etc/kubernetes/token.csv --service-cluster-ip-range=10.254.0.0/16 --service-node-port-range=8000-30000 --tls-cert-file=/etc/kubernetes/cert/kubernetes.pem --tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem --client-ca-file=/etc/kubernetes/cert/ca.pem --kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem --kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem --etcd-cafile=/etc/kubernetes/cert/ca.pem --etcd-certfile=/etc/kubernetes/cert/kubernetes.pem --etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem --service-account-key-file=/etc/kubernetes/cert/sa.pub --etcd-servers=https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379 --enable-swagger-ui=true --secure-port=6443 --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --allow-privileged=true --apiserver-count=3 --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/var/log/kube-apiserver-audit.log --event-ttl=1h --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2

Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.055401 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.650493 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
(the same two errors from cacher.go and reflector.go repeat roughly every second)

Fix: re-ran the installation script from scratch and the problem went away.
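The error usually means the secret data already stored in etcd no longer matches the apiserver's encryption settings — note the commented-out --experimental-encryption-provider-config flag in the unit above — which is also why a clean reinstall, which rewrites everything in etcd, makes it disappear. A hedged way to see what is actually stored for the failing key, run on a master that has the etcd client certificates used in the unit above:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.0.91:2379 \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/kubernetes/cert/kubernetes.pem \
  --key=/etc/kubernetes/cert/kubernetes-key.pem \
  get /registry/secrets/default/default-token-859zc | head -c 200; echo
# if encryption was ever enabled, the value starts with a prefix such as
# k8s:enc:aescbc:v1:<keyname>: - stored data whose prefix or key no longer matches
# the apiserver's current encryption-config is what produces "invalid padding on input"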
3. kube-apiserver cannot start: external host was not specified, using 192.168.0.91
Fix: delete all of the comment lines from the kube-apiserver systemd unit file and the service starts normally.
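The unit in point 2 still carries "#" lines inside the backslash-continued ExecStart=. Whatever a particular systemd version makes of them, the fix used here was simply to remove every comment line from the unit. A blunt sketch of that cleanup, assuming comments only appear as whole lines:

# strip whole-line comments from the unit, then reload and restart
sed -i '/^[[:space:]]*#/d' /etc/systemd/system/kube-apiserver.service
systemctl daemon-reload
systemctl restart kube-apiserver
systemctl status kube-apiserver -l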
4. kubelet log error: No valid private key and/or certificate found, reusing existing private key or creating a new one
The messages below are normal by themselves, but going through everything anyway turned up two fatal mistakes.

[root@test4 kubernetes]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-07 07:24:53 EST; 5s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 73646 (kubelet)
    Tasks: 12
   Memory: 15.2M
   CGroup: /system.slice/kubelet.service
           └─73646 /opt/k8s/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig --cert-dir=/etc/kubernetes/cert --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet.config.json --hostname-override=test4 --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest --allow-privileged=true --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2

Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.021451 73646 server.go:407] Version: v1.13.0
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024450 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024837 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025195 73646 plugins.go:103] No cloud provider specified.
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025304 73646 server.go:523] No cloud provider specified: "" from the config file: ""
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025410 73646 bootstrap.go:65] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.043219 73646 bootstrap.go:96] No valid private key and/or certificate found, reusing existing private key or creating a new one
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.176716 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:56 test4 kubelet[73646]: I0207 07:24:56.347469 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:58 test4 kubelet[73646]: I0207 07:24:58.451741 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials

Mistake 1: the script that generates the bootstrap kubeconfig sets BOOTSTRAP_TOKEN=(kubeadm ...) without the $ in front of the parentheses. It has to be command substitution, BOOTSTRAP_TOKEN=$(kubeadm token create ...); without the $, bash just builds an array of the literal words, so the kubeconfig ends up carrying --token=kubeadm instead of a real token, and the apiserver keeps asking the kubelet for credentials. This was the main mistake; the second one is further down. The script as found:

[root@test1 profile]# cat bootstrap-kubeconfig.sh
#!/bin/bash
# variables
export MASTER_VIP="192.168.0.235"
export KUBE_APISERVER="https://192.168.0.235:8443"
export NODE_NAMES=(test1 test2 test3 test4)

cd $HOME/ssl/

for node_name in ${NODE_NAMES[*]}
do
    # create token  (BUG: missing $ - must be BOOTSTRAP_TOKEN=$( ... ))
    export BOOTSTRAP_TOKEN=(kubeadm token create \
      --description kubelet-bootstrap-token \
      --groups system:bootstrappers:${node_name} \
      --kubeconfig ~/.kube/config)

    # cluster parameters
    kubectl config set-cluster kubernetes \
      --certificate-authority=/etc/kubernetes/cert/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # client authentication parameters
    kubectl config set-credentials kubelet-bootstrap \
      --token=${BOOTSTRAP_TOKEN} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # context parameters
    kubectl config set-context default \
      --cluster=kubernetes \
      --user=kubelet-bootstrap \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # default context
    kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
done

Mistake 2: the kubelet parameter file has the wrong address:

[root@test4 ~]# cat /etc/kubernetes/kubelet.config.json
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/cert/ca.pem"
    },
    "webhook": {
      "enabled": true,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "address": "0.0.0.0",
  "port": 10250,
  "readOnlyPort": 0,
  "cgroupDriver": "cgroupfs",
  "hairpinMode": "promiscuous-bridge",
  "serializeImagePulls": false,
  "featureGates": {
    "RotateKubeletClientCertificate": true,
    "RotateKubeletServerCertificate": true
  },
  "clusterDomain": "cluster.local.",
  "clusterDNS": ["10.254.0.2"]
}

"address" is 0.0.0.0 rather than a real IP. On test4, hostname -i also printed 0.0.0.0, which is where that value came from. Change address to the worker node's real IP.
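A minimal sketch of the fix for mistake 2, assuming test4's real address is 192.168.0.94 (the IP test4 shows later in these notes); the important part is that hostname -i stops resolving to 0.0.0.0 and that the kubelet config carries the real node IP:

# 1) hostname -i must print the real IP; on test4 that meant fixing the test4 entry in /etc/hosts
hostname -i

# 2) put the real IP into the kubelet config and restart the kubelet
sed -i 's/"address": "0.0.0.0"/"address": "192.168.0.94"/' /etc/kubernetes/kubelet.config.json
systemctl restart kubelet
ss -lntp | grep 10250   # the kubelet should now be reachable on 192.168.0.94:10250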
5. After approving the CSRs there is still no node
Fix: the kubelet had stopped. A cadvisor flag had been added to its config file and the service restarted without checking its status afterwards; only later did it turn out the kubelet had died. Restarting it brought the node back.

6. kubectl cannot query pod resources: Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused, and later error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy). Read this one through to the end.
Symptom:

[root@test4 profile]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused
deployment.apps "dns-client" deleted
Error from server: Get https://test4:10250/containerLogs/default/dns-client-86c6d59f7-tzh5c/dns-client: dial tcp 0.0.0.0:10250: connect: connection refused

Checking coredns.yaml:

[root@test4 profile]# cat coredns.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: cluster_dns_svc_ip
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

The clusterIP is still the cluster_dns_svc_ip placeholder, not a real IP. And /etc/kubernetes/kubelet.config.json showed the same problem as in point 4 again: "address": "0.0.0.0" instead of the node's real IP.

Even after fixing those, it still did not work, so suspicion moved to the apiserver. Following the apiserver unit file in https://www.cnblogs.com/effortsing/p/10312081.html, this flag has to be added to the kube-apiserver startup parameters on every master node:

--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

After restarting kube-apiserver on all master nodes, the dial tcp 192.168.0.93:10250: connect: no route to host error stops, but a new one appears when querying resources:

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

Fix: grant the apiserver access to the kubelet API — the kubernetes user had never been given any RBAC authorization. Note: user=kubernetes from the error message is the user name that has to appear in the subjects of the YAML below (replace it if yours differs):

cat > apiserver-to-kubelet.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kubernetes-to-kubelet
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/proxy
      - nodes/stats
      - nodes/log
      - nodes/spec
      - nodes/metrics
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kubernetes
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubernetes-to-kubelet
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes
EOF

Create the authorization:

[root@test4 ~]# kubectl create -f apiserver-to-kubelet.yaml
clusterrole.rbac.authorization.k8s.io/system:kubernetes-to-kubelet created
clusterrolebinding.rbac.authorization.k8s.io/system:kubernetes created

Exec into a container again to check:

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
/ # exit

It is now possible to get into containers and inspect resources.
Reference: https://www.jianshu.com/p/b3d8e8b8fd7e
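The user=kubernetes in the Forbidden message is simply the subject (CN) of the client certificate the apiserver presents to the kubelet — the --kubelet-client-certificate from the unit in point 2. A quick, hedged way to confirm which name the ClusterRoleBinding has to grant, assuming those certificate paths:

openssl x509 -noout -subject -in /etc/kubernetes/cert/kubernetes.pem
# the CN printed here must match subjects[].name in apiserver-to-kubelet.yaml;
# if the certificate uses a different CN, change the binding rather than the certificate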
7. flannel and coredns cannot be created: Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system"
Symptom: the pods never come up:

[root@test4 profile]# kubectl get pods -n kube-system
NAME                       READY   STATUS              RESTARTS   AGE
coredns-69d58bd968-mdskk   0/1     ContainerCreating   0          4s
coredns-69d58bd968-xjqpj   0/1     ContainerCreating   0          3m6s
kube-flannel-ds-4bgqb      0/1     Init:0/1            0          94s

kubectl describe shows the mount failure (output trimmed to the relevant parts):

[root@test4 profile]# kubectl describe pod coredns-69d58bd968-f9tn4 --namespace kube-system
Name:               coredns-69d58bd968-f9tn4
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               test4/192.168.0.94
Start Time:         Fri, 08 Feb 2019 23:50:28 -0500
Labels:             k8s-app=kube-dns
                    pod-template-hash=69d58bd968
Status:             Pending
Controlled By:      ReplicaSet/coredns-69d58bd968
Containers:
  coredns:
    Image:      coredns/coredns:1.2.0
    Ports:      53/UDP, 53/TCP, 9153/TCP
    Args:       -conf /etc/coredns/Corefile
    State:      Waiting
      Reason:   ContainerCreating
    Ready:      False
    Limits:     memory: 170Mi
    Requests:   cpu: 100m, memory: 70Mi
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-29dbl (ro)
Volumes:
  config-volume:
    Type:        ConfigMap (a volume populated by a ConfigMap)
    Name:        coredns
  coredns-token-29dbl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-29dbl
QoS Class:       Burstable
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    16m                default-scheduler  Successfully assigned kube-system/coredns-69d58bd968-f9tn4 to test4
  Warning  FailedMount  68s (x7 over 14m)  kubelet, test4     Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system(38cb8d7e-2c26-11e9-8db2-000c2935f634)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"coredns-69d58bd968-f9tn4". list of unmounted volumes=[coredns-token-29dbl]. list of unattached volumes=[config-volume coredns-token-29dbl]
  Warning  FailedMount  7s (x16 over 16m)  kubelet, test4     MountVolume.SetUp failed for volume "coredns-token-29dbl" : couldn't propagate object cache: timed out waiting for the condition

The docker log shows a matching error: Failed to load container mount ...: mount does not exist

[root@test4 profile]# systemctl status docker -l
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-02-08 23:23:56 EST; 50min ago
     Docs: https://docs.docker.com
 Main PID: 956 (dockerd)
   CGroup: /system.slice/docker.service
           ├─ 956 /usr/bin/dockerd
           └─1152 docker-containerd --config /var/run/docker/containerd/containerd.toml

Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.245990170-05:00" level=error msg="Failed to load container mount ebb0891f650ea9643caf4ec8f164a54e8c6dc9d54842ea1ea4bacc72ff4addff: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.248503580-05:00" level=error msg="Failed to load container mount f4e32003f4c0fc39d292b2dd76dd0a0016a0b1e72028c7d4910749fc7836efde: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.250961209-05:00" level=error msg="Failed to load container mount fb5ca71237d38e0bb413ac95a858ee3e41c209a936a1f41081bf2b6a57f10a45: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.253042348-05:00" level=error msg="Failed to load container mount fb8dfb7d9813b638ac24dc9b0cde97ed095c222b22f8d44f082f5130e2f233e4: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.666363859-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.760913207-05:00" level=info msg="Loading containers: done."
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.864408002-05:00" level=info msg="Docker daemon" commit=0520e24 graphdriver(s)=overlay2 version=18.03.0-ce
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.867069598-05:00" level=info msg="Daemon has completed initialization"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.883546083-05:00" level=info msg="API listen on /var/run/docker.sock"
Feb 08 23:23:56 test4 systemd[1]: Started Docker Application Container Engine.

Fix: restart docker.

systemctl restart docker

The pods recover almost immediately:

[root@test4 profile]# kubectl get pods -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69d58bd968-mdskk   1/1     Running   0          3m26s
coredns-69d58bd968-xjqpj   1/1     Running   0          6m28s
kube-flannel-ds-4bgqb      1/1     Running   0          4m56s

Checking docker again — this is what docker looks like when it is healthy and running Kubernetes workloads: dockerd plus docker-containerd plus one docker-containerd-shim process per running container:

[root@test4 profile]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-02-09 00:28:04 EST; 33s ago
     Docs: https://docs.docker.com
 Main PID: 18711 (dockerd)
    Tasks: 246
   Memory: 98.6M
   CGroup: /system.slice/docker.service
           ├─18711 /usr/bin/dockerd
           ├─18718 docker-containerd --config /var/run/docker/containerd/containerd.toml
           ├─19312 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
           └─... (about twenty more docker-containerd-shim processes, one per container, followed in the journal by the usual "shim docker-containerd-shim started" / "shim reaped" lines)
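The same recovery as a short sketch, with a check for the tell-tale dockerd error beforehand and a watch on the pods afterwards:

journalctl -u docker --no-pager | grep "mount does not exist" | tail -n 3   # confirm the symptom
systemctl restart docker
systemctl status docker -l | head -n 12
kubectl get pods -n kube-system -w   # wait until coredns and kube-flannel show Running, then Ctrl-C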
8. Testing coredns: kubectl run -it --rm --image=infoblox/dnstools dns-client hangs
Symptom:

[root@test4 ~]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.

Check the pods:

[root@test4 ~]# kubectl get pod
NAME                            READY   STATUS              RESTARTS   AGE
busybox1-54dc95466f-kmjcp       1/1     ContainerCreating   1          40m
dig-5c7554b84f-sdl8k            1/1     ContainerCreating   1          40m
dns-client-2-56bdd8dfd5-pn5zn   1/1     ContainerCreating   1          40m
dns-client-3-6f98f9f7df-g29d6   1/1     ContainerCreating   1          40m
dns-client-86c6d59f7-znnbb      1/1     ContainerCreating   1          40m
dnstools-6d4979fbbf-294ns       1/1     ContainerCreating   1          40m

Cause: either flannel/coredns were having problems (the docker log did turn out to contain errors), or the CPU was spiking — it was at 86% at the time. It comes down to one of those two.
Fix: shut down one master node to bring CPU usage down, then restart docker and check its status; the healthy docker-with-Kubernetes state is the one shown at the end of point 7 (dockerd plus one docker-containerd-shim per container), so it is not repeated here.

Check the pods again:

[root@test4 ~]# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
busybox1-54dc95466f-kmjcp       1/1     Running   1          40m
dig-5c7554b84f-sdl8k            1/1     Running   1          40m
dns-client-2-56bdd8dfd5-pn5zn   1/1     Running   1          40m
dns-client-3-6f98f9f7df-g29d6   1/1     Running   1          40m
dns-client-86c6d59f7-znnbb      1/1     Running   1          40m
dnstools-6d4979fbbf-294ns       1/1     Running   1          40m

9. Deleting pods has no effect
The first likely cause is, again, CPU running too high — shut down one master node to lower it. The second is that another component has a problem: check whether flannel, coredns and docker are healthy, and above all whether docker is logging errors; that is the key.

10. Whatever the error is, keep checking the state of flannel, coredns and docker — problems very often trace back to one of these components.
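Point 10 in practice: a small hedged check that walks the nodes and the kube-system pods. It assumes passwordless root SSH between the nodes and the node names used throughout these notes:

# node level: load, and whether docker/kubelet are running
for n in test1 test2 test3 test4; do
    echo "== ${n} =="
    ssh root@${n} 'uptime; systemctl is-active docker kubelet'
done

# cluster level: are flannel and coredns actually Running?
kubectl get pods -n kube-system -o wide | egrep 'flannel|coredns'

# and always look at the docker log for errors on the affected node
ssh root@test4 'journalctl -u docker --no-pager | grep "level=error" | tail -n 5'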
11. flannel stuck at Init:0/1, coredns cannot be created

[root@test4 ~]# kubectl get pod -n kube-system
NAME                       READY   STATUS     RESTARTS   AGE
coredns-69d58bd968-brz8w   0/1     Pending    0          9m6s
coredns-69d58bd968-jvfkf   0/1     Pending    0          9m7s
kube-flannel-ds-w2r7l      0/1     Init:0/1   0          3m32s

First check whether there are any docker containers at all:

[root@test4 profile]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Not a single container. Previously, even a pod that failed to create at least left a container ID behind; this time there was not even that, which was odd. The only thing left was the kubelet log:

[root@test4 profile]# cat /var/log/kubernetes/kubelet.test4.root.log.ERROR.20190210-071055.86336
Log file created at: 2019/02/10 07:10:55
Running on machine: test4
Binary: Built with gc go1.11.2 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0210 07:10:55.151126   86336 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
E0210 07:14:56.172087   86336 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

The key line is: rpc error: code = DeadlineExceeded desc = context deadline exceeded. Searching for it online points to a network problem — but the network had never been touched, so what could it be? Then the firewall came to mind. firewalld turned out to be running, even though it had been disabled right at the start of the installation; the guess is that configuring the ip_vs kernel parameters switched it back on as some default behaviour. It just needs to be turned off again:

[root@test4 profile]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-02-10 04:56:42 EST; 2h 28min ago
     Docs: man:firewalld(1)
 Main PID: 28767 (firewalld)
    Tasks: 2
   Memory: 372.0K
   CGroup: /system.slice/firewalld.service
           └─28767 /usr/bin/python2.7 /usr/sbin/firewalld --nofork --nopid

Feb 10 07:10:27 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -D FORWARD -i docker0 -o docker0 -...hain?).
Feb 10 07:10:27 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C PREROUTING -m addrtype -...t name.
(another eight or so COMMAND_FAILED warnings about the docker0 iptables rules follow)
Hint: Some lines were ellipsized, use -l to show in full.

After turning the firewall off and recreating flannel, flannel reaches Running within a few minutes:

[root@test4 profile]# kubectl get pod -n kube-system
NAME                    READY   STATUS    RESTARTS   AGE
kube-flannel-ds-w2r7l   1/1     Running   0          6m33s

And docker now has the flannel containers:

[root@test4 profile]# docker ps -l
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
23930c2ebb47        b949a39093d6        "/opt/bin/flanneld -…"   13 minutes ago      Up 12 minutes                           k8s_kube-flannel_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0

[root@test4 profile]# docker ps -a
CONTAINER ID        IMAGE                                                         COMMAND                  CREATED          STATUS                      PORTS   NAMES
23930c2ebb47        b949a39093d6                                                  "/opt/bin/flanneld -…"   13 minutes ago   Up 13 minutes                       k8s_kube-flannel_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
9d886b599345        b949a39093d6                                                  "cp -f /etc/kube-fla…"   13 minutes ago   Exited (0) 13 minutes ago           k8s_install-cni_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
5c85c66161fb        registry.access.redhat.com/rhel7/pod-infrastructure:latest    "/usr/bin/pod"           13 minutes ago   Up 13 minutes                       k8s_POD_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
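A minimal sketch of keeping firewalld out of the way and recreating the flannel pod; masking the unit guards against it being switched back on behind your back, which is what appears to have happened here. The app=flannel label is the one used by the stock kube-flannel manifest — adjust it if yours differs:

systemctl stop firewalld
systemctl disable firewalld
systemctl mask firewalld        # optional: prevents anything from starting it again
systemctl is-active firewalld   # should no longer report "active"

# recreate the flannel pod so it retries with the firewall gone
kubectl -n kube-system delete pod -l app=flannel
kubectl -n kube-system get pods -w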
12. Pods stuck in Pending

[root@test4 ~]# kubectl get pod -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69d58bd968-brz8w   0/1     Pending   0          9m6s
coredns-69d58bd968-jvfkf   0/1     Pending   0          9m7s

The kubelet log keeps reporting Unable to update cni config. The straightforward fix was to stop using the CNI plugin altogether and strip the CNI-related flags from the kubelet startup parameters (if you would rather keep CNI, see the sketch after this point).

[root@test4 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; static; vendor preset: disabled)
   Active: active (running) since Sun 2019-02-10 05:21:42 EST; 12min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 33598 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─33598 /opt/k8s/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig --cert-dir=/etc/kubernetes/cert --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet.config.json --hostname-override=test4 --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest --allow-privileged=true --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2

Feb 10 05:34:16 test4 kubelet[33598]: W0210 05:34:16.651679   33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:16 test4 kubelet[33598]: E0210 05:34:16.652336   33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
(the same pair of messages repeats every five seconds)
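The warning means exactly what it says: the kubelet was started with --network-plugin=cni and --cni-conf-dir=/etc/cni/net.d, but that directory is empty. If you would rather keep CNI mode than strip the flags, the message goes away once a network config exists there. A hedged sketch using the standard flannel conflist — it assumes the flannel CNI binaries are already in /opt/cni/bin, which the kube-flannel install-cni init container normally takes care of:

mkdir -p /etc/cni/net.d
cat > /etc/cni/net.d/10-flannel.conflist <<'EOF'
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
EOF
systemctl restart kubelet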
13. kubeadm token commands keep complaining that the base64 format is incorrect
This is because the token has expired; rebuilding the k8s cluster resolved it. It is not that the kubeadm command has stopped working — kubeadm token list --kubeconfig ~/.kube/config was tested and runs fine.

14. On the second k8s install flannel was stuck at Init:0/1 again

[root@test4 ~]# kubectl get pod -n kube-system
NAME                       READY   STATUS              RESTARTS   AGE
coredns-69d58bd968-gqsqz   0/1     ContainerCreating   0          11m
coredns-69d58bd968-wg2jf   0/1     ContainerCreating   0          11m
kube-flannel-ds-kljwc      0/1     Init:0/1            0          14m

CPU was already at 93% at this point, so it could have been CPU pressure or the firewall again. Two steps fixed it:
- start the firewall and then stop it again
- shut down one of the master nodes, because CPU usage was far too high

15. After installing the etcd cluster, one etcd member reports unhealthy
Fix: it does not stop the rest of the installation from working, so just keep going — the cluster had accumulated plenty of issues already. The cause was that an earlier run of the install script had hit errors and been stopped partway through; once the install finished, one etcd member showed up as unhealthy. Alternatively, a clean reinstall clears it — verified first-hand.
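A hedged way to see which of the three members is the unhealthy one and whether the cluster still has quorum; it assumes the etcd endpoints used in the apiserver flags earlier and is run on a master that has those client certificates:

export ETCDCTL_API=3
CERTS="--cacert=/etc/kubernetes/cert/ca.pem --cert=/etc/kubernetes/cert/kubernetes.pem --key=/etc/kubernetes/cert/kubernetes-key.pem"

etcdctl ${CERTS} --endpoints=https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379 member list

# check each endpoint on its own; with 2 of 3 healthy the cluster keeps quorum,
# which is why the installation can continue despite one unhealthy member
for ep in https://192.168.0.91:2379 https://192.168.0.92:2379 https://192.168.0.93:2379; do
    etcdctl ${CERTS} --endpoints=${ep} endpoint health
done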
