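1 Background
While deploying a k8s node, I changed the kubelet's infrastructure (pause) image to one hosted in a private registry. After that, every pod creation failed with an error saying the pause image could not be pulled.
2 Symptoms
The pod never starts and stays in ContainerCreating: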
[root@node-08 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-5bbf8d494b-qf98r 0/1 ContainerCreating 0 94s <none> 172.20.59.57 <none> <none>
kubectl describe pod reports the following errors:
Normal Scheduled <unknown> default-scheduler Successfully assigned default/zmm-nginx-deployment-66548984d9-ghx59 to 172.20.59.57
Warning MissingClusterDNS 8s (x2 over 27s) kubelet, 172.20.59.57 pod: "zmm-nginx-deployment-66548984d9-ghx59_default(3f71451b-9004-43b9-9519-047041bd8c35)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Warning FailedCreatePodSandBox 2s (x2 over 20s) kubelet, 172.20.59.57 Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "172.20.59.190/kubernetes/pause-amd64:3.1": Error response from daemon: pull access denied for 172.20.59.190/kubernetes/pause-amd64, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
The kubelet log on the node shows the following error:
Oct 28 18:17:48 nccztsjb-node-08 kubelet: E1028 18:17:48.761788 15938 pod_workers.go:191] Error syncing pod 18eca05b-803e-413b-ad1a-de948fe212ce ("zmm-nginx-deployment-78667db46c-n7tbp_default(18eca05b-803e-413b-ad1a-de948fe212ce)"), skipping: failed to "CreatePodSandbox" for "zmm-nginx-deployment-78667db46c-n7tbp_default(18eca05b-803e-413b-ad1a-de948fe212ce)" with CreatePodSandboxError: "CreatePodSandbox for pod \"zmm-nginx-deployment-78667db46c-n7tbp_default(18eca05b-803e-413b-ad1a-de948fe212ce)\" failed: rpc error: code = Unknown desc = failed pulling image \"172.20.59.190/kubernetes/pause-amd64:3.1\": Error response from daemon: pull access denied for 172.20.59.190/kubernetes/pause-amd64, repository does not exist or may require 'docker login': denied: requested access to the resource is denied"
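When the output is long, it can help to pull out just the image references that failed. The filter below is a minimal sketch: failed_images is a name made up for this example, and it assumes the error text has the `failed pulling image "..."` form shown above.

```shell
#!/bin/sh
# failed_images: hypothetical helper that reads kubelet log lines or
# `kubectl describe pod` output on stdin and prints each distinct image
# named in a "failed pulling image" error.
failed_images() {
    # The quote before the image may be escaped (\") in the kubelet log,
    # so an optional backslash is allowed in front of it.
    sed -n 's/.*failed pulling image \\\{0,1\}"\([^"\\]*\).*/\1/p' | sort -u
}

# Usage (needs cluster access; <pod> is a placeholder):
#   kubectl describe pod <pod> | failed_images
```

Comparing the printed reference with the value of --pod-infra-container-image is a quick way to confirm which registry and project the node is actually trying to reach.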
3 Analysis
The output above shows that the node is not authorized to pull the pause image from the private registry.
The kubelet unit file is shown below. The --pod-infra-container-image flag sets the image used for each pod's infrastructure (pause) container.
[Unit]
Description=Kubernetes Kubelet Server
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet \
--kubeconfig=/root/.kube/config \
--hostname-override=172.20.59.57 \
--pod-infra-container-image=172.20.59.190:81/k8s/pause-amd64:3.1 \
--logtostderr=false \
--log-dir=/var/log/kubernetes \
--v=4
Restart=on-failure
[Install]
WantedBy=multi-user.target
4 Attempted fixes
4.1 Run docker login against the registry on the node and fetch the pause image with docker pull. This works, but every node has to cache the pause image locally.
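On each node, option 4.1 comes down to two docker commands. The sketch below uses the registry address and image path from the kubelet unit file. The run wrapper and the cache_pause_image name are made up for this example, and by default the function only prints the commands so they can be reviewed before being executed with RUN=1.

```shell
#!/bin/sh
# Registry address and image path as configured in the kubelet unit file.
REGISTRY="172.20.59.190:81"
PAUSE_IMAGE="${REGISTRY}/k8s/pause-amd64:3.1"

# run: hypothetical dry-run wrapper. It prints the command unless RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# cache_pause_image: log in to the registry, then pull the pause image
# so that it is cached in the node's local docker image store.
cache_pause_image() {
    run docker login "$REGISTRY"   # prompts for username and password
    run docker pull "$PAUSE_IMAGE"
}

# Usage:
#   cache_pause_image          # dry run, prints the two commands
#   RUN=1 cache_pause_image    # actually logs in and pulls
```

This has to be repeated on every node, which is the main drawback of this option.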
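4.2 Create a docker-registry secret and reference it through imagePullSecrets in the pod's YAML. Creating the pod still fails with the same error.
This shows that the pod's containers and the pause image pulled by the kubelet use different credentials.
I went through all of the kubelet's flags and did not find one related to registry credentials.
For how to configure the secret, see: "How pods in a Kubernetes cluster pull images from a private registry (Harbor)".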
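For reference, the setup tried in 4.2 might look roughly like the sketch below. The secret name harbor-cred, the pod name private-app, the image path k8s/nginx:latest, and the render_pod helper are all assumptions made for this example; only the registry address comes from the kubelet unit file.

```shell
#!/bin/sh
# Hypothetical names used only for illustration.
SECRET_NAME="harbor-cred"
REGISTRY="172.20.59.190:81"

# render_pod: print a Pod manifest whose application image is pulled with
# the secret. As observed in 4.2, imagePullSecrets applies to the images
# listed under the pod's containers; it did not help with the pause image.
render_pod() {
cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  imagePullSecrets:
  - name: ${SECRET_NAME}
  containers:
  - name: app
    image: ${REGISTRY}/k8s/nginx:latest
EOF
}

# Usage (needs cluster access; replace the placeholder credentials):
#   kubectl create secret docker-registry "$SECRET_NAME" \
#     --docker-server="$REGISTRY" \
#     --docker-username=<user> --docker-password=<password>
#   render_pod | kubectl apply -f -
```

The secret must live in the same namespace as the pod that references it.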
4.3 Change the Harbor project that holds the pause image to public. This solves the problem.

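Reasoning:
- 1. The kubelet's default pause image is gcr.io/google_containers/pause-amd64, which is pulled from a public registry, so no permissions or authentication are needed.
- 2. We pointed the kubelet at our own pause image, but it lives in a private project, so pulling it requires authentication.
- 3. After changing the Harbor project that holds the pause image to public, I deleted the local pause and application images and recreated the pod. The pause image was pulled successfully, the pod started, and the image shows up in docker images.
- 4. Conclusion: pull the pause image from a public project, and pull application images from the private project through a secret.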
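The check in step 3 can be repeated on any node with a few docker commands. This is a minimal sketch that reuses the registry address and image path from the kubelet unit file. The run wrapper and the verify_anonymous_pull name are made up for this example, and the function only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Registry address and image path as configured in the kubelet unit file.
REGISTRY="172.20.59.190:81"
PAUSE_IMAGE="${REGISTRY}/k8s/pause-amd64:3.1"

# run: hypothetical dry-run wrapper. It prints the command unless RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# verify_anonymous_pull: drop stored credentials and the cached image,
# then pull again. Once the Harbor project is public, the pull should
# succeed without a login.
verify_anonymous_pull() {
    run docker logout "$REGISTRY"    # remove saved credentials for this registry
    run docker rmi "$PAUSE_IMAGE"    # remove the locally cached copy, if any
    run docker pull "$PAUSE_IMAGE"   # anonymous pull
}

# Usage:
#   verify_anonymous_pull          # dry run
#   RUN=1 verify_anonymous_pull    # execute on the node
```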
