This post records problems (and their fixes) encountered while deploying Docker and Kubernetes.
Part of this section is excerpted from the internet; the rest are real issues I hit in my own test environment. I will keep sharing problems that come up while developing and operating Docker/Kubernetes, so that those who come after can avoid the same detours.
Series table of contents
Kubernetes NodePort cannot be accessed
Environment:
OS: CentOS 7.1
Kubelet: 1.6.7
Docker: 17.06-ce
Calico: 2.3
K8s cluster: master, node-1, node-2
Problem:
There is a service A. To make it reachable from outside the cluster, its service type was set to NodePort, with port 31246.
The pod backing A runs on node-1.
Testing shows that external access to master:31246 and node-2:31246 both fail; only node-1:31246 works.
Cause:
For security reasons, starting with version 1.13 Docker sets the default policy of the iptables FORWARD chain to DROP, and adds explicit accept rules only for containers attached to the docker0 bridge. Quoting the description from moby issue #14041:
When docker starts, it enables net.ipv4.ip_forward without changing the iptables FORWARD chain default policy to DROP. This means that another machine on the same network as the docker host can add a route to their routing table, and directly address any containers running on that docker host.
For example, if the docker0 subnet is 172.17.0.0/16 (the default subnet), and the docker host’s IP address is 192.168.0.10, from another host on the network run:
$ ip route add 172.17.0.0/16 via 192.168.0.10
$ nmap 172.17.0.0/16
The above will scan for containers running on the host, and report IP addresses & running services found.
To fix this, docker needs to set the FORWARD policy to DROP when it enables the net.ipv4.ip_forward sysctl parameter.
The CNI plugins used by Kubernetes are affected by this (CNI does not create matching rules in the FORWARD chain), so nodes other than the one hosting the pod cannot forward the packets and the access fails.
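To confirm this on an affected node, check the default policy of the FORWARD chain (a quick diagnostic sketch):
# on affected hosts this prints "Chain FORWARD (policy DROP ...)"
iptables -nvL FORWARD | head -3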
Solution:
If your security requirements allow it, set the default policy of the FORWARD chain back to ACCEPT:
iptables -P FORWARD ACCEPT
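Note that Docker re-applies its rules when the daemon restarts, so the policy may flip back to DROP. A minimal sketch for re-applying the rule automatically, assuming a standard systemd-based CentOS install and the default docker unit name:
mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF > /etc/systemd/system/docker.service.d/forward-accept.conf
[Service]
ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT
EOF
systemctl daemon-reload
systemctl restart docker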
Google: network unreachable
https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml: [Errno 14] curl#7 - "Failed to connect to 2404:6800:4005:809::200e: Network is unreachable"
A classic problem: you need your own proxy to reach Google's servers.
Setting a yum proxy on CentOS
vi /etc/yum.conf
# add the following line:
proxy=http://xxx.xx.x.xx:xxx  # proxy address
If you have no proxy, switch to the Aliyun mirror instead:
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
EOF
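With the repo in place, a typical next step (a sketch; pin the package versions to match the cluster you are building if needed) is to install the components and enable kubelet:
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet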
Disabling swap
F1213 10:20:53.304755 2266 server.go:261] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained:
Run swapoff -a.
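swapoff -a only lasts until the next reboot. A small sketch for making it persistent by commenting out the swap entry in /etc/fstab (double-check the file afterwards):
swapoff -a
# comment out any swap line so it does not come back after a reboot
sed -ri 's/^([^#].*\sswap\s)/#\1/' /etc/fstab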
Errors when initializing the master
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Set both values to 1 as the message suggests:
echo "1" >/proc/sys/net/ipv4/ip_forward
echo "1" >/proc/sys/net/bridge/bridge-nf-call-iptables
kubeadm init hangs while pulling images
# kubeadm config images list --kubernetes-version v1.13.0   # list the images required by this version
# pull the images from mirrors on Docker Hub
docker pull mirrorgooglecontainers/kube-apiserver:v1.13.0
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.0
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.0
docker pull mirrorgooglecontainers/kube-proxy:v1.13.0
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker pull coredns/coredns:1.2.6
# re-tag the images with the k8s.gcr.io names kubeadm expects
docker tag docker.io/mirrorgooglecontainers/kube-proxy:v1.13.0 k8s.gcr.io/kube-proxy:v1.13.0
docker tag docker.io/mirrorgooglecontainers/kube-scheduler:v1.13.0 k8s.gcr.io/kube-scheduler:v1.13.0
docker tag docker.io/mirrorgooglecontainers/kube-apiserver:v1.13.0 k8s.gcr.io/kube-apiserver:v1.13.0
docker tag docker.io/mirrorgooglecontainers/kube-controller-manager:v1.13.0 k8s.gcr.io/kube-controller-manager:v1.13.0
docker tag docker.io/mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag docker.io/mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag docker.io/coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
# remove the mirror tags
docker rmi docker.io/mirrorgooglecontainers/kube-proxy:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/kube-scheduler:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/kube-apiserver:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/kube-controller-manager:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/etcd:3.2.24
docker rmi docker.io/mirrorgooglecontainers/pause:3.1
docker rmi docker.io/coredns/coredns:1.2.6
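The same pull/tag/clean-up cycle can be written as a short loop (a sketch using the same image list and versions as above):
KUBE_VERSION=v1.13.0
images="kube-apiserver:${KUBE_VERSION} kube-controller-manager:${KUBE_VERSION} kube-scheduler:${KUBE_VERSION} kube-proxy:${KUBE_VERSION} pause:3.1 etcd:3.2.24"
for img in $images; do
  docker pull "mirrorgooglecontainers/${img}"
  docker tag  "mirrorgooglecontainers/${img}" "k8s.gcr.io/${img}"
  docker rmi  "mirrorgooglecontainers/${img}"
done
# coredns lives under its own namespace on Docker Hub
docker pull coredns/coredns:1.2.6
docker tag  coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
docker rmi  coredns/coredns:1.2.6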
Finally, we see it:
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 192.168.232.204:6443 --token m2hxkd.scxjrxgew6pyhvmb --discovery-token-ca-cert-hash sha256:8b94cefbe54ae4b3d7201012db30966c53870aad55be80a2888ec0da178c3610
Network configuration
# /etc/hosts on my VMs
192.168.232.204 k8a204
192.168.232.203 k8a203
192.168.232.202 k8a202
Install the network add-on you chose by following its manual, wait until DNS is up, and then add the nodes.
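For example (a sketch; the manifest file name is a placeholder, use the one from your add-on's documentation):
kubectl apply -f calico.yaml
# watch the kube-system pods until coredns is Running
kubectl get pods -n kube-system -w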
[root@localhost .kube]# kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
k8a204   Ready    master   6m6s   v1.13.0

After the node joins:

[root@localhost .kube]# kubectl get nodes
NAME     STATUS     ROLES    AGE     VERSION
k8a203   NotReady   <none>   4s      v1.13.0
k8a204   Ready      master   6m19s   v1.13.0
Note: this step is slow; be patient.
kubectl get pods --all-namespaces
=============== output ===============
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-86c58d9df4-2vdvx 1/1 Running 0 7m32s
kube-system coredns-86c58d9df4-88fjk 1/1 Running 0 7m32s
kube-system etcd-k8a204 1/1 Running 0 6m39s
kube-system kube-apiserver-k8a204 1/1 Running 0 6m30s
kube-system kube-controller-manager-k8a204 1/1 Running 0 6m30s
kube-system kube-proxy-tl7g5 1/1 Running 0 7m32s
kube-system kube-proxy-w2jgl 0/1 ContainerCreating 0 95s
kube-system kube-scheduler-k8a204 1/1 Running 0 6m49s
The node stays NotReady after joining
Continuing from the previous issue:
While pods are in **ContainerCreating** status, just be patient; but if there is still no progress after 10 minutes, something has definitely gone wrong.
The most common cause: the node cannot pull the required images.
Workaround:
1) On the master, save the images to tar files:
docker save -o /opt/kube-pause.tar k8s.gcr.io/pause:3.1
docker save -o /opt/kube-proxy.tar k8s.gcr.io/kube-proxy:v1.13.0
docker save -o /opt/kube-flannel1.tar quay.io/coreos/flannel:v0.9.1
docker save -o /opt/kube-flannel2.tar quay.io/coreos/flannel:v0.10.0-amd64
docker save -o /opt/kube-calico1.tar quay.io/calico/cni:v3.3.2
docker save -o /opt/kube-calico2.tar quay.io/calico/node:v3.3.2
2) Copy the files to the node:
scp /opt/*.tar root@192.168.232.203:/opt/
3) On the node, load the images with docker:
docker load -i /opt/kube-flannel1.tar
docker load -i /opt/kube-flannel2.tar
docker load -i /opt/kube-proxy.tar
docker load -i /opt/kube-pause.tar
docker load -i /opt/kube-calico1.tar
docker load -i /opt/kube-calico2.tar
4) Verify the images on the node:
docker images
================ output ================
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.13.0 8fa56d18961f 9 days ago 80.2 MB
quay.io/calico/node v3.3.2 4e9be81e3a59 9 days ago 75.3 MB
quay.io/calico/cni v3.3.2 490d921fa49c 9 days ago 75.4 MB
quay.io/coreos/flannel v0.10.0-amd64 f0fad859c909 10 months ago 44.6 MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 11 months ago 742 kB
quay.io/coreos/flannel v0.9.1 2b736d06ca4c 13 months ago 51.3 MB
Done: all services are now Running.
[root@localhost .kube]# kubectl get pods --all-namespaces
================ output ================
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-4dsg5 1/2 Running 0 42m
kube-system calico-node-5dtk2 1/2 Running 0 41m
kube-system calico-node-78qvp 1/2 Running 0 41m
kube-system coredns-86c58d9df4-26vr7 1/1 Running 0 43m
kube-system coredns-86c58d9df4-s5ljf 1/1 Running 0 43m
kube-system etcd-k8a204 1/1 Running 0 42m
kube-system kube-apiserver-k8a204 1/1 Running 0 42m
kube-system kube-controller-manager-k8a204 1/1 Running 0 42m
kube-system kube-proxy-8c7hs 1/1 Running 0 41m
kube-system kube-proxy-dls8l 1/1 Running 0 41m
kube-system kube-proxy-t65tc 1/1 Running 0 43m
kube-system kube-scheduler-k8a204 1/1 Running 0 42m
Recovering the master after a reboot
swapoff -a
# start all containers
# shorter form: docker start $(docker ps -aq)
docker start $(docker ps -a | awk '{ print $1}' | tail -n +2)
systemctl start kubelet
# check startup errors
journalctl -xefu kubelet
# --restart=always is a per-container restart policy: such containers come back automatically when the docker daemon starts
docker run --restart=always
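To avoid most of this manual work after a reboot, the daemons themselves can be enabled at boot (a small sketch):
systemctl enable docker kubelet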
DNS resolution of kubernetes.default fails
Installed busybox for a DNS check, but kept getting the following error:
kubectl exec -ti busybox -- nslookup kubernetes.default
============= output ============
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
It turns out that DNS resolution in newer busybox images changed (or is buggy); switching to an older busybox image (<= 1.28.4) makes the test pass.
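A minimal way to rerun the check with the older image (a sketch; the pod name is arbitrary):
kubectl run busybox --image=busybox:1.28 --restart=Never --command -- sleep 3600
kubectl exec -ti busybox -- nslookup kubernetes.default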
Regenerating the token after it expires
# create a new token
kubeadm token create
# compute the CA certificate hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
# join the node using the new token and hash
# replace the master address, token and hash with your own
kubeadm join 192.168.232.204:6443 --token m87q91.gbcqhfx9ansvaf3o --discovery-token-ca-cert-hash sha256:fdd34ef6c801e382f3fb5b87bc9912a120bf82029893db121b9c8eae29e91c62
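As a shortcut (assuming a reasonably recent kubeadm release), the two steps above can be combined; kubeadm can print a ready-to-use join command:
# prints a complete `kubeadm join ...` line including a fresh token and the CA cert hash
kubeadm token create --print-join-command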