Environment
CentOS-7-x86_64-DVD-1810
Docker 19.03.9
Kubernetes version: v1.20.5
Before you begin
- One or more Linux machines capable of installing deb or rpm packages
- At least 2 GB of RAM per machine
- At least two CPU cores on the machine used as the control-plane node
- Full network connectivity between all machines in the cluster
Objectives
- Install a single control-plane Kubernetes cluster
- Install a Pod network on the cluster so that Pods can communicate with each other
Installation guide
Install Docker
The installation steps themselves are omitted here.
Note: when installing Docker, specify a version supported by Kubernetes (see below). If the installed Docker version is too new, the following warning appears:
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
Specify the version when installing Docker:
sudo yum install docker-ce-19.03.9 docker-ce-cli-19.03.9 containerd.io
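If you are unsure which Docker versions the configured docker-ce repository offers, you can list them first (this assumes the docker-ce yum repository has already been added):
# yum list docker-ce --showduplicates | sort -r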
If Docker is not installed, running kubeadm init reports the following:
cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH
[preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH
Install kubeadm
If kubeadm is not installed, install it first. If it is already installed, you can update it to the latest version with apt-get update && apt-get upgrade or yum update.
Note: while kubeadm is being upgraded, kubelet restarts every few seconds; this is normal behaviour.
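For reference, a minimal sketch of installing the matching v1.20.5 packages on CentOS 7. The Aliyun yum mirror URL and the pinned package versions are assumptions; adjust them to your environment:
# cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
# yum install -y kubelet-1.20.5 kubeadm-1.20.5 kubectl-1.20.5
# systemctl enable --now kubelet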
Other prerequisites
Disable the firewall
# systemctl stop firewalld && systemctl disable firewalld
Run the command above to stop and disable firewalld; otherwise kubeadm init reports the following:
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
Modify /etc/docker/daemon.json
Edit the /etc/docker/daemon.json file and add the following content:
{
"exec-opts":["native.cgroupdriver=systemd"]
}
Then run systemctl restart docker to restart Docker.
If you skip this step, kubeadm init reports the following:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
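The same change can be scripted and verified in one pass; a sketch, assuming /etc/docker/daemon.json contains nothing else you need to keep:
# cat <<'EOF' > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# systemctl restart docker
# docker info -f '{{.CgroupDriver}}'
The last command should print systemd.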
Install socat, conntrack, and other dependency packages
# yum install socat conntrack-tools
If these dependencies are not installed, kubeadm init reports the following:
[WARNING FileExisting-socat]: socat not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileExisting-conntrack]: conntrack not found in system path
Set net.ipv4.ip_forward to 1
Set the value of net.ipv4.ip_forward to 1, as follows:
# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
Explanation: when net.ipv4.ip_forward is 0, packet forwarding is disabled; when it is 1, forwarding is allowed. If the value is not 1, kubeadm init reports the following:
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
The setting above only takes effect until the next reboot. To make it persistent, do the following:
# echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
Note: some online guides recommend the following for a permanent setting, but in the author's testing it does not actually work:
# echo "sysctl -w net.ipv4.ip_forward=1" >> /etc/rc.local
# chmod +x /etc/rc.d/rc.local
Set net.bridge.bridge-nf-call-iptables to 1
Use the same approach as for net.ipv4.ip_forward; a combined sketch is shown below.
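A sketch that persists both values in a single drop-in file (the file name /etc/sysctl.d/k8s.conf is just a convention; the bridge setting also requires the br_netfilter kernel module to be loaded):
# modprobe br_netfilter
# cat <<'EOF' > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# sysctl --system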
Note: all of the steps above must be performed once on every node in the cluster.
Initialize the control-plane node
The control-plane node is the machine where the control-plane components run, including etcd (the cluster database) and the API Server (which the kubectl command-line tool communicates with).
- (Recommended) If you have plans to upgrade this single control-plane kubeadm cluster to high availability, you should pass the --control-plane-endpoint option to kubeadm init to set a shared endpoint for all control-plane nodes. That endpoint can be a DNS name or an IP address of a load balancer.
- Choose a network add-on and check whether it requires any options to be passed to kubeadm init; this depends on the add-on you choose. For example, with flannel you must pass the --pod-network-cidr option to kubeadm init.
- (Optional) Since version 1.14, kubeadm automatically detects the container runtime. If you need to use a different container runtime, or if more than one runtime is installed, pass the --cri-socket option to kubeadm init.
- (Optional) Unless specified otherwise, kubeadm uses the network interface associated with the default gateway to set the advertise address for this control-plane node's API server. To use a different network interface, pass the --apiserver-advertise-address=<ip-address> option to kubeadm init. To deploy an IPv6 Kubernetes cluster, pass an IPv6 address with --apiserver-advertise-address, for example --apiserver-advertise-address=fd00::101.
- (Optional) Before running kubeadm init, run kubeadm config images pull to verify connectivity to the gcr.io container image registry (see the example after this list).
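For example, to pre-pull the images from the Aliyun mirror that is used later in this article (the explicit version pin is optional):
# kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.20.5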
Now run kubeadm init with the options shown below to initialize the control-plane node. The command first runs a series of pre-flight checks to make sure the machine is ready to run Kubernetes; if the checks find errors it aborts, otherwise it continues, downloading and installing the cluster control-plane components. This may take several minutes.
# kubeadm init --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version stable --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost.localdomain] and IPs [10.96.0.1 10.118.80.93]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 89.062309 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 1sh85v.surdstc5dbrmp1s2
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo \
--discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
As shown above, the message "Your Kubernetes control-plane has initialized successfully!" together with the hints that follow it tells us that the control-plane node was initialized successfully.
Notes:
- If you do not use the --image-repository option to point at the Aliyun mirror, you may see an error like: failed to pull image "k8s.gcr.io/kube-apiserver:v1.20.5": output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), error: exit status 1
- Because the flannel network add-on is used, the --pod-network-cidr option must be specified; otherwise the coredns-xxxxxxxxxx-xxxxx Pods cannot start and remain in the ContainerCreating state, and describing them shows an error like: networkPlugin cni failed to set up pod "coredns-7f89b7bc75-9vrrl_kube-system" network: open /run/flannel/subnet.env: no such file or directory
- The --pod-network-cidr value, i.e. the Pod network, must not be the same as the host network; otherwise, after the flannel add-on is installed, duplicate routes are created and tools such as XShell can no longer ssh into the host. In this practice the host network is 10.118.80.0/24 on interface ens33, so do not use --pod-network-cidr=10.118.80.0/24.
- Also note in particular that the --pod-network-cidr value must match the net-conf.json Network key in the kube-flannel.yml file. In this example that key is 10.244.0.0/16, as shown below, so --pod-network-cidr is set to 10.244.0.0/16 when running kubeadm init:
# cat kube-flannel.yml | grep -E "^\s*\"Network"
      "Network": "10.244.0.0/16",
On the first attempt, --pod-network-cidr=10.1.15.0/24 was used without changing the Network key in kube-flannel.yml, and nodes newly joined to the cluster could not obtain a pod CIDR automatically:
# kubectl get pods --all-namespaces
NAMESPACE     NAME                    READY   STATUS             RESTARTS   AGE
kube-system   kube-flannel-ds-psts8   0/1     CrashLoopBackOff   62         15h
...(omitted)
# kubectl -n kube-system logs kube-flannel-ds-psts8
...(omitted)
E0325 01:03:08.190986       1 main.go:292] Error registering network: failed to acquire lease: node "k8snode1" pod cidr not assigned
W0325 01:03:08.192875       1 reflector.go:424] github.com/coreos/flannel/subnet/kube/kube.go:300: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I0325 01:03:08.193782       1 main.go:371] Stopping shutdownHandler...
Changing the net-conf.json Network key in kube-flannel.yml to 10.1.15.0/24 afterwards still produced the same error (download kube-flannel.yml first, modify the configuration, then install the network add-on).
For the node "xxxxxx" pod cidr not assigned problem above, there is also a temporary workaround circulating online (not verified by the author): manually assign a podCIDR to the node with the following command:
kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}'
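For reference, a sketch of the download-edit-apply workflow mentioned above. The sed expression and the 10.1.15.0/24 value are only illustrative; whatever value you choose must match the --pod-network-cidr passed to kubeadm init:
# curl -LO https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# sed -i 's#10.244.0.0/16#10.1.15.0/24#' kube-flannel.yml
# kubectl apply -f kube-flannel.yml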
Following the output's instructions, to let a non-root user run kubectl normally, run the following commands:
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
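With the kubeconfig in place, a quick sanity check (these two commands are not part of the original transcript) is:
# kubectl cluster-info
# kubectl get nodes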
Record the kubeadm join command from the kubeadm init output; it is needed later to add nodes to the cluster.
The token is used for mutual authentication between the control-plane node and the joining nodes. It must be kept secure, because anyone with this token can add authenticated nodes to the cluster. Tokens can be listed, created, and deleted with the kubeadm token command; see the kubeadm reference guide for details.
Install a Pod network add-on
**You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a Pod network is installed.**
- Take care that the Pod network must not overlap with any of the host networks; you are likely to see problems if there is any overlap. (If you find a collision between the network plugin's preferred Pod network and some of the host networks, choose a suitable replacement CIDR block, pass it to kubeadm init with --pod-network-cidr, and replace the network configuration in the network plugin's YAML accordingly.)
- By default, kubeadm sets up the cluster to use and enforce RBAC (role based access control). Make sure that the Pod network plugin, and any manifests you deploy it with, support RBAC.
- If you want the cluster to use IPv6, either dual-stack or single-stack IPv6 only networking, make sure the plugin supports IPv6. IPv6 support was added to CNI in v0.6.0.
Several projects use CNI to provide Kubernetes networking, and some of them also support Network Policy. A list of add-ons that implement the Kubernetes networking model is available in the Kubernetes addons documentation.
You can install a Pod network add-on by running the following command on the control-plane node, or on any machine that has the kubeconfig credentials. The add-on is deployed directly as a DaemonSet and writes its configuration files to the /etc/cni/net.d directory:
kubectl apply -f <add-on.yaml>
Installing the flannel network add-on
Deploy flannel manually (Kubernetes v1.17+):
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
Reference: https://github.com/flannel-io/flannel#flannel
Only one Pod network can be installed per cluster. After the Pod network is installed, you can check whether it is working by running kubectl get pods --all-namespaces and verifying that the coredns-xxxxxxxxxx-xxx Pods in the output are in the Running state.
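For example, the CoreDNS Pods carry the k8s-app=kube-dns label, so they can be watched directly until they reach Running (a convenience, not part of the original steps):
# kubectl get pods -n kube-system -l k8s-app=kube-dns -w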
View the flannel subnet configuration:
# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
After the flannel network add-on is installed, two virtual network interfaces are automatically created on the host: cni0 and flannel.1.
# ifconfig -a
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.1 netmask 255.255.255.0 broadcast 10.244.0.255
inet6 fe80::705d:43ff:fed6:80c9 prefixlen 64 scopeid 0x20<link>
ether 72:5d:43:d6:80:c9 txqueuelen 1000 (Ethernet)
RX packets 312325 bytes 37811297 (36.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 356346 bytes 206539626 (196.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
inet6 fe80::42:e1ff:fec3:8b6a prefixlen 64 scopeid 0x20<link>
ether 02:42:e1:c3:8b:6a txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3 bytes 266 (266.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.118.80.93 netmask 255.255.255.0 broadcast 10.118.80.255
inet6 fe80::6ff9:dbee:6b27:1315 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:d3:3b:ef txqueuelen 1000 (Ethernet)
RX packets 2092903 bytes 1103282695 (1.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 969483 bytes 253273828 (241.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.0 netmask 255.255.255.255 broadcast 10.244.0.0
inet6 fe80::a49a:2ff:fe38:3e4b prefixlen 64 scopeid 0x20<link>
ether a6:9a:02:38:3e:4b txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 30393748 bytes 5921348235 (5.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 30393748 bytes 5921348235 (5.5 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Re-initialize the control-plane node
During this practice, a misconfigured option was only discovered after the network add-on had been installed, so kubeadm init had to be re-run. The actual steps were as follows:
# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "localhost.localdomain" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config
#
After running the commands above, repeat the control-plane initialization steps and re-install the network add-on.
Problems encountered
After re-running kubeadm init, checking the Pods with kubectl get pods --all-namespaces showed the coredns-xxxxxxxxxx-xxxxxx Pods stuck in the ContainerCreating state:
# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7f89b7bc75-pxvdx 0/1 ContainerCreating 0 8m33s
kube-system coredns-7f89b7bc75-v4p57 0/1 ContainerCreating 0 8m33s
kube-system etcd-localhost.localdomain 1/1 Running 0 8m49s
...(omitted)
Running kubectl describe pod coredns-7f89b7bc75-pxvdx -n kube-system to inspect the Pod revealed the following error:
Warning FailedCreatePodSandBox 98s (x4 over 103s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "04434c63cdf067e698a8a927ba18e5013d2a1a21afa642b3cddedd4ff4592178" network for pod "coredns-7f89b7bc75-pxvdx": networkPlugin cni failed to set up pod "coredns-7f89b7bc75-pxvdx_kube-system" network: failed to set bridge addr: "cni0" already has an IP address different from 10.1.15.1/24
Checking the network interfaces, as shown below, revealed that cni0 still had an IP address assigned (left over from the previous network add-on installation), which caused the new installation to fail to set an IP address on it.
# ifconfig -a
cni0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 10.118.80.1 netmask 255.255.255.0 broadcast 10.118.80.255
inet6 fe80::482d:65ff:fea6:32fd prefixlen 64 scopeid 0x20<link>
ether 4a:2d:65:a6:32:fd txqueuelen 1000 (Ethernet)
RX packets 267800 bytes 16035849 (15.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 116238 bytes 10285959 (9.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
...(omitted)
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.1.15.0 netmask 255.255.255.255 broadcast 10.1.15.0
inet6 fe80::a49a:2ff:fe38:3e4b prefixlen 64 scopeid 0x20<link>
ether a6:9a:02:38:3e:4b txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
...(omitted)
The fix is to delete the misconfigured cni0 interface; it is recreated automatically, and the problem goes away:
$ sudo ifconfig cni0 down
$ sudo ip link delete cni0
Control-plane node toleration (optional)
By default, for security reasons, the cluster does not schedule Pods onto the control-plane node. If you want to be able to schedule Pods on the control-plane node, for example for a single-machine Kubernetes cluster used for development, run the following command:
kubectl taint nodes --all node-role.kubernetes.io/master- # remove the taint from every node whose labels begin with node-role.kubernetes.io/master
In practice:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready control-plane,master 63m v1.20.5
# kubectl taint nodes --all node-role.kubernetes.io/master-
node/localhost.localdomain untainted
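If you later want to restore the default behaviour, the taint can be added back (the node name is taken from this example; the key=:effect syntax adds a taint with an empty value):
# kubectl taint nodes localhost.localdomain node-role.kubernetes.io/master=:NoSchedule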
Add nodes to the cluster
Change the new node's hostname
# hostname
localhost.localdomain
# hostname k8sNode1
Changing the hostname with the command above is only temporary. To keep it across reboots, edit the /etc/hostname file and replace the default localhost.localdomain with the target name (k8sNode1 in this example). If this is not done, later steps report the following warnings:
[WARNING Hostname]: hostname "k8sNode1" could not be reached
[WARNING Hostname]: hostname "k8sNode1": lookup k8sNode1 on 223.5.5.5:53: read udp 10.118.80.94:33293->223.5.5.5:53: i/o timeout
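On CentOS 7 the same change can be made persistent in one step with hostnamectl, which updates /etc/hostname for you:
# hostnamectl set-hostname k8sNode1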
Edit the /etc/hosts file to add a mapping from the node's hostname to its IP address (10.118.80.94 in this example), as follows:
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.118.80.94 k8sNode1
SSH into the target node, switch to the root user (if you logged in as a non-root user), and run the kubeadm join command that was output by kubeadm init on the control-plane machine, in the form:
kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>
You can list the existing, unexpired tokens on the control-plane machine with the following command:
# kubeadm token list
If there is no token, generate a new one on the control-plane machine with the following command:
# kubeadm token create
In practice:
# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
On the control-plane node, i.e. the master machine, check whether the new node has been added:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8snode1 NotReady <none> 74s v1.20.5
localhost.localdomain Ready control-plane,master 7h24m v1.20.5
As shown above, a new node, k8snode1, has been added.
Problems encountered
Problem 1: kubeadm join fails with the following error:
# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "ap4vvq"
To see the stack trace of this error execute with --v=5 or higher
Solution:
The token has expired; run the kubeadm token create command to generate a new one.
Problem 2: kubeadm join fails with the following error:
# kubeadm join 10.118.80.93:6443 --token pa0gxw.4vx2wud1e7e0rzbx --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: cluster CA found in cluster-info ConfigMap is invalid: none of the public keys "sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f" are pinned
To see the stack trace of this error execute with --v=5 or higher
Solution:
The discovery-token-ca-cert-hash is no longer valid; run the following command to obtain the hash value again:
# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f
Use the hash value from the output:
--discovery-token-ca-cert-hash sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f
Problem 3: cni config uninitialized
The Kubernetes built-in UI showed the newly joined node in the KubeletNotReady state, with the following message:
[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful, runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, CSINode is not yet initialized, missing node capacity for resources: ephemeral-storage]
Solution: re-install the CNI plugins (this practice used virtual machines, and the snapshot in use may not have included the network plugins), then clean up the node again, and finally re-join it:
# CNI_VERSION="v0.8.2"
# mkdir -p /opt/cni/bin
# curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" | sudo tar -C /opt/cni/bin -xz
Cleanup
If you used disposable servers in the cluster for testing, you can simply switch them off; no further cleanup is needed. You can use kubectl config delete-cluster to delete your local references to the cluster (not tested by the author).
However, if you want to deprovision the cluster more cleanly, you should first drain the node, make sure that the node is emptied, and only then remove it.
Remove a node
Operations on the control-plane node
First run the following command on the control-plane node to drain the node that is to be removed, forcibly evicting its data:
kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets
In practice:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8snode1 Ready <none> 82m v1.20.5
localhost.localdomain Ready control-plane,master 24h v1.20.5
# kubectl drain k8snode1 --delete-emptydir-data --force --ignore-daemonsets
node/k8snode1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4xqcc, kube-system/kube-proxy-c7qzs
evicting pod default/nginx-deployment-64859b8dcc-v5tcl
evicting pod default/nginx-deployment-64859b8dcc-qjrld
evicting pod default/nginx-deployment-64859b8dcc-rcvc8
pod/nginx-deployment-64859b8dcc-rcvc8 evicted
pod/nginx-deployment-64859b8dcc-qjrld evicted
pod/nginx-deployment-64859b8dcc-v5tcl evicted
node/k8snode1 evicted
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready control-plane,master 24h v1.20.5
Operations on the target node
Log in to the target node and run the following command:
# kubeadm reset
The command above does not reset or clean up the iptables or IPVS tables. If you need to reset iptables, you must also run the following command manually:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If you need to reset IPVS, you must run the following command:
ipvsadm -C
Note: unless you have a specific need, do not reset the network.
Delete the node configuration files:
# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config
Operations on the control-plane node
Delete the node by running kubectl delete node <node name>.
Delete any Pods that were not removed:
# kubectl delete pod kube-flannel-ds-4xqcc -n kube-system --force
# kubectl delete pod kube-proxy-c7qzs -n kube-system --force
# kubectl delete node k8snode1
node "k8snode1" deleted
After deletion, the node can be re-joined later by running kubeadm join with the appropriate arguments.
Clean up the control plane
On the control-plane node, use the kubeadm reset command. See the kubeadm reset command reference for details.
References
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/