Kubernetes: Creating a Cluster with kubeadm


Environment

CentOS-7-x86_64-DVD-1810

Docker 19.03.9

Kubernetes version: v1.20.5

Before you begin

One or more Linux machines capable of installing deb or rpm packages

At least 2 GB of RAM per machine

At least 2 CPU cores on the machine that will act as the control-plane node

Full network connectivity between all machines in the cluster

Objectives

  • Install a single control-plane Kubernetes cluster
  • Install a Pod network on the cluster so that Pods can communicate with each other

Installation guide

Install Docker

The Docker installation steps themselves are omitted here.

Note: when installing Docker, install a version supported by Kubernetes (see below). If the installed Docker version is too new, kubeadm will print the following warning:

[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03

Specify the version when installing Docker:

sudo yum install docker-ce-19.03.9 docker-ce-cli-19.03.9 containerd.io
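
For reference, a fuller sketch of installing the pinned version on a fresh CentOS 7 host, assuming the official Docker CE yum repository is used (the repo-setup commands are an addition, not part of the original steps):

# add the Docker CE repository, then install the pinned packages
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce-19.03.9 docker-ce-cli-19.03.9 containerd.io
sudo systemctl enable --now docker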

If Docker is not installed, kubeadm init will report the following errors:

cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH

[preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH

Install kubeadm

If kubeadm is not installed yet, install it first. If it is already installed, you can upgrade it to the latest version with apt-get update && apt-get upgrade (Debian/Ubuntu) or yum update (CentOS).

Note: while kubeadm is being upgraded, kubelet restarts every few seconds; this is expected, as kubelet waits in a crash loop for instructions from kubeadm.
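
For CentOS 7, installing a pinned kubeadm/kubelet/kubectl can look like the sketch below, assuming the Google yum repository that was current for v1.20 (a domestic mirror can be substituted in baseurl and gpgkey if packages.cloud.google.com is unreachable):

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

# SELinux in permissive mode so that containers can access the host filesystem
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# pin the versions to match the cluster version used in this article
sudo yum install -y kubelet-1.20.5 kubeadm-1.20.5 kubectl-1.20.5 --disableexcludes=kubernetes
sudo systemctl enable --now kubelet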

Other prerequisites

Disable the firewall

# systemctl stop firewalld && systemctl disable firewalld

Run the command above to stop and disable firewalld; otherwise kubeadm init will report the following warning:

[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
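
Alternatively, if you prefer to keep firewalld running, the ports named in the warning can be opened instead of disabling the firewall entirely (a sketch using firewall-cmd; worker nodes need further ports such as the NodePort range, which is not shown here):

firewall-cmd --permanent --add-port=6443/tcp    # Kubernetes API server
firewall-cmd --permanent --add-port=10250/tcp   # kubelet API
firewall-cmd --reload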

Edit /etc/docker/daemon.json

Edit /etc/docker/daemon.json and add the following content:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}

Then restart Docker with systemctl restart docker.
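
After the restart you can verify that the change took effect; this uses the same docker info template that kubeadm itself queries:

# should print "systemd" once the new daemon.json is in effect
docker info -f '{{.CgroupDriver}}'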

If you skip this step, kubeadm init will report the following warning:

[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

Install the socat and conntrack dependencies

# yum install socat conntrack-tools

If these packages are not installed, kubeadm init will report the following errors:

[WARNING FileExisting-socat]: socat not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileExisting-conntrack]: conntrack not found in system path

Set net.ipv4.ip_forward to 1

Set net.ipv4.ip_forward to 1 as follows:

# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1  

Note: when net.ipv4.ip_forward is 0, the kernel does not forward packets; when it is 1, forwarding is enabled. If the value is not 1, kubeadm init will report the following error:

[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1

The setting above is only effective until the next reboot. To make it persistent, run:

# echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

Note: some online articles recommend the following for a persistent setting, but in the author's testing it does not actually work:

# echo "sysctl -w net.ipv4.ip_forward=1" >> /etc/rc.local 
# chmod +x /etc/rc.d/rc.local

Set net.bridge.bridge-nf-call-iptables to 1

Follow the same procedure as for net.ipv4.ip_forward; a consolidated sketch is shown below.
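
The following sketch persists both settings with /etc/sysctl.d and loads the br_netfilter module that the bridge sysctl depends on (using /etc/sysctl.d and /etc/modules-load.d is an alternative to appending to /etc/sysctl.conf as shown above):

# load br_netfilter now and on every boot so that the bridge sysctl exists
modprobe br_netfilter
echo "br_netfilter" > /etc/modules-load.d/k8s.conf

# persist both settings and apply them immediately
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system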

Note: all of the steps above must be performed on every node in the cluster.

Initialize the control-plane node

The machine on which the control-plane components run is called the control-plane node; these components include etcd (the cluster database) and the API Server (which the kubectl command-line tool talks to).

  1. (Recommended) If you plan to upgrade this single control-plane kubeadm cluster to high availability later, pass --control-plane-endpoint to kubeadm init to set a shared endpoint for all control-plane nodes. The endpoint can be a DNS name or an IP address of a load balancer.

  2. Choose a Pod network add-on and check whether it requires any options to be passed to kubeadm init; this depends on the add-on. For example, flannel requires the --pod-network-cidr option to be passed to kubeadm init.

  3. (Optional) Since version 1.14, kubeadm auto-detects the container runtime. To use a different container runtime, or if more than one runtime is installed, pass --cri-socket to kubeadm init.

  4. (Optional) Unless specified otherwise, kubeadm uses the network interface associated with the default gateway to set the advertise address for this control-plane node's API server. To use a different network interface, pass --apiserver-advertise-address=<ip-address> to kubeadm init. To deploy an IPv6 Kubernetes cluster, pass an IPv6 address via --apiserver-advertise-address, for example --apiserver-advertise-address=fd00::101.

  5. (Optional) Before running kubeadm init, run kubeadm config images pull to verify connectivity to the gcr.io container image registry (see the sketch after this list).
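
In this environment, where the Aliyun mirror is used instead of gcr.io (see the kubeadm init command below), the pre-pull step could look like this sketch:

# pre-pull the control-plane images from the same mirror used at init time
kubeadm config images pull \
  --image-repository=registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.20.5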

Run kubeadm init with the appropriate options, as shown below, to initialize the control-plane node. The command first runs a series of pre-flight checks to verify that the machine is ready to run Kubernetes; if the checks find errors, it exits, otherwise it continues, downloading and installing the cluster control-plane components. This can take several minutes.

# kubeadm init --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version stable  --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost.localdomain] and IPs [10.96.0.1 10.118.80.93]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 89.062309 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 1sh85v.surdstc5dbrmp1s2
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo \     
    --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a

As shown above, the command prints "Your Kubernetes control-plane has initialized successfully!" along with further instructions, telling us that the control-plane node was initialized successfully.

Notes:

  1. If you do not use the --image-repository option to point at the Alibaba Cloud mirror, you may get errors like the following:

    failed to pull image "k8s.gcr.io/kube-apiserver:v1.20.5": output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    , error: exit status 1
    
  2. Because the flannel network add-on is used, the --pod-network-cidr option must be specified; otherwise the coredns-xxxxxxxxxx-xxxxx Pods cannot start and stay in the ContainerCreating state. Describing such a Pod shows errors like the following:

    networkPlugin cni failed to set up pod "coredns-7f89b7bc75-9vrrl_kube-system" network: open /run/flannel/subnet.env: no such file or directory
    
  3. The --pod-network-cidr value, i.e. the Pod network, must not be the same as the host network; otherwise, after the flannel add-on is installed, duplicate routes are created and tools such as XShell can no longer ssh into the host. For example:

    In this environment the host network is 10.118.80.0/24 on interface ens33, so the following would be wrong:

    --pod-network-cidr=10.118.80.0/24

  4. Also note that the --pod-network-cidr value must match the net-conf.json Network key in the kube-flannel.yml file (in this example that key is 10.244.0.0/16, as shown below, so --pod-network-cidr is set to 10.244.0.0/16 when running kubeadm init):

    # cat kube-flannel.yml|grep -E "^\s*\"Network"
          "Network": "10.244.0.0/16",
    

    On the first attempt, --pod-network-cidr=10.1.15.0/24 was used without changing the Network key in kube-flannel.yml, and a node newly joined to the cluster could not obtain a pod CIDR automatically:

    # kubectl get pods --all-namespaces
    NAMESPACE              NAME                                            READY   STATUS             RESTARTS   AGE
    kube-system   kube-flannel-ds-psts8                           0/1     CrashLoopBackOff   62         15h
    ... (output truncated)
    # kubectl -n kube-system logs kube-flannel-ds-psts8
    ... (output truncated)
    E0325 01:03:08.190986       1 main.go:292] Error registering network: failed to acquire lease: node "k8snode1" pod cidr not assigned
    W0325 01:03:08.192875       1 reflector.go:424] github.com/coreos/flannel/subnet/kube/kube.go:300: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
    I0325 01:03:08.193782       1 main.go:371] Stopping shutdownHandler...
    

    Later, the Network key in kube-flannel.yml was changed to 10.1.15.0/24 (download kube-flannel.yml first, edit the configuration, then install the network add-on), but the same error still appeared.

    For the node "xxxxxx" pod cidr not assigned problem above, a temporary workaround is also mentioned online (not verified by the author): manually assign a podCIDR to the node with the following command:

    kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}'
    

Following the instructions in the output, run the commands below so that a non-root user can also use kubectl:

# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, as the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

Record the kubeadm join command from the kubeadm init output; it is needed later to add nodes to the cluster.

The token is used for mutual authentication between the control-plane node and joining nodes. It must be kept secure, because anyone who holds it can add authenticated nodes to the cluster. Tokens can be listed, created, and deleted with the kubeadm token command; see the kubeadm reference guide for details.

Install a Pod network add-on

**You must deploy a Container Network Interface (CNI) based Pod network add-on so that Pods can communicate with each other. Cluster DNS (CoreDNS) does not start until a Pod network is installed.**

  • Note that the Pod network must not overlap with any host network; if it does, problems will occur. (If you find a conflict between the network add-on's preferred Pod network and some of your host networks, choose a suitable CIDR block, pass it via --pod-network-cidr when running kubeadm init, and replace the network configuration in the add-on's YAML accordingly.)
  • By default, kubeadm sets up the cluster to enforce RBAC (role-based access control). Make sure your Pod network add-on, and any manifests you deploy with it, support RBAC.
  • If you want the cluster to use IPv6, either dual-stack or single-stack IPv6 networking, make sure the add-on supports IPv6. IPv6 support was added to CNI in v0.6.0.

Several projects provide Kubernetes networking via CNI, and some of them also support Network Policy. A list of add-ons that implement the Kubernetes network model is available at:

https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-networking-model

You can install a Pod network add-on on the control-plane node, or on any machine that has kubeconfig credentials, by running the command below. The add-on is installed as a DaemonSet and writes its configuration into the /etc/cni/net.d directory:

kubectl apply -f <add-on.yaml>

Install the flannel network add-on

Deploy flannel manually (Kubernetes v1.17+):

# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

Reference: https://github.com/flannel-io/flannel#flannel

Only one Pod network can be installed per cluster. After the Pod network is installed, you can verify that it is working by running kubectl get pods --all-namespaces and checking that the coredns-xxxxxxxxxx-xxx Pods are in the Running state.
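
A narrower check, assuming the default k8s-app=kube-dns label that the CoreDNS Deployment carries:

# both coredns pods should eventually report READY 1/1 and STATUS Running
kubectl -n kube-system get pods -l k8s-app=kube-dns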

View the flannel subnet configuration:

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

After the flannel add-on is installed, two additional virtual network interfaces appear on the host: cni0 and flannel.1.

# ifconfig -a
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::705d:43ff:fed6:80c9  prefixlen 64  scopeid 0x20<link>
        ether 72:5d:43:d6:80:c9  txqueuelen 1000  (Ethernet)
        RX packets 312325  bytes 37811297 (36.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 356346  bytes 206539626 (196.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:e1ff:fec3:8b6a  prefixlen 64  scopeid 0x20<link>
        ether 02:42:e1:c3:8b:6a  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 266 (266.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.118.80.93  netmask 255.255.255.0  broadcast 10.118.80.255
        inet6 fe80::6ff9:dbee:6b27:1315  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:d3:3b:ef  txqueuelen 1000  (Ethernet)
        RX packets 2092903  bytes 1103282695 (1.0 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 969483  bytes 253273828 (241.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 10.244.0.0
        inet6 fe80::a49a:2ff:fe38:3e4b  prefixlen 64  scopeid 0x20<link>
        ether a6:9a:02:38:3e:4b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 30393748  bytes 5921348235 (5.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30393748  bytes 5921348235 (5.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Re-initialize the control-plane node

During this exercise, an incorrect option was only discovered after the network add-on had been installed, so kubeadm init had to be run again. The actual steps were as follows:

# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "localhost.localdomain" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config
# 

After running the commands above, repeat the "Initialize the control-plane node" steps and reinstall the network add-on.

Summary of problems encountered

After re-running kubeadm init, kubectl get pods --all-namespaces showed the coredns-xxxxxxxxxx-xxxxxx Pods stuck in the ContainerCreating state:

# kubectl get pods --all-namespaces
NAMESPACE     NAME                                            READY   STATUS              RESTARTS   AGE
kube-system   coredns-7f89b7bc75-pxvdx                        0/1     ContainerCreating   0          8m33s
kube-system   coredns-7f89b7bc75-v4p57                        0/1     ContainerCreating   0          8m33s
kube-system   etcd-localhost.localdomain                      1/1     Running             0          8m49s
... (output truncated)

Running kubectl describe pod coredns-7f89b7bc75-pxvdx -n kube-system to inspect the Pod revealed the following error:

Warning  FailedCreatePodSandBox  98s (x4 over 103s)    kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "04434c63cdf067e698a8a927ba18e5013d2a1a21afa642b3cddedd4ff4592178" network for pod "coredns-7f89b7bc75-pxvdx": networkPlugin cni failed to set up pod "coredns-7f89b7bc75-pxvdx_kube-system" network: failed to set bridge addr: "cni0" already has an IP address different from 10.1.15.1/24

Checking the network interfaces, as shown below, revealed that cni0 still had an IP address assigned by the previous network add-on installation, which caused the new installation to fail when setting its IP:

# ifconfig -a
cni0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.118.80.1  netmask 255.255.255.0  broadcast 10.118.80.255
        inet6 fe80::482d:65ff:fea6:32fd  prefixlen 64  scopeid 0x20<link>
        ether 4a:2d:65:a6:32:fd  txqueuelen 1000  (Ethernet)
        RX packets 267800  bytes 16035849 (15.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 116238  bytes 10285959 (9.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

... (output truncated)
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.1.15.0  netmask 255.255.255.255  broadcast 10.1.15.0
        inet6 fe80::a49a:2ff:fe38:3e4b  prefixlen 64  scopeid 0x20<link>
        ether a6:9a:02:38:3e:4b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0
... (output truncated)

The fix is to delete the misconfigured cni0 interface; it is recreated automatically afterwards and the problem disappears:

$ sudo ifconfig cni0 down    
$ sudo ip link delete cni0

Control-plane node toleration (optional)

By default, for security reasons, the cluster does not schedule Pods onto the control-plane node. If you do want Pods to be scheduled on the control-plane node, for example in a single-machine Kubernetes cluster used for development, run the following command:

kubectl taint nodes --all node-role.kubernetes.io/master-   # remove the taint from every node whose labels include node-role.kubernetes.io/master

In practice:

# kubectl get nodes
NAME                    STATUS   ROLES                  AGE   VERSION
localhost.localdomain   Ready    control-plane,master   63m   v1.20.5
# kubectl taint nodes --all node-role.kubernetes.io/master-
node/localhost.localdomain untainted

Add nodes to the cluster

Change the hostname of the new node

# hostname
localhost.localdomain
# hostname k8sNode1

Changing the hostname with the command above is only temporary. To keep it across reboots, edit /etc/hostname and replace the default localhost.localdomain with the target name (k8sNode1 in this example). If the hostname cannot be resolved, later steps will produce the following warnings:

[WARNING Hostname]: hostname "k8sNode1" could not be reached
	[WARNING Hostname]: hostname "k8sNode1": lookup k8sNode1 on 223.5.5.5:53: read udp 10.118.80.94:33293->223.5.5.5:53: i/o timeout
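
Instead of editing /etc/hostname by hand, hostnamectl (standard on CentOS 7) achieves the same persistent change:

# writes /etc/hostname and applies the new name immediately
hostnamectl set-hostname k8sNode1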

Edit /etc/hosts and add a mapping from the node's hostname to its IP address (10.118.80.94 in this example):

# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.118.80.94   k8sNode1

SSH to the target node, switch to the root user (if you logged in as a non-root user), and run the kubeadm join command that was printed by kubeadm init on the control-plane machine, in the form:

kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>

On the control-plane machine, you can list existing, non-expired tokens with:

# kubeadm token list

If no token is available, generate a new one on the control-plane machine with:

# kubeadm token create
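
A convenient variant, assuming a kubeadm version that supports the --print-join-command flag (v1.20 does), prints a complete join command together with a fresh token:

# outputs a ready-to-run "kubeadm join ... --token ... --discovery-token-ca-cert-hash ..." line
kubeadm token create --print-join-command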

In practice:

# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo     --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

On the control-plane node (the master machine), check whether the new node has appeared:

# kubectl get nodes
NAME                    STATUS     ROLES                  AGE     VERSION
k8snode1                NotReady   <none>                 74s     v1.20.5
localhost.localdomain   Ready      control-plane,master   7h24m   v1.20.5

As shown above, a new node k8snode1 has been added.

Summary of problems encountered

Problem 1: kubeadm join fails with the following error:

# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo     --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "ap4vvq"
To see the stack trace of this error execute with --v=5 or higher

Solution:

The token has expired; run kubeadm token create to generate a new one.

Problem 2: kubeadm join fails with the following error:

# kubeadm join 10.118.80.93:6443 --token pa0gxw.4vx2wud1e7e0rzbx  --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: cluster CA found in cluster-info ConfigMap is invalid: none of the public keys "sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f" are pinned
To see the stack trace of this error execute with --v=5 or higher

Solution:

The discovery-token-ca-cert-hash is no longer valid; run the following command to obtain it again:

# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f

Then use the hash from the output:

--discovery-token-ca-cert-hash sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f

Problem 3: the cni config uninitialized error

The Kubernetes dashboard showed the newly joined node in the KubeletNotReady state with the following message:

[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful, runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, CSINode is not yet initialized, missing node capacity for resources: ephemeral-storage]

Solution: reinstall the CNI plugins (virtual machines were used in this exercise, and the snapshot in use probably did not include them), then reset the node and join it to the cluster again:

# CNI_VERSION="v0.8.2"
# mkdir -p /opt/cni/bin
# curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" | sudo tar -C /opt/cni/bin -xz

Clean up

If you used disposable servers for testing, you can simply switch them off; no further cleanup is needed. You can use kubectl config delete-cluster to delete your local references to the cluster (not tested by the author).
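
For reference, assuming the default cluster name "kubernetes" that kubeadm writes into the kubeconfig, removing the local reference looks like this:

kubectl config get-clusters               # list the cluster entries in the current kubeconfig
kubectl config delete-cluster kubernetes  # remove the entry written by kubeadm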

However, if you want to tear the cluster down more cleanly, you should first drain the node, make sure it is empty, and only then remove it.

Remove a node

On the control-plane node

First, run the following command on the control-plane node to drain the node being removed, forcibly evicting its workloads:

kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets

In practice:

# kubectl get nodes
NAME                    STATUS   ROLES                  AGE   VERSION
k8snode1                Ready    <none>                 82m   v1.20.5
localhost.localdomain   Ready    control-plane,master   24h   v1.20.5
# kubectl drain k8snode1 --delete-emptydir-data --force --ignore-daemonsets
node/k8snode1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4xqcc, kube-system/kube-proxy-c7qzs
evicting pod default/nginx-deployment-64859b8dcc-v5tcl
evicting pod default/nginx-deployment-64859b8dcc-qjrld
evicting pod default/nginx-deployment-64859b8dcc-rcvc8
pod/nginx-deployment-64859b8dcc-rcvc8 evicted
pod/nginx-deployment-64859b8dcc-qjrld evicted
pod/nginx-deployment-64859b8dcc-v5tcl evicted
node/k8snode1 evicted
# kubectl get nodes
NAME                    STATUS   ROLES                  AGE   VERSION
localhost.localdomain   Ready    control-plane,master   24h   v1.20.5

On the target node

Log in to the target node and run the following command:

# kubeadm reset

The command above does not reset or clean up the iptables or IPVS tables. If you need to reset iptables, you must also run the following manually:

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If you need to reset IPVS, run the following command:

ipvsadm -C

Note: unless you have a specific need, do not reset the network settings.

Delete the node's configuration files:

# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config

On the control-plane node

Delete the node by running kubectl delete node <node name>:

### delete any Pods on the node that were not removed
# kubectl delete pod kube-flannel-ds-4xqcc -n kube-system --force
# kubectl delete pod kube-proxy-c7qzs -n kube-system --force
# kubectl delete node k8snode1
node "k8snode1" deleted

After removal, if the node needs to rejoin the cluster, it can be added again by running kubeadm join with the appropriate arguments.

Clean up the control plane

On the control-plane node, use the kubeadm reset command; see the kubeadm reset command reference for details.
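
Mirroring the node cleanup above, a sketch of cleaning up the control-plane node itself:

# run on the control-plane node
kubeadm reset
rm -rf /etc/cni/net.d      # kubeadm reset does not remove the CNI configuration
rm -f $HOME/.kube/config   # nor the kubeconfig copied earlier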

References

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

