Taints and Tolerations
Taints are used together with Tolerations to steer pods away from unsuitable nodes. Once one or more taints are set on a node, no pod can run there unless it explicitly declares that it tolerates those taints. A Toleration is a pod attribute that allows the pod (note: allows, not forces) to run on a node that carries a matching taint.
By default, no application pod will run on a tainted node.
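As a sketch of what a toleration looks like in a pod spec (the pod name and image here are illustrative; the key/value mirror the taint used later in this walkthrough):

```yaml
# Hypothetical pod that tolerates the taint ingress=enable:NoExecute
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo        # illustrative name
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "ingress"             # must match the taint key
    operator: "Equal"          # with Equal, key, value, and effect must all match
    value: "enable"
    effect: "NoExecute"
```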
[root@docker-server1 deployment]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP            NODE              NOMINATED NODE   READINESS GATES
busybox-674bd96f74-8d7ml            0/1     Pending   0          38m     <none>        <none>            <none>           <none>
goproxy                             1/1     Running   1          3d12h   10.244.1.21   192.168.132.132   <none>           <none>
hello-deployment-5fdb46d67c-dqnnh   1/1     Running   0          25h     10.244.1.25   192.168.132.132   <none>           <none>
hello-deployment-5fdb46d67c-s68tf   1/1     Running   0          25h     10.244.2.15   192.168.132.133   <none>           <none>
hello-deployment-5fdb46d67c-x5nwl   1/1     Running   0          25h     10.244.1.24   192.168.132.132   <none>           <none>
init-demo                           1/1     Running   1          3d11h   10.244.1.23   192.168.132.132   <none>           <none>
mysql-5d4695cd5-kzlms               1/1     Running   0          23h     10.244.1.28   192.168.132.132   <none>           <none>
nginx                               2/2     Running   21         3d14h   10.244.2.14   192.168.132.133   <none>           <none>
nginx-volume                        1/1     Running   1          3d11h   10.244.1.19   192.168.132.132   <none>           <none>
wordpress-6cbb67575d-b9md5          1/1     Running   0          23h     10.244.0.10   192.168.132.131   <none>           <none>
1 Applying a taint
Taint node 192.168.132.132:
[root@docker-server1 deployment]# kubectl taint node 192.168.132.132 ingress=enable:NoExecute
node/192.168.132.132 tainted
[root@docker-server1 deployment]# kubectl get pods -o wide
NAME                                READY   STATUS              RESTARTS   AGE     IP            NODE              NOMINATED NODE   READINESS GATES
busybox-674bd96f74-8d7ml            0/1     Pending             0          44m     <none>        <none>            <none>           <none>
hello-deployment-5fdb46d67c-gw2t6   1/1     Running             0          37s     10.244.2.18   192.168.132.133   <none>           <none>
hello-deployment-5fdb46d67c-s68tf   1/1     Running             0          25h     10.244.2.15   192.168.132.133   <none>           <none>
hello-deployment-5fdb46d67c-vzb4f   1/1     Running             0          37s     10.244.2.16   192.168.132.133   <none>           <none>
mysql-5d4695cd5-v6btl               0/1     ContainerCreating   0          37s     <none>        192.168.132.133   <none>           <none>
nginx                               2/2     Running             22         3d14h   10.244.2.14   192.168.132.133   <none>           <none>
wordpress-6cbb67575d-b9md5          1/1     Running             0          23h     10.244.0.10   192.168.132.131   <none>           <none>
2 Checking pod status on the tainted node
No pods are running on the .132 node anymore.
When a node has both a taint and a matching label, the taint takes priority over the label: rejection wins. For example:
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                        READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
nginx-ingress-controller-79669b846b-588cs   0/1     Pending   0          3m51s   <none>   <none>   <none>           <none>
The label selector points at 192.168.132.132, but that node carries a taint and rejection wins; since no other node matches the label, the pod stays Pending indefinitely.
3 Configuring taint tolerations
[root@docker-server1 deployment]# vim /yamls/ingress/nginx-controller.yaml
      nodeSelector:
        ingress: enable
      tolerations:
      - key: "ingress"
        operator: "Equal"
        value: "enable"
        effect: "NoExecute"
[root@docker-server1 deployment]# kubectl apply -f /yamls/ingress/nginx-controller.yaml
namespace/ingress-nginx unchanged
configmap/nginx-configuration unchanged
configmap/tcp-services unchanged
configmap/udp-services unchanged
serviceaccount/nginx-ingress-serviceaccount unchanged
clusterrole.rbac.authorization.k8s.io/nginx-ingress-clusterrole unchanged
role.rbac.authorization.k8s.io/nginx-ingress-role unchanged
rolebinding.rbac.authorization.k8s.io/nginx-ingress-role-nisa-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-clusterrole-nisa-binding unchanged
deployment.apps/nginx-ingress-controller configured
limitrange/ingress-nginx configured
4 Checking pod status
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
nginx-ingress-controller-79669b846b-588cs   0/1     Pending   0          12m   <none>            <none>            <none>           <none>
nginx-ingress-controller-dd4864d55-2tlk2    0/1     Running   0          3s    192.168.132.132   192.168.132.132   <none>           <none>
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
nginx-ingress-controller-dd4864d55-2tlk2   1/1     Running   0          80s   192.168.132.132   192.168.132.132   <none>           <none>
The ingress pod now runs on the .132 node: the node is dedicated to ingress, for its exclusive use.
5 Taint command syntax
kubectl taint node [node] key=value[:effect]
where [effect] can be one of: [ NoSchedule | PreferNoSchedule | NoExecute ]
- NoSchedule: the pod must not be scheduled onto the node.
- PreferNoSchedule: try to avoid scheduling onto the node.
- NoExecute: not only is nothing scheduled onto the node, existing pods on it are also evicted.
In kubectl taint node 192.168.132.132 ingress=enable:NoExecute, the suffix after the colon sets the taint's effect.
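To undo a taint later, the standard kubectl idiom is to repeat the key and effect with a trailing `-`:

```
# Remove the NoExecute taint applied above (note the trailing "-")
kubectl taint node 192.168.132.132 ingress:NoExecute-
```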
A node can carry multiple taints. For a pod to run on such a node, it must tolerate all of the taints; if even one taint is not tolerated, the pod will not run there.
In the example above, the effect used was NoExecute. A brief explanation of each effect value:
- NoSchedule: if a pod does not declare a toleration for the taint, the system will not schedule it onto a node carrying that taint.
- PreferNoSchedule: a soft version of NoSchedule. If a pod does not tolerate the taint, the system tries to avoid scheduling it onto that node, but this is not enforced.
- NoExecute: defines pod eviction behavior, e.g. in response to node failures. A NoExecute taint affects pods already running on the node as follows:
  - Pods without a matching Toleration are evicted immediately.
  - Pods with a matching Toleration but no tolerationSeconds value stay on the node indefinitely.
  - Pods with a matching Toleration and a tolerationSeconds value are evicted after that many seconds.
- Kubernetes 1.6 introduced an alpha feature that represents node problems as taints (currently only node unreachable and node not ready, i.e. the NodeCondition "Ready" being Unknown or False). With the TaintBasedEvictions feature gate enabled (add TaintBasedEvictions=true to --feature-gates), the NodeController automatically sets these taints on nodes, and the previous eviction logic based on the "Ready" NodeCondition is disabled. Note that to preserve the existing rate limits on pod eviction during node failures, the system adds taints to nodes in a rate-limited fashion, which prevents mass evictions in scenarios such as the master temporarily losing contact with nodes. This feature is compatible with tolerationSeconds, allowing a pod to define how long it stays on a failed node before being evicted.
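The NoExecute eviction rules above map directly onto the toleration spec; a sketch with a hypothetical 60-second grace period:

```yaml
tolerations:
- key: "ingress"
  operator: "Equal"
  value: "enable"
  effect: "NoExecute"
  tolerationSeconds: 60   # evicted 60s after the taint appears; omit to stay indefinitely
```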
6 Multiple taints on a node
A node may carry multiple taints, and a pod may declare multiple Tolerations. The Kubernetes scheduler discards all taints that are matched by a toleration; the taints that remain unmatched determine the effect on the pod. Some special cases:
- If the remaining taints include one with effect NoSchedule, the scheduler will not place the pod on the node.
- If no remaining taint has effect NoSchedule but at least one has PreferNoSchedule, the scheduler will try not to assign the pod to that node.
- If a remaining taint has effect NoExecute and the pod is already running on the node, it will be evicted; if it is not running there, it will not be scheduled onto the node.
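As an illustration of these matching rules, assume a node carries both taints used in this walkthrough; a pod that tolerates only the NoExecute taint is still blocked by the unmatched NoSchedule taint:

```yaml
# Node taints: ingress=enable:NoExecute and ingress=enable:NoSchedule
tolerations:
- key: "ingress"
  operator: "Equal"
  value: "enable"
  effect: "NoExecute"   # only the NoExecute taint is matched;
                        # the remaining NoSchedule taint still blocks scheduling
```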
Add a second taint to 192.168.132.132:
[root@docker-server1 deployment]# kubectl taint node 192.168.132.132 ingress=enable:NoSchedule
node/192.168.132.132 tainted
If any one of the key, value, or effect differs, it is treated as a new, separate taint.
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
nginx-ingress-controller-dd4864d55-2tlk2   1/1     Running   0          47h   192.168.132.132   192.168.132.132   <none>           <none>
Because this new taint's effect is NoSchedule, which only blocks new scheduling and does not evict, the running pod stays; but once the pod is killed, it will not be started on this node again.
[root@docker-server1 deployment]# kubectl delete pods nginx-ingress-controller-dd4864d55-2tlk2 -n ingress-nginx
pod "nginx-ingress-controller-dd4864d55-2tlk2" deleted
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-ingress-controller-dd4864d55-tkk6n   0/1     Pending   0          22s   <none>   <none>   <none>           <none>
To get the pod Running again, a toleration for this new taint must be added as well.
8 Tolerating multiple taints
[root@docker-server1 deployment]# vim /yamls/ingress/nginx-controller.yaml
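The edit made here is not shown in the transcript; presumably a second toleration is added so that both taints on the node are covered. A sketch of what the tolerations section would look like:

```yaml
tolerations:
- key: "ingress"
  operator: "Equal"
  value: "enable"
  effect: "NoExecute"
- key: "ingress"
  operator: "Equal"
  value: "enable"
  effect: "NoSchedule"
```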
[root@docker-server1 deployment]# kubectl apply -f /yamls/ingress/nginx-controller.yaml
namespace/ingress-nginx unchanged
configmap/nginx-configuration unchanged
configmap/tcp-services unchanged
configmap/udp-services unchanged
serviceaccount/nginx-ingress-serviceaccount unchanged
clusterrole.rbac.authorization.k8s.io/nginx-ingress-clusterrole unchanged
role.rbac.authorization.k8s.io/nginx-ingress-role unchanged
rolebinding.rbac.authorization.k8s.io/nginx-ingress-role-nisa-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-clusterrole-nisa-binding unchanged
deployment.apps/nginx-ingress-controller configured
limitrange/ingress-nginx configured
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
nginx-ingress-controller-7487db85f9-tmsdq   0/1     Running   0          2s    192.168.132.132   192.168.132.132   <none>           <none>
nginx-ingress-controller-dd4864d55-tkk6n    0/1     Pending   0          30m   <none>            <none>            <none>           <none>
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
nginx-ingress-controller-7487db85f9-tmsdq   1/1     Running   0          39s   192.168.132.132   192.168.132.132   <none>           <none>
9 Tolerations with the Exists operator
When tolerating multiple taints, the Exists operator lets you specify only the key; no value is required.
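A sketch of the same tolerations rewritten with Exists (both taints on the node share the key "ingress", so no value needs to be matched):

```yaml
tolerations:
- key: "ingress"
  operator: "Exists"    # matches any taint with this key, regardless of value
  effect: "NoExecute"
- key: "ingress"
  operator: "Exists"
  effect: "NoSchedule"
```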
[root@docker-server1 deployment]# kubectl apply -f /yamls/ingress/nginx-controller.yaml
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
nginx-ingress-controller-66fb449f6f-8pb29   0/1     Pending   0          3s    <none>            <none>            <none>           <none>
nginx-ingress-controller-7487db85f9-tmsdq   1/1     Running   0          33h   192.168.132.132   192.168.132.132   <none>           <none>
The newly created pod is Pending.
[root@docker-server1 deployment]# kubectl get nodes --show-labels
NAME              STATUS   ROLES    AGE    VERSION   LABELS
192.168.132.131   Ready    master   7d5h   v1.17.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.132.131,kubernetes.io/os=linux,node-role.kubernetes.io/master=
192.168.132.132   Ready    <none>   7d5h   v1.17.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ingress=enable,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.132.132,kubernetes.io/os=linux
192.168.132.133   Ready    <none>   7d5h   v1.17.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.132.133,kubernetes.io/os=linux
10 Viewing taints
Check which taints a node carries:
[root@docker-server1 deployment]# kubectl describe node 192.168.132.132
Name:               192.168.132.132
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    ingress=enable
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=192.168.132.132
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"22:69:dd:55:70:87"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.132.132
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 09 Jan 2020 13:31:58 -0500
Taints:             ingress=enable:NoExecute    # the two taints
                    ingress=enable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  192.168.132.132
  AcquireTime:     <unset>
  RenewTime:       Thu, 16 Jan 2020 19:11:13 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 16 Jan 2020 19:06:55 -0500   Sun, 12 Jan 2020 06:45:19 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 16 Jan 2020 19:06:55 -0500   Sun, 12 Jan 2020 06:45:19 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 16 Jan 2020 19:06:55 -0500   Sun, 12 Jan 2020 06:45:19 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 16 Jan 2020 19:06:55 -0500   Sun, 12 Jan 2020 06:45:19 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.132.132
  Hostname:    192.168.132.132
Capacity:
  cpu:                4
  ephemeral-storage:  49250820Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             7990132Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  45389555637
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             7887732Ki
  pods:               110
System Info:
  Machine ID:                 817ad910bace4109bda4f5dc5c709092
  System UUID:                88884D56-86A7-4238-F2D9-5802E163FD11
  Boot ID:                    9dd1778e-168b-4296-baa2-d28d2839fab1
  Kernel Version:             3.10.0-1062.4.1.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.5
  Kubelet Version:            v1.17.0
  Kube-Proxy Version:         v1.17.0
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (2 in total)
  Namespace      Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------      ----                                        ------------  ----------  ---------------  -------------  ---
  ingress-nginx  nginx-ingress-controller-7487db85f9-tmsdq   100m (2%)     0 (0%)      90Mi (1%)        0 (0%)         33h
  kube-system    kube-proxy-7xgt9                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         7d5h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                100m (2%)   0 (0%)
  memory             90Mi (1%)   0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:              <none>
Inspect the Pending pod:
[root@docker-server1 deployment]# kubectl describe pods nginx-ingress-controller-66fb449f6f-8pb29 -n ingress-nginx
Name:           nginx-ingress-controller-66fb449f6f-8pb29
Namespace:      ingress-nginx
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/name=ingress-nginx
                app.kubernetes.io/part-of=ingress-nginx
                pod-template-hash=66fb449f6f
Annotations:    kubernetes.io/limit-ranger: LimitRanger plugin set: cpu, memory request for container nginx-ingress-controller
                prometheus.io/port: 10254
                prometheus.io/scrape: true
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/nginx-ingress-controller-66fb449f6f
Containers:
  nginx-ingress-controller:
    Image:       quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master
    Ports:       80/TCP, 443/TCP
    Host Ports:  80/TCP, 443/TCP
    Args:
      /nginx-ingress-controller
      --configmap=$(POD_NAMESPACE)/nginx-configuration
      --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
      --udp-services-configmap=$(POD_NAMESPACE)/udp-services
      --publish-service=$(POD_NAMESPACE)/ingress-nginx
      --annotations-prefix=nginx.ingress.kubernetes.io
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Readiness:  http-get http://:10254/healthz delay=0s timeout=10s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-controller-66fb449f6f-8pb29 (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-l89pw (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  nginx-ingress-serviceaccount-token-l89pw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-ingress-serviceaccount-token-l89pw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  ingress=enable
Tolerations:     ingress:NoExecute
                 ingress:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match node selector.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match node selector.
The new pod cannot start because no ports are free: the controller uses host networking, and the first pod on the node has not yet released ports 80/443.
Kill the first pod:
[root@docker-server1 deployment]# kubectl delete pods nginx-ingress-controller-7487db85f9-tmsdq -n ingress-nginx
[root@docker-server1 deployment]# kubectl get pods -n ingress-nginx -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
nginx-ingress-controller-66fb449f6f-8pb29   1/1     Running   0          13m   192.168.132.132   192.168.132.132   <none>           <none>
The pod is now in the Running state as expected; problem solved.
Use cases for taints and tolerations
1 Dedicated nodes
To reserve a group of nodes for exclusive use by a particular set of applications, add a taint like this to those nodes:
kubectl taint nodes nodename dedicated=groupName:NoSchedule
Then add the matching toleration to those applications' pods; pods carrying the right toleration are allowed to use the tainted nodes just like any other nodes. To make sure these pods run only on the dedicated nodes, also label the nodes and require that label via nodeSelector or node affinity.
2 Nodes with special hardware
A small subset of cluster nodes may have special hardware installed, such as GPUs. It is natural to want to keep pods that do not need this hardware away from these nodes, so that pods that do need it can be scheduled there. Taint such nodes with one of the following:
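A sketch of the dedicated-node pattern combining the taint, a matching node label, a nodeSelector, and the toleration (the label key dedicated=groupName here mirrors the taint and is illustrative):

```yaml
# Pod spec fragment: runs only on nodes labeled dedicated=groupName
# and tolerates the taint dedicated=groupName:NoSchedule
spec:
  nodeSelector:
    dedicated: groupName
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "groupName"
    effect: "NoSchedule"
```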
kubectl taint nodes nodename special=true:NoSchedule
kubectl taint nodes nodename special=true:PreferNoSchedule
Then give the pods that should use the special hardware the corresponding toleration. As before, labels or node affinity can additionally be used to steer these pods onto the nodes with that hardware.
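The corresponding toleration for the special-hardware taints above would look like this (a fragment, matching the example key special=true):

```yaml
tolerations:
- key: "special"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
```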
3 Coping with node failures
As mentioned above, the TaintBasedEvictions feature automatically taints failed nodes and then evicts their pods. In some scenarios, however, e.g. a network failure cutting off the master from a node, a node running many pods with local state should keep running them in the hope that the network recovers quickly, and eviction should be avoided. Such a pod can define its Toleration like this:
tolerations:
- key: "node.alpha.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
For the node-not-ready condition, set the key to node.alpha.kubernetes.io/notReady.
If a pod does not specify a toleration for node.alpha.kubernetes.io/notReady, Kubernetes automatically adds one of that type with tolerationSeconds=300.
Likewise, if a pod does not specify a toleration for node.alpha.kubernetes.io/unreachable, Kubernetes automatically adds one of that type with tolerationSeconds=300.
These automatically added tolerations ensure that when a node problem is detected, pods keep running for another 5 minutes before being evicted. Both defaults are injected by the Admission Controller "DefaultTolerationSeconds".
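These injected defaults are visible in the kubectl describe pods output earlier ("node.kubernetes.io/not-ready:NoExecute for 300s"); expressed as a spec fragment they are equivalent to:

```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```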
Author's note: the content of this article comes mainly from teacher Yan Wei of Yutian Education, and I verified all operations myself. Readers who wish to repost should contact Yutian Education (http://www.yutianedu.com/) or teacher Yan (https://www.cnblogs.com/breezey/) for permission first. Thank you!