Advanced Scheduling in K8s: Affinity and Taints
1 The default scheduler's process:
- Predicates: from all nodes, filter out those that meet the basic requirements.
- Priorities: score the remaining candidates with priority functions and rank them by score.
- Randomly pick one of the highest-scoring nodes to run the Pod.
We can influence the predicate and priority phases with our own settings to get the scheduling results we want.
2 Ways to influence scheduling:
- Node selector: nodeSelector; you can even set nodeName to pick the node directly.
- Affinity scheduling: nodeAffinity (node affinity), podAffinity (Pod affinity), podAntiAffinity (Pod anti-affinity)
- Taints and tolerations: taint, toleration
3 Node selector: nodeSelector
If we want a Pod to run on one specific node, we can set Pod.spec.nodeName to that node's name. More commonly, we put a distinctive label on a subset of nodes and match it in pod.spec.nodeSelector, which greatly narrows the predicate phase's candidate set.
Add labels to a node:
kubectl label nodes NODE_NAME key1=value1...keyN=valueN
For example: put the label app=frontend on node01 and set the Pod's nodeSelector to that label; the Pod can then only run on nodes that carry it.
If no node has the label, the Pod cannot be scheduled and stays in the Pending state.
First, label one of the nodes:
[root@k8s-master ~]# kubectl get nodes --show-labels
NAME         STATUS   ROLES    AGE   VERSION   LABELS
k8s-master   Ready    master   12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
k8s-node-2   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]# kubectl label nodes k8s-node-1 disk=ssd
node/k8s-node-1 labeled
[root@k8s-master ~]# kubectl get nodes --show-labels|grep ssd
k8s-node-1   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
# cat nodeSelector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd    # if no node carries the label(s) listed in nodeSelector, the Pod stays Pending (predicates fail)
[root@k8s-master schedule]# kubectl create -f nodeSelector.yaml
pod/nginx-pod created
[root@k8s-master schedule]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
nginx-pod   1/1     Running   0          6s
[root@k8s-master schedule]# kubectl describe pod nginx-pod | grep Node
Node:           k8s-node-1/10.6.76.23
Node-Selectors: disk=ssd
4 Node affinity scheduling: nodeAffinity
requiredDuringSchedulingIgnoredDuringExecution — hard affinity: the rule must be satisfied.
preferredDuringSchedulingIgnoredDuringExecution — soft affinity: satisfied if possible, otherwise scheduling proceeds anyway.
4.1 Hard affinity
matchExpressions: label expressions. For example, a Pod can specify key zone, operator In (contained in), and values foo and bar; the Pod is then scheduled among the nodes whose zone label value is foo or bar.
matchFields: field expressions. Same idea as above, except they match node fields (such as metadata.name) rather than labels, so no label needs to be defined.
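As a sketch of matchFields (assuming you want to pin the Pod to the node literally named k8s-node-1), the term matches a node field instead of a label:

```yaml
# hypothetical fragment: schedule only onto the node whose metadata.name is k8s-node-1
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name   # a node field, not a label
          operator: In
          values:
          - k8s-node-1
```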
Run the Pod only on nodes whose zone label value is foo or bbb:
[root@k8s-master ~]# kubectl get nodes --show-labels| grep zone
k8s-node-1   Ready   node   46d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=,zone=foo
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo
                - bbb
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-d457bd7bc-fsjjn   1/1     Running   0          2m34s   10.254.1.124   k8s-node-1   <none>           <none>
nginx-hello-deployment-d457bd7bc-ntb8h   1/1     Running   0          2m34s   10.254.1.123   k8s-node-1   <none>           <none>
nginx-pod                                1/1     Running   0          58m     10.254.1.120   k8s-node-1   <none>           <none>
As the label dictates, both Pods landed on node-1. Now change the values so that nothing matches:
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
# Check: no node's zone label matches these values, so the Pods stay Pending
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-6c96b5675f-8jqnx   0/1     Pending   0          43s   <none>         <none>       <none>           <none>
nginx-hello-deployment-6c96b5675f-lbnsw   0/1     Pending   0          43s   <none>         <none>       <none>           <none>
nginx-pod                                 1/1     Running   0          60m   10.254.1.120   k8s-node-1   <none>           <none>
4.2 Soft affinity
nodeAffinity's preferredDuringSchedulingIgnoredDuringExecution (soft affinity: prefer the node that matches the most conditions, but even if nothing matches, the Pods are still created and scheduled)
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
            weight: 60   # weight of the associated nodeSelectorTerm, 1-100
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-98654dc57-cvvlb   1/1     Running   0          15s   10.254.1.125   k8s-node-1   <none>           <none>
nginx-hello-deployment-98654dc57-mglbx   1/1     Running   0          20s   10.254.2.90    k8s-node-2   <none>           <none>
nginx-pod                                1/1     Running   0          72m   10.254.1.120   k8s-node-1   <none>           <none>
5 Pod affinity: podAffinity
Pod affinity scenario: the nodes of our k8s cluster sit in different zones or data centers. When service A and service B must be deployed into the same zone or the same data center, we need affinity scheduling.
labelSelector: which group of Pods to be affine with
namespaces: which namespace(s) to search for those Pods
topologyKey: which node label key defines a "location"
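The examples below use topologyKey: kubernetes.io/hostname, which makes "same location" mean "same node". For the cross-zone scenario described above, topologyKey can instead point at a zone label (a sketch; it assumes your nodes carry the failure-domain zone label that cloud providers set on this Kubernetes version):

```yaml
# hypothetical fragment: co-locate with app=nginx Pods at zone granularity, not host granularity
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - nginx
    topologyKey: failure-domain.beta.kubernetes.io/zone   # same zone counts as co-located
```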
5.1 Affinity by labelSelector
Make the two Pods (matched by label) run in the same location:
# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        podAffinity:
          #preferredDuringSchedulingIgnoredDuringExecution:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app        # label key defined on the Pod above
                operator: In    # In: contained in
                values:
                - nginx         # value of the app label
            topologyKey: kubernetes.io/hostname   # Pods sharing this key's value count as co-located
          # this Pod must be co-located (affinity) or not co-located (anti-affinity) with the Pods that
          # match the labelSelector in the specified namespaces; "co-located" means running on a node
          # whose value for topologyKey matches that of a node where any selected Pod is running
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master ~]# kubectl get pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1   Running   0   6s   10.254.2.92   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq   1/1   Running   0   6s   10.254.2.93   k8s-node-2   <none>   <none>
5.2 Anti-affinity: podAntiAffinity
Keep the Pod off any node that runs the other Pod (the opposite of the above):
# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        #podAffinity:
        podAntiAffinity:          # the only change from the previous example
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app          # label key defined on the Pod above
                operator: In      # In: contained in
                values:
                - nginx           # value of the app label
            topologyKey: kubernetes.io/hostname   # Pods sharing this key's value count as co-located
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master ~]# kubectl apply -f a.yaml
deployment.extensions/nginx-deployment unchanged
deployment.apps/nginx-deployment-pod-affinity configured
[root@k8s-master ~]# kubectl get pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1   Running             0   68s   10.254.2.92   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq   1/1   Running             0   68s   10.254.2.93   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f   0/1   ContainerCreating   0   4s    <none>        k8s-node-1   <none>   <none>
[root@k8s-master ~]# kubectl get pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1   Running   0   73s   10.254.2.92   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f   1/1   Running   0   9s    10.254.1.56   k8s-node-1   <none>   <none>
6 Taint-based scheduling
https://www.cnblogs.com/klvchen/p/10025205.html
Taints and tolerations let you mark a node so that no Pod is scheduled onto it. A Pod that explicitly declares a matching toleration, however, can still be scheduled onto the marked node.
# Add a taint to a node from the command line:
kubectl taint nodes node1 key=value:NoSchedule
The toleration's operator can be:
Equal: the key's value must equal the given value (the default)
Exists: the key only has to exist; no value needs to be given
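As a sketch of the Exists operator (reusing the node-type taint key that this section applies later), a toleration can match a taint regardless of its value:

```yaml
# hypothetical fragment: tolerate any NoSchedule taint whose key is node-type
tolerations:
- key: "node-type"
  operator: "Exists"      # no value field with Exists
  effect: "NoSchedule"    # omitting effect would match every effect of this key
```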
A taint's effect defines how it repels Pods:
NoSchedule: affects only scheduling; Pods already running on the node are not affected
NoExecute: affects both scheduling and running Pods; Pods that do not tolerate the taint are evicted
PreferNoSchedule: try not to schedule onto the node (a soft NoSchedule)
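For NoExecute taints, a toleration may additionally set tolerationSeconds, which keeps an already-running Pod on the tainted node for a bounded time before eviction (a sketch, again reusing this section's node-type key; the taint itself would have to be applied with effect NoExecute):

```yaml
# hypothetical fragment: stay on a node-type=production:NoExecute node for at most one hour
tolerations:
- key: "node-type"
  operator: "Equal"
  value: "production"
  effect: "NoExecute"
  tolerationSeconds: 3600   # evicted after 3600s instead of immediately
```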
# View the taints
[root@k8s-master schedule]# kubectl describe node k8s-master |grep Taints
Taints:             node-role.kubernetes.io/master:PreferNoSchedule
# Taint node-1
# kubectl taint node k8s-node-1 node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl describe node k8s-node-1 |grep Taints
Taints:             node-type=production:NoSchedule
# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged
# All Pods run on node-2
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-j5nmz   1/1     Running   0          83s   10.254.2.94   k8s-node-2   <none>           <none>
nginx-deployment-6f6d9b887f-wjfpp   1/1     Running   0          83s   10.254.2.93   k8s-node-2   <none>           <none>
# Now taint node-2 as well
[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl get pods -o wide
No resources found.
[root@k8s-master schedule]# kubectl taint node k8s-node-2 node-type=production:NoSchedule
node/k8s-node-2 tainted
[root@k8s-master schedule]# kubectl describe node k8s-node-2 |grep Taints
Taints:             node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created
# Now the Pods all run on the master
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-ck6pd   1/1     Running   0          15s   10.254.0.48   k8s-master   <none>           <none>
nginx-deployment-6f6d9b887f-gdwm6   1/1     Running   0          15s   10.254.0.49   k8s-master   <none>           <none>
# Taint the master too
[root@k8s-master schedule]# kubectl taint node k8s-master node-type=production:NoSchedule
node/k8s-master tainted
[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created
# No node can run the Pods
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-mld4v   0/1     Pending   0          5s    <none>   <none>   <none>           <none>
nginx-deployment-6f6d9b887f-q4nfj   0/1     Pending   0          5s    <none>   <none>   <none>           <none>
# The Pods do not tolerate the taints
[root@k8s-master schedule]# kubectl describe pod nginx-deployment-6f6d9b887f-mld4v |tail -1
  Warning  FailedScheduling  51s (x6 over 3m29s)  default-scheduler  0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
# Define a toleration
# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
      tolerations:
      - key: "node-type"        # the taint key defined earlier
        operator: "Equal"       # Equal requires an exact value match; Exists tolerates any node-type taint
        value: "production"     # the taint value
        effect: "NoSchedule"    # the taint effect
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged
# The two Pods are scheduled evenly across the two nodes
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-565dd6b94d-4cdhz   1/1     Running   0          32s   10.254.1.130   k8s-node-1   <none>           <none>
nginx-deployment-565dd6b94d-fqzm7   1/1     Running   0          32s   10.254.2.95    k8s-node-2   <none>           <none>
# Remove a taint
[root@k8s-master schedule]# kubectl describe nodes k8s-master |grep Taints
Taints:             node-role.kubernetes.io/master:PreferNoSchedule
[root@k8s-master schedule]# kubectl describe nodes k8s-node-1 |grep Taints
Taints:             node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl taint node k8s-node-1 node-type-
node/k8s-node-1 untainted
[root@k8s-master schedule]# kubectl describe nodes k8s-node-1 |grep Taints
Taints:             <none>