一、k8s scheduling flow
1、(Predicates) First filter out the nodes that completely fail the pod's requirements.
2、(Priorities) Score the remaining nodes with a set of priority functions; if a single node has the highest score, it is chosen directly.
3、If several nodes tie for the highest score in the previous step, one of them is picked at random.
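The three steps above can be sketched as a minimal, hypothetical model (the names `schedule`, `predicates`, and `priorities` are illustrative, not the real scheduler's API):

```python
import random

def schedule(pod, node_names, predicates, priorities):
    """Pick a node for a pod: filter (predicates), score (priorities), random tie-break."""
    # Predicate phase: drop nodes that fail any hard requirement.
    feasible = [n for n in node_names if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # no feasible node -> the pod stays Pending
    # Priority phase: sum the score of every priority function per node.
    scores = {n: sum(f(pod, n) for f in priorities) for n in feasible}
    best = max(scores.values())
    # If several nodes tie for the highest score, pick one at random.
    return random.choice([n for n, s in scores.items() if s == best])

# Toy example: one predicate (enough free memory), one priority (free memory).
nodes = {"node1": {"free_mem": 4}, "node2": {"free_mem": 8}}
pred = [lambda pod, n: nodes[n]["free_mem"] >= pod["mem"]]
prio = [lambda pod, n: nodes[n]["free_mem"]]
print(schedule({"mem": 2}, list(nodes), pred, prio))  # node2 (highest score)
```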
二、Scheduling methods
1、Node selection (which nodes the pod may run on)
2、Pod selection (run a pod in the same place as another pod (pod affinity), or keep it away from another pod (pod anti-affinity))
3、Taints (a pod can be scheduled onto a node only if it tolerates the node's taints; an intolerant pod cannot be scheduled there, and for NoExecute taints an existing intolerant pod is evicted; a toleration time can be defined)
三、Common predicates
Scheduler predicate policies (a subset):
CheckNodeCondition: # check that the node itself is healthy (network, disk, etc.)
GeneralPredicates (a group containing):
    HostName: # check whether the Pod object defines pod.spec.hostname
    PodFitsHostPorts: # the node must be able to provide the ports in pods.spec.containers.ports.hostPort (ports bound on the node)
    MatchNodeSelector: # check the node's labels against pods.spec.nodeSelector
    PodFitsResources: # check whether the node can satisfy the Pod's resource requests
NoDiskConflict: # check whether the volumes the Pod depends on conflict (not enabled by default)
PodToleratesNodeTaints: # check whether the Pod's spec.tolerations cover all of the node's taints
PodToleratesNodeNoExecuteTaints: # the same check for NoExecute taints (not enabled by default)
CheckNodeLabelPresence: # check whether specified labels exist on the node
CheckServiceAffinity: # try to place pods of the same Service together (not enabled by default)
MaxEBSVolumeCount: # maximum number of EBS (AWS) volumes
MaxGCEPDVolumeCount: # maximum number of GCE PD volumes
MaxAzureDiskVolumeCount: # maximum number of AzureDisk volumes
CheckVolumeBinding: # check the node's bound and unbound PVCs
NoVolumeZoneConflict: # check for zone conflicts between the pod and its volumes
CheckNodeMemoryPressure: # check whether the node is under memory pressure
CheckNodePIDPressure: # check whether the node is under PID pressure
CheckNodeDiskPressure: # check whether the node is under disk pressure
MatchInterPodAffinity: # check whether the node satisfies the pod's affinity or anti-affinity
四、Common priority functions
LeastRequested: # more free capacity scores higher: (cpu((capacity-sum(requested))*10/capacity) + memory((capacity-sum(requested))*10/capacity))/2
BalancedResourceAllocation: # nodes whose CPU and memory utilization rates are closest win
NodePreferAvoidPods: # based on the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"
TaintToleration: # match the Pod's spec.tolerations against the node's taints; the more matching entries, the lower the score
SelectorSpreading: # label-selector spreading: nodes already running more pods selected by the same selector score lower
InterPodAffinity: # iterate over the pod's affinity terms; the more terms matched, the higher the score
NodeAffinity: # node affinity
MostRequested: # less free capacity scores higher, the opposite of LeastRequested (not enabled by default)
NodeLabel: # whether the node carries the given labels (not enabled by default)
ImageLocality: # total size of the images the Pod needs that already exist on the node (not enabled by default)
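The LeastRequested formula above can be checked with a small sketch (the function name `least_requested` is illustrative; capacities and requests are in arbitrary units):

```python
def least_requested(cpu_cap, cpu_req, mem_cap, mem_req):
    """LeastRequested: more free capacity -> higher score (0-10 per resource),
    averaged over CPU and memory, as in the formula above."""
    cpu_score = (cpu_cap - cpu_req) * 10 / cpu_cap
    mem_score = (mem_cap - mem_req) * 10 / mem_cap
    return (cpu_score + mem_score) / 2

# An empty node scores 10, a half-requested node scores 5.
print(least_requested(4000, 0, 8192, 0))        # 10.0
print(least_requested(4000, 2000, 8192, 4096))  # 5.0
```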
五、Advanced scheduling configuration
1、The nodeSelector selector
# view node labels
[root@k8s-m ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-m Ready master 120d v1.11.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=
node1 Ready <none> 120d v1.11.2 app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1
# use a nodeSelector to pick nodes labeled disk=ssd
# view
[root@k8s-m schedule]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
nginx-pod 1/1 Running 0 49s 10.244.1.92 node1 <none>
[root@k8s-m schedule]# cat my-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd
# if no node carries the label given in nodeSelector, the pod stays Pending (predicate failure)
2、affinity
2.1、nodeAffinity with preferredDuringSchedulingIgnoredDuringExecution (soft affinity: prefer the node matching the most terms, but the pod is still created even if no node matches)
# usage
[root@k8s-m schedule]# cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: test_node1      # label key
            operator: In         # In: the label value must be in the list below
            values:
            - k8s-node1          # a value of the test_node1 label
            - test1              # a value of the test_node1 label
        weight: 60               # weight of this nodeSelectorTerm, 1-100
## view (no node has this label, but the pod is still created and running)
[root@k8s-m schedule]# kubectl get pod
NAME READY STATUS RESTARTS AGE
affinity-pod 1/1 Running 0 16s
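How the weight contributes to a node's score can be sketched roughly like this (a simplified model, not the scheduler's real code: each preferred term whose expressions all match adds its weight, here for the In operator only):

```python
def preferred_node_score(node_labels, preferred_terms):
    """Sum the weight of every preferred term whose expressions all match."""
    score = 0
    for term in preferred_terms:
        matched = all(
            node_labels.get(expr["key"]) in expr["values"]  # "In" operator only
            for expr in term["matchExpressions"]
        )
        if matched:
            score += term["weight"]
    return score

terms = [{"weight": 60,
          "matchExpressions": [{"key": "test_node1",
                                "values": ["k8s-node1", "test1"]}]}]
# A node without the label gets 0 but is still schedulable (soft affinity).
print(preferred_node_score({}, terms))                           # 0
print(preferred_node_score({"test_node1": "k8s-node1"}, terms))  # 60
```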
2.2、requiredDuringSchedulingIgnoredDuringExecution (hard affinity, similar to nodeSelector: a hard requirement; the pod is not scheduled onto nodes that fail it, and stays Pending if no node satisfies it)
[root@k8s-m schedule]# cat my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test_node1      # label key
            operator: In         # In: the label value must be in the list below
            values:
            - k8s-node1          # a value of the test_node1 label
            - test1              # a value of the test_node1 label
# view (no node has the test_node1 label, so the pod is Pending)
[root@k8s-m schedule]# kubectl get pod
NAME READY STATUS RESTARTS AGE
affinity-pod 0/1 Pending 0 4s
六、Pod affinity and anti-affinity
1、podAffinity (place a pod in the same "location" as another pod; the same location is not necessarily the same node, it depends on the label used as topologyKey)
# usage (place affinity-pod together with my-pod1)
[root@k8s-m schedule]# cat my-affinity-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1            # label key, defined on the pod above
            operator: In         # In: the label value must be in the list below
            values:
            - my-pod1            # value of the app1 label
        topologyKey: kubernetes.io/hostname  # nodes with the same value for this key count as the same location; this pod must share that location with pods matching the labelSelector
# view
[root@k8s-m schedule]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
affinity-pod 1/1 Running 0 54s 10.244.1.98 node1 <none>
my-pod1 1/1 Running 0 54s 10.244.1.97 node1 <none>
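The "same location" rule behind topologyKey can be sketched as a tiny check (illustrative model: two nodes count as the same topology domain when they carry the same value for the topologyKey label):

```python
def co_located(node_a_labels, node_b_labels, topology_key):
    """Two nodes are the same 'location' for pod (anti-)affinity when they
    have the same value for the topologyKey label."""
    return (topology_key in node_a_labels
            and node_a_labels.get(topology_key) == node_b_labels.get(topology_key))

node1 = {"kubernetes.io/hostname": "node1"}
node2 = {"kubernetes.io/hostname": "node2"}
print(co_located(node1, node1, "kubernetes.io/hostname"))  # True
print(co_located(node1, node2, "kubernetes.io/hostname"))  # False
```

With kubernetes.io/hostname as the key every node is its own location; a broader label (for example a zone label shared by several nodes) would make whole zones count as one location.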
2、podAntiAffinity (keep a pod away from the node(s) where another pod runs; the opposite of the above)
[root@k8s-m schedule]# cat my-affinity-pod3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels:
    app1: my-pod1
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAntiAffinity:             # only this field changed
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1            # label key, defined on the pod above
            operator: In         # In: the label value must be in the list below
            values:
            - my-pod1            # value of the app1 label
        topologyKey: kubernetes.io/hostname  # nodes with the same value for this key count as the same location; this pod must NOT share that location with pods matching the labelSelector
# view (I only have one worker node, so affinity-pod is Pending)
[root@k8s-m schedule]# kubectl get pod
NAME READY STATUS RESTARTS AGE
affinity-pod 0/1 Pending 0 1m
my-pod1 1/1 Running 0 1m
七、Taint-based scheduling
A taint's effect defines how it repels Pods:
NoSchedule: # affects the scheduling process only; existing Pod objects are not affected
NoExecute: # affects both scheduling and existing Pod objects; intolerant Pods are evicted
PreferNoSchedule: # soft version of NoSchedule; if there is nowhere else to run the pod, it may still be scheduled here
1、View and manage taints
# view node taints (Taints)
[root@k8s-m schedule]# kubectl describe node k8s-m |grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
[root@k8s-m schedule]# kubectl describe node node1 |grep Taints
Taints: <none>
# manage taints
kubectl taint node -h
# add a taint (syntax is like labeling a node)
kubectl taint node node1 node-type=PreferNoSchedule:NoSchedule
# view
[root@k8s-m schedule]# kubectl describe node node1 |grep Taints
Taints: node-type=PreferNoSchedule:NoSchedule
# remove the taint
[root@k8s-m ~]# kubectl taint node node1 node-type-
node/node1 untainted
# view
[root@k8s-m ~]# kubectl describe node node1 |grep Taints
Taints: <none>
2、Using taints and tolerations
# create a pod
[root@k8s-m ~]# cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
# view the pod (it is Pending)
[root@k8s-m ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-pod 0/1 Pending 0 32s
# the pod cannot tolerate the taints
[root@k8s-m ~]# kubectl describe pod nginx-pod|tail -1
Warning FailedScheduling 3s (x22 over 1m) default-scheduler 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
### usage (add a toleration)
[root@k8s-m ~]# cat mypod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  tolerations:                  # tolerated taints
  - key: "node-type"            # the taint key defined earlier
    operator: "Equal"           # Equal: key and value must match exactly; Exists: tolerate whenever a node-type taint exists
    value: "PreferNoSchedule"   # taint value
    effect: "NoSchedule"        # taint effect
    #tolerationSeconds: 3600    # how long the pod may keep running before eviction; only valid when effect is NoExecute
# view (the pod has now been scheduled)
[root@k8s-m ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
nginx-pod 1/1 Running 0 3m 10.244.1.100 node1 <none>
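The Equal/Exists matching described in the comments above can be sketched as a small check (an illustrative model, not the real kubelet/scheduler code; dicts mirror the YAML fields):

```python
def tolerates(taint, toleration):
    """A toleration matches a taint when key, effect and (for Equal) value agree.
    An empty effect on the toleration matches any taint effect."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration["operator"] == "Exists":
        return toleration["key"] == taint["key"]
    # "Equal": key and value must both match exactly.
    return (toleration["key"] == taint["key"]
            and toleration.get("value") == taint["value"])

taint = {"key": "node-type", "value": "PreferNoSchedule", "effect": "NoSchedule"}
print(tolerates(taint, {"key": "node-type", "operator": "Equal",
                        "value": "PreferNoSchedule", "effect": "NoSchedule"}))  # True
print(tolerates(taint, {"key": "node-type", "operator": "Exists"}))            # True
print(tolerates(taint, {"key": "node-type", "operator": "Equal",
                        "value": "other", "effect": "NoSchedule"}))            # False
```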
