k8s Scheduler, Predicate Policies and Scheduling Methods


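I. The k8s scheduling flow

1. (Predicates) First exclude the nodes that cannot meet the pod's requirements at all.
2. (Priorities) Score the remaining nodes with a set of algorithms; if exactly one node has the highest score, select it.
3. If several nodes tie for the highest score, pick one of them at random.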
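The three steps can be sketched in Python as filter, score, then pick. This is a simplified model, not kube-scheduler source code. The `schedule` function, the predicate/priority signatures, and the node names and scores in the usage part (including a hypothetical `node2`) are all assumptions for illustration.

```python
import random
from typing import Callable, List, Optional

# Hypothetical signatures: a predicate says yes/no, a priority returns an int score.
Predicate = Callable[[dict, str], bool]
Priority = Callable[[dict, str], int]


def schedule(pod: dict, nodes: List[str], predicates: List[Predicate],
             priorities: List[Priority],
             rng: Optional[random.Random] = None) -> Optional[str]:
    if rng is None:
        rng = random.Random()
    # Step 1 (predicates): keep only the nodes that pass every predicate.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # nothing fits: the pod stays Pending
    # Step 2 (priorities): a node's score is the sum of all priority functions.
    scores = {n: sum(f(pod, n) for f in priorities) for n in feasible}
    best = max(scores.values())
    winners = [n for n in feasible if scores[n] == best]
    # Step 3: a unique top scorer is returned directly; a tie is broken at random.
    return rng.choice(winners)


if __name__ == "__main__":
    not_master = lambda pod, n: n != "k8s-m"           # hypothetical predicate
    free_cpu = {"k8s-m": 9, "node1": 7, "node2": 7}    # hypothetical scores
    print(schedule({}, ["k8s-m", "node1", "node2"], [not_master],
                   [lambda pod, n: free_cpu[n]]))      # node1 or node2, chosen at random
```

Passing an explicit `random.Random(seed)` as `rng` makes the tie-break reproducible, which is handy when experimenting.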

 

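II. Scheduling methods

1. Node selection (which nodes the pod may run on)
2. Pod selection (run a pod on the same node as some other pod: pod affinity; or keep it from running alongside some other pod: pod anti-affinity)
3. Taints (a pod can be scheduled to a node only if it tolerates the node's taints, otherwise it cannot; for a NoExecute taint, pods already running on the node that do not tolerate it are evicted). A toleration period can also be defined.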

 

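III. Common predicates

Scheduler:
Predicates (a partial list):

CheckNodeCondition: #checks whether the node is healthy (e.g. network, disk)
GeneralPredicates
	HostName: #if the Pod sets pod.spec.nodeName, checks that it matches this node's name
	PodFitsHostPorts: #the ports the Pod binds on the node (pods.spec.containers.ports.hostPort) must be free on that node
	MatchNodeSelector: #checks that the node's labels match the Pod's pods.spec.nodeSelector
	PodFitsResources: #checks whether the node can satisfy the Pod's resource requests
NoDiskConflict: #checks whether the volumes the Pod depends on conflict with volumes already in use on the node (not enabled by default)
PodToleratesNodeTaints: #checks whether the tolerations in the Pod's spec.tolerations cover all of the node's taints
PodToleratesNodeNoExecuteTaints: #checks only taints with the NoExecute effect (not enabled by default)
CheckNodeLabelPresence: #checks whether the specified labels exist on the node
CheckServiceAffinity: #tries to place Pods that belong to the same Service together (not enabled by default)
MaxEBSVolumeCount: #checks the maximum number of EBS (AWS storage) volumes
MaxGCEPDVolumeCount: #maximum number of GCE PD volumes
MaxAzureDiskVolumeCount: #maximum number of AzureDisk volumes
CheckVolumeBinding: #checks whether the node can satisfy the Pod's bound and unbound PVCs
NoVolumeZoneConflict: #checks whether the Pod's volumes conflict with the node's zone
CheckNodeMemoryPressure: #checks whether the node is under memory pressure
CheckNodePIDPressure: #checks whether the number of PIDs on the node is too high
CheckNodeDiskPressure: #checks whether the node is under disk pressure (low disk space)
MatchInterPodAffinity: #checks whether the node satisfies the Pod's affinity or anti-affinity rules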
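Three of the GeneralPredicates checks are simple enough to sketch directly. MatchNodeSelector is left out here. The `Node`/`Pod` classes and their fields (millicores, MiB, a set of used host ports) are hypothetical simplifications; the real checks cover more resource types and read the actual API objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int              # CPU in millicores
    allocatable_mem_mi: int             # memory in MiB
    requested_cpu_m: int = 0            # sum of requests of Pods already on the node
    requested_mem_mi: int = 0
    used_host_ports: Set[int] = field(default_factory=set)


@dataclass
class Pod:
    cpu_m: int
    mem_mi: int
    host_ports: List[int] = field(default_factory=list)  # containers[].ports[].hostPort
    node_name: Optional[str] = None                       # spec.nodeName


def host_name(pod: Pod, node: Node) -> bool:
    # Passes when the Pod does not pin a node, or pins exactly this one.
    return pod.node_name is None or pod.node_name == node.name


def pod_fits_host_ports(pod: Pod, node: Node) -> bool:
    # Fails if any requested hostPort is already taken on the node.
    return not (set(pod.host_ports) & node.used_host_ports)


def pod_fits_resources(pod: Pod, node: Node) -> bool:
    # Compares requests (not actual usage) against allocatable capacity.
    return (node.requested_cpu_m + pod.cpu_m <= node.allocatable_cpu_m
            and node.requested_mem_mi + pod.mem_mi <= node.allocatable_mem_mi)


def general_predicates(pod: Pod, node: Node) -> bool:
    return all(check(pod, node)
               for check in (host_name, pod_fits_host_ports, pod_fits_resources))
```

Note that PodFitsResources works on requests. A node whose real usage is low can still be rejected if the requests already add up to its allocatable capacity.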

  

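IV. Common priority functions

LeastRequested: #the more free capacity a node has, the higher its score
	score = (cpu_score + memory_score) / 2, where each resource's score = (capacity - sum(requested)) * 10 / capacity
BalancedResourceAllocation: #nodes whose CPU and memory utilization ratios are closest to each other win
NodePreferAvoidPods: #based on the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"; nodes whose annotation asks to avoid the Pod are scored low
TaintToleration: #checks the Pod's spec.tolerations list against the node's taints list; the more of the node's PreferNoSchedule taints the Pod does not tolerate, the lower the score

SelectorSpreading: #label-selector spreading: the more Pods on the node that are selected by the same selectors (Service, ReplicaSet, etc.) as the current Pod, the lower the score
InterPodAffinity: #iterates over the Pod's affinity terms; the more terms a node satisfies, the higher the score
NodeAffinity: #node affinity: scores nodes by the Pod's preferred node-affinity terms they match
MostRequested: #the less free capacity, the higher the score; the opposite of LeastRequested (not enabled by default)
NodeLabel: #scores nodes by whether they carry the specified labels (not enabled by default)
ImageLocality: #scores by the total size of the images the Pod needs that already exist on the node (not enabled by default)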
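The LeastRequested formula, its MostRequested counterpart and BalancedResourceAllocation can be sketched with integer scores from 0 to 10. This assumes "requested" already includes the Pod being scheduled. The BalancedResourceAllocation expression (10 scaled by one minus the gap between the CPU and memory utilization fractions) is my reading of older scheduler versions and should be treated as an approximation, not a definitive implementation.

```python
MAX_PRIORITY = 10  # the "10" in the LeastRequested formula above


def _least(requested: int, capacity: int) -> int:
    # (capacity - requested) * 10 / capacity, in integer arithmetic
    if capacity == 0 or requested > capacity:
        return 0
    return (capacity - requested) * MAX_PRIORITY // capacity


def _most(requested: int, capacity: int) -> int:
    # requested * 10 / capacity: fuller nodes score higher
    if capacity == 0 or requested > capacity:
        return 0
    return requested * MAX_PRIORITY // capacity


def least_requested(cpu_req: int, cpu_cap: int, mem_req: int, mem_cap: int) -> int:
    return (_least(cpu_req, cpu_cap) + _least(mem_req, mem_cap)) // 2


def most_requested(cpu_req: int, cpu_cap: int, mem_req: int, mem_cap: int) -> int:
    return (_most(cpu_req, cpu_cap) + _most(mem_req, mem_cap)) // 2


def balanced_resource_allocation(cpu_req: int, cpu_cap: int, mem_req: int, mem_cap: int) -> int:
    if cpu_cap == 0 or mem_cap == 0:
        return 0
    cpu_frac, mem_frac = cpu_req / cpu_cap, mem_req / mem_cap
    if cpu_frac >= 1 or mem_frac >= 1:
        return 0
    # The closer the two utilization fractions, the closer the score is to 10.
    return int((1 - abs(cpu_frac - mem_frac)) * MAX_PRIORITY)


if __name__ == "__main__":
    # Hypothetical node: 4000m CPU / 8000Mi memory, 2000m / 2000Mi requested (incl. the new Pod).
    args = (2000, 4000, 2000, 8000)
    print(least_requested(*args), most_requested(*args),
          balanced_resource_allocation(*args))  # 6 3 7
```

With these numbers, CPU scores 5 and memory scores 7 (7.5 truncated) under LeastRequested, giving 6. The 50% vs 25% utilization gap gives BalancedResourceAllocation 7.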

  

V. Advanced scheduling configuration

1. nodeSelector

# View node labels
[root@k8s-m ~]# kubectl get  nodes --show-labels
NAME      STATUS    ROLES     AGE       VERSION   LABELS
k8s-m     Ready     master    120d      v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=
node1     Ready     <none>    120d      v1.11.2   app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1

# Use nodeSelector to select nodes labelled disk=ssd
[root@k8s-m schedule]# cat my-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd

# Check (the pod runs on node1, which carries disk=ssd)
[root@k8s-m schedule]# kubectl  get pod  -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP            NODE      NOMINATED NODE
nginx-pod                1/1       Running             0          49s       10.244.1.92   node1     <none>

# If no node carries the labels specified in nodeSelector, the pod stays Pending (it fails the predicate phase)
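nodeSelector matching is an exact key/value subset check against node labels. The sketch below parses the LABELS column printed by `kubectl get nodes --show-labels` above and filters nodes with it. The helper names `parse_labels` and `nodes_matching` are hypothetical.

```python
from typing import Dict, List


def parse_labels(column: str) -> Dict[str, str]:
    # "k1=v1,k2=v2,..." -> {"k1": "v1", ...}; "key=" yields an empty value.
    labels: Dict[str, str] = {}
    for item in column.split(","):
        key, _, value = item.partition("=")
        labels[key] = value
    return labels


def nodes_matching(nodes: Dict[str, Dict[str, str]], node_selector: Dict[str, str]) -> List[str]:
    # Every nodeSelector entry must be present on the node with exactly the same value.
    return [name for name, labels in nodes.items()
            if all(labels.get(k) == v for k, v in node_selector.items())]


# LABELS columns copied from the kubectl output above.
nodes = {
    "k8s-m": parse_labels("beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                          "kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master="),
    "node1": parse_labels("app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                          "disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1"),
}

if __name__ == "__main__":
    print(nodes_matching(nodes, {"disk": "ssd"}))  # ['node1']
    print(nodes_matching(nodes, {"disk": "hdd"}))  # [] -> the pod would stay Pending
```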

  

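2. affinity

2.1 nodeAffinity with preferredDuringSchedulingIgnoredDuringExecution (soft affinity: nodes that match more of the preferences are favoured, and the pod is still created even if no node matches)

# Usage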
[root@k8s-m schedule]# cat  my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
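          - key: test_node1 #label key
            operator: In #In: the label's value must be one of the values below
            values:
            - k8s-node1 #a value of the test_node1 label
            - test1     #a value of the test_node1 label
        weight: 60 #weight given to nodes that match this preference, 1-100

# Check (no node has this label, but the pod was still created and is running)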
[root@k8s-m schedule]# kubectl  get pod  
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             1/1       Running             0          16s
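A soft preference only changes scores, never feasibility. Here is a sketch under these assumptions: only the `In` operator is modelled, a preference is a `(weight, [(key, values), ...])` tuple, and the summed weights are scaled to 0..10 across nodes (how I understand older scheduler versions normalized the NodeAffinity priority). `node2` and its `test_node1=test1` label are hypothetical.

```python
from typing import Dict, List, Tuple

# A preference is (weight, [(key, allowed_values), ...]); only "In" is modelled.
Preference = Tuple[int, List[Tuple[str, List[str]]]]


def raw_score(labels: Dict[str, str], preferences: List[Preference]) -> int:
    # Add the weight of every preference whose expressions all match the node.
    return sum(weight for weight, exprs in preferences
               if all(labels.get(key) in values for key, values in exprs))


def normalized_scores(nodes: Dict[str, Dict[str, str]],
                      preferences: List[Preference]) -> Dict[str, int]:
    raw = {name: raw_score(labels, preferences) for name, labels in nodes.items()}
    top = max(raw.values(), default=0)
    # Scale to 0..10; if nothing matches, every node scores 0 but none is filtered out.
    return {name: (s * 10 // top if top else 0) for name, s in raw.items()}


prefs = [(60, [("test_node1", ["k8s-node1", "test1"])])]   # mirrors the YAML above
cluster = {"node1": {"test_node": "k8s-node1"},            # node1's real label is test_node
           "node2": {"test_node1": "test1"}}               # hypothetical node

if __name__ == "__main__":
    print(normalized_scores(cluster, prefs))                          # {'node1': 0, 'node2': 10}
    print(normalized_scores({"node1": {"test_node": "k8s-node1"}}, prefs))  # {'node1': 0}
```

The second call matches the output above. No node has `test_node1`, so every score is 0, but node1 is still a valid target and the pod runs.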

  

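2.2 requiredDuringSchedulingIgnoredDuringExecution (hard affinity, similar to nodeSelector: a hard requirement, so the pod is only scheduled to a node that meets the condition and stays Pending if no node does)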

[root@k8s-m schedule]# cat my-affinity-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
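          - key: test_node1 #label key
            operator: In #In: the label's value must be one of the values below
            values:
            - k8s-node1 #a value of the test_node1 label
            - test1     #a value of the test_node1 label

# Check (no node has the test_node1 label, so the pod stays Pending)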
[root@k8s-m schedule]# kubectl  get pod 
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             0/1       Pending             0          4s
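For the hard form, the entries in `nodeSelectorTerms` are ORed and the `matchExpressions` inside one term are ANDed. The sketch below models the `In`, `NotIn`, `Exists` and `DoesNotExist` operators; `Gt`/`Lt` are deliberately left out. The node1 label dict is trimmed from the `--show-labels` output above. The tuple representation and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Expr = Tuple[str, str, List[str]]  # (key, operator, values)


def expr_matches(labels: Dict[str, str], expr: Expr) -> bool:
    key, op, values = expr
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modelled in this sketch: {op}")


def required_affinity_ok(labels: Dict[str, str], node_selector_terms: List[List[Expr]]) -> bool:
    # Terms are ORed; the matchExpressions inside one term are ANDed.
    return any(all(expr_matches(labels, e) for e in term) for term in node_selector_terms)


node1 = {"disk": "ssd", "test_node": "k8s-node1", "kubernetes.io/hostname": "node1"}
term_from_yaml = [("test_node1", "In", ["k8s-node1", "test1"])]   # the term used above
term_ok = [("test_node", "In", ["k8s-node1"]), ("disk", "Exists", [])]

if __name__ == "__main__":
    print(required_affinity_ok(node1, [term_from_yaml]))           # False -> Pending
    print(required_affinity_ok(node1, [term_from_yaml, term_ok]))  # True: one term is enough
```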

  

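VI. Pod affinity and anti-affinity

1. podAffinity (place a pod in the same location as some other pod; "same location" does not necessarily mean the same node, it depends on the label you choose as topologyKey)

# Usage (place affinity-pod in the same location as my-pod1)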
[root@k8s-m schedule]# cat my-affinity-pod2.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels: 
    app1: my-pod1
     
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1 #label key, defined on the pod above
            operator: In #In: the label's value must be one of the values below
            values:
            - my-pod1 #value of the app1 label
        topologyKey: kubernetes.io/hostname #nodes with the same kubernetes.io/hostname value count as the same location, so this pod must run on the same node as a pod selected by labelSelector
# Check
[root@k8s-m schedule]# kubectl  get pod   -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP            NODE      NOMINATED NODE
affinity-pod             1/1       Running             0          54s       10.244.1.98   node1     <none>
my-pod1                  1/1       Running             0          54s       10.244.1.97   node1     <none>
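The role of topologyKey can be sketched as follows: a candidate node satisfies the affinity term if some matching pod runs on a node with the same value of the topology-key label. Some simplifications are assumed. The selector is matched as plain key/value equality (equivalent to the single-value `In` above). Special cases such as the first pod of a group are ignored. `node2` and the `zone` label are hypothetical, added to show a wider topology domain.

```python
from typing import Dict, List, Tuple

NodeLabels = Dict[str, Dict[str, str]]        # node name -> node labels
RunningPod = Tuple[Dict[str, str], str]       # (pod labels, node it runs on)


def same_topology_has_match(candidate: str, nodes: NodeLabels, running: List[RunningPod],
                            selector: Dict[str, str], topology_key: str) -> bool:
    domain = nodes[candidate].get(topology_key)
    if domain is None:
        return False  # the candidate node has no value for the topology key
    for pod_labels, node in running:
        matches = all(pod_labels.get(k) == v for k, v in selector.items())
        if matches and nodes[node].get(topology_key) == domain:
            return True
    return False


nodes = {
    "node1": {"kubernetes.io/hostname": "node1", "zone": "a"},  # "zone" is hypothetical
    "node2": {"kubernetes.io/hostname": "node2", "zone": "a"},  # node2 is hypothetical
}
running = [({"app1": "my-pod1"}, "node1")]    # my-pod1 runs on node1
sel = {"app1": "my-pod1"}

if __name__ == "__main__":
    print(same_topology_has_match("node2", nodes, running, sel, "kubernetes.io/hostname"))  # False
    print(same_topology_has_match("node2", nodes, running, sel, "zone"))                    # True
```

With `kubernetes.io/hostname`, only node1 qualifies, which matches the output above. With a zone-style key, any node in the same zone would qualify.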

  

2. podAntiAffinity (keep a pod out of the location of some other pod, here the same node; the opposite of the above)

[root@k8s-m schedule]# cat  my-affinity-pod3.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels: 
    app1: my-pod1
     
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAntiAffinity:  #the only change from the previous example
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
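          - key: app1 #label key, defined on the pod above
            operator: In #In: the label's value must be one of the values below
            values:
            - my-pod1 #value of the app1 label
        topologyKey: kubernetes.io/hostname #nodes with the same kubernetes.io/hostname value count as the same location, so this pod must not run on a node that hosts a selected pod

# Check (I only have one worker node, so the pod is Pending)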
[root@k8s-m schedule]# kubectl  get pod 
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             0/1       Pending             0          1m
my-pod1                  1/1       Running             0          1m
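The Pending result follows from combining two predicates. The master is excluded by its NoSchedule taint (the pod has no tolerations). node1 is excluded by the anti-affinity rule, because my-pod1 already runs there. The sketch below is hypothetical: taints are reduced to `"key:effect"` strings, and a node without the topology-key label is treated as not conflicting.

```python
from typing import Dict, List

NODES = {
    "k8s-m": {"labels": {"kubernetes.io/hostname": "k8s-m"},
              "taints": ["node-role.kubernetes.io/master:NoSchedule"]},
    "node1": {"labels": {"kubernetes.io/hostname": "node1"}, "taints": []},
}
RUNNING = [({"app1": "my-pod1"}, "node1")]  # my-pod1 already runs on node1


def violates_anti_affinity(candidate: str, selector: Dict[str, str], topology_key: str,
                           nodes=NODES, running=RUNNING) -> bool:
    domain = nodes[candidate]["labels"].get(topology_key)
    if domain is None:
        return False  # no topology value: cannot share a domain with anyone
    return any(all(labels.get(k) == v for k, v in selector.items())
               and nodes[node]["labels"].get(topology_key) == domain
               for labels, node in running)


def feasible_nodes(selector: Dict[str, str], topology_key: str,
                   nodes=NODES, running=RUNNING) -> List[str]:
    # The pod has no tolerations, so any NoSchedule taint rules a node out.
    return [name for name, n in nodes.items()
            if not any(t.endswith(":NoSchedule") for t in n["taints"])
            and not violates_anti_affinity(name, selector, topology_key, nodes, running)]


if __name__ == "__main__":
    print(feasible_nodes({"app1": "my-pod1"}, "kubernetes.io/hostname"))  # [] -> Pending
```

Adding a second untainted worker node without my-pod1 to `NODES` would give the pod a place to run.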

  

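VII. Taint-based scheduling

The effect of a taint defines how it repels Pods:
NoSchedule: #affects only scheduling; Pods already running on the node are not affected
NoExecute: #affects both scheduling and existing Pods; Pods that do not tolerate the taint are evicted
PreferNoSchedule: #the scheduler tries to avoid the node for Pods that do not tolerate the taint, but still places them there if there is nowhere else to run them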

 

1. View and manage taints

# View a node's taints (Taints)
[root@k8s-m schedule]# kubectl  describe  node  k8s-m |grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule

[root@k8s-m schedule]# kubectl  describe  node  node1 |grep Taints
Taints:             <none>

# Show help for managing taints
kubectl  taint node  -h

# Add a taint (a taint is separate from a label; syntax key=value:effect, here key node-type, value PreferNoSchedule, effect NoSchedule)
kubectl  taint    node node1 node-type=PreferNoSchedule:NoSchedule 
# Check
[root@k8s-m schedule]# kubectl  describe  node  node1 |grep Taints
Taints:             node-type=PreferNoSchedule:NoSchedule
# Remove the taint (the key followed by -)
[root@k8s-m ~]# kubectl taint node node1 node-type-
node/node1 untainted
# Check
[root@k8s-m ~]# kubectl describe node node1 |grep Taints
Taints:             <none>
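The `kubectl taint` arguments above follow two shapes: `key[=value]:effect` adds a taint, and `key-` (or `key:effect-`) removes one. Below is a small parser for just that syntax. It is a sketch of the argument format, not kubectl's implementation. `parse_taint_arg`, `apply_taint_arg` and the tuple representation are hypothetical. Adding a taint whose key and effect already exist simply replaces it here.

```python
from typing import List, Optional, Tuple

EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}
Taint = Tuple[str, str, str]  # (key, value, effect)


def parse_taint_arg(arg: str) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Return ("add", key, value, effect) or ("remove", key, None, effect-or-None)."""
    if arg.endswith("-"):
        key, _, effect = arg[:-1].partition(":")
        return ("remove", key, None, effect or None)
    key_value, sep, effect = arg.rpartition(":")
    if not sep or effect not in EFFECTS:
        raise ValueError(f"expected key[=value]:effect, got {arg!r}")
    key, _, value = key_value.partition("=")
    return ("add", key, value, effect)


def apply_taint_arg(taints: List[Taint], arg: str) -> List[Taint]:
    action, key, value, effect = parse_taint_arg(arg)
    if action == "add":
        # Drop any taint with the same key and effect, then add the new one.
        kept = [t for t in taints if not (t[0] == key and t[2] == effect)]
        return kept + [(key, value, effect)]
    # "key-" removes every taint with that key; "key:effect-" only the one with that effect.
    return [t for t in taints if not (t[0] == key and (effect is None or t[2] == effect))]


if __name__ == "__main__":
    node1: List[Taint] = apply_taint_arg([], "node-type=PreferNoSchedule:NoSchedule")
    print(node1)                                # [('node-type', 'PreferNoSchedule', 'NoSchedule')]
    print(apply_taint_arg(node1, "node-type-"))  # []
```

Parsing the example shows why the value `PreferNoSchedule` is only a string here. The effect that actually applies is the part after the colon, `NoSchedule`.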

  

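2. Using taints

# Create a pod without tolerations (node1 has been given the node-type=PreferNoSchedule:NoSchedule taint again, and the master still has its NoSchedule taint)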
[root@k8s-m ~]# cat  mypod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80

# Check the pod (it is Pending)
[root@k8s-m ~]# kubectl  get pod 
NAME                     READY     STATUS              RESTARTS   AGE
nginx-pod                0/1       Pending             0          32s

# Cause: the pod does not tolerate the nodes' taints
[root@k8s-m ~]# kubectl  describe pod nginx-pod|tail  -1
  Warning  FailedScheduling  3s (x22 over 1m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.


# Add tolerations
[root@k8s-m ~]# cat mypod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
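  tolerations: #the taints this pod tolerates
  - key: "node-type" #key of the taint defined earlier
    operator: "Equal" #Equal requires key and value to match exactly; Exists tolerates the taint as long as the key exists
    value: "PreferNoSchedule" #taint value
    effect: "NoSchedule" #effect
    #tolerationSeconds: 3600  #how long the pod may keep running after a matching taint is added before it is evicted; only valid when effect is NoExecute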

	
#查看(已經調度了)
[root@k8s-m ~]# kubectl  get pod  -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP             NODE      NOMINATED NODE
nginx-pod                1/1       Running             0          3m        10.244.1.100   node1     <none>
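Whether a toleration matches a taint can be sketched from the rules described in the comments above. An empty effect matches every effect. `Exists` ignores the value. `Equal` needs the value to match. An empty key is only meaningful together with `Exists`. For the scheduling check, only NoSchedule and NoExecute taints have to be tolerated. The dataclasses are hypothetical stand-ins for the API objects, and `tolerationSeconds` is carried but not used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # NoSchedule, PreferNoSchedule or NoExecute


@dataclass
class Toleration:
    key: str = ""               # empty key together with Exists tolerates every taint
    operator: str = "Equal"     # Equal (default) or Exists
    value: str = ""
    effect: str = ""            # empty effect matches every effect
    toleration_seconds: Optional[int] = None  # NoExecute eviction delay; unused here


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        return tol.operator == "Exists"  # an empty key is only valid with Exists
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value      # Equal: values must match exactly


def pod_tolerates_node_taints(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    # Only NoSchedule and NoExecute taints block scheduling; PreferNoSchedule only lowers the score.
    return all(any(tolerates(tol, t) for tol in tolerations)
               for t in taints if t.effect in ("NoSchedule", "NoExecute"))


node1_taints = [Taint("node-type", "PreferNoSchedule", "NoSchedule")]
master_taints = [Taint("node-role.kubernetes.io/master", "", "NoSchedule")]
pod_tols = [Toleration(key="node-type", operator="Equal",
                       value="PreferNoSchedule", effect="NoSchedule")]  # mirrors mypod.yaml

if __name__ == "__main__":
    print(pod_tolerates_node_taints([], node1_taints))           # False: Pending at first
    print(pod_tolerates_node_taints(pod_tols, node1_taints))     # True: node1 becomes feasible
    print(pod_tolerates_node_taints(pod_tols, master_taints))    # False: master stays excluded
```

This is consistent with the output above. With the toleration, node1 is the only feasible node, so the pod lands there.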

  

 

