Chapter 15: The Kubernetes Scheduler

This article is a repost (2019-11-21 14:34, Linux-運維之k8s篇).

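I. Introduction

The Scheduler is the Kubernetes component whose main job is to assign defined pods to nodes in the cluster. That sounds simple, but there are several concerns to balance:

①　Fairness: how to ensure every node can be allocated resources

②　Efficient resource use: the cluster's resources should be used as fully as possible

③　Performance: scheduling must be fast enough to place large batches of pods quickly

④　Flexibility: users can control the scheduling logic according to their own needs

The Scheduler runs as a separate process. Once started, it continuously watches the API Server for pods whose PodSpec.NodeName is empty and creates a binding for each of them. The binding is mandatory and states which node the pod should be placed on.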

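II. The Scheduling Process

Scheduling happens in several stages:

1) First, nodes that do not meet the requirements are filtered out; this stage is called predicate (filtering).

2) Next, the nodes that passed are ranked by priority; this stage is called priority (scoring).

3) Finally, the node with the highest priority is selected.

If any step fails, the error is returned immediately. Filtering always runs before scoring.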

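The predicate (filtering) stage can use a number of algorithms:

①　PodFitsResources: whether the node's remaining resources are enough to satisfy the pod's resource requests

②　PodFitsHost: if the pod specifies a NodeName, whether the node's name matches it

③　PodFitsHostPorts: whether the host ports the pod requests conflict with ports already in use on the node

④　PodSelectorMatches: filters out nodes whose labels do not match the labels the pod selects on

⑤　NoDiskConflict: whether the volumes the pod specifies conflict with volumes already mounted on the node; there is no conflict if both mounts are read-only

If no node passes the predicate stage, the pod stays in the Pending state and scheduling is retried until some node satisfies the conditions. If more than one node passes, scheduling continues with the priorities stage, which ranks those nodes by priority.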
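The filtering stage can be pictured as a list of yes/no checks that every candidate node must pass. The following is a minimal Python sketch of that idea, not the scheduler's actual implementation: the Node and Pod classes, their field names, and the sample values are assumptions made for illustration, and only three of the predicates listed above are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_millis: int          # CPU not yet requested by other pods (hypothetical field)
    free_memory_mib: int          # memory not yet requested by other pods (hypothetical field)
    used_host_ports: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu_millis: int
    memory_mib: int
    host_ports: set = field(default_factory=set)
    node_name: str = ""           # empty means the scheduler decides


def pod_fits_resources(pod, node):
    """The node must have enough free CPU and memory for the pod's requests."""
    return pod.cpu_millis <= node.free_cpu_millis and pod.memory_mib <= node.free_memory_mib


def pod_fits_host(pod, node):
    """If the pod names a node, only that node passes."""
    return pod.node_name in ("", node.name)


def pod_fits_host_ports(pod, node):
    """The pod's host ports must not collide with ports already in use."""
    return pod.host_ports.isdisjoint(node.used_host_ports)


PREDICATES = [pod_fits_resources, pod_fits_host, pod_fits_host_ports]


def filter_nodes(pod, nodes):
    """Return the nodes that pass every predicate; an empty list means the pod stays Pending."""
    return [n for n in nodes if all(check(pod, n) for check in PREDICATES)]


nodes = [
    Node("node01", free_cpu_millis=500, free_memory_mib=1024, used_host_ports={80}),
    Node("node02", free_cpu_millis=2000, free_memory_mib=4096),
    Node("node03", free_cpu_millis=100, free_memory_mib=4096),
]
pod = Pod("web", cpu_millis=250, memory_mib=512, host_ports={80})
feasible = [n.name for n in filter_nodes(pod, nodes)]
print(feasible)  # ['node02']
```

Here node01 is rejected because host port 80 is taken and node03 because it lacks CPU, so only node02 moves on to the scoring stage.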

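Priorities are expressed as key/value pairs: the key is the name of a priority function and the value is its weight, that is, how much that function counts. The available priority functions include:

LeastRequestedPriority: computes a score from the node's CPU and memory utilization; the lower the utilization, the higher the score. In other words, this priority favors nodes with a lower proportion of resources in use

BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the score. It is meant to be used together with LeastRequestedPriority, not on its own

ImageLocalityPriority: favors nodes that already hold the images the pod needs; the larger the total size of those images, the higher the score

The scheduler combines the scores of all priority functions with their weights to produce the final result.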
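To make the weighted scoring concrete, here is a small Python sketch. It assumes a 0 to 10 score scale and simplified formulas modeled on the two resource priorities described above; the weights, node names, and resource figures are invented for illustration, and ImageLocalityPriority is left out.

```python
def least_requested_score(cpu_req, cpu_cap, mem_req, mem_cap):
    """Higher when a smaller share of the node is requested (0 to 10 scale)."""
    cpu = (cpu_cap - cpu_req) * 10 / cpu_cap
    mem = (mem_cap - mem_req) * 10 / mem_cap
    return (cpu + mem) / 2


def balanced_allocation_score(cpu_req, cpu_cap, mem_req, mem_cap):
    """Higher when the CPU and memory request fractions are close to each other."""
    return 10 - abs(cpu_req / cpu_cap - mem_req / mem_cap) * 10


# Priority name -> weight, mirroring the key/value description above (illustrative weights).
WEIGHTS = {"LeastRequestedPriority": 1, "BalancedResourceAllocation": 1}


def total_score(cpu_req, cpu_cap, mem_req, mem_cap):
    """Weighted sum of all priority scores for one node."""
    scores = {
        "LeastRequestedPriority": least_requested_score(cpu_req, cpu_cap, mem_req, mem_cap),
        "BalancedResourceAllocation": balanced_allocation_score(cpu_req, cpu_cap, mem_req, mem_cap),
    }
    return sum(WEIGHTS[name] * score for name, score in scores.items())


# Per node: (cpu requested, cpu capacity, memory requested, memory capacity),
# where "requested" already includes the pod being scheduled.
candidates = {
    "node01": (1000, 4000, 2048, 8192),   # 25% CPU, 25% memory
    "node02": (3000, 4000, 2048, 8192),   # 75% CPU, 25% memory
}
ranked = sorted(candidates, key=lambda n: total_score(*candidates[n]), reverse=True)
print(ranked)  # ['node01', 'node02']
```

With these numbers node01 scores 7.5 + 10 = 17.5 and node02 scores 5 + 5 = 10, so node01 is chosen: it is both less loaded and more evenly loaded.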

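III. Custom Schedulers

Besides the scheduler that ships with Kubernetes, you can write your own. A pod selects its scheduler by name through the spec.schedulerName field. For example, the pod below is scheduled by my-scheduler instead of the default default-scheduler: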

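apiVersion: v1
kind: Pod
metadata:
  name: my-scheduler-pod          # placeholder name; the source omits metadata.name
  labels: {}                      # no labels are given in the source
spec:
  schedulerName: my-scheduler     # scheduled by this scheduler instead of default-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: gcr.io/google_containers/pause:2.0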

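IV. Node Affinity (affinity between pods and nodes)

pod.spec.affinity.nodeAffinity

preferredDuringSchedulingIgnoredDuringExecution: soft policy. Matching nodes are preferred, but the pod can still be scheduled elsewhere if none match

requiredDuringSchedulingIgnoredDuringExecution: hard policy. The pod is scheduled only onto nodes that satisfy the rule

preferred: favored, but not mandatory

required: mandatory

requiredDuringSchedulingIgnoredDuringExecution

# Hard node policy: k8s-node02 is excluded, so in a cluster with only node01 and node02 the pod can run only on node01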

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod         # placeholder name; the source omits metadata.name
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.atguigu.com/library/myapp:v1
  affinity:                       # affinity rules
    nodeAffinity:                 # node affinity
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn       # NotIn: the label's value is not in the list below
            values:
            - k8s-node02

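nodeSelectorTerms: the list of selector terms (conditions)

matchExpressions: the label match expressions within a term

operator: the comparison operator applied to the label's value

preferredDuringSchedulingIgnoredDuringExecution

# Soft policy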

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod         # placeholder name; the source omits metadata.name
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.atguigu.com/library/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                 # with several soft rules, a larger weight means a stronger preference
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - qikqiak

To list the labels on each node: kubectl get node --show-labels

Hard and soft policies combined

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod         # placeholder name; the source omits metadata.name
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: hub.atguigu.com/library/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:     # hard: never k8s-node02
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node02
      preferredDuringSchedulingIgnoredDuringExecution:    # soft: prefer nodes labeled source=qikqiak
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - qikqiak

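Operators for match expressions

①　In: the label's value is in a given list

②　NotIn: the label's value is not in a given list

③　Gt: the label's value is greater than a given value

④　Lt: the label's value is less than a given value

⑤　Exists: the label exists

⑥　DoesNotExist: the label does not exist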
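The six operators above can be read as a small evaluation function over a node's labels. The Python sketch below is an illustration under stated assumptions rather than Kubernetes code: labels are modeled as a plain dict of strings, the function name matches_expression is made up, and the cpu-cores label is hypothetical. It treats Gt and Lt as integer comparisons against a single value, and a missing label as satisfying NotIn.

```python
def matches_expression(labels, key, operator, values=()):
    """Evaluate one matchExpressions entry against a node's labels (dict of str -> str)."""
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A node that lacks the label also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if operator in ("Gt", "Lt"):
        # Gt and Lt compare integers and expect exactly one value.
        if key not in labels:
            return False
        try:
            actual, wanted = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False
        return actual > wanted if operator == "Gt" else actual < wanted
    raise ValueError(f"unknown operator: {operator}")


# "cpu-cores" is a hypothetical label used only to demonstrate Gt.
node02 = {"kubernetes.io/hostname": "k8s-node02", "source": "qikqiak", "cpu-cores": "16"}

print(matches_expression(node02, "kubernetes.io/hostname", "NotIn", ["k8s-node02"]))  # False
print(matches_expression(node02, "source", "In", ["qikqiak"]))                        # True
print(matches_expression(node02, "cpu-cores", "Gt", ["8"]))                           # True
```

The first call shows why the hard policy earlier in this section keeps pods off k8s-node02: its hostname label is in the excluded list, so the NotIn expression fails.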

V. Pod Affinity (affinity between pods)

pod.spec.affinity.podAffinity/podAntiAffinity

• preferredDuringSchedulingIgnoredDuringExecution: soft policy

• requiredDuringSchedulingIgnoredDuringExecution: hard policy

apiVersion: v1
kind: Pod
metadata:
  name: pod-3                     # placeholder name; the source omits metadata.name
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: hub.atguigu.com/library/myapp:v1
  affinity:
    podAffinity:                  # hard: must share a node with a pod labeled app=pod-1
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-1
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:              # soft: prefer not to share a node with a pod labeled app=pod-2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname
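The role of topologyKey is easiest to see by computing which nodes a required pod-affinity rule allows. The Python sketch below is a simplified model, assuming only the In operator and plain dicts for nodes and pods; the function names and the zone label are hypothetical. A node is eligible when its value for the topology key equals that of some node already running a matching pod.

```python
def domains_with_matching_pods(label_key, label_values, topology_key, nodes, pods):
    """Return the topology domains that already run a pod whose label matches (In operator)."""
    domains = set()
    for pod in pods:
        if pod["labels"].get(label_key) in label_values:
            node_labels = nodes[pod["node"]]
            if topology_key in node_labels:
                domains.add(node_labels[topology_key])
    return domains


def affinity_nodes(label_key, label_values, topology_key, nodes, pods):
    """Nodes allowed by a required podAffinity rule: same domain as a matching pod."""
    domains = domains_with_matching_pods(label_key, label_values, topology_key, nodes, pods)
    return sorted(n for n, labels in nodes.items() if labels.get(topology_key) in domains)


def anti_affinity_nodes(label_key, label_values, topology_key, nodes, pods):
    """Nodes allowed by a required podAntiAffinity rule: not in any matching pod's domain."""
    domains = domains_with_matching_pods(label_key, label_values, topology_key, nodes, pods)
    return sorted(n for n, labels in nodes.items() if labels.get(topology_key) not in domains)


# The "zone" label is a hypothetical second topology key.
nodes = {
    "k8s-node01": {"kubernetes.io/hostname": "k8s-node01", "zone": "a"},
    "k8s-node02": {"kubernetes.io/hostname": "k8s-node02", "zone": "a"},
    "k8s-node03": {"kubernetes.io/hostname": "k8s-node03", "zone": "b"},
}
pods = [{"labels": {"app": "pod-1"}, "node": "k8s-node01"}]

print(affinity_nodes("app", ["pod-1"], "kubernetes.io/hostname", nodes, pods))       # ['k8s-node01']
print(affinity_nodes("app", ["pod-1"], "zone", nodes, pods))                         # ['k8s-node01', 'k8s-node02']
print(anti_affinity_nodes("app", ["pod-1"], "kubernetes.io/hostname", nodes, pods))  # ['k8s-node02', 'k8s-node03']
```

With kubernetes.io/hostname as the key, each node is its own domain, so affinity means "the same node". With a coarser key such as the hypothetical zone label, any node in the same zone qualifies.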

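The affinity and anti-affinity scheduling policies compare as follows:

nodeAffinity: matches node labels; operators In, NotIn, Exists, DoesNotExist, Gt, Lt; no topology domain; schedules the pod onto the matching nodes

podAffinity: matches pod labels; operators In, NotIn, Exists, DoesNotExist; uses a topology domain (topologyKey); schedules the pod into the same topology domain as the matching pods

podAntiAffinity: matches pod labels; operators In, NotIn, Exists, DoesNotExist; uses a topology domain (topologyKey); schedules the pod into a different topology domain from the matching pods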

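VI. Taints and Tolerations

Node affinity is a property of pods, either a preference or a hard requirement, that attracts them to a particular class of nodes. A taint does the opposite: it lets a node repel a particular class of pods.

Taints and tolerations work together to keep pods off unsuitable nodes. One or more taints can be applied to a node, which means the node will not accept any pod that does not tolerate those taints. A toleration applied to a pod means the pod may, but is not required to, be scheduled onto nodes with matching taints.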

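1. Composition of a taint

The kubectl taint command sets a taint on a node. A tainted node and pods repel each other: the node can refuse to have pods scheduled onto it, and can even evict pods that are already running on it.

Each taint has the following form:

key=value:effect

Each taint has a key and a value that act as its label, where the value may be empty, and an effect that describes what the taint does. The taint effect currently supports these three options:

NoSchedule: new pods that do not tolerate the taint are not scheduled onto the node, but pods already running on the node are not affected

NoExecute: new pods that do not tolerate the taint are not scheduled onto the node, and pods already running on it that do not tolerate the taint are evicted

PreferNoSchedule: the scheduler tries to avoid placing pods that do not tolerate the taint on the node, but may still do so if no other node is suitable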

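2. Setting, viewing, and removing taints

# View a node's taints (look for the Taints field in the output)
kubectl describe node node-name

# Set a taint
kubectl taint nodes node1 key1=value1:NoSchedule

# Check the result: look for the Taints field in the node description
kubectl describe node node1

# Remove the taint (note the trailing "-")
kubectl taint nodes node1 key1=value1:NoSchedule-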

Configuring a pod to tolerate a taint

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      tolerations:               # same level as containers
      - key: "key1"              # key of the taint to tolerate
        operator: "Equal"        # Equal: key and value must both match; Exists: only the key must exist, and value is omitted
        value: "value1"          # value of the taint to tolerate
        effect: "NoExecute"      # taint effect, as described above
        tolerationSeconds: 3600  # seconds the pod may keep running before eviction; valid only with effect NoExecute, otherwise an error is reported

VII. Tolerations

Depending on the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), a tainted node and pods repel each other, so pods are to some degree kept off the node. A toleration set on a pod lets it tolerate the taint, so the pod can be scheduled onto a node that carries it.

pod.spec.tolerations

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600   # only valid together with effect NoExecute
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"

• The key, value, and effect must match the taint set on the node

• When operator is Exists, the value is ignored

• tolerationSeconds is how long the pod may keep running on the node once it is due to be evicted

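1. If no key is specified, every taint key is tolerated:

tolerations:
- operator: "Exists"

2. If no effect is specified, every taint effect is tolerated:

tolerations:
- key: "key"
  operator: "Exists"

3. With multiple master nodes, you can set the following so their resources are not wasted:

kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule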
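The matching rules in this section (Equal versus Exists, an omitted key, an omitted effect) can be summarized in a few lines of Python. This is a sketch under the assumption that taints and tolerations are plain dicts with the same field names as the YAML; the helper names tolerates and untolerated_taints are made up for illustration.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (both plain dicts)."""
    # An omitted effect in the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An omitted key (used together with operator Exists) matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True  # Exists ignores the value
    return toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(tolerations, taints):
    """Return the taints that none of the pod's tolerations match."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


taints = [
    {"key": "key1", "value": "value1", "effect": "NoSchedule"},
    {"key": "key2", "value": "", "effect": "NoExecute"},
]

one_key = [{"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"}]
print(untolerated_taints(one_key, taints))                  # the key2 NoExecute taint remains
print(untolerated_taints([{"operator": "Exists"}], taints)) # []
```

A pod can land on a node only if no NoSchedule or NoExecute taint is left over, which is why the single `operator: "Exists"` toleration in case 1 lets a pod run anywhere.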

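Configuring how long pods wait before being rescheduled after a node failure

For example, a pod can be configured to wait 300 seconds on a node that is in the notReady or unreachable state; if the node has not recovered by then, the pod is evicted and stops running there.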
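The 300-second wait described above corresponds to tolerations with tolerationSeconds on the two node-failure taints. The Python sketch below models that as data and computes when eviction would happen. It is a simplification that assumes exact key matching only; the taint keys node.kubernetes.io/not-ready and node.kubernetes.io/unreachable are the ones Kubernetes applies to failed nodes, while the eviction_time helper and the example.com/other key are hypothetical.

```python
from datetime import datetime, timedelta

# Tolerations for the two node-failure taints, each allowing 300 seconds.
tolerations = [
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 300},
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 300},
]


def eviction_time(taint_key, taint_added_at, tolerations):
    """Return when a pod would be evicted for a NoExecute taint, or None if never."""
    for tol in tolerations:
        if tol["key"] == taint_key and tol["effect"] == "NoExecute":
            seconds = tol.get("tolerationSeconds")
            if seconds is None:
                return None                      # tolerated indefinitely
            return taint_added_at + timedelta(seconds=seconds)
    return taint_added_at                        # not tolerated: evicted right away


added = datetime(2019, 11, 21, 14, 34, 0)
print(eviction_time("node.kubernetes.io/unreachable", added, tolerations))  # 2019-11-21 14:39:00
```

If the node recovers and the taint is removed before that time, the pod simply keeps running where it is.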

VIII. Pinning Pods to Specific Nodes

1. Pod.spec.nodeName places the pod directly on the named node, bypassing the Scheduler's policies entirely. The match is mandatory.

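apiVersion: apps/v1              # extensions/v1beta1 no longer serves Deployments as of Kubernetes 1.16
kind: Deployment
metadata:
  name: myweb                    # placeholder name; the source omits metadata.name
spec:
  replicas: 7
  selector:                      # required by apps/v1; must match the template labels
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: k8s-node01       # every replica is placed on this node
      containers:
      - name: myweb
        image: hub.atguigu.com/library/myapp:v1
        ports:
        - containerPort: 80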

2. Pod.spec.nodeSelector selects nodes through the Kubernetes label-selector mechanism: the scheduler matches the labels and then places the pod on a matching node. The match is a hard constraint.

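apiVersion: apps/v1              # extensions/v1beta1 no longer serves Deployments as of Kubernetes 1.16
kind: Deployment
metadata:
  name: myweb                    # placeholder name; the source omits metadata.name
spec:
  replicas: 2
  selector:                      # required by apps/v1; must match the template labels
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        type: backEndNode1       # only nodes labeled type=backEndNode1 are eligible
      containers:
      - name: myweb
        image: harbor/tomcat:8.5-jre8
        ports:
        - containerPort: 80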
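The difference between the two mechanisms in this section can be sketched in Python. This is an illustrative model, assuming a pod spec and node labels held in plain dicts; the candidate_nodes helper and the frontEndNode label value are hypothetical. nodeName short-circuits everything, while nodeSelector requires every listed label to be present on the node with the same value.

```python
def candidate_nodes(pod_spec, nodes):
    """Return the node names a pod may run on; nodes maps node name -> labels."""
    if pod_spec.get("nodeName"):
        # nodeName bypasses the scheduler: the pod is bound to that node as-is.
        return [pod_spec["nodeName"]]
    selector = pod_spec.get("nodeSelector", {})
    return sorted(
        name for name, labels in nodes.items()
        if all(labels.get(k) == v for k, v in selector.items())
    )


nodes = {
    "k8s-node01": {"type": "backEndNode1"},
    "k8s-node02": {"type": "frontEndNode"},   # hypothetical label value
}

print(candidate_nodes({"nodeName": "k8s-node01"}, nodes))                  # ['k8s-node01']
print(candidate_nodes({"nodeSelector": {"type": "backEndNode1"}}, nodes))  # ['k8s-node01']
print(candidate_nodes({"nodeSelector": {"type": "gpu"}}, nodes))           # [] -> pod stays Pending
```

The last call shows the practical consequence of a hard constraint: if no node carries the requested label, the pod is not scheduled at all.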

Links:

https://www.cnblogs.com/cocowool/p/taints_and_tolerations.html

https://www.bilibili.com/video/av66617940/?p=58
