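A ReplicationController (RC) supports only equality-based label selectors, whereas a ReplicaSet (RS) also supports set-based selectors that match several values of one label. For example, if APPTest has released versions v1 and v2 and wants 3 replicas, a single RS can count Pods of both v1 and v2 toward that total: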
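selector:
  matchExpressions:
    # matchLabels and matchExpressions are ANDed, so a matchLabels entry of
    # "version: v2" would restrict the selector to v2 only; it is omitted here.
    - {key: version, operator: In, values: [v1, v2]}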
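The selector semantics can be illustrated with a small Python model. This is only a sketch of how matchLabels and matchExpressions combine (every condition must hold), not Kubernetes code; the Pod names and the `matches_selector` helper are hypothetical.

```python
from typing import Dict, List, Optional


def matches_selector(
    labels: Dict[str, str],
    match_labels: Optional[Dict[str, str]] = None,
    match_expressions: Optional[List[dict]] = None,
) -> bool:
    """Return True if `labels` satisfies every matchLabels pair AND every expression."""
    for key, value in (match_labels or {}).items():
        if labels.get(key) != value:
            return False
    for expr in match_expressions or []:
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical Pods of the APPTest application, keyed by Pod name.
pods = {
    "apptest-a": {"version": "v1"},
    "apptest-b": {"version": "v2"},
    "apptest-c": {"version": "v3"},
}
expressions = [{"key": "version", "operator": "In", "values": ["v1", "v2"]}]

selected = sorted(
    name for name, labels in pods.items()
    if matches_selector(labels, None, expressions)
)
print(selected)
```

Adding `match_labels={"version": "v2"}` to the same call would leave only the v2 Pod, because both parts of the selector must match.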
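1. Deployment or RC/RS: fully automatic scheduling
A Deployment or RC/RS automatically deploys multiple replicas of a containerized application and continuously maintains the specified replica count; version updates and rollbacks are handled by the Deployment.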
# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: test
spec:
  replicas: 3            # the underlying RS creates three replicas
  selector:              # required in apps/v1; must match the template labels
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
2. nodeSelector: directed scheduling
To schedule a Pod onto a specific Node, label the Node and reference that label in the Pod's nodeSelector field.
1) First, label the target node with kubectl label:

kubectl label nodes <node-name> <label-key>=<label-value>

Example:

kubectl label nodes work01 zone=frontend   # labels node work01 with zone=frontend, marking it as a "frontend" node

2) Then add a nodeSelector to the Pod template:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-test
  labels:
    name: nginx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nginx-test
  template:
    metadata:
      labels:
        name: nginx-test
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
      nodeSelector:        # schedule onto a node labelled zone=frontend
        zone: frontend
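3. nodeAffinity: Node affinity scheduling
Node affinity is expressed in two ways:
1) requiredDuringSchedulingIgnoredDuringExecution
The rules must be satisfied for a Pod to be scheduled onto a Node; this is a hard constraint.
2) preferredDuringSchedulingIgnoredDuringExecution
The scheduler prefers Nodes that satisfy the rules but does not insist on them; this is a soft constraint. Each preference carries a weight, and the scheduler favors the Nodes whose matched preferences add up to the highest weight.
Note: IgnoredDuringExecution means that if the labels of a Node change while a Pod is running on it, so that the Node no longer satisfies the Pod's node affinity, the system ignores the label change and the Pod keeps running on that Node.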
apiVersion: v1
kind: Pod
metadata:
  name: busybox-test
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch   # replaces the deprecated beta.kubernetes.io/arch
                operator: In
                values:
                  - amd64
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: disk-type
                operator: In
                values:
                  - sshd
  containers:
    - name: busybox-test
      image: busybox:latest

The operator field accepts In, NotIn, Exists, DoesNotExist, Gt and Lt:
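In: the label's value is in the given list
NotIn: the label's value is not in the given list
Exists: the label exists
DoesNotExist: the label does not exist
Gt: the label's value, read as an integer, is greater than the given value
Lt: the label's value, read as an integer, is less than the given value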
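- If both nodeSelector and nodeAffinity are defined, both must be satisfied for the Pod to be scheduled onto a Node
- If nodeAffinity contains several nodeSelectorTerms, matching any one of them is enough to schedule the Pod
- If a nodeSelectorTerm contains several matchExpressions, a Node must satisfy all of them for the Pod to be scheduled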
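The three combination rules above (nodeSelector AND nodeAffinity, OR across nodeSelectorTerms, AND across matchExpressions) can be modelled in a few lines of Python. This is a minimal sketch for illustration, not the scheduler's implementation; the node names, the `cpu-count` and `gpu` labels, and both helper functions are hypothetical.

```python
from typing import Dict, List, Optional


def expr_matches(node_labels: Dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    present = key in node_labels
    if op == "In":
        return present and node_labels[key] in values
    if op == "NotIn":
        return not present or node_labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        if not present:
            return False
        try:
            actual, bound = int(node_labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unsupported operator: {op}")


def node_is_eligible(
    node_labels: Dict[str, str],
    node_selector: Optional[Dict[str, str]] = None,
    node_selector_terms: Optional[List[dict]] = None,
) -> bool:
    """nodeSelector AND (term1 OR term2 ...); inside a term, all expressions must match."""
    if any(node_labels.get(k) != v for k, v in (node_selector or {}).items()):
        return False
    if node_selector_terms is None:
        return True
    for term in node_selector_terms:
        exprs = term.get("matchExpressions", [])
        # A term without expressions matches nothing.
        if exprs and all(expr_matches(node_labels, e) for e in exprs):
            return True
    return False


# Hypothetical cluster: node name -> labels.
nodes = {
    "work01": {"kubernetes.io/arch": "amd64", "zone": "frontend", "cpu-count": "8"},
    "work02": {"kubernetes.io/arch": "arm64", "zone": "frontend"},
    "work03": {"kubernetes.io/arch": "amd64", "zone": "backend", "cpu-count": "16"},
}
terms = [
    {"matchExpressions": [
        {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
        {"key": "cpu-count", "operator": "Gt", "values": ["4"]},
    ]},
    {"matchExpressions": [{"key": "gpu", "operator": "Exists"}]},
]

with_selector = sorted(
    n for n, l in nodes.items() if node_is_eligible(l, {"zone": "frontend"}, terms)
)
affinity_only = sorted(n for n, l in nodes.items() if node_is_eligible(l, None, terms))
print(with_selector, affinity_only)
```

In this model work03 passes the affinity terms but is dropped once the `zone: frontend` nodeSelector is added, which is the "both must be satisfied" rule.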
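4. podAffinity: Pod affinity and anti-affinity scheduling
Affinity and anti-affinity scheduling can be pictured on two axes: the Node labels that define a topology domain sit on the X axis, and the Pod matching conditions sit on the Y axis. The scheduler places a Pod according to whether it should share a domain with, or stay away from, the Pods that match.
X axis:
A topology domain can be a single node, a rack or a zone. It is selected by topologyKey, which is a Node label key. Common values are kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone (typically different zones within one IDC) and failure-domain.beta.kubernetes.io/region (typically IDCs at different locations). The two failure-domain.beta labels are deprecated in favor of topology.kubernetes.io/zone and topology.kubernetes.io/region.
Y axis:
- Pod affinity and anti-affinity conditions can be declared as requiredDuringSchedulingIgnoredDuringExecution or preferredDuringSchedulingIgnoredDuringExecution
- Affinity between Pods is defined in podAffinity under spec.affinity
- Anti-affinity between Pods is defined in podAntiAffinity under spec.affinity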
Example 1: Pod affinity scheduling (podAffinity)
Create the first Pod with the labels security=A1 and app=busybox:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-one
  labels:
    security: "A1"
    app: "busybox"
spec:
  containers:
    - name: busybox-one
      image: busybox:latest

Create the second Pod with an affinity rule that matches the label security=A1 and uses the topologyKey "kubernetes.io/hostname":

apiVersion: v1
kind: Pod
metadata:
  name: busybox-two
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: security
                operator: In
                values:
                  - A1
          topologyKey: kubernetes.io/hostname
  containers:
    - name: busybox-two
      image: busybox:latest

Run kubectl get pods -o wide to confirm that the two Pods are running on the same node.
Example 2: Pod anti-affinity scheduling (podAntiAffinity)
Create the first Pod with the labels security=A1 and app=busybox:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-one
  labels:
    security: "A1"
    app: "busybox"
spec:
  containers:
    - name: busybox-test
      image: busybox:latest

Create the second Pod with an affinity rule for security=A1 at zone level, plus an anti-affinity rule for app=busybox with the topologyKey "kubernetes.io/hostname":

apiVersion: v1
kind: Pod
metadata:
  name: busybox-two
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: security
                operator: In
                values:
                  - A1
          topologyKey: topology.kubernetes.io/zone   # formerly failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - busybox
          topologyKey: kubernetes.io/hostname
  containers:
    - name: busybox-two
      image: busybox:latest

Run kubectl get pods -o wide to confirm that the two Pods are running in the same zone but not on the same node.
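Notes:
- The topologyKey in a requiredDuringSchedulingIgnoredDuringExecution rule must not be empty
- Early Kubernetes releases allowed an empty topologyKey in a preferredDuringSchedulingIgnoredDuringExecution rule and interpreted it as the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region; current releases require a non-empty topologyKey there as well
- A podAffinity term can also be scoped with the namespaces field. In early releases an empty value ("") meant all namespaces; in current releases an omitted namespaces list means the Pod's own namespace, and an empty namespaceSelector ({}) selects all namespaces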
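To make the role of topologyKey concrete, the following Python sketch computes which nodes remain candidates under required affinity and anti-affinity terms. It is a simplified model under stated assumptions: label selectors are reduced to plain predicates, the zone names and the `candidate_nodes` helper are hypothetical, and special cases handled by the real scheduler are not covered.

```python
from typing import Callable, Dict, List, Sequence

Labels = Dict[str, str]
HOST = "kubernetes.io/hostname"
ZONE = "topology.kubernetes.io/zone"


def matching_domains(
    nodes: Dict[str, Labels],
    pods: List[dict],
    topology_key: str,
    selector: Callable[[Labels], bool],
) -> set:
    """Values of `topology_key` for nodes that already run a Pod matching `selector`."""
    domains = set()
    for pod in pods:
        node_labels = nodes[pod["node"]]
        if topology_key in node_labels and selector(pod["labels"]):
            domains.add(node_labels[topology_key])
    return domains


def candidate_nodes(
    nodes: Dict[str, Labels],
    pods: List[dict],
    affinity: Sequence[dict] = (),
    anti_affinity: Sequence[dict] = (),
) -> List[str]:
    """Nodes inside every affinity domain and outside every anti-affinity domain."""
    result = []
    for name, labels in nodes.items():
        ok = True
        for term in affinity:
            domains = matching_domains(nodes, pods, term["topologyKey"], term["selector"])
            if labels.get(term["topologyKey"]) not in domains:
                ok = False
        for term in anti_affinity:
            domains = matching_domains(nodes, pods, term["topologyKey"], term["selector"])
            if labels.get(term["topologyKey"]) in domains:
                ok = False
        if ok:
            result.append(name)
    return sorted(result)


# Hypothetical cluster: three nodes in two zones, busybox-one already on work01.
nodes = {
    "work01": {HOST: "work01", ZONE: "zone-a"},
    "work02": {HOST: "work02", ZONE: "zone-a"},
    "work03": {HOST: "work03", ZONE: "zone-b"},
}
pods = [{"name": "busybox-one", "node": "work01",
         "labels": {"security": "A1", "app": "busybox"}}]

is_a1 = lambda labels: labels.get("security") == "A1"
is_busybox = lambda labels: labels.get("app") == "busybox"

# Example 1: same host as a security=A1 Pod.
same_host = candidate_nodes(nodes, pods, affinity=[{"topologyKey": HOST, "selector": is_a1}])
# Example 2: same zone as security=A1, but not the host running app=busybox.
same_zone_other_host = candidate_nodes(
    nodes, pods,
    affinity=[{"topologyKey": ZONE, "selector": is_a1}],
    anti_affinity=[{"topologyKey": HOST, "selector": is_busybox}],
)
print(same_host, same_zone_other_host)
```

The same selector gives different placements depending only on topologyKey: a hostname key pins the Pod to one node, while a zone key admits every node in that zone.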
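5. Taints and Tolerations
Taints work together with tolerations to let a node repel Pods. A node can carry one or more taints, and a Pod cannot run on that node unless it declares that it tolerates them. Tolerations are a property of the Pod and allow it to run on nodes marked with matching taints.
Example: use kubectl taint to mark work01 as unschedulable, then declare a toleration on a Pod so that it tolerates the taint on work01 and can run there.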
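kubectl taint nodes work01 key=value:NoSchedule

The effect (the part after the colon) can be NoSchedule, PreferNoSchedule or NoExecute:
- NoSchedule: the scheduler does not place Pods on this node; a hard constraint
- PreferNoSchedule: the scheduler tries not to place Pods on this node; a soft constraint
- NoExecute: Pods without a matching toleration are evicted; Pods with a matching toleration and no tolerationSeconds keep running indefinitely; Pods with a matching toleration and tolerationSeconds are evicted after the specified time

tolerations:
  - key: "key"             # must match the taint's key
    operator: "Equal"      # the value must equal the taint's value
    value: "value"
    effect: "NoSchedule"   # must match the taint's effect

or

tolerations:
  - key: "key"
    operator: "Exists"     # no value needs to be specified
    effect: "NoSchedule"   # set to PreferNoSchedule to tolerate a soft-constraint taint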
- If operator is not specified, it defaults to Equal
- An empty key combined with Exists matches every key and value
- An empty effect matches every effect
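Kubernetes processes multiple taints and tolerations like a filter: it starts with all taints on the node, ignores those matched by one of the Pod's tolerations, and applies the effects of the remaining taints to the Pod.
Example: set several taints on one node and several tolerations on one Pod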
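kubectl taint nodes work01 key1=value1:NoSchedule
kubectl taint nodes work01 key1=value1:NoExecute
kubectl taint nodes work01 key2=value2:NoSchedule

tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600   # the Pod keeps running for 3600s after the NoExecute taint is added, then is evicted

The Pod cannot be newly scheduled onto this node, because no toleration matches the third taint. If the Pod is already running on the node when the third taint is added, that taint does not evict it: its effect is NoSchedule, and the only NoExecute taint is one the Pod tolerates (for 3600 seconds, as set by tolerationSeconds).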
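The filtering described above can be sketched in Python. This is an illustrative model of the matching rules stated in this section (default operator Equal, empty key with Exists, empty effect), not Kubernetes source code; the third taint is assumed to use a separate key, key2, and the helper names are hypothetical.

```python
from typing import Dict, List

Taint = Dict[str, str]


def tolerates(toleration: dict, taint: Taint) -> bool:
    """Return True if one toleration matches one taint."""
    # An empty effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key (used with Exists) matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    raise ValueError(f"unsupported operator: {operator}")


def untolerated(taints: List[Taint], tolerations: List[dict]) -> List[Taint]:
    """Taints left over after ignoring those matched by some toleration."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


taints = [
    {"key": "key1", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "value": "value1", "effect": "NoExecute"},
    {"key": "key2", "value": "value2", "effect": "NoSchedule"},  # assumed distinct key
]
tolerations = [
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoExecute",
     "tolerationSeconds": 3600},
]

remaining = untolerated(taints, tolerations)
can_schedule = not any(t["effect"] in ("NoSchedule", "NoExecute") for t in remaining)
evicted_immediately = any(t["effect"] == "NoExecute" for t in remaining)
print(remaining, can_schedule, evicted_immediately)
```

Only the third taint survives the filter, so the model reports that the Pod cannot be scheduled onto the node, yet no NoExecute taint remains to evict it immediately. The tolerationSeconds timer is not modelled here.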
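1) To reserve a set of nodes for a particular application, turn them into dedicated nodes:

kubectl taint nodes <nodename> dedicated=<groupname>:NoSchedule

Then add the corresponding toleration to that application's Pods, so that only Pods with a suitable toleration are allowed onto the tainted nodes.
2) To keep nodes with special hardware for the Pods that need it:

kubectl taint nodes <nodename> special=true:NoSchedule
or
kubectl taint nodes <nodename> special=true:PreferNoSchedule

Then add the corresponding toleration to the Pods that need the hardware, so that Pods with a suitable toleration are allowed onto the tainted nodes.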
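6. Pod Priority Preemption: Pod priority scheduling
Priority affects Pods through two mechanisms: Eviction and Preemption.
Eviction is performed by the kubelet. When a node runs short of resources, the kubelet on that node decides which Pods to evict based on Pod priority, resource requests and actual usage. Among Pods of equal priority, the one whose usage exceeds its request by the most is evicted first.
Preemption is performed by the scheduler. When a new Pod cannot be scheduled because resources are insufficient, the scheduler may evict lower-priority Pods to make room for it.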
Example:
First define a PriorityClass; it does not belong to any namespace:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority   # name of the priority class
value: 50000            # the larger the number, the higher the priority
globalDefault: false

Then reference the priority class in a Pod:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  labels:
    env: test
spec:
  containers:
    - name: busybox
      image: busybox:latest
  priorityClassName: high-priority
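Note: while lower-priority Pods on the initially nominated node N are being evicted for a high-priority Pod, the scheduler places the high-priority Pod on another node if one becomes able to satisfy it; the Pod is not bound to node N. Likewise, if a Pod with an even higher priority arrives while node N is being cleared, the scheduler serves the highest-priority Pod first.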
7. Job: batch scheduling
Batch Jobs have three working patterns:
a. Job Template Expansion: one Job object corresponds to one batch work item
First define a Job template, job.yaml.txt:

apiVersion: batch/v1
kind: Job
metadata:
  name: work-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      name: jobexample
      labels:
        jobgroup: jobexample
    spec:
      containers:
        - name: busybox
          image: busybox:latest
          command: ["sh", "-c", "echo the item $ITEM && sleep 3"]
      restartPolicy: Never

Generate three Job manifests from the template and create the Jobs:

# mkdir -p jobs
# for i in one two three
> do
>   sed "s/\$ITEM/$i/g" job.yaml.txt > ./jobs/job-$i.yaml
> done
# ls jobs
job-one.yaml  job-three.yaml  job-two.yaml
# kubectl create -f jobs
# kubectl get jobs -l jobgroup=jobexample

b. Queue with Pod Per Work Item: a task queue holds the work items and one Job object acts as the consumer that completes them. The Job starts N Pods, one per work item
c. Queue with Variable Pod Count: similar to Queue with Pod Per Work Item, except that the number of Pods the Job starts is variable
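For the Job Template Expansion pattern, the same placeholder substitution can be done without sed. The Python sketch below writes one manifest per work item; the `expand_jobs` helper and the temporary output directory are assumptions made for illustration, and the template is abbreviated to the fields that contain the placeholder.

```python
import tempfile
from pathlib import Path
from typing import List

# Abbreviated copy of job.yaml.txt; $ITEM is the placeholder to replace.
JOB_TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: work-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    spec:
      containers:
        - name: busybox
          image: busybox:latest
          command: ["sh", "-c", "echo the item $ITEM && sleep 3"]
      restartPolicy: Never
"""


def expand_jobs(template: str, items: List[str], out_dir: Path) -> List[Path]:
    """Write one manifest per item, replacing every $ITEM occurrence."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item in items:
        path = out_dir / f"job-{item}.yaml"
        path.write_text(template.replace("$ITEM", item))
        written.append(path)
    return written


# Hypothetical output location; a real run would use ./jobs as in the shell loop.
out_dir = Path(tempfile.mkdtemp()) / "jobs"
paths = expand_jobs(JOB_TEMPLATE, ["one", "two", "three"], out_dir)
print([p.name for p in paths])
```

The generated directory can then be passed to kubectl create -f in the same way as the files produced by the shell loop.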
8. CronJob: scheduled tasks
CronJob schedule format (the standard five cron fields; there is no Year field):
Minutes | Hours | Day of Month | Month | Day of Week
Example:
# cron.yaml: create a CronJob named hello
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/10 * * * *"   # run every 10 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: hello
              image: busybox:latest
              command: ["sh", "-c", "date;echo Hello from the K8s cluster"]
          restartPolicy: OnFailure

# Check the task status; a new Job appears every 10 minutes
kubectl get cronjob hello
kubectl get jobs --watch

# Delete the CronJob named hello
kubectl delete cronjob hello
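To see when a schedule such as "*/10 * * * *" fires, the five fields can be checked against a timestamp. The Python sketch below is a deliberately small matcher written for illustration: it supports only `*`, single values, ranges, lists and `/step`, requires all five fields to match (the day-of-month versus day-of-week special case of real cron is not modelled), and both function names are hypothetical.

```python
from datetime import datetime

# (low, high) bounds for: minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def field_matches(field: str, value: int, low: int, high: int) -> bool:
    """Check one cron field such as '*', '*/10', '5', '1-5' or '0,30'."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step_n == 0:
            return True
    return False


def schedule_matches(schedule: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by a five-field schedule."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    # Python's weekday() has Monday = 0; cron uses Sunday = 0.
    values = [when.minute, when.hour, when.day, when.month, (when.weekday() + 1) % 7]
    return all(
        field_matches(f, v, lo, hi)
        for f, v, (lo, hi) in zip(fields, values, FIELD_RANGES)
    )


fire_minutes = [
    m for m in range(60)
    if schedule_matches("*/10 * * * *", datetime(2024, 1, 1, 0, m))
]
print(fire_minutes)
```

A six-field string that includes a year is rejected by this model, which mirrors the five-field format listed above.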
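9. Init Container
Init containers perform initialization before the application containers start, for example: waiting until a related component (a database or a backend service) is running correctly; generating configuration files from environment variables or configuration templates; fetching the required local configuration from a remote database, or registering the Pod itself in a central database; downloading dependency packages, or applying system pre-configuration.
Example: before the Nginx container starts, an init container based on busybox creates an index.html home page for nginx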
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  initContainers:
    # use busybox to download the Baidu home page as nginx's initial index.html
    - name: create-html      # container names must be lowercase DNS labels
      image: busybox:latest
      command:
        - wget
        - "-O"
        - "/website/index.html"
        - https://www.baidu.com
      volumeMounts:
        - name: website
          mountPath: "/website"
  containers:
    - name: nginx
      image: nginx:latest
      ports:
        - containerPort: 80
      volumeMounts:
        - name: website
          mountPath: /usr/share/nginx/html
  dnsPolicy: Default
  volumes:
    - name: website
      emptyDir: {}
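- If several init containers are defined, they run one at a time in order; the application containers are created and started only after every init container has finished
- An init container definition can set resource limits, Volume usage, security policy and so on. When several init containers define resource limits, the highest value is taken as the effective value for all init containers
- Init containers cannot set a readinessProbe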
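The resource rule for init containers can be worked through with a short Python sketch. Because init containers run one at a time, their effective request is the maximum across them; the Pod's effective request is then the larger of that value and the sum over the application containers, which run concurrently. The numbers and the `effective_request` helper are made up for illustration, and restartable (sidecar) init containers are not modelled.

```python
from typing import Dict, List

Resources = Dict[str, int]


def effective_request(
    init_containers: List[Resources],
    app_containers: List[Resources],
    resource: str,
) -> int:
    """Effective Pod request for one resource, e.g. 'cpu' in millicores."""
    # Init containers run sequentially, so only the largest one matters.
    init_max = max((c.get(resource, 0) for c in init_containers), default=0)
    # Application containers run together, so their requests add up.
    app_sum = sum(c.get(resource, 0) for c in app_containers)
    return max(init_max, app_sum)


# Hypothetical requests: cpu in millicores, memory in MiB.
init_containers = [{"cpu": 100, "memory": 64}, {"cpu": 500, "memory": 128}]
app_containers = [{"cpu": 200, "memory": 256}, {"cpu": 200, "memory": 128}]

cpu = effective_request(init_containers, app_containers, "cpu")
memory = effective_request(init_containers, app_containers, "memory")
print(cpu, memory)
```

Here the CPU figure is driven by the largest init container (500m exceeds the 400m app total), while the memory figure is driven by the application containers (384 MiB exceeds the 128 MiB init maximum).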