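Three ways to control scheduling stickiness, mainly based on the official documentation:
NodeSelector (directed scheduling), NodeAffinity (node affinity), and PodAffinity (pod affinity).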
1. nodeSelector
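Provides a simple placement constraint: the pod is scheduled only onto nodes that carry every label listed in its nodeSelector.
① Add a label to the node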
kubectl label nodes <node-name> <label-key>=<label-value>
② Add a nodeSelector to the pod
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
③ Deploy the pod
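The matching rule behind nodeSelector can be sketched in a few lines of Python. This is only an illustration of the semantics, not scheduler code: the function name node_matches_selector and the node names and labels are made up for the example.

```python
def node_matches_selector(node_labels: dict, node_selector: dict) -> bool:
    """Return True if every key/value pair in node_selector is present in node_labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical nodes and labels, for illustration only.
nodes = {
    "node-1": {"disktype": "ssd", "kubernetes.io/hostname": "node-1"},
    "node-2": {"disktype": "hdd", "kubernetes.io/hostname": "node-2"},
}

# The nodeSelector from the nginx pod above.
selector = {"disktype": "ssd"}

candidates = [name for name, labels in nodes.items()
              if node_matches_selector(labels, selector)]
print(candidates)  # ['node-1']
```

Every pair in the selector must match, so adding a second key to nodeSelector can only shrink the set of candidate nodes.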
2. nodeAffinity
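This feature is an improvement on nodeSelector and was in beta at the time of writing.
The main improvements are:
- A more expressive syntax (not just "AND")
- Soft preferences can be specified in addition to hard requirements
- Constraints can also be expressed against pod labels (pod affinity)
According to the official documentation at that time, nodeSelector would be deprecated once nodeAffinity matured.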
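requiredDuringSchedulingIgnoredDuringExecution # hard requirement
preferredDuringSchedulingIgnoredDuringExecution # soft preference
IgnoredDuringExecution means that if the labels of a node change while a pod is running there, so that the node no longer satisfies the pod's node affinity rules, the system ignores the label change and the pod keeps running on that node.
If both nodeSelector and nodeAffinity are set, a node must satisfy both to be a candidate.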
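Consider an example:
① The pod can only be scheduled onto nodes whose label kubernetes.io/e2e-az-name has the value e2e-az1 or e2e-az2. Among those, nodes with the label another-node-label-key=another-node-label-value are preferred; if no node has it, the pod is still scheduled onto one of the other qualifying nodes.
② operator supports In, NotIn, Exists, DoesNotExist, Gt, and Lt. Node anti-affinity can be achieved with NotIn and DoesNotExist, or with node taints and pod tolerations.
③ If multiple nodeSelectorTerms are specified, a node only needs to match one of them to be a candidate.
④ If multiple matchExpressions are specified within a term, a node must match all of them to be a candidate.
⑤ weight ranges from 1 to 100. When there are several soft preferences, the weights of the ones a node matches are added up, and the node with the highest total is the most preferred candidate.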
# cat pods/pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:  # hard requirement: must match
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In  # supports In, NotIn, Exists, DoesNotExist, Gt, Lt
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:  # soft preference: matched if possible
      - weight: 1  # range 1-100
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
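To make the rules for this manifest concrete, here is a simplified Python sketch of how the hard and soft conditions combine: terms are OR-ed, expressions inside a term are AND-ed, and the weights of matched preferences are summed. It is an approximation for illustration only. The helper names and the three nodes are hypothetical, and the real scheduler combines the affinity score with other scoring factors.

```python
def matches_expression(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        # Gt/Lt compare the label value, read as an integer, with a single value.
        if key not in labels:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unsupported operator: {op}")


def matches_term(labels, term):
    # All matchExpressions within one term must match (AND).
    return all(matches_expression(labels, e) for e in term["matchExpressions"])


def is_candidate(labels, required_terms):
    # A node qualifies if any one nodeSelectorTerm matches (OR).
    return any(matches_term(labels, t) for t in required_terms)


def preference_score(labels, preferred):
    # Sum the weights of every preferred term the node satisfies.
    return sum(p["weight"] for p in preferred if matches_term(labels, p["preference"]))


# The rules from the with-node-affinity pod above.
required_terms = [{"matchExpressions": [
    {"key": "kubernetes.io/e2e-az-name", "operator": "In", "values": ["e2e-az1", "e2e-az2"]},
]}]
preferred = [{"weight": 1, "preference": {"matchExpressions": [
    {"key": "another-node-label-key", "operator": "In", "values": ["another-node-label-value"]},
]}}]

# Hypothetical nodes, for illustration only.
nodes = {
    "node-a": {"kubernetes.io/e2e-az-name": "e2e-az1"},
    "node-b": {"kubernetes.io/e2e-az-name": "e2e-az2",
               "another-node-label-key": "another-node-label-value"},
    "node-c": {"kubernetes.io/e2e-az-name": "e2e-az3"},
}

candidates = {name: preference_score(labels, preferred)
              for name, labels in nodes.items()
              if is_candidate(labels, required_terms)}
best = max(candidates, key=candidates.get)
print(candidates)  # {'node-a': 0, 'node-b': 1}
print(best)        # node-b
```

node-c is excluded by the hard requirement, while node-a stays a valid candidate even though it scores 0 on the soft preference.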
3. Inter-pod affinity and anti-affinity (beta feature)
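Pod affinity and anti-affinity select the scheduler's candidate nodes based on the labels of pods already running on those nodes, rather than on the labels of the nodes themselves.
Because pods are namespaced, a pod affinity rule implicitly carries a namespace: by default its label selector only matches pods in the same namespace as the pod being scheduled.
topologyKey specifies the scope of the rule and is given as the key of a node label.
A namespaces list can also be used to restrict which pods the scheduler looks at; namespaces sits at the same level as labelSelector and topologyKey, for example: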
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appname
        operator: In
        values:
        - dbpool-server
    topologyKey: kubernetes.io/hostname
    namespaces:  # only pods in poa-ea and pletest are considered, not all pods
    - poa-ea
    - pletest
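Note: inter-pod affinity and anti-affinity require a substantial amount of computation and slow down scheduling. They are not recommended in clusters larger than several hundred nodes.
Note: pod anti-affinity requires a topologyKey to be specified.
Consider an example:
① For security reasons, an empty topologyKey is not allowed for requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity;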
② For requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity, the admission controller LimitPodHardAntiAffinityTopology was introduced to limit topologyKey to kubernetes.io/hostname. If you want to make it available for custom topologies, you may modify the admission controller, or simply disable it.
③ For preferredDuringSchedulingIgnoredDuringExecution pod anti-affinity, empty topologyKey is interpreted as “all topologies” (“all topologies” here is now limited to the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region).
pods/pod-with-pod-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
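The role of topologyKey is easier to see with a small simulation. The Python sketch below is a rough model, not the scheduler's implementation: it handles only the In operator, ignores namespaces, and uses a hypothetical three-node cluster with made-up zone names. A node satisfies an affinity term when some matching pod already runs in the same topology domain, that is, on a node with the same value for the topologyKey label.

```python
ZONE = "failure-domain.beta.kubernetes.io/zone"
HOST = "kubernetes.io/hostname"


def selector_matches(pod_labels, match_expressions):
    # Only the In operator is modelled, which is all this example needs.
    return all(pod_labels.get(e["key"]) in e["values"] for e in match_expressions)


def domains_with_matching_pods(nodes, pods, match_expressions, topology_key):
    """Collect the topologyKey values of nodes that already run a matching pod."""
    return {
        nodes[pod["node"]][topology_key]
        for pod in pods
        if selector_matches(pod["labels"], match_expressions)
        and topology_key in nodes[pod["node"]]
    }


def in_matching_domain(node_labels, domains, topology_key):
    return node_labels.get(topology_key) in domains


# Hypothetical cluster state, for illustration only.
nodes = {
    "node-1": {ZONE: "zone-a", HOST: "node-1"},
    "node-2": {ZONE: "zone-a", HOST: "node-2"},
    "node-3": {ZONE: "zone-b", HOST: "node-3"},
}
pods = [
    {"labels": {"security": "S1"}, "node": "node-1"},
    {"labels": {"security": "S2"}, "node": "node-1"},
]

# Hard podAffinity: must share a zone with a pod labelled security=S1.
s1 = [{"key": "security", "operator": "In", "values": ["S1"]}]
zone_domains = domains_with_matching_pods(nodes, pods, s1, ZONE)
allowed = [n for n, labels in nodes.items()
           if in_matching_domain(labels, zone_domains, ZONE)]

# Soft podAntiAffinity: prefer hosts that do not run a pod labelled security=S2.
s2 = [{"key": "security", "operator": "In", "values": ["S2"]}]
host_domains = domains_with_matching_pods(nodes, pods, s2, HOST)
preferred = [n for n in allowed
             if not in_matching_domain(nodes[n], host_domains, HOST)]

print(allowed)    # ['node-1', 'node-2']
print(preferred)  # ['node-2']
```

Because the affinity term uses the zone label, node-2 qualifies even though the S1 pod runs on node-1. The anti-affinity term uses the hostname label, so it only steers the pod away from node-1 itself.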
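4. Use case
Requirement: a web-server runs 3 instances and uses redis as a cache. Each redis instance should be scheduled onto the same node as a web-server instance.
① Deploy redis. The label app=store is what the web-server's pod affinity rule matches on to land on the same node, and the anti-affinity rule on that label keeps the redis replicas on separate nodes.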
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
② Deploy web-server so that each replica is co-located with a redis instance, while no two web-server replicas share a node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine
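The layout these two Deployments aim for can be checked with a toy placement loop in Python. This is a deliberately naive sketch under stated assumptions: a three-node cluster with invented node names, first-fit placement, and hostname as the only topology domain. It is not how the real scheduler picks nodes, but it applies the same two hard constraints.

```python
def can_place(node, placed, app, affinity_to=None):
    """Check the hard rules for putting one replica of `app` on `node`."""
    apps_on_node = [a for a, n in placed if n == node]
    if app in apps_on_node:
        # Required anti-affinity: no second replica with the same app label.
        return False
    if affinity_to is not None and affinity_to not in apps_on_node:
        # Required affinity: a pod with the target app label must already be here.
        return False
    return True


def schedule(nodes, placed, app, replicas, affinity_to=None):
    """Place replicas one by one on the first node that satisfies the rules."""
    for _ in range(replicas):
        target = next((n for n in nodes if can_place(n, placed, app, affinity_to)), None)
        if target is None:
            raise RuntimeError(f"no node satisfies the constraints for {app}")
        placed.append((app, target))


# Hypothetical three-node cluster, for illustration only.
nodes = ["node-1", "node-2", "node-3"]
placed = []

schedule(nodes, placed, "store", 3)                            # redis-cache
schedule(nodes, placed, "web-store", 3, affinity_to="store")   # web-server

layout = {n: sorted(a for a, m in placed if m == n) for n in nodes}
print(layout)
# {'node-1': ['store', 'web-store'], 'node-2': ['store', 'web-store'], 'node-3': ['store', 'web-store']}
```

With three nodes and three replicas of each Deployment, every node ends up with exactly one redis pod and one web-server pod. In this model a fourth replica of either Deployment would find no node that satisfies the anti-affinity rule.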
5. References
http://blog.51cto.com/newfly/2066630
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity