Kubernetes: Scheduling Pods onto Specific Nodes, and Node Taints and Tolerations

This article is a repost. 2021-11-23 21:03, K8S

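Taints and Tolerations

Nodes in a cluster can differ from one another, for example:

1. They are in different data centers
2. They are in different cities
3. They have different configurations, such as SSD disks or CPU types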

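Approaches
1. nodeSelector
For example, label the machines that have particular characteristics:
Gpu-server: true
ssd-server: true
Normal-server: true

2. Taints and tolerations
Put a taint on a class of servers so that pods that cannot tolerate the taint cannot be deployed on those servers.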

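Node affinity is a property of a pod (a preference or a hard requirement) that attracts the pod to a particular class of nodes. A taint does the opposite: it lets a node repel a particular class of pods.

Taints and tolerations work together to keep pods from being assigned to unsuitable nodes. One or more taints can be applied to each node, which means the node will not accept any pod that cannot tolerate those taints. Applying a toleration to a pod means the pod can (but is not required to) be scheduled onto nodes with matching taints.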

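Taints

Composition of a taint

The kubectl taint command sets a taint on a Node. Once a Node is tainted, it repels Pods: it can refuse to have Pods scheduled onto it, and can even evict Pods that are already running on it.

Each taint has a key and a value that label the taint (the value may be empty) and an effect that describes what the taint does, written as key=value:effect.

If a node has multiple taints, a pod must tolerate all of them (each key, value, and effect) before it can be scheduled onto that node.

A taint effect currently supports the following three options:

NoSchedule: only pods that have a toleration matching this taint can be assigned to the node.

PreferNoSchedule: the system tries to avoid scheduling a pod onto a node that has a taint the pod cannot tolerate, but this is not enforced.

NoExecute: any pod that cannot tolerate this taint is evicted immediately, and any pod that can tolerate it is not evicted. A pod can set the tolerationSeconds field to specify how long it may keep running on the node.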

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600   # the pod may keep running on this node for this many seconds

Add a taint to a node, with key <key>, value <value>, and effect NoSchedule:

kubectl taint nodes <node_name> <key>=<value>:NoSchedule

Remove a taint from a node:

kubectl taint nodes node1 key:NoSchedule-

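Tolerations

A tainted Node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), so Pods are, to a degree that depends on the effect, kept off that Node. We can, however, set a toleration on a Pod: a Pod with a matching toleration tolerates the taint and can be scheduled onto the tainted Node.

For example, define the pod's toleration in the Pod spec:
operator: Equal compares both the key and the value
operator: Exists tolerates the taint as long as the key is present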

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"        # exact match on key and value
  effect: "NoSchedule"

tolerations:
- key: "key"
  operator: "Exists"    # matches on key and effect only
  effect: "NoSchedule"

Tolerate every taint on every node:

tolerations:
- operator: "Exists"

Tolerate every taint with the given key, regardless of effect:

tolerations:
- key: "key"
  operator: "Exists"
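The matching rules in the examples above can be modelled in a few lines of Python. This is only a sketch of the rules as described here, not the scheduler's actual code; the function name tolerates and the dict layout are made up for illustration.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint (illustrative model)."""
    operator = toleration.get("operator", "Equal")  # Equal is the default operator
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect in the toleration matches every effect.
    if effect and effect != taint["effect"]:
        return False
    # An empty key is only meaningful with Exists, and then matches every key.
    if key == "":
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    # Equal: the values must match as well.
    return toleration.get("value", "") == taint.get("value", "")


taint = {"key": "key", "value": "value", "effect": "NoSchedule"}

exact = {"key": "key", "operator": "Equal", "value": "value", "effect": "NoSchedule"}
any_taint = {"operator": "Exists"}
any_effect = {"key": "key", "operator": "Exists"}
wrong_effect = {"key": "key", "operator": "Exists", "effect": "NoExecute"}

for name, tol in [("exact", exact), ("any_taint", any_taint),
                  ("any_effect", any_effect), ("wrong_effect", wrong_effect)]:
    print(name, tolerates(tol, taint))
```

Only wrong_effect fails to match, because its effect is set and differs from the taint's effect.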

When there are multiple masters, you can set the following so that their resources are not wasted:

kubectl taint nodes Node-Name node-role.kubernetes.io/master=:PreferNoSchedule

2.3. Use cases

2.3.1. Dedicated nodes

kubectl taint nodes <nodename> dedicated=<groupName>:NoSchedule

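First add the taint to the Node, then add the corresponding toleration to the Pod. The Pod can then be scheduled onto the tainted Node, but it can also still be scheduled onto other nodes.

If you want the Pod to be scheduled only onto certain nodes, and those nodes to accept only the corresponding Pods, you also need to add a label to the Node (for example, dedicated=groupName) and add the matching label to the Pod's nodeSelector.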
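The difference between "toleration only" and "toleration plus nodeSelector" can be checked with a small Python model. It is a deliberately simplified sketch: tolerations are compared to taints as exact (key, value, effect) tuples, and the names can_schedule, dedicated_node, and ordinary_node are hypothetical.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}  # PreferNoSchedule is only a preference


def can_schedule(pod: dict, node: dict) -> bool:
    """Simplified model: taints must be tolerated and nodeSelector must match."""
    for taint in node.get("taints", []):
        _key, _value, effect = taint
        # Simplification: a toleration is an exact (key, value, effect) tuple.
        if effect in HARD_EFFECTS and taint not in pod.get("tolerations", []):
            return False
    labels = node.get("labels", {})
    # Every nodeSelector entry must be present on the node with the same value.
    return all(labels.get(k) == v for k, v in pod.get("nodeSelector", {}).items())


dedicated_node = {
    "taints": [("dedicated", "groupName", "NoSchedule")],
    "labels": {"dedicated": "groupName"},
}
ordinary_node = {"taints": [], "labels": {}}

plain = {}  # no toleration, no nodeSelector
tolerant = {"tolerations": [("dedicated", "groupName", "NoSchedule")]}
pinned = {
    "tolerations": [("dedicated", "groupName", "NoSchedule")],
    "nodeSelector": {"dedicated": "groupName"},
}

for name, pod in [("plain", plain), ("tolerant", tolerant), ("pinned", pinned)]:
    print(name, can_schedule(pod, dedicated_node), can_schedule(pod, ordinary_node))
```

In this model, the taint keeps the plain pod off the dedicated node, the tolerant pod can land on either node, and only the pinned pod is restricted to the dedicated node.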

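2.3.2. Nodes with special hardware

If some nodes have special hardware (for example, a specific CPU type) and you want Pods that do not use that hardware to stay off those nodes so the resources are kept available, give the Node a taint and a label, and give the Pods that need the hardware a matching toleration and label selector, so that the nodes are used only by the designated Pods.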

kubectl taint nodes nodename special=true:NoSchedule

or

kubectl taint nodes nodename special=true:PreferNoSchedule

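2.3.3. Taint-based eviction

The NoExecute effect affects pods that are already running on the node: pods are evicted according to the following rules.

If a pod cannot tolerate a taint whose effect is NoExecute, the pod is evicted immediately.
If a pod can tolerate a taint whose effect is NoExecute and its toleration does not specify tolerationSeconds, the pod keeps running on the node indefinitely.
If a pod can tolerate a taint whose effect is NoExecute and its toleration specifies tolerationSeconds, the pod keeps running on the node for that length of time.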
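The three cases above can be written as one small function. This is a sketch under simplifying assumptions: a toleration is taken to match when its key equals the taint key and its effect is NoExecute or unset, and operators and values are ignored. The name eviction_delay is hypothetical.

```python
from typing import Optional


def eviction_delay(tolerations: list, taint_key: str) -> Optional[int]:
    """Seconds a pod keeps running after a NoExecute taint with taint_key is added.

    Returns 0 for immediate eviction and None when the pod is never evicted.
    """
    for toleration in tolerations:
        same_key = toleration.get("key") == taint_key
        effect_ok = toleration.get("effect", "NoExecute") == "NoExecute"
        if same_key and effect_ok:
            # A missing tolerationSeconds means there is no time limit.
            return toleration.get("tolerationSeconds")
    return 0  # no matching toleration: evicted right away


no_toleration = []
forever = [{"key": "key1", "effect": "NoExecute"}]
one_hour = [{"key": "key1", "effect": "NoExecute", "tolerationSeconds": 3600}]

print(eviction_delay(no_toleration, "key1"))
print(eviction_delay(forever, "key1"))
print(eviction_delay(one_hour, "key1"))
```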

Scheduling a pod onto a specific node

1. Get the node's label information

# kubectl get node -A --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master Ready master 326d v1.17.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
node1 Ready <none> 326d v1.17.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
node2 Ready <none> 326d v1.17.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

2. You can also set a label of your own

# kubectl label nodes node1 project=linux40
node/node1 labeled

# kubectl get node -A --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master Ready master 326d v1.17.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
node1 Ready <none> 326d v1.17.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux,project=linux40
node2 Ready <none> 326d v1.17.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux

3. Add the following under spec in the manifest

spec:
  nodeSelector:
    project: linux40

4. Remove the custom node label

# kubectl label nodes node1 project-
node/node1 labeled

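After applying the YAML file, some pods were stuck in the Pending state. Running describe on one of them showed "Insufficient cpu" under Events, which means a matching node did not have enough CPU available, so the pod could not be scheduled.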
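A rough way to picture "Insufficient cpu": the scheduler compares the CPU requests (not the limits) of the pods already on a node, plus the new pod's request, against what the node can allocate. The sketch below models only that comparison. The helper names and the numbers are hypothetical, and the real scheduler takes other resources and rules into account as well.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "200m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def fits_cpu(allocatable: str, existing_requests: list, new_request: str) -> bool:
    """True if the new pod's CPU request still fits on the node."""
    used = sum(parse_cpu(q) for q in existing_requests)
    return used + parse_cpu(new_request) <= parse_cpu(allocatable)


# Hypothetical node with 2 allocatable CPUs and nine pods requesting 200m each.
running = ["200m"] * 9

print(fits_cpu("2", running, "200m"))  # 1800m + 200m compared with 2000m
print(fits_cpu("2", running, "500m"))  # 1800m + 500m compared with 2000m
```

If no labelled node passes this check, the pod stays Pending until requests are lowered or capacity is added.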

Pod.spec.nodeName

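Schedules the Pod directly onto the specified Node, bypassing the Scheduler's scheduling policies. This match is mandatory.

apiVersion: apps/v1     # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 7
  selector:             # required by apps/v1; must match the template labels
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: k8s-node01    # force scheduling onto k8s-node01
      containers:
      - name: myweb
        image: wangyanglinux/myapp:v1
        ports:
        - containerPort: 80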

Pod.spec.nodeSelector

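Selects nodes through the Kubernetes label-selector mechanism: the scheduler matches the labels according to its scheduling policy and then schedules the Pod onto a target node. This match is a hard constraint.

apiVersion: apps/v1     # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 2
  selector:             # required by apps/v1; must match the template labels
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        type: backEndNode1   # node label
      containers:
      - name: myweb
        image: harbor/tomcat:8.5-jre8
        ports:
        - containerPort: 80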

View the pod details

[root@master-1 ~]# kubectl describe pod -n linux39 magedu-nginx-deployment-84c4cb9fdd-ssw27
Name:         magedu-nginx-deployment-84c4cb9fdd-ssw27
Namespace:    linux39
Priority:     0
Node:         node1/192.168.64.113
Start Time:   Sat, 07 May 2022 20:03:17 +0800
Labels:       app=magedu-nginx-selector
              pod-template-hash=84c4cb9fdd
Annotations:  cni.projectcalico.org/podIP: 100.66.209.206/32
              cni.projectcalico.org/podIPs: 100.66.209.206/32
Status:       Running
IP:           100.66.209.206
IPs:
  IP:           100.66.209.206
Controlled By:  ReplicaSet/magedu-nginx-deployment-84c4cb9fdd
Containers:
  magedu-nginx-container:
    Container ID:   docker://58619204579f6522eeead49823659334d9e9feb58f9e2712b49a693cc9c14cc8
    Image:          nginx-web:v1
    Image ID:       docker://sha256:8874c3873369c14572f5cfd9e9ee49bb012eca927f1ac4024c787caf7a3bcb38
    Ports:          80/TCP, 443/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Sun, 15 May 2022 21:38:42 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 15 May 2022 21:38:12 +0800
      Finished:     Sun, 15 May 2022 21:38:27 +0800
    Ready:          True
    Restart Count:  6
    Limits:
      cpu:     1
      memory:  512Mi
    Requests:
      cpu:     200m
      memory:  246Mi
    Environment:
      password:  123456
      age:       20
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bpkjb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-bpkjb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bpkjb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
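Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s   # how long to wait before eviction when the node is down or not ready
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s  # how long to wait before eviction when the node is unreachable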
Events:          <none>

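Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s

# How long to wait before eviction when the node is down or unavailable. For example, if network flapping keeps the kubelet from reporting the node's status to the apiserver in time, the master considers the node NotReady and adds the not-ready taint to it. Any pod that cannot tolerate this taint is evicted immediately. Adding a buffer time (300s here) reduces evictions caused by a temporary loss of contact while the node is flapping.

node.kubernetes.io/unreachable:NoExecute op=Exists for 300s # how long to wait before eviction when the node is unreachable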
