k8s Resource Limits and Probe Checks

I. Resource Limits

1. Using resource limits

When you define a Pod, you can optionally specify how much of each resource every container needs. The most common resources to specify are CPU and memory size; other resource types also exist.

2. request (resource requests) and limit (resource constraints)

When you specify a resource request for the containers in a Pod, the scheduler uses this information to decide which node to place the Pod on. When you also specify a resource limit for a container, the kubelet enforces the limit so that the running container cannot use more of that resource than the limit allows. The kubelet also reserves the requested amount of that resource specifically for the container to use.
If the node where the Pod is running has enough of a resource available, a container may use more of that resource than its request. However, a container may never use more than its limit.
If you set a memory limit for a container but do not set a memory request, Kubernetes automatically assigns a memory request that matches the limit. Similarly, if you set a CPU limit but no CPU request, Kubernetes automatically assigns a CPU request that matches the CPU limit.
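To make this defaulting behavior concrete, here is a minimal sketch (the pod name limits-only-demo is hypothetical, for illustration only): only limits are set, and after admission kubectl describe pod would show requests filled in to match them.

apiVersion: v1
kind: Pod
metadata:
  name: limits-only-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      limits:                   # no requests block given
        cpu: "500m"
        memory: "128Mi"
      # Kubernetes defaults requests.cpu to 500m and
      # requests.memory to 128Mi, matching the limits above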

3. Official documentation

https://kubernetes.io/zh/docs/concepts/configuration/manage-resources-containers/

4. Resource requests and limits for Pods and containers

The CPU requested (pre-allocated) when the container is created:
spec.containers[].resources.requests.cpu
The memory requested when the container is created:
spec.containers[].resources.requests.memory
The huge pages requested when the container is created:
spec.containers[].resources.requests.hugepages-<size>
The CPU limit:
spec.containers[].resources.limits.cpu
The memory limit:
spec.containers[].resources.limits.memory
The huge page limit:
spec.containers[].resources.limits.hugepages-<size>

5. Resource types

CPU and memory are each a resource type, and each resource type has a base unit. CPU expresses compute processing power and is measured in Kubernetes CPUs. Memory is measured in bytes. On Kubernetes v1.14 or later you can also specify huge page resources; huge pages are a Linux-specific feature where the node kernel allocates blocks of memory much larger than the default page size.
For example, on a system whose default page size is 4KiB, you could specify the limit hugepages-2Mi: 80Mi. If the container tries to allocate more than 40 huge pages of 2MiB each (more than 80 MiB in total), the allocation fails.
Note:
You cannot overcommit hugepages-* resources. This is unlike the memory and cpu resources.

6. CPU resource units

CPU requests and limits are measured in cpu units. One CPU in Kubernetes is equivalent to 1 vCPU (1 hyperthread).
Kubernetes also supports fractional CPU requests. A container whose spec.containers[].resources.requests.cpu is 0.5 gets half of one CPU (similar to cgroup time-slicing of CPU resources). The expression 0.1 is equivalent to the expression 100m (millicores): in every 1000 milliseconds the container may use a total of 0.1 × 1000 milliseconds of CPU time.
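The two notations are interchangeable in a manifest; this fragment (the field values are illustrative) specifies the same half core both ways:

resources:
  requests:
    cpu: "0.5"      # half a core, decimal notation
  limits:
    cpu: "500m"     # the same half core, millicore notation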

7. Memory resource units

Memory requests and limits are measured in bytes. Memory can be expressed as a plain integer,
with a power-of-10 suffix (E, P, T, G, M, K),
or with a power-of-2 suffix (Ei, Pi, Ti, Gi, Mi, Ki).
For example, 1KB = 10^3 = 1000, 1MB = 10^6 = 1,000,000 = 1000KB, 1GB = 10^9 = 1,000,000,000 = 1000MB;
1KiB = 2^10 = 1024, 1MiB = 2^20 = 1,048,576 = 1024KiB.
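In a manifest the two suffix families look like this (the values are illustrative); note that 64M and 64Mi are close but not equal:

resources:
  requests:
    memory: "64M"     # 64 × 10^6 = 64,000,000 bytes
  limits:
    memory: "64Mi"    # 64 × 2^20 = 67,108,864 bytes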

PS: When you buy a hard drive, the capacity reported by the operating system is smaller than what the product label or vendor claims. The main reason is that the labeled capacity uses MB and GB, where 1GB = 1,000,000,000 bytes, while the operating system works in binary and reports capacity in MiB and GiB, where 1GiB = 2^30 = 1,073,741,824 bytes. Since 1GiB is 73,741,824 bytes more than 1GB, the measured capacity comes out smaller than the labeled one, and the larger the unit, the larger the gap.

8. Example from the official documentation

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

In this example the Pod has two containers. Each container requests 0.25 cpu and 64MiB (2^26 bytes) of memory, and each container is limited to 0.5 cpu and 128MiB of memory. You can therefore say the Pod as a whole requests 0.5 cpu and 128 MiB of memory, and is limited to 1 cpu and 256MiB of memory.

9. Resource limits in practice

9.1 Write the YAML manifest

[root@master ~]# mkdir /opt/test
[root@master ~]# cd !$
cd /opt/test
[root@master test]# vim test1.yaml

apiVersion: v1
kind: Pod
metadata:
  name: test1
spec:
  containers:
  - name: web
    image: nginx
    env:
    - name: WEB_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

9.2 Free up memory (on the worker nodes; node01 as the example)

MySQL is fairly demanding on memory, so first check whether the node's available memory is enough for MySQL to run normally; if the remaining memory is insufficient, release some.

9.2.1 Check memory

free -mh

[root@node01 ~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           1.9G        1.0G         86M         26M        870M        663M
Swap:            0B          0B          0B

Total memory is 1.9G and 1.0G is actually used, so about 0.9G should be available.
However, 870M of memory is tied up in buffers/cache, leaving free at only 86M.
86M of free memory is clearly not enough, so the cache needs to be released.

9.2.2 Manually release the cache

echo [1|2|3] > /proc/sys/vm/drop_caches

[root@node01 ~]# cat /proc/sys/vm/drop_caches
0
[root@node01 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@node01 ~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           1.9G        968M        770M         26M        245M        754M
Swap:            0B          0B          0B

0: the system default; no memory is released and the OS manages the cache automatically
1: release the page cache
2: release dentries and inodes
3: release all caches
Note:
If an application has a problem such as a memory leak or overflow, swap usage reveals it fairly quickly, while free is harder to read. If at that point we tell the user that changing one system value "can" release memory, and free grows larger, what will the user think? Won't they conclude the operating system "has a problem"? The kernel can clearly flush buffers and cache quickly and easily (as the operation above shows), yet it deliberately does not do so by default (the default value is 0), so we should not change this value casually.
Normally, once applications run stably on a system, the free value also settles at a stable level, even if it looks small. When problems such as memory shortage, applications failing to obtain memory, or OOM errors occur, it is better to analyze the application side, for example too many users exhausting memory or an application memory leak. Otherwise, dropping the caches to force a larger free value may only mask the problem temporarily.
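If you do decide to drop the caches on a test node, it is common practice to run sync first so dirty pages are written back to disk before the cache is released; a minimal sketch:

sync                                  # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches     # then release page cache, dentries and inodes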

9.3 Create the resource

kubectl apply -f test1.yaml

[root@master test]# kubectl apply -f test1.yaml 
pod/test1 created

9.4 Watch the pod status

kubectl get pod -o wide -w

[root@master test]# kubectl get pod -o wide -w
NAME    READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
test1   0/2     ContainerCreating   0          4s    <none>   node01   <none>           <none>
test1   2/2     Running             0          18s   10.244.1.55   node01   <none>           <none>
test1   1/2     OOMKilled           0          21s   10.244.1.55   node01   <none>           <none>
test1   2/2     Running             1          37s   10.244.1.55   node01   <none>           <none>
test1   1/2     OOMKilled           1          40s   10.244.1.55   node01   <none>           <none>
......

OOM (Out Of Memory) means the container's memory usage exceeded the limit we set.
READY 2/2 with STATUS Running shows the pod was created and ran successfully, but during the run the db container hit the OOM problem, was killed by the kubelet, and the container was restarted.
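To see the termination reason directly rather than inferring it from the watch output, you can query the container statuses; the jsonpath expression below is one way to do it (it prints the last termination reason of each container, e.g. OOMKilled):

kubectl get pod test1 -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'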

9.5 Check the container logs

kubectl logs test1 -c web

[root@master test]# kubectl logs test1 -c web
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/11/06 08:31:23 [notice] 1#1: using the "epoll" event method
2021/11/06 08:31:23 [notice] 1#1: nginx/1.21.3
2021/11/06 08:31:23 [notice] 1#1: built by gcc 8.3.0 (Debian 8.3.0-6) 
2021/11/06 08:31:23 [notice] 1#1: OS: Linux 3.10.0-693.el7.x86_64
2021/11/06 08:31:23 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/11/06 08:31:23 [notice] 1#1: start worker processes
2021/11/06 08:31:23 [notice] 1#1: start worker process 31
2021/11/06 08:31:23 [notice] 1#1: start worker process 32

nginx started normally; next, check the mysql logs.
kubectl logs test1 -c db

[root@master test]# kubectl logs test1 -c db
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.27-1debian10 started.
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.27-1debian10 started.
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Initializing database files
2021-11-06T08:38:44.274783Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.27) initializing of server in progress as process 41
2021-11-06T08:38:44.279965Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-11-06T08:38:44.711420Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-11-06T08:38:45.777355Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-11-06T08:38:45.777389Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-11-06T08:38:45.898121Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
/usr/local/bin/docker-entrypoint.sh: line 191:    41 Killed                  "$@" --initialize-insecure --default-time-zone=SYSTEM

This pins the problem down to the mysql container (db).

9.6 Delete the pod

kubectl delete -f test1.yaml

[root@master test]# kubectl delete -f test1.yaml 
pod "test1" deleted

9.7 Edit the YAML manifest to raise the mysql resource limits

[root@master test]# vim test1.yaml 

apiVersion: v1
kind: Pod
metadata:
  name: test1
spec:
  containers:
  - name: web
    image: nginx
    env:
    - name: WEB_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "512Mi"
        cpu: "0.5"
      limits:
        memory: "1024Mi"
        cpu: "1"

9.8 Create the resource again

kubectl apply -f test1.yaml

[root@master test]# kubectl apply -f test1.yaml 
pod/test1 created

9.9 Watch the pod status

kubectl get pod -o wide -w

[root@master test]# kubectl get pod -o wide -w
NAME    READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
test1   0/2     ContainerCreating   0          12s   <none>   node01   <none>           <none>
test1   2/2     Running             0          18s   10.244.1.56   node01   <none>           <none>

9.10 View the pod details

kubectl describe pod test1

[root@master test]# kubectl describe pod test1
......
Containers:
  web:
    Container ID:   docker://caf5bef54f878ebba32728b5e43743e36bbdf1457973f3ca130c98de5e1803d3
    Image:          nginx
......
#nginx resource limits
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
#nginx environment variables
    Environment:
      WEB_ROOT_PASSWORD:  password
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7lsdx (ro)
  db:
    Container ID:   docker://2574f2bd02d9d7fc5bb0d2b74582b0bece3d8bd37d1d7ff3148ae8109df49367
    Image:          mysql
......
#mysql resource limits
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     500m
      memory:  512Mi
#mysql environment variables
    Environment:
      MYSQL_ROOT_PASSWORD:  password
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7lsdx (ro)
......
#pod creation events
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  105s  default-scheduler  Successfully assigned default/test1 to node01
  Normal  Pulling    104s  kubelet, node01    Pulling image "nginx"
  Normal  Pulled     103s  kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    103s  kubelet, node01    Created container web
  Normal  Started    103s  kubelet, node01    Started container web
  Normal  Pulling    103s  kubelet, node01    Pulling image "mysql"
  Normal  Pulled     88s   kubelet, node01    Successfully pulled image "mysql"
  Normal  Created    88s   kubelet, node01    Created container db
  Normal  Started    88s   kubelet, node01    Started container db

9.11 Check the node's resource usage

[root@master test]# kubectl describe node node01
......
  Namespace                  Name                           CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
  ---------                  ----                           ------------  ----------   ---------------  -------------  ---
  default                    test1                          750m (37%)    1500m (75%)  576Mi (30%)      1152Mi (61%)   10m
  kube-system                coredns-bccdc95cf-qrlbp        100m (5%)     0 (0%)       70Mi (3%)        170Mi (9%)     4d21h
  kube-system                kube-flannel-ds-amd64-6927f    100m (5%)     100m (5%)    50Mi (2%)        50Mi (2%)      4d21h
  kube-system                kube-proxy-hjqfc               0 (0%)        0 (0%)       0 (0%)           0 (0%)         4d21h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                950m (47%)   1600m (80%)
  memory             696Mi (36%)  1372Mi (72%)
  ephemeral-storage  0 (0%)       0 (0%)
Events:              <none>

node01 is a 2-CPU, 2G-memory machine.
CPU Requests: nginx requests 250m and mysql requests 500m, so node01's CPU Requests total 750m, or 37% of its two cores.
CPU Limits: nginx's limit is 500m and mysql's limit is 1, so node01's CPU Limits total 1500m, or 75% of its two cores.
Memory Requests: nginx requests 64Mi and mysql requests 512Mi, so node01's Memory Requests total 576Mi, or 30% of its 2G of memory.
Memory Limits: nginx's limit is 128Mi and mysql's limit is 1Gi, so node01's Memory Limits total 1152Mi, or 61% of its 2G of memory.

II. Health Checks

1. What health checks are


A health check, also called a probe, is a diagnostic performed periodically by the kubelet on a container.

2. The three probe types

2.1 livenessProbe (liveness probe)

Determines whether the container is running. If the probe fails, the kubelet kills the container, and the container is then handled according to the Pod's restartPolicy. If a container does not provide a liveness probe, the default state is Success.

2.2 readinessProbe (readiness probe)

Determines whether the container is ready to accept requests. If the probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services that match the Pod. The readiness state before the initial delay defaults to Failure. If a container does not provide a readiness probe, the default state is Success.

2.3 startupProbe (startup probe, added in v1.17)

Determines whether the application inside the container has started; it is aimed mainly at applications whose startup time cannot be predicted. If a startupProbe is configured, all other probes are disabled until the startupProbe reports Success; only after it succeeds do the other probes take effect. If the startupProbe fails, the kubelet kills the container, and the container is restarted according to its restartPolicy. If a container has no startupProbe configured, the default state is Success.
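A minimal startupProbe sketch (the pod name and probe values are illustrative, not from this article's lab): the failureThreshold × periodSeconds budget gives a slow application up to 300 seconds to come up before the liveness probe takes over.

apiVersion: v1
kind: Pod
metadata:
  name: startup-demo            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /
        port: 80
      failureThreshold: 30      # allow up to 30 failures...
      periodSeconds: 10         # ...10s apart, i.e. 300s to start
    livenessProbe:              # disabled until the startupProbe succeeds
      httpGet:
        path: /
        port: 80
      periodSeconds: 10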

2.4 Defining probes together

All three probe types can be defined at the same time. Until the readinessProbe succeeds, a Pod in the running state will not transition to the ready state.

3. The three check methods probes support

3.1 exec

Executes a command inside the container. The diagnostic is considered successful if the command exits with status code 0.

3.2 tcpSocket

Performs a TCP check (three-way handshake) against the container's IP address on the specified port. The diagnostic is considered successful if the port is open.

3.3 httpGet

Performs an HTTP GET request against the container's IP address on the specified port and path. The diagnostic is considered successful if the response status code is greater than or equal to 200 and less than 400 (2xx or 3xx).

4. Probe results

Each probe produces one of three results:
● Success: the container passed the diagnostic
● Failure: the container failed the diagnostic
● Unknown: the diagnostic itself failed, so no action is taken

5. Official documentation

Documentation link:
https://kubernetes.io/zh/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

6. The exec method

6.1 Official example 1

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

initialDelaySeconds: tells the kubelet to wait 5 seconds before performing the first probe, i.e. the first probe runs in the 6th second after the container starts. The default is 0 seconds; the minimum value is 0.
periodSeconds: tells the kubelet to perform a liveness probe every 5 seconds. The default is 10 seconds; the minimum value is 1.
Additional parameters:
failureThreshold: how many times Kubernetes retries a failing probe before giving up. For a liveness probe, giving up means restarting the container; for a readiness probe, giving up means marking the Pod not ready. The default is 3; the minimum value is 1.
timeoutSeconds: the number of seconds after which the probe times out. The default is 1 second; the minimum value is 1. (Before Kubernetes 1.20, exec probes ignored timeoutSeconds: the probe kept running indefinitely, even past its configured deadline, until a result came back.)

In this configuration file you can see that the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds. The initialDelaySeconds field tells the kubelet to wait 5 seconds before the first probe. To perform a probe, the kubelet executes the command cat /tmp/healthy in the container. If the command succeeds and returns 0, the kubelet considers the container healthy and alive. If the command returns a non-zero value, the kubelet kills the container and restarts it.
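Putting the extra parameters alongside the official example, a fully parameterized liveness probe sketch might look like this (the values are illustrative, not recommendations):

livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]
  initialDelaySeconds: 5    # wait 5s after container start
  periodSeconds: 5          # probe every 5s
  timeoutSeconds: 1         # each probe must finish within 1s
  failureThreshold: 3       # restart after 3 consecutive failures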

6.2 Write the YAML manifest

[root@master test]# vim exec.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","touch /tmp/live; sleep 30; rm -rf /tmp/live; sleep 3600"]
    livenessProbe:
      exec:
        command: ["test","-e","/tmp/live"]
      initialDelaySeconds: 1
      periodSeconds: 3

In this configuration file, the Pod has a single container.
The command field creates the file /tmp/live, sleeps for 30 seconds, deletes the file on waking, and then sleeps for another hour (3600 seconds).
Only a livenessProbe is used, with the exec check method, probing for the existence of /tmp/live.
The initialDelaySeconds field tells the kubelet to wait 1 second before the first probe.
The periodSeconds field tells the kubelet to probe every 3 seconds.

6.3 Create the resource

kubectl create -f exec.yaml

[root@master test]# kubectl create -f exec.yaml
pod/liveness-exec created

6.4 Watch the pod status

kubectl get pod -o wide -w

[root@master ~]# kubectl get pod -o wide -w
liveness-exec   0/1   Pending   0     0s    <none>   <none>   <none>   <none>
liveness-exec   0/1   Pending   0     0s    <none>   node01   <none>   <none>
liveness-exec   0/1   ContainerCreating   0     0s    <none>   node01   <none>   <none>
liveness-exec   1/1   Running             0     2s    10.244.1.62   node01   <none>   <none>
liveness-exec   1/1   Running             1     68s   10.244.1.62   node01   <none>   <none>

The container restarted at the 68-second mark.

6.5 View the pod events

kubectl describe pod liveness-exec

[root@master test]# kubectl describe pod liveness-exec

......
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  67s                default-scheduler  Successfully assigned default/liveness-exec to node01
  Normal   Started    66s                kubelet, node01    Started container liveness-exec-container
  Warning  Unhealthy  30s (x3 over 36s)  kubelet, node01    Liveness probe failed:
  Normal   Killing    30s                kubelet, node01    Container liveness-exec-container failed liveness probe, will be restarted
  Normal   Pulled     0s (x2 over 67s)   kubelet, node01    Container image "busybox" already present on machine
  Normal   Created    0s (x2 over 67s)   kubelet, node01    Created container liveness-exec-container

At 37 seconds after the container started, the health check had failed three times (working backwards, the first failure was around the 31-second mark). The kubelet then ran its killing procedure, created a new container from the locally cached image at 67 seconds, and completed the first restart at 68 seconds.
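To reconstruct this timeline without scrolling through the describe output, you can also filter the event stream for just this pod:

kubectl get events --field-selector involvedObject.name=liveness-exec --sort-by=.metadata.creationTimestamp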

7. The httpGet method

7.1 Official example 2

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

In this configuration file, the Pod has a single container. The initialDelaySeconds field tells the kubelet to wait 3 seconds before the first probe, and the periodSeconds field specifies that the kubelet should probe every 3 seconds. The kubelet sends an HTTP GET request to the server running in the container, which listens on port 8080; if the handler returns a success code, the kubelet considers the container healthy and alive. If the handler returns a failure code, the kubelet kills the container and restarts it.
Any status code greater than or equal to 200 and less than 400 indicates success; any other status code indicates failure.

7.2 Write the YAML manifest

[root@master test]# vim httpget.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget
  namespace: default
spec:
  containers:
  - name: liveness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: nginx
      containerPort: 80
    livenessProbe:
      httpGet:
        port: nginx
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10

7.3 Create the resource

kubectl create -f httpget.yaml

[root@master test]# kubectl create -f httpget.yaml 
pod/liveness-httpget created

kubectl get pod

[root@master test]# kubectl get pod
NAME               READY   STATUS    RESTARTS   AGE
liveness-httpget   1/1     Running   0          6s

7.4 Delete the Pod's index.html

kubectl exec -it liveness-httpget -- rm -rf /usr/share/nginx/html/index.html

[root@master test]# kubectl exec -it liveness-httpget -- rm -rf /usr/share/nginx/html/index.html

7.5 Check the pod status

kubectl get pod -w

[root@master test]# kubectl get pod -w
NAME               READY   STATUS    RESTARTS   AGE
liveness-httpget   1/1     Running   0          5m35s
liveness-httpget   1/1     Running   1          5m37s

The container restarts.

7.6 View the container events

kubectl describe pod liveness-httpget

[root@master ~]# kubectl describe pod liveness-httpget
......
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  5m47s                default-scheduler  Successfully assigned default/liveness-httpget to node01
  Normal   Pulled     11s (x2 over 5m46s)  kubelet, node01    Container image "nginx" already present on machine
  Normal   Created    11s (x2 over 5m46s)  kubelet, node01    Created container liveness-httpget-container
  Normal   Started    11s (x2 over 5m46s)  kubelet, node01    Started container liveness-httpget-container
  Warning  Unhealthy  11s (x3 over 17s)    kubelet, node01    Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    11s                  kubelet, node01    Container liveness-httpget-container failed liveness probe, will be restarted

The restart happened because the HTTP probe got back status code 404: HTTP probe failed with statuscode: 404.
After the restart there are no further restarts, because the restarted container starts from the image's filesystem, which contains index.html.
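You can confirm this by listing the web root inside the restarted container; the file is back because each restart begins from the image's original filesystem:

kubectl exec liveness-httpget -- ls /usr/share/nginx/html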

8. The tcpSocket method

8.1 Official example

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

This example uses both a readinessProbe and a livenessProbe. The kubelet sends the first readiness probe 5 seconds after the container starts, attempting to connect to port 8080 of the goproxy container. If the probe succeeds, the kubelet continues to probe every 10 seconds. Besides the readinessProbe, the configuration includes a livenessProbe; the kubelet performs the first liveness probe 15 seconds after the container starts. Just like the readiness probe, it attempts to connect to port 8080 of the goproxy container. If the liveness probe fails, the container is restarted.

8.2 Write the YAML manifest

[root@master test]# vim tcpsocket.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket
spec:
  containers:
  - name: liveness-tcpsocket-container
    image: nginx
    livenessProbe:
      initialDelaySeconds: 5
      timeoutSeconds: 1
      tcpSocket:
        port: 8080
      periodSeconds: 3

8.3 Create the resource

kubectl apply -f tcpsocket.yaml

[root@master test]# kubectl apply -f tcpsocket.yaml 
pod/liveness-tcpsocket created

8.4 Watch the pod status

kubectl get pod -w

[root@master test]# kubectl get pod -w
NAME                 READY   STATUS              RESTARTS   AGE
liveness-tcpsocket   0/1     ContainerCreating   0          6s
liveness-tcpsocket   1/1     Running             0          17s
liveness-tcpsocket   1/1     Running             1          44s
liveness-tcpsocket   1/1     Running             2          71s

The pod keeps restarting abnormally.

8.5 View the pod events

kubectl describe pod liveness-tcpsocket

[root@master test]# kubectl describe pod liveness-tcpsocket

......
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  93s                default-scheduler  Successfully assigned default/liveness-tcpsocket to node01
  Normal   Pulled     23s (x3 over 77s)  kubelet, node01    Successfully pulled image "nginx"
  Normal   Created    23s (x3 over 77s)  kubelet, node01    Created container liveness-tcpsocket-container
  Normal   Started    23s (x3 over 77s)  kubelet, node01    Started container liveness-tcpsocket-container
  Normal   Pulling    11s (x4 over 92s)  kubelet, node01    Pulling image "nginx"
  Warning  Unhealthy  11s (x9 over 71s)  kubelet, node01    Liveness probe failed: dial tcp 10.244.1.65:8080: connect: connection refused
  Normal   Killing    11s (x3 over 65s)  kubelet, node01    Container liveness-tcpsocket-container failed liveness probe, will be restarted

The restarts happen because nginx listens on port 80 by default, so the health check's connection to port 8080 is refused.

8.6 Delete the pod

kubectl delete -f tcpsocket.yaml

8.7 Change the tcpSocket port

[root@master test]# vim tcpsocket.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket
spec:
  containers:
  - name: liveness-tcpsocket-container
    image: nginx
    livenessProbe:
      initialDelaySeconds: 5
      timeoutSeconds: 1
      tcpSocket:
#change the port to 80
        port: 80
      periodSeconds: 3

8.8 Create the resource again

kubectl apply -f tcpsocket.yaml

[root@master test]# kubectl apply -f tcpsocket.yaml 
pod/liveness-tcpsocket created

8.9 Watch the pod status

kubectl get pod -o wide -w

[root@master ~]# kubectl get pod -o wide -w
NAME                 READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
liveness-tcpsocket   1/1     Running   0          21s   10.244.1.66   node01   <none>           <none>

The pod starts normally, with no restarts.

8.10 View the pod events

kubectl describe pod liveness-tcpsocket

[root@master test]# kubectl describe pod liveness-tcpsocket

......
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  33s   default-scheduler  Successfully assigned default/liveness-tcpsocket to node01
  Normal  Pulling    32s   kubelet, node01    Pulling image "nginx"
  Normal  Pulled     17s   kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    17s   kubelet, node01    Created container liveness-tcpsocket-container
  Normal  Started    17s   kubelet, node01    Started container liveness-tcpsocket-container

Startup is normal.

9. readinessProbe (readiness probe) example 1

9.1 Write the YAML manifest

[root@master test]# vim readiness-httpget.yaml

apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget
  namespace: default
spec:
  containers:
  - name: readiness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: 80
#note: this path is deliberately set wrong
        path: /index1.html
      initialDelaySeconds: 1
      periodSeconds: 3
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10

9.2 Create the resource

kubectl apply -f readiness-httpget.yaml

[root@master test]# kubectl apply -f readiness-httpget.yaml
pod/readiness-httpget created

9.3 Check the pod status

kubectl get pod

[root@master test]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
readiness-httpget   0/1     Running   0          25s

STATUS is Running, but the pod cannot enter the READY state.

9.4 View the pod events

kubectl describe pod readiness-httpget

[root@master test]# kubectl describe pod readiness-httpget
......
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  119s                 default-scheduler  Successfully assigned default/readiness-httpget to node01
  Normal   Pulled     119s                 kubelet, node01    Container image "nginx" already present on machine
  Normal   Created    119s                 kubelet, node01    Created container readiness-httpget-container
  Normal   Started    119s                 kubelet, node01    Started container readiness-httpget-container
  Warning  Unhealthy  54s (x22 over 117s)  kubelet, node01    Readiness probe failed: HTTP probe failed with statuscode: 404

The cause is that the readinessProbe check returns status code 404, so the kubelet keeps the pod out of the READY state.

9.5 Check the logs

kubectl logs readiness-httpget

[root@master test]# kubectl logs readiness-httpget
......
2021/11/07 16:40:41 [error] 32#32: *164 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.1.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.1.68:80"
10.244.1.1 - - [07/Nov/2021:16:40:41 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.244.1.1 - - [07/Nov/2021:16:40:43 +0000] "GET /index.html HTTP/1.1" 200 615 "-" "kube-probe/1.15" "-"

9.6 Create index1.html in the container

kubectl exec -it readiness-httpget -- touch /usr/share/nginx/html/index1.html

[root@master test]# kubectl exec -it readiness-httpget -- touch /usr/share/nginx/html/index1.html

9.7 Check the container status

kubectl get pod

[root@master test]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
readiness-httpget   1/1     Running   0          15m

10. readinessProbe (readiness probe) example 2

10.1 Write the YAML manifest

[root@master test]# cat readiness-multi-nginx.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx1
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx2
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx3
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
#the Service binds to the nginx pods via this selector
  selector:
    app: nginx
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80

10.2 Create the resources

kubectl apply -f readiness-multi-nginx.yaml

[root@master test]# kubectl apply -f readiness-multi-nginx.yaml
pod/nginx1 created
pod/nginx2 created
pod/nginx3 created
service/nginx-svc created

10.3 Check the pod and service status

kubectl get pod,svc -o wide

[root@master test]# kubectl get pod,svc -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
pod/nginx1   1/1     Running   0          22s   10.244.1.69   node01   <none>           <none>
pod/nginx2   1/1     Running   0          22s   10.244.2.31   node02   <none>           <none>
pod/nginx3   1/1     Running   0          22s   10.244.1.70   node01   <none>           <none>

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE    SELECTOR
service/kubernetes   ClusterIP   10.1.0.1      <none>        443/TCP   3d3h   <none>
service/nginx-svc    ClusterIP   10.1.177.18   <none>        80/TCP    22s    app=nginx

Everything is running.
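With all three pods Running and Ready, a throwaway busybox pod can verify that the Service answers and balances across them (this test pod is an extra step, not part of the original walkthrough):

kubectl run tmp --image=busybox --rm -it --restart=Never -- wget -qO- http://nginx-svc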

10.4 Delete index.html from nginx1

kubectl exec -it nginx1 -- rm -rf /usr/share/nginx/html/index.html

[root@master test]# kubectl exec -it nginx1 -- rm -rf /usr/share/nginx/html/index.html

10.5 Check the pod status

kubectl get pod -o wide -w

[root@master test]# kubectl get pod -o wide -w
NAME     READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
nginx1   1/1     Running   0          3m41s   10.244.1.69   node01   <none>           <none>
nginx2   1/1     Running   0          3m41s   10.244.2.31   node02   <none>           <none>
nginx3   1/1     Running   0          3m41s   10.244.1.70   node01   <none>           <none>
nginx1   0/1     Running   0          3m43s   10.244.1.69   node01   <none>           <none>

nginx1's READY state changes to 0/1.

10.6 View the pod events

kubectl describe pod nginx1

[root@master test]# kubectl describe pod nginx1
......
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  4m13s             default-scheduler  Successfully assigned default/nginx1 to node01
  Normal   Pulled     4m12s             kubelet, node01    Container image "nginx" already present on machine
  Normal   Created    4m12s             kubelet, node01    Created container nginx
  Normal   Started    4m12s             kubelet, node01    Started container nginx
  Warning  Unhealthy  0s (x9 over 40s)  kubelet, node01    Readiness probe failed: HTTP probe failed with statuscode: 404

Because the httpGet check returns status code 404, the readinessProbe fails and the kubelet marks the pod as not ready.

10.7 View the service details

kubectl describe svc nginx-svc

[root@master test]# kubectl describe svc nginx-svc
Name:              nginx-svc
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"nginx-svc","namespace":"default"},"spec":{"ports":[{"name":"http"...
Selector:          app=nginx
Type:              ClusterIP
IP:                10.1.177.18
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         10.244.1.70:80,10.244.2.31:80
Session Affinity:  None
Events:            <none>

nginx1 has been removed from the service's endpoints list.

10.8 Check the endpoints

kubectl get endpoints

[root@master test]# kubectl get endpoints
NAME         ENDPOINTS                       AGE
kubernetes   192.168.122.10:6443             3d3h
nginx-svc    10.244.1.70:80,10.244.2.31:80   9m34s

nginx1 no longer appears in the endpoints.
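The removal is reversible: recreate the file and the readiness probe starts succeeding again, at which point nginx1 rejoins the endpoints (this recovery step is an addition to the original walkthrough):

kubectl exec nginx1 -- sh -c 'echo ok > /usr/share/nginx/html/index.html'
kubectl get endpoints nginx-svc -w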

III. Startup and Exit Actions

1. Write the YAML manifest

[root@master test]# vim post.yaml

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-test
spec:
  containers:
  - name: lifecycle-test-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","echo Hello from the postStart handler >> /var/log/nginx/message"]
      preStop:
        exec:
          command: ["/bin/sh","-c","echo Hello from the postStop handler >> /var/log/nginx/message"]
    volumeMounts:
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  initContainers:
  - name: init-nginx
    image: nginx
    command: ["/bin/sh","-c","echo 'Hello initContainers' >> /var/log/nginx/message"]
    volumeMounts: 
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  volumes:
  - name: message-log
    hostPath:
      path: /data/volumes/nginx/log/
      type: DirectoryOrCreate

2. Create the resource

kubectl apply -f post.yaml

[root@master test]# kubectl apply -f post.yaml
pod/lifecycle-test created

3. Watch the pod status

kubectl get pod -o wide -w

[root@master test]# kubectl get pod -o wide -w
NAME             READY   STATUS     RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
lifecycle-test   0/1     Init:0/1   0          5s    <none>   node01   <none>           <none>
lifecycle-test   0/1     PodInitializing   0          17s   10.244.1.73   node01   <none>           <none>
lifecycle-test   1/1     Running           0          19s   10.244.1.73   node01   <none>           <none>

4. View the pod events

kubectl describe po lifecycle-test

[root@master test]# kubectl describe po lifecycle-test
......
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  46s   default-scheduler  Successfully assigned default/lifecycle-test to node01
  Normal  Pulling    45s   kubelet, node01    Pulling image "nginx"
  Normal  Pulled     30s   kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    30s   kubelet, node01    Created container init-nginx
  Normal  Started    30s   kubelet, node01    Started container init-nginx
  Normal  Pulling    29s   kubelet, node01    Pulling image "nginx"
  Normal  Pulled     27s   kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    27s   kubelet, node01    Created container lifecycle-test-container
  Normal  Started    27s   kubelet, node01    Started container lifecycle-test-container

5. Check the container log

kubectl exec -it lifecycle-test -- cat /var/log/nginx/message

[root@master test]# kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
Hello initContainers
Hello from the postStart handler

This shows that the init container runs first, and that as soon as the main container starts, Kubernetes immediately triggers the postStart event.

6. Delete the pod, then check the file mounted on the node

kubectl delete -f post.yaml

[root@master test]# kubectl delete -f post.yaml
pod "lifecycle-test" deleted

On node01:

[root@node01 ~]# cat /data/volumes/nginx/log/message 
Hello initContainers
Hello from the postStart handler
Hello from the preStop handler

This shows that just before a container is terminated, Kubernetes triggers the preStop event.

7. Recreate the resource and check the log again

kubectl apply -f post.yaml

[root@master test]# kubectl apply -f post.yaml
pod/lifecycle-test created

kubectl exec -it lifecycle-test -- cat /var/log/nginx/message

[root@master test]# kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
Hello initContainers
Hello from the postStart handler
Hello from the preStop handler
Hello initContainers
Hello from the postStart handler

IV. Summary

1. Probes

There are three kinds of probes:
1. livenessProbe (liveness probe): determines whether the container is running normally; if it fails, the container (not the pod) is killed, and the restartPolicy then decides whether the container is restarted
2. readinessProbe (readiness probe): determines whether the container can enter the ready state; if it fails, the pod enters the not-ready state and is removed from the service's endpoints
3. startupProbe: determines whether the application inside the container has started successfully; until it reaches the Success state, all other probes are disabled

2. Check methods

There are three check methods (a combined sketch follows this list):
1. exec: runs the command given in the command field inside the container; a return status code of 0 counts as a successful probe
2. httpGet: performs an HTTP GET against the specified port and URL path; an HTTP status code greater than or equal to 200 and less than 400 counts as success
3. tcpSocket: opens a TCP connection to the pod IP on the specified port; if the port is right and the TCP connection succeeds, the probe counts as success
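As a closing sketch, this fragment shows the three check methods side by side in one container spec (ports, paths, and the pairing of probe type to method are illustrative; a real pod would rarely need all three on one container):

livenessProbe:              # exec: exit status 0 = success
  exec:
    command: ["cat", "/tmp/healthy"]
readinessProbe:             # httpGet: 2xx/3xx = success
  httpGet:
    port: 80
    path: /index.html
startupProbe:               # tcpSocket: connection succeeds = success
  tcpSocket:
    port: 80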

3. Common optional probe parameters

Four optional probe parameters are commonly used:
1. initialDelaySeconds: how many seconds after the container starts to begin probing
2. periodSeconds: the probe interval, i.e. a probe runs every this many seconds
3. failureThreshold: how many retries are allowed after a probe fails
4. timeoutSeconds: how many seconds to wait before the probe times out

