k8s Resource Limits and Probe Checks
I. Resource Limits
1. Using resource limits
When you define a Pod, you can optionally specify how much of each resource its containers need. The most common resources to configure are CPU and memory size; other resource types can be specified as well.
2. request (requests) and limit (constraints)
When you specify a request for the containers in a Pod, the scheduler uses that information to decide which node to place the Pod on. When you also specify a limit for a container, the kubelet enforces that the running container never uses more than that limit. The kubelet also reserves the requested amount of resources for the container to use.
If the node where the Pod is running has enough available resources, a container may use more than its request. However, a container is never allowed to use more than its limit.
If you set a memory limit for a container but do not set a memory request, Kubernetes automatically assigns a memory request that matches the limit. Similarly, if you set a CPU limit for a container but no CPU request, Kubernetes automatically assigns a CPU request that matches the CPU limit.
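A minimal sketch of this defaulting behavior (the pod name here is my own placeholder, not from the examples below):

apiVersion: v1
kind: Pod
metadata:
  name: limits-only-demo    # hypothetical example pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      limits:               # only limits are declared...
        memory: "128Mi"
        cpu: "500m"
      # ...so Kubernetes automatically fills in requests.memory: 128Mi
      # and requests.cpu: 500m to match the limits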
3. Official documentation
https://kubernetes.io/zh/docs/concepts/configuration/manage-resources-containers/
4. Resource requests and limits for Pods and containers
- spec.containers[].resources.requests.cpu: CPU resources pre-allocated when the container is created
- spec.containers[].resources.requests.memory: memory resources pre-allocated when the container is created
- spec.containers[].resources.requests.hugepages-<size>: huge page resources pre-allocated when the container is created
- spec.containers[].resources.limits.cpu: upper limit on CPU usage
- spec.containers[].resources.limits.memory: upper limit on memory usage
- spec.containers[].resources.limits.hugepages-<size>: upper limit on huge page usage
5. Resource types
CPU and memory are both resource types, and each has its own base unit. CPU represents compute processing power and is specified in units of Kubernetes CPUs. Memory is specified in bytes. On Kubernetes v1.14 or later you can also specify huge page (Huge Page) resources. Huge pages are a Linux-specific feature in which the node kernel allocates blocks of memory much larger than the default page size.
For example, on a system whose default page size is 4KiB, you could specify the limit hugepages-2Mi: 80Mi. If the container tries to allocate more than 40 huge pages of 2MiB each (a total of 80 MiB), the allocation request fails.
Note:
You cannot overcommit hugepages-* resources. This is unlike the memory and cpu resources.
6. CPU resource units
CPU requests and limits are measured in cpu units. One cpu in Kubernetes is equivalent to 1 vCPU (1 hyperthread).
Kubernetes also supports fractional CPU requests. A container whose spec.containers[].resources.requests.cpu is 0.5 gets half the CPU of one that asks for a full cpu (similar to cgroup time-slicing of CPU resources). The expression 0.1 is equivalent to the expression 100m (millicores), meaning the container may use a total of 0.1 * 1000 milliseconds of CPU time per 1000 milliseconds.
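The two notations are interchangeable, as in this small illustrative fragment:

resources:
  requests:
    cpu: "0.5"   # decimal form: half a cpu
  limits:
    cpu: 500m    # millicore form of the same amount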
7. Memory resource units
Memory requests and limits are measured in bytes. They can be expressed as plain integers,
as quantities with power-of-10 suffixes (E, P, T, G, M, K),
or as quantities with power-of-2 suffixes (Ei, Pi, Ti, Gi, Mi, Ki).
For example: 1KB = 10^3 = 1000 bytes, 1MB = 10^6 = 1,000,000 = 1000KB, 1GB = 10^9 = 1,000,000,000 = 1000MB
1KiB = 2^10 = 1024 bytes, 1MiB = 2^20 = 1,048,576 = 1024KiB
PS: When you buy a hard drive, the capacity reported by the operating system is smaller than what the product label or vendor claims. The main reason is that the label uses MB/GB units, where 1GB is 1,000,000,000 bytes, while the operating system works in binary and reports capacity in MiB/GiB, where 1GiB = 2^30 = 1,073,741,824 bytes. Since 1GiB is 73,741,824 bytes more than 1GB, the measured result comes out smaller than the labeled capacity, and the larger the unit, the bigger the gap.
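A quick shell check of that gap (my own illustration of the arithmetic above):

# bytes in one GiB minus bytes in one GB
echo $((2**30 - 10**9))    # prints 73741824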
8. Example from the official documentation
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
In this example the Pod has two containers. Each container requests 0.25 cpu and 64MiB (2^26 bytes) of memory, and each container's limit is 0.5 cpu and 128MiB of memory. You can say the Pod as a whole requests 0.5 cpu and 128 MiB of memory, and is limited to 1 cpu and 256MiB of memory.
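If you later want to read these per-container requests back from a live Pod, one way (my own helper, not part of the original text) is kubectl's jsonpath output:

kubectl get pod frontend -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'
# prints each container name followed by its requests map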
9. Resource limits in practice
9.1 Write the YAML manifest
[root@master ~]# mkdir /opt/test
[root@master ~]# cd !$
cd /opt/test
[root@master test]# vim test1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test1
spec:
  containers:
  - name: web
    image: nginx
    env:
    - name: WEB_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
9.2 Free up memory (on the worker nodes; node01 as the example)
MySQL is fairly demanding on memory, so first check whether the available memory is enough for it to run normally; if the remaining memory is insufficient, caches can be released.
9.2.1 Check memory
free -mh
[root@node01 ~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           1.9G        1.0G         86M         26M        870M        663M
Swap:            0B          0B          0B
Total memory is 1.9G and 1.0G is in use, so roughly 0.9G should be available.
However, 870M of it is being used for buffers/cache, which leaves free at only 86M.
86M of free memory is clearly not enough, so the caches need to be released.
9.2.2 Manually release caches
echo [1|2|3] > /proc/sys/vm/drop_caches
[root@node01 ~]# cat /proc/sys/vm/drop_caches
0
[root@node01 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@node01 ~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:           1.9G        968M        770M         26M        245M        754M
Swap:            0B          0B          0B
0: the system default; no memory is released and the OS manages caches automatically
1: release the page cache
2: release dentries and inodes
3: release all caches
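One detail worth adding to the steps above (my own note, not in the original procedure): drop_caches only discards clean cache pages, so it is common practice to run sync first so that dirty pages are written back to disk:

sync                                 # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches    # then drop page cache, dentries and inodes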
Note:
If an application has a memory leak or overflow problem, swap usage lets you spot it fairly quickly, whereas free alone makes it harder to see. If at that point we tell the user that changing one system value "can" free memory, free will indeed grow, but what will the user think? They may well conclude the operating system itself is broken. In other words, even though the kernel can quickly clear buffers and cache, and doing so is not hard (as the operations above show), the kernel deliberately does not do it (the default value is 0), and we should not casually change that.
Under normal conditions, once applications are running stably on a system, the free value settles at a stable level, even if it looks small. When problems such as memory exhaustion, applications failing to obtain memory, or OOM errors occur, you should first analyze the application side, for example too many users exhausting memory, or an application-level memory overflow. Otherwise, clearing the buffers to force a larger free value may only mask the problem temporarily.
9.3 Create the resource
kubectl apply -f test1.yaml
[root@master test]# kubectl apply -f test1.yaml
pod/test1 created
9.4 Watch the pod status
kubectl get pod -o wide -w
[root@master test]# kubectl get pod -o wide -w
NAME    READY   STATUS              RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
test1   0/2     ContainerCreating   0          4s    <none>        node01   <none>           <none>
test1   2/2     Running             0          18s   10.244.1.55   node01   <none>           <none>
test1   1/2     OOMKilled           0          21s   10.244.1.55   node01   <none>           <none>
test1   2/2     Running             1          37s   10.244.1.55   node01   <none>           <none>
test1   1/2     OOMKilled           1          40s   10.244.1.55   node01   <none>           <none>
......
OOM (Out Of Memory) means a workload used more memory than the limit we set.
READY: 2/2 with STATUS: Running shows the pod was created and ran successfully, but during runtime one container hit an OOM problem, was killed by the kubelet, and was then restarted.
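To confirm that OOM was really the termination reason, the containers' last terminated state can be read directly (a quick check I am adding here, not part of the original steps):

kubectl get pod test1 -o jsonpath='{range .status.containerStatuses[*]}{.name}{" => "}{.lastState.terminated.reason}{"\n"}{end}'
# the db container should report OOMKilled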
9.5 View the container logs
kubectl logs test1 -c web
[root@master test]# kubectl logs test1 -c web
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/11/06 08:31:23 [notice] 1#1: using the "epoll" event method
2021/11/06 08:31:23 [notice] 1#1: nginx/1.21.3
2021/11/06 08:31:23 [notice] 1#1: built by gcc 8.3.0 (Debian 8.3.0-6)
2021/11/06 08:31:23 [notice] 1#1: OS: Linux 3.10.0-693.el7.x86_64
2021/11/06 08:31:23 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/11/06 08:31:23 [notice] 1#1: start worker processes
2021/11/06 08:31:23 [notice] 1#1: start worker process 31
2021/11/06 08:31:23 [notice] 1#1: start worker process 32
nginx started normally; next, check the mysql logs
kubectl logs test1 -c db
[root@master test]# kubectl logs test1 -c db
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.27-1debian10 started.
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.27-1debian10 started.
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Initializing database files
2021-11-06T08:38:44.274783Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.27) initializing of server in progress as process 41
2021-11-06T08:38:44.279965Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-11-06T08:38:44.711420Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-11-06T08:38:45.777355Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-11-06T08:38:45.777389Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-11-06T08:38:45.898121Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
/usr/local/bin/docker-entrypoint.sh: line 191:    41 Killed    "$@" --initialize-insecure --default-time-zone=SYSTEM
This pins the problem container down to mysql.
9.6 Delete the pod
kubectl delete -f test1.yaml
[root@master test]# kubectl delete -f test1.yaml
pod "test1" deleted
9.7 Modify the YAML manifest to raise the mysql resource limits
[root@master test]# vim test1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test1
spec:
  containers:
  - name: web
    image: nginx
    env:
    - name: WEB_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "512Mi"
        cpu: "0.5"
      limits:
        memory: "1024Mi"
        cpu: "1"
9.8 Create the resource again
kubectl apply -f test1.yaml
[root@master test]# kubectl apply -f test1.yaml
pod/test1 created
9.9 Watch the pod status
kubectl get pod -o wide -w
[root@master test]# kubectl get pod -o wide -w
NAME    READY   STATUS              RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
test1   0/2     ContainerCreating   0          12s   <none>        node01   <none>           <none>
test1   2/2     Running             0          18s   10.244.1.56   node01   <none>           <none>
9.10 View the pod details
kubectl describe pod test1
......
Containers:
  web:
    Container ID:  docker://caf5bef54f878ebba32728b5e43743e36bbdf1457973f3ca130c98de5e1803d3
    Image:         nginx
    ......
    # nginx resource limits
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    # nginx environment variables
    Environment:
      WEB_ROOT_PASSWORD:  password
    Mounts:
  db:
    Container ID:  docker://2574f2bd02d9d7fc5bb0d2b74582b0bece3d8bd37d1d7ff3148ae8109df49367
    Image:         mysql
    ......
    # mysql resource limits
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     500m
      memory:  512Mi
    # mysql environment variables
    Environment:
      MYSQL_ROOT_PASSWORD:  password
    Mounts:
......
# pod creation process / event log
Events:
  Type    Reason     Age   From               Message
  Normal  Scheduled  105s  default-scheduler  Successfully assigned default/test1 to node01
  Normal  Pulling    104s  kubelet, node01    Pulling image "nginx"
  Normal  Pulled     103s  kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    103s  kubelet, node01    Created container web
  Normal  Started    103s  kubelet, node01    Started container web
  Normal  Pulling    103s  kubelet, node01    Pulling image "mysql"
  Normal  Pulled     88s   kubelet, node01    Successfully pulled image "mysql"
  Normal  Created    88s   kubelet, node01    Created container db
  Normal  Started    88s   kubelet, node01    Started container db
9.11 View node resource usage
[root@master test]# kubectl describe node node01
......
  Namespace    Name                         CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
  ---------    ----                         ------------  ----------   ---------------  -------------  ---
  default      test1                        750m (37%)    1500m (75%)  576Mi (30%)      1152Mi (61%)   10m
  kube-system  coredns-bccdc95cf-qrlbp      100m (5%)     0 (0%)       70Mi (3%)        170Mi (9%)     4d21h
  kube-system  kube-flannel-ds-amd64-6927f  100m (5%)     100m (5%)    50Mi (2%)        50Mi (2%)      4d21h
  kube-system  kube-proxy-hjqfc             0 (0%)        0 (0%)       0 (0%)           0 (0%)         4d21h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                950m (47%)   1600m (80%)
  memory             696Mi (36%)  1372Mi (72%)
  ephemeral-storage  0 (0%)       0 (0%)
Events:              <none>
node01 is configured with 2 CPUs and 2G of memory (2C2G).
CPU Requests analysis:
nginx requests 250m and mysql requests 500m, so the pod's CPU Requests on node01 total 750m, which is 37% of node01's two cores.
CPU Limits analysis:
nginx's limit is 500m and mysql's limit is 1, so the CPU Limits on node01 total 1500m, 75% of node01's two cores.
Memory Requests analysis:
nginx requests 64Mi and mysql requests 512Mi, so the Memory Requests on node01 total 576Mi, 30% of node01's 2G of memory.
Memory Limits analysis:
nginx's limit is 128Mi and mysql's limit is 1Gi, so the Memory Limits on node01 total 1152Mi, 61% of node01's 2G of memory.
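The percentages are simply each sum divided by the node's allocatable total, e.g. for the CPU requests:

# 750m requested out of 2 cores (2000m)
echo $((750 * 100 / 2000))    # prints 37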
II. Health Checks
1. What health checks are
Health checks, also known as probes, are periodic diagnostics performed on a container by the kubelet.
2. The three probe types
2.1 livenessProbe (liveness probe)
Determines whether the container is running. If the probe fails, the kubelet kills the container, and the container is then subject to its restartPolicy. If a container does not provide a liveness probe, the default state is Success.
2.2 readinessProbe (readiness probe)
Determines whether the container is ready to accept requests. If the probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services that match the Pod. Before the initial delay, the readiness state defaults to Failure. If a container does not provide a readiness probe, the default state is Success.
2.3 startupProbe (startup probe, added in v1.17)
Determines whether the application inside the container has started; it is mainly for applications whose startup time cannot be predicted. When a startupProbe is configured, all other probes are inactive until it reports Success; only after it succeeds do the other probes take effect. If the startupProbe fails, the kubelet kills the container, and the container is restarted according to its restartPolicy. If a container has no startupProbe configured, the default state is Success.
2.4 Defining them together
All three probe types can be defined at the same time. Until the readinessProbe succeeds, a Running pod will not enter the Ready state. A sketch combining all three probes is shown below.
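A minimal sketch of one container carrying all three probes (requires a cluster version that supports startupProbe; the pod name, paths and thresholds here are illustrative assumptions, not from the examples below):

apiVersion: v1
kind: Pod
metadata:
  name: probes-demo            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
    startupProbe:              # gates the other two probes until it succeeds
      httpGet:
        path: /index.html
        port: 80
      failureThreshold: 30     # allow up to 30 * 5s = 150s for startup
      periodSeconds: 5
    livenessProbe:             # failure restarts the container
      httpGet:
        path: /index.html
        port: 80
      periodSeconds: 10
    readinessProbe:            # failure removes the pod from Service endpoints
      httpGet:
        path: /index.html
        port: 80
      periodSeconds: 5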
3. The three check methods probes support
3.1 exec
Executes a command inside the container. The diagnostic is considered successful if the command exits with status code 0.
3.2 tcpSocket
Performs a TCP check (three-way handshake) against the container's IP address on the specified port. The diagnostic is considered successful if the port is open.
3.3 httpGet
Performs an HTTP GET request against the container's IP address on the specified port and path. The diagnostic is considered successful if the response status code is greater than or equal to 200 and less than 400 (2xx or 3xx).
4. Probe results
Each probe yields one of three results:
● Success: the container passed the diagnostic
● Failure: the container failed the diagnostic
● Unknown: the diagnostic itself failed, so no action is taken
5. Official documentation
https://kubernetes.io/zh/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
6. The exec method
6.1 Official example 1
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
initialDelaySeconds: tells the kubelet to wait 5 seconds before performing the first probe, i.e. the first probe runs in the 6th second after the container starts. The default is 0 seconds; the minimum value is 0.
periodSeconds: tells the kubelet to perform a liveness probe every 5 seconds. The default is 10 seconds; the minimum value is 1.
Additional fields:
failureThreshold: how many times Kubernetes retries a failing probe before giving up. For a liveness probe, giving up means restarting the container; for a readiness probe, giving up marks the Pod as not ready. The default is 3; the minimum value is 1.
timeoutSeconds: the number of seconds after which the probe times out. The default is 1 second; the minimum value is 1. (Before Kubernetes 1.20, exec probes ignored timeoutSeconds: the probe kept running indefinitely, possibly past its configured deadline, until a result came back.)
In this configuration file you can see the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds, and the initialDelaySeconds field tells the kubelet to wait 5 seconds before the first probe. The kubelet runs the command cat /tmp/healthy inside the container to perform the probe. If the command succeeds and returns 0, the kubelet considers the container healthy and alive; if it returns a non-zero value, the kubelet kills the container and restarts it.
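Putting the supplementary fields together, a hypothetical probe block using all four common tuning parameters might look like this:

livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]
  initialDelaySeconds: 5   # wait 5s after the container starts
  periodSeconds: 5         # probe every 5s
  timeoutSeconds: 2        # each attempt times out after 2s
  failureThreshold: 3      # restart after 3 consecutive failures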
6.2 Write the YAML manifest
[root@master test]# vim exec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","touch /tmp/live; sleep 30; rm -rf /tmp/live; sleep 3600"]
    livenessProbe:
      exec:
        command: ["test","-e","/tmp/live"]
      initialDelaySeconds: 1
      periodSeconds: 3
In this configuration file the Pod has a single container.
The container's command field creates a /tmp/live file, sleeps for 30 seconds, deletes the file when the sleep ends, and then sleeps for another hour (3600 seconds).
Only a livenessProbe is used, with the exec check method, testing whether /tmp/live exists.
The initialDelaySeconds field tells the kubelet to wait 1 second before the first probe.
The periodSeconds field tells the kubelet to perform a liveness probe every 3 seconds.
6.3 Create the resource
kubectl create -f exec.yaml
[root@master test]# kubectl create -f exec.yaml
pod/liveness-exec created
6.4 Watch the pod status
kubectl get pod -o wide -w
[root@master ~]# kubectl get pod -o wide -w
liveness-exec   0/1   Pending             0   0s    <none>        <none>   <none>   <none>
liveness-exec   0/1   Pending             0   0s    <none>        node01   <none>   <none>
liveness-exec   0/1   ContainerCreating   0   0s    <none>        node01   <none>   <none>
liveness-exec   1/1   Running             0   2s    10.244.1.62   node01   <none>   <none>
liveness-exec   1/1   Running             1   68s   10.244.1.62   node01   <none>   <none>
The container restarted at the 68-second mark.
6.5 View the pod events
kubectl describe pod liveness-exec
......
Events:
  Type     Reason     Age                From               Message
  Normal   Scheduled  67s                default-scheduler  Successfully assigned default/liveness-exec to node01
  Normal   Started    66s                kubelet, node01    Started container liveness-exec-container
  Warning  Unhealthy  30s (x3 over 36s)  kubelet, node01    Liveness probe failed:
  Normal   Killing    30s                kubelet, node01    Container liveness-exec-container failed liveness probe, will be restarted
  Normal   Pulled     0s (x2 over 67s)   kubelet, node01    Container image "busybox" already present on machine
  Normal   Created    0s (x2 over 67s)   kubelet, node01    Created container liveness-exec-container
Around 37 seconds after the container started, the health check had failed three times (working backwards, the first failure was around 31 seconds), so the kubelet started killing the container; at 67 seconds it created a new container from the image, and the first restart completed at 68 seconds.
7. The httpGet method
7.1 Official example 2
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
In this configuration file the Pod has a single container. The initialDelaySeconds field tells the kubelet to wait 3 seconds before the first probe, and the periodSeconds field specifies a liveness probe every 3 seconds. The kubelet sends an HTTP GET request to the service running inside the container (which listens on port 8080). If the handler for the /healthz path returns a success code, the kubelet considers the container healthy and alive; if the handler returns a failure code, the kubelet kills the container and restarts it.
Any return code greater than or equal to 200 and less than 400 indicates success; any other return code indicates failure.
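You can imitate what the probe sees with curl from a machine that can reach the pod (my own debugging aid; the pod IP is a placeholder):

# prints only the HTTP status code: 2xx/3xx means the probe would pass
curl -s -o /dev/null -w '%{http_code}\n' http://<pod-ip>:8080/healthz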
7.2 Write the YAML manifest
[root@master test]# vim httpget.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget
  namespace: default
spec:
  containers:
  - name: liveness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: nginx
      containerPort: 80
    livenessProbe:
      httpGet:
        port: nginx
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10
7.3 Create the resource
kubectl create -f httpget.yaml
[root@master test]# kubectl create -f httpget.yaml
pod/liveness-httpget created
kubectl get pod
[root@master test]# kubectl get pod
NAME               READY   STATUS    RESTARTS   AGE
liveness-httpget   1/1     Running   0          6s
7.4 Delete the Pod's index.html file
kubectl exec -it liveness-httpget -- rm -rf /usr/share/nginx/html/index.html
[root@master test]# kubectl exec -it liveness-httpget -- rm -rf /usr/share/nginx/html/index.html
7.5 Check the pod status
kubectl get pod -w
[root@master test]# kubectl get pod -w
NAME               READY   STATUS    RESTARTS   AGE
liveness-httpget   1/1     Running   0          5m35s
liveness-httpget   1/1     Running   1          5m37s
The container restarted.
7.6 View the container events
kubectl describe pod liveness-httpget
[root@master ~]# kubectl describe pod liveness-httpget
......
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  5m47s                default-scheduler  Successfully assigned default/liveness-httpget to node01
  Normal   Pulled     11s (x2 over 5m46s)  kubelet, node01    Container image "nginx" already present on machine
  Normal   Created    11s (x2 over 5m46s)  kubelet, node01    Created container liveness-httpget-container
  Normal   Started    11s (x2 over 5m46s)  kubelet, node01    Started container liveness-httpget-container
  Warning  Unhealthy  11s (x3 over 17s)    kubelet, node01    Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    11s                  kubelet, node01    Container liveness-httpget-container failed liveness probe, will be restarted
The restart was caused by the HTTP probe getting status code 404: HTTP probe failed with statuscode: 404.
After this restart there are no further restarts, because the replacement container is recreated from the nginx image, which does contain index.html.
8. The tcpSocket method
8.1 Official example
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
This example uses both a readinessProbe and a livenessProbe. The kubelet sends the first readiness probe 5 seconds after the container starts, attempting to connect to port 8080 of the goproxy container; if the probe succeeds, the kubelet keeps probing every 10 seconds. In addition to the readiness probe, the configuration includes a liveness probe: the kubelet runs the first liveness probe 15 seconds after the container starts and, just like the readiness probe, attempts to connect to port 8080 of the goproxy container. If the liveness probe fails, the container is restarted.
8.2 Write the YAML manifest
[root@master test]# vim tcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket
spec:
  containers:
  - name: liveness-tcpsocket-container
    image: nginx
    livenessProbe:
      initialDelaySeconds: 5
      timeoutSeconds: 1
      tcpSocket:
        port: 8080
      periodSeconds: 3
8.3 Create the resource
kubectl apply -f tcpsocket.yaml
[root@master test]# kubectl apply -f tcpsocket.yaml
pod/liveness-tcpsocket created
8.4 Watch the pod status
kubectl get pod -w
[root@master test]# kubectl get pod -w
NAME                 READY   STATUS              RESTARTS   AGE
liveness-tcpsocket   0/1     ContainerCreating   0          6s
liveness-tcpsocket   1/1     Running             0          17s
liveness-tcpsocket   1/1     Running             1          44s
liveness-tcpsocket   1/1     Running             2          71s
The pod keeps restarting abnormally.
8.5 View the pod events
kubectl describe pod liveness-tcpsocket
......
Events:
  Type     Reason     Age                From               Message
  Normal   Scheduled  93s                default-scheduler  Successfully assigned default/liveness-tcpsocket to node01
  Normal   Pulled     23s (x3 over 77s)  kubelet, node01    Successfully pulled image "nginx"
  Normal   Created    23s (x3 over 77s)  kubelet, node01    Created container liveness-tcpsocket-container
  Normal   Started    23s (x3 over 77s)  kubelet, node01    Started container liveness-tcpsocket-container
  Normal   Pulling    11s (x4 over 92s)  kubelet, node01    Pulling image "nginx"
  Warning  Unhealthy  11s (x9 over 71s)  kubelet, node01    Liveness probe failed: dial tcp 10.244.1.65:8080: connect: connection refused
  Normal   Killing    11s (x3 over 65s)  kubelet, node01    Container liveness-tcpsocket-container failed liveness probe, will be restarted
The restarts happen because nginx listens on port 80 by default, so the health check's connection to port 8080 is refused. The failing check can be reproduced by hand, as sketched below.
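A manual version of the same TCP check (my own illustration; substitute the pod IP shown by kubectl get pod -o wide):

# nc -z just tests whether the port accepts a connection
nc -zv <pod-ip> 8080   # connection refused, mirroring the failed probe
nc -zv <pod-ip> 80     # succeeds, because nginx listens on port 80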
8.6 Delete the pod
kubectl delete -f tcpsocket.yaml
8.7 Modify the tcpSocket port
[root@master test]# vim tcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket
spec:
  containers:
  - name: liveness-tcpsocket-container
    image: nginx
    livenessProbe:
      initialDelaySeconds: 5
      timeoutSeconds: 1
      tcpSocket:
        # change the port to 80
        port: 80
      periodSeconds: 3
8.8 Create the resource again
kubectl apply -f tcpsocket.yaml
[root@master test]# kubectl apply -f tcpsocket.yaml
pod/liveness-tcpsocket created
8.9 Watch the pod status
kubectl get pod -o wide -w
[root@master ~]# kubectl get pod -o wide -w
NAME                 READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
liveness-tcpsocket   1/1     Running   0          21s   10.244.1.66   node01   <none>           <none>
Startup is normal, with no restarts.
8.10 View the pod events
kubectl describe pod liveness-tcpsocket
......
Events:
  Type    Reason     Age   From               Message
  Normal  Scheduled  33s   default-scheduler  Successfully assigned default/liveness-tcpsocket to node01
  Normal  Pulling    32s   kubelet, node01    Pulling image "nginx"
  Normal  Pulled     17s   kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    17s   kubelet, node01    Created container liveness-tcpsocket-container
  Normal  Started    17s   kubelet, node01    Started container liveness-tcpsocket-container
Startup is normal.
9. readinessProbe: example 1
9.1 Write the YAML manifest
[root@master test]# vim readiness-httpget.yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget
  namespace: default
spec:
  containers:
  - name: readiness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: 80
        # note: this path is deliberately wrong
        path: /index1.html
      initialDelaySeconds: 1
      periodSeconds: 3
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10
9.2 Create the resource
kubectl apply -f readiness-httpget.yaml
[root@master test]# kubectl apply -f readiness-httpget.yaml
pod/readiness-httpget created
9.3 Check the pod status
kubectl get pod
[root@master test]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
readiness-httpget   0/1     Running   0          25s
STATUS is Running, but the pod never reaches the READY state.
9.4 View the pod events
kubectl describe pod readiness-httpget
[root@master test]# kubectl describe pod readiness-httpget
......
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  119s                 default-scheduler  Successfully assigned default/readiness-httpget to node01
  Normal   Pulled     119s                 kubelet, node01    Container image "nginx" already present on machine
  Normal   Created    119s                 kubelet, node01    Created container readiness-httpget-container
  Normal   Started    119s                 kubelet, node01    Started container readiness-httpget-container
  Warning  Unhealthy  54s (x22 over 117s)  kubelet, node01    Readiness probe failed: HTTP probe failed with statuscode: 404
The cause is that the readinessProbe keeps getting status code 404, so the kubelet keeps the pod from entering the READY state.
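The Ready condition can also be read directly (an extra check of my own, not in the original steps):

kubectl get pod readiness-httpget -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
# prints False while the readiness probe keeps failing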
9.5 Check the logs
kubectl logs readiness-httpget
[root@master test]# kubectl logs readiness-httpget
......
2021/11/07 16:40:41 [error] 32#32: *164 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.1.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.1.68:80"
10.244.1.1 - - [07/Nov/2021:16:40:41 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.244.1.1 - - [07/Nov/2021:16:40:43 +0000] "GET /index.html HTTP/1.1" 200 615 "-" "kube-probe/1.15" "-"
9.6 Create index1.html in the container
kubectl exec -it readiness-httpget -- touch /usr/share/nginx/html/index1.html
[root@master test]# kubectl exec -it readiness-httpget -- touch /usr/share/nginx/html/index1.html
9.7 Check the container status
kubectl get pod
[root@master test]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
readiness-httpget   1/1     Running   0          15m
10. readinessProbe: example 2
10.1 Write the YAML manifest
[root@master test]# cat readiness-multi-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx1
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx2
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx3
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  # the service is bound to the nginx pods via this selector
  selector:
    app: nginx
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80
10.2 Create the resource
kubectl apply -f readiness-multi-nginx.yaml
[root@master test]# kubectl apply -f readiness-multi-nginx.yaml
pod/nginx1 created
pod/nginx2 created
pod/nginx3 created
service/nginx-svc created
10.3 Check the pod and service status
kubectl get pod,svc -o wide
[root@master test]# kubectl get pod,svc -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
pod/nginx1   1/1     Running   0          22s   10.244.1.69   node01   <none>           <none>
pod/nginx2   1/1     Running   0          22s   10.244.2.31   node02   <none>           <none>
pod/nginx3   1/1     Running   0          22s   10.244.1.70   node01   <none>           <none>

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE    SELECTOR
service/kubernetes   ClusterIP   10.1.0.1      <none>        443/TCP   3d3h   <none>
service/nginx-svc    ClusterIP   10.1.177.18   <none>        80/TCP    22s    app=nginx
Everything is running successfully.
10.4 Delete index.html from nginx1
kubectl exec -it nginx1 -- rm -rf /usr/share/nginx/html/index.html
[root@master test]# kubectl exec -it nginx1 -- rm -rf /usr/share/nginx/html/index.html
10.5 Check the pod status
kubectl get pod -o wide -w
[root@master test]# kubectl get pod -o wide -w
NAME     READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
nginx1   1/1     Running   0          3m41s   10.244.1.69   node01   <none>           <none>
nginx2   1/1     Running   0          3m41s   10.244.2.31   node02   <none>           <none>
nginx3   1/1     Running   0          3m41s   10.244.1.70   node01   <none>           <none>
nginx1   0/1     Running   0          3m43s   10.244.1.69   node01   <none>           <none>
nginx1's READY state changed to 0/1.
10.6 View the pod events
kubectl describe pod nginx1
[root@master test]# kubectl describe pod nginx1
......
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  4m13s             default-scheduler  Successfully assigned default/nginx1 to node01
  Normal   Pulled     4m12s             kubelet, node01    Container image "nginx" already present on machine
  Normal   Created    4m12s             kubelet, node01    Created container nginx
  Normal   Started    4m12s             kubelet, node01    Started container nginx
  Warning  Unhealthy  0s (x9 over 40s)  kubelet, node01    Readiness probe failed: HTTP probe failed with statuscode: 404
Because the httpGet check returned status code 404, the readinessProbe failed and the kubelet marked the pod as not ready.
10.7 View the service details
kubectl describe svc nginx-svc
[root@master test]# kubectl describe svc nginx-svc
Name:              nginx-svc
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"nginx-svc","namespace":"default"},"spec":{"ports":[{"name":"http"...
Selector:          app=nginx
Type:              ClusterIP
IP:                10.1.177.18
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         10.244.1.70:80,10.244.2.31:80
Session Affinity:  None
Events:            <none>
nginx1 has been removed from the service's endpoints list.
10.8 Check the endpoints
kubectl get endpoints
[root@master test]# kubectl get endpoints
NAME         ENDPOINTS                       AGE
kubernetes   192.168.122.10:6443             3d3h
nginx-svc    10.244.1.70:80,10.244.2.31:80   9m34s
nginx1 is absent from the endpoints.
III. Summary
1. Probes
There are 3 kinds of probes:
- livenessProbe (liveness probe): determines whether the container is running normally; if it fails, the container (not the pod) is killed, and the restart policy then decides whether the container is restarted
- readinessProbe (readiness probe): determines whether the container can enter the ready state; if it fails, the pod enters the not-ready state and the container is removed from the service's endpoints
- startupProbe (startup probe): determines whether the application inside the container has started successfully; until it reports success, all other probes are inactive
2. Check methods
There are 3 check methods:
- exec: run the command given in the command field inside the container; the probe succeeds if the command exits with status code 0
- httpGet: perform an HTTP GET against the specified port and URL path; the probe succeeds if the returned HTTP status code is greater than or equal to 200 and less than 400
- tcpSocket: open a TCP connection to the pod's IP on the specified port; the probe succeeds if the port is correct and the TCP connection is established
3. Common optional probe parameters
There are 4 commonly used optional parameters:
- initialDelaySeconds: how many seconds after the container starts to begin probing
- periodSeconds: the probing frequency, i.e. how many seconds between probes
- failureThreshold: how many more times to retry after a probe fails before giving up
- timeoutSeconds: how many seconds to wait before the probe times out