Kubernetes Resource Limits and Probe Checks
I. Resource Limits
1. Using resource limits
When defining a Pod, you can optionally set the amount of resources each of its containers needs. The most commonly specified resources are CPU and memory, but other resource types exist as well.
2. request (requested) and limit (constrained) resources
When you specify a request for the containers in a Pod, the scheduler uses that information to decide which node to place the Pod on. When you also specify a limit for a container, the kubelet enforces it so the running container cannot use more of that resource than the limit allows. The kubelet also reserves the requested amount of that resource for the container's use.
If the node where the Pod is running has enough of a resource available, a container may use more than its request for that resource. However, a container is never allowed to exceed its limit.
If you set a memory limit for a container but do not set a memory request, Kubernetes automatically assigns a memory request equal to the limit. Likewise, if you set a CPU limit but no CPU request, Kubernetes automatically assigns a CPU request that matches the CPU limit.
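A minimal sketch of this defaulting behavior (the pod name limits-only is hypothetical): the container declares only limits, so Kubernetes fills in equal requests.
apiVersion: v1
kind: Pod
metadata:
  name: limits-only
spec:
  containers:
  - name: app
    image: nginx
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
      # no requests block: Kubernetes automatically sets
      # requests.memory=128Mi and requests.cpu=500m to match the limits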
3. Example from the official docs
https://kubernetes.io/zh/docs/concepts/configuration/manage-resources-containers/
4. Resource requests and limits for Pods and containers
CPU resource to pre-allocate when the container is created:
spec.containers[].resources.requests.cpu
Memory resource to pre-allocate when the container is created:
spec.containers[].resources.requests.memory
Huge-page resource to pre-allocate when the container is created:
spec.containers[].resources.requests.hugepages-<size>
Upper bound on CPU usage:
spec.containers[].resources.limits.cpu
Upper bound on memory usage:
spec.containers[].resources.limits.memory
Upper bound on huge-page usage:
spec.containers[].resources.limits.hugepages-<size>
5. Resource types
CPU and memory are each a resource type, and each has a base unit. CPU represents compute processing power and is measured in Kubernetes CPUs. Memory is measured in bytes. On Kubernetes v1.14 and later you can also specify huge page (Huge Page) resources. Huge pages are a Linux-specific feature in which the node kernel allocates blocks of memory much larger than the default page size.
For example, on a system whose default page size is 4KiB, you could specify the limit hugepages-2Mi: 80Mi. If the container tries to allocate more than 40 2MiB huge pages (a total of 80 MiB), the allocation fails.
Note:
You cannot overcommit hugepages-* resources. This is unlike memory and cpu resources.
6. CPU resource units
CPU requests and limits are measured in cpu units. One cpu in Kubernetes is equivalent to 1 vCPU (1 hyperthread).
Kubernetes also supports fractional CPU requests. A container whose spec.containers[].resources.requests.cpu is 0.5 gets half as much CPU as one that asks for 1 cpu (similar to cgroup time-slicing of CPU). The expression 0.1 is equivalent to 100m (millicores), meaning the container may use 0.1 * 1000 milliseconds of CPU time in every 1000-millisecond period.
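For illustration, the two notations below are interchangeable; which to use is purely a matter of style:
resources:
  requests:
    cpu: "0.5"   # half a CPU, written as a decimal
  limits:
    cpu: 500m    # the same half CPU in millicores; 1000m = 1 cpu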
7. Memory resource units
Memory requests and limits are measured in bytes. They can be expressed as a plain integer,
with a base-10 suffix (E, P, T, G, M, K),
or with a base-2 (power-of-two) suffix (Ei, Pi, Ti, Gi, Mi, Ki).
For example: 1KB = 10^3 = 1000, 1MB = 10^6 = 1,000,000 = 1000KB, 1GB = 10^9 = 1,000,000,000 = 1000MB;
1KiB = 2^10 = 1024, 1MiB = 2^20 = 1,048,576 = 1024KiB.
PS: When you buy a hard drive, the capacity the operating system reports is smaller than what the label or the vendor claims. The label uses MB/GB, where 1GB = 1,000,000,000 bytes, while the operating system works in binary and reports capacity in MiB/GiB, where 1GiB = 2^30 = 1,073,741,824 bytes. 1GiB is therefore 73,741,824 bytes more than 1GB, so the measured capacity comes out smaller than the labeled one, and the larger the unit, the larger the gap.
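A quick sketch of how the two suffix families differ in a manifest; note that 128M and 128Mi are close but not equal:
resources:
  requests:
    memory: "64Mi"    # 64 * 2^20  = 67,108,864 bytes
  limits:
    memory: "128M"    # 128 * 10^6 = 128,000,000 bytes (less than 128Mi = 134,217,728 bytes)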
8. Example from the official documentation
apiVersion: v1
kind: Pod
metadata:
name: frontend
spec:
containers:
- name: app
image: images.my-company.example/app:v4
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- name: log-aggregator
image: images.my-company.example/log-aggregator:v6
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
In this example the Pod has two Containers. Each Container requests 0.25 cpu and 64MiB (2^26 bytes) of memory, and each container has a limit of 0.5 cpu and 128MiB of memory. You can therefore say the Pod as a whole requests 0.5 cpu and 128 MiB of memory, and is limited to 1 cpu and 256MiB of memory.
9. Resource limits in practice
9.1 Write the YAML manifest
[root@master ~]# mkdir /opt/test
[root@master ~]# cd !$
cd /opt/test
[root@master test]# vim test1.yaml
apiVersion: v1
kind: Pod
metadata:
name: test1
spec:
containers:
- name: web
image: nginx
env:
- name: WEB_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- name: db
image: mysql
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
9.2 Free up memory (on the node; node01 as the example)
mysql is fairly demanding on memory, so first check whether enough memory is available for mysql to run normally; if the remaining memory is insufficient, the cache can be released.
9.2.1 Check memory
free -mh
[root@node01 ~]# free -mh
total used free shared buff/cache available
Mem: 1.9G 1.0G 86M 26M 870M 663M
Swap: 0B 0B 0B
Total memory is 1.9G with 1.0G actually in use, so roughly 0.9G should be available.
However, 870M of memory is being used for buffers/cache, which leaves free at only 86M.
86M of free memory is clearly not enough, so the cache needs to be released.
9.2.2 Manually release the cache
echo [1|2|3] > /proc/sys/vm/drop_caches
[root@node01 ~]# cat /proc/sys/vm/drop_caches
0
[root@node01 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@node01 ~]# free -mh
total used free shared buff/cache available
Mem: 1.9G 968M 770M 26M 245M 754M
Swap: 0B 0B 0B
0: the system default; no memory is released and the operating system manages it automatically
1: release the page cache
2: release dentries and inodes
3: release all caches
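As a precaution, it is common practice to run sync first so dirty pages are written back to disk before the cache is dropped; a sketch:
sync                                  # flush dirty pages to disk first
echo 1 > /proc/sys/vm/drop_caches     # then release only the page cache (use 3 for all caches)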
Note:
If an application has a memory-leak or overflow problem, swap usage usually reveals it quickly, whereas it is harder to see from free. If, at that point, we instead tell the user to change a system setting so memory "can" be released and free grows, what will the user think? Won't they suspect the operating system "has a problem"? The kernel can clear the buffer/cache quickly and easily (as the operation above clearly shows), yet it deliberately does not do so by default (the default value is 0), so we should not change this setting casually.
Normally, once applications run stably on a system, the free value settles at a steady level, even if it looks small. When problems such as memory shortage, applications failing to obtain memory, or OOM errors occur, the application side should be analyzed first (too many users exhausting memory, an application memory leak, and so on); otherwise, clearing the cache to force a larger free value may only mask the problem temporarily.
9.3 Create the resource
kubectl apply -f test1.yaml
[root@master test]# kubectl apply -f test1.yaml
pod/test1 created
9.4 Watch the pod status
kubectl get pod -o wide -w
[root@master test]# kubectl get pod -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test1 0/2 ContainerCreating 0 4s <none> node01 <none> <none>
test1 2/2 Running 0 18s 10.244.1.55 node01 <none> <none>
test1 1/2 OOMKilled 0 21s 10.244.1.55 node01 <none> <none>
test1 2/2 Running 1 37s 10.244.1.55 node01 <none> <none>
test1 1/2 OOMKilled 1 40s 10.244.1.55 node01 <none> <none>
......
OOM (Out Of Memory) means the service's usage exceeded the limit we set.
READY 2/2 with STATUS Running shows the pod was created and ran successfully, but during operation it hit OOM, and the offending container was killed and restarted by the kubelet.
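To confirm which container was OOM-killed, one option (output format assumed) is to read the last termination state from the pod status:
kubectl describe pod test1 | grep -A3 "Last State"
# or pull the termination reason per container directly via jsonpath:
kubectl get pod test1 -o jsonpath='{range .status.containerStatuses[*]}{.name}{" => "}{.lastState.terminated.reason}{"\n"}{end}'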
9.5 Check the container logs
kubectl logs test1 -c web
[root@master test]# kubectl logs test1 -c web
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/11/06 08:31:23 [notice] 1#1: using the "epoll" event method
2021/11/06 08:31:23 [notice] 1#1: nginx/1.21.3
2021/11/06 08:31:23 [notice] 1#1: built by gcc 8.3.0 (Debian 8.3.0-6)
2021/11/06 08:31:23 [notice] 1#1: OS: Linux 3.10.0-693.el7.x86_64
2021/11/06 08:31:23 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2021/11/06 08:31:23 [notice] 1#1: start worker processes
2021/11/06 08:31:23 [notice] 1#1: start worker process 31
2021/11/06 08:31:23 [notice] 1#1: start worker process 32
nginx started normally; next, check the mysql logs
kubectl logs test1 -c db
[root@master test]# kubectl logs test1 -c db
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.27-1debian10 started.
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.27-1debian10 started.
2021-11-06 08:38:44+00:00 [Note] [Entrypoint]: Initializing database files
2021-11-06T08:38:44.274783Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.27) initializing of server in progress as process 41
2021-11-06T08:38:44.279965Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-11-06T08:38:44.711420Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-11-06T08:38:45.777355Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-11-06T08:38:45.777389Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-11-06T08:38:45.898121Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
/usr/local/bin/docker-entrypoint.sh: line 191: 41 Killed "$@" --initialize-insecure --default-time-zone=SYSTEM
This pins the problem container down as mysql (the db container): its initialization process was killed.
9.6 Delete the pod
kubectl delete -f test1.yaml
[root@master test]# kubectl delete -f test1.yaml
pod "test1" deleted
9.7 Edit the YAML manifest to raise mysql's resource limits
[root@master test]# vim test1.yaml
apiVersion: v1
kind: Pod
metadata:
name: test1
spec:
containers:
- name: web
image: nginx
env:
- name: WEB_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- name: db
image: mysql
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "512Mi"
cpu: "0.5"
limits:
memory: "1024Mi"
cpu: "1"
9.8 Create the resource again
kubectl apply -f test1.yaml
[root@master test]# kubectl apply -f test1.yaml
pod/test1 created
9.9 Watch the pod status
kubectl get pod -o wide -w
[root@master test]# kubectl get pod -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test1 0/2 ContainerCreating 0 12s <none> node01 <none> <none>
test1 2/2 Running 0 18s 10.244.1.56 node01 <none> <none>
9.10 View detailed pod information
kubectl describe pod test1
[root@master test]# kubectl describe pod test1
......
Containers:
web:
Container ID: docker://caf5bef54f878ebba32728b5e43743e36bbdf1457973f3ca130c98de5e1803d3
Image: nginx
......
#nginx resource limits
Limits:
cpu: 500m
memory: 128Mi
Requests:
cpu: 250m
memory: 64Mi
#nginx environment variables
Environment:
WEB_ROOT_PASSWORD: password
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7lsdx (ro)
db:
Container ID: docker://2574f2bd02d9d7fc5bb0d2b74582b0bece3d8bd37d1d7ff3148ae8109df49367
Image: mysql
......
#mysql resource limits
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 500m
memory: 512Mi
#mysql environment variables
Environment:
MYSQL_ROOT_PASSWORD: password
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7lsdx (ro)
......
#pod creation events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 105s default-scheduler Successfully assigned default/test1 to node01
Normal Pulling 104s kubelet, node01 Pulling image "nginx"
Normal Pulled 103s kubelet, node01 Successfully pulled image "nginx"
Normal Created 103s kubelet, node01 Created container web
Normal Started 103s kubelet, node01 Started container web
Normal Pulling 103s kubelet, node01 Pulling image "mysql"
Normal Pulled 88s kubelet, node01 Successfully pulled image "mysql"
Normal Created 88s kubelet, node01 Created container db
Normal Started 88s kubelet, node01 Started container db
9.11 Check node resource usage
[root@master test]# kubectl describe node node01
......
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default test1 750m (37%) 1500m (75%) 576Mi (30%) 1152Mi (61%) 10m
kube-system coredns-bccdc95cf-qrlbp 100m (5%) 0 (0%) 70Mi (3%) 170Mi (9%) 4d21h
kube-system kube-flannel-ds-amd64-6927f 100m (5%) 100m (5%) 50Mi (2%) 50Mi (2%) 4d21h
kube-system kube-proxy-hjqfc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d21h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 950m (47%) 1600m (80%)
memory 696Mi (36%) 1372Mi (72%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
node01 is provisioned with 2 CPUs and 2G of memory.
CPU Requests analysis:
nginx requests 250m and mysql requests 500m, so the test1 pod's CPU Requests total 750m, i.e. 37% of node01's two cores.
CPU Limits analysis:
nginx's limit is 500m and mysql's limit is 1, so the test1 pod's CPU Limits total 1500m, i.e. 75% of node01's two cores.
Memory Requests analysis:
nginx requests 64Mi and mysql requests 512Mi, so the test1 pod's Memory Requests total 576Mi, i.e. 30% of node01's 2G of memory.
Memory Limits analysis:
nginx's limit is 128Mi and mysql's limit is 1Gi, so the test1 pod's Memory Limits total 1152Mi, i.e. 61% of node01's 2G of memory.
II. Health Checks
1. What health checks are
A health check, also called a probe (Probe), is a periodic diagnostic that the kubelet performs on a container.
2. The three probe types
2.1 livenessProbe (liveness probe)
Determines whether the container is running. If the probe fails, the kubelet kills the container, and what happens next follows the Pod's restartPolicy. If a container does not provide a liveness probe, the default state is Success.
2.2 readinessProbe (readiness probe)
Determines whether the container is ready to accept requests. If the probe fails, the endpoints controller removes the Pod's IP address from the endpoints of every Service that matches the Pod. The readiness state before the initial delay defaults to Failure. If a container does not provide a readiness probe, the default state is Success.
2.3 startupProbe (startup probe, added in Kubernetes v1.16)
Determines whether the application inside the container has started; it is aimed mainly at applications whose startup time cannot be predicted. If a startupProbe is configured, every other probe is disabled until the startupProbe reports Success; only after it succeeds do the other probes take effect. If the startupProbe fails, the kubelet kills the container, and the container restarts according to its restartPolicy. If a container has no startupProbe configured, the default state is Success.
2.4 Defining them together
All three probe types can be defined at the same time, as in the sketch below. Until the readinessProbe succeeds, the Pod stays Running but never becomes Ready.
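A hedged sketch of all three probes on one container (field values are illustrative, not from the labs below, and assume a cluster new enough to support startupProbe); the startupProbe must succeed before the other two begin running:
containers:
- name: app
  image: nginx
  startupProbe:              # runs first; blocks the other probes until it succeeds
    httpGet:
      path: /index.html
      port: 80
    failureThreshold: 30     # allow up to 30 * 10s = 300s for a slow start
    periodSeconds: 10
  readinessProbe:            # gates whether the pod receives Service traffic
    httpGet:
      path: /index.html
      port: 80
    periodSeconds: 5
  livenessProbe:             # restarts the container on failure
    httpGet:
      path: /index.html
      port: 80
    periodSeconds: 10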
3. The three check methods probes support
3.1 exec
Executes a command inside the container. The diagnostic is considered successful if the command exits with return code 0.
3.2 tcpSocket
Performs a TCP check (three-way handshake) against the container's IP address on the specified port. The diagnostic is considered successful if the port is open.
3.3 httpGet
Performs an HTTP GET request against the container's IP address on the specified port and path. The diagnostic is considered successful if the response status code is greater than or equal to 200 and less than 400 (2xx or 3xx).
4. Probe results
Every probe yields one of three results:
● Success: the container passed the diagnostic
● Failure: the container failed the diagnostic
● Unknown: the diagnostic itself failed, so no action is taken
5. Official documentation
6. The exec method
6.1 Official example 1
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: k8s.gcr.io/busybox
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5
initialDelaySeconds: tells the kubelet to wait 5 seconds before performing the first probe, i.e. the first probe runs in the 6th second after the container starts. The default is 0 seconds; the minimum value is 0.
periodSeconds: tells the kubelet to perform a liveness probe every 5 seconds. The default is 10 seconds; the minimum value is 1.
Additional fields:
failureThreshold: how many times Kubernetes retries a failing probe before giving up. Giving up on a liveness probe means restarting the container; giving up on a readiness probe means marking the Pod unready. The default is 3; the minimum value is 1.
timeoutSeconds: how many seconds to wait before the probe times out. The default is 1 second; the minimum value is 1. (Before Kubernetes 1.20, exec probes ignored timeoutSeconds: the probe kept running indefinitely, possibly past its configured deadline, until a result came back.)
In this manifest the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds, and the initialDelaySeconds field tells the kubelet to wait 5 seconds before the first probe. The kubelet runs the command cat /tmp/healthy inside the container to perform the probe. If the command succeeds and returns 0, the kubelet considers the container healthy and alive; if the command returns a non-zero value, the kubelet kills the container and restarts it.
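To make the two supplementary fields concrete, here is the same exec probe with illustrative values for timeoutSeconds and failureThreshold added:
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 2          # each probe attempt may take at most 2s
  failureThreshold: 3        # restart only after 3 consecutive failures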
6.2 Write the YAML manifest
[root@master test]# vim exec.yaml
apiVersion: v1
kind: Pod
metadata:
name: liveness-exec
namespace: default
spec:
containers:
- name: liveness-exec-container
image: busybox
imagePullPolicy: IfNotPresent
command: ["/bin/sh","-c","touch /tmp/live; sleep 30; rm -rf /tmp/live; sleep 3600"]
livenessProbe:
exec:
command: ["test","-e","/tmp/live"]
initialDelaySeconds: 1
periodSeconds: 3
In this manifest the Pod has a single container.
The container's command field creates the file /tmp/live, sleeps for 30 seconds, deletes the file when the sleep ends, and then sleeps for another hour (3600 seconds).
Only a livenessProbe is used, with the exec check method, to test whether /tmp/live still exists.
The initialDelaySeconds field tells the kubelet to wait 1 second before the first probe.
The periodSeconds field tells the kubelet to perform a liveness probe every 3 seconds.
6.3 Create the resource
kubectl create -f exec.yaml
[root@master test]# kubectl create -f exec.yaml
pod/liveness-exec created
6.4 Watch the pod status
kubectl get pod -o wide -w
[root@master ~]# kubectl get pod -o wide -w
liveness-exec 0/1 Pending 0 0s <none> <none> <none> <none>
liveness-exec 0/1 Pending 0 0s <none> node01 <none> <none>
liveness-exec 0/1 ContainerCreating 0 0s <none> node01 <none> <none>
liveness-exec 1/1 Running 0 2s 10.244.1.62 node01 <none> <none>
liveness-exec 1/1 Running 1 68s 10.244.1.62 node01 <none> <none>
The container restarted at the 68-second mark.
6.5 View the pod events
kubectl describe pod liveness-exec
[root@master test]# kubectl describe pod liveness-exec
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned default/liveness-exec to node01
Normal Started 66s kubelet, node01 Started container liveness-exec-container
Warning Unhealthy 30s (x3 over 36s) kubelet, node01 Liveness probe failed:
Normal Killing 30s kubelet, node01 Container liveness-exec-container failed liveness probe, will be restarted
Normal Pulled 0s (x2 over 67s) kubelet, node01 Container image "busybox" already present on machine
Normal Created 0s (x2 over 67s) kubelet, node01 Created container liveness-exec-container
About 37 seconds after the container started, the health check had failed three times (working backwards, the first failure was around second 31). The kubelet then ran its killing procedure and at second 67 created a new container from the already-present busybox image, completing the first restart at second 68.
7. The httpGet method
7.1 Official example 2
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-http
spec:
containers:
- name: liveness
image: k8s.gcr.io/liveness
args:
- /server
livenessProbe:
httpGet:
path: /healthz
port: 8080
httpHeaders:
- name: Custom-Header
value: Awesome
initialDelaySeconds: 3
periodSeconds: 3
In this manifest the Pod has a single container. The initialDelaySeconds field tells the kubelet to wait 3 seconds before the first probe, and the periodSeconds field specifies a liveness probe every 3 seconds. To perform a probe, the kubelet sends an HTTP GET request to the server running in the container (which listens on port 8080); if the handler for the server's /healthz path returns a success code, the kubelet considers the container healthy and alive. If the handler returns a failure code, the kubelet kills the container and restarts it.
Any return code greater than or equal to 200 and less than 400 indicates success; any other code indicates failure.
7.2 Write the YAML manifest
[root@master test]# vim httpget.yaml
apiVersion: v1
kind: Pod
metadata:
name: liveness-httpget
namespace: default
spec:
containers:
- name: liveness-httpget-container
image: nginx
imagePullPolicy: IfNotPresent
ports:
- name: nginx
containerPort: 80
livenessProbe:
httpGet:
port: nginx
path: /index.html
initialDelaySeconds: 1
periodSeconds: 3
timeoutSeconds: 10
7.3 Create the resource
kubectl create -f httpget.yaml
[root@master test]# kubectl create -f httpget.yaml
pod/liveness-httpget created
kubectl get pod
[root@master test]# kubectl get pod
NAME READY STATUS RESTARTS AGE
liveness-httpget 1/1 Running 0 6s
7.4 Delete the Pod's index.html file
kubectl exec -it liveness-httpget -- rm -rf /usr/share/nginx/html/index.html
[root@master test]# kubectl exec -it liveness-httpget -- rm -rf /usr/share/nginx/html/index.html
7.5 Check the pod status
kubectl get pod -w
[root@master test]# kubectl get pod -w
NAME READY STATUS RESTARTS AGE
liveness-httpget 1/1 Running 0 5m35s
liveness-httpget 1/1 Running 1 5m37s
The container restarted.
7.6 View the container events
kubectl describe pod liveness-httpget
[root@master ~]# kubectl describe pod liveness-httpget
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m47s default-scheduler Successfully assigned default/liveness-httpget to node01
Normal Pulled 11s (x2 over 5m46s) kubelet, node01 Container image "nginx" already present on machine
Normal Created 11s (x2 over 5m46s) kubelet, node01 Created container liveness-httpget-container
Normal Started 11s (x2 over 5m46s) kubelet, node01 Started container liveness-httpget-container
Warning Unhealthy 11s (x3 over 17s) kubelet, node01 Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 11s kubelet, node01 Container liveness-httpget-container failed liveness probe, will be restarted
The restart happened because the HTTP probe received status code 404: Liveness probe failed: HTTP probe failed with statuscode: 404.
Once the restart completes there are no further restarts, because the new container is created fresh from the image, which contains index.html.
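One way to verify this (assuming the pod name above) is to list the file inside the freshly restarted container:
kubectl exec liveness-httpget -- ls -l /usr/share/nginx/html/index.html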
8. The tcpSocket method
8.1 Official example
apiVersion: v1
kind: Pod
metadata:
name: goproxy
labels:
app: goproxy
spec:
containers:
- name: goproxy
image: k8s.gcr.io/goproxy:0.1
ports:
- containerPort: 8080
readinessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
This example uses both a readinessProbe and a livenessProbe. The kubelet sends the first readiness probe 5 seconds after the container starts, attempting to connect to port 8080 of the goproxy container; if the probe succeeds, the kubelet continues probing every 10 seconds. Besides the readinessProbe, the configuration includes a livenessProbe: the kubelet performs the first liveness probe 15 seconds after the container starts and, just like the readiness probe, it attempts to connect to port 8080 of the goproxy container. If the liveness probe fails, the container is restarted.
8.2 Write the YAML manifest
[root@master test]# vim tcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
name: liveness-tcpsocket
spec:
containers:
- name: liveness-tcpsocket-container
image: nginx
livenessProbe:
initialDelaySeconds: 5
timeoutSeconds: 1
tcpSocket:
port: 8080
periodSeconds: 3
8.3 Create the resource
kubectl apply -f tcpsocket.yaml
[root@master test]# kubectl apply -f tcpsocket.yaml
pod/liveness-tcpsocket created
8.4 Watch the pod status
kubectl get pod -w
[root@master test]# kubectl get pod -w
NAME READY STATUS RESTARTS AGE
liveness-tcpsocket 0/1 ContainerCreating 0 6s
liveness-tcpsocket 1/1 Running 0 17s
liveness-tcpsocket 1/1 Running 1 44s
liveness-tcpsocket 1/1 Running 2 71s
The pod keeps restarting abnormally.
8.5 View the pod events
kubectl describe pod liveness-tcpsocket
[root@master test]# kubectl describe pod liveness-tcpsocket
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 93s default-scheduler Successfully assigned default/liveness-tcpsocket to node01
Normal Pulled 23s (x3 over 77s) kubelet, node01 Successfully pulled image "nginx"
Normal Created 23s (x3 over 77s) kubelet, node01 Created container liveness-tcpsocket-container
Normal Started 23s (x3 over 77s) kubelet, node01 Started container liveness-tcpsocket-container
Normal Pulling 11s (x4 over 92s) kubelet, node01 Pulling image "nginx"
Warning Unhealthy 11s (x9 over 71s) kubelet, node01 Liveness probe failed: dial tcp 10.244.1.65:8080: connect: connection refused
Normal Killing 11s (x3 over 65s) kubelet, node01 Container liveness-tcpsocket-container failed liveness probe, will be restarted
The restarts happen because nginx listens on port 80 by default, so the health check's connection to port 8080 is refused.
8.6 Delete the pod
kubectl delete -f tcpsocket.yaml
8.7 Change the tcpSocket port
[root@master test]# vim tcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
name: liveness-tcpsocket
spec:
containers:
- name: liveness-tcpsocket-container
image: nginx
livenessProbe:
initialDelaySeconds: 5
timeoutSeconds: 1
tcpSocket:
#change the port to 80
port: 80
periodSeconds: 3
8.8 Create the resource again
kubectl apply -f tcpsocket.yaml
[root@master test]# kubectl apply -f tcpsocket.yaml
pod/liveness-tcpsocket created
8.9 Watch the pod status
kubectl get pod -o wide -w
[root@master ~]# kubectl get pod -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-tcpsocket 1/1 Running 0 21s 10.244.1.66 node01 <none> <none>
Startup is normal, with no restarts.
8.10 View the pod events
kubectl describe pod liveness-tcpsocket
[root@master test]# kubectl describe pod liveness-tcpsocket
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 33s default-scheduler Successfully assigned default/liveness-tcpsocket to node01
Normal Pulling 32s kubelet, node01 Pulling image "nginx"
Normal Pulled 17s kubelet, node01 Successfully pulled image "nginx"
Normal Created 17s kubelet, node01 Created container liveness-tcpsocket-container
Normal Started 17s kubelet, node01 Started container liveness-tcpsocket-container
Startup is normal.
9. readinessProbe (readiness probe) example 1
9.1 Write the YAML manifest
[root@master test]# vim readiness-httpget.yaml
apiVersion: v1
kind: Pod
metadata:
name: readiness-httpget
namespace: default
spec:
containers:
- name: readiness-httpget-container
image: nginx
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 80
readinessProbe:
httpGet:
port: 80
#note: an intentionally wrong path is set here
path: /index1.html
initialDelaySeconds: 1
periodSeconds: 3
livenessProbe:
httpGet:
port: http
path: /index.html
initialDelaySeconds: 1
periodSeconds: 3
timeoutSeconds: 10
9.2 Create the resource
kubectl apply -f readiness-httpget.yaml
[root@master test]# kubectl apply -f readiness-httpget.yaml
pod/readiness-httpget created
9.3 Check the pod status
kubectl get pod
[root@master test]# kubectl get pod
NAME READY STATUS RESTARTS AGE
readiness-httpget 0/1 Running 0 25s
STATUS is Running, but the pod cannot enter the READY state.
9.4 View the pod events
kubectl describe pod readiness-httpget
[root@master test]# kubectl describe pod readiness-httpget
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 119s default-scheduler Successfully assigned default/readiness-httpget to node01
Normal Pulled 119s kubelet, node01 Container image "nginx" already present on machine
Normal Created 119s kubelet, node01 Created container readiness-httpget-container
Normal Started 119s kubelet, node01 Started container readiness-httpget-container
Warning Unhealthy 54s (x22 over 117s) kubelet, node01 Readiness probe failed: HTTP probe failed with statuscode: 404
The cause: the readinessProbe check received status code 404, so the kubelet keeps the pod out of the READY state.
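The Ready condition can also be read directly from the pod status; a sketch (jsonpath expression assumed):
kubectl get pod readiness-httpget -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
# prints "False" while the readiness probe is failing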
9.5 Check the logs
kubectl logs readiness-httpget
[root@master test]# kubectl logs readiness-httpget
......
2021/11/07 16:40:41 [error] 32#32: *164 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.1.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.1.68:80"
10.244.1.1 - - [07/Nov/2021:16:40:41 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.244.1.1 - - [07/Nov/2021:16:40:43 +0000] "GET /index.html HTTP/1.1" 200 615 "-" "kube-probe/1.15" "-"
9.6 Create index1.html in the container
kubectl exec -it readiness-httpget -- touch /usr/share/nginx/html/index1.html
[root@master test]# kubectl exec -it readiness-httpget -- touch /usr/share/nginx/html/index1.html
9.7 Check the container status
kubectl get pod
[root@master test]# kubectl get pod
NAME READY STATUS RESTARTS AGE
readiness-httpget 1/1 Running 0 15m
10. readinessProbe (readiness probe) example 2
10.1 Write the YAML manifest
[root@master test]# cat readiness-multi-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx1
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 80
readinessProbe:
httpGet:
port: http
path: /index.html
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
name: nginx2
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 80
readinessProbe:
httpGet:
port: http
path: /index.html
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
name: nginx3
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 80
readinessProbe:
httpGet:
port: http
path: /index.html
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: nginx-svc
spec:
#the service binds to the nginx pods via this selector
selector:
app: nginx
type: ClusterIP
ports:
- name: http
port: 80
targetPort: 80
10.2 Create the resources
kubectl apply -f readiness-multi-nginx.yaml
[root@master test]# kubectl apply -f readiness-multi-nginx.yaml
pod/nginx1 created
pod/nginx2 created
pod/nginx3 created
service/nginx-svc created
10.3 Check pod and service status
kubectl get pod,svc -o wide
[root@master test]# kubectl get pod,svc -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx1 1/1 Running 0 22s 10.244.1.69 node01 <none> <none>
pod/nginx2 1/1 Running 0 22s 10.244.2.31 node02 <none> <none>
pod/nginx3 1/1 Running 0 22s 10.244.1.70 node01 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.1.0.1 <none> 443/TCP 3d3h <none>
service/nginx-svc ClusterIP 10.1.177.18 <none> 80/TCP 22s app=nginx
All pods and the service are running.
10.4 Delete index.html from nginx1
kubectl exec -it nginx1 -- rm -rf /usr/share/nginx/html/index.html
[root@master test]# kubectl exec -it nginx1 -- rm -rf /usr/share/nginx/html/index.html
10.5 Check the pod status
kubectl get pod -o wide -w
[root@master test]# kubectl get pod -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx1 1/1 Running 0 3m41s 10.244.1.69 node01 <none> <none>
nginx2 1/1 Running 0 3m41s 10.244.2.31 node02 <none> <none>
nginx3 1/1 Running 0 3m41s 10.244.1.70 node01 <none> <none>
nginx1 0/1 Running 0 3m43s 10.244.1.69 node01 <none> <none>
nginx1's READY state changed to 0/1.
10.6 View the pod events
kubectl describe pod nginx1
[root@master test]# kubectl describe pod nginx1
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m13s default-scheduler Successfully assigned default/nginx1 to node01
Normal Pulled 4m12s kubelet, node01 Container image "nginx" already present on machine
Normal Created 4m12s kubelet, node01 Created container nginx
Normal Started 4m12s kubelet, node01 Started container nginx
Warning Unhealthy 0s (x9 over 40s) kubelet, node01 Readiness probe failed: HTTP probe failed with statuscode: 404
Because the httpGet check received status code 404, the readinessProbe failed and the kubelet marked the pod as not ready.
10.7 View the service details
kubectl describe svc nginx-svc
[root@master test]# kubectl describe svc nginx-svc
Name: nginx-svc
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"nginx-svc","namespace":"default"},"spec":{"ports":[{"name":"http"...
Selector: app=nginx
Type: ClusterIP
IP: 10.1.177.18
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints: 10.244.1.70:80,10.244.2.31:80
Session Affinity: None
Events: <none>
nginx1 has been removed from the service's endpoint list.
10.8 Check the endpoints
kubectl get endpoints
[root@master test]# kubectl get endpoints
NAME ENDPOINTS AGE
kubernetes 192.168.122.10:6443 3d3h
nginx-svc 10.244.1.70:80,10.244.2.31:80 9m34s
nginx1 is absent from the endpoints.
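Conversely, restoring the file should let nginx1 pass the readiness probe again and rejoin the endpoints; a sketch:
kubectl exec nginx1 -- touch /usr/share/nginx/html/index.html
kubectl get endpoints nginx-svc -w    # 10.244.1.69:80 should reappear within a probe period or so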
III. Startup and Exit Actions (postStart/preStop)
1. Write the YAML manifest
[root@master test]# vim post.yaml
apiVersion: v1
kind: Pod
metadata:
name: lifecycle-test
spec:
containers:
- name: lifecycle-test-container
image: nginx
lifecycle:
postStart:
exec:
command: ["/bin/sh","-c","echo Hello from the postStart handler >> /var/log/nginx/message"]
preStop:
exec:
command: ["/bin/sh","-c","echo Hello from the postStop handler >> /var/log/nginx/message"]
volumeMounts:
- name: message-log
mountPath: /var/log/nginx/
readOnly: false
initContainers:
- name: init-nginx
image: nginx
command: ["/bin/sh","-c","echo 'Hello initContainers' >> /var/log/nginx/message"]
volumeMounts:
- name: message-log
mountPath: /var/log/nginx/
readOnly: false
volumes:
- name: message-log
hostPath:
path: /data/volumes/nginx/log/
type: DirectoryOrCreate
2. Create the resource
kubectl apply -f post.yaml
[root@master test]# kubectl apply -f post.yaml
pod/lifecycle-test created
3. Watch the pod status
kubectl get pod -o wide -w
[root@master test]# kubectl get pod -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lifecycle-test 0/1 Init:0/1 0 5s <none> node01 <none> <none>
lifecycle-test 0/1 PodInitializing 0 17s 10.244.1.73 node01 <none> <none>
lifecycle-test 1/1 Running 0 19s 10.244.1.73 node01 <none> <none>
4. View the pod events
kubectl describe po lifecycle-test
[root@master test]# kubectl describe po lifecycle-test
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 46s default-scheduler Successfully assigned default/lifecycle-test to node01
Normal Pulling 45s kubelet, node01 Pulling image "nginx"
Normal Pulled 30s kubelet, node01 Successfully pulled image "nginx"
Normal Created 30s kubelet, node01 Created container init-nginx
Normal Started 30s kubelet, node01 Started container init-nginx
Normal Pulling 29s kubelet, node01 Pulling image "nginx"
Normal Pulled 27s kubelet, node01 Successfully pulled image "nginx"
Normal Created 27s kubelet, node01 Created container lifecycle-test-container
Normal Started 27s kubelet, node01 Started container lifecycle-test-container
5. Check the container log
kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
[root@master test]# kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
Hello initContainers
Hello from the postStart handler
This shows that the init container runs first, and that as soon as the main container has started, Kubernetes immediately fires the postStart event.
6. Delete the pod, then check the mounted file on the node
kubectl delete -f post.yaml
[root@master test]# kubectl delete -f post.yaml
pod "lifecycle-test" deleted
On node01:
[root@node01 ~]# cat /data/volumes/nginx/log/message
Hello initContainers
Hello from the postStart handler
Hello from the postStop handler
This shows that just before a container is terminated, Kubernetes fires the preStop event.
7. Recreate the resource and check the container log again
kubectl apply -f post.yaml
[root@master test]# kubectl apply -f post.yaml
pod/lifecycle-test created
kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
[root@master test]# kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
Hello initContainers
Hello from the postStart handler
Hello from the postStop handler
Hello initContainers
Hello from the postStart handler
IV. Summary
1. Probes
There are three kinds of probes:
1. livenessProbe (liveness probe): determines whether the container is running normally; if it fails, the container (not the pod) is killed, and the restart policy decides whether the container is restarted
2. readinessProbe (readiness probe): determines whether the container can enter the ready state; if it fails, the container goes unready and the pod is removed from the service's endpoints
3. startupProbe: determines whether the application inside the container has started successfully; until it reports success, all other probes stay disabled
2. Check methods
There are three check methods:
1. exec: runs the command given in the command field inside the container; a return code of 0 counts as a successful probe
2. httpget: performs an HTTP GET against the specified port and URL path; an HTTP status code greater than or equal to 200 and less than 400 counts as success
3. tcpsocket: opens a TCP connection to the pod's IP and the specified port; if the port is right and the TCP connection succeeds, the probe counts as success
3. Common optional probe parameters
There are four commonly used optional parameters (a combined sketch follows the list):
1. initialDelaySeconds: how many seconds after the container starts to begin probing
2. periodSeconds: the probing period, i.e. a probe runs every this many seconds
3. failureThreshold: how many retries are allowed after a probe fails before giving up
4. timeoutSeconds: how long to wait for a probe before timing out
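Putting the four parameters together, a hedged reference sketch (values are illustrative):
readinessProbe:
  httpGet:
    path: /index.html
    port: 80
  initialDelaySeconds: 5    # start probing 5s after the container starts
  periodSeconds: 3          # probe every 3s
  failureThreshold: 3       # give up (mark unready) after 3 consecutive failures
  timeoutSeconds: 2         # each probe attempt times out after 2s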