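I. Liveness probing (configuring an exec probe)
The exec probe has a single available field, "command", which specifies the command to run. The manifest liveness-exec.yaml below defines one.
1. Manifest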
    [root@master chapter4]# cat liveness-exec.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness-exec
      name: liveness-exec
    spec:
      containers:
      - name: liveness-demo
        image: busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - test
            - -e
            - /tmp/healthy
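The manifest above defines a pod that starts a container from the busybox image and runs the command "touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600".
This command creates the file /tmp/healthy when the container starts and deletes it 60 seconds later. The liveness probe runs "test -e /tmp/healthy" to check whether the file exists. If it does, the command returns exit status 0 and the probe passes; once the file is gone, the command returns a non-zero status and the probe fails.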
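The probe's verdict rests entirely on the exit status of "test -e", and that can be reproduced without a cluster. The following is only a local sketch: the temporary file stands in for /tmp/healthy, the sleeps are omitted, and the variable names are illustrative rather than part of the manifest.

```shell
#!/bin/sh
# Local stand-in for the container's /tmp/healthy (hypothetical temp path).
healthy_file="$(mktemp)"

# While the file exists, `test -e` exits with status 0: the probe would pass.
status_while_present=0
test -e "$healthy_file" || status_while_present=$?

# After the file is removed, `test -e` exits non-zero: the probe would fail.
rm -f "$healthy_file"
status_after_removal=0
test -e "$healthy_file" || status_after_removal=$?

echo "file present: exit status $status_while_present"
echo "file removed: exit status $status_after_removal"
```

Any command can be used the same way in an exec probe, as long as its exit status is 0 only when the container is healthy.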
2. Running it
First, run the following command to create the pod liveness-exec:
    [root@master chapter4]# kubectl apply -f liveness-exec.yaml
    pod/liveness-exec created
    [root@master chapter4]# kubectl get pods liveness-exec
    NAME            READY   STATUS    RESTARTS   AGE
    liveness-exec   1/1     Running   0          42s
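3. Verifying the result
Within the first 60 seconds, "kubectl describe pod liveness-exec" shows no liveness probe errors. After 60 seconds, running the same command again shows that the liveness probe has started to fail, and after a longer wait the output also contains events about the container being restarted: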
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name:         liveness-exec
      PodScheduled      True
    ......
    Events:
      Type    Reason   Age                    From            Message
      ----    ------   ----                   ----            -------
      Normal  Killing  3m12s (x3 over 7m32s)  kubelet, node2  Container liveness-demo failed liveness probe, will be restarted
      Normal  Pulling  2m41s (x4 over 9m16s)  kubelet, node2  Pulling image "busybox"
      Normal  Pulled   2m26s (x4 over 8m58s)  kubelet, node2  Successfully pulled image "busybox"
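In addition, the State and Last State fields in the output show the container's health status and how it has changed. The container is currently reported as "Running", but its previous state was "Terminated" with exit code 137. That exit code means the process was terminated by an external signal. It is the sum of two parts, 128 + signum, where signum is the number of the signal that ended the process. Here signum is 9, which is SIGKILL, so the process was forcibly killed.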
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name:         liveness-exec
    ......
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
        State:          Running
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Error
          Exit Code:    137
          Started:      Tue, 09 Jun 2020 11:35:32 +0800
          Finished:     Tue, 09 Jun 2020 11:37:26 +0800
        Ready:          False
        Restart Count:  26
        Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3
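The 128 + signum convention is a property of the shell and can be observed locally. The sketch below is only an illustration on an ordinary Linux shell, not something taken from the cluster: it kills a background sleep with SIGKILL and reads the status that the shell reports for it.

```shell
#!/bin/sh
# Start a long-running process in the background, then kill it with SIGKILL (signal 9).
sleep 300 &
pid=$!
kill -9 "$pid"

# `wait` returns the child's status; for a death by signal the shell reports 128 + signum.
exit_code=0
wait "$pid" || exit_code=$?

echo "exit code: $exit_code"
echo "signal number: $((exit_code - 128))"
```

Subtracting 128 from the reported code recovers the signal number, which is how 137 maps back to signal 9.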
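Once the restart completes, the container is back in a normal running state, and it stays that way until the file is deleted again, the liveness probe fails, and the container is restarted once more. In the output below, the RESTARTS column shows that the container has already been restarted 4 times in a little over 9 minutes: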
    [root@master chapter4]# kubectl get pods liveness-exec
    NAME            READY   STATUS    RESTARTS   AGE
    liveness-exec   1/1     Running   4          9m14s
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name:         liveness-exec
    ......
    Events:
      Type     Reason     Age                    From               Message
      ----     ------     ----                   ----               -------
      Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/liveness-exec to node2
      Normal   Created    4m36s (x3 over 8m58s)  kubelet, node2     Created container liveness-demo
      Normal   Started    4m35s (x3 over 8m57s)  kubelet, node2     Started container liveness-demo
      Warning  Unhealthy  3m12s (x9 over 7m52s)  kubelet, node2     Liveness probe failed:
      Normal   Killing    3m12s (x3 over 7m32s)  kubelet, node2     Container liveness-demo failed liveness probe, will be restarted
      Normal   Pulling    2m41s (x4 over 9m16s)  kubelet, node2     Pulling image "busybox"
      Normal   Pulled     2m26s (x4 over 8m58s)  kubelet, node2     Successfully pulled image "busybox"
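The gap between the file being deleted and the Killing event follows from the probe settings reported earlier (period=10s, #failure=3): the kubelet needs three consecutive failures, one per period, before it decides to restart. The sketch below is only a rough estimate under simplifying assumptions: it ignores the probe timeout, image pulls, and the time the container takes to terminate, and the function name is illustrative.

```shell
#!/bin/sh
# Rough upper bound, in seconds, on the time between the probed condition
# breaking and the restart decision: failureThreshold failures, one per period.
detection_window() {
  period_seconds=$1
  failure_threshold=$2
  echo $((period_seconds * failure_threshold))
}

# Values from the describe output: period=10s, #failure=3.
window=$(detection_window 10 3)

# The container deletes /tmp/healthy 60 seconds after it starts.
echo "restart decided at most about $((60 + window))s after container start"
```

Lowering periodSeconds or failureThreshold shortens this window, at the cost of more probe executions or less tolerance for a transient failure.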
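Note that the command given to exec runs inside the container and therefore consumes the container's resource quota. For that reason, and for the efficiency of the probing itself, the probe command should be simple and lightweight.
II. Liveness probing (configuring an HTTP probe)
1. Official reference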
    [root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe.httpGet
    KIND:     Pod
    VERSION:  v1

    RESOURCE: httpGet <Object>

    DESCRIPTION:
         HTTPGet specifies the http request to perform.

         HTTPGetAction describes an action based on HTTP Get requests.

    FIELDS:
       host <string>    # Host to send the request to; defaults to the pod IP. It can also be set with a "Host:" header in httpHeaders
         Host name to connect to, defaults to the pod IP. You probably want to set
         "Host" in httpHeaders instead.

       httpHeaders <[]Object>    # Custom request headers
         Custom headers to set in the request. HTTP allows repeated headers.

       path <string>    # Path of the HTTP resource to request, i.e. the URL path
         Path to access on the HTTP server.

       port <string> -required-    # Port to request; required field
         Name or number of the port to access on the container. Number must be in
         the range 1 to 65535. Name must be an IANA_SVC_NAME.

       scheme <string>    # Protocol used for the connection; HTTP or HTTPS, defaults to HTTP
         Scheme to use for connecting to the host. Defaults to HTTP.
2. Manifest
The manifest below uses a postStart hook to create healthz, a page dedicated to the httpGet probe:
    [root@master chapter4]# cat liveness-http.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-http
    spec:
      containers:
      - name: liveness-demo
        image: nginx:1.12-alpine
        ports:
        - name: http
          containerPort: 80
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - 'echo Healthy > /usr/share/nginx/html/healthz'
        livenessProbe:
          httpGet:
            path: /healthz
            port: http
3. Creating and running it
First, create the pod:
    [root@master chapter4]# kubectl apply -f liveness-http.yaml
    pod/liveness-http created
4. Verifying the result
Next, check the pod's health-check information. While the probe succeeds, the container keeps running normally:
    [root@master chapter4]# kubectl describe pod liveness-http
    Name:         liveness-http
    ......
    Events:
      Type    Reason     Age        From               Message
      ----    ------     ----       ----               -------
      Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned default/liveness-http to node2
      Normal  Pulling    55s        kubelet, node2     Pulling image "nginx:1.12-alpine"
      Normal  Pulled     21s        kubelet, node2     Successfully pulled image "nginx:1.12-alpine"
      Normal  Created    21s        kubelet, node2     Created container liveness-demo
      Normal  Started    21s        kubelet, node2     Started container liveness-demo
Next, use "kubectl exec" to delete the test page healthz that the postStart hook created:
    [root@master chapter4]# kubectl exec liveness-http rm /usr/share/nginx/html/healthz
    kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
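The deprecation warning in that output names the supported form, in which "--" separates kubectl's own arguments from the command to run in the container. A minimal sketch of the same deletion in that form follows. It assumes kubectl is configured for the cluster, the function name remove_healthz is illustrative, and the block only defines the function.

```shell
#!/bin/sh
# Delete the probe target page inside a pod using the non-deprecated exec syntax.
# $1 is the pod name; remove_healthz is a hypothetical helper name.
remove_healthz() {
  # Everything after `--` is the command run in the container, not kubectl flags.
  kubectl exec "$1" -- rm /usr/share/nginx/html/healthz
}
```

Calling "remove_healthz liveness-http" would then be the equivalent of the deprecated command shown in the transcript.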
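Run "kubectl get pods liveness-http" and "kubectl describe pod liveness-http" again to check the pod's status. The events show that the probe failed and that the container was killed and then re-created: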
    [root@master chapter4]# kubectl get pods liveness-http
    NAME            READY   STATUS    RESTARTS   AGE
    liveness-http   1/1     Running   2          5m11s
    [root@master chapter4]# kubectl describe pod liveness-http
    Name:         liveness-http
    ......
    Events:
      Type     Reason     Age                    From               Message
      ----     ------     ----                   ----               -------
      Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/liveness-http to node2
      Normal   Pulling    5m58s                  kubelet, node2     Pulling image "nginx:1.12-alpine"
      Normal   Pulled     5m24s                  kubelet, node2     Successfully pulled image "nginx:1.12-alpine"
      Warning  Unhealthy  2m12s (x6 over 3m2s)   kubelet, node2     Liveness probe failed: HTTP probe failed with statuscode: 404
      Normal   Killing    2m12s (x2 over 2m42s)  kubelet, node2     Container liveness-demo failed liveness probe, will be restarted
      Normal   Created    2m11s (x3 over 5m24s)  kubelet, node2     Created container liveness-demo
      Normal   Started    2m11s (x3 over 5m24s)  kubelet, node2     Started container liveness-demo
      Normal   Pulled     2m11s (x2 over 2m41s)  kubelet, node2     Container image "nginx:1.12-alpine" already present on machine
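The 404 in these events counts as a failure because the kubelet treats an HTTP probe as successful only when the status code is at least 200 and below 400. The sketch below restates that rule as a small shell function. The function name is illustrative, and it classifies a status code without sending a request.

```shell
#!/bin/sh
# Classify an HTTP status code the way an httpGet probe is evaluated:
# any code >= 200 and < 400 is a success, anything else is a failure.
http_probe_result() {
  code=$1
  if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo success
  else
    echo failure
  fi
}

http_probe_result 200   # healthz is present
http_probe_result 404   # healthz has been deleted, as in the events
```

Redirects (3xx) therefore still pass, while any 4xx or 5xx response counts toward failureThreshold.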
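In general, an HTTP probe should target a dedicated URL path, for example /healthz.
The web resource behind that path should, in a lightweight way, check each key component of the application internally, so that a passing probe means the application can deliver a complete service to clients.
Note that this kind of check is only effective for the tier being probed in a layered architecture. Restarting the container cannot fix a failure caused by a backend service such as a database or a cache. In that case the container may be restarted again and again until the backend service recovers. The other two probe types have the same limitation.
III. Liveness probing (configuring a TCP probe)
1. Official reference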
    [root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe.tcpSocket
    KIND:     Pod
    VERSION:  v1

    RESOURCE: tcpSocket <Object>

    DESCRIPTION:
         TCPSocket specifies an action involving a TCP port. TCP hooks not yet
         supported

         TCPSocketAction describes an action based on opening a socket

    FIELDS:
       host <string>    # Target IP address to connect to; defaults to the pod IP
         Optional: Host name to connect to, defaults to the pod IP.

       port <string> -required-    # Target port to connect to; required field
         Number or name of the port to access on the container. Number must be in
         the range 1 to 65535. Name must be an IANA_SVC_NAME.
2. Example manifest
    cat nginx_pod_tcpSocket.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: tcpsocket    # pod names must be lowercase DNS subdomain names
    spec:
      containers:
      - name: nginx
        image: 10.0.0.11:5000/nginx:1.13
        ports:
        - containerPort: 80
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 3
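IV. Liveness probe behavior settings
1. Viewing the details of a pod that has a liveness probe
When "kubectl describe" is run against a pod configured with a liveness probe, the container section of the output includes a line like the following: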
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name:         liveness-exec
    ......
        Ready:          False
        Restart Count:  10
        Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3
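This line reports the probe type together with its additional settings (delay, timeout, period, #success and #failure) and their values.
When these fields are not set explicitly, they take their default values, as in the output above. They are set through the following fields of "pod.spec.containers.livenessProbe":
2. Official reference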
    [root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe
    KIND:     Pod
    VERSION:  v1

    RESOURCE: livenessProbe <Object>

    DESCRIPTION:
         Periodic probe of container liveness. Container will be restarted if the
         probe fails. Cannot be updated. More info:
         https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

         Probe describes a health check to be performed against a container to
         determine whether it is alive or ready to receive traffic.

    FIELDS:
       exec <Object>
         One and only one of the following should be specified. Exec specifies the
         action to take.

       failureThreshold <integer>    # From a passing state, how many consecutive failures are needed before the probe is considered failed; shown as #failure; default 3, minimum 1
         Minimum consecutive failures for the probe to be considered failed after
         having succeeded. Defaults to 3. Minimum value is 1.

       httpGet <Object>
         HTTPGet specifies the http request to perform.

       initialDelaySeconds <integer>    # Delay before the first probe, i.e. how long after the container starts probing begins; shown as delay; default 0 seconds, meaning probing starts as soon as the container starts
         Number of seconds after the container has started before liveness probes
         are initiated. More info:
         https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

       periodSeconds <integer>    # Probe interval; shown as period; default 10s, minimum 1s. Probing too often adds noticeable overhead to the pod, while probing too rarely delays the reaction to failures
         How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
         value is 1.

       successThreshold <integer>    # From a failing state, how many consecutive successes are needed before the probe is considered passed; shown as #success; default 1, minimum 1
         Minimum consecutive successes for the probe to be considered successful
         after having failed. Defaults to 1. Must be 1 for liveness and startup.
         Minimum value is 1.

       tcpSocket <Object>
         TCPSocket specifies an action involving a TCP port. TCP hooks not yet
         supported

       timeoutSeconds <integer>    # Probe timeout; shown as timeout; default 1s, minimum 1s
         Number of seconds after which the probe times out. Defaults to 1 second.
         Minimum value is 1. More info:
         https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

After modifying the manifest with custom values, re-creating the pod and testing again, the detailed output shows that the custom settings have taken effect:

    [root@master chapter4]# kubectl describe pod liveness-exec
    Name:         liveness-exec
    ......
        Ready:          False
        Restart Count:  10
        Liveness:       exec [test -e /tmp/healthy] delay=5s timeout=2s period=5s #success=1 #failure=3
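The modified manifest itself is not reproduced in the text. One that would yield the delay=5s timeout=2s period=5s line might look like the sketch below, which writes the earlier liveness-exec manifest with three probe fields added. The output file name and location are assumptions made for illustration, and the thresholds are left at their defaults.

```shell
#!/bin/sh
# Illustrative file name and location; the original text does not name the modified manifest.
manifest="${TMPDIR:-/tmp}/liveness-exec-tuned.yaml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness-exec
  name: liveness-exec
spec:
  containers:
  - name: liveness-demo
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - test
        - -e
        - /tmp/healthy
      initialDelaySeconds: 5   # shown as delay=5s
      timeoutSeconds: 2        # shown as timeout=2s
      periodSeconds: 5         # shown as period=5s
EOF

echo "wrote $manifest"
# The probe of an existing pod cannot be updated, so the pod has to be re-created, e.g.:
#   kubectl delete pod liveness-exec && kubectl apply -f "$manifest"
```

The three timing fields sit at the same level as exec, directly under livenessProbe, not inside the exec block.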
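V. Readiness probing
1. Purpose of readiness probes
A readiness probe is a periodic check that determines whether a container is ready. It detects whether the container has finished initializing and can serve client requests. When the probe returns "success", it signals that the container is "ready".
When the probe fails, a readiness probe does not kill or restart the container to restore its health. Instead, it marks the container as not ready and triggers the operations that depend on its readiness state (for example, removing the pod from a Service object), so that client requests are not routed to this pod.
2. Why it matters
When Pod B, which Pod A depends on, becomes unavailable because of a network failure or a similar problem, the service in Pod A should switch to the not-ready state, because it can no longer give clients a complete response.
Replacing the field name livenessProbe with readinessProbe in a container definition yields a readiness probe configuration. A simple example is the manifest readiness-exec.yaml below. Starting 5 seconds after the container starts, it runs "test -e /tmp/ready" every 5 seconds to probe the container's readiness, and the container is ready when the command succeeds:
3. Manifest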
    [root@master chapter4]# cat readiness-exec.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: readiness-exec
      name: readiness-exec
    spec:
      containers:
      - name: readiness-demo
        image: busybox
        args: ["/bin/sh", "-c", "while true; do rm -f /tmp/ready; sleep 30; touch /tmp/ready; sleep 300; done"]
        readinessProbe:
          exec:
            command: ["test", "-e", "/tmp/ready"]
          initialDelaySeconds: 5
          periodSeconds: 5
4. Creating and running it
First, use "kubectl create" to create the resource defined in the manifest in the cluster:
    [root@master chapter4]# kubectl create -f readiness-exec.yaml
    pod/readiness-exec created
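5. Verifying the result
Next, run "kubectl get -w" to watch the pod for changes. As the output shows, although the pod is in the Running state, it only becomes ready once the readiness probe command succeeds: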
    [root@master chapter4]# kubectl get pods -l test=readiness-exec -w
    NAME             READY   STATUS    RESTARTS   AGE
    readiness-exec   0/1     Running   0          22s
    readiness-exec   1/1     Running   0          50s
The pod's detailed information also contains lines like the following, which show that it is now ready:
    [root@master chapter4]# kubectl describe pod readiness-exec
    Name:         readiness-exec
    .......
        Ready:          True
        Restart Count:  0
        Readiness:      exec [test -e /tmp/ready] delay=5s timeout=1s period=5s #success=1 #failure=3
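The same readiness state can also be read directly, without scanning the describe output. The following is a minimal sketch that assumes kubectl is configured for the cluster. The function name pod_ready is illustrative, and the block only defines the function without calling it.

```shell
#!/bin/sh
# Print the status of a pod's Ready condition ("True" or "False").
# $1 is the pod name; pod_ready is a hypothetical helper name.
pod_ready() {
  kubectl get pod "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

With the manifest above, "pod_ready readiness-exec" would be expected to print False until /tmp/ready has been created and a probe has succeeded, and True afterwards.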
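Note:
A pod with no readiness probe defined is considered ready as soon as it enters the "Running" state. When a container needs time to initialize, it cannot respond to client requests properly before the application is actually ready. In production, therefore, readiness probes must be defined for the containers in critical pods. See section 4.6 for how to define them.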