1. Health Checks
A health check is a simple way for the system to know whether an instance of your application is working. If an instance is not working, other services should not access it or send it requests; instead, requests should go to an instance that is ready, or be retried later. The system should also be able to bring your application back to a healthy state.
Strong self-healing is an important feature of container orchestration engines like Kubernetes. The default self-healing behavior is to automatically restart failed containers. Beyond that, you can use the Liveness and Readiness probe mechanisms to set up finer-grained health checks and achieve the following:
- Zero-downtime deployments.
- Preventing the rollout of broken images.
- Safer rolling updates.
2. Probe Types
Liveness probe
A Liveness probe lets Kubernetes know whether your application is alive or dead. If it is alive, Kubernetes leaves it alone. If it is dead, Kubernetes removes the Pod and starts a new one to replace it.
Readiness probe
A Readiness probe lets Kubernetes know when your application is ready to serve traffic. Kubernetes makes sure the Readiness probe passes before allowing a Service to send traffic to the Pod. If the Readiness probe starts failing, Kubernetes stops sending traffic to that Pod until it passes again. In other words, the probe determines whether a container is in the Ready state: once the Pod reaches Ready it can accept requests, and if it becomes unhealthy it is removed from the Service's backend endpoint list.
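To make the difference concrete, here is a minimal sketch of a container that declares both probes (the image name and endpoint paths are hypothetical; full worked examples follow in later sections). A failing livenessProbe gets the container restarted, while a failing readinessProbe only takes the Pod out of the Service's endpoints:

```yaml
# Sketch only: the myapp image and the /healthz and /ready paths are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: myapp:latest          # hypothetical image
    livenessProbe:               # failure => kubelet restarts the container
      httpGet:
        path: /healthz           # hypothetical liveness endpoint
        port: 8080
    readinessProbe:              # failure => Pod removed from Service endpoints
      httpGet:
        path: /ready             # hypothetical readiness endpoint
        port: 8080
```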
3. Probe Mechanisms
HTTP
The HTTP probe is probably the most common type of custom Liveness probe. Even if your application is not an HTTP service, you can run a lightweight HTTP server inside it to answer the probe. Kubernetes sends a GET request to a path, and if it gets back an HTTP response in the 200-399 range it marks the application as healthy; otherwise it marks it as unhealthy.
httpGet configuration fields
- host: host name to connect to; defaults to the Pod IP. You may want to set the "Host" HTTP header instead of using this field.
- scheme: scheme to use for the connection; defaults to HTTP.
- path: path to access on the HTTP server.
- httpHeaders: custom headers to set in the request. HTTP allows repeated headers.
- port: name or number of the container port to access; a number must be between 1 and 65535.
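As a sketch, a livenessProbe that sets all of these fields might look like the following fragment (the host, scheme, path, and port values here are illustrative, not taken from the examples below):

```yaml
livenessProbe:
  httpGet:
    host: example.internal     # illustrative; omit to default to the Pod IP
    scheme: HTTPS              # defaults to HTTP if omitted
    path: /healthz             # illustrative health endpoint
    port: 8443                 # port name or number, 1-65535
    httpHeaders:
    - name: X-Custom-Header    # custom request header
      value: hello
  initialDelaySeconds: 5
  periodSeconds: 3
```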
Exec
For an Exec probe, Kubernetes simply runs a command inside the container. If the command exits with code 0, the container is marked healthy; otherwise it is marked unhealthy. This probe type is useful when you cannot or do not want to run an HTTP service, as long as there is a command you can run that checks whether your application is healthy.
TCP
The last probe type is the TCP probe, where Kubernetes tries to establish a TCP connection on the specified port. If it can establish the connection, the container is considered healthy; otherwise it is considered unhealthy.
TCP probes come in handy when an HTTP or Exec probe would not work well; gRPC and FTP services, for example, are prime candidates for this kind of probe.
4. Liveness exec example
The probe executes a command, and the container's state is determined by the command's exit code. If the exit code is 0 the Pod is considered healthy; any other exit code marks it unhealthy, and kubelet keeps restarting it.
```yaml
# cat liveness_exec.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness-exec
  name: liveness-exec
spec:
  containers:
  - name: liveness-exec
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
```
The periodSeconds field tells kubelet to run the liveness probe every 5 seconds, and the initialDelaySeconds field tells kubelet to wait 5 seconds before the first probe. To perform a probe, kubelet executes the command cat /tmp/healthy in the container. If the command succeeds it returns 0, and kubelet considers the container alive and healthy. If it returns a non-zero value, kubelet kills the container and restarts it.
5. Readiness exec example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
  namespace: default
  labels:
    app: busybox
spec:
  selector:
    matchLabels:
      app: busybox
  replicas: 3
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
```
When the Pod starts, it creates the health-check file, so the check passes at first. After 30 seconds the file is removed and READY drops to 0/1, but the Pod is neither deleted nor restarted; Kubernetes simply stops routing traffic to it, and we can still log in to it:
```shell
[root@k8s-master health]# kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
busybox-deployment-6f86ddd894-l9phc   0/1     Running   0          3m10s
busybox-deployment-6f86ddd894-lh46t   0/1     Running   0          3m
busybox-deployment-6f86ddd894-sz5c2   0/1     Running   0          3m17s
```
If we log in and recreate the health-check file by hand, the readiness check passes again:
```shell
[root@k8s-master health]# kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
busybox-deployment-6f86ddd894-l9phc   1/1     Running   0          7m44s
busybox-deployment-6f86ddd894-lh46t   0/1     Running   0          7m34s
busybox-deployment-6f86ddd894-sz5c2   1/1     Running   0          7m51s
[root@k8s-master health]# kubectl exec -it busybox-deployment-6f86ddd894-lh46t /bin/sh
/ # touch /tmp/healthy
/ #
[root@k8s-master health]# kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
busybox-deployment-6f86ddd894-l9phc   1/1     Running   0          8m21s
busybox-deployment-6f86ddd894-lh46t   1/1     Running   0          8m11s
busybox-deployment-6f86ddd894-sz5c2   1/1     Running   0          8m28s
```
6. Liveness HTTP example
```yaml
# cat liveness_http.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /index.html
            port: 80
            httpHeaders:
            - name: X-Custom-Header
              value: hello
          initialDelaySeconds: 5
          periodSeconds: 3
```
7. Readiness HTTP example
Create a Deployment with 2 replicas:
```yaml
# cat readiness_http.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /index.html
            port: 80
            httpHeaders:
            - name: X-Custom-Header
              value: hello
          initialDelaySeconds: 5
          periodSeconds: 3
```
Create a Service so the Pods can be accessed:
```yaml
# cat readiness_http_svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30001
  selector:        # label selector
    app: nginx
```
The service is reachable:
```shell
[root@k8s-master health]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-7db8445987-9wplj   1/1     Running   0          57s   10.254.1.81   k8s-node-1   <none>           <none>
nginx-deployment-7db8445987-mlc6d   1/1     Running   0          57s   10.254.2.65   k8s-node-2   <none>           <none>
[root@k8s-master health]# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        5d3h
nginx        NodePort    10.108.167.58   <none>        80:30001/TCP   27m
[root@k8s-master health]# curl -I 10.6.76.24:30001/index.html
HTTP/1.1 200 OK
Server: nginx/1.17.3
Date: Tue, 03 Sep 2019 04:40:05 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 13 Aug 2019 08:50:00 GMT
Connection: keep-alive
ETag: "5d5279b8-264"
Accept-Ranges: bytes
[root@k8s-master health]# curl -I 10.6.76.23:30001/index.html
HTTP/1.1 200 OK
Server: nginx/1.17.3
Date: Tue, 03 Sep 2019 04:40:11 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 13 Aug 2019 08:50:00 GMT
Connection: keep-alive
ETag: "5d5279b8-264"
Accept-Ranges: bytes
```
Modify the Nginx Pod:
```shell
[root@k8s-master health]# kubectl exec -it nginx-deployment-7db8445987-9wplj /bin/bash
root@nginx-deployment-7db8445987-9wplj:/# cd /usr/share/nginx/html/
root@nginx-deployment-7db8445987-9wplj:/usr/share/nginx/html# ls
50x.html  index.html
root@nginx-deployment-7db8445987-9wplj:/usr/share/nginx/html# rm -f index.html
root@nginx-deployment-7db8445987-9wplj:/usr/share/nginx/html# nginx -s reload
2019/09/03 03:58:52 [notice] 14#14: signal process started
root@nginx-deployment-7db8445987-9wplj:/usr/share/nginx/html# exit
[root@k8s-master health]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE    IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-7db8445987-9wplj   0/1     Running   0          110s   10.254.1.81   k8s-node-1   <none>           <none>
nginx-deployment-7db8445987-mlc6d   1/1     Running   0          110s   10.254.2.65   k8s-node-2   <none>           <none>
[root@k8s-master health]# curl -I 10.254.1.81/index.html
HTTP/1.1 404 Not Found
Server: nginx/1.17.3
Date: Tue, 03 Sep 2019 03:59:16 GMT
Content-Type: text/html
Content-Length: 153
Connection: keep-alive
[root@k8s-master health]# kubectl describe pod nginx-deployment-7db8445987-9wplj
Events:
  Type     Reason     Age                    From                 Message
  ----     ------     ----                   ----                 -------
  Normal   Scheduled  43m                    default-scheduler    Successfully assigned default/nginx-deployment-7db8445987-9wplj to k8s-node-1
  Normal   Pulled     43m                    kubelet, k8s-node-1  Container image "nginx" already present on machine
  Normal   Created    43m                    kubelet, k8s-node-1  Created container nginx
  Normal   Started    43m                    kubelet, k8s-node-1  Started container nginx
  Warning  Unhealthy  3m47s (x771 over 42m)  kubelet, k8s-node-1  Readiness probe failed: HTTP probe failed with statuscode: 404
```
Traffic is no longer routed to it.
We deleted index.html, the file the health check requests, from one Nginx Pod and reloaded Nginx. Based on the Readiness probe, Kubernetes set READY to 0/1 and removed the Pod from the Service so it no longer receives traffic. Check the logs of the two Pods:
```shell
[root@k8s-master health]# kubectl logs nginx-deployment-7db8445987-mlc6d | tail -10
10.254.1.0 - - [03/Sep/2019:04:43:44 +0000] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.29.0" "-"
10.254.1.0 - - [03/Sep/2019:04:43:45 +0000] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.29.0" "-"
10.254.1.0 - - [03/Sep/2019:04:43:45 +0000] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.29.0" "-"
10.254.2.1 - - [03/Sep/2019:04:43:46 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
10.254.2.1 - - [03/Sep/2019:04:43:49 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
10.254.2.1 - - [03/Sep/2019:04:43:52 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
10.254.2.1 - - [03/Sep/2019:04:43:55 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
10.254.2.1 - - [03/Sep/2019:04:43:58 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
10.254.2.1 - - [03/Sep/2019:04:44:01 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
10.254.2.1 - - [03/Sep/2019:04:44:04 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
[root@k8s-master health]# kubectl logs nginx-deployment-7db8445987-9wplj | tail -10
2019/09/03 04:44:11 [error] 15#15: *939 open() "/usr/share/nginx/html/index.html" failed (2: No such file or directory), client: 10.254.1.1, server: localhost, request: "GET /index.html HTTP/1.1", host: "10.254.1.81:80"
10.254.1.1 - - [03/Sep/2019:04:44:11 +0000] "GET /index.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.254.1.1 - - [03/Sep/2019:04:44:14 +0000] "GET /index.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
2019/09/03 04:44:14 [error] 15#15: *940 open() "/usr/share/nginx/html/index.html" failed (2: No such file or directory), client: 10.254.1.1, server: localhost, request: "GET /index.html HTTP/1.1", host: "10.254.1.81:80"
2019/09/03 04:44:17 [error] 15#15: *941 open() "/usr/share/nginx/html/index.html" failed (2: No such file or directory), client: 10.254.1.1, server: localhost, request: "GET /index.html HTTP/1.1", host: "10.254.1.81:80"
10.254.1.1 - - [03/Sep/2019:04:44:17 +0000] "GET /index.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.254.1.1 - - [03/Sep/2019:04:44:20 +0000] "GET /index.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
2019/09/03 04:44:20 [error] 15#15: *942 open() "/usr/share/nginx/html/index.html" failed (2: No such file or directory), client: 10.254.1.1, server: localhost, request: "GET /index.html HTTP/1.1", host: "10.254.1.81:80"
2019/09/03 04:44:23 [error] 15#15: *943 open() "/usr/share/nginx/html/index.html" failed (2: No such file or directory), client: 10.254.1.1, server: localhost, request: "GET /index.html HTTP/1.1", host: "10.254.1.81:80"
10.254.1.1 - - [03/Sep/2019:04:44:23 +0000] "GET /index.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
```
8. TCP liveness and readiness probes
TCP checks are configured much like HTTP checks. They are mainly for Pods that have no HTTP endpoint, such as MySQL, Redis, and so on.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 3
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 3
```
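The same pattern applies to a service like Redis. The following is an illustrative sketch, not part of the examples above (6379 is Redis's standard port, but the manifest itself is hypothetical):

```yaml
# Illustrative sketch: TCP probes against Redis's default port.
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
  - name: redis
    image: redis:latest
    ports:
    - containerPort: 6379
    livenessProbe:
      tcpSocket:
        port: 6379       # restart the container if the port stops accepting connections
      initialDelaySeconds: 15
      periodSeconds: 10
    readinessProbe:
      tcpSocket:
        port: 6379       # keep the Pod out of Service endpoints until the port accepts connections
      initialDelaySeconds: 5
      periodSeconds: 5
```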
9. Probe configuration in detail
- initialDelaySeconds: how many seconds to wait after the container starts before running the first probe.
- periodSeconds: how often to run the probe. Defaults to 10 seconds; minimum 1 second.
- timeoutSeconds: probe timeout. Defaults to 1 second; minimum 1 second.
- successThreshold: after a failure, the minimum number of consecutive successes for the probe to be considered successful again. Defaults to 1; must be 1 for liveness; minimum value 1.
- failureThreshold: after a success, the minimum number of consecutive failures for the probe to be considered failed. Defaults to 3; minimum value 1.
For HTTP probes, additional fields can be set on httpGet; see the httpGet configuration fields in section 3.
One very important setting when using a Liveness probe is initialDelaySeconds.
A failed Liveness probe causes the Pod to restart, so you need to make sure the probe does not start before the application is ready. Otherwise the application will keep restarting and never become ready!
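Putting these fields together, a fully parameterized liveness probe might look like the following sketch (the endpoint matches the nginx examples above, but the timing values are illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /index.html
    port: 80
  initialDelaySeconds: 15   # give the app time to start before the first probe
  periodSeconds: 10         # probe every 10 seconds (the default)
  timeoutSeconds: 2         # fail the probe if there is no response within 2 seconds
  successThreshold: 1       # must be 1 for liveness probes
  failureThreshold: 3       # restart only after 3 consecutive failures
```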
10. Readiness checks during scale-up
For a multi-replica application, when you scale up, the new replicas are added as backends to the Service's load balancer and handle client requests together with the existing replicas. Applications usually need a warm-up phase at startup, such as loading cached data or connecting to a database, so there is a gap between the container starting and it actually being able to serve. We can use a Readiness probe to determine whether a container is ready and avoid sending requests to backends that are not.
Take the readiness-http example above:
```yaml
# cat readiness_http.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /index.html
            port: 80
            httpHeaders:
            - name: X-Custom-Header
              value: hello
          initialDelaySeconds: 5
          periodSeconds: 3
```

```yaml
# cat readiness_http_svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30001
  selector:        # label selector
    app: nginx
```
- Probing starts 5 seconds after the container starts.
- If http://[container_ip]:80/index.html returns a code outside the 200-399 range, the container is not ready and does not receive requests from the nginx Service.
- The probe runs again every 3 seconds.
- Once the return code is within 200-399, the container is ready and is added to the nginx Service's load balancer, where it starts handling client requests.
- Probing continues at the 3-second interval; if 3 consecutive failures occur, the container is removed from the load balancer again until a probe succeeds and it rejoins.
- For important applications in production, configuring a health check is recommended so that every container handling client requests is a Service backend that is actually ready.
Now scale up manually. Until a new Pod passes its readiness check, it does not join the cluster:
```shell
[root@k8s-master health]# kubectl scale deployment nginx-deployment --replicas=5
deployment.extensions/nginx-deployment scaled
[root@k8s-master health]# kubectl get pods
NAME                                READY   STATUS              RESTARTS   AGE
nginx-deployment-7db8445987-k9df8   0/1     ContainerCreating   0          3s
nginx-deployment-7db8445987-mlc6d   1/1     Running             0          3h14m
nginx-deployment-7db8445987-q5d9k   0/1     ContainerCreating   0          3s
nginx-deployment-7db8445987-w2w2t   0/1     ContainerCreating   0          3s
nginx-deployment-7db8445987-zwj8t   1/1     Running             0          8m4s
```
11. Health checks during rolling updates
Suppose a multi-replica application is running normally and we update it (for example, to a newer image). Kubernetes starts new replicas, and then the following happens:
- Normally, a new replica needs 10 seconds to finish its preparation work and cannot respond to business requests before that.
- But due to human configuration error, the replicas never finish preparing (for example, they cannot connect to the backend database).
Because the new replicas do not exit abnormally themselves, the default health-check behavior considers the containers ready, so Kubernetes gradually replaces the existing replicas with new ones. The result: once all old replicas have been replaced, the whole application can no longer handle requests or serve users. If this happened on an important production system, the consequences would be severe.
With health checks configured correctly, a new replica is added to the Service only after it passes the Readiness probe; if it never passes, the existing replicas are not all replaced and the business keeps running normally.
app_v1.yaml simulates an application with 5 replicas:
```yaml
# cat app_v1.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 5
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 10; touch /tmp/healthy; sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5
```
After 10 seconds the replicas pass the Readiness probe:
```shell
[root@k8s-master health]# kubectl apply -f app_v1.yaml
deployment.extensions/app unchanged
[root@k8s-master health]# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
app-6dd7f876c4-5hvdl   1/1     Running   0          6m17s
app-6dd7f876c4-9vcp7   1/1     Running   0          6m17s
app-6dd7f876c4-k59mm   1/1     Running   0          6m17s
app-6dd7f876c4-trw8f   1/1     Running   0          6m17s
app-6dd7f876c4-wrhz8   1/1     Running   0          6m17s
```
Next, roll out the update:
```yaml
# cat app_v2.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 5
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5
```
```shell
[root@k8s-master health]# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
app-6dd7f876c4-5hvdl   1/1     Running   0          14m
app-6dd7f876c4-9vcp7   1/1     Running   0          14m
app-6dd7f876c4-k59mm   1/1     Running   0          14m
app-6dd7f876c4-trw8f   1/1     Running   0          14m
app-6dd7f876c4-wrhz8   1/1     Running   0          14m
[root@k8s-master health]# kubectl apply -f app_v2.yaml
deployment.extensions/app configured
[root@k8s-master health]# kubectl get pods
NAME                   READY   STATUS              RESTARTS   AGE
app-6dd7f876c4-5hvdl   1/1     Running             0          14m
app-6dd7f876c4-9vcp7   1/1     Running             0          14m
app-6dd7f876c4-k59mm   1/1     Terminating         0          14m
app-6dd7f876c4-trw8f   1/1     Running             0          14m
app-6dd7f876c4-wrhz8   1/1     Running             0          14m
app-7fbf9d8fb7-g99hn   0/1     ContainerCreating   0          2s
app-7fbf9d8fb7-ltlv5   0/1     ContainerCreating   0          3s
[root@k8s-master health]# kubectl get pods
NAME                   READY   STATUS              RESTARTS   AGE
app-6dd7f876c4-5hvdl   1/1     Running             0          14m
app-6dd7f876c4-9vcp7   1/1     Running             0          14m
app-6dd7f876c4-k59mm   1/1     Terminating         0          14m
app-6dd7f876c4-trw8f   1/1     Running             0          14m
app-6dd7f876c4-wrhz8   1/1     Running             0          14m
app-7fbf9d8fb7-g99hn   0/1     ContainerCreating   0          9s
app-7fbf9d8fb7-ltlv5   0/1     Running             0          10s
[root@k8s-master health]# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
app-6dd7f876c4-5hvdl   1/1     Running   0          15m
app-6dd7f876c4-9vcp7   1/1     Running   0          15m
app-6dd7f876c4-trw8f   1/1     Running   0          15m
app-6dd7f876c4-wrhz8   1/1     Running   0          15m
app-7fbf9d8fb7-g99hn   0/1     Running   0          68s
app-7fbf9d8fb7-ltlv5   0/1     Running   0          69s
```
- Judging from the AGE column, the last 2 Pods are the new replicas; they are currently NOT READY.
- The old replicas have gone from the initial 5 down to 4.
```shell
[root@k8s-master health]# kubectl get deployment app
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
app    4/5     2            4           16m
```
- READY 4/5 means the desired state is 5 READY replicas, of which 4 currently are.
- UP-TO-DATE 2 is the number of replicas that have been updated: the 2 new ones.
- AVAILABLE 4 is the number of replicas currently in the READY state: the 4 old ones.
In our setup the new replicas can never pass the Readiness probe, so this state persists indefinitely.
We have just simulated a failed rolling update. Fortunately, the health check screened out the defective replicas for us while keeping most of the old ones, so the business was unaffected by the failed update.
Rolling updates control how many replicas are replaced at a time through the maxSurge and maxUnavailable parameters.
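These parameters live under the Deployment's spec.strategy. A minimal sketch with illustrative values:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most 1 extra replica above the desired count during the update
      maxUnavailable: 1     # at most 1 replica may be unavailable during the update
```

Both fields accept absolute numbers or percentages of the desired replica count; smaller values make the rollout more conservative.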