Adding Health Checks to Docker Containers and K8s

Reposted article. 2020-10-28 22:10. Tags: Docker, k8s

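Once a docker container has started, how do we confirm that it is running properly and ready to serve? That is what the health check feature is for.

I used not to care about health checks, on the assumption that an image that starts is healthy and one with problems fails to run. After receiving two startup-failure issues in a row, I decided to fix that.

The problem: a web service depends on a mongo container, and both are started through docker-compose. Even with depends_on set, the web service sometimes starts and connects before the db instance in the mongo container has finished initializing, and the connection fails.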

version: '3.1'

services:
  mongo:
    image: mongo:4
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
      MONGO_INITDB_DATABASE: yapi
    volumes: 
        - ./mongo-conf:/docker-entrypoint-initdb.d
        - ./mongo/etc:/etc/mongo
        - ./mongo/data/db:/data/db
  yapi:
    build:
      context: ./
      dockerfile: Dockerfile
    image: yapi
    # Use this on the first start
    # command: "yapi server"
    # Use the command below afterwards
    command: "node /my-yapi/vendors/server/app.js"
    depends_on: 
      - mongo

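In theory, the yapi service only starts after the mongo service has started and its status is up. But people really did hit this problem, so let's look at the solutions.

The official documentation says depends_on does not wait for the db to be ready. Hmm, it also does not spell out what the criterion for depends_on is: is it the dependency's status being up?

According to the docs, depends_on only waits until the dependency is running. If a service that is still starting up counts as running, then the db may indeed not be ready. The official position is that dependencies between services and databases are a distributed-systems concern: the service should handle network problems itself, since the db can drop at any moment, and it should configure its own reconnection policy.

The official recommendation is to check that the db is up before starting the service, waiting in a ping-like loop. A wait-for-it.sh script placed in front of the service command does this check.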

docker-compose.yml

version: "2"
services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      - "db"
    command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
  db:
    image: postgres

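wait-for-postgres.sh (a Postgres-specific variant of the wait script; it takes the host as its first argument, e.g. ./wait-for-postgres.sh db python app.py)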

#!/bin/sh
# wait-for-postgres.sh

set -e
  
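host="$1"
shift

# Retry until psql can connect and run a no-op command
until PGPASSWORD=$POSTGRES_PASSWORD psql -h "$host" -U "postgres" -c '\q'; do
  >&2 echo "Postgres is unavailable - sleeping"
  sleep 1
done

>&2 echo "Postgres is up - executing command"
# Run the remaining arguments as the command, preserving their quoting
exec "$@"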
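The loop above retries forever, so a dependency that never comes up leaves the service hanging. A bounded variant can be sketched as follows. The helper name retry_until and its argument order are assumptions made for this illustration; they are not part of wait-for-it.sh or the Docker documentation.

```shell
#!/bin/sh
# retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_TRIES attempts
# have been made. Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    if "$@"; then
      return 0
    fi
    >&2 echo "attempt $attempt/$max_tries failed: $*"
    # Sleep only when another attempt will follow
    if [ "$attempt" -lt "$max_tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumes psql is installed and PGPASSWORD is exported):
# retry_until 30 1 psql -h db -U postgres -c '\q' && exec python app.py
```

Giving up after a fixed number of attempts lets the container exit with an error, so a restart policy or an operator can react instead of waiting indefinitely.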

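Adding a Health Check in the Dockerfile

Back to the title: the problem above reminded me of health checks, hence this write-up. So let me record how to do health checks when working with container images.

A Dockerfile can include a HEALTHCHECK instruction, which checks whether the cmd that follows runs successfully; success means the container is healthy.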

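HEALTHCHECK [OPTIONS] CMD command  Runs the command inside the container; exit code 0 means healthy, 1 means unhealthy (2 is reserved and should not be used)

HEALTHCHECK NONE  Disables any health check inherited from the base image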

options

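--interval=DURATION (default: 30s) time between health checks
--timeout=DURATION (default: 30s) a check that runs longer than this counts as failed
--start-period=DURATION (default: 0s) grace period after the container starts; failures during this period do not count toward the retry limit
--retries=N (default: 3) the container is marked unhealthy after N consecutive failures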
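How --retries combines with the individual check results can be modelled in a few lines of shell. This is a simplified sketch, not Docker's implementation: health_state is a hypothetical helper, and the start period, interval and timeout are not modelled. It takes the retry limit followed by the exit codes of successive checks and prints the resulting status.

```shell
#!/bin/sh
# health_state RETRIES EXIT_CODE...
# Simplified model: the status begins as "starting", any passing check
# (exit code 0) makes it "healthy" and resets the failure streak, and
# RETRIES failures in a row make it "unhealthy".
health_state() {
  retries=$1
  shift
  streak=0
  state=starting
  for code in "$@"; do
    if [ "$code" -eq 0 ]; then
      streak=0
      state=healthy
    else
      streak=$((streak + 1))
      if [ "$streak" -ge "$retries" ]; then
        state=unhealthy
      fi
    fi
  done
  echo "$state"
}

# Example: two failures after a success stay below a retry limit of 3
# health_state 3 0 1 1
```

The point of the streak counter is that isolated failures do not flip the status; only an unbroken run of failures does.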

An example that checks the web server on port 80:

HEALTHCHECK --interval=5m --timeout=3s \
  CMD curl -f http://localhost/ || exit 1

Configuring the Health Check in docker-compose.yml

Add a healthcheck node in docker-compose.yml; its contents mirror the Dockerfile options.

version: '3.1'

services:
  mongo:
    image: mongo:4
    healthcheck:
      test: ["CMD-SHELL", "netstat -anp | grep 27017"]
      interval: 2m
      timeout: 10s
      retries: 3
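Once a healthcheck is defined, the container reports a health status that a script can poll before starting a dependent service. The following is a minimal sketch under stated assumptions: wait_healthy is a hypothetical helper, and the container name mongo in the usage comment is only an example. The status command is passed in as arguments so the polling logic does not depend on any particular tool.

```shell
#!/bin/sh
# wait_healthy MAX_TRIES DELAY_SECONDS STATUS_COMMAND [ARGS...]
# Hypothetical helper: polls STATUS_COMMAND until it prints "healthy".
# Returns 0 once that happens, 1 after MAX_TRIES unsuccessful polls.
wait_healthy() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    # A failing status command is treated as an unknown status
    status=$("$@" 2>/dev/null) || status=unknown
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    >&2 echo "status is '$status' (attempt $attempt/$max_tries)"
    if [ "$attempt" -lt "$max_tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumes a running container named "mongo" that defines a healthcheck):
# wait_healthy 30 2 docker inspect --format '{{.State.Health.Status}}' mongo
```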

Official Docker library health check examples

On GitHub I found the healthcheck project under docker-library; for example, a health check for mongo can be done like this:

Dockerfile

FROM mongo

COPY docker-healthcheck /usr/local/bin/

HEALTHCHECK CMD ["docker-healthcheck"]

docker-healthcheck

#!/bin/bash
set -eo pipefail

host="$(hostname --ip-address || echo '127.0.0.1')"

if mongo --quiet "$host/test" --eval 'quit(db.runCommand({ ping: 1 }).ok ? 0 : 2)'; then
	exit 0
fi

exit 1

Similarly, for mysql:

#!/bin/bash
set -eo pipefail

if [ "$MYSQL_RANDOM_ROOT_PASSWORD" ] && [ -z "$MYSQL_USER" ] && [ -z "$MYSQL_PASSWORD" ]; then
	# there's no way we can guess what the random MySQL password was
	echo >&2 'healthcheck error: cannot determine random root password (and MYSQL_USER and MYSQL_PASSWORD were not set)'
	exit 0
fi

host="$(hostname --ip-address || echo '127.0.0.1')"
user="${MYSQL_USER:-root}"
export MYSQL_PWD="${MYSQL_PASSWORD:-$MYSQL_ROOT_PASSWORD}"

args=(
	# force mysql to not use the local "mysqld.sock" (test "external" connectibility)
	-h"$host"
	-u"$user"
	--silent
)

if command -v mysqladmin &> /dev/null; then
	if mysqladmin "${args[@]}" ping > /dev/null; then
		exit 0
	fi
else
	if select="$(echo 'SELECT 1' | mysql "${args[@]}")" && [ "$select" = '1' ]; then
		exit 0
	fi
fi

exit 1

redis

#!/bin/bash
set -eo pipefail

host="$(hostname -i || echo '127.0.0.1')"

if ping="$(redis-cli -h "$host" ping)" && [ "$ping" = 'PONG' ]; then
	exit 0
fi

exit 1

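Health Checks in K8s

In practice, we rely more on k8s health checks to tell whether a container is healthy.

k8s uses Liveness and Readiness probes to set up finer-grained health checks, which makes the following possible:

Zero-downtime deployments.
Not rolling out broken images.
Safer rolling upgrades.

Every container runs a process when it starts, specified by CMD or ENTRYPOINT in the Dockerfile. If that process exits with a non-zero code, the container is considered failed and Kubernetes restarts it according to restartPolicy.

When creating a Pod, you can probe the containers inside it in two ways, liveness and readiness. A liveness probe checks whether the application in the container is still alive; if it fails, the container process is killed, and whether the container is restarted depends on the Pod's restart policy. A readiness probe checks whether the application can serve traffic; if it fails, the Endpoint Controller removes the Pod's IP from the Service's endpoints.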

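A probe can check a container in three ways:

exec: run a command inside the container; exit code 0 means success
httpGet: send an HTTP request and inspect the returned status code
tcpSocket: test whether a port accepts a connection

Each check has one of three results.

Success: the health check passed
Failure: the health check did not pass
Unknown: the check itself could not be carried out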

Container Exec

nginx_pod_exec.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: test-exec
  labels:
    app: web
spec:
  containers:
    - name: nginx
      image: 192.168.56.201:5000/nginx:1.13
      ports:
        - containerPort: 80
      args:
        - /bin/sh
        - -c
        - touch /tmp/healthy;sleep 30;rm -rf /tmp/healthy;sleep 600
      livenessProbe:
        exec:
          command:
            - cat
            - /tmp/healthy
        initialDelaySeconds: 5
        periodSeconds: 5

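This example creates a container and judges whether it is healthy by checking that a file exists. After the container has run for 30 seconds the file is deleted, so the liveness check fails and the container is restarted.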

HTTP Health Check

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
    app: httpd
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: docker.io/httpd
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /index.html
        port: 80
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 5
      periodSeconds: 5

This example starts a web server and checks liveness by requesting index.html. Deleting that file by hand makes the probe fail, and the container is restarted.

[root@devops-101 ~]# kubectl exec -it liveness-http /bin/sh
# 
# ls
bin  build  cgi-bin  conf  error  htdocs  icons  include  logs	modules
# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:39 ?        00:00:00 httpd -DFOREGROUND
daemon       6     1  0 11:39 ?        00:00:00 httpd -DFOREGROUND
daemon       7     1  0 11:39 ?        00:00:00 httpd -DFOREGROUND
daemon       8     1  0 11:39 ?        00:00:00 httpd -DFOREGROUND
root        90     0  0 11:39 ?        00:00:00 /bin/sh
root        94    90  0 11:39 ?        00:00:00 ps -ef
#              
# cd /usr/local/apache2
# ls
bin  build  cgi-bin  conf  error  htdocs  icons  include  logs	modules
# cd htdocs
# ls
index.html
# rm index.html
# command terminated with exit code 137
[root@devops-101 ~]# kubectl describe pod liveness-http
Events:
  Type     Reason     Age               From                 Message
  ----     ------     ----              ----                 -------
  Normal   Scheduled  1m                default-scheduler    Successfully assigned default/liveness-http to devops-102
  Warning  Unhealthy  8s (x3 over 18s)  kubelet, devops-102  Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Pulling    7s (x2 over 1m)   kubelet, devops-102  pulling image "docker.io/httpd"
  Normal   Killing    7s                kubelet, devops-102  Killing container with id docker://liveness:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     1s (x2 over 1m)   kubelet, devops-102  Successfully pulled image "docker.io/httpd"
  Normal   Created    1s (x2 over 1m)   kubelet, devops-102  Created container
  Normal   Started    1s (x2 over 1m)   kubelet, devops-102  Started container
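The 404 in the events above counts as a failure because an httpGet probe treats any status code from 200 up to, but not including, 400 as success and everything else as failure. A small sketch of that rule follows; http_probe_result is a hypothetical helper written for this illustration, not a kubelet API, and the Unknown result is not modelled.

```shell
#!/bin/sh
# http_probe_result STATUS_CODE
# Hypothetical helper: classifies an HTTP status code the way an
# httpGet probe does (200 <= code < 400 is a success).
http_probe_result() {
  code=$1
  if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo Success
  else
    echo Failure
  fi
}

# Example: the status code seen in the events above
# http_probe_result 404
```

Redirects therefore pass the probe, which is worth keeping in mind when the probed path sits behind a login or redirect rule.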

TCP Socket

This method decides liveness by opening a TCP connection. An example Pod manifest:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
    app: node
  name: liveness-tcp
spec:
  containers:
  - name: goproxy
    image: docker.io/googlecontainer/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Readiness check example

A readiness probe is configured the same way as a liveness probe; just use readinessProbe in place of livenessProbe.

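Parameter reference

initialDelaySeconds: how long to wait before the first check, counted from when the container has started
periodSeconds: how often the check runs; default 10 seconds, minimum 1 second
timeoutSeconds: how long before a check times out; default 1 second, minimum 1 second
successThreshold: number of consecutive successes needed after a failure for the check to count as passing again; default 1
failureThreshold: number of consecutive failures needed after a success for the check to count as failed; default 3
httpGet fields
- host: hostname or IP to connect to; defaults to the Pod IP
- scheme: HTTP or HTTPS; default HTTP
- path: request path
- httpHeaders: custom request headers
- port: request port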
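To see how initialDelaySeconds, periodSeconds and failureThreshold combine, the sketch below computes when a liveness probe would trigger a restart on an idealised timeline. It is a simplified model with a hypothetical helper name (restart_time): it assumes probes run exactly at the initial delay and then once per period, and that every probe after the moment the application breaks fails. Real kubelet timing is less exact.

```shell
#!/bin/sh
# restart_time INITIAL_DELAY PERIOD FAILURE_THRESHOLD BREAK_AT
# Hypothetical helper on an idealised timeline: probes run at INITIAL_DELAY,
# INITIAL_DELAY + PERIOD, and so on. A probe fails if it runs strictly after
# BREAK_AT seconds. Prints the time of the probe at which FAILURE_THRESHOLD
# consecutive failures are reached.
restart_time() {
  t=$1
  period=$2
  threshold=$3
  break_at=$4
  # Guard against a non-positive period or threshold, which would never terminate
  if [ "$period" -lt 1 ] || [ "$threshold" -lt 1 ]; then
    return 1
  fi
  failures=0
  while :; do
    if [ "$t" -gt "$break_at" ]; then
      failures=$((failures + 1))
      if [ "$failures" -ge "$threshold" ]; then
        echo "$t"
        return 0
      fi
    fi
    t=$((t + period))
  done
}

# Example: values from the exec probe manifest (delay 5, period 5), a
# failureThreshold of 3, and the file removed after 30 seconds
# restart_time 5 5 3 30
```

With those example values the model gives 45 seconds: the probes at 35, 40 and 45 seconds are the three consecutive failures.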
