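Configuring Active Health Checks in Nginx
When nginx is used as a reverse proxy, health checking and failover for the backend server nodes are important.
When I first used nginx, I relied on the following configuration for failover (a rather crude approach):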
upstream portals
{
    server 172.16.68.134:8082 max_fails=1 fail_timeout=5;
    server 172.16.68.135:8082 max_fails=1 fail_timeout=5;
    server 172.16.68.136:8082 max_fails=1 fail_timeout=5;
    server 172.16.68.137:8082 max_fails=1 fail_timeout=5;
}
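In testing, if the first server node fails to respond, nginx stops forwarding requests to it for 5 seconds. Once those 5 seconds have elapsed, the next request that round-robin assigns to this node is sent to it again, fails, and the node is skipped for another 5 seconds. This falls short of the desired behavior, which is that no request reaches the node until its service has recovered.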
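The behavior described above can be sketched as a small simulation. This is a simplified model rather than nginx's actual code: it assumes max_fails=1, one request per second that round-robin would route to a backend that stays down, and that the backend is skipped while the time since its last failure is at most fail_timeout. The function name simulate_passive is made up for this illustration.

```shell
#!/bin/sh
# Simplified model of passive checks (max_fails=1): list the seconds at which
# a real client request is still sent to a backend that is permanently down.
# Assumption: one request per second would be routed to this backend, and the
# backend is skipped while (now - last_failure) <= fail_timeout.
simulate_passive() {
    fail_timeout=$1   # seconds, as in fail_timeout=5
    duration=$2       # total seconds to simulate
    last_failure=-1   # -1 means "no failure recorded yet"
    hits=""
    t=0
    while [ "$t" -lt "$duration" ]; do
        if [ "$last_failure" -lt 0 ] || [ $((t - last_failure)) -gt "$fail_timeout" ]; then
            # The backend looks available, so this request is sent to it and fails.
            hits="${hits:+$hits }$t"
            last_failure=$t
        fi
        t=$((t + 1))
    done
    echo "$hits"
}

simulate_passive 5 15
```

Under these assumptions a client request still hits the dead backend about once per fail_timeout window, which is the gap that an active health check is meant to close.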
I then searched online for nginx health-check modules and components and found Taobao's nginx_upstream_check_module.
Installation is fairly simple: compile nginx with this module added.
GitHub:
https://github.com/yaoweibin/nginx_upstream_check_module
Download:
https://www.sumaott.com/download/%E5%B7%A5%E5%85%B7/nginx_upstream_check_module-0.3.0.tar.gz
The nginx and pcre build directories both default to /home/soft. Upload the downloaded nginx_upstream_check_module-0.3.0.tar.gz to /home/soft and extract it:
tar -zxvf nginx_upstream_check_module-0.3.0.tar.gz
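Recompile:
# Enter the build directory
cd /home/soft/nginx-1.10.1
# Apply the patch (the module ships several check_*.patch files; use the one that matches your nginx version)
patch -p0 < ../nginx_upstream_check_module-0.3.0/check_1.11.1+.patch
# Make sure the configure arguments match the production build, adding only the one module
./configure --prefix=/usr/local/nginx --with-pcre=/home/soft/pcre-8.36/ --with-http_stub_status_module --with-http_ssl_module --add-module=/home/soft/nginx_upstream_check_module-0.3.0/
# Run make
make
# Back up the production nginx binary
cd /usr/local/nginx/sbin
mv nginx nginx.old.20181016
# Copy the upgraded binary into the production directory
cp /home/soft/nginx-1.10.1/objs/nginx /usr/local/nginx/sbin
# Check the nginx version and verify that the configuration is valid
./nginx -V
./nginx -t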
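As a quick sanity check after the rebuild, the configure arguments printed by nginx -V can be searched for the module path. The helper below is only a sketch; the name has_check_module is made up for this example, and the sample line is a shortened version of the configure command above.

```shell
#!/bin/sh
# Succeed if the text on stdin (the output of `nginx -V`) mentions the module.
has_check_module() {
    grep -q -- 'nginx_upstream_check_module'
}

# Usage on the server (nginx -V writes to stderr, hence 2>&1):
#   /usr/local/nginx/sbin/nginx -V 2>&1 | has_check_module && echo "module present"

# Offline demonstration with a sample configure line:
sample='configure arguments: --prefix=/usr/local/nginx --add-module=/home/soft/nginx_upstream_check_module-0.3.0/'
if printf '%s\n' "$sample" | has_check_module; then
    echo "module present"
else
    echo "module missing"
fi
```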
The configuration used in nginx is:
upstream portals {
    server 192.166.62.137:8080;
    server 192.166.66.85:8080;
    server 192.166.62.231:8080;
    server 192.166.66.88:8080;
    check interval=5000 rise=2 fall=5 timeout=1000 type=http;
    check_http_send "HEAD / HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}
server {
    listen 8080;
    charset utf-8;
    location /status {
        check_status;
        access_log off;
        #allow 192.166.62.25;
        #deny all;
    }
    location / {
        proxy_pass http://portals;
        index index.html;
    }
}
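The check runs every 5 seconds (interval=5000). A backend is marked down after 5 consecutive failures (fall=5) and marked up again after 2 consecutive successes (rise=2), and each probe times out after 1 second (timeout=1000). The probe uses HTTP and sends a HEAD request; a 2xx or 3xx response (for example 200 or 302) means the service is running normally.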
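To get a feel for how these parameters interact, the approximate time before a backend changes state is the number of consecutive probes required multiplied by the interval. The sketch below is only back-of-the-envelope arithmetic; it ignores the probe timeout and where in the interval a failure begins, and the function name state_change_ms is made up for this example.

```shell
#!/bin/sh
# Approximate time for the checker to change a backend's state:
# consecutive probes needed multiplied by the probe interval (milliseconds).
state_change_ms() {
    interval_ms=$1
    probes=$2
    echo $((interval_ms * probes))
}

echo "mark down after about $(state_change_ms 5000 5) ms"   # fall=5
echo "mark up after about $(state_change_ms 5000 2) ms"     # rise=2
```

With the values above, a failed backend is taken out of rotation after roughly 25 seconds and returns roughly 10 seconds after it recovers; lowering fall or interval shortens detection at the cost of more probe traffic.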
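Uncomment the allow/deny lines so that only a fixed IP can view the status page; other IPs cannot access this location.
After making the changes, reload nginx to apply the configuration.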
sbin/nginx -s reload
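Once the configuration is loaded, the status page can also be read by a script. The sketch below counts backends per state from the CSV form of the page. It assumes that the module's format=csv query parameter is available in your build and that each row has the state ("up" or "down") in the fourth column; both should be verified against your own output. The function name count_by_state and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Count backends per state from the CSV form of the status page.
# Assumption: each row is index,upstream,address,status,... with "up" or "down"
# in the 4th column; check this against your own module version's output.
count_by_state() {
    awk -F',' 'NF >= 4 { n[$4]++ } END { printf "up=%d down=%d\n", n["up"], n["down"] }'
}

# Usage on the server (host taken from the test setup in this article):
#   curl -s 'http://192.166.62.104:8080/status?format=csv' | count_by_state

# Offline demonstration with made-up rows:
printf '%s\n' \
    '0,portals,192.166.62.137:8080,up,10,0,http,0' \
    '1,portals,192.166.66.85:8080,down,0,7,http,0' | count_by_state
```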
Observing the active health check in action:
Run an ab concurrency test from one server:
ab -n 20000 -c 10 "http://192.166.62.104:8080/PortalServer-App/index.html"
Check the nginx log on the 104 host (192.166.62.104):
tail -f logs/access.log
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.000"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
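As the log shows, the backends that the status page reports as healthy receive the traffic, which achieves the intended effect of active health checking.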
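Reading the log by eye gets tedious under load, so the distribution can be tallied with a short script. The sketch below assumes the log format shown above, where the upstream address is the second-to-last double-quoted field and the line ends with a closing quote; a different log_format needs a different field index. The function name count_upstreams is made up for this example.

```shell
#!/bin/sh
# Count requests per upstream address in access-log lines read from stdin.
# Assumption: the log format matches the sample above, i.e. the upstream address
# is the second-to-last double-quoted field and the line ends with a quote.
count_upstreams() {
    awk -F'"' 'NF >= 4 { n[$(NF-3)]++ } END { for (a in n) print a, n[a] }' | sort
}

# Usage on the server:
#   tail -n 1000 logs/access.log | count_upstreams

# Offline demonstration with three lines copied from the log above:
printf '%s\n' \
  '192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.002"' \
  '192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.001"' \
  '192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"' \
  | count_upstreams
```

A backend that the checker has marked down should be absent from this tally; in the log excerpt above, only three of the four configured addresses appear.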