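I. TCP keepalive
1. As the name suggests, tcp-keepalive tries to keep a TCP connection "alive", or lets a TCP connection whose peer no longer responds be "declared dead".
2. In some environments a firewall automatically drops TCP connections that have been idle for a long time. After a connection has been idle for a while, tcp-keepalive sends an empty ACK so that the firewall does not close the connection.
3. Sometimes the peer server crashes or its network is interrupted; tcp-keepalive helps tear down these unresponsive connections.
4. tcp-keepalive has to be enabled by the application on each socket it uses. The operating system cannot force every socket to enable tcp-keepalive.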
II. Related kernel parameters
1. tcp_keepalive_intvl
tcp_keepalive_intvl (integer; default: 75; since Linux 2.4) The number of seconds between TCP keep-alive probes.
2. tcp_keepalive_probes
tcp_keepalive_probes (integer; default: 9; since Linux 2.2) The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
3. tcp_keepalive_time
tcp_keepalive_time (integer; default: 7200; since Linux 2.2) The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are sent only when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes at an interval of 75 seconds) when keep-alive is enabled.
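After the connection has been idle for tcp_keepalive_time seconds, a probe is sent. If the peer answers with an ACK, it is considered to be still online.
Otherwise probes keep being sent every tcp_keepalive_intvl seconds. Once tcp_keepalive_probes probes have been sent without any ACK coming back, the peer is considered to have crashed.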
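The arithmetic behind these three parameters can be sketched as follows. This is a rough estimate only, and the function name keepalive_detection_seconds is made up for illustration: it assumes the first probe goes out after the idle time and that each unanswered probe is followed by one full interval.

```python
def keepalive_detection_seconds(idle_time=7200, interval=75, probes=9):
    """Approximate seconds from the last activity until the peer is declared dead.

    Defaults mirror the documented Linux defaults for tcp_keepalive_time,
    tcp_keepalive_intvl and tcp_keepalive_probes.
    """
    # The first probe is sent after idle_time seconds; every unanswered
    # probe is followed by a wait of interval seconds.
    return idle_time + interval * probes


total = keepalive_detection_seconds()
print(total)                 # 7875 seconds in total
print((total - 7200) / 60)   # 11.25 extra minutes after the 2 idle hours
```

With the defaults, a dead peer is therefore only noticed after a bit more than two hours, which is why applications often lower these values per socket.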
III. Checking TCP keepalive status
# netstat -no|grep ESTABLISHED
tcp 0 0 10.16.140.30:11100 10.16.140.16:37848 ESTABLISHED keepalive (12.19/0/0)
tcp 0 0 10.16.140.30:11100 10.16.140.16:57178 ESTABLISHED keepalive (5.60/0/0)
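To check many connections at once, the timer column can be pulled out of each line with a small parser. The sketch below is only an illustration: parse_timer is a hypothetical helper, and the field names assume that the three numbers mean time left on the timer, retransmissions, and keepalive probes already sent, which is my reading of the netstat -o output and should be checked against your netstat version.

```python
import re

# Matches the trailing timer column, e.g. "keepalive (12.19/0/0)".
TIMER_RE = re.compile(r"(\w+) \((\d+(?:\.\d+)?)/(\d+)/(\d+)\)\s*$")


def parse_timer(line):
    """Return the timer fields of one `netstat -no` line, or None if absent."""
    match = TIMER_RE.search(line)
    if match is None:
        return None
    name, seconds_left, retransmissions, probes_sent = match.groups()
    return {
        "timer": name,                            # e.g. "keepalive" or "off"
        "seconds_left": float(seconds_left),      # assumed: time until the timer fires
        "retransmissions": int(retransmissions),  # assumed meaning
        "probes_sent": int(probes_sent),          # assumed meaning
    }


sample = "tcp 0 0 10.16.140.30:11100 10.16.140.16:37848 ESTABLISHED keepalive (12.19/0/0)"
print(parse_timer(sample))
```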
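IV. Enabling
tcp-keepalive has to be enabled by the application, for example in Python:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable keepalive on this socket
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Send the first keepalive probe after the connection has been idle for 20 seconds
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 20)
# If a probe is not answered, send the next one after 1 second
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)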
V. Test
Client IP: 10.30.20.90
Server IP: 10.30.20.125
1. Server: listen on port 9999
# nc -l 9999
2. Client
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import time
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable keepalive on this socket
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Send the first keepalive probe after the connection has been idle for 20 seconds
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 20)
# If a probe is not answered, send the next one after 1 second
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)
s.connect(('10.30.20.125', 9999))
# Stay idle so that only keepalive probes travel over the connection
time.sleep(200)
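The socket options used in the script can also be wrapped in a small helper that sets the probe count and reads the values back to confirm what the kernel accepted. This is a sketch, not part of the original test: enable_keepalive and its default values are hypothetical, and the TCP_KEEP* constants are looked up defensively because not every platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=20, interval=1, count=5):
    """Enable keepalive on sock and return the values the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    options = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in options:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # constant not available on this platform
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    applied["SO_KEEPALIVE"] = bool(
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
    )
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(enable_keepalive(s))
```

Setting TCP_KEEPCNT explicitly makes the socket independent of the system-wide tcp_keepalive_probes value.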
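Operating system parameters. The socket options set in the script override tcp_keepalive_time and tcp_keepalive_intvl for that socket; TCP_KEEPCNT is not set, so the system-wide tcp_keepalive_probes = 5 still applies.
# sysctl -a|grep keepalive
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 10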
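The same values can be read from a program through /proc, where the net.ipv4.* sysctls appear as files under /proc/sys/net/ipv4 on Linux. The sketch below assumes Linux; read_keepalive_sysctls and its base parameter are names invented for this example.

```python
from pathlib import Path

SYSCTL_NAMES = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_sysctls(base="/proc/sys/net/ipv4"):
    """Return the system-wide keepalive settings as a dict of integers."""
    # Each sysctl is a small text file containing one integer.
    return {
        name: int((Path(base) / name).read_text().strip())
        for name in SYSCTL_NAMES
    }


if __name__ == "__main__":
    print(read_keepalive_sysctls())
```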
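3. Simulation (observed on the client)
1) Normal case: while the connection is idle, a keep-alive probe is sent every 20 s
2) Simulate a failure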
On the server:
iptables -A INPUT -p tcp --dport 9999 -j DROP
iptables -A OUTPUT -p tcp --sport 9999 -j DROP
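After the first probe goes unanswered, the client sends a probe every 1 s. After 5 consecutive probes without a reply, it sends a segment with the RST flag to the server and resets the connection.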
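The test script only sleeps, so it never sees the failure itself. An application that is blocked in recv() does notice it: on Linux a keepalive failure is reported to the socket as ETIMEDOUT, which Python raises as TimeoutError. The sketch below shows one way to classify the outcome. wait_for_peer is a hypothetical helper, and it assumes a blocking socket with no timeout set through settimeout(), since a socket timeout can raise the same exception.

```python
import socket


def wait_for_peer(sock, bufsize=4096):
    """Block in recv() and classify what happened to the connection."""
    try:
        data = sock.recv(bufsize)
    except TimeoutError:
        # ETIMEDOUT: keepalive probes went unanswered, peer presumed dead.
        return "dead"
    except ConnectionResetError:
        # The peer answered with RST.
        return "reset"
    # An empty read means the peer closed the connection cleanly.
    return "closed" if data == b"" else "data"


if __name__ == "__main__":
    # Local demonstration of the two outcomes that need no real failure.
    a, b = socket.socketpair()
    b.sendall(b"ping")
    print(wait_for_peer(a))
    b.close()
    print(wait_for_peer(a))
    a.close()
```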

