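Recently, while doing server development on Linux, I ran into something odd: the server called connect at startup, and because the target port was unavailable, connect eventually failed with "Connection timed out". That failure, however, took more than sixty seconds to arrive.
Why does it take so long to time out, and where is that timeout configured?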
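Section 4.3 of UNIX Network Programming, Volume 1: Networking APIs: Sockets and XTI has this to say:
For a TCP socket, the connect function initiates TCP's three-way handshake and returns only when the connection is established or an error occurs. Several error returns are possible:
1. If the client TCP receives no response to its SYN segment, ETIMEDOUT is returned. 4.4BSD, for example, sends one SYN when connect is called, another 6 seconds later if there is no response, and another 24 seconds later if there is still none. If no response has arrived after a total of 75 seconds, the error is returned...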
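As the book describes, connect sends a SYN packet while establishing the TCP connection. If no reply to the SYN arrives, the kernel retransmits it several times, and the interval between retries grows each time so that the network is not flooded with SYN packets.
On CentOS, the number of retries is configurable: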
$ sysctl net.ipv4 | grep tcp
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_fack = 1
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_mem = 639168 852224 1278336
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_frto = 2
net.ipv4.tcp_frto_response = 0
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_abc = 0
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_base_mss = 512
net.ipv4.tcp_workaround_signed_windows = 0
net.ipv4.tcp_challenge_ack_limit = 1000
net.ipv4.tcp_limit_output_bytes = 262144
net.ipv4.tcp_dma_copybreak = 4096
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_available_congestion_control = cubic reno
net.ipv4.tcp_allowed_congestion_control = cubic reno
net.ipv4.tcp_max_ssthresh = 0
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_thin_dupack = 0
net.ipv4.tcp_min_tso_segs = 2
net.ipv4.tcp_invalid_ratelimit = 500
Of these, net.ipv4.tcp_syn_retries controls how many times the SYN is retransmitted. It can be inspected and changed with the following commands:
$ sysctl net.ipv4.tcp_syn_retries    # view
net.ipv4.tcp_syn_retries = 5
$ sudo sysctl -w net.ipv4.tcp_syn_retries=1    # set
net.ipv4.tcp_syn_retries = 1
The simple program below measures how long connect takes to time out for different retry counts:
#include <iostream>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <unistd.h>

// Current wall-clock time in milliseconds.
long long GetCurrentMSecond()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

int main()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
    {
        std::cout << "socket failed, errno: " << errno << ", error: " << strerror(errno) << std::endl;
        return 1;
    }

    int bufSize = 128 * 1024; // SO_RCVBUF expects an int
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufSize, sizeof(bufSize));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr)); // zero the padding bytes before filling in the fields
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = inet_addr("192.168.207.128");
    addr.sin_port = htons(13500); // connect to an unused port so that the SYN goes unanswered and the timeout is triggered

    long long llBeginTime = GetCurrentMSecond();
    if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1)
    {
        int err = errno; // save errno before any other call can overwrite it
        long long llEndTime = GetCurrentMSecond();
        std::cout << "connect failed, errno: " << err << ", error: " << strerror(err)
                  << ", cost time: " << llEndTime - llBeginTime << std::endl;
        close(fd);
        return 0;
    }

    std::cout << "connect success" << std::endl;
    close(fd);
    return 0;
}
Running it with different retry counts shows the timeout for each setting:
$ g++ connect.cpp -o main
$ sudo sysctl -w net.ipv4.tcp_syn_retries=1
net.ipv4.tcp_syn_retries = 1
$ ./main
connect failed, errno: 110, error: Connection timed out, cost time: 3000
$ sudo sysctl -w net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_syn_retries = 2
$ ./main
connect failed, errno: 110, error: Connection timed out, cost time: 7000
Retries | Timeout (ms) |
1 | 3000 |
2 | 7000 |
3 | 14999 |
4 | 31000 |
5 | 63001 |
6 | 126999 |
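The table shows that with the current setting of 5 retries the timeout is 63 seconds, and that changing the retry count changes how long connect takes to time out.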
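The measured values follow a simple pattern: 3, 7, 15, 31, 63 and 127 seconds are each 2^(n+1) - 1 for n retries, which is what results if the first SYN waits 1 second and every retransmission doubles the wait. The sketch below encodes that observation. The function name ExpectedConnectTimeoutMs and the 1000 ms initial interval are assumptions inferred from the table, not values read from the kernel, and the formula is only checked against retry counts 1 to 6.

```cpp
// Hypothetical helper: predicted connect timeout in milliseconds, assuming the
// first SYN waits initialRtoMs and each of the synRetries retransmissions
// doubles the wait: initialRtoMs * (1 + 2 + ... + 2^synRetries).
constexpr long long ExpectedConnectTimeoutMs(int synRetries, long long initialRtoMs = 1000)
{
    return initialRtoMs * ((1LL << (synRetries + 1)) - 1);
}

// The predictions agree with the measured table to within a millisecond.
static_assert(ExpectedConnectTimeoutMs(1) == 3000, "1 retry");
static_assert(ExpectedConnectTimeoutMs(5) == 63000, "5 retries");
```

For 5 retries this gives 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds, which matches the sixty-odd seconds observed at the start.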