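Our project recently required a server that can hold one million long-lived connections on a single machine under high concurrency. After building the server, we load-tested it with our own high-speed stress-testing framework and saw something odd: the server silently accepted fewer connections than the clients had opened, so packets were lost. After researching the problem online and experimenting myself, I ran the experiment below and analyzed the packet capture as follows:
Five connections in total.
The accept queue (full-connection queue) limit somaxconn is set to 1; it is the total length of the listen queue (in practice, somaxconn+1 connections can be established).
The SYN queue (half-connection queue) limit tcp_max_syn_backlog is set to 1 (in practice, tcp_max_syn_backlog+1 SYN segments can be queued).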
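The two limits can be checked before running the experiment. The following is a minimal sketch, assuming a Linux host that exposes sysctls under /proc/sys; the helper names `read_sysctl_int` and `observed_capacity` are made up for this illustration, and the "+1" reflects only what this experiment observed, not a guarantee for every kernel.

```python
from pathlib import Path


def read_sysctl_int(name, root="/proc/sys"):
    """Return the integer value of a sysctl such as 'net.core.somaxconn'.

    Returns None when the value cannot be read (for example on a non-Linux host).
    """
    path = Path(root).joinpath(*name.split("."))
    try:
        return int(path.read_text().split()[0])
    except (OSError, IndexError, ValueError):
        return None


def observed_capacity(limit):
    """Capacity seen in this experiment: one entry more than the configured limit."""
    return limit + 1


if __name__ == "__main__":
    for name in ("net.core.somaxconn", "net.ipv4.tcp_max_syn_backlog"):
        value = read_sysctl_int(name)
        if value is None:
            print(f"{name}: not available")
        else:
            print(f"{name} = {value}, observed capacity = {observed_capacity(value)}")
```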
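When the accept queue is full but the SYN queue is not:
When a client sends a SYN, the server does not drop it and replies with SYN+ACK right away. The client answers with an ACK and enters ESTABLISHED. On receiving that ACK, the server tries to remove the request from the SYN queue and add it to the accept queue, but the accept queue is already full. By default the server then does nothing with the ACK and does not move the connection from SYN_RECV to ESTABLISHED. It only arms a timer and retransmits the SYN+ACK to the client at growing intervals (roughly 3, 6, and 12 seconds in the capture below) until the system's default SYN+ACK retransmission threshold is reached, and then drops the request from the SYN queue. At that point the server is left with only 2 connections, while the client side still shows all 5 connections as ESTABLISHED.
When both the accept queue and the SYN queue are full:
When a client sends a SYN, the server finds the SYN queue full. Because this SYN has not been retransmitted yet, the server simply drops it. About 3 seconds later the client retransmits the SYN. The SYN queue is still full, but this time the SYN is a retransmission, so the server accepts it and replies with SYN+ACK, and the client answers with an ACK. The three-way handshake is now complete from the client's point of view. When the server receives the ACK, however, it finds the accept queue full, so it does not move the connection from SYN_RECV to ESTABLISHED. Instead it arms a timer and keeps retransmitting the SYN+ACK until the system's default SYN+ACK retransmission threshold is reached, and then drops the request from the SYN queue.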
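The two cases above can be summarized in a small toy model. This is only a sketch of the behavior observed in this experiment when the server never calls accept(); it does not reproduce the kernel's actual logic, and the function name `classify_attempts` and the outcome labels are invented for illustration.

```python
def classify_attempts(n_attempts, somaxconn, tcp_max_syn_backlog):
    """Server-side fate of each connection attempt when accept() is never called.

    Toy model of the observations in this article, not of the kernel code:
    - the first somaxconn + 1 handshakes complete on both sides;
    - the next tcp_max_syn_backlog + 1 stay in SYN_RECV on the server
      while the client already believes it is ESTABLISHED;
    - later attempts have their first SYN dropped, and the retransmitted
      SYN then ends up in SYN_RECV as well.
    """
    accept_capacity = somaxconn + 1
    syn_capacity = tcp_max_syn_backlog + 1
    outcomes = []
    for i in range(n_attempts):
        if i < accept_capacity:
            outcomes.append("ESTABLISHED")
        elif i < accept_capacity + syn_capacity:
            outcomes.append("SYN_RECV")
        else:
            outcomes.append("SYN_RECV after first SYN dropped")
    return outcomes


if __name__ == "__main__":
    for number, outcome in enumerate(classify_attempts(5, 1, 1), start=1):
        print(number, outcome)
```

With both limits set to 1, as in this test, the model yields two ESTABLISHED connections, two stuck in SYN_RECV, and one whose first SYN is dropped.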
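The details are as follows.
The capture was taken with tcpdump -i eth0 -l -nn -S port 8000 -c 40 -w out.txt
The test environment, shown below, is Linux 2.6.18-164.el5.
Accept queue: cat /proc/sys/net/core/somaxconn prints 1
SYN queue: cat /proc/sys/net/ipv4/tcp_max_syn_backlog prints 1
Listening port: 8000
The walkthrough uses a client that opens 5 connections.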
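The article does not show the test programs, so the following is only a hedged sketch of one way to reproduce the setup: a listener that never calls accept(), so its accept queue fills up, and a client that opens connections one after another. The helper names `start_silent_listener` and `open_connections`, the loopback address, and the timeout values are assumptions, and results on kernels other than the one tested here may differ.

```python
import socket
import time


def start_silent_listener(port=0, backlog=1, host="127.0.0.1"):
    """Listen on (host, port) but never call accept(), so the accept queue fills up."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv


def open_connections(host, port, count, timeout=1.0, delay=0.0):
    """Try to open `count` TCP connections; return the sockets whose connect() succeeded.

    A successful connect() only means the client side reached ESTABLISHED;
    as described above, the server may still hold the connection in SYN_RECV.
    """
    connected = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            connected.append(sock)
        except OSError:
            sock.close()
        time.sleep(delay)
    return connected
```

To mirror this test, one could call `start_silent_listener(port=8000, backlog=1)` in one process and `open_connections("127.0.0.1", 8000, 5, timeout=10.0, delay=1.0)` in another while tcpdump is capturing, then compare the client and server sides with a tool such as netstat.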
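1. The client connection with ephemeral port 53991 sends a SYN to the server listening on port 8000, and the client enters SYN_SENT.
When the server receives the SYN, the SYN queue is not full, so it places the request in the SYN queue and sends a SYN+ACK to the client. The server side is now in SYN_RECV.
When the client receives the SYN+ACK, it returns an ACK and enters ESTABLISHED.
When the server receives the ACK, it tries to move the request from the SYN queue to the accept queue. The accept queue is not full, so the move succeeds and the server side goes from SYN_RECV to ESTABLISHED.
2. The client connection with ephemeral port 53992 completes the three-way handshake just as smoothly as 53991.
From these two connections we can conclude that (accept queue length + 1) long-lived connections can be established.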
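3. The client connection with ephemeral port 53993 sends a SYN to the server listening on port 8000, and the client enters SYN_SENT.
When the server receives the SYN, the SYN queue is not full, so it places the request in the SYN queue and sends a SYN+ACK to the client. The server side is now in SYN_RECV.
When the client receives the SYN+ACK, it returns an ACK and enters ESTABLISHED.
When the server receives the ACK, it tries to move the request from the SYN queue to the accept queue. The accept queue is already full, so the request cannot be moved and the server side does not go from SYN_RECV to ESTABLISHED. By default the system marks the request as acked and arms a timer that retransmits the SYN+ACK from the server. The client is already ESTABLISHED, but it still answers each retransmitted SYN+ACK with an ACK. The server repeats this until the SYN+ACK retransmission threshold (cat /proc/sys/net/ipv4/tcp_synack_retries) is reached and then discards the request.
4. The client connection with ephemeral port 53994 goes through the same process as 53993.
From these two connections we can conclude that the SYN queue can hold (SYN queue length + 1) SYN segments.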
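5. The client connection with ephemeral port 53995 sends a SYN to the server listening on port 8000, and the client enters SYN_SENT.
When the server receives the SYN, the SYN queue is already full. The server sees that this SYN has not been retransmitted yet and by default drops it. The client receives no reply within its timeout and retransmits the SYN after about 3 seconds. This time the server recognizes the SYN as one it has already seen, accepts it, and sends a SYN+ACK to the client. The client answers with an ACK and enters ESTABLISHED. When the server receives the ACK, it tries to move the request from the SYN queue to the accept queue, but the accept queue is full, so the move fails. From here on the behavior is the same as for ephemeral ports 53993 and 53994.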
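The server-side retransmissions described in steps 3 to 5 can be put on a timeline. The sketch below assumes an initial timeout of 3 seconds that doubles after each retransmission; both figures are inferred from the gaps in this capture (about 3.4, 6.2, and 12.0 seconds for port 53993), so the real timer fires slightly later than this model predicts. The function name and the value 5 for tcp_synack_retries are used only as an example.

```python
def synack_retransmit_offsets(retries, initial_rto=3.0):
    """Seconds after the first SYN+ACK at which each retransmission is sent,
    assuming the timeout doubles after every retransmission."""
    offsets = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto
        offsets.append(elapsed)
        rto *= 2
    return offsets


if __name__ == "__main__":
    # Example value only; read the real one from /proc/sys/net/ipv4/tcp_synack_retries.
    print(synack_retransmit_offsets(5))  # [3.0, 9.0, 21.0, 45.0, 93.0]
```

Under these assumptions, a request that never reaches the accept queue stays in SYN_RECV for well over a minute before the server gives up on it.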
1.
20:25:57.637476 IP 192.168.172.128.53991 > 192.168.172.128.8000: S 3828162025:3828162025(0) win 32792 <mss 16396,sackOK,timestamp 250818 0,nop,wscale 6>
20:25:57.637483 IP 192.168.172.128.8000 > 192.168.172.128.53991: S 3826007261:3826007261(0) ack 3828162026 win 32768 <mss 16396,sackOK,timestamp 250818 250818,nop,wscale 6>
20:25:57.637486 IP 192.168.172.128.53991 > 192.168.172.128.8000: . ack 3826007262 win 513 <nop,nop,timestamp 250818 250818>
2.
20:25:58.638894 IP 192.168.172.128.53992 > 192.168.172.128.8000: S 3822825008:3822825008(0) win 32792 <mss 16396,sackOK,timestamp 251820 0,nop,wscale 6>
20:25:58.638984 IP 192.168.172.128.8000 > 192.168.172.128.53992: S 3829073969:3829073969(0) ack 3822825009 win 32768 <mss 16396,sackOK,timestamp 251820 251820,nop,wscale 6>
20:25:58.638995 IP 192.168.172.128.53992 > 192.168.172.128.8000: . ack 3829073970 win 513 <nop,nop,timestamp 251820 251820>
3.
20:25:59.640701 IP 192.168.172.128.53993 > 192.168.172.128.8000: S 3828629154:3828629154(0) win 32792 <mss 16396,sackOK,timestamp 252821 0,nop,wscale 6>
20:25:59.640713 IP 192.168.172.128.8000 > 192.168.172.128.53993: S 3833556338:3833556338(0) ack 3828629155 win 32768 <mss 16396,sackOK,timestamp 252821 252821,nop,wscale 6>
20:25:59.640719 IP 192.168.172.128.53993 > 192.168.172.128.8000: . ack 3833556339 win 513 <nop,nop,timestamp 252821 252821>
4.
20:26:00.642260 IP 192.168.172.128.53994 > 192.168.172.128.8000: S 3820886884:3820886884(0) win 32792 <mss 16396,sackOK,timestamp 253823 0,nop,wscale 6>
20:26:00.642345 IP 192.168.172.128.8000 > 192.168.172.128.53994: S 3831669364:3831669364(0) ack 3820886885 win 32768 <mss 16396,sackOK,timestamp 253823 253823,nop,wscale 6>
20:26:00.642356 IP 192.168.172.128.53994 > 192.168.172.128.8000: . ack 3831669365 win 513 <nop,nop,timestamp 253823 253823>
5.
20:26:01.645291 IP 192.168.172.128.53995 > 192.168.172.128.8000: S 3824454212:3824454212(0) win 32792 <mss 16396,sackOK,timestamp 254826 0,nop,wscale 6>
3.
20:26:03.040319 IP 192.168.172.128.8000 > 192.168.172.128.53993: S 3833556338:3833556338(0) ack 3828629155 win 32768 <mss 16396,sackOK,timestamp 256221 252821,nop,wscale 6>
20:26:03.040332 IP 192.168.172.128.53993 > 192.168.172.128.8000: . ack 3833556339 win 513 <nop,nop,timestamp 256221 256221,nop,nop,sack 1 {3833556338:3833556339}>
4.
20:26:03.842281 IP 192.168.172.128.8000 > 192.168.172.128.53994: S 3831669364:3831669364(0) ack 3820886885 win 32768 <mss 16396,sackOK,timestamp 257023 253823,nop,wscale 6>
20:26:03.842295 IP 192.168.172.128.53994 > 192.168.172.128.8000: . ack 3831669365 win 513 <nop,nop,timestamp 257023 257023,nop,nop,sack 1 {3831669364:3831669365}>
5.
20:26:04.645946 IP 192.168.172.128.53995 > 192.168.172.128.8000: S 3824454212:3824454212(0) win 32792 <mss 16396,sackOK,timestamp 257826 0,nop,wscale 6>
20:26:04.645962 IP 192.168.172.128.8000 > 192.168.172.128.53995: S 3832652478:3832652478(0) ack 3824454213 win 32768 <mss 16396,sackOK,timestamp 257826 257826,nop,wscale 6>
20:26:04.645971 IP 192.168.172.128.53995 > 192.168.172.128.8000: . ack 3832652479 win 513 <nop,nop,timestamp 257826 257826>
20:26:08.846521 IP 192.168.172.128.8000 > 192.168.172.128.53995: S 3832652478:3832652478(0) ack 3824454213 win 32768 <mss 16396,sackOK,timestamp 262026 257826,nop,wscale 6>
20:26:08.846534 IP 192.168.172.128.53995 > 192.168.172.128.8000: . ack 3832652479 win 513 <nop,nop,timestamp 262026 262026,nop,nop,sack 1 {3832652478:3832652479}>
3.
20:26:09.246438 IP 192.168.172.128.8000 > 192.168.172.128.53993: S 3833556338:3833556338(0) ack 3828629155 win 32768 <mss 16396,sackOK,timestamp 262426 256221,nop,wscale 6>
20:26:09.246452 IP 192.168.172.128.53993 > 192.168.172.128.8000: . ack 3833556339 win 513 <nop,nop,timestamp 262426 262426,nop,nop,sack 1 {3833556338:3833556339}>
4.
20:26:09.853232 IP 192.168.172.128.8000 > 192.168.172.128.53994: S 3831669364:3831669364(0) ack 3820886885 win 32768 <mss 16396,sackOK,timestamp 263026 257023,nop,wscale 6>
20:26:09.853246 IP 192.168.172.128.53994 > 192.168.172.128.8000: . ack 3831669365 win 513 <nop,nop,timestamp 263026 263026,nop,nop,sack 1 {3831669364:3831669365}>
5.
20:26:15.055318 IP 192.168.172.128.8000 > 192.168.172.128.53995: S 3832652478:3832652478(0) ack 3824454213 win 32768 <mss 16396,sackOK,timestamp 268228 262026,nop,wscale 6>
20:26:15.055335 IP 192.168.172.128.53995 > 192.168.172.128.8000: . ack 3832652479 win 513 <nop,nop,timestamp 268228 268228,nop,nop,sack 1 {3832652478:3832652479}>
3.
20:26:21.260288 IP 192.168.172.128.8000 > 192.168.172.128.53993: S 3833556338:3833556338(0) ack 3828629155 win 32768 <mss 16396,sackOK,timestamp 274430 262426,nop,wscale 6>
20:26:21.260308 IP 192.168.172.128.53993 > 192.168.172.128.8000: . ack 3833556339 win 513 <nop,nop,timestamp 274430 274430,nop,nop,sack 1 {3833556338:3833556339}>
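The retransmission intervals in the capture above can be extracted mechanically. The following is a rough sketch, assuming the text output format shown here (as printed when a capture file is read back with tcpdump -r); the regular expression and the function name `synack_gaps` are illustrative, and the parser does not handle a capture that crosses midnight.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 20:26:03.040319 IP 192.168.172.128.8000 > 192.168.172.128.53993: S 1:1(0) ack 2 win ...
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}\.\d+) IP \S+\.(\d+) ?> \S+\.(\d+): (\S+) (.*)$"
)


def synack_gaps(lines, server_port=8000):
    """Map each client port to the gaps (in seconds) between successive SYN+ACKs
    sent to it by the server."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        hh, mm, ss, sport, dport, flags, rest = match.groups()
        # Keep only segments from the server with the SYN flag and an ack field.
        if int(sport) != server_port or flags != "S" or " ack " not in " " + rest:
            continue
        times[int(dport)].append(int(hh) * 3600 + int(mm) * 60 + float(ss))
    return {
        port: [round(later - earlier, 1) for earlier, later in zip(ts, ts[1:])]
        for port, ts in times.items()
    }
```

Applied to the lines for port 53993 in this capture, the gaps come out at about 3.4, 6.2, and 12.0 seconds, which is the doubling pattern of the SYN+ACK retransmission timer.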