https://www.jianshu.com/p/e6f2036621f4
https://zhuanlan.zhihu.com/p/36731397
https://blog.csdn.net/weixin_33938733/article/details/92513096
At its core, Tomcat is just a socket server, so I went looking and found org.apache.tomcat.util.net.DefaultServerSocketFactory#createSocket(int port, int backlog); in the end this method calls new java.net.ServerSocket(port, backlog) to start a ServerSocket instance.
A look at the ServerSocket API shows: @param backlog the maximum length of the queue.
So it is clear: Tomcat's acceptCount is the ServerSocket's wait queue.
But how does the configured acceptCount end up as backlog? I dug through the code for quite a while before noticing that org.apache.catalina.connector.Connector contains an odd little HashMap that renames the parameter before handing the value to Http11Protocol. I assume this was done for the convenience of Tomcat users, since hardly anyone would know what a bare backlog parameter means. The same HashMap renames several other parameters as well; the code is as follows:
protected static HashMap replacements = new HashMap();
static {
    replacements.put("acceptCount", "backlog");
    replacements.put("connectionLinger", "soLinger");
    replacements.put("connectionTimeout", "soTimeout");
    replacements.put("connectionUploadTimeout", "timeout");
    replacements.put("clientAuth", "clientauth");
    replacements.put("keystoreFile", "keystore");
    replacements.put("randomFile", "randomfile");
    replacements.put("rootFile", "rootfile");
    replacements.put("keystorePass", "keypass");
    replacements.put("keystoreType", "keytype");
    replacements.put("sslProtocol", "protocol");
    replacements.put("sslProtocols", "protocols");
}

// ------------------------------------------------------------- Properties

/**
 * Return a configured property.
 */
public Object getProperty(String name) {
    String repl = name;
    if (replacements.get(name) != null) {
        repl = (String) replacements.get(name);
    }
    return IntrospectionUtils.getProperty(protocolHandler, repl);
}

/**
 * Set a configured property.
 */
public boolean setProperty(String name, String value) {
    String repl = name;
    if (replacements.get(name) != null) {
        repl = (String) replacements.get(name);
    }
    return IntrospectionUtils.setProperty(protocolHandler, repl, value);
}
Summary: the acceptCount parameter is simply the second argument of new java.net.ServerSocket(port, backlog); once you know that, you can set it deliberately instead of blindly.
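A minimal sketch of where the value ends up (the port and count here are illustrative, not Tomcat source):

import java.net.ServerSocket;

public class AcceptCountSketch {
    public static void main(String[] args) throws Exception {
        int acceptCount = 100; // e.g. acceptCount="100" on the Connector in server.xml
        // After the name translation above, Tomcat ends up passing this value
        // as the backlog argument of the listening socket:
        ServerSocket serverSocket = new ServerSocket(8080, acceptCount);
        // Handshake-completed connections wait in the kernel's accept queue
        // (at most roughly acceptCount of them) until accept() takes them out.
        serverSocket.accept();
    }
}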
myblog
https://www.iteye.com/blog/shilimin-1607722
The ACCEPT queue of a TCP connection
https://blog.csdn.net/sinat_20184565/article/details/87887118
Server accept queue overflow and how to fix it
I had been load-testing my NetServer. After running under pressure for a while, the data curve dropped to zero and no new connections could be made. I assumed the server had either died or its listening port was broken, but the process was still running and the port was still in LISTEN (checked with netstat -ltp). That was odd: listening was normal, so the three-way handshake should have been possible.
Later, packet captures showed the three-way handshake completing normally, yet the server retransmitted the second handshake packet, 5 times in total. On the surface this looks as if the server never received the client's ACK and therefore retransmitted the SYN+ACK, but the capture clearly contained the ACK, and packet loss is very unlikely for traffic inside a single machine.
My guess: after receiving the ACK, the kernel TCP stack ran into a problem and did not move the connection into the full (accept) queue, most likely because that queue was already full and overflowing.
Descriptions online of how the kernel TCP stack establishes a connection look like this (diagram not reproduced here).
So I ran netstat -s | grep -i listen and ss -lnt, and the queue really had overflowed.
Searching for solutions: enlarge the accept queue (the listen backlog argument and somaxconn, default 128). I also came across tcp_abort_on_overflow: when it is 0 the ACK is simply dropped and the retransmission timer keeps running; when it is 1 the server sends an RST to the client and tears the connection down.
I then read a few blog posts on tuning the related network kernel parameters:
1. Dealing with TIME_WAIT
cat /etc/sysctl.conf
net.ipv4.tcp_tw_reuse = 1 — enables reuse, allowing TIME-WAIT sockets to be reused for new TCP connections (default 0, off); a socket can be reused about 1s later; has no effect on the server side
net.ipv4.tcp_tw_recycle = 1 — enables fast recycling of TIME-WAIT sockets (default 0, off); recycled within 3.5*RTO
net.ipv4.tcp_timestamps
2. Common network kernel parameter tuning: https://www.cnblogs.com/jking10/p/5472386.html
net.ipv4.tcp_syncookies=1
Enables SYN cookies: when the SYN wait queue overflows, cookies are used to handle the connections, which defends against small-scale SYN flood attacks (default 0, off).
3. net.ipv4.tcp_max_syn_backlog = 16384
Copyright notice: the above is from an original article by CSDN blogger "shhchen", licensed under CC 4.0 BY-SA; reposts must keep the original link and this notice.
Original link: https://blog.csdn.net/shhchen/article/details/88554161
This can be understood as follows: the client's connection request has arrived, but the three-way handshake has not finished yet, so the server keeps the request in a queue. Once the server receives the client's ACK, the connection enters ESTABLISHED and serverSocket.accept() returns from its blocked state. In other words, when the third-handshake ACK reaches the server, the server takes the request out of the queue and the server program returns from accept().
The length of that request queue is what the backlog parameter specifies. How the queue is actually implemented depends on the operating system; if you are interested, see: How TCP backlog works in Linux
https://www.cnblogs.com/hapjin/p/5774460.html
While the SYN queue is not full, the TCP stack automatically replies with SYN+ACK after receiving a SYN; when the ACK arrives later, the follow-up handling depends on the state of the accept queue.
If the SYN queue is full, then on receiving a SYN:
if net.ipv4.tcp_syncookies = 0, the SYN packet is simply dropped;
if net.ipv4.tcp_syncookies = 1, set want_cookie = 1 and continue processing;
if the accept queue is full and qlen_young is greater than 1, drop the SYN packet;
if the accept queue is not full, or qlen_young is not greater than 1, print "possible SYN flooding on port %d. Sending cookies.\n", generate a syncookie and carry it in the SYN+ACK.
If the accept queue is full when the final ACK of the three-way handshake arrives:
if tcp_abort_on_overflow = 1, the TCP stack replies with an RST and removes the connection from the SYN queue;
if tcp_abort_on_overflow = 0, the TCP stack marks the connection as acked but keeps it in the SYN queue and starts a timer to retransmit the SYN+ACK; once the number of SYN+ACK retransmissions exceeds net.ipv4.tcp_synack_retries, the connection is removed from the SYN queue.
https://blog.csdn.net/weixin_33675507/article/details/91961139
A hair-raising investigation of a website's TCP queue problem
One of the most popular DoS (denial of service) / DDoS (distributed denial of service) techniques today exploits exactly this TCP weakness: the attacker makes the server hold a large number of half-open connections in SYN_RECV state, each of which retransmits the second handshake packet up to the default 5 times, filling the TCP pending-connection queue and exhausting resources (CPU at full load or running out of memory) so that legitimate requests cannot connect.
Copyright notice: the above is from an original article by CSDN blogger "weixin_37478507", licensed under CC 4.0 BY-SA; reposts must keep the original link and this notice.
Original link: https://blog.csdn.net/weixin_37478507/article/details/80319089
Java Socket parameter test: backlog
In an earlier project I used Spring-Integration's TCP/IP component, and while defining the ServerSocket one parameter stood out: backlog. Searching around, I learned that it is a parameter of the plain Java ServerSocket. The API shows the constructor public ServerSocket(int port, int backlog), and explains backlog as: requested maximum length of the queue of incoming connections, i.e. the maximum capacity of the queue of pending TCP connection requests. My initial understanding: if the ServerSocket cannot keep up with incoming requests, subsequent client connections are placed in this queue, and once the queue is full (exceeding the capacity defined by backlog, default 50), further connections are refused.
1. Code first, conclusions afterwards:
Server side:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class TestBackLog {
    public static void main(String[] args) throws IOException, InterruptedException {
        int backlog = 3;
        ServerSocket serverSocket = new ServerSocket(5000, backlog);
        // While the server is not accepting, client connections pile up in the request
        // queue, whose capacity is the backlog value.
        // When this program starts it sleeps for 50 seconds here; start the client right away.
        // The client opens one connection per second; once the queue is occupied, later
        // clients cannot get a connection and have to wait until the earlier ones have been
        // processed (this can be seen from the client output).
        Thread.sleep(50000); // simulate a server busy with a long-running task
        while (true) {
            Socket socket = serverSocket.accept();
            InputStream is = socket.getInputStream();
            OutputStream os = socket.getOutputStream();
            BufferedReader br = new BufferedReader(new InputStreamReader(is));
            PrintWriter out = new PrintWriter(new OutputStreamWriter(os));
            int length = -1;
            char[] buffer = new char[200];
            while (-1 != (length = br.read(buffer, 0, buffer.length))) {
                String string = new String(buffer, 0, length);
                System.out.println("TestBackLog receive String " + socket.getInetAddress() + ":" + socket.getLocalPort() + string);
                out.write("server welcome!");
                out.flush();
                socket.shutdownOutput();
            }
            out.close();
            br.close();
            socket.close();
        }
    }
}
Client side:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.UnknownHostException;

public class TcpClient {
    public static void main(String[] args) throws UnknownHostException, IOException, InterruptedException {
        for (int i = 0; i < 10; i++) {
            Thread.sleep(1000); // slow down so we don't create connections too fast
            new Thread(new ClientThread()).start();
        }
    }
}

class ClientThread implements Runnable {
    @Override
    public void run() {
        try {
            Socket client = new Socket("127.0.0.1", 5000);
            System.out.println("client connected server");
            InputStream is = client.getInputStream();
            OutputStream os = client.getOutputStream();
            os.write("client hello world!".getBytes());
            client.shutdownOutput(); // important: without this the server's read() blocks forever
            int length = 0;
            byte[] buffer = new byte[200];
            while (-1 != (length = is.read(buffer, 0, buffer.length))) {
                System.out.println("client");
                String receiveString = new String(buffer, 0, length);
                System.out.println("receiveString : " + receiveString);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
2. Procedure: run on a 64-bit Windows 7 machine. Start TestBackLog first; after creating the ServerSocket it goes to sleep. Immediately run TcpClient; as soon as a client connects to the server, the console prints: client connected server
3. Observation and conclusion: after the server creates its socket, the main thread sleeps for 50 seconds. If TcpClient is run now, it creates 10 clients. What happens? In practice the console prints "client connected server" only three times, meaning only three clients connected to the server, exactly matching the request-queue capacity set by backlog. Clients connecting after that get an exception: java.net.ConnectException: Connection refused: connect (that is what happens on Windows; on macOS there are also only three successful outputs, but the remaining clients are not refused and instead hang until the connection times out). When the server wakes up, it handles the first three clients and then the later ones (provided their connections have not timed out).
This experiment nicely demonstrates the pending-request queue: the clients in the queue have already established a connection with the server and are waiting to be served, while the clients that did not make it into the queue are still inside new Socket(ip, port), struggling to connect.
https://blog.csdn.net/z69183787/article/details/81199836
Recently I ran into a client-side connection exception; locating the cause and reading through all kinds of material gave me a much deeper understanding of TCP connection queues.
While researching, I could not find an article that clearly explains these two queues and how to observe their metrics, so I hope this one makes them a bit clearer.
Problem description
A Java client and server communicating over sockets; the server uses NIO.
1. Intermittently, the three-way handshake between client and server completes, but the server's selector never notices the connection.
2. When the problem occurs, many connections are affected at the same time.
3. The selector is never destroyed and recreated; the same one is used throughout.
4. A few of these always show up right after the program starts, and then they recur intermittently.
Analysis
The normal TCP three-way handshake:
- Step 1: the client sends a SYN to the server to initiate the handshake;
- Step 2: the server receives the SYN and replies with SYN+ACK;
- Step 3: the client receives the SYN+ACK and replies with an ACK acknowledging the server's SYN+ACK (at this point the connection on the client's port 56911 is already ESTABLISHED).
From the description, this looks like the full connection queue (accept queue) filling up during connection establishment, especially symptoms 2 and 4. To prove it, I immediately checked the queue overflow statistics with ss -s:
667399 times the listen queue of a socket overflowed
Checking it a few more times, the overflowed counter kept increasing, so it was clear that the accept queue on the server was overflowing.
Next, check what the OS does on overflow:
# cat /proc/sys/net/ipv4/tcp_abort_on_overflow
0
tcp_abort_on_overflow = 0 means that if the full connection queue is full at step 3 of the handshake, the server throws away the ACK sent by the client (from the server's point of view the connection has not been established yet).
To prove that the client application errors were related to the full connection queue being full, I first changed tcp_abort_on_overflow to 1. With 1, if the full connection queue is full at step 3, the server sends a reset packet to the client, i.e. it aborts the handshake and the connection (which, from the server's point of view, had never been established anyway).
In the following tests the client indeed logged many "connection reset by peer" errors, which proved that this was the cause of the client errors.
The developers then looked at the Java source and found that the default socket backlog (the value that controls the size of the full connection queue, described in detail below) is 50. They increased it and re-ran the load test; after more than 12 hours of stress testing the error never appeared again and the overflowed counter stopped growing.
That solved the problem. In short, after the TCP three-way handshake there is an accept queue, and only connections that get into this queue can move from LISTEN to accepted. The default backlog is 50, which fills up easily. Once it is full, at step 3 the server simply ignores the ACK sent by the client (and after a while retransmits the step-2 SYN+ACK); if the connection never manages to get into the queue, the client ends up with an error.
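Since the server in this story uses NIO, the fix described above would look roughly like the following sketch (the port and backlog value are illustrative, not the article's actual code): pass an explicit backlog when binding instead of relying on the default of 50.

import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class NioBacklogSketch {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        // The second argument is the backlog, i.e. the requested size of the
        // accept (full connection) queue; the kernel caps it at somaxconn.
        server.bind(new InetSocketAddress(3306), 1024);
        // ... register with the long-lived Selector and accept() as usual ...
    }
}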
A deeper look at connection establishment and the queues involved in the TCP handshake
(image source: http://www.cnxct.com/something-about-phpfpm-s-backlog/)
As shown in the figure above, there are two queues: the syns queue (half connection queue) and the accept queue (full connection queue).
In the three-way handshake, when the server receives the client's SYN in step 1, it stores the related information in the half connection queue and replies with SYN+ACK (step 2).
SYN flood attacks, for example, target the half connection queue: the attacker keeps initiating connections but only performs step 1, and on receiving the server's SYN+ACK in step 2 deliberately drops it and does nothing, so this queue on the server fills up and other, legitimate requests cannot get in.
At step 3 the server receives the client's ACK. If the full connection queue is not full at that point, the server moves the entry from the half connection queue into the full connection queue; otherwise it behaves as tcp_abort_on_overflow dictates.
If the full connection queue is full and tcp_abort_on_overflow is 0, the server will after a while send the SYN+ACK to the client again (i.e. redo step 2 of the handshake); if the client's timeout is fairly short, errors follow easily.
On our OS the default number of retries of step 2 is 2 (CentOS defaults to 5):
net.ipv4.tcp_synack_retries = 2
Which metrics show that the TCP connection queues are overflowing?
The troubleshooting above was a bit roundabout; what quicker, more direct means are there to confirm the problem next time?
netstat -s
[root@server ~]# netstat -s | egrep "listen|LISTEN"
667399 times the listen queue of a socket overflowed
667399 SYNs to LISTEN sockets ignored
The 667399 times above is the number of times the full connection queue has overflowed. Run the command every few seconds; if the number keeps growing, the full connection queue is definitely filling up from time to time.
The ss command
[root@server ~]# ss -lnt
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 50 *:3306 *:*
In the output above, the second column, Send-Q, shows that the full connection queue of the listen port in the third column has a maximum size of 50, and the first column, Recv-Q, shows how much of the full connection queue is currently in use.
The size of the full connection queue is min(backlog, somaxconn): backlog is passed in when the socket is created, and somaxconn is an OS-level parameter.
The size of the half connection queue is max(64, /proc/sys/net/ipv4/tcp_max_syn_backlog); different OS versions differ slightly.
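A hedged way to see min(backlog, somaxconn) in action (the port and numbers are made up): bind with a very large backlog, keep the socket listening, and compare the Send-Q value reported by ss -lnt with net.core.somaxconn (128 by default, as noted earlier).

import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BacklogCapDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket ss = new ServerSocket();
        // Ask for a huge backlog; the effective accept queue is still min(backlog, somaxconn).
        ss.bind(new InetSocketAddress(9999), 65535);
        System.out.println("listening on port 9999; run `ss -lnt` and check the Send-Q column");
        Thread.sleep(Long.MAX_VALUE); // keep the listener alive so it can be observed
    }
}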
Putting the above understanding to the test
I changed the backlog in the Java code to 10 (the smaller it is, the easier it overflows) and kept the load running. The client soon started reporting errors again, and on the server ss showed:
Fri May 5 13:50:23 CST 2017
Recv-Q Send-Q Local Address:Port Peer Address:Port
11 10 *:3306 *:*
According to the earlier explanation, the full connection queue of the service on port 3306 has a maximum size of 10, but there are now 11 connections in, or waiting to enter, the queue, so one of them is bound to overflow.
Accept queue parameters in common containers
Tomcat uses short connections by default. Its backlog (in Tomcat terminology, acceptCount) defaults to 200 in Ali-Tomcat and to 100 in Apache Tomcat.
#ss -lnt
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 100 *:8080 *:*
Nginx defaults to 511:
$sudo ss -lnt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 511 *:8085 *:*
LISTEN 0 511 *:8085 *:*
Because Nginx runs in multi-process mode, several processes listen on the same port, which helps avoid context switches and improves performance.
Further thoughts
If the client has completed step 3 and, from its point of view, the connection is established, but the corresponding connection on the server is not actually ready, what happens when the client sends data to the server? (Some people say the server will reset it; let's check in practice.)
First, an example:
(packet capture screenshot; image from: http://blog.chinaunix.net/uid-20662820-id-4154399.html)
As shown in the capture, packet 150166 is the step-3 ACK from client to server. In packet 150167 the client then sends 816 bytes of data to the server, because at this point the client believes the connection is established; but the connection is not actually ready on the server side, so the server does not reply. After a while the client decides the packet was lost and retransmits those 816 bytes, over and over until it gives up, at which point the client sends a FIN to close the connection.
This problem is also called client fooling; see https://github.com/torvalds/linux/commit/5ea8ea2cb7f1d0db15762c9b0bb9e7330425a071 (thanks to 淺奕 for the pointer).
So judging from the actual capture, the server does not send a reset; it simply ignores these packets, the client retransmits, and after a certain number of attempts the client gives up and closes the connection.
A strange thing noticed along the way
[root@server ~]# date; netstat -s | egrep "listen|LISTEN"
Fri May 5 15:39:58 CST 2017
1641685 times the listen queue of a socket overflowed
1641685 SYNs to LISTEN sockets ignored
[root@server ~]# date; netstat -s | egrep "listen|LISTEN"
Fri May 5 15:39:59 CST 2017
1641906 times the listen queue of a socket overflowed
1641906 SYNs to LISTEN sockets ignored
As shown above:
overflowed and ignored are always exactly equal and increase in lockstep. overflowed is the number of full connection queue overflows and "SYNs ... ignored" the number of half connection queue overflows; that seemed like too much of a coincidence.
Looking at the kernel source (http://elixir.free-electrons.com/linux/v3.18/source/net/ipv4/tcp_ipv4.c):
you can see that whenever an overflow happens, drop++ happens as well (the "socket ignored" counter), i.e. drop is always greater than or equal to overflow.
I also checked these two counters on several other servers to confirm that drop >= overflow:
server1
150 SYNs to LISTEN sockets dropped
server2
193 SYNs to LISTEN sockets dropped
server3
16329 times the listen queue of a socket overflowed
16422 SYNs to LISTEN sockets dropped
server4
20 times the listen queue of a socket overflowed
51 SYNs to LISTEN sockets dropped
server5
984932 times the listen queue of a socket overflowed
988003 SYNs to LISTEN sockets dropped
So, does a full accept queue affect the half connection queue?
Look at the source for step 1 of the three-way handshake (http://elixir.free-electrons.com/linux/v2.6.33/source/net/ipv4/tcp_ipv4.c#L1249):
If the full connection queue is full during step 1 of the handshake, it can indeed cause the half connection (the SYN) to be dropped at step 1. The rough flow is:
tcp_v4_do_rcv->tcp_rcv_state_process->tcp_v4_conn_request
// If the accept backlog queue is full and the number of request sockets that have
// not yet timed out is greater than 1, drop the current request
if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
    goto drop;
Summary
Overflow of the full and half connection queues is easy to overlook but very important, especially for applications built around short connections (such as Nginx and PHP, although they support long connections too), where it breaks out more easily. Once overflow happens, CPU usage and thread states all look normal, yet throughput will not go up; from the client's point of view the response time is high (rt = network + queuing + actual service time), while the actual service time recorded in the server logs looks short.
I hope this article helps you understand the concepts, mechanism and purpose of the half connection queue and the full connection queue during TCP connection establishment, and, more importantly, which metrics make these problems visible.
Every concrete problem is the best learning opportunity; reading alone never gives as deep an understanding, so cherish each one and get to the bottom of it.
References:
http://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html
http://www.cnblogs.com/zengkefu/p/5606696.html
http://www.cnxct.com/something-about-phpfpm-s-backlog/
http://jaseywang.me/2014/07/20/tcp-queue-%E7%9A%84%E4%B8%80%E4%BA%9B%E9%97%AE%E9%A2%98/
http://jin-yang.github.io/blog/network-synack-queue.html#
http://blog.chinaunix.net/uid-20662820-id-4154399.html
https://www.atatech.org/articles/12919
http://jm.taobao.org/2017/05/25/525-1/
How TCP backlog works in Linux
When an application puts a socket into LISTEN state using the listen syscall, it needs to specify a backlog for that socket. The backlog is usually described as the limit for the queue of incoming connections.
Because of the 3-way handshake used by TCP, an incoming connection goes through an intermediate state SYN RECEIVED before it reaches the ESTABLISHED state and can be returned by the accept syscall to the application (see the part of the TCP state diagram reproduced above). This means that a TCP/IP stack has two options to implement the backlog queue for a socket in LISTEN state:
- The implementation uses a single queue, the size of which is determined by the backlog argument of the listen syscall. When a SYN packet is received, it sends back a SYN/ACK packet and adds the connection to the queue. When the corresponding ACK is received, the connection changes its state to ESTABLISHED and becomes eligible for handover to the application. This means that the queue can contain connections in two different states: SYN RECEIVED and ESTABLISHED. Only connections in the latter state can be returned to the application by the accept syscall.
- The implementation uses two queues, a SYN queue (or incomplete connection queue) and an accept queue (or complete connection queue). Connections in state SYN RECEIVED are added to the SYN queue and later moved to the accept queue when their state changes to ESTABLISHED, i.e. when the ACK packet in the 3-way handshake is received. As the name implies, the accept call is then implemented simply to consume connections from the accept queue. In this case, the backlog argument of the listen syscall determines the size of the accept queue.
Historically, BSD derived TCP implementations use the first approach. That choice implies that when the maximum backlog is reached, the system will no longer send back SYN/ACK packets in response to SYN packets. Usually the TCP implementation will simply drop the SYN packet (instead of responding with a RST packet) so that the client will retry. This is what is described in section 14.5, listen Backlog Queue, in W. Richard Stevens' classic textbook TCP/IP Illustrated, Volume 3.
Note that Stevens actually explains that the BSD implementation does use two separate queues, but they behave as a single queue with a fixed maximum size determined by (but not necessarily exactly equal to) the backlog argument, i.e. BSD logically behaves as described in option 1:
The queue limit applies to the sum of […] the number of entries on the incomplete connection queue […] and […] the number of entries on the completed connection queue […].
On Linux, things are different, as mentioned in the man page of the listen syscall:
The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog.
This means that current Linux versions use the second option with two distinct queues: a SYN queue with a size specified by a system wide setting and an accept queue with a size specified by the application.
The interesting question is now how such an implementation behaves if the accept queue is full and a connection needs to be moved from the SYN queue to the accept queue, i.e. when the ACK packet of the 3-way handshake is received. This case is handled by the tcp_check_req function in net/ipv4/tcp_minisocks.c. The relevant code reads:
child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL);
if (child == NULL)
    goto listen_overflow;
For IPv4, the first line of code will actually call tcp_v4_syn_recv_sock in net/ipv4/tcp_ipv4.c, which contains the following code:
if (sk_acceptq_is_full(sk))
    goto exit_overflow;
We see here the check for the accept queue. The code after the exit_overflow label will perform some cleanup, update the ListenOverflows and ListenDrops statistics in /proc/net/netstat and then return NULL. This will trigger the execution of the listen_overflow code in tcp_check_req:
listen_overflow:
    if (!sysctl_tcp_abort_on_overflow) {
        inet_rsk(req)->acked = 1;
        return NULL;
    }
This means that unless /proc/sys/net/ipv4/tcp_abort_on_overflow is set to 1 (in which case the code right after the code shown above will send a RST packet), the implementation basically does… nothing!
To summarize, if the TCP implementation in Linux receives the ACK packet of the 3-way handshake and the accept queue is full, it will basically ignore that packet. At first, this sounds strange, but remember that there is a timer associated with the SYN RECEIVED state: if the ACK packet is not received (or if it is ignored, as in the case considered here), then the TCP implementation will resend the SYN/ACK packet (with a certain number of retries specified by /proc/sys/net/ipv4/tcp_synack_retries and using an exponential backoff algorithm).
This can be seen in the following packet trace for a client attempting to connect (and send data) to a socket that has reached its maximum backlog:
0.000 127.0.0.1 -> 127.0.0.1 TCP 74 53302 > 9999 [SYN] Seq=0 Len=0
0.000 127.0.0.1 -> 127.0.0.1 TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
0.000 127.0.0.1 -> 127.0.0.1 TCP 66 53302 > 9999 [ACK] Seq=1 Ack=1 Len=0
0.000 127.0.0.1 -> 127.0.0.1 TCP 71 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
0.207 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
0.623 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
1.199 127.0.0.1 -> 127.0.0.1 TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
1.199 127.0.0.1 -> 127.0.0.1 TCP 66 [TCP Dup ACK 6#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
1.455 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
3.123 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
3.399 127.0.0.1 -> 127.0.0.1 TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
3.399 127.0.0.1 -> 127.0.0.1 TCP 66 [TCP Dup ACK 10#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
6.459 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
7.599 127.0.0.1 -> 127.0.0.1 TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
7.599 127.0.0.1 -> 127.0.0.1 TCP 66 [TCP Dup ACK 13#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
13.131 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
15.599 127.0.0.1 -> 127.0.0.1 TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
15.599 127.0.0.1 -> 127.0.0.1 TCP 66 [TCP Dup ACK 16#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
26.491 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
31.599 127.0.0.1 -> 127.0.0.1 TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
31.599 127.0.0.1 -> 127.0.0.1 TCP 66 [TCP Dup ACK 19#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
53.179 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
106.491 127.0.0.1 -> 127.0.0.1 TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
106.491 127.0.0.1 -> 127.0.0.1 TCP 54 9999 > 53302 [RST] Seq=1 Len=0
Since the TCP implementation on the client side gets multiple SYN/ACK packets, it will assume that the ACK packet was lost and resend it (see the lines with TCP Dup ACK in the above trace). If the application on the server side reduces the backlog (i.e. consumes an entry from the accept queue) before the maximum number of SYN/ACK retries has been reached, then the TCP implementation will eventually process one of the duplicate ACKs, transition the state of the connection from SYN RECEIVED to ESTABLISHED and add it to the accept queue. Otherwise, the client will eventually get a RST packet (as in the sample shown above).
The packet trace also shows another interesting aspect of this behavior. From the point of view of the client, the connection will be in state ESTABLISHED after reception of the first SYN/ACK. If it sends data (without waiting for data from the server first), then that data will be retransmitted as well. Fortunately TCP slow-start should limit the number of segments sent during this phase.
On the other hand, if the client first waits for data from the server and the server never reduces the backlog, then the end result is that on the client side, the connection is in state ESTABLISHED, while on the server side, the connection is considered CLOSED. This means that we end up with a half-open connection!
There is one other aspect that we didn't discuss yet. The quote from the listen man page suggests that every SYN packet would result in the addition of a connection to the SYN queue (unless that queue is full). That is not exactly how things work. The reason is the following code in the tcp_v4_conn_request function (which does the processing of SYN packets) in net/ipv4/tcp_ipv4.c:
/* Accept backlog is full. If we have already queued enough
 * of warm entries in syn queue, drop request. It is better than
 * clogging syn queue with openreqs with exponentially increasing
 * timeout.
 */
if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) {
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
    goto drop;
}
What this means is that if the accept queue is full, then the kernel will impose a limit on the rate at which SYN packets are accepted. If too many SYN packets are received, some of them will be dropped. In this case, it is up to the client to retry sending the SYN packet and we end up with the same behavior as in BSD derived implementations.
To conclude, let’s try to see why the design choice made by Linux would be superior to the traditional BSD implementation. Stevens makes the following interesting point:
The backlog can be reached if the completed connection queue fills (i.e., the server process or the server host is so busy that the process cannot call accept fast enough to take the completed entries off the queue) or if the incomplete connection queue fills. The latter is the problem that HTTP servers face, when the round-trip time between the client and server is long, compared to the arrival rate of new connection requests, because a new SYN occupies an entry on this queue for one round-trip time. […]
The completed connection queue is almost always empty because when an entry is placed on this queue, the server's call to accept returns, and the server takes the completed connection off the queue.
The solution suggested by Stevens is simply to increase the backlog. The problem with this is that it assumes that an application is expected to tune the backlog not only taking into account how it intends to process newly established incoming connections, but also in function of traffic characteristics such as the round-trip time. The implementation in Linux effectively separates these two concerns: the application is only responsible for tuning the backlog such that it can call accept fast enough to avoid filling the accept queue; a system administrator can then tune /proc/sys/net/ipv4/tcp_max_syn_backlog based on traffic characteristics.
http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html
SocketOption in detail: making sense of TCP
All the SocketOptions in the Java Socket API that control TCP:
SO_KEEPALIVE setKeepAlive
SO_OOBINLINE setOOBInline
SO_RCVBUF setReceiveBufferSize
SO_SNDBUF setSendBufferSize
SO_TIMEOUT setSoTimeout
TCP_NODELAY setTcpNoDelay
SO_REUSEADDR setReuseAddress
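As a hedged usage sketch (the buffer sizes and timeout are arbitrary example values), these options map onto plain setter calls on java.net.Socket:

import java.net.Socket;

public class SocketOptionDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();
        socket.setKeepAlive(true);              // SO_KEEPALIVE
        socket.setOOBInline(false);             // SO_OOBINLINE
        socket.setReceiveBufferSize(64 * 1024); // SO_RCVBUF
        socket.setSendBufferSize(64 * 1024);    // SO_SNDBUF
        socket.setSoTimeout(3000);              // SO_TIMEOUT: read() timeout in milliseconds
        socket.setTcpNoDelay(true);             // TCP_NODELAY: disable Nagle's algorithm
        socket.setReuseAddress(true);           // SO_REUSEADDR
        // ... connect and use the socket ...
        socket.close();
    }
}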
/**
* Connects this socket to the server with a specified timeout value.
* A timeout of zero is interpreted as an infinite timeout. The connection
* will then block until established or an error occurs.
*
* @param endpoint the <code>SocketAddress</code>
* @param timeout the timeout value to be used in milliseconds.
* @throws IOException if an error occurs during the connection
* @throws SocketTimeoutException if timeout expires before connecting
* @throws java.nio.channels.IllegalBlockingModeException
* if this socket has an associated channel,
* and the channel is in non-blocking mode
* @throws IllegalArgumentException if endpoint is null or is a
* SocketAddress subclass not supported by this socket
* @since 1.4
* @spec JSR-51
*/
public void connect(SocketAddress endpoint, int timeout) throws IOException — the timeout bounds connection establishment, i.e. how long the client waits during the three-way handshake (for the step-2 SYN+ACK).
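A small usage sketch (the host, port and 3-second timeout are just examples):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ConnectTimeoutDemo {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket();
        try {
            // Fail with SocketTimeoutException if the connection is not
            // established within 3 seconds; 0 would mean wait indefinitely.
            socket.connect(new InetSocketAddress("127.0.0.1", 9999), 3000);
        } catch (SocketTimeoutException e) {
            System.out.println("connect timed out: " + e.getMessage());
        } finally {
            socket.close();
        }
    }
}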
/**
* Create a server with the specified port, listen backlog, and
* local IP address to bind to. The <i>bindAddr</i> argument
* can be used on a multi-homed host for a ServerSocket that
* will only accept connect requests to one of its addresses.
* If <i>bindAddr</i> is null, it will default accepting
* connections on any/all local addresses.
* The port must be between 0 and 65535, inclusive.
* A port number of <code>0</code> means that the port number is
* automatically allocated, typically from an ephemeral port range.
* This port number can then be retrieved by calling
* {@link #getLocalPort getLocalPort}.
*
* <P>If there is a security manager, this method
* calls its <code>checkListen</code> method
* with the <code>port</code> argument
* as its argument to ensure the operation is allowed.
* This could result in a SecurityException.
*
* The <code>backlog</code> argument is the requested maximum number of
* pending connections on the socket. Its exact semantics are implementation
* specific. In particular, an implementation may impose a maximum length
* or may choose to ignore the parameter altogether. The value provided
* should be greater than <code>0</code>. If it is less than or equal to
* <code>0</code>, then an implementation specific default will be used.
* <P>
* @param port the port number, or <code>0</code> to use a port
* number that is automatically allocated.
* @param backlog requested maximum length of the queue of incoming
* connections.
* @param bindAddr the local InetAddress the server will bind to
*
* @throws SecurityException if a security manager exists and
* its <code>checkListen</code> method doesn't allow the operation.
*
* @throws IOException if an I/O error occurs when opening the socket.
* @exception IllegalArgumentException if the port parameter is outside
* the specified range of valid port values, which is between
* 0 and 65535, inclusive.
*
* @see SocketOptions
* @see SocketImpl
* @see SecurityManager#checkListen
* @since JDK1.1
*/
public ServerSocket(int port, int backlog, InetAddress bindAddr) throws IOException {
//backlog: requested length of the accept (full connection) queue; values <= 0 fall back to the default of 50
setImpl();
if (port < 0 || port > 0xFFFF)
throw new IllegalArgumentException(
"Port value out of range: " + port);
if (backlog < 1)
backlog = 50;
try {
bind(new InetSocketAddress(bindAddr, port), backlog);
} catch(SecurityException e) {
close();
throw e;
} catch(IOException e) {
close();
throw e;
}
}
Methods and parameters in detail
1. setSoLinger
In Java sockets, when we call Socket.close(), the default behavior is to close the connection only after all buffered data has been sent out by the lower layers.
With setSoLinger we can change what close() does:
1. setSoLinger(true, 0)
When the close request is issued, an RST packet is sent immediately to tear down the connection, regardless of whether all data has been sent.
2. setSoLinger(true, delay_time)
When the close request is issued, wait up to delay_time:
if all data is sent within delay_time, the connection is closed with the normal four-way handshake;
if the data is not fully sent within delay_time, an RST packet is sent to tear down the connection.
/**
* Specify a linger-on-close timeout. This option disables/enables
* immediate return from a <B>close()</B> of a TCP Socket. Enabling
* this option with a non-zero Integer <I>timeout</I> means that a
* <B>close()</B> will block pending the transmission and acknowledgement
* of all data written to the peer, at which point the socket is closed
* <I>gracefully</I>. Upon reaching the linger timeout, the socket is
* closed <I>forcefully</I>, with a TCP RST. Enabling the option with a
* timeout of zero does a forceful close immediately. If the specified
* timeout value exceeds 65,535 it will be reduced to 65,535.
* <P>
* Valid only for TCP: SocketImpl
*
* @see Socket#setSoLinger
* @see Socket#getSoLinger
*/
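A minimal usage sketch of the behavior described above (the address and the 5-second linger value are just examples):

import java.net.InetSocketAddress;
import java.net.Socket;

public class SoLingerDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress("127.0.0.1", 9999), 3000);

        // close() will block up to 5 seconds waiting for unsent data to drain;
        // if the timeout expires, the connection is torn down with an RST.
        socket.setSoLinger(true, 5);

        // socket.setSoLinger(true, 0) would instead reset the connection immediately on close().

        socket.getOutputStream().write("hello".getBytes());
        socket.close(); // blocks for at most 5 seconds
    }
}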
2. setKeepAlive
/**
* When the keepalive option is set for a TCP socket and no data
* has been exchanged across the socket in either direction for
* 2 hours (NOTE: the actual value is implementation dependent),
* TCP automatically sends a keepalive probe to the peer. This probe is a
* TCP segment to which the peer must respond.
* One of three responses is expected:
* 1. The peer responds with the expected ACK. The application is not
* notified (since everything is OK). TCP will send another probe
* following another 2 hours of inactivity.
* 2. The peer responds with an RST, which tells the local TCP that
* the peer host has crashed and rebooted. The socket is closed.
* 3. There is no response from the peer. The socket is closed.
*
* The purpose of this option is to detect if the peer host crashes.
* Valid only for TCP socket: SocketImpl
After a TCP connection is established, if the application or upper-layer protocol never sends data, or only sends data at long intervals, the keepalive mechanism is needed to determine whether the peer is still there and whether the connection should be kept. Once the connection has carried no data for a certain time, TCP automatically sends a segment with no payload to the peer: if the peer replies, it is still alive and the connection is kept; if the peer does not reply after a certain number of retries, the connection is considered lost and is released.
This option controls probing of idle connections: once a connection has been idle for 7200 seconds, probe segments start to be sent.
net.ipv4.tcp_keepalive_time: in seconds, the idle time before probes are sent; default 7200.
net.ipv4.tcp_keepalive_intvl: in seconds, the interval between two probes; default 75.
net.ipv4.tcp_keepalive_probes: the number of probes; default 9.
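A minimal sketch of turning this on from Java (note that the probe timing itself comes from the OS-level sysctls listed above, not from the Java API; the address is an example):

import java.net.InetSocketAddress;
import java.net.Socket;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress("127.0.0.1", 9999), 3000);

        // Enable SO_KEEPALIVE: once the connection has been idle for
        // tcp_keepalive_time seconds, the kernel starts sending probes.
        socket.setKeepAlive(true);

        // ... use the connection; a dead peer is eventually detected after
        // tcp_keepalive_probes failed probes and the socket is closed ...
    }
}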
3. backlog
For details, see: http://jm.taobao.org/2017/05/25/525-1/
Copyright notice: the above is from an original article by CSDN blogger "gold_zwj", licensed under CC 4.0 BY-SA; reposts must keep the original link and this notice.
Original link: https://blog.csdn.net/zwjyyy1203/article/details/93932967