TCP Source Code: Connection Establishment


I. SYN segment handling:

Common path: tcp_v4_rcv->tcp_v4_do_rcv->tcp_v4_cookie_check (no action taken)->tcp_rcv_state_process->tcp_v4_conn_request[conn_request]->tcp_conn_request (is passed the two ops structures tcp_request_sock_ops and tcp_request_sock_ipv4_ops; this function parses the options of the SYN segment and initializes req)

 

syncookie connection: inet_reqsk_alloc (creates the request_sock)->[tcp_conn_request]tcp_v4_send_synack [send_synack] (sends the SYNACK)->tcp_make_synack->[tcp_conn_request]reqsk_free (with syncookies the request_sock is not kept, so it has to be freed here)

TFO connection: inet_reqsk_alloc (creates the request_sock)->[tcp_conn_request]tcp_try_fastopen (fastopen is attempted when no syncookie is needed)->tcp_fastopen_create_child (sets up the TFO retransmission timer and receives the data carried with the SYN segment)->tcp_v4_syn_recv_sock[syn_recv_sock]->tcp_create_openreq_child (creates newsk in the SYN_RCVD state; sock_copy copies the member values of the listening socket)->[tcp_v4_syn_recv_sock]inet_ehash_nolisten (inserts newsk into the ehash hash table)->[tcp_conn_request]tcp_v4_send_synack [send_synack] (sends the SYNACK)->tcp_make_synack->[tcp_conn_request]inet_csk_reqsk_queue_add (inserts req into the accept queue)

Ordinary connection: inet_reqsk_alloc (creates the request_sock)->[tcp_conn_request]inet_csk_reqsk_queue_hash_add->reqsk_queue_hash_req (inserts req into the ehash hash table and arms the SYNACK retransmission timer for ordinary connections)->inet_ehash_insert->[inet_csk_reqsk_queue_hash_add]inet_csk_reqsk_queue_added (updates the length qlen of the half-open logical queue)->[tcp_conn_request]tcp_v4_send_synack [send_synack] (sends the SYNACK)->tcp_make_synack

 

tcp_conn_request:

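1. Decide whether syncookies are enabled. The kernel must first be built with the CONFIG_SYN_COOKIES option; syncookies are then used in the following two cases:

  • tcp_syncookies=2

  • tcp_syncookies=1, the half-open logical queue is full, and the SYN was not handed over from a timewait socket

2. If the accept queue is full and the half-open queue still holds requests whose SYNACK has not been retransmitted, the SYN connection request is dropped outright.

3. When the SYN does not come from a timewait transition and syncookies are not in use: if the tcp_tw_recycle parameter is in effect, a stricter PAWS check is performed; if tcp_tw_recycle is not in effect and tcp_syncookies=0, 1/4 of the half-open queue is reserved for destinations that are already proven in the TCP metrics.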
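The three checks above can be sketched as a small user-space model. This is only an illustration of the rules as listed, not the kernel code: `SynContext`, `classify_syn` and all field names are hypothetical, the tcp_tw_recycle PAWS branch is left out, and the rule that a full half-open queue without syncookies leads to a drop is an assumption added to make the model complete.

```python
from dataclasses import dataclass


@dataclass
class SynContext:
    """Inputs to the decision; every field name here is hypothetical."""
    tcp_syncookies: int = 1          # value of the tcp_syncookies sysctl (0, 1 or 2)
    syn_queue_len: int = 0           # current length of the half-open logical queue
    max_syn_backlog: int = 128       # capacity of the half-open logical queue
    accept_queue_full: bool = False  # accept queue has reached its limit
    young_requests: int = 0          # requests whose SYNACK was never retransmitted
    from_timewait: bool = False      # SYN was handed over from a timewait socket
    peer_proven: bool = False        # destination is proven in the TCP metrics


def classify_syn(ctx: SynContext) -> str:
    """Return 'drop', 'syncookie' or 'normal' for an incoming SYN."""
    syn_queue_full = ctx.syn_queue_len >= ctx.max_syn_backlog

    # Rule 1: syncookies apply with tcp_syncookies=2, or with
    # tcp_syncookies=1 when the queue is full and the SYN is not from timewait.
    want_cookie = ctx.tcp_syncookies == 2 or (
        ctx.tcp_syncookies == 1 and syn_queue_full and not ctx.from_timewait
    )
    # Assumption: a full queue with no syncookie available means a drop.
    if syn_queue_full and not ctx.from_timewait and not want_cookie:
        return "drop"

    # Rule 2: accept queue full while un-retransmitted requests are pending.
    # The exact threshold used here (> 0) is an assumption.
    if ctx.accept_queue_full and ctx.young_requests > 0:
        return "drop"

    # Rule 3: with tcp_syncookies=0, keep the last quarter of the queue
    # for destinations that are already proven.
    if not want_cookie and not ctx.from_timewait and ctx.tcp_syncookies == 0:
        free_slots = ctx.max_syn_backlog - ctx.syn_queue_len
        if free_slots < ctx.max_syn_backlog // 4 and not ctx.peer_proven:
            return "drop"

    return "syncookie" if want_cookie else "normal"
```

For example, `classify_syn(SynContext(tcp_syncookies=0, syn_queue_len=100))` falls into the reserved quarter of a 128-entry queue and is dropped, while the same call with `peer_proven=True` is admitted as a normal request.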

 

II. SYNACK handling

tcp_v4_rcv->tcp_v4_do_rcv->tcp_rcv_state_process->tcp_rcv_synsent_state_process

 

tcp_rcv_synsent_state_process:

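Normal flow when the ACK flag is set:

1. Validate the ack number, TSopt, RST, SYN and so on of the received packet

2. Initialize mtup, fack, mss and so on

3. Call tcp_finish_connect to complete the connection

4. For TFO, enter tcp_rcv_fastopen_synack to handle the TFO-specific work, such as retransmitting data that was not ACKed and restarting the retransmission timer

5. If a write is pending, or the TCP_QUICKACK option is set to 0, or TCP_DEFER_ACCEPT is set to 1, a delayed ACK is scheduled, which saves one ACK segment.

When the ACK flag is not set:

1. Check whether the RST flag is set

2. Perform the PAWS check

3. If the SYN flag is set, switch to the TCP_SYN_RECV state and send a SYNACK to handle a TCP simultaneous open.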
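The two flows of tcp_rcv_synsent_state_process can be pictured with a simplified state model. This is a sketch under stated assumptions rather than the kernel logic: `Segment`, `SockOptions`, `synsent_process` and the action strings are hypothetical names, the individual validity checks are collapsed into the boolean fields `ack_acceptable` and `paws_ok`, and TFO handling is omitted.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Flags of the received segment; field names are hypothetical."""
    syn: bool = False
    ack: bool = False
    rst: bool = False
    ack_acceptable: bool = True  # stands in for the ack number and TSopt checks
    paws_ok: bool = True         # stands in for the PAWS check


@dataclass
class SockOptions:
    """Socket conditions that influence the delayed-ACK decision."""
    write_pending: bool = False
    quickack: bool = True        # False models TCP_QUICKACK set to 0
    defer_accept: bool = False   # True models TCP_DEFER_ACCEPT being set


def synsent_process(seg: Segment, opts: SockOptions):
    """Return (next_state, action) for a socket currently in SYN_SENT."""
    if seg.ack:
        if not seg.ack_acceptable:
            return ("SYN_SENT", "reject")
        if seg.rst:
            return ("CLOSE", "abort")
        if not seg.syn:
            return ("SYN_SENT", "discard")
        # Connection is complete; decide whether the ACK can be delayed.
        if opts.write_pending or not opts.quickack or opts.defer_accept:
            return ("ESTABLISHED", "delay_ack")
        return ("ESTABLISHED", "send_ack")

    # No ACK flag: RST and PAWS failures are discarded,
    # a bare SYN is a simultaneous open.
    if seg.rst or not seg.paws_ok:
        return ("SYN_SENT", "discard")
    if seg.syn:
        return ("SYN_RECV", "send_synack")
    return ("SYN_SENT", "discard")
```

Calling `synsent_process(Segment(syn=True), SockOptions())` models the simultaneous-open case and yields the SYN_RECV transition, while a SYN plus ACK with `quickack=False` completes the connection with a delayed ACK.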

 

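tcp_finish_connect: updates the state to TCP_ESTABLISHED

1. Initialize the metrics

2. Invoke the init hook of the congestion control module to initialize congestion control

3. Initialize rcvbuf and sndbuf

4. Initialize keepalive

5. Decide whether to enable the fast path for packet processing

6. Wake up the process waiting on the connection through sock_def_wakeup[sk->sk_state_change]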

 

sock_def_wakeup[sk->sk_state_change]

When a segment for sk is queued on the prequeue or the backlog, it is eventually processed by tcp_v4_do_rcv; see release_sock.

 

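III. Handling the final ACK:

syncookie connection: tcp_v4_rcv->tcp_v4_do_rcv->tcp_v4_cookie_check->cookie_v4_check->inet_reqsk_alloc (creates the request_sock)->[cookie_v4_check]tcp_get_cookie_sock->tcp_v4_syn_recv_sock[syn_recv_sock]->tcp_create_openreq_child (creates newsk in the SYN_RCVD state; sock_copy copies the member values of the listening socket)->[tcp_v4_syn_recv_sock]inet_ehash_nolisten (inserts newsk into the ehash hash table)->[tcp_v4_do_rcv] tcp_child_process->tcp_rcv_state_process (switches newsk from SYN_RCVD to ESTABLISHED)->sock_def_readable[parent->sk_data_ready] (wakes up the processes waiting in sk->sk_wq)

TFO connection: tcp_v4_rcv->tcp_v4_do_rcv (tcp_v4_rcv may also branch into other functions)->tcp_rcv_state_process [switches the TFO sock from SYN_RCVD to ESTABLISHED]->tcp_check_req (only validates the segment and returns early; it neither handles defer_accept nor calls tcp_v4_syn_recv_sock)->reqsk_fastopen_remove (removes this TFO connection from the fastopen logical queue)

Ordinary connection: tcp_v4_rcv (looks up the request_sock that was created and inserted into ehash earlier, in the NEW_SYN_RCVD state)->tcp_check_req (validates the segment: PAWS, whether the sequence number is inside the window, SYNACK retransmission for a pure SYN, defer_accept handling and so on)->tcp_v4_syn_recv_sock[syn_recv_sock]->tcp_create_openreq_child (creates newsk in the SYN_RCVD state; sock_copy copies the member values of the listening socket)->[tcp_v4_syn_recv_sock]inet_ehash_nolisten (inserts newsk into the ehash hash table, replacing the req inserted earlier)->[tcp_check_req]inet_csk_complete_hashdance (updates the half-open queue length, qlen=qlen-1, and inserts req into the accept queue)->[tcp_v4_rcv]tcp_child_process->tcp_rcv_state_process (switches newsk from SYN_RCVD to ESTABLISHED)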

 

tcp_v4_syn_recv_sock:

Returns NULL when the accept queue is full or newsk cannot be created, and the final ACK of the three-way handshake is silently dropped.
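The bookkeeping of the ordinary path (request in ehash, half-open counter, accept queue, silent drop of the final ACK) can be illustrated with a toy model. The class `Listener` and its methods are hypothetical, a plain dictionary stands in for the ehash table, and the "queue full" test uses a simple `>=` comparison, which is an assumption and may not match the kernel's exact boundary.

```python
from collections import deque


class Listener:
    """Toy model of a listening socket; all names are hypothetical."""

    def __init__(self, backlog: int):
        self.backlog = backlog
        self.ehash = {}              # peer -> state string, stands in for ehash
        self.qlen = 0                # length of the half-open logical queue
        self.accept_queue = deque()  # connections waiting for accept()

    def on_syn(self, peer: str) -> None:
        # The request is stored in ehash and counted in the half-open queue.
        self.ehash[peer] = "NEW_SYN_RECV"
        self.qlen += 1

    def on_final_ack(self, peer: str):
        if self.ehash.get(peer) != "NEW_SYN_RECV":
            return None
        if len(self.accept_queue) >= self.backlog:
            # Accept queue full: no child is created and the ACK is
            # dropped silently; the request stays where it is.
            return None
        # The child replaces the request in ehash, the half-open counter
        # shrinks by one and the connection becomes acceptable.
        self.ehash[peer] = "ESTABLISHED"
        self.qlen -= 1
        self.accept_queue.append(peer)
        return peer

    def accept(self):
        return self.accept_queue.popleft() if self.accept_queue else None
```

With `Listener(backlog=1)`, a second peer's final ACK is ignored until the application calls `accept()` for the first one; a retransmitted ACK from that peer can then complete the handshake.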

 

__inet_inherit_port[tcp_v4_syn_recv_sock]:

A new connection established after listen does not necessarily use the same port as the listening socket; in tproxy scenarios the ports may differ.

 

tcp_check_req:

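On receiving a SYN segment that fails the PAWS check and whose sequence number does not match, an ordinary connection replies with a RST and removes req from the half-open queue, whereas TFO adds the sock to the TFO RST logical queue

On receiving a RST segment, an ordinary connection simply removes req from the half-open queue without replying with a RST, whereas TFO adds the sock to the TFO RST logical queue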

 

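The four queues:

Accept queue: holds the connections waiting for user space to call accept; the queue is icsk_accept_queue and its current length is sk->sk_ack_backlog

Half-open logical queue: counts the ordinary TCP connections (non-TFO, non-syncookie) for which the SYN has been received but the final ACK of the handshake has not; the entries are actually stored in the ehash hash table, and the queue length is (&inet_csk(sk)->icsk_accept_queue)->qlen

fastopen logical queue: counts the TFO socks that have received a SYN but have not yet entered the ESTABLISHED state; the TFO socks are stored in the ehash hash table, and the queue length is (&inet_csk(sk)->icsk_accept_queue)->fastopenq.qlen. The length is increased in tcp_fastopen_create_child and decreased in reqsk_fastopen_remove when the TFO connection enters ESTABLISHED or goes straight to FIN_WAIT1

fastopen rst queue: &inet_csk(sk)->icsk_accept_queue->fastopenq->{rskq_rst_head,rskq_rst_tail}. The difference in RST handling deserves extra attention. Entries live for 60s. This RST queue is described as follows: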

   To protect the server, it is important to limit the maximum number of
   total pending TFO connection requests, i.e., PendingFastOpenRequests
   (Section 4.2).  When the limit is exceeded, the server temporarily
   disables TFO entirely as described in "Server Cookie Handling"
   (Section 4.1.2).  Then, subsequent TFO requests will be downgraded to
   regular connection requests, i.e., with the data dropped and only
   SYNs acknowledged.  This allows regular SYN flood defense techniques
   [RFC4987] like SYN cookies to kick in and prevent further service
   disruption.

   The main impact of SYN floods against the standard TCP stack is not
   directly from the floods themselves costing TCP processing overhead
   or host memory, but rather from the spoofed SYN packets filling up
   the often small listener's queue.

   On the other hand, TFO SYN floods can cause damage directly if
   admitted without limit into the stack.  The reset (RST) packets from
   the spoofed host will fuel rather than defeat the SYN floods as
   compared to the non-TFO case, because the attacker can flood more
   SYNs with data and incur more cost in terms of data processing
   resources.  For this reason, a TFO server needs to monitor the
   connections in SYN-RCVD being reset in addition to imposing a
   reasonable max queue length.  Implementations may combine the two,
   e.g., by continuing to account for those connection requests that
   have just been reset against the listener's PendingFastOpenRequests
   until a timeout period has passed.
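The accounting idea in the quoted text, where a reset TFO request keeps occupying a slot until a timeout passes, can be sketched as follows. This is a simplified model and not the kernel implementation: `FastOpenQueue` and its method names are hypothetical, time is passed in explicitly as seconds, and the 60 second hold time is taken from the lifetime mentioned for the RST queue.

```python
from collections import deque

RST_HOLD_SECONDS = 60  # lifetime of an entry on the RST queue


class FastOpenQueue:
    """Toy model of the fastopen logical queue plus its RST queue."""

    def __init__(self, max_qlen: int):
        self.max_qlen = max_qlen
        self.qlen = 0             # pending TFO requests, including reset ones
        self.rst_queue = deque()  # expiry times of reset requests, oldest first

    def admit(self, now: float) -> bool:
        """Try to admit a new TFO request; False means fall back to a regular SYN."""
        if self.qlen >= self.max_qlen:
            # Only an expired reset entry can make room for a new request.
            if not self.rst_queue or self.rst_queue[0] > now:
                return False
            self.rst_queue.popleft()
            self.qlen -= 1
        self.qlen += 1
        return True

    def established(self) -> None:
        """The TFO connection left SYN_RCVD normally and frees its slot."""
        self.qlen -= 1

    def reset(self, now: float) -> None:
        """The request was reset; it stays counted until the hold time passes."""
        self.rst_queue.append(now + RST_HOLD_SECONDS)
```

In this model a flood of spoofed SYNs that get reset does not free any capacity for 60 seconds, so further TFO requests are refused and the listener falls back to the regular SYN defenses.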

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 





