Linux's Strange Half-Open Connection (SYN_RECV) Queue Length


Linux's Strange Half-Open Connection (SYN_RECV) Queue Length (Part 1)

>> If you repost, please credit the source: piao2010's blog 飄零的代碼. Thanks! ^_^

I've been studying TCP fundamentals lately and got better acquainted with the venerable SYN flood. A SYN flood exploits a weakness in the TCP protocol: the attacker sends a large volume of forged TCP connection requests in order to exhaust the victim's resources (pegging the CPU or running it out of memory).
The principle is simple, the implementation isn't complicated, and plenty of ready-made programs float around online.

I tested this with two virtual machines (VM C attacking VM S). S ran Apache listening on port 80; from C I sent a SYN flood at S's port 80, and with no protection in place the attack worked beautifully. netstat showed a large number of half-open (SYN_RECV) connections on port 80, and tcpdump showed SYNs arriving from a mass of forged IPs. S kept replying SYN+ACK to those addresses, but they don't exist (if they did, S would get back an RST and the attack would lose its effect), so the SYN+ACKs time out and are retransmitted.
At that point, if a legitimate client A tries to reach S's port 80, its SYN is simply dropped by S because the half-open queue is full. Attack accomplished.
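netstat reads this state from /proc/net/tcp, where SYN_RECV is hex state 03. Just for illustration, here is a minimal C sketch of my own (not part of the original test) that counts SYN_RECV sockets the same way:

/* syn_recv_count.c - count sockets in SYN_RECV by scanning /proc/net/tcp.
 * Sketch only: netstat/ss report the same thing. In /proc/net/tcp the
 * 4th field ("st") is the socket state; TCP_SYN_RECV is 0x03.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/net/tcp", "r");
	char line[512];
	unsigned int st;
	int count = 0;

	if (f == NULL) {
		perror("fopen");
		return 1;
	}
	fgets(line, sizeof(line), f);		/* skip the header row */
	while (fgets(line, sizeof(line), f)) {
		/* fields: sl local_address rem_address st ... */
		if (sscanf(line, "%*d: %*s %*s %x", &st) == 1 && st == 0x03)
			count++;
	}
	fclose(f);
	printf("SYN_RECV sockets: %d\n", count);
	return 0;
}

Run it while the flood is in progress and the count should track what netstat -ant | grep SYN_RECV shows.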

Discussions of SYN flood defense usually mention tuning net.ipv4.tcp_synack_retries, net.ipv4.tcp_syncookies, and net.ipv4.tcp_max_syn_backlog:
reduce the number of SYN+ACK retransmissions, enlarge the half-open queue, and enable SYN cookies.
With SYN cookies enabled on S the situation improves: once the half-open queue fills up, the system falls back to SYN cookies and logs "kernel: possible SYN flooding on port 80. Sending cookies." to /var/log/messages.
Even that is not a complete defense: if the instantaneous attack volume is large enough, S's CPU and memory are still finite. Big shops generally put dedicated firewall appliances in front to cope.
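For reference, these knobs all live under /proc/sys. A trivial C sketch of mine (it merely assumes the standard /proc/sys paths) that prints the current values of the three sysctls mentioned above:

/* show_syn_sysctls.c - print the three SYN-flood-related sysctls.
 * Sketch: equivalent to reading the files with cat or the sysctl tool.
 */
#include <stdio.h>

static void show(const char *path)
{
	FILE *f = fopen(path, "r");
	char buf[64];

	if (f == NULL)
		return;
	if (fgets(buf, sizeof(buf), f))
		printf("%s = %s", path, buf);	/* buf keeps its trailing newline */
	fclose(f);
}

int main(void)
{
	show("/proc/sys/net/ipv4/tcp_synack_retries");
	show("/proc/sys/net/ipv4/tcp_syncookies");
	show("/proc/sys/net/ipv4/tcp_max_syn_backlog");
	return 0;
}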

In those descriptions, net.ipv4.tcp_max_syn_backlog is usually called the length of the half-open queue, yet in my tests the number of sockets in SYN_RECV state was nowhere near the configured value of net.ipv4.tcp_max_syn_backlog.
S was configured as follows:
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_syn_backlog = 4096
yet the number of sockets in SYN_RECV state was only 256.

So I started digging through the relevant material. The first thing that came to mind was the backlog covered in TCP/IP Illustrated, Volume 1. From man 2 listen:
int listen(int sockfd, int backlog);
The backlog parameter defines the maximum length the queue of pending connections may grow to. If a connection request arrives with the queue full the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that retries succeed.

NOTES
The behaviour of the backlog parameter on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using the tcp_max_syn_backlog sysctl. When syncookies are enabled there is no logical maximum length and this sysctl setting is ignored. See tcp(7) for more information.

So since Linux 2.2, backlog specifies the length of the queue of connections that have completed the three-way handshake but have not yet been accept()ed by the application.
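To keep the two queues straight, here is a minimal listener sketch of mine (port 8080 is an arbitrary choice; backlog 511 mirrors Apache's ListenBackLog default that comes up later). The backlog argument sizes the accept queue described in the NOTES above, while the half-open queue is governed separately by tcp_max_syn_backlog:

/* listen_demo.c - minimal TCP listener; listen()'s backlog sizes the
 * accept queue (handshake done, waiting for accept()), and the kernel
 * caps it at net.core.somaxconn as shown further below.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in addr;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family      = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port        = htons(8080);	/* arbitrary test port */

	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 511) < 0) {		/* 511: Apache's ListenBackLog default */
		perror("socket/bind/listen");
		return 1;
	}
	pause();	/* never accept(), so the accept queue can fill up */
	close(fd);
	return 0;
}

With that distinction in place, back to the half-open side.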

man 7 tcp:
tcp_max_syn_backlog (integer; default: see below)
The maximum number of queued connection requests which have still not received an acknowledgement from the connecting client. If this number is exceeded, the kernel will begin dropping requests. The default value of 256 is increased to 1024 when the memory present in the system is adequate or greater (>= 128Mb), and reduced to 128 for those systems with very low memory (<= 32Mb). It is recommended that if this needs to be increased above 1024, TCP_SYNQ_HSIZE in include/net/tcp.h be modified to keep TCP_SYNQ_HSIZE*16 <= tcp_max_syn_backlog, and the kernel be recompiled.

So tcp_max_syn_backlog really is documented as the length of the half-open queue; why was it so far off in my test?
At this point I had a colleague run the same test on two of his machines, and his numbers matched tcp_max_syn_backlog exactly.
I began to suspect some configuration issue on my side, and found another suspicious setting: net.core.somaxconn. It is the upper bound on listen()'s second argument, int backlog: if the backlog passed by the program is larger than net.core.somaxconn, the kernel uses net.core.somaxconn instead. On S, net.core.somaxconn = 128.

//file:net/socket.c
 
SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
	struct socket *sock;
	int err, fput_needed;
	int somaxconn;
 
	sock = sockfd_lookup_light(fd, &err, &fput_needed);
	if (sock) {
		somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
		// cap backlog at somaxconn
		if ((unsigned)backlog > somaxconn)
			backlog = somaxconn;
 
		err = security_socket_listen(sock, backlog);
		if (!err)
			err = sock->ops->listen(sock, backlog);
 
		fput_light(sock->file, fput_needed);
	}
	return err;
}

Apache's documentation gives 511 as the default for the ListenBackLog directive, so the effective accept-queue backlog must be net.core.somaxconn = 128.
This is easy to verify: using a slow-connection attack against S, I observed the number of ESTABLISHED connections on port 80 peak at 384,
which is exactly 256 (Apache prefork MaxClients, i.e. the maximum number of concurrent connections Apache will handle) + 128 (the backlog, i.e. the maximum number of connections that have finished the three-way handshake and are waiting for Apache to accept() them). In other words, the accept-queue length equals min(backlog, somaxconn).
It's been a while since I wrote this much prose; Part 2 continues the hunt.

 
 
 

Linux's Strange Half-Open Connection (SYN_RECV) Queue Length (Part 2)

>> If you repost, please credit the source: piao2010's blog 飄零的代碼. Thanks! ^_^

Picking up from Part 1: we have confirmed how the accept-queue length is computed; now let's track down the half-open queue length.
I tried gradually lowering tcp_max_syn_backlog, but the number of SYN_RECV sockets never changed.
Out of ideas, I turned to Google, but nearly everything it returned was generic SYN flood material. Has nobody ever looked into the half-open queue length?
After several frustrating days, one night I finally found an article titled 《關於半連接隊列的釋疑》 (roughly, "Clearing up questions about the half-open queue"). Exciting! Following that author's approach I started reading the source. Note that I'm on kernel 2.6.32; other versions differ.
First, locate the tcp_v4_conn_request function, in net/ipv4/tcp_ipv4.c.

int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
	struct inet_request_sock *ireq;
	struct tcp_options_received tmp_opt;
	struct request_sock *req;
	__be32 saddr = ip_hdr(skb)->saddr;
	__be32 daddr = ip_hdr(skb)->daddr;
	__u32 isn = TCP_SKB_CB(skb)->when;
	struct dst_entry *dst = NULL;
#ifdef CONFIG_SYN_COOKIES
	int want_cookie = 0;
#else
#define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
#endif
 
	/* Never answer to SYNs send to broadcast or multicast */
	if (skb->rtable->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
		goto drop;
 
	/* TW buckets are converted to open requests without
	 * limitations, they conserve resources and peer is
	 * evidently real one.
	 */
 
	// the key function: inet_csk_reqsk_queue_is_full
	if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
#ifdef CONFIG_SYN_COOKIES
		if (sysctl_tcp_syncookies) {
			want_cookie = 1;
		} else
#endif
		goto drop;
	}
 
	/* Accept backlog is full. If we have already queued enough
	 * of warm entries in syn queue, drop request. It is better than
	 * clogging syn queue with openreqs with exponentially increasing
	 * timeout.
	 */
	if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
		goto drop;
 
	req = inet_reqsk_alloc(&tcp_request_sock_ops);
	if (!req)
		goto drop;
	// ... many more lines omitted

Step into the key function inet_csk_reqsk_queue_is_full, in include/net/inet_connection_sock.h.

static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
{
	return reqsk_queue_is_full(&inet_csk(sk)->icsk_accept_queue);
}

Step into the key function reqsk_queue_is_full, in include/net/request_sock.h.

static inline int reqsk_queue_is_full(const struct request_sock_queue *queue)
{
        // note: the test uses >> (right shift), not a greater-than comparison
	return queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
}
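So the queue counts as full as soon as qlen >> max_qlen_log is non-zero, i.e. as soon as qlen >= 2^max_qlen_log. A tiny standalone sketch of mine showing the same test:

/* shift_test.c - the "is the half-open queue full?" test in user space:
 * (qlen >> max_qlen_log) != 0  is equivalent to  qlen >= 2^max_qlen_log.
 */
#include <stdio.h>

int main(void)
{
	unsigned int max_qlen_log = 8;	/* 2^8 = 256, the value worked out below */
	unsigned int qlen;

	for (qlen = 254; qlen <= 257; qlen++)
		printf("qlen = %u -> full = %u\n", qlen, qlen >> max_qlen_log);
	/* prints full = 0 for 254 and 255, full = 1 for 256 and 257 */
	return 0;
}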

Now find the definitions of qlen and max_qlen_log, in include/net/request_sock.h.

/** struct listen_sock - listen state
 *
 * @max_qlen_log - log_2 of maximal queued SYNs/REQUESTs
 */
struct listen_sock {
	u8			max_qlen_log;	// 2^max_qlen_log = max length of the half-open queue
	/* 3 bytes hole, try to use */
	int			qlen;		// current length of the half-open queue (not the accept queue)
	int			qlen_young;
	int			clock_hand;
	u32			hash_rnd;
	u32			nr_table_entries;
	struct request_sock	*syn_table[0];
};

So the key is how max_qlen_log gets computed. The previous post already showed the listen system call:

//file:net/socket.c
 
SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
	struct socket *sock;
	int err, fput_needed;
	int somaxconn;
 
	sock = sockfd_lookup_light(fd, &err, &fput_needed);
	if (sock) {
		somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
		// cap backlog at somaxconn
		if ((unsigned)backlog > somaxconn)
			backlog = somaxconn;
 
		err = security_socket_listen(sock, backlog);
		if (!err)
		        // this is the key call
			err = sock->ops->listen(sock, backlog);
 
		fput_light(sock->file, fput_needed);
	}
	return err;
}

sock->ops->listen is actually inet_listen, in net/ipv4/af_inet.c.

int inet_listen(struct socket *sock, int backlog)
{
	struct sock *sk = sock->sk;
	unsigned char old_state;
	int err;
 
	lock_sock(sk);
 
	err = -EINVAL;
	if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
		goto out;
 
	old_state = sk->sk_state;
	if (!((1 << old_state) & (TCPF_CLOSE | TCPF_LISTEN)))
		goto out;
 
	/* Really, if the socket is already in listen state
	 * we can only allow the backlog to be adjusted.
	 */
	if (old_state != TCP_LISTEN) {
	        // the key function: inet_csk_listen_start
		err = inet_csk_listen_start(sk, backlog);
		if (err)
			goto out;
	}
	sk->sk_max_ack_backlog = backlog;
	err = 0;
 
out:
	release_sock(sk);
	return err;
}

Step into inet_csk_listen_start, in net/ipv4/inet_connection_sock.c.

int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
{
	struct inet_sock *inet = inet_sk(sk);
	struct inet_connection_sock *icsk = inet_csk(sk);
 
	// the key function: reqsk_queue_alloc
	int rc = reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries);
	// ... rest of the function omitted
}

Step into reqsk_queue_alloc, in net/core/request_sock.c.

int reqsk_queue_alloc(struct request_sock_queue *queue,
		      unsigned int nr_table_entries)
{
	size_t lopt_size = sizeof(struct listen_sock);
	struct listen_sock *lopt;
 
	// from here on nr_table_entries gets adjusted; on kernels before 2.6.20
	// nr_table_entries is left unmodified
	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
	nr_table_entries = max_t(u32, nr_table_entries, 8);
	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
 
	// nr_table_entries is now final
	lopt_size += nr_table_entries * sizeof(struct request_sock *);
	if (lopt_size > PAGE_SIZE)
		lopt = __vmalloc(lopt_size,
			GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
			PAGE_KERNEL);
	else
		lopt = kzalloc(lopt_size, GFP_KERNEL);
	if (lopt == NULL)
		return -ENOMEM;
 
	// this loop determines lopt->max_qlen_log
	for (lopt->max_qlen_log = 3;
	     (1 << lopt->max_qlen_log) < nr_table_entries; // on kernels before 2.6.20 this is sysctl_max_syn_backlog
	     lopt->max_qlen_log++);
 
	get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
	rwlock_init(&queue->syn_wait_lock);
	queue->rskq_accept_head = NULL;
	lopt->nr_table_entries = nr_table_entries;
 
	write_lock_bh(&queue->syn_wait_lock);
	queue->listen_opt = lopt;
	write_unlock_bh(&queue->syn_wait_lock);
 
	return 0;
}

That's as far as we need to read. Now let's work out why VM S showed exactly 256 SYN_RECV sockets.

nr_table_entries starts out as listen()'s second argument, int backlog, capped at the system's somaxconn.
With somaxconn = 128, sysctl_max_syn_backlog = 4096 and backlog = 511, that gives nr_table_entries = 128.

nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
take the smaller of the two: nr_table_entries = 128

nr_table_entries = max_t(u32, nr_table_entries, 8);
take the larger of the two: nr_table_entries = 128

nr_table_entries = roundup_pow_of_two(nr_table_entries + 1); //roundup_pow_of_two - round the given value up to nearest power of two
roundup_pow_of_two(128 + 1) = 256

for (lopt->max_qlen_log = 3; (1 << lopt->max_qlen_log) < nr_table_entries; lopt->max_qlen_log++);
max_qlen_log = 8

The half-open-queue-full test is: queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
when queue->listen_opt->qlen reaches 256, 256 >> 8 = 1, so reqsk_queue_is_full returns 1 and new SYNs go to drop.
The test runs before a new request is queued, so a SYN is still accepted while qlen is 0-255; once the 256th entry is queued, everything further is dropped. That is why the SYN_RECV count is 256.
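To make this easy to reproduce, here is a user-space sketch of mine that mirrors the 2.6.32 sizing logic above (kernel identifiers reused for readability), so you can plug in your own somaxconn, backlog, and tcp_max_syn_backlog values:

/* half_open_limit.c - reproduce the 2.6.32 half-open queue sizing.
 * Sketch only; the real logic is in net/socket.c and net/core/request_sock.c.
 */
#include <stdio.h>

/* like the kernel's roundup_pow_of_two(): round up to the next power of two */
static unsigned int roundup_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static unsigned int half_open_limit(unsigned int backlog,
				    unsigned int somaxconn,
				    unsigned int max_syn_backlog)
{
	unsigned int nr_table_entries = backlog;
	unsigned int max_qlen_log;

	if (nr_table_entries > somaxconn)	/* cap from the listen() syscall */
		nr_table_entries = somaxconn;
	if (nr_table_entries > max_syn_backlog)	/* min_t(..., sysctl_max_syn_backlog) */
		nr_table_entries = max_syn_backlog;
	if (nr_table_entries < 8)		/* max_t(..., 8) */
		nr_table_entries = 8;
	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

	for (max_qlen_log = 3; (1U << max_qlen_log) < nr_table_entries; max_qlen_log++)
		;
	return 1U << max_qlen_log;	/* qlen at which reqsk_queue_is_full() fires */
}

int main(void)
{
	/* VM S: backlog = 511 (ListenBackLog), somaxconn = 128, tcp_max_syn_backlog = 4096 */
	printf("%u\n", half_open_limit(511, 128, 4096));	/* prints 256 */
	return 0;
}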

One question remains: why did my colleague's results differ from mine?
Because on kernels before 2.6.20, max_qlen_log is derived directly from sysctl_max_syn_backlog, so the half-open queue length simply equals sysctl_max_syn_backlog.
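Sketched the same way (the loop's starting value 3 is borrowed from the 2.6.32 code; per the comments above, the relevant difference on older kernels is only that the loop bound is sysctl_max_syn_backlog itself):

/* old_sizing.c - pre-2.6.20 behaviour per the note above: the loop compares
 * against sysctl_max_syn_backlog directly, so the half-open queue tracks the
 * sysctl regardless of backlog and somaxconn.
 */
#include <stdio.h>

int main(void)
{
	unsigned int sysctl_max_syn_backlog = 4096;	/* my colleague's setting */
	unsigned int max_qlen_log;

	for (max_qlen_log = 3;
	     (1U << max_qlen_log) < sysctl_max_syn_backlog;
	     max_qlen_log++)
		;
	/* 2^12 = 4096: the SYN_RECV count matches tcp_max_syn_backlog exactly */
	printf("max_qlen_log = %u, limit = %u\n", max_qlen_log, 1U << max_qlen_log);
	return 0;
}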
This post ran long, but the mystery is finally solved. Special thanks to 雨哥 (blog), who walked me through much of this code.

