Linux port rate limiting (tc + iptables)



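About iptables

iptables is packet-filtering software. Packets are filtered in the following order:

Every packet is compared against the rules of a chain in order, and each rule has an action (target). Once a packet hits a rule with a terminating target such as ACCEPT or DROP, the remaining rules in that chain are skipped. If the rule that matches most traffic sits at the end of a chain, every packet has to be compared against all the rules in front of it and filtering slows down, so put the most frequently matched rules first. Rule order is controlled with -A (append) and -I (insert), and the RETURN target lets a packet leave a chain early. Whoever designed iptables was really good.

iptables is organized into tables, and each table contains chains. Picture iptables as a large net, each table as a smaller net inside it, and the chains as the strands of that net: a packet passing through is bound to touch the strands, and that is where unwanted packets get stopped. Vbird (鳥哥) has an excellent diagram of this. The key point is that a packet takes one of two paths: it is either destined for the local host, in which case it goes through the INPUT chain, or it is not, in which case it goes through FORWARD. What does "destined for the local host" mean?

Take a server that acts as a router and also runs a web server. The internal network behind the router contains a database server on a separate machine, whose traffic also passes through the router. When an outside user accesses the web server, the packets are destined for the local host, because the web server runs on the router itself. When the same user accesses the database server, the packets are not destined for the local host, because the database lives on another machine, so they are forwarded. Vbird explains this better than I do. Back to the topic: as the diagram shows, rules can be placed in any chain. Rate limiting works equally well in PREROUTING, FORWARD or POSTROUTING (these are chains, not tables), as long as the rules do not conflict, since each chain has its own role. When writing iptables rules, trace the packet's path through the diagram to check that each rule sits where you think it does.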

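  1. filter: mainly concerns packets entering the Linux host itself; this is the default table.
       • INPUT: packets that want to enter the Linux host;
       • OUTPUT: packets that the Linux host sends out;
       • FORWARD: has little to do with the Linux host itself; it forwards packets to the machines behind it and is closely related to the nat table below.
  2. nat (Network Address Translation): translates source and destination IP addresses or ports. It has little to do with the Linux host itself and mostly concerns the machines on the LAN behind it.
       • PREROUTING: rules applied before the routing decision (DNAT/REDIRECT)
       • POSTROUTING: rules applied after the routing decision (SNAT/MASQUERADE)
       • OUTPUT: packets sent out by the host
  3. mangle: mainly concerns special routing flags on packets. Early kernels only had the PREROUTING and OUTPUT chains here; INPUT, FORWARD and POSTROUTING were added in kernel 2.4.18. Because this table deals with special flags, a simple environment rarely needs it.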
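Table     Explanation

nat       The nat table is used for Network Address Translation (NAT). Packets that go through NAT have their addresses changed according to our rules. Only the first packet of a stream traverses this table: if the first packet is NATed or masqueraded, the remaining packets of the stream automatically receive the same treatment instead of passing through the table one by one. This is the main reason not to do any filtering in this table. The PREROUTING chain changes a packet's destination address, if needed, as soon as it reaches the firewall. The OUTPUT chain changes the destination address of locally generated packets. The POSTROUTING chain changes the source address just before the packet leaves the firewall.

mangle    This table is used to mangle packets: it can change fields of a packet and its headers, such as TTL, TOS or MARK. Note that MARK does not actually modify the packet; it only sets a mark for the packet in kernel space. Other rules or programs on the firewall (such as tc) can use this mark for filtering or advanced routing. The table has five built-in chains: PREROUTING, POSTROUTING, OUTPUT, INPUT and FORWARD. PREROUTING alters packets after they enter the firewall and before the routing decision; POSTROUTING alters them after all routing decisions. OUTPUT alters locally generated packets before their destination is decided. INPUT alters packets after they have been routed to the local host but before user-space programs see them. FORWARD mangles packets after the first routing decision and before the last change of the packet's destination. The mangle table cannot do NAT; it only changes TTL, TOS or MARK, not source or destination addresses. NAT is done in the nat table.

filter    The filter table is dedicated to filtering packets. It has three built-in chains and can DROP, LOG, ACCEPT or REJECT packets without any problem. The FORWARD chain filters packets that are neither generated locally nor destined for the local host (the local host being the firewall), INPUT handles packets destined for the local host, and OUTPUT filters all locally generated packets.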

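  • iptables works mainly at layers 3 and 4, that is, it mostly handles IP and TCP; it can occasionally work at layer 7 only because of extra patches.
  • What is a packet? Here it simply means IP packets and TCP segments.

    A packet is the unit of data in TCP/IP communication. Some will ask: isn't it frames that travel across a LAN? That is true, but TCP/IP works at layer 3 (network) and layer 4 (transport) of the OSI model, while frames belong to layer 2 (data link). The content of an upper layer is carried by the layer below it, so on a LAN the packet is contained inside the frame.

For example, the headers of a TCP packet carry the following information (among others):

Information              Explanation                                       iptables option
Source IP address        IP address of the sender.                         -s (--source)
Destination IP address   IP address of the receiver.                       -d (--destination)
Source port              Port of the connection on the source system.      --sport
Destination port         Port of the connection on the destination system. --dport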

 

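About tc

TC: Traffic Control

TC is the traffic-control module in Linux. It uses queueing disciplines (qdiscs) to build packet queues and defines how the packets in each queue are sent, which is how it controls traffic. Keywords: queueing system, packet reception and transmission.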

Traffic control is the name given to the sets of queuing systems and mechanisms by which packets are received and transmitted on a router. This includes deciding which (and whether) packets to accept at what rate on the input of an interface and determining which packets to transmit in what order at what rate on the output of an interface.

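Chinese translation: http://my.oschina.net/guol/blog/82453?p=1
Original: http://www.tldp.org/HOWTO/Traffic-Control-HOWTO/

Diagram of where tc operates:

Using it has deepened my understanding somewhat:

  1. tc is the gatekeeper, like a watchdog at the door. That explains the iptables + tc combination: as the diagram shows, the two work at different points in the packet path, so they can cooperate, each doing its own job.
  2. tc treats all packets alike and is only responsible for queueing and dispatching them. The documentation describes it as a queueing system for receiving and transmitting packets, so "traffic control" in the road-traffic sense is an apt name. For that reason I think it gives the best rate limiting: p2p packet or encrypted packet, anything that is a packet is subject to the limit, so ever-changing encapsulation and encryption schemes cannot slip past it.
  3. In this setup tc classifies packets mainly by their mark, so take care that mark values do not collide. The mark is an iptables concept: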

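    6.5.5. MARK target

    Sets the mark value. The MARK target is only valid in the mangle table of the local host and cannot be used anywhere else, let alone on a router or another machine. The mark is special: it is not part of the packet itself, but a field the kernel associates with the packet while it travels through the machine. It can be combined with local advanced routing so that different packets get different queueing treatment, and so on. If you need this to survive transmission to other hosts, use the TOS target instead. For more on advanced routing, see the Linux Advanced Routing and Traffic Control HOW-TO.

    The mark exists only inside the kernel, so I think mark values are the thing to watch most closely when configuring tc, especially if you also run something like wifidog that uses marks of its own.

  4. tc classes form a tree with a clear distinction between trunk and leaves. The hierarchy itself is easy to understand; the documentation is not, and the hard part is working out how to actually write the commands.

  5. tc involves a lot of sophisticated queueing algorithms, but a working knowledge of the flow-control modes is enough. Unless you are in an extreme situation there is no need to go deeper.
  6. For tc, upload and download are told apart as follows. tc shapes traffic as it leaves an interface (egress). Assume a router with two NICs. Upload means the client sends packets to a server; on the router those packets leave through the external NIC, so upload speed is limited on the external NIC. Download means the server sends packets to the client; those packets leave the router through the internal NIC, so download speed is limited on the internal NIC. In both cases what is limited is the rate at which the NIC sends out the packets queued on it, not the rate at which it receives them.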

 

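Test procedure

  1. First create the tc policy
  2. Then let iptables select it: iptables sets a mark on packets, and different marks select different tc classes

Notes

  1. In the test environment eth0 faces the external network and p3p1 faces the internal network
  2. tc on its own would limit all packets, so iptables is used to treat packets going to the internal server separately: access to the external network is rate limited, access to the internal network is not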

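Upload

Remove any existing queueing discipline from eth0

tc qdisc del dev eth0 root 2>/dev/null

Define the top-level (root) qdisc and a default class number. This attaches a queue of type htb to interface eth0 and gives it the handle 1:0, which identifies it as the parent of the classes below. Unclassified traffic is assigned to default class 123. Class 1:123 is never created in this setup, so unclassified traffic is sent without shaping; the default is optional.

tc qdisc add dev eth0 root handle 1:0 htb default 123

Create a trunk class for the queue with a rate of 100 Mbit and a ceiling of 100 Mbit (these are bits, so divide by 8 to get bytes per second) and priority 0. HTB trunk classes cannot borrow bandwidth from each other, but all children of one parent can borrow from one another. parent 1:0 is the handle created above and classid 1:1 is the new class under it; the number before the colon identifies the qdisc, the number after it identifies the class.

tc class add dev eth0 parent 1:0 classid 1:1 htb rate 100Mbit ceil 100Mbit prio 0

Create the first leaf class under the trunk class with a rate of 10 Mbit, a ceiling of 10 Mbit and priority 1. Leaf classes are given a lower priority than the trunk class so that important traffic is not blocked, and mainly to keep the logic tidy. At 10 Mbit the class needs a burst of about 96 kbit (12 KB).

tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10Mbit ceil 10Mbit prio 1 burst 96kbit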
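The burst figure matches the rule of thumb from the HTB manual quoted later in this article: minimum burst is roughly the maximum rate multiplied by the timer resolution. Below is a minimal sketch of that arithmetic, assuming a 10 ms timer; min_burst_bytes is a made-up helper name, not a tc feature.

```shell
#!/bin/sh
# min_burst_bytes RATE_BITS_PER_SEC TIMER_MS
# Rule of thumb: burst >= rate * timer resolution.
# Hypothetical helper for illustration only.
min_burst_bytes() {
    rate_bits=$1
    timer_ms=$2
    # bits sent during one timer tick, converted to bytes
    echo $(( rate_bits * timer_ms / 1000 / 8 ))
}

# 10 Mbit/s with a 10 ms timer
min_burst_bytes 10000000 10    # prints 12500
```

12500 bytes is about 12 KB, which is what burst 96kbit (96 / 8 = 12) asks for.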

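Attach a scheduler, SFQ (Stochastic Fairness Queueing), to the leaf class. parent names the leaf class whose traffic the new qdisc will schedule, so put the id of the class you want to control there. Attaching SFQ under each class keeps any single connection from monopolizing the bandwidth, so the bandwidth is shared fairly:

SFQ revolves around the "session" (or "flow"), typically a TCP session or a UDP stream. Traffic is split into a fairly large number of FIFO queues, one per session, and data is sent round-robin so that each session gets its turn. This is very fair and keeps any one session from being drowned out by the others. SFQ is called "stochastic" because it does not really create a queue for every session; it uses a hash function to map all sessions onto a limited number of queues. The perturb parameter is the number of seconds after which the hash is reconfigured; 10 is the commonly recommended value.

tc qdisc add dev eth0 parent 1:11 handle 111:0 sfq perturb 10

Add a filter that names the parent qdisc and the class that matching packets go to. With the fw classifier, handle is the mark set by iptables. This setup gives each filter its own prio value when several filters are added.

tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1001 fw classid 1:11

Add the iptables rule in the POSTROUTING chain of the mangle table: packets with source address 172.16.1.138 and a destination other than 192.168.0.10 that leave through eth0 are marked with 1001

iptables -t mangle -A POSTROUTING -s 172.16.1.138/32 ! -d 192.168.0.10 -o eth0 -j MARK --set-xmark 1001

The RETURN rule is there to speed up packet checking. RETURN goes from a sub-chain to its parent chain, and from a built-in chain to the default policy. MARK does not stop rule traversal, so without RETURN a marked packet would still be compared against every later rule in POSTROUTING. With it, a packet from 172.16.1.138 whose destination is not 192.168.0.10 leaves the chain right after being marked.

iptables -t mangle -A POSTROUTING -o eth0 -s 172.16.1.138 ! -d 192.168.0.10 -j RETURN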
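The upload steps can be collected into one parameterized function. This is a dry-run sketch: run only prints each command, and setup_upload and its parameter order are made up for illustration. The values in the final call are the ones used in this article.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Pipe the output to "sh" as root to actually apply it.
run() { echo "$*"; }

# setup_upload DEV SRC_IP EXEMPT_DST MARK RATE   (hypothetical helper)
setup_upload() {
    dev=$1; src=$2; exempt=$3; mark=$4; rate=$5
    run tc qdisc add dev "$dev" root handle 1:0 htb default 123
    run tc class add dev "$dev" parent 1:0 classid 1:1 htb rate 100Mbit ceil 100Mbit prio 0
    run tc class add dev "$dev" parent 1:1 classid 1:11 htb rate "$rate" ceil "$rate" prio 1 burst 96kbit
    run tc qdisc add dev "$dev" parent 1:11 handle 111:0 sfq perturb 10
    # the fw filter handle must equal the mark set by iptables below
    run tc filter add dev "$dev" parent 1:0 protocol ip prio 1 handle "$mark" fw classid 1:11
    run iptables -t mangle -A POSTROUTING -s "$src" ! -d "$exempt" -o "$dev" -j MARK --set-xmark "$mark"
    run iptables -t mangle -A POSTROUTING -s "$src" ! -d "$exempt" -o "$dev" -j RETURN
}

setup_upload eth0 172.16.1.138 192.168.0.10 1001 10Mbit
```

Keeping the mark in a single variable makes it harder for the tc filter and the iptables rule to drift apart.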

Download

tc qdisc del dev p3p1 root 2>/dev/null
tc qdisc add dev p3p1 root handle 1:0 htb default 123
tc class add dev p3p1 parent 1:0 classid 1:1 htb rate 100Mbit ceil 100Mbit prio 0
tc class add dev p3p1 parent 1:1 classid 1:11 htb rate 10Mbit ceil 10Mbit prio 1
tc qdisc add dev p3p1 parent 1:11 handle 111:0 sfq perturb 10
tc filter add dev p3p1 parent 1:0 protocol ip prio 1 handle 1000 fw classid 1:11

-I inserts a rule instead of appending it, so the rule is placed at the front of the chain. iptables matches rules in order, and inserting at the front also avoids conflicts with wifidog's rules. The explicit rule numbers keep the MARK rule ahead of the RETURN rule.

iptables -t mangle -I POSTROUTING 1 -o p3p1 -d 172.16.1.138 ! -s 192.168.0.10 -j MARK --set-mark 1000
iptables -t mangle -I POSTROUTING 2 -o p3p1 -d 172.16.1.138 ! -s 192.168.0.10 -j RETURN
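Without a rule number, -I always inserts at position 1, so rules added by consecutive -I calls end up in reverse order. The toy model below involves no iptables at all (head_insert_order is a made-up name); it only shows where each rule lands, which matters because the MARK rule has to sit above the RETURN rule.

```shell
#!/bin/sh
# head_insert_order RULE...
# Prints the order a chain ends up in when each RULE is added with
# "iptables -I CHAIN" and no rule number, i.e. always at position 1.
head_insert_order() {
    chain=""
    for rule in "$@"; do
        if [ -z "$chain" ]; then
            chain=$rule
        else
            # the newest rule goes in front of everything already there
            chain="$rule
$chain"
        fi
    done
    printf '%s\n' "$chain"
}

head_insert_order "MARK --set-mark 1000" "RETURN"
```

RETURN comes out first, so a matching packet would leave the chain before being marked. Giving explicit rule numbers (-I POSTROUTING 1 for MARK, -I POSTROUTING 2 for RETURN), or inserting the RETURN rule before the MARK rule, avoids this.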

 

================

 

HTB basics: Linux HTB Queuing Discipline Guide (Chinese edition): http://wenku.baidu.com/view/64da046825c52cc58bd6beac.html

TC basics: Linux Advanced Routing and Traffic Control (LARTC)
http://download.csdn.net/detail/wuwentao2000/3963140

iptables basics: Chinese HOWTO: http://man.chinaunix.net/network/iptables-tutorial-cn-1.1.19.html

Source 1: http://www.right.com.cn/forum/viewthread.php?tid=71981&highlight=QOS

# Use tc to build the upstream and downstream channels
TCA="tc class add dev br0"
TFA="tc filter add dev br0"
tc qdisc del dev br0 root
tc qdisc add dev br0 root handle 1: htb
tc class add dev br0 parent 1: classid 1:1 htb rate 1600kbit            # 1600 is the total downstream speed
$TCA parent 1:1 classid 1:10 htb rate 200kbit ceil 400kbit prio 2     # downstream channel 10: minimum 200, maximum 400, priority 2
$TCA parent 1:1 classid 1:25 htb rate 1000kbit ceil 1600kbit prio 1   # my own special channel 25: minimum 1000, maximum 1600, priority 1
$TFA parent 1:0 prio 2 protocol ip handle 10 fw flowid 1:10
$TFA parent 1:0 prio 1 protocol ip handle 25 fw flowid 1:25
tc qdisc add dev br0 ingress
$TFA parent ffff: protocol ip handle 35 fw police rate 800kbit mtu 12k burst 10k drop      # my own upstream channel 35, maximum speed 800
$TFA parent ffff: protocol ip handle 50 fw police rate 80kbit mtu 12k burst 10k drop         # upstream channel 50 for everyone else, maximum speed 80

# Now use iptables to decide who goes through which channel. The iptables in dd-wrt does not support IP ranges, so each IP needs its own rule, otherwise the command is rejected

iptables -t mangle -A POSTROUTING -d 192.168.1.22 -j MARK --set-mark 10     # 192.168.1.22 uses channel 10
iptables -t mangle -A POSTROUTING -d 192.168.1.22 -j RETURN                 # a RETURN after each rule makes matching more efficient
iptables -t mangle -A POSTROUTING -d 192.168.1.23 -j MARK --set-mark 25     # 192.168.1.23 uses special channel 25; 23 is my IP
iptables -t mangle -A POSTROUTING -d 192.168.1.23 -j RETURN                 # a RETURN after each rule makes matching more efficient

iptables -t mangle -A PREROUTING -s 192.168.1.22 -j MARK --set-mark 50      # 192.168.1.22 uses upstream channel 50
iptables -t mangle -A PREROUTING -s 192.168.1.22 -j RETURN                  # a RETURN after each rule makes matching more efficient
iptables -t mangle -A PREROUTING -s 192.168.1.23 -j MARK --set-mark 35      # 192.168.1.23 (my own IP) uses upstream channel 35
iptables -t mangle -A PREROUTING -s 192.168.1.23 -j RETURN                  # a RETURN after each rule makes matching more efficient

# I won't write out the rest; change the IP to put someone in a channel. To be generous, let everyone's web browsing use my channels 25 and 35 (or don't)
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 80 -j MARK --set-mark 35    # HTTP uses port 80, so dport is 80 when the HTTP request goes out
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 80 -j RETURN
iptables -t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j MARK --set-mark 25   # HTTP uses port 80, so sport is 80 when the HTTP response comes back
iptables -t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j RETURN

-------------------------

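Now let's limit the number of TCP and UDP connections (I don't know whether the standard and mini builds support this; the statements below are not guaranteed to work and depend on your router):
# In the FORWARD chain, drop TCP packets from any source IP that already has more than 100 connections. The limit applies to every IP.
iptables -I FORWARD -p tcp -m connlimit --connlimit-above 100 -j DROP
# UDP has no connections to count, so only packets per second can be controlled. The intent is 5 per second, which is already plenty (10 would be high); this is meant as a weapon against P2P, and 3 to 5 per second is a reasonable setting.
# Caveat: the limit match matches packets that are within the rate, so as written this rule drops up to 5 UDP packets per second and lets the rest through. To cap UDP, ACCEPT packets within the limit and DROP everything else.
iptables -I FORWARD -p udp -m limit --limit 5/sec -j DROP
How do you check that the commands took effect?
Run iptables -L FORWARD and you should see:
DROP       tcp  --  anywhere             anywhere            #conn/32 > 100 
DROP       udp  --  anywhere             anywhere            limit: avg 5/sec bu
If both lines appear, the rules are installed. If they do not, your dd-wrt build does not support the connlimit module.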
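The limit match is true while traffic stays within the given rate, so a packet-rate cap is usually written as an ACCEPT rule carrying the limit followed by an unconditional DROP. Below is a dry-run sketch of that pattern; udp_cap_rules is a made-up helper name, and the commands are only printed, not applied.

```shell
#!/bin/sh
# Dry-run sketch: prints the rules instead of applying them.
run() { echo "$*"; }

# udp_cap_rules PPS: cap forwarded UDP at PPS packets per second
udp_cap_rules() {
    pps=$1
    # Appended in this order: packets within the limit are accepted,
    # anything over the limit falls through to the DROP rule after it.
    run iptables -A FORWARD -p udp -m limit --limit "$pps/sec" -j ACCEPT
    run iptables -A FORWARD -p udp -j DROP
}

udp_cap_rules 5
```

ACCEPT ends traversal of FORWARD for the packets it matches, so any rules meant to inspect UDP further would have to come before this pair.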

Now I want a back door for myself so that the connection limit does not apply to me:
iptables -I FORWARD -s 192.168.1.23 -j RETURN          # inserts this rule at the very top of the FORWARD chain, making it the first rule; 23 is my IP, so packets from my IP never reach the connection-limit rules below. This relies on the order in which iptables evaluates a chain.

A command to see everyone's connection count:
sed -n 's%.* src=\(192.168.[0-9.]*\).*%\1%p' /proc/net/ip_conntrack | sort | uniq -c    # shows how many connections each IP currently holds
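The same pipeline can be tried without a router by feeding it sample lines. In this sketch count_lan_conns is a made-up wrapper and the input lines are invented, laid out like /proc/net/ip_conntrack entries; awk is added only to normalize the output of uniq -c.

```shell
#!/bin/sh
# count_lan_conns: read conntrack-style lines on stdin and print
# "IP COUNT" for every 192.168.x.x address found after " src=".
count_lan_conns() {
    sed -n 's%.* src=\(192\.168\.[0-9.]*\).*%\1%p' |
        sort | uniq -c | awk '{print $2, $1}'
}

# Invented sample lines in the /proc/net/ip_conntrack layout
count_lan_conns <<'EOF'
tcp      6 431999 ESTABLISHED src=192.168.1.23 dst=93.184.216.34 sport=40000 dport=80 src=93.184.216.34 dst=10.0.0.2 sport=80 dport=40000 use=1
tcp      6 100 ESTABLISHED src=192.168.1.23 dst=93.184.216.34 sport=40001 dport=443 src=93.184.216.34 dst=10.0.0.2 sport=443 dport=40001 use=1
udp      17 20 src=192.168.1.22 dst=8.8.8.8 sport=5000 dport=53 src=8.8.8.8 dst=10.0.0.2 sport=53 dport=5000 use=1
EOF
```

For the sample input this prints 192.168.1.22 with a count of 1 and 192.168.1.23 with a count of 2.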

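A few points in the script above tend to confuse people:
br0: this is a dd-wrt bridge that joins the wireless and wired interfaces, so limiting traffic on it limits all wireless and wired users. Run ifconfig to see the details.
Chain order: the order of the iptables chains on br0 is rather odd. Normally inbound packets pass through PREROUTING first and outbound packets go out through POSTROUTING, but on dd-wrt's br0 bridge the order is exactly the opposite.
On dd-wrt, inbound packets are treated as outbound and outbound packets as inbound, which is why the script is written the way it is.

Not sure where to type the commands?
Log in to the dd-wrt web management interface and enable SSH under Administration
Use an SSH client, available here: http://www.onlinedown.net/soft/20089.htm 
Enter the router's IP, user name and password, log in, and start typing.

Important: write scripts in an editor such as UE that supports Unix line endings. Windows Notepad will not do, because the two systems use different newline characters and Unix/Linux will not accept the Windows ones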

-----------------------

Source 3: HTB home page: http://luxik.cdi.cz/~devik/qos/htb/

HTB Linux queuing discipline manual - user guide
Martin Devera aka devik (devik@cdi.cz)
Manual: devik and Don Cohen
Last updated: 5.5.2002



1. Introduction

HTB is meant as a more understandable, intuitive and faster replacement for the CBQ qdisc in Linux. Both CBQ and HTB help you to control the use of the outbound bandwidth on a given link. Both allow you to use one physical link to simulate several slower links and to send different kinds of traffic on different simulated links. In both cases, you have to specify how to divide the physical link into simulated links and how to decide which simulated link to use for a given packet to be sent.

This document shows you how to use HTB. Most sections have examples, charts (with measured data) and discussion of particular problems.

This release of HTB should be also much more scalable. See comparison at HTB home page.

Please read: tc tool (not only HTB) uses shortcuts to denote units of rate. kbps means kilobytes and kbit means kilobits ! This is the most FAQ about tc in linux.
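Since this is the unit mix-up the manual warns about, a one-line conversion may help. This is a small sketch; kbps_to_kbit is a made-up helper name.

```shell
#!/bin/sh
# kbps_to_kbit N
# In tc, "kbps" means kilobytes per second and "kbit" means kilobits
# per second, so the conversion is a factor of 8.
kbps_to_kbit() {
    echo $(( $1 * 8 ))
}

kbps_to_kbit 100    # prints 800
```

This is why a class created with rate 100kbps shows up as rate 800Kbit in the tc -s class show output in chapter 7.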

2. Link sharing

Problem: We have two customers, A and B, both connected to the internet via eth0. We want to allocate 60 kbps to B and 40 kbps to A. Next we want to subdivide A's bandwidth 30kbps for WWW and 10kbps for everything else. Any unused bandwidth can be used by any class which needs it (in proportion of its allocated share).

HTB ensures that the amount of service provided to each class is at least the minimum of the amount it requests and the amount assigned to it. When a class requests less than the amount assigned, the remaining (excess) bandwidth is distributed to other classes which request service.

Also see document about HTB internals - it describes goal above in greater details.

Note: In the literature this is called "borrowing" the excess bandwidth. We use that term below to conform with the literature. We mention, however, that this seems like a bad term since there is no obligation to repay the resource that was "borrowed".

The different kinds of traffic above are represented by classes in HTB. The simplest approach is shown in the picture at the right. 
Let's see what commands to use:

tc qdisc add dev eth0 root handle 1: htb default 12

This command attaches queue discipline HTB to eth0 and gives it the "handle"  1:. This is just a name or identifier with which to refer to it below. The  default 12 means that any traffic that is not otherwise classified will be assigned to class 1:12.

Note: In general (not just for HTB but for all qdiscs and classes in tc), handles are written x:y where x is an integer identifying a qdisc and y is an integer identifying a class belonging to that qdisc. The handle for a qdisc must have zero for its y value and the handle for a class must have a non-zero value for its y value. The "1:" above is treated as "1:0".

 

tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps 
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 60kbps ceil 100kbps

The first line creates a "root" class, 1:1 under the qdisc 1:. The definition of a root class is one with the htb qdisc as its parent. A root class, like other classes under an htb qdisc allows its children to borrow from each other, but one root class cannot borrow from another. We could have created the other three classes directly under the htb qdisc, but then the excess bandwidth from one would not be available to the others. In this case we do want to allow borrowing, so we have to create an extra class to serve as the root and put the classes that will carry the real data under that. These are defined by the next three lines. The ceil parameter is described below.

Note: Sometimes people ask me why they have to repeat dev eth0 when they have already used handle or parent. The reason is that handles are local to an interface, e.g., eth0 and eth1 could each have classes with handle 1:1.

We also have to describe which packets belong in which class. This is really not related to the HTB qdisc. See the tc filter documentation for details. The commands will look something like this:

tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
   match ip src 1.2.3.4 match ip dport 80 0xffff flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
   match ip src 1.2.3.4 flowid 1:11

(We identify A by its IP address which we imagine here to be 1.2.3.4.)

Note: The U32 classifier has an undocumented design bug which causes duplicate entries to be listed by "tc filter show" when you use U32 classifiers with different prio values.

You may notice that we didn't create a filter for the 1:12 class. It might be more clear to do so, but this illustrates the use of the default. Any packet not classified by the two rules above (any packet not from source address 1.2.3.4) will be put in class 1:12.

Now we can optionally attach queuing disciplines to the leaf classes. If none is specified the default is pfifo.

tc qdisc add dev eth0 parent 1:10 handle 20: pfifo limit 5
tc qdisc add dev eth0 parent 1:11 handle 30: pfifo limit 5
tc qdisc add dev eth0 parent 1:12 handle 40: sfq perturb 10

That's all the commands we need. Let's see what happens if we send packets of each class at 90kbps and then stop sending packets of one class at a time. Along the bottom of the graph are annotations like "0:90k". The horizontal position at the center of the label (in this case near the 9, also marked with a red "1") indicates the time at which the rate of some traffic class changes. Before the colon is an identifier for the class (0 for class 1:10, 1 for class 1:11, 2 for class 1:12) and after the colon is the new rate starting at the time where the annotation appears. For example, the rate of class 0 is changed to 90k at time 0, 0 (= 0k) at time 3, and back to 90k at time 6.

Initially all classes generate 90kb. Since this is higher than any of the rates specified, each class is limited to its specified rate. At time 3 when we stop sending class 0 packets, the rate allocated to class 0 is reallocated to the other two classes in proportion to their allocations, 1 part class 1 to 6 parts class 2. (The increase in class 1 is hard to see because it's only 4 kbps.) Similarly at time 9 when class 1 traffic stops its bandwidth is reallocated to the other two (and the increase in class 0 is similarly hard to see.) At time 15 it's easier to see that the allocation to class 2 is divided 3 parts for class 0 to 1 part for class 1. At time 18 both class 1 and class 2 stop so class 0 gets all 90 kbps it requests.
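The proportional split described above can be checked with a little arithmetic. This is a sketch only; excess_share is a made-up helper, and it uses integer division, so results are rounded down.

```shell
#!/bin/sh
# excess_share EXCESS MY_RATE SUM_OF_COMPETING_RATES
# Share of the excess bandwidth a class receives when the excess is
# split in proportion to the configured rates (integer kbps).
excess_share() {
    echo $(( $1 * $2 / $3 ))
}

# Class 0 (30 kbps) goes idle; class 1 (10) and class 2 (60) compete.
excess_share 30 10 70    # prints 4
excess_share 30 60 70    # prints 25
```

The 4 kbps for class 1 is the increase the text calls hard to see on the graph.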

It might be a good time to touch on the concept of quantums now. When several classes want to borrow bandwidth, each is given some number of bytes before another competing class is served. This number is called the quantum. If several classes are competing for their parent's bandwidth, they get it in proportion to their quantums. For precise operation, quantums need to be as small as possible while still being larger than the MTU. 
Normally you don't need to specify quantums manually, as HTB chooses precomputed values. It computes a class's quantum (when you add or change the class) as its rate divided by the r2q global parameter. The default r2q is 10, and because the typical MTU is 1500, the default is good for rates from 15 kBps (120 kbit). For smaller minimal rates specify r2q 1 when creating the qdisc; it is good from 12 kbit, which should be enough. If you need to, you can specify the quantum manually when adding or changing the class. That way you can avoid warnings in the log if the precomputed value would be bad. When you specify quantum on the command line, r2q is ignored for that class.
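The quantum rule is a single division. The sketch below assumes the rate is given in bytes per second; htb_quantum is a made-up helper and does not reproduce any clamping or warnings the kernel may apply.

```shell
#!/bin/sh
# htb_quantum RATE_BYTES_PER_SEC [R2Q]
# Precomputed quantum = class rate / r2q, with r2q defaulting to 10.
htb_quantum() {
    rate=$1
    r2q=${2:-10}
    echo $(( rate / r2q ))
}

htb_quantum 15000       # prints 1500: 15 kBps with r2q 10 equals the MTU
htb_quantum 102400      # prints 10240
htb_quantum 1500 1      # prints 1500: 12 kbit with r2q 1
```

The second result matches the quantum 10240 reported for class 1:1 in the statistics shown in chapter 7.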

This might seem like a good solution if A and B were not different customers. However, if A is paying for 40kbps then he would probably prefer his unused WWW bandwidth to go to his own other service rather than to B. This requirement is represented in HTB by the class hierarchy.

3. Sharing hierarchy

The problem from the previous chapter is solved by the class hierarchy in this picture. Customer A is now explicitly represented by its own class. Recall from above that the amount of service provided to each class is at least the minimum of the amount it requests and the amount assigned to it. This applies to htb classes that are not parents of other htb classes. We call these leaf classes. For htb classes that are parents of other htb classes, which we call interior classes, the rule is that the amount of service is at least the minimum of the amount assigned to it and the sum of the amount requested by its children. In this case we assign 40kbps to customer A. That means that if A requests less than the allocated rate for WWW, the excess will be used for A's other traffic (if there is demand for it), at least until the sum is 40kbps.

Notes: Packet classification rules can assign to inner nodes too. Then you have to attach other filter list to inner node. Finally you should reach leaf or special 1:0 class. The rate supplied for a parent should be the sum of the rates of its children.

The commands are now as follows:

tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:2 htb rate 40kbps ceil 100kbps
tc class add dev eth0 parent 1:2 classid 1:10 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:2 classid 1:11 htb rate 10kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 60kbps ceil 100kbps

We now turn to the graph showing the results of the hierarchical solution. When A's WWW traffic stops, its assigned bandwidth is reallocated to A's other traffic so that A's total bandwidth is still the assigned 40kbps.
If A were to request less than 40kbps in total then the excess would be given to B.

4. Rate ceiling

The ceil argument specifies the maximum bandwidth that a class can use. This limits how much bandwidth that class can borrow. The default ceil is the same as the rate. (That's why we had to specify it in the examples above to show borrowing.) We now change the ceil 100kbps for classes 1:2 (A) and 1:11 (A's other) from the previous chapter to ceil 60kbps and ceil 20kbps.

The graph at right differs from the previous one at time 3 (when WWW traffic stops) because A/other is limited to 20kbps. Therefore customer A gets only 20kbps in total and the unused 20kbps is allocated to B.
The second difference is at time 15 when B stops. Without the ceil, all of its bandwidth was given to A, but now A is only allowed to use 60kbps, so the remaining 40kbps goes unused.

This feature should be useful for ISPs because they probably want to limit the amount of service a given customer gets even when other customers are not requesting service. (ISPs probably want customers to pay more money for better service.) Note that root classes are not allowed to borrow, so there's really no point in specifying a ceil for them.

Notes: The ceil for a class should always be at least as high as the rate. Also, the ceil for a class should always be at least as high as the ceil of any of its children.

5. Burst

Networking hardware can only send one packet at a time and only at a hardware dependent rate. Link sharing software can only use this ability to approximate the effects of multiple links running at different (lower) speeds. Therefore the rate and ceil are not really instantaneous measures but averages over the time that it takes to send many packets. What really happens is that the traffic from one class is sent a few packets at a time at the maximum speed and then other classes are served for a while. The burst and cburst parameters control the amount of data that can be sent at the maximum (hardware) speed without trying to serve another class.

If cburst is smaller (ideally one packet size) it shapes bursts to not exceed ceil rate in the same way as TBF's peakrate does.

When you set burst for parent class smaller than for some child then you should expect the parent class to get stuck sometimes (because child will drain more than parent can handle). HTB will remember these negative bursts up to 1 minute.

You can ask why I want bursts. Well it is cheap and simple way how to improve response times on congested link. For example www traffic is bursty. You ask for page, get it in burst and then read it. During that idle period burst will "charge" again.

Note: The burst and cburst of a class should always be at least as high as that of any of it children.

On the graph you can see the case from the previous chapter where I changed burst for the red and yellow (agency A) classes to 20kb while cburst remained at its default (circa 2 kb).
The green hill at time 13 is due to the burst setting on the SMTP class. It has been underlimit since time 9 and has accumulated 20 kb of burst. The hill rises up to 20 kbps (limited by ceil because its cburst is near packet size).
A clever reader may wonder why there is no red and yellow hill at time 7. It is because yellow is already at its ceil limit, so it has no room for further bursts.
There is at least one unwanted artifact: the magenta crater at time 4. It is because I intentionally "forgot" to add burst to the root link (1:1) class. It remembered the hill from time 1, and when at time 4 the blue class wanted to borrow yellow's rate, it denied it and compensated itself.

Limitation: when you operate with high rates on a computer with a low resolution timer, you need some minimal burst and cburst to be set for all classes. Timer resolution is 10ms on i386 systems and 1ms on Alphas. The minimal burst can be computed as max_rate*timer_resolution. So for 10Mbit on plain i386 you need burst 12kb.

If you set too small burst you will encounter smaller rate than you set. Latest tc tool will compute and set the smallest possible burst when it is not specified.

6. Prioritizing bandwidth share

Prioritizing traffic has two sides. First, it affects how the excess bandwidth is distributed among siblings. Up to now we have seen that excess bandwidth was distributed according to rate ratios. Now I used the basic configuration from chapter 3 (hierarchy without ceiling and bursts) and changed the priority of all classes to 1 except SMTP (green), which I set to 0 (higher).
From the sharing view you see that the class got all the excess bandwidth. The rule is that classes with higher priority are offered excess bandwidth first. But the rules about guaranteed rate and ceil are still met.

There is also a second side to the problem: the total delay of a packet. It is relatively hard to measure on ethernet, which is too fast (the delay is negligible). But there is a simple workaround. We can add a simple HTB with one class rate limiting to less than 100 kbps and add a second HTB (the one we are measuring) as its child. Then we can simulate a slower link with larger delays.
For simplicity sake I use simple two class scenario:

# qdisc for delay simulation
tc qdisc add dev eth0 root handle 100: htb
tc class add dev eth0 parent 100: classid 100:1 htb rate 90kbps

# real measured qdisc
tc qdisc add dev eth0 parent 100:1 handle 1: htb
AC="tc class add dev eth0 parent"
$AC 1: classid 1:1 htb rate 100kbps
# the leaf classes hang off 1:1, the only class created above
$AC 1:1 classid 1:10 htb rate 50kbps ceil 100kbps prio 1
$AC 1:1 classid 1:11 htb rate 50kbps ceil 100kbps prio 1
tc qdisc add dev eth0 parent 1:10 handle 20: pfifo limit 2
tc qdisc add dev eth0 parent 1:11 handle 21: pfifo limit 2

Note: HTB as a child of another HTB is NOT the same as a class under another class within the same HTB. When a class in HTB can send, it will send as soon as the hardware can, so the delay of an underlimit class is limited only by the equipment and not by its ancestors.
In the HTB under HTB case the outer HTB simulates new hardware equipment with all the consequences (larger delay).

The simulator is set to generate 50 kbps for both classes and at time 3s it executes the command:

tc class change dev eth0 parent 1:1 classid 1:10 htb \
 rate 50kbps ceil 100kbps burst 2k prio 0

As you see, the delay of the WWW class dropped nearly to zero while SMTP's delay increased. When you prioritize one class to get better delay, it always makes the delays of other classes worse.
Later (time 7s) the simulator starts to generate WWW at 60 kbps and SMTP at 40 kbps. There you can observe the next interesting behaviour. When a class is overlimit (WWW), HTB prioritizes the underlimit part of the bandwidth first.

Which class should you prioritize? Generally those classes where you really need low delays. Examples are video or audio traffic (and you will really need to use the correct rate here to prevent this traffic from killing other flows) or interactive (telnet, SSH) traffic, which is bursty in nature and will not negatively affect other flows.
A common trick is to prioritize ICMP to get nice ping delays even on fully utilized links (but from a technical point of view it is not what you want when measuring connectivity).

7. Understanding statistics

The tc tool allows you to gather statistics of queuing disciplines in Linux. Unfortunately statistic results are not explained by authors so that you often can't use them. Here I try to help you to understand HTB's stats.
First, the stats for the whole HTB. The snippet below was taken during the simulation from chapter 3.

# tc -s -d qdisc show dev eth0
 qdisc pfifo 22: limit 5p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc pfifo 21: limit 5p
 Sent 2891500 bytes 5783 pkts (dropped 820, overlimits 0) 

 qdisc pfifo 20: limit 5p
 Sent 1760000 bytes 3520 pkts (dropped 3320, overlimits 0) 

 qdisc htb 1: r2q 10 default 1 direct_packets_stat 0
 Sent 4651500 bytes 9303 pkts (dropped 4140, overlimits 34251) 

First three disciplines are HTB's children. Let's ignore them as PFIFO stats are self explanatory.
overlimits tells you how many times the discipline delayed a packet.  direct_packets_stat tells you how many packets were sent through the direct queue. The other stats are self-explanatory. Let's look at the class stats:

tc -s -d class show dev eth0
class htb 1:1 root prio 0 rate 800Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 10240 level 3 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 6872 borrowed: 0 giants: 0

class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 4096 level 2 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 1017 borrowed: 6872 giants: 0

class htb 1:10 parent 1:2 leaf 20: prio 1 rate 224Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 2867 level 0 
 Sent 2269000 bytes 4538 pkts (dropped 4400, overlimits 36358) 
 rate 14635bps 29pps 
 lended: 2939 borrowed: 1599 giants: 0

I deleted the 1:11 and 1:12 classes to make the output shorter. As you see, there are the parameters we set. There is also level and DRR quantum information.
overlimits shows how many times the class was asked to send a packet but could not due to rate/ceil constraints (currently counted for leaves only).
rate, pps tells you the actual (10 sec averaged) rate going through the class. It is the same rate as used by gating.
lended is the number of packets donated by this class (from its rate) and borrowed is the number of packets for which we borrowed from the parent. Lends are always computed class-local while borrows are transitive (when 1:10 borrows from 1:2, which in turn borrows from 1:1, both the 1:10 and 1:2 borrow counters are incremented).
giants is the number of packets larger than the mtu set in the tc command. HTB will work with these but rates will not be accurate at all. Add mtu to your tc command (it defaults to 1600 bytes).
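To pull the lend and borrow counters out of this kind of output, a short awk filter is enough. This is a sketch assuming the layout shown above; class_borrow_stats is a made-up name and the sample input is copied from the listing.

```shell
#!/bin/sh
# class_borrow_stats: read "tc -s -d class show" output on stdin and
# print one line per HTB class: CLASSID lended=N borrowed=N
class_borrow_stats() {
    awk '
        # remember the class id from each "class htb X:Y ..." header line
        $1 == "class" && $2 == "htb" { id = $3 }
        # the counters line looks like: lended: N borrowed: N giants: N
        $1 == "lended:" { print id, "lended=" $2, "borrowed=" $4 }
    '
}

class_borrow_stats <<'EOF'
class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b
    cburst 2Kb/8 mpu 0b quantum 4096 level 2
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0)
 rate 70196bps 141pps
 lended: 1017 borrowed: 6872 giants: 0
EOF
```

On a live system the input would come from tc -s -d class show dev eth0 piped into the function.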

8. Making, debugging and sending error reports

If you have kernel 2.4.20 or newer you don't need to patch it - all is in vanilla tarball. The only thing you need is tc tool. Download HTB 3.6 tarball and use tc from it.

You have to patch to make it work with older kernels. Download kernel source and use patch -p1 -i htb3_2.X.X.diff to apply the patch. Then use make menuconfig;make bzImage as before. Don't forget to enable QoS and HTB.
Also you will have to use patched tc tool. The patch is also in downloads or you can download precompiled binary.

If you think that you found an error I will appreciate error report. For oopses I need ksymoops output. For weird qdisc behaviour add parameter debug 3333333 to your tc qdisc add .... htb. It will log many megabytes to syslog facility kern level debug. You will probably want to add line like:
kern.debug -/var/log/debug
to your /etc/syslog.conf. Then bzip and send me the log via email (up to 10MB after bzipping) along with description of problem and its time.

 

==================

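I. Concepts

Packets are received on the input NIC (ingress) and go through a route lookup to decide whether they are addressed to the local host or need to be forwarded. Packets for the local host are handed up to the upper-layer protocol, such as TCP; packets to be forwarded are sent out through the output NIC (egress). Traffic control usually happens at the output NIC. It can also be done at the ingress of a router, and Linux supports that, but since we generally cannot control devices outside our own network, ingress traffic control is harder. This article concentrates on egress traffic control. A basic concept of traffic control is the queueing discipline (qdisc). Every NIC is associated with a qdisc, and whenever the kernel needs to send a packet out through a NIC, it first adds the packet to the qdisc configured for that NIC, which then decides the order in which packets are sent.
So it is fair to say that all traffic control happens in the queue; the flow diagram shows the details.

Some qdiscs are very simple and send packets first come, first served. Others are complex: they queue and classify packets and send them in different orders according to different rules. To do this, these complex qdiscs use filters to sort packets into classes. They are called classful qdiscs here. Powerful traffic control generally requires a classful qdisc, so class and filter are the other two basic concepts of traffic control. Figure 2 shows an example of a classful qdisc.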

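II. Port rate limiting (tc and iptables)

On Linux all traffic control is done with the tc tool, but limiting by port also needs iptables to bind the ports to a tc class. Configuring traffic control for a NIC usually takes the following steps:

  • attach a qdisc to the NIC;
  • create classes on that qdisc;
  • create child qdiscs and child classes as needed;
  • create a filter for each class;
  • use iptables to bind ports to the tc class.

Next, set up rate limiting for ports 80 and 22:

1. Use ifconfig to look at the NICs on the server. Suppose eth0 is the external interface, the one users connect to the system through; that is the NIC to apply the bandwidth limit to.

ifconfig

2. Create the qdisc on eth0

tc qdisc add dev eth0 root handle 1: htb default 20
Explanation:
add: add a qdisc,
dev eth0: the NIC to operate on is eth0,
root: the qdisc being added to eth0 is a root qdisc,
handle 1: the handle of the qdisc is 1:,
htb: the qdisc being added is an HTB qdisc.
default 20: an htb-specific parameter meaning that all unclassified traffic is assigned to class 1:20. Class 1:20 is not created in the steps below, so unclassified traffic is not shaped.

3. Create the root class

tc class add dev eth0 parent 1:0 classid 1:1 htb rate 3Mbit
Explanation:
Create root class 1:1 on qdisc 1:0, of type htb, limited to 3Mbit.
rate 3Mbit: the system guarantees 3Mbit of bandwidth to this class.

4. Create the class

tc class add dev eth0 parent 1:1 classid 1:11 htb rate 2Mbit ceil 3Mbit
Explanation:
Create class 1:11 with root class 1:1 as its parent, limited to between 2Mbit and 3Mbit (htb can borrow bandwidth from other classes).
ceil 3Mbit: the maximum bandwidth this class may use is 3Mbit.

5. Create the filter and specify its handle

tc filter add dev eth0 parent 1:0 prio 1 protocol ip handle 11 fw flowid 1:11
prio 1: the priority of the filter; filters with a lower prio value are consulted first.
protocol ip: the filter applies to IP packets.
handle 11 fw: match packets whose firewall mark is 11.
flowid 1:11: assign the matching flow to class 1:11.

6. Use iptables to bind the ports to the tc class

iptables -A OUTPUT -t mangle -p tcp --sport 80 -j MARK --set-mark 11
iptables -A OUTPUT -t mangle -p tcp --sport 22 -j MARK --set-mark 11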
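When more ports share the same class, the marking rules can be generated in a loop. This is a dry-run sketch: mark_ports is a made-up helper and the commands are only printed, not executed.

```shell
#!/bin/sh
# Dry-run sketch: prints the rules instead of applying them.
run() { echo "$*"; }

# mark_ports MARK PORT...
# One mangle/OUTPUT rule per source port, all with the same mark so
# that a single "handle MARK fw" filter sends them to one tc class.
mark_ports() {
    mark=$1
    shift
    for port in "$@"; do
        run iptables -t mangle -A OUTPUT -p tcp --sport "$port" -j MARK --set-mark "$mark"
    done
}

mark_ports 11 80 22
```

Piping the output to sh as root would apply the rules; adding a port then only means adding an argument.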

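III. Viewing and deleting rules

1. List the tc qdiscs

tc -s qdisc ls dev eth0

2. Delete the tc qdisc

tc qdisc del dev eth0 root

3. Delete the iptables mangle rules

Deleting only the tc qdisc leaves the mangle rules in place, and they take effect again the next time the qdisc is set up. To reset the whole rate-limiting configuration, delete both the tc qdisc and the mangle rules.

List the mangle rules with line numbers:
iptables -L -t mangle --line-numbers
Delete a rule by number (x is the rule number; the marks above were added to the OUTPUT chain):
iptables -t mangle -D OUTPUT x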

 

==================== End

 

