Linux Traffic Control (TC): An Overview


1.1 What Is Traffic Control

Traffic control is the collective name for the packet receiving and transmitting mechanisms and the queueing systems of a router. It includes deciding which packets to accept, and at what rate, on an input interface, and which packets to transmit, in what order and at what rate, on an output interface.

Traditional traffic control involves shaping, scheduling, classifying, policing, dropping, and marking.

  • Shaping. A shaper delays packets so that traffic stays at or below a configured rate. Shaping means delaying packets in the output queue before they are sent and then transmitting them at a given rate, keeping network traffic below that rate; this is what most users want traffic control for. (A minimal command sketch follows this list.)
  • Scheduling. Scheduling is the arranging of input and output packets in a queue. The most common scheduler is FIFO (first in, first out). More broadly, any traffic control on an output queue can be called scheduling, since packets are being ordered for output.
  • Classifying. Classifying means dividing traffic up for differential treatment, for example splitting it across different output queues. A network device can classify packets in many ways while receiving, routing, and transmitting them. Classifying includes marking packets, which can be done by a single control unit at the network edge or at every hop.
  • Policing. Policing, as an element of traffic control, limits traffic. It is often used on network edge devices so that a node cannot consume more than its allocated bandwidth. A policer accepts packets at a specified rate and performs an action on packets exceeding that rate. The harshest action is to drop the packet, although the packet could instead be reclassified.
  • Dropping. Dropping is choosing, by some mechanism such as RED, which packets get discarded.
  • Marking. Marking writes a DSCP field into the packet, which other routers in a managed network can recognize and act on (typically used for DiffServ, Differentiated Services).
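
As a concrete taste of shaping and policing from the list above, here is a minimal, hedged sketch using tc. The interface name eth0 and the rates are illustrative assumptions, not values from this article:

#tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms  //shaping: delay packets so egress stays under 1mbit
#tc qdisc add dev eth0 ingress  //policing needs the ingress qdisc in place first
#tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 police rate 1mbit burst 10k drop flowid :1  //drop everything arriving above 1mbit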

1.2 Why Traffic Control Is Needed

An important difference between packet-switched and circuit-switched networks is that a packet-switched network is stateless, whereas a circuit-switched network (such as the telephone network) must maintain state. IP networks, being packet-switched, are designed to be stateless; in fact, statelessness is one of IP's fundamental strengths.

The drawback of statelessness is that different types of flows cannot be distinguished. With traffic control, however, an administrator can queue and differentiate packets based on their attributes. Traffic control can even be used to simulate circuit-switched behavior, imposing stateful semantics onto a stateless network.

There are many practical reasons to consider traffic control, and it has many meaningful applications. Below are some examples of problems that traffic control can solve or improve. The list is not a complete inventory of what traffic control can do; it merely introduces a few classes of problems it can address.

Common traffic control solutions:

  • Limit total bandwidth to a known rate, using TBF or HTB with child classes.
  • Limit the bandwidth of a particular user, service, or client, using HTB classes (HTB class) and classifying with filters.
  • Maximize TCP throughput on an asymmetric line by raising the priority of ACK packets, as wondershaper does (see the sketch after this list).
  • Reserve bandwidth for a particular application or user, using HTB with child classes and classifying.
  • Improve the performance of latency-sensitive applications, using the PRIO (priority) mechanism inside HTB classes.
  • Manage surplus bandwidth, using HTB's borrowing mechanism.
  • Achieve a fair distribution of all bandwidth, using HTB's borrowing mechanism.
  • Ensure that a particular type of traffic is dropped, using a policer attached to a filter with a drop action.
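
As one concrete illustration of the ACK-prioritization item above, here is a hedged sketch of the classic u32 match for bare TCP ACK packets used by the wondershaper and LARTC scripts. The device name, parent handle, and target class are illustrative assumptions:

#tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match ip protocol 6 0xff match u8 0x05 0x0f at 0 match u16 0x0000 0xffc0 at 2 match u8 0x10 0xff at 33 flowid 1:10  //short TCP packets (header length 5, total length < 64) with only the ACK flag set go to high-priority class 1:10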

1.3 How to Do Traffic Control

1.3.1 General Components of a Traffic Control System

Depending on the functionality required, a traffic control system consists roughly of the following components:

  • Scheduler
  • Classifier (optional)
  • Policer
  • Filter

The classifier is not mandatory; classless traffic control systems, for instance, have none. The table below maps the traditional elements to their Linux implementations.

traditional element | Linux component
------------------- | ---------------
shaping             | The class offers shaping capabilities.
scheduling          | A qdisc is a scheduler. Schedulers can be simple, such as the FIFO, or complex, containing classes and other qdiscs, such as HTB.
classifying         | The filter object performs the classification through the agency of a classifier object. Strictly speaking, Linux classifiers cannot exist outside of a filter.
policing            | A policer exists in the Linux traffic control implementation only as part of a filter.
dropping            | To drop traffic requires a filter with a policer which uses "drop" as an action.
marking             | The dsmark qdisc is used for marking.
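
To make the marking row concrete, here is a hedged sketch of marking with the dsmark qdisc, adapted from the LARTC DiffServ examples; the mask/value pair (which rewrites the ToS byte to DSCP EF, 0xb8) is an illustrative assumption:

#tc qdisc add dev eth0 handle 1:0 root dsmark indices 64  //a dsmark table with 64 (mask, value) pairs
#tc class change dev eth0 classid 1:1 dsmark mask 0x3 value 0xb8  //new ToS byte = (old ToS & 0x3) | 0xb8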

1.3.2 Linux TC

Linux TC offers powerful facilities for all aspects of traffic control. Before using it, let's walk through the underlying concepts.

Glossary of Linux TC terms:

  • Queueing Discipline (qdisc)

    An algorithm that manages the queue of a device, either incoming (ingress) or outgoing (egress).

  • root qdisc

    The root qdisc is the qdisc attached to the device (see the inspection example after this glossary).

  • Classless qdisc

    A qdisc with no configurable internal subdivisions.

  • Classful qdisc

    A classful qdisc contains multiple classes. Some of these classes contain a further qdisc, which may again be classful, but need not be. According to the strict definition, pfifo_fast is classful, because it contains three bands which are, in fact, classes. However, from the user's configuration perspective, it is classless, as the classes can't be touched with the tc tool.

  • Classes

    A classful qdisc may have many classes, each of which is internal to the qdisc. A class, in turn, may have several classes added to it. So a class can have either a qdisc or another class as its parent. A leaf class is a class with no child classes. It has one qdisc attached to it, and this qdisc is responsible for sending the data from that class. When you create a class, a fifo qdisc is attached to it; when you add a child class, this qdisc is removed. For a leaf class, the fifo qdisc can be replaced with another, more suitable qdisc. You can even replace it with a classful qdisc so that you can add extra classes.

  • Classifier

    Each classful qdisc needs to determine to which class it needs to send a packet. This is done using the classifier.

  • Filter

    Classification can be performed using filters. A filter contains a number of conditions which, if matched, make the filter match.

  • Scheduling

    A qdisc may, with the help of a classifier, decide that some packets need to go out earlier than others. This process is called Scheduling, and is performed for example by the pfifo_fast qdisc mentioned earlier. Scheduling is also called 'reordering', but this is confusing.

  • Shaping

    The process of delaying packets before they go out to make traffic conform to a configured maximum rate. Shaping is performed on egress. Colloquially, dropping packets to slow traffic down is also often called shaping.

  • Policing

    Delaying or dropping packets in order to make traffic stay below a configured bandwidth. In Linux, policing can only drop a packet and not delay it - there is no 'ingress queue'.

  • Work-Conserving

    A work-conserving qdisc always delivers a packet if one is available. In other words, it never delays a packet if the network adaptor is ready to send one (in the case of an egress qdisc).

  • non-Work-Conserving

    Some queues, like for example the Token Bucket Filter, may need to hold on to a packet for a certain time in order to limit the bandwidth. This means that they sometimes refuse to pass a packet, even though they have one available.
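
To see these objects on a live system (as referenced in the root qdisc entry above), ask tc to list the qdisc attached to a device. The output below is the typical long-standing default, shown purely as an illustration; many modern distributions install a different default such as fq_codel:

#tc qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1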

1.3.3 Linux TC in Detail

First, an important caveat: Linux tc implements thorough control only for the egress direction; its control over ingress is limited. In short, it controls what you send, not what you receive.

Now for a few important concepts in the implementation:

  • Queues. The queue is the fundamental concept in traffic control. Using queues together with other mechanisms, we can perform shaping, scheduling, and so on.

  • Token bucket. This is a very important element. One way to control the dequeue rate is to directly count the packets or bytes leaving the queue, but accuracy then demands complex computation. The other approach, used pervasively in traffic control, is the token bucket: tokens are generated at a fixed rate, and a packet or byte must take a token from the bucket before it may be dequeued.

    Consider an analogy: a crowd of people queueing for the tour cars at an amusement park. Imagine a fixed track on which cars arrive at a fixed rate; each person must wait for a car to arrive before riding. The cars and the riders correspond to tokens and packets. This mechanism is rate limiting, or shaping: within a fixed period of time, only so many people can ride.

    Extending the analogy, imagine a fleet of cars sitting at the station with no riders at all. If a large crowd now arrives at once, everyone can board immediately. Here the station is the bucket: a bucket holds a certain number of tokens, and all of them can be consumed at once, regardless of when packets arrive.

    To complete the analogy: cars arrive at the station at a fixed rate and pile up until the station is full if nobody rides. That is, tokens enter the bucket at a fixed rate; if no tokens are consumed, the bucket fills up, and if tokens are consumed continuously, it never fills. The token bucket is the key idea for handling applications that generate bursty traffic, such as HTTP.

    The Token Bucket Filter qdisc (TBF) is a classic example of shaping (the TBF section contains a diagram that helps visualize the token bucket). TBF generates tokens at a given rate and sends data only when tokens are available; the token is the fundamental idea behind shaping. (A worked bucket-sizing sketch follows this list.)
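
As a hedged, worked sketch of the token bucket described above: the bucket (burst) must hold at least the tokens that accumulate between two kernel timer ticks, i.e. at least rate/HZ bytes, or TBF cannot sustain the configured rate. The rate and HZ values here are illustrative assumptions:

// At rate 6mbit with HZ=100, 6,000,000 bit/s / 8 / 100 = 7,500 bytes accumulate per 10ms tick,
// so burst must be at least about 7.5k; 15k leaves comfortable headroom.
#tc qdisc add dev eth0 root tbf rate 6mbit burst 15k latency 50ms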

The main components of Linux tc are the qdisc, the class, and the filter.

  • qdiscs divide into classful qdiscs and classless qdiscs. The difference is that a classful qdisc can contain multiple classes, allowing traffic to be controlled at a finer granularity.

    • Common classless qdiscs: choke, codel, p/bfifo, fq, fq_codel, gred, hhf, ingress, mqprio, multiq, netem, pfifo_fast, pie, red, rr, sfb, sfq, tbf. The Linux default is pfifo_fast.

    • Common classful qdiscs: ATM, CBQ, DRR, DSMARK, HFSC, HTB, PRIO, QFQ.

  • Classes exist only within a classful qdisc (e.g. HTB or CBQ). A class can be complex: it may contain several child classes, or only a single child qdisc. In extremely complex traffic control scenarios, a class may even contain another classful qdisc.

    Any class can have any number of filters attached to it, which select a child class, or reorder or drop the packets entering the class.

    A leaf class is a terminal class of a qdisc: it contains one qdisc (pfifo by default) and no child classes. Any class that contains child classes is an inner class, not a leaf class.

  • Linux filters let the user classify packets onto output queues using one or more filters. Each filter contains a classifier implementation; a common classifier is u32, which selects packets based on their attributes.

Every qdisc and every class needs a unique identifier, known as a handle. Handles are named in major:minor format, and note that both parts are parsed as hexadecimal. Their use is shown concretely in the examples below (and in the short sketch right after this paragraph).
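
A small hedged sketch of handle naming; the device and the numbers are illustrative assumptions:

#tc qdisc add dev eth0 root handle 1: htb  //the qdisc handle 1: is shorthand for 1:0 (major 1, minor 0)
#tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit  //classid 1:10 is parsed as hex: minor 0x10 = 16 decimal, not 10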

Next we focus on classful qdiscs, tracing the path a packet takes through them.

  • flow within classful qdisc & class

    When traffic enters a classful qdisc, it needs to be sent to any of the classes within - it needs to be 'classified'. To determine what to do with a packet, the so called 'filters' are consulted. It is important to know that the filters are called from within a qdisc, and not the other way around!

    The filters attached to that qdisc then return with a decision, and the qdisc uses this to enqueue the packet into one of the classes. Each subclass may try other filters to see if further instructions apply. If not, the class enqueues the packet to the qdisc it contains.

    Besides containing other qdiscs, most classful qdiscs also perform shaping. This is useful to perform both packet scheduling (with SFQ, for example) and rate control. You need this in cases where you have a high speed interface (for example, ethernet) to a slower device (a cable modem).

  • How filters are used to classify traffic

    Recapping, a typical hierarchy might look like this:

                     1:   root qdisc
                      |
                     1:1    child class
                    /  |  \
                   /   |   \
                  /    |    \
               1:10  1:11  1:12   child classes
                |      |     |
                |     11:    |    leaf class
                |            |
                10:         12:   qdisc
               /   \       /   \
            10:1  10:2   12:1  12:2   leaf classes

But don't let this tree fool you! You should not imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to.

A packet might get classified in a chain like this: 1: -> 1:1 -> 1:12 -> 12: -> 12:2

The packet now resides in a queue in a qdisc attached to class 12:2. In this example, a filter was attached to each 'node' in the tree, each choosing a branch to take next. This can make sense. However, this is also possible: 1: -> 12:2

In this case, a filter attached to the root decided to send the packet directly to 12:2 (a command-level sketch of such a filter follows this list).

  • How packets are dequeued to the hardware

    When the kernel decides that it needs to extract packets to send to the interface, the root qdisc 1: gets a dequeue request, which is passed to 1:1, which is in turn passed to 10:, 11: and 12:, each of which queries its children, and tries to dequeue() from them. In this case, the kernel needs to walk the entire tree, because only 12:2 contains a packet.

    In short, nested classes ONLY talk to their parent qdiscs, never to an interface. Only the root qdisc gets dequeued by the kernel!

    The upshot of this is that classes never get dequeued faster than their parents allow. And this is exactly what we want: this way we can have SFQ in an inner class, which doesn't do any shaping, only scheduling, and have a shaping outer qdisc, which does the shaping.
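
As a hedged sketch of the '1: -> 12:2' shortcut mentioned above, a single filter attached to the root can classify traffic straight to a leaf class; the match on SSH traffic is an illustrative assumption:

#tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dport 22 0xffff flowid 12:2  //a root filter sends SSH straight to leaf class 12:2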

1.3.4 Configuring and Using HTB

HTB is a classful qdisc implementing hierarchical, class-based traffic control, and it is one of the most common tc configurations on Linux. Let's look at how to configure it:

Configuring HTB takes four steps:

  • Create the root qdisc
  • Create the classes
  • Create filters and attach them to the classes
  • Add leaf class qdiscs (optional)
#tc qdisc add dev eth0 root handle 1: htb default 30  //add the root qdisc; 1: is shorthand for 1:0
#tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k  //create a class under the root 1:
#tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k
#tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k
#tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k
#tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10  //attach a qdisc to each leaf class (the default is pfifo)
#tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
#tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10
# Add filters to direct traffic straight into the appropriate classes:
#U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32"
#$U32 match ip dport 80 0xffff flowid 1:10  //attach the filter to a class
#$U32 match ip sport 25 0xffff flowid 1:20
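
A hedged follow-up to the configuration above (same assumed device eth0): verify with the statistics command and tear everything down by deleting the root:

#tc -s class show dev eth0  //verify: per-class byte/packet counters grow as matching traffic flows
#tc qdisc del dev eth0 root  //tear down: deleting the root qdisc removes the entire hierarchy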

The parameters used when creating the classes mean the following:

default

An optional parameter of the HTB qdisc, with a default value of 0. A value of 0 means unclassified traffic bypasses all of the classes attached to the root qdisc and is dequeued at full speed.

rate

Sets the minimum desired rate at which traffic is sent. This can be treated as the committed information rate (CIR), or the guaranteed bandwidth of a given leaf class.

ceil

Sets the maximum desired rate at which traffic is sent. The borrowing mechanism determines the actual effect of this parameter. This rate can be called the "burst rate".

burst

The size of the rate bucket (see the token bucket section). HTB will dequeue burst bytes before more tokens arrive.

cburst

The size of the ceil bucket (see the token bucket section). HTB will dequeue cburst bytes before more ctokens arrive.

quantum

The key parameter by which HTB controls borrowing. Normally HTB computes a suitable quantum itself rather than taking one from the user. Even slight adjustments of this value can have an enormous effect on borrowing and shaping, because HTB uses it both to distribute surplus traffic among the children (at rates above rate and below ceil) and to decide how much data to dequeue from each child at a time.

r2q

Normally quantum is computed by HTB itself; with this parameter the user supplies a divisor that helps HTB compute an optimal quantum for a class (see the worked example below).
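
A hedged worked example of the quantum/r2q relationship as described in the HTB documentation; the rates here are illustrative assumptions:

// quantum = rate (in bytes per second) / r2q
// rate 5mbit = 5,000,000 / 8 = 625,000 bytes/s; with the default r2q of 10:
// quantum = 625,000 / 10 = 62,500 bytes
// HTB warns when the computed quantum falls outside roughly [1000, 200000];
// in that case, tune r2q or set quantum directly on the class.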

mtu

The maximum packet size HTB accounts for in its rate calculations; it defaults to 1600 bytes. Packets larger than this are counted as giants (see the statistics section below), and rates for them will not be accurate.

prio

The priority of a leaf class. When classes compete for surplus bandwidth, lower prio values are offered the excess first; classes with equal prio share it in proportion to their quantum.

1.3.5 Ingress Traffic Control

The common approach to ingress traffic control is to redirect the interface's traffic to an ifb device and then apply traffic control on the ifb's egress, indirectly controlling the ingress direction. A simple example follows (with a further sketch after it):

#modprobe ifb  //the ifb module must be loaded
#ip link set dev ifb0 up txqueuelen 1000
#tc qdisc add dev eth1 ingress  //add the ingress qdisc
#tc filter add dev eth1 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0  //redirect all traffic to ifb0
#tc qdisc add dev ifb0 root netem delay 50ms loss 1%  //configure the ifb side; netem is used here, but qdiscs, classes, and filters can be configured exactly as on egress
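
As a hedged sketch of the last comment's point, an egress-style HTB hierarchy can be placed on ifb0 so that redirected inbound traffic is rate-limited; the rates are illustrative assumptions:

#tc qdisc add dev ifb0 root handle 1: htb default 10
#tc class add dev ifb0 parent 1: classid 1:1 htb rate 10mbit
#tc class add dev ifb0 parent 1:1 classid 1:10 htb rate 10mbit  //traffic redirected from eth1 is now shaped to 10mbit inbound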

1.3.6 Viewing Statistics

  • Use tc qdisc show dev xx to view qdiscs.
  • Use tc class show dev xx to view classes.
  • Use tc filter show dev xx to view filters. Note that all of these default to root, i.e. the egress rules; to see the ingress side, use tc filter show dev xx ingress.
The tc tool allows you to gather statistics from queueing disciplines in Linux. Unfortunately the statistics are not explained by their authors, so often you can't make use of them. Here I try to help you understand HTB's stats.
First, the stats of the whole HTB. The snippet below was taken during the simulation from chapter 3.

# tc -s -d qdisc show dev eth0
 qdisc pfifo 22: limit 5p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc pfifo 21: limit 5p
 Sent 2891500 bytes 5783 pkts (dropped 820, overlimits 0) 

 qdisc pfifo 20: limit 5p
 Sent 1760000 bytes 3520 pkts (dropped 3320, overlimits 0) 

 qdisc htb 1: r2q 10 default 1 direct_packets_stat 0
 Sent 4651500 bytes 9303 pkts (dropped 4140, overlimits 34251) 

The first three disciplines are HTB's children; let's ignore them, as PFIFO stats are self-explanatory.
overlimits tells you how many times the discipline delayed a packet. direct_packets_stat tells you how many packets were sent through the direct queue. The other stats are self-explanatory. Let's look at the class stats:

# tc -s -d class show dev eth0
class htb 1:1 root prio 0 rate 800Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 10240 level 3 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 6872 borrowed: 0 giants: 0

class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 4096 level 2 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 1017 borrowed: 6872 giants: 0

class htb 1:10 parent 1:2 leaf 20: prio 1 rate 224Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 2867 level 0 
 Sent 2269000 bytes 4538 pkts (dropped 4400, overlimits 36358) 
 rate 14635bps 29pps 
 lended: 2939 borrowed: 1599 giants: 0

I deleted the 1:11 and 1:12 classes to make the output shorter. As you can see, the parameters we set are all there, along with the level and the DRR quantum information.
overlimits shows how many times the class was asked to send a packet but couldn't due to rate/ceil constraints (currently counted for leaves only).
rate and pps tell you the actual (10-second averaged) rate going through the class; it is the same rate as used by gating.
lended is the number of packets donated by this class (from its rate), and borrowed counts the packets for which we borrowed from the parent. Lends are always computed class-locally, while borrows are transitive (when 1:10 borrows from 1:2, which in turn borrows from 1:1, both the 1:10 and 1:2 borrow counters are incremented).
giants is the number of packets larger than the mtu set in the tc command; HTB will work with these, but rates will not be accurate at all. Add mtu to your tc command (it defaults to 1600 bytes).

1.3.7 Miscellaneous Notes

  • Can't see rate and similar statistics in the output? For performance, the kernel disables rate estimation by default; enable it with echo 1 > /sys/module/sch_htb/parameters/htb_rate_est.
