User-Space Device Drivers in Linux: A First Look
Mats Liljegren
Senior Software Architect
Device drivers in Linux traditionally run in kernel space, but can
also run in user space. This paper takes a look at running drivers
in user space, trying to answer two questions: to what degree can a
driver run in user space, and what can be gained from doing so?


In the '90s, user-space drivers in Linux were mostly about making
graphics run faster[1] by avoiding calls into the kernel. These
drivers were commonly used by the X Window System server.

User-space drivers have become ever more important, as a blog post
by Todd Hoff[2] illustrates. In his case the kernel is seen as the
problem when trying to achieve high server connection capacity.

Network interface hardware companies like Intel, Texas Instruments
and Freescale have picked up on this and are now providing
software solutions for user-space drivers supporting their hardware.


1. Problems with kernel-space drivers

Device drivers normally run in kernel space, since handling
interrupts and mapping hardware resources require privileges that
only kernel space has. However, this approach is not without
drawbacks.


1.1 System call overhead

Each call into the kernel must perform a switch from user mode to
supervisor mode, and then back again. This takes time, which can
become a performance bottleneck if the calls are frequent.
Furthermore, the overhead is largely unpredictable, which has a
negative performance impact on real-time applications.


1.2 Steep learning curve

The kernel-space API is different from the user-space API. For
example, malloc() must be replaced by one of the several types of
memory allocation the kernel offers, such as kmalloc(), vmalloc(),
alloc_pages() or get_zeroed_page(). There is a lot to learn before
becoming productive.


1.3 Interface stability

The kernel-space API is less stable than user-space APIs, making
maintenance a challenge.


1.4 Harder to debug

Debugging is very different in kernel space. Some tools often used
for user-space applications can also be used for the kernel, but
they are the exception rather than the rule, LTTng[3] being one such
exception. To compensate for this, the kernel has a lot of debug,
tracing and profiling code that can be enabled at compile time.


1.5 Bugs more fatal

A crashing or misbehaving kernel tends to have a more severe
impact on the system than a crashing or misbehaving application,
which can affect robustness as well as how easy it is to debug.


1.6 Restrictive language choice

Kernel space is a very different programming environment from user
space. It is more restricted; for example, only the C language is
supported, which rules out any script-based prototyping.


2. User-space drivers

If there are so many problems with having device drivers in kernel
space, is it time to move all drivers to user space instead? As
always, everything has its drawbacks, and user-space drivers are no
exception.

Most of the issues with kernel-space drivers are solved by having
the driver in user space, but the interface stability issue is only
solved for very simple user-space drivers.

For more advanced user-space drivers, many of the interfaces
available for kernel-space drivers need to be re-implemented
for user-space drivers. This means that interface stability will still
be an issue.


3. Challenges with user-space drivers

There is a fear in the Linux kernel community that user-space
drivers are used as a tool to circumvent the kernel's GPLv2 license.
This would undermine the free and open-source software ideals behind
GPLv2. However, that discussion is outside the scope of this paper.


Apart from this, there are technical challenges for user-space
drivers.


3.1 Interrupt handling

Without question, interrupt handling is the biggest challenge for a
user-space driver. The function handling an interrupt runs in
privileged execution mode, often called supervisor mode. User-space
drivers have no permission to execute in privileged execution mode,
making it impossible for them to implement an interrupt handler.


There are two ways to deal with this problem: either you do not use
interrupts, which means you have to poll instead, or you have a
small kernel-space driver that handles only the interrupt. In the
latter case the kernel can inform the user-space driver of an
interrupt either through a blocking call, which unblocks when an
interrupt occurs, or through a POSIX signal that preempts the
user-space driver.


Polling is beneficial if interrupts are frequent, since each
interrupt carries considerable overhead due to the switch from user
mode to supervisor mode and back. Each poll attempt, on the other
hand, is usually just a check of a value at a specific memory
address.


When interrupts become scarcer, polling instead does a lot of work
just to determine that there was no work to do. This is bad for
power saving.


To save power when using a polling user-space driver, you can change
the CPU clock frequency or the number of CPUs used, depending on the
workload. Both alternatives introduce ramp-up latency when there is
a workload spike.


3.2 DMA

Many drivers use hardware dedicated to copying memory
areas managed by the CPU to or from memory areas managed
by hardware devices. Such dedicated hardware is called direct
memory access, or DMA. DMA relieves the CPU of such
memory copying.


There are some restrictions on the memory area used for
DMA. These restrictions are unique for each DMA device.
Common restrictions are that only a certain physical memory
range can be used, and that the physical memory range must be
consecutive.


Allocating memory that can be used for DMA transfers is
non-trivial for user-space drivers. However, since DMA
memory can be reused, you only need to allocate a pool of
memory to be used for DMA transfers at start-up. This means
that the kernel could help with providing such memory when
the user-space driver starts, but after that no further kernel
interactions would be needed.


3.3 Device interdependencies

Devices are often structured in a hierarchy. For example, the clock
signal might be propagated in a tree-like fashion, using different
dividers for different devices, with the possibility of powering off
the clock signal to save power.


Some devices act as a bridge, for example a PCI host bridge. In this
case you need to set up the bridge in order to have access to any
device connected on the other side of it.


In kernel space there are frameworks helping a device driver
programmer to solve these problems, but those frameworks
are not available in user space.


Since it is usually only the startup and shutdown phases that
affect other devices, the device interdependencies can be
solved by a kernel-space driver, while the user-space driver
can handle the actual operation of the device.


3.4 Kernel services

Network device drivers normally interface with the kernel network
stack, just like block device drivers normally interface with the
kernel file system framework.


User-space drivers have no direct access to such kernel services,
and must re-implement them.


3.5 Client interface

The kernel has mechanisms for handling multiple clients
accessing the same resource, and for blocking threads waiting
for events or data from the device. These mechanisms are
available using standard interfaces like file descriptors, sockets,
or pipes.


To avoid using the kernel, the user-space driver needs to invent
its own interface.


4. Implementing user-space drivers

The picture above shows how a user-space driver might be designed.
The application interfaces with the user-space part of the driver.
The user-space part handles the hardware, but uses its kernel-space
part for startup, shutdown, and receiving interrupts.


There are several frameworks and software solutions available to
help design a user-space driver.


4.1 UIO (Userspace I/O)

There is a framework in the kernel called UIO[4][5] which
facilitates writing the kernel-space part of a user-space driver.
UIO has mechanisms for making memory-mapped I/O accessible to the
user-space part of the driver.


The allocated memory regions are presented using a device 
file, typically called /dev/uioX, where X is a sequence number 
for the device. The user-space part will then open the file and 
perform mmap() on it. After that, the user-space part has direct 
access to its device.


By reading from the same file that was opened for mmap(), the
user-space part will block until an interrupt occurs. The value read
is the number of interrupts that have occurred. You can also use
select() on the opened file to wait for other events.


For user-space network drivers there are specialized solutions 
specific for certain hardware.


4.2 DPDK

Data Plane Development Kit, DPDK[6], is a solution from Intel for
user-space network drivers on Intel (x86) hardware. DPDK defines an
execution environment that contains the user-space network drivers.
This execution environment defines a thread per CPU, called an lcore
in DPDK. For maximum throughput you should not have any other thread
running on such a CPU.


While this package of libraries focuses on forwarding applications,
you can implement server applications as well. For server DPDK
applications you need to implement your own network stack and accept
a DPDK-specific interface for accessing the network.


Much effort has been put into memory handling, since this is often
critical for reaching the best possible performance. There are
special allocation and deallocation functions that try to minimize
TLB[10] misses, use the most local memory on NUMA[11] systems and
ensure an even spread on multi-channel memory architectures[12].


4.3 USDPAA

User-space Data Plane Acceleration Architecture, USDPAA[7], is a
solution from Freescale for the same use case as DPDK, but designed
for their QorIQ architecture (PowerPC and ARM). The big difference
is that QorIQ uses hardware for allocating, deallocating and
queueing network packet buffers. This makes memory management easier
for the application.


4.4 TransportNetLib

TransportNetLib[8] is a solution from Texas Instruments. It 
is similar to USDPAA but for the Keystone architecture (ARM).


4.5 Open DataPlane

Open DataPlane, ODP[9], is a solution initiated by Linaro to do the
same as DPDK, USDPAA and TransportNetLib, but with vendor-neutral
interfaces.


4.6 Trying out DPDK

To get a feeling for the potential performance gain of a user-mode
network device driver, a DPDK benchmark application was designed and
executed.


The design of the application can be seen in the picture above. 
It executes as four instances each running on its own CPU, or 
lcore, as DPDK calls them.


Each instance is dedicated to its own Ethernet device, sending and
receiving network packets. The packets sent carry a magic word used
to validate the packets and a timestamp used to measure transport
latency.


The instances are then paired using loopback cables. To be able to
compare the user-space driver with a kernel-space driver, one pair
accesses the hardware directly using the driver available in DPDK,
and the other pair uses the pcap[13] interface. All four Ethernet
devices are on the same PCI network card.


There is a fifth lcore (not shown in the picture above) which
periodically collects statistics and displays them on the screen.


The hardware used was as follows: 
  o Supermicro A1SAi-2750F mother board using Intel Atom 
    C2750 CPU.  This CPU has 8 cores with no hyperthreading. 
  o 16GB of memory. 
  o Intel Ethernet server adapter i350-T4, 1000 Mbps.

The table below shows the throughput and latency for the user-space
driver compared to the kernel-space driver.


A graph showing the throughput:


A graph showing the latency:


The theoretical throughput maximum is the sum of the send and
receive speeds of the network interface. In this case this is
1000 Mbps in each direction, giving a theoretical maximum of
2000 Mbps. The throughput includes packet headers and padding.


The user-space driver achieved about four times the throughput of
the kernel-space driver.


Latency was calculated by comparing the timestamp found in the
network packet with the current clock when the packet was received.
The latency for the user-space driver was slightly lower than for
the kernel-space driver.


Four threads, each continuously running a netperf TCP streaming test
against the loopback interface, were used as a stress load while
running the DPDK benchmark application. This had no noticeable
impact on the measurements.


5. Conclusion

Implementing a user-space driver requires some work and knowledge.
The major challenges are interrupts versus polling, power
management, and designing the interface towards driver clients.


Support for user-space network drivers is much more developed than
for other kinds of user-space drivers, especially for data-plane
forwarding types of applications.


A user-space driver can do everything a kernel-space driver 
can, except for implementing an interrupt handler.


Comparing a user-space network driver with a kernel-space network
driver showed about four times better throughput for the user-space
driver. Latency did not show a significant difference.


The real-time characteristics should be good for user-space 
drivers since they do not invoke the kernel. This was not verified 
in this paper, though.


