When are DPDK mbufs released back to the mempool?


Preface: While debugging DPDK, I noticed that after a certain number of packets had been transmitted, the value reported by the memory statistic rte_mempool_count(), i.e. the number of mbufs remaining in the mempool, did not climb back up. That raises two questions: when are the mbufs allocated from the mempool actually returned to it, and where are the ones that have not yet been returned?

  I.

Let us answer the second question first. On a multi-core system the mempool may be accessed concurrently. Although DPDK implements a lock-free ring, that implementation is based on CAS (compare-and-swap), and frequent access to the shared ring still costs performance. For this reason, when creating a mempool you can choose to give each core its own cache; mbuf allocations and frees on that core are then served from the local cache first. As a result, a portion of the mbufs stays resident in the per-core caches.
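To make the per-core cache concrete, here is a minimal sketch of creating a pktmbuf pool whose third argument (cache_size) enables that cache. The pool name and the sizes (8191 mbufs, a 250-entry cache) are illustrative values only, not anything mandated by DPDK:

#include <stdlib.h>
#include <rte_debug.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define NB_MBUF        8191   /* total mbufs in the pool (example value) */
#define MBUF_CACHE_SZ   250   /* per-lcore cache size (example value)    */

static struct rte_mempool *
create_pool(void)
{
        /* cache_size > 0 gives every lcore a private stash of mbufs, so
         * alloc/free usually avoids touching the shared lock-free ring. */
        struct rte_mempool *pool = rte_pktmbuf_pool_create(
                "mbuf_pool", NB_MBUF, MBUF_CACHE_SZ,
                0 /* private area size */, RTE_MBUF_DEFAULT_BUF_SIZE,
                rte_socket_id());
        if (pool == NULL)
                rte_exit(EXIT_FAILURE, "cannot create mbuf pool\n");
        return pool;
}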

  II.

The second place also exists for efficiency reasons. The receive and transmit queues can be configured with a length and several thresholds, and tx_free_thresh is one of them. After the NIC hardware has read a transmit descriptor and the DMA transfer has completed, the hardware writes back a status flag to tell the software that the transmission is done. At that point the descriptor can be released and reused, and the corresponding mbuf can be freed as well. Here is the catch: freeing after every single transmission would hurt performance, so with tx_free_thresh the driver only scans for and frees completed descriptors (and their mbufs) once the number of descriptors available for reuse drops below that threshold. In other words, another portion of the mbufs stays resident in the NIC transmit queues.
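As a hedged sketch of where this knob lives: tx_free_thresh is a field of struct rte_eth_txconf and is passed to rte_eth_tx_queue_setup(). Port 0, queue 0, 512 descriptors and a threshold of 32 below are made-up example values; the actual defaults and legal ranges are driver-specific:

#include <stdlib.h>
#include <rte_debug.h>
#include <rte_ethdev.h>

static void
setup_tx_queue(uint16_t port_id)
{
        struct rte_eth_dev_info dev_info;
        struct rte_eth_txconf txconf;

        rte_eth_dev_info_get(port_id, &dev_info);
        txconf = dev_info.default_txconf;   /* start from the driver defaults */
        txconf.tx_free_thresh = 32;         /* scan and free completed mbufs once
                                             * fewer than 32 descriptors are reusable */

        if (rte_eth_tx_queue_setup(port_id, 0 /* queue */, 512 /* descriptors */,
                                   rte_eth_dev_socket_id(port_id), &txconf) < 0)
                rte_exit(EXIT_FAILURE, "tx queue setup failed\n");
}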

Once we know where the mbufs are parked, we know when they get put back into the mempool. For the per-core cache case, the official documentation describes the relationship between the cache, the ring, and the mempool, and the answer can be found there. For the second case, the mbufs held in the NIC transmit queue are released only when the number of reusable descriptors falls below tx_free_thresh.
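A quick way to watch this in practice (a sketch, assuming pool is the mempool created earlier): newer DPDK releases expose the information the preface obtained from rte_mempool_count() as rte_mempool_avail_count(), with rte_mempool_in_use_count() as its complement:

#include <stdio.h>
#include <rte_mempool.h>

/* Print how many mbufs the pool still holds (common ring plus per-lcore
 * caches) and how many are held elsewhere, e.g. in NIC descriptor rings.
 * The two numbers always add up to the pool size. */
static void
dump_pool_usage(const struct rte_mempool *pool)
{
        printf("avail: %u  in-use: %u\n",
               rte_mempool_avail_count(pool),
               rte_mempool_in_use_count(pool));
}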

Below is the Q&A exchange related to this question:

Q: When are mbufs released back to the mempool?

When running l2fwd the number of available mbufs returned by
rte_mempool_count() starts at 7680 on an idle system. As traffic
commences the count declines from 7680 to 5632 (expected). When traffic
stops the count does not climb back to the starting value, indicating
that idle mbufs are not returned to the mempool.

For the LCORE cache the doc states “While this may mean a number of
buffers may sit idle on some core’s cache, the speed at which a core can
access its own cache for a specific memory pool without locks provides
performance gains”, which makes sense.

Is this also true of ring buffers?

We need to understand when packets are released back to the mempool, and
with l2fwd it appears that they never are, at least not all of them.

Thanks!

On 12/17/2013 07:13 PM, Schumm, Ken wrote:

> When running l2fwd the number of available mbufs returned by
> rte_mempool_count() starts at 7680 on an idle system.
>
> As traffic commences the count declines from 7680 to 5632 (expected).

A: You are right, some mbufs are kept at 2 places:

- in the mempool per-core cache: as you noticed, each lcore has a cache
  to avoid a (more) costly access to the common pool.

- also, the mbufs stay in the hardware transmission ring of the NIC.
  Let’s say the size of your hw ring is 512; it means that when
  transmitting the 513th mbuf, you will free the first mbuf given to
  your NIC. Therefore, (hw-tx-ring-size * nb-tx-queue) mbufs can be
  stored in the tx hw rings.

Of course, the same applies to rx rings, but it’s easier to see it as
they are filled when initializing the driver.

When choosing the number of mbufs, you need to take a value greater than
(hw-rx-ring-size * nb-rx-queue) + (hw-tx-ring-size * nb-tx-queue) +
(nb-lcores * mbuf-pool-cache-size).

> Is this also true of ring buffers?

No, if you talk about rte_ring, there is no cache in this structure.

Regards,
Olivier
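An aside to Olivier’s sizing rule above, restated as code. This is only a sketch: the ring sizes, queue counts, and cache size are placeholders you would replace with your own configuration, and the pool should be sized strictly larger than the returned value:

#include <rte_lcore.h>

#define RX_RING_SIZE   1024   /* hw rx ring size per queue (placeholder) */
#define TX_RING_SIZE   1024   /* hw tx ring size per queue (placeholder) */
#define NB_RX_QUEUES      1
#define NB_TX_QUEUES      1
#define MBUF_CACHE_SZ   250   /* per-lcore mempool cache size (placeholder) */

/* Rule of thumb from the thread: (hw-rx-ring-size * nb-rx-queue) +
 * (hw-tx-ring-size * nb-tx-queue) + (nb-lcores * mbuf-pool-cache-size). */
static unsigned int
min_nb_mbufs(void)
{
        return RX_RING_SIZE * NB_RX_QUEUES +
               TX_RING_SIZE * NB_TX_QUEUES +
               rte_lcore_count() * MBUF_CACHE_SZ;
}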

Q: Do you know what the reason is for the tx rings filling up and holding
on to mbufs? It seems they could be freed when the DMA xfer is
acknowledged instead of waiting until the ring was full.

Thanks!

Ken Schumm

A: It is an optimization to defer freeing. Note that there are no
interrupts with DPDK, so transmit completion cannot be detected until
the next transmit.

You should also look at the tx_free_thresh value in the rte_eth_txconf
structure. Several drivers use it to control when to free, as in:

ixgbe_rxtx.c:

static inline uint16_t
tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
             uint16_t nb_pkts)
{
        struct igb_tx_queue *txq = (struct igb_tx_queue *)tx_queue;
        volatile union ixgbe_adv_tx_desc *tx_r = txq->tx_ring;
        uint16_t n = 0;

        /*
         * Begin scanning the H/W ring for done descriptors when the
         * number of available descriptors drops below tx_free_thresh.
         * For each done descriptor, free the associated buffer.
         */
        if (txq->nb_tx_free < txq->tx_free_thresh)
                ixgbe_tx_free_bufs(txq);
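The snippet above is only the beginning of the driver function, but it shows the mechanism: completed descriptors and their mbufs are reclaimed lazily. One hedged addition beyond the original thread: later DPDK releases also provide rte_eth_tx_done_cleanup(), which asks a supporting driver to reclaim completed tx mbufs on demand instead of waiting for the threshold. A minimal sketch:

#include <rte_ethdev.h>

/* Ask the PMD to free already-transmitted mbufs on one tx queue.
 * A free count of 0 means "free as many as possible". Returns the number
 * of packets freed, or a negative value (e.g. -ENOTSUP) if the driver
 * does not implement the callback. */
static int
reclaim_tx_mbufs(uint16_t port_id, uint16_t queue_id)
{
        return rte_eth_tx_done_cleanup(port_id, queue_id, 0);
}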

