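How Python's deque is implemented internally: it is essentially a doubly-linked list of blocks, which makes it a convenient fit for stacks and queues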


How does collections.deque work?

 

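Preface: In the Python ecosystem we often use collections.deque to implement stacks, queues, and other data structures that only need operations at the two ends. Its append and pop operations are O(1) at either end. A list's pop(0), by contrast, is O(n) because every remaining element has to shift, so a list is less efficient than a deque in this scenario. So how is deque implemented internally? I dug up the source code from the second commit of CPython's collections module on GitHub.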
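As a quick illustration of the usage described in the preface, here is a minimal sketch that uses the standard collections.deque as a stack and as a queue. The variable names are only illustrative.

```python
from collections import deque

# Stack (LIFO): push and pop at the right end.
stack = deque()
stack.append("a")
stack.append("b")
top = stack.pop()            # the most recently pushed item

# Queue (FIFO): push at the right end, pop at the left end.
queue = deque()
queue.append("a")
queue.append("b")
head = queue.popleft()       # the oldest item

# The list equivalent of popleft(): pop(0) works, but it has to shift
# every remaining element one slot to the left, which is O(n).
items = ["a", "b", "c"]
head_from_list = items.pop(0)
```

Both containers return the same values here; the difference is only in how much work is done per operation as the container grows.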

Definition of the dequeobject

The comments are written so elegantly that I cannot offer a more concise summary.

    /* The block length may be set to any number over 1. Larger numbers
     * reduce the number of calls to the memory allocator but take more
     * memory. Ideally, BLOCKLEN should be set with an eye to the
     * length of a cache line.
     */
    #define BLOCKLEN 62
    #define CENTER ((BLOCKLEN - 1) / 2)

    /* A `dequeobject` is composed of a doubly-linked list of `block` nodes.
     * This list is not circular (the leftmost block has leftlink==NULL,
     * and the rightmost block has rightlink==NULL). A deque d's first
     * element is at d.leftblock[leftindex] and its last element is at
     * d.rightblock[rightindex]; note that, unlike as for Python slice
     * indices, these indices are inclusive on both ends. By being inclusive
     * on both ends, algorithms for left and right operations become
     * symmetrical which simplifies the design.
     *
     * The list of blocks is never empty, so d.leftblock and d.rightblock
     * are never equal to NULL.
     *
     * The indices, d.leftindex and d.rightindex are always in the range
     * 0 <= index < BLOCKLEN.
     * Their exact relationship is:
     * (d.leftindex + d.len - 1) % BLOCKLEN == d.rightindex.
     *
     * Empty deques have d.len == 0; d.leftblock==d.rightblock;
     * d.leftindex == CENTER+1; and d.rightindex == CENTER.
     * Checking for d.len == 0 is the intended way to see whether d is empty.
     *
     * Whenever d.leftblock == d.rightblock,
     * d.leftindex + d.len - 1 == d.rightindex.
     *
     * However, when d.leftblock != d.rightblock, d.leftindex and d.rightindex
     * become indices into distinct blocks and either may be larger than the
     * other.
     */

    typedef struct BLOCK {
        struct BLOCK *leftlink;
        struct BLOCK *rightlink;
        PyObject *data[BLOCKLEN];
    } block;

    typedef struct {
        PyObject_HEAD
        block *leftblock;
        block *rightblock;
        int leftindex;          /* in range(BLOCKLEN) */
        int rightindex;         /* in range(BLOCKLEN) */
        int len;
        long state;             /* incremented whenever the indices move */
        PyObject *weakreflist;  /* List of weak references */
    } dequeobject;
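The index relationship stated in the comment can be checked with a little arithmetic. The sketch below is plain Python, not CPython code. The helper name expected_rightindex is hypothetical, and the three cases assume a deque that started empty and was only appended to on the right.

```python
BLOCKLEN = 62
CENTER = (BLOCKLEN - 1) // 2   # integer division, as in the C macro


def expected_rightindex(leftindex: int, length: int) -> int:
    """Apply the invariant (leftindex + len - 1) % BLOCKLEN == rightindex."""
    return (leftindex + length - 1) % BLOCKLEN


# Empty deque: leftindex is CENTER + 1, so rightindex must be CENTER.
empty = expected_rightindex(CENTER + 1, 0)

# After 31 appends the first block is filled up to its last slot.
one_block = expected_rightindex(CENTER + 1, 31)

# One more append wraps around to slot 0, i.e. slot 0 of a new block.
two_blocks = expected_rightindex(CENTER + 1, 32)
```

The wrap-around in the last case is why the comment says that, once two blocks are involved, either index may be larger than the other.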

Below is a diagram I drew of the block struct:

                +----------------------------------------+
                |          data: 62 objects              |
 +----------+   |                                        |   +-----------+
 | leftlink |---|  | ... | Obj1 | Obj2 | Obj3 | ... |    |---| rightlink |
 +----------+   |           30     31     32             |   +-----------+
                +----------------------------------------+

Creating a block

    static block *
    newblock(block *leftlink, block *rightlink, int len) {
        block *b;
        /* To prevent len from overflowing INT_MAX on 64-bit machines, we
         * refuse to allocate new blocks if the current len is dangerously
         * close. There is some extra margin to prevent spurious arithmetic
         * overflows at various places. The following check ensures that
         * the blocks allocated to the deque, in the worst case, can only
         * have INT_MAX-2 entries in total.
         */
        if (len >= INT_MAX - 2*BLOCKLEN) {
            PyErr_SetString(PyExc_OverflowError,
                            "cannot add more blocks to the deque");
            return NULL;
        }
        b = PyMem_Malloc(sizeof(block));
        if (b == NULL) {
            PyErr_NoMemory();
            return NULL;
        }
        b->leftlink = leftlink;
        b->rightlink = rightlink;
        return b;
    }

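Creating a dequeobject

  1. Create a block
  2. Instantiate a dequeobject Python object through type->tp_alloc (I don't fully understand the internals of this part yet; note that in the code below this allocation actually runs before the block is created)
  3. Point both leftblock and rightblock at that block
  4. Set leftindex to CENTER+1 and rightindex to CENTER
  5. Initialize the remaining attributes, such as len and state

Steps 1 and 4 are the interesting ones. Step 1 creates a block, which means that a deque preallocates a chunk of memory as soon as it is created. Step 4 hints that incoming elements are first placed in the middle of the block and then spread gradually toward its head and tail.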

    static PyObject *
    deque_new(PyTypeObject *type, PyObject *args, PyObject *kwds) {
        dequeobject *deque;
        block *b;

        if (type == &deque_type && !_PyArg_NoKeywords("deque()", kwds))
            return NULL;

        /* create dequeobject structure */
        deque = (dequeobject *)type->tp_alloc(type, 0);
        if (deque == NULL)
            return NULL;

        b = newblock(NULL, NULL, 0);
        if (b == NULL) {
            Py_DECREF(deque);
            return NULL;
        }

        assert(BLOCKLEN >= 2);
        deque->leftblock = b;
        deque->rightblock = b;
        deque->leftindex = CENTER + 1;
        deque->rightindex = CENTER;
        deque->len = 0;
        deque->state = 0;
        deque->weakreflist = NULL;

        return (PyObject *)deque;
    }
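To see why the initial indices make elements spread out from the middle, here is a small Python sketch of a single block. It is a model, not CPython code. It assumes that an append on the left mirrors an append on the right (decrement leftindex, then store), and it ignores block boundaries because only a few items are inserted.

```python
BLOCKLEN = 62
CENTER = (BLOCKLEN - 1) // 2

# One block's data array plus the two indices of a freshly created deque.
data = [None] * BLOCKLEN
leftindex, rightindex = CENTER + 1, CENTER

# Two appends on the right: advance rightindex, then store.
for item in ("a", "b"):
    rightindex += 1
    data[rightindex] = item

# Two appends on the left (assumed mirror image): step leftindex back, then store.
for item in ("z", "y"):
    leftindex -= 1
    data[leftindex] = item

# Which slots are in use, keyed by position in the block.
occupied = {i: value for i, value in enumerate(data) if value is not None}
```

The four items end up in the contiguous slots 29 to 32, straddling the centre of the block, with free space left on both sides for further growth.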

Implementation of deque.append

Steps:

  1. If rightblock can hold more elements, store the item in rightblock
  2. Otherwise, create a new block, update the relevant pointers, and store the item in the new rightblock

    static PyObject *
    deque_append(dequeobject *deque, PyObject *item) {
        deque->state++;
        if (deque->rightindex == BLOCKLEN-1) {
            block *b = newblock(deque->rightblock, NULL, deque->len);
            if (b == NULL)
                return NULL;
            assert(deque->rightblock->rightlink == NULL);
            deque->rightblock->rightlink = b;
            deque->rightblock = b;
            deque->rightindex = -1;
        }
        Py_INCREF(item);
        deque->len++;
        deque->rightindex++;
        deque->rightblock->data[deque->rightindex] = item;
        Py_RETURN_NONE;
    }

Having read the append implementation, we can work out for ourselves how pop and popleft are implemented.
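One way to picture pop and popleft is the toy Python model below. It is a sketch under stated assumptions, not the CPython source. The class names Block and BlockDeque and the helper _recenter are hypothetical. The model assumes that an exhausted block is unlinked immediately, and that a deque which becomes empty resets its indices to the initial values given in the source comment. The real implementation may differ in when it re-centres.

```python
BLOCKLEN = 62
CENTER = (BLOCKLEN - 1) // 2


class Block:
    """One list node: two links plus a fixed-size slot array."""

    def __init__(self, leftlink=None, rightlink=None):
        self.leftlink = leftlink
        self.rightlink = rightlink
        self.data = [None] * BLOCKLEN


class BlockDeque:
    """Toy model of the block layout; a hypothetical name, not a real API."""

    def __init__(self):
        self.leftblock = self.rightblock = Block()
        self.leftindex, self.rightindex = CENTER + 1, CENTER
        self.len = 0

    def append(self, item):
        # Same steps as deque_append above.
        if self.rightindex == BLOCKLEN - 1:
            b = Block(leftlink=self.rightblock)
            self.rightblock.rightlink = b
            self.rightblock = b
            self.rightindex = -1
        self.len += 1
        self.rightindex += 1
        self.rightblock.data[self.rightindex] = item

    def _recenter(self):
        # Assumption: an emptied deque returns to its initial indices.
        self.leftindex, self.rightindex = CENTER + 1, CENTER

    def pop(self):
        if self.len == 0:
            raise IndexError("pop from an empty deque")
        item = self.rightblock.data[self.rightindex]
        self.rightblock.data[self.rightindex] = None
        self.rightindex -= 1
        self.len -= 1
        if self.len == 0:
            self._recenter()
        elif self.rightindex == -1:
            # Rightmost block is exhausted: unlink it, step to its neighbour.
            prev = self.rightblock.leftlink
            prev.rightlink = None
            self.rightblock = prev
            self.rightindex = BLOCKLEN - 1
        return item

    def popleft(self):
        if self.len == 0:
            raise IndexError("pop from an empty deque")
        item = self.leftblock.data[self.leftindex]
        self.leftblock.data[self.leftindex] = None
        self.leftindex += 1
        self.len -= 1
        if self.len == 0:
            self._recenter()
        elif self.leftindex == BLOCKLEN:
            # Leftmost block is exhausted: unlink it, step to its neighbour.
            nxt = self.leftblock.rightlink
            nxt.leftlink = None
            self.leftblock = nxt
            self.leftindex = 0
        return item


d = BlockDeque()
for i in range(100):          # enough items to span three blocks
    d.append(i)
first = d.popleft()           # oldest item
last = d.pop()                # newest item
rest = [d.popleft() for _ in range(d.len)]
```

Neither operation ever touches more than one slot and at most one link, which is where the O(1) cost at both ends comes from.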

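Summary

Internally, deque organizes a set of memory blocks as a doubly-linked list. Each block can be viewed as an array of Python objects. Unlike an ordinary array, which is usually filled from the front toward the back, this array is filled from the middle outward toward both ends. Thanks to this structure, the four operations pop, popleft, append, and appendleft are all O(1), which makes deque a convenient and efficient way to implement queues and stacks. The same design has a cost: although deque does support indexing, reaching an element in the middle means walking the block list, so indexed access is O(1) only near the two ends and degrades to O(n) in the middle, unlike an array. Designs that combine a linked list with an array, or a linked list with a dictionary, are widely used in practice, for example in LRUCache and LFUCache. Interested readers can explore further.

  • PS1: LRUCache comes up extremely often in interviews
  • PS2: Interviewers who ask LFUCache questions are sadists
  • PS3: The header image is from Quora and has little to do with the text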
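As one example of the linked list plus dictionary idea, here is a minimal LRU cache sketch. It leans on the standard collections.OrderedDict, which keeps keys in order alongside a hash table, rather than building the linked list by hand. The class name LRUCache and its get/put methods are illustrative choices, not an API from the article.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with a fixed capacity (illustrative sketch)."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the key as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry, which sits at the front.
            self._items.popitem(last=False)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
hit = cache.get("a")      # touching "a" makes "b" the eviction candidate
cache.put("c", 3)         # over capacity: "b" is evicted
miss = cache.get("b")
```

The dictionary gives O(1) lookup and the ordering gives O(1) eviction, the same division of labour that the blocks and links provide inside deque.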

