Redis Source Code Analysis: The Compressed List (ziplist)

Reposted article, 2020-10-05 08:51

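I originally planned to cover Redis lists in a single article, but while writing I found that Redis has several list implementations, so I am splitting the material across multiple articles. This one covers ziplist, which is also the foundation of quicklist. There is also skiplist; although it has "list" in its name, it mainly backs the sorted set (zset) commands, so I will cover it later.
The source code discussed here is mainly in ziplist.c.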

What is a ziplist? The header comment of ziplist.c contains an introduction written by the Redis author.

The ziplist is a specially encoded dually linked list that is designed to be very memory efficient. It stores both strings and integer values, where integers are encoded as actual integers instead of a series of characters. It allows push and pop operations on either side of the list in O(1) time. However, because every operation requires a reallocation of the memory used by the ziplist, the actual complexity is related to the amount of memory used by the ziplist.

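In other words, ziplist is a specially encoded doubly linked list designed for storage efficiency. It can hold strings or integers, and integers are stored in binary form rather than as strings of digits. Push and pop at either end take O(1) time, but because every operation has to reallocate the ziplist's memory, the actual cost depends on how much memory the ziplist uses.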

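The first half is easy to follow, but the part about every operation requiring a reallocation of memory is more puzzling. No rush: it will make sense once you have seen how ziplist is implemented.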

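Logically a ziplist is a doubly linked list, but it is stored in one large contiguous block of memory. Rather than calling ziplist a data structure, it is more accurate to call it the serialized storage format of a doubly linked list in Redis.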

ziplist structure

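The whole ziplist is laid out in memory as follows:
<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
Each entry: <prevlen> <encoding> <entry-data>
A ziplist consists of these parts: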

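zlbytes: a 32-bit unsigned integer holding the total number of bytes the ziplist occupies, including the 4 bytes of zlbytes itself.
With this field, resizing the ziplist does not require walking the whole list to work out its size: space traded for time.
zltail: a 32-bit unsigned integer holding the offset of the last entry in the list, which makes pop at the tail easy.
zllen: 16 bits, the number of entries stored in the ziplist. Note that it can represent at most $2^{16}-2$ entries. The value $2^{16}-1$ is special: it means the list holds more than $2^{16}-2$ entries, and the exact count can only be found by traversing the list.
zlend: 8 bits, the end marker of the ziplist, always 255.
entry: variable length, possibly many of them; the actual data items of the list, described in detail below.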
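
To make the header layout concrete, here is a minimal sketch that decodes the three header fields from a raw byte buffer. The names zl_header, zl_read_header, and EMPTY_ZIPLIST are made up for this illustration and do not exist in ziplist.c; the sketch assumes the header integers are stored little-endian (which is why the Redis code wraps them in intrev32ifbe) and that the input is a well-formed ziplist.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define ZL_HEADER_SIZE 10   /* zlbytes (4) + zltail (4) + zllen (2) */
#define ZL_END 0xFF

/* Read little-endian integers byte by byte so the sketch works on any host. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical container for the decoded header fields. */
typedef struct {
    uint32_t zlbytes;   /* total size of the ziplist in bytes */
    uint32_t zltail;    /* offset of the last entry */
    uint16_t zllen;     /* number of entries (65535 means "count by walking") */
} zl_header;

/* Decode the header at the start of a ziplist blob.
 * Assumes trusted input: the last byte must be the end marker. */
static zl_header zl_read_header(const unsigned char *zl) {
    zl_header h;
    h.zlbytes = read_le32(zl);
    h.zltail  = read_le32(zl + 4);
    h.zllen   = read_le16(zl + 8);
    assert(h.zlbytes > ZL_HEADER_SIZE && zl[h.zlbytes - 1] == ZL_END);
    return h;
}

/* An empty ziplist: the 10-byte header plus the end marker. */
static const unsigned char EMPTY_ZIPLIST[11] = {
    0x0B, 0x00, 0x00, 0x00,   /* zlbytes = 11 */
    0x0A, 0x00, 0x00, 0x00,   /* zltail  = 10, pointing at zlend */
    0x00, 0x00,               /* zllen   = 0 */
    0xFF                      /* zlend */
};
```

Calling zl_read_header(EMPTY_ZIPLIST) yields zlbytes 11, zltail 10, and zllen 0: even an empty list costs 11 bytes.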

entry

The heart of the format is the entry layout, which is genuinely a bit complicated. An entry has three main parts.

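prelen: the storage size of the previous entry (called prevlen in ziplist.c), mainly there to make back-to-front traversal easy.
encoding: how the data is encoded (string or number, and how long it is).
data: the data actually stored.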

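The complication is that, to save memory, Redis defines a fairly elaborate encoding for these three fields. In essence it is a variable-length encoding protocol, with the following rules: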

prelen

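If the prelen value is less than 254, a single byte holds the length. If it is 254 or more, 5 bytes are used: the first byte is the fixed value 254 (0xFE), marking the long form, and the remaining 4 bytes hold the actual length.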
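
The rule above can be sketched as a pair of encode/decode helpers. This is an illustration rather than the Redis implementation: the names prevlen_encode and prevlen_decode are hypothetical, and the sketch assumes the 4-byte length in the long form is stored little-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define PREVLEN_BIG 254   /* 0xFE marks the 5-byte form */

/* Write the previous-entry length at p and return the bytes used (1 or 5).
 * Passing p == NULL only computes the size. */
static size_t prevlen_encode(unsigned char *p, uint32_t len) {
    if (len < PREVLEN_BIG) {
        if (p) p[0] = (unsigned char)len;
        return 1;
    }
    if (p) {
        p[0] = PREVLEN_BIG;
        p[1] = (unsigned char)(len & 0xFF);          /* assumed little-endian */
        p[2] = (unsigned char)((len >> 8) & 0xFF);
        p[3] = (unsigned char)((len >> 16) & 0xFF);
        p[4] = (unsigned char)((len >> 24) & 0xFF);
    }
    return 5;
}

/* Read the previous-entry length at p into *len and return the bytes used. */
static size_t prevlen_decode(const unsigned char *p, uint32_t *len) {
    assert(p[0] != 0xFF);   /* 0xFF is the end marker, never a length */
    if (p[0] < PREVLEN_BIG) {
        *len = p[0];
        return 1;
    }
    *len = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    return 5;
}
```

A length of 100 takes 1 byte, while a length of 254 already needs 5. That jump from 1 to 5 bytes is what later forces neighbouring entries to be rewritten when an entry grows.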

encoding

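The value of encoding depends on what the entry contains. When the entry is a string, the first 2 bits of the first encoding byte identify which length encoding is used, followed by the string length itself. When the entry is an integer, the first 2 bits are both 1 and the next 2 bits indicate which kind of integer follows. The first byte is therefore always enough to determine the entry type. The encodings are: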

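|00pppppp| - 1 byte

String of length less than or equal to 63 bytes; 'pppppp' is the unsigned 6-bit length.

|01pppppp|qqqqqqqq| - 2 bytes

String of length less than or equal to 16383 bytes (14 bits). Note: the 14-bit number is stored big-endian.

|10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes

String of length greater than or equal to 16384 bytes. The 4 bytes after the first byte (qq, rr, ss, tt) hold the string length, up to 2^32-1. The low 6 bits of the first byte are unused and set to 0.
Note: the 32-bit number is stored big-endian.

|11000000| - 3 bytes

Integer stored as int16_t (2 bytes).

|11010000| - 5 bytes

Integer stored as int32_t (4 bytes).

|11100000| - 9 bytes

Integer stored as int64_t (8 bytes).

|11110000| - 4 bytes

24-bit signed integer (3 bytes).

|11111110| - 2 bytes

8-bit signed integer (1 byte).

|1111xxxx| - (xxxx between 0001 and 1101) immediate 4-bit unsigned integer.

Unsigned integer from 0 to 12. The encoded value actually runs from 1 to 13, because 0000 and 1111 cannot be used (11110000 is the 24-bit integer encoding and 11111111 is the end marker), so 1 must be subtracted from the encoded 4-bit value to get the real value.

For these small values, the value itself is stored directly in the encoding field.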
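
The encoding table above can be read as a small decoder. The sketch below classifies an entry from its encoding bytes and reports how many bytes the encoding field and the payload take. The type enc_info and the functions decode_encoding and immediate_value are hypothetical names for this illustration, not functions from ziplist.c, and the input is assumed to be a valid encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    int is_string;      /* 1 for string entries, 0 for integers */
    size_t lensize;     /* bytes used by the encoding field itself */
    uint32_t len;       /* payload bytes that follow the encoding field */
} enc_info;

/* Decode the encoding field starting at p. */
static enc_info decode_encoding(const unsigned char *p) {
    enc_info e = {0, 1, 0};
    unsigned char b = p[0];
    assert(b != 0xFF);                       /* 0xFF is the end marker */

    switch (b & 0xC0) {                      /* look at the top two bits */
    case 0x00:                               /* |00pppppp| */
        e.is_string = 1; e.lensize = 1; e.len = b & 0x3F;
        return e;
    case 0x40:                               /* |01pppppp|qqqqqqqq|, big-endian */
        e.is_string = 1; e.lensize = 2;
        e.len = ((uint32_t)(b & 0x3F) << 8) | p[1];
        return e;
    case 0x80:                               /* |10000000| + 4 bytes, big-endian */
        e.is_string = 1; e.lensize = 5;
        e.len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                ((uint32_t)p[3] << 8) | (uint32_t)p[4];
        return e;
    default:
        break;                               /* top bits 11: an integer */
    }

    switch (b) {
    case 0xC0: e.len = 2; break;             /* int16_t */
    case 0xD0: e.len = 4; break;             /* int32_t */
    case 0xE0: e.len = 8; break;             /* int64_t */
    case 0xF0: e.len = 3; break;             /* 24-bit signed */
    case 0xFE: e.len = 1; break;             /* 8-bit signed */
    default:
        assert(b >= 0xF1 && b <= 0xFD);      /* immediate 0..12, no payload */
        e.len = 0;
        break;
    }
    return e;
}

/* For the immediate form the value lives in the low 4 bits, offset by 1. */
static int immediate_value(unsigned char b) {
    assert(b >= 0xF1 && b <= 0xFD);
    return (b & 0x0F) - 1;
}
```

For instance, the byte 0xF3 decodes to an integer entry with no payload bytes, and immediate_value(0xF3) gives 2.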

The ziplist API

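ziplist.c is fairly long, with a lot of code for what are ordinary doubly-linked-list operations. That is essentially a consequence of the complex storage format; once you understand the encoding, the code itself is not hard to follow. I only list the APIs I consider most important here; for the rest, see ziplist.c.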

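ziplist is really just a serialization of a double-ended queue, a storage format in memory, and cannot be used directly as it stands. What the caller sees is only a char * pointer, and in practice each entry has to be deserialized into a zlentry before it is convenient to work with.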

typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode prevrawlen in memory */
    unsigned int prevrawlen;     /* Length of the previous entry, used to jump between entries */
    unsigned int lensize;        /* Bytes used to encode len in memory */
    unsigned int len;            /* Payload length of this entry: the string length for strings; for integers it depends on the value range */
    unsigned int headersize;     /* prevrawlensize + lensize: bytes used by the entry header */
    unsigned char encoding;      /* Encoding of this entry */
    unsigned char *p;            /* Pointer to the start of the entry, i.e. the prevlen field */
} zlentry;
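
To see how the previous-entry length lets code jump between entries, here is a small sketch that walks a hand-built ziplist from tail to head. SAMPLE and walk_backward are hypothetical names for this illustration; the sketch deliberately handles only 1-byte prevlen fields and immediate 4-bit integers, so it is far simpler than the real zipEntry decoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* A hand-built ziplist holding the immediate integers 1, 2, 3.
 * Each entry is 2 bytes: a 1-byte prevlen and a 1-byte encoding. */
static const unsigned char SAMPLE[17] = {
    0x11, 0x00, 0x00, 0x00,   /* zlbytes = 17 */
    0x0E, 0x00, 0x00, 0x00,   /* zltail  = 14, offset of the last entry */
    0x03, 0x00,               /* zllen   = 3 */
    0x00, 0xF2,               /* entry 0: prevlen 0, value 1 */
    0x02, 0xF3,               /* entry 1: prevlen 2, value 2 */
    0x02, 0xF4,               /* entry 2: prevlen 2, value 3 */
    0xFF                      /* zlend */
};

/* Walk from the tail to the head, storing at most max values in out.
 * Returns the number of values written. */
static size_t walk_backward(const unsigned char *zl, int *out, size_t max) {
    uint32_t tail = (uint32_t)zl[4] | ((uint32_t)zl[5] << 8) |
                    ((uint32_t)zl[6] << 16) | ((uint32_t)zl[7] << 24);
    const unsigned char *p = zl + tail;
    size_t n = 0;

    if (p[0] == 0xFF) return 0;   /* empty list: zltail points at zlend */

    while (n < max) {
        unsigned char prevlen = p[0];
        unsigned char enc = p[1];
        /* This sketch only understands the simplest entry shape. */
        assert(prevlen < 254 && enc >= 0xF1 && enc <= 0xFD);
        out[n++] = (enc & 0x0F) - 1;
        if (prevlen == 0) break;  /* the head entry has no predecessor */
        p -= prevlen;             /* jump back by the previous entry's size */
    }
    return n;
}
```

Starting at the offset stored in zltail, each step subtracts the previous entry's length from the current pointer, so no per-node pointers are needed to traverse in reverse.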

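One more point: a ziplist is a tightly packed, contiguous block of memory, which means it is not friendly to modification. Any modifying operation requires reallocating memory to hold the new ziplist, which is expensive. The insertion and deletion code is shown below.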

/* Insert the data *s at position p. */
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find the previous entry and compute prevlensize and prevlen */
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipStorePrevEntryLength(NULL,prevlen);
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    int forcelarge = 0;
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    // Compute the required size, then resize zl; the block returned by the resize replaces the old zl.
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Move the data, then update the tail offset */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    p += zipStorePrevEntryLength(p,prevlen);
    p += zipStoreEntryEncoding(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}
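
The nextdiff value in the insert code is the least obvious part, so here is a sketch of the arithmetic behind it. The helper names prevlen_size and prevlen_byte_diff are hypothetical; they mirror what zipPrevLenByteDiff appears to compute, under the assumption that a prevlen field is either 1 or 5 bytes as described earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store a previous-entry length: 1 if below 254, else 5. */
static int prevlen_size(size_t len) {
    return len < 254 ? 1 : 5;
}

/* How many bytes the next entry's prevlen field must grow (+4), could
 * shrink (-4), or stay the same (0) when its predecessor's length
 * becomes new_prevlen. current_size is the field's present width. */
static int prevlen_byte_diff(int current_size, size_t new_prevlen) {
    assert(current_size == 1 || current_size == 5);
    return prevlen_size(new_prevlen) - current_size;
}
```

When a 300-byte entry is inserted in front of an entry whose prevlen field is 1 byte wide, the diff is +4: the next entry grows by 4 bytes, which may in turn push its own length past 253 and force the entry after it to grow too. That chain reaction is what __ziplistCascadeUpdate handles.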

Deleting ziplist entries

unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
    unsigned int i, totlen, deleted = 0;
    size_t offset;
    int nextdiff = 0;
    zlentry first, tail;

    zipEntry(p, &first);
    for (i = 0; p[0] != ZIP_END && i < num; i++) {
        p += zipRawEntryLength(p);
        deleted++;
    }

    totlen = p-first.p; /* Bytes freed by deleting the entries */
    if (totlen > 0) {
        if (p[0] != ZIP_END) {
            /* Storing `prevrawlen` in this entry may increase or decrease the
             * number of bytes required compare to the current `prevrawlen`.
             * There always is room to store this, because it was previously
             * stored by an entry that is now being deleted. */
            nextdiff = zipPrevLenByteDiff(p,first.prevrawlen);

            /* Note that there is always space when p jumps backward: if
             * the new previous entry is large, one of the deleted elements
             * had a 5 bytes prevlen header, so there is for sure at least
             * 5 bytes free and we need just 4. */
            p -= nextdiff;
            zipStorePrevEntryLength(p,first.prevrawlen);

            /* Update offset for tail */
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);

            /* When the tail contains more than one entry, we need to take
             * "nextdiff" in account as well. Otherwise, a change in the
             * size of prevlen doesn't have an effect on the *tail* offset. */
            zipEntry(p, &tail);
            if (p[tail.headersize+tail.len] != ZIP_END) {
                ZIPLIST_TAIL_OFFSET(zl) =
                   intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
            }

            /* Move the tail to the front of the ziplist */
            memmove(first.p,p,
                intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1);
        } else {
            /* The entire tail was deleted. No need to move memory. */
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe((first.p-zl)-first.prevrawlen);
        }

        /* Resize the ziplist */
        offset = first.p-zl;
        zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
        ZIPLIST_INCR_LENGTH(zl,-deleted); // update zllen
        p = zl+offset;

        /* When nextdiff != 0, the raw length of the next entry has changed, so
         * we need to cascade the update throughout the ziplist */
        if (nextdiff != 0)
            zl = __ziplistCascadeUpdate(zl,p);
    }
    return zl;
}

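Insertion and deletion follow the same basic logic: locate the position, compute how the required memory changes after the insert or delete, call ziplistResize() on zl with the new size, and then update zl's metadata.
Besides insertion and deletion, other modifying APIs such as ziplistPush and ziplistMerge also end up calling ziplistResize. Its code is as follows: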

unsigned char *ziplistResize(unsigned char *zl, unsigned int len) {
    zl = zrealloc(zl,len);
    ZIPLIST_BYTES(zl) = intrev32ifbe(len);
    zl[len-1] = ZIP_END;
    return zl;
}

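It looks short, but most of the work happens inside zrealloc. zrealloc is Redis's wrapper around the allocator's realloc, which zmalloc.c maps to jemalloc, tcmalloc, or libc through macro definitions (C macros suddenly feel rather slick). In the worst case it allocates a new block of len bytes, copies the contents over, and frees the block zl used to point to. This shows that modifying a ziplist is expensive; if your workload updates lists frequently, consider tuning the list-related configuration.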

Other APIs

For the full list of API definitions, see ziplist.h in the source.

unsigned char *ziplistNew(void);  // create a new ziplist
unsigned char *ziplistMerge(unsigned char **first, unsigned char **second);  // merge two ziplists
unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where); // push an entry at the head or tail of the ziplist
unsigned char *ziplistIndex(unsigned char *zl, int index); // find the entry at a given index
unsigned char *ziplistNext(unsigned char *zl, unsigned char *p);  // find the entry after p
unsigned char *ziplistPrev(unsigned char *zl, unsigned char *p);  // find the entry before p
unsigned int ziplistGet(unsigned char *p, unsigned char **sval, unsigned int *slen, long long *lval);  // get the content stored in an entry
unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen);  // insert
unsigned char *ziplistDelete(unsigned char *zl, unsigned char **p); // delete
unsigned char *ziplistDeleteRange(unsigned char *zl, int index, unsigned int num); // delete num entries starting at index
unsigned int ziplistCompare(unsigned char *p, unsigned char *s, unsigned int slen);  // compare the entry at p with the string s
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip); // find the entry holding a given value
unsigned int ziplistLen(unsigned char *zl);  // number of entries in the ziplist
size_t ziplistBlobLen(unsigned char *zl);  // bytes of storage the ziplist occupies
void ziplistRepr(unsigned char *zl);   // print a readable dump of the ziplist (for debugging)

Conclusion

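ziplist is logically a doubly linked list: the head and tail can be found quickly, and each entry also carries a kind of "pointer" to its neighbors in both directions. To squeeze memory use to the minimum, though, the author abandoned the traditional linked-list design (the prev/next pointers cost 16 bytes per node and cause heavy memory fragmentation) and designed a very compact storage format. The memory is saved, but operations become more complex and updates more costly. The Redis author took this into account as well and designed quicklist, a compromise between ziplist and the traditional doubly linked list, which the next post will cover in detail.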

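This post is part of a Redis source code analysis series, and there is a matching version of the Redis source annotated in Chinese. If you want to study Redis in depth, stars and follows are welcome.
Redis repository with Chinese annotations: https://github.com/xindoo/Redis
Redis source code analysis series: https://zxs.io/s/1h
This article comes from https://blog.csdn.net/xindoo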
