[Original] Memcached Source Code Analysis Series: The Memory Storage Mechanism (Part 1)

Reposted article. 2012-05-14 14:48, tag: memcached

I. Memory Allocation and Management Mechanism

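memcached is a high-performance, distributed in-memory object caching system that reduces database load and improves performance in dynamic systems. Its memory management is distinctive: for efficiency, it allocates and manages memory by default through a mechanism called the Slab Allocator.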

The memcached documentation describes the slab allocator as follows:

the primary goal of the slabs subsystem in memcached was to eliminate memory fragmentation issues totally by using fixed-size memory chunks coming from a few predetermined size classes.

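From this we can see that memcached requests memory in advance and groups it into blocks of specific sizes, with the aim of solving the memory fragmentation problem.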

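Memcached's memory management is fairly easy to understand: memory is organized as slab -> chunk. A slab is the smallest unit in which Memcached requests memory; it is 1 MB by default and can be customized with a command-line option. Each slab is then divided into a number of chunks of one fixed size.

[Figure: slab and chunk layout (image taken from the web)]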
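As a rough illustration of the slab -> chunk split, the following C sketch computes how many fixed-size chunks fit in one slab and where each chunk starts. The 1 MB slab size comes from the text above; the `demo_` names and the 96-byte chunk size used in the note below are assumptions chosen for illustration, not memcached identifiers.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLAB_BYTES (1024 * 1024)  /* assumed default slab size: 1 MB */

/* Number of whole chunks of chunk_size bytes that fit in one slab. */
size_t demo_chunks_per_slab(size_t chunk_size) {
    assert(chunk_size > 0 && chunk_size <= DEMO_SLAB_BYTES);
    return DEMO_SLAB_BYTES / chunk_size;
}

/* Bytes left unused at the end of the slab after the last whole chunk. */
size_t demo_slab_tail(size_t chunk_size) {
    return DEMO_SLAB_BYTES - demo_chunks_per_slab(chunk_size) * chunk_size;
}

/* Address of the index-th chunk inside a slab that starts at slab_base. */
void *demo_chunk_at(void *slab_base, size_t chunk_size, size_t index) {
    assert(index < demo_chunks_per_slab(chunk_size));
    return (char *)slab_base + index * chunk_size;
}
```

With 96-byte chunks, `demo_chunks_per_slab(96)` gives 10922 chunks and `demo_slab_tail(96)` gives 64 unused bytes at the end of the slab.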

II. Source Code Analysis

1 Key data structures

(1) The settings struct:

/* When adding a setting, be sure to update process_stat_settings */
/**
 * Globally accessible settings as derived from the commandline.
 */
struct settings {
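    // Maximum memory: default 64 MB, maximum 2 GB. Set with -m.
    size_t maxbytes;
    // Maximum number of connections: default 1024. Set with -c.
    int maxconns;
    // TCP port; set with -p.
    int port;
    // UDP port; set with -U.
    int udpport;
    // IP address or socket address to listen on; set with -l.
    char *inter;
    // Whether to print debug output; set with -v through -vvv.
    int verbose;
    // Time threshold: a flush only needs to update this value; items older than it are then ignored on retrieval.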
    rel_time_t oldest_live; /* ignore existing items older than this */
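    // Whether to evict old data when memory is full. Enabled by default; -M disables it, in which case inserting new data fails once memory is exhausted.
    int evict_to_free;
    // Unix socket mode; set with -s.
    char *socketpath;   /* path to unix socket if using local socket */
    // File permissions of the socket file; set with -a.
    int access;  /* access mask (a la chmod) for unix domain socket */
    // Slab growth factor: default 1.25. Set with -f.
    double factor;          /* chunk size growth factor */
    // Minimum number of bytes allocated for one key + value + flags: default 48. Set with -n.
    int chunk_size;
    // Number of worker threads: default 4. Set with -t.
    int num_threads;        /* number of worker (without dispatcher) libevent threads to run */
    // Delimiter that marks a key prefix in the detailed stats.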
    char prefix_delimiter;  /* character that marks a key prefix (for stats) */
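    // Whether detailed stats collection is enabled.
    int detail_enabled;     /* nonzero if we're collecting detailed stats */
    // Number of requests processed per event.
    int reqs_per_event;     /* Maximum number of io to process on each  io-event. */
    // Enable CAS ("check and set"), a store operation that guards against stale data: after reading an item, a client can store it only if nobody else has modified it in the meantime. Enabled by default. Requires version 1.3+.
    bool use_cas;
    // Protocol to use; set with -B. Possible values: ascii, binary, or auto. Version 1.4.0+.
    enum protocol binding_protocol;
    // Length of the queue of pending connections: default 1024.
    int backlog;
    // Maximum size of a single item in bytes: default 1 MB. Set with -I. Since 1.4.2 the value may exceed 1 MB but must stay below 128 MB; memcached prints a warning, because values above 1 MB increase overall memory use and reduce memory efficiency. Version 1.4.2+.
    int item_size_max;        /* Maximum item size, and upper end for slabs */
    // Whether SASL is enabled.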
    bool sasl;              /* SASL on/off */
};

(2) The item struct:

typedef struct _stritem {
    struct _stritem *next;
    struct _stritem *prev;
    struct _stritem *h_next;    /* hash chain next */
    rel_time_t      time;       /* least recent access */
    rel_time_t      exptime;    /* expire time */
    int             nbytes;     /* size of data */
    unsigned short  refcount;
    uint8_t         nsuffix;    /* length of flags-and-length string */
    uint8_t         it_flags;   /* ITEM_* above */
    uint8_t         slabs_clsid;/* which slab class we're in */
    uint8_t         nkey;       /* key length, w/terminating null and padding */
    /* this odd type prevents type-punning issues when we do
     * the little shuffle to save space when not using CAS. */
    union {
        uint64_t cas;
        char end;
    } data[];
    /* if it_flags & ITEM_CAS we have 8 bytes CAS */
    /* then null-terminated key */
    /* then " flags length\r\n" (no terminating null) */
    /* then data with terminating \r\n (no terminating null; it's binary!) */
} item;
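The layout comments at the end of the struct suggest how much memory one item needs: the fixed header, an optional 8-byte CAS value, the null-terminated key, the " flags length\r\n" suffix, and the data. The sketch below adds these up. It is an illustration under stated assumptions, not memcached's own sizing code: `demo_item_total` is a hypothetical helper, `nkey` is assumed to exclude the terminating null, `nbytes` is assumed to include the trailing "\r\n", and the header size is passed in because `sizeof(item)` depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total bytes needed for one item, following the layout comments above.
 * Assumptions: nkey excludes the terminating null; nbytes includes "\r\n". */
size_t demo_item_total(size_t header_size, size_t nkey, size_t nsuffix,
                       size_t nbytes, int use_cas) {
    size_t total = header_size;      /* fixed part of the item struct */
    assert(nkey > 0);
    if (use_cas)
        total += sizeof(uint64_t);   /* 8-byte CAS value when ITEM_CAS is set */
    total += nkey + 1;               /* key plus terminating null */
    total += nsuffix;                /* " flags length\r\n", no terminating null */
    total += nbytes;                 /* data, including its trailing "\r\n" */
    return total;
}
```

For a 3-byte key, a 6-byte suffix such as " 0 5\r\n", 7 bytes of data and an assumed 48-byte header, this gives 73 bytes with CAS and 65 bytes without it. The result is the size that would be passed to the slab class lookup.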

(3) The slabclass_t struct:

typedef struct {
    unsigned int size;      /* sizes of items */
    unsigned int perslab;   /* how many items per slab */
    void **slots;           /* list of item ptrs */
    unsigned int sl_total;  /* size of previous array */
    unsigned int sl_curr;   /* first free slot */
    void *end_page_ptr;         /* pointer to next free item at end of page, or 0 */
    unsigned int end_page_free; /* number of items remaining at end of last alloced page */
    unsigned int slabs;     /* how many slabs were allocated for this class */
    void **slab_list;       /* array of slab pointers */
    unsigned int list_size; /* size of prev array */
    unsigned int killing;  /* index+1 of dying slab, or zero if none */
    size_t requested; /* The number of requested bytes */
} slabclass_t;

(4) Some of the macros defined in memcached.h:

#define POWER_SMALLEST 1
#define POWER_LARGEST  200
#define CHUNK_ALIGN_BYTES 8
#define DONT_PREALLOC_SLABS
#define MAX_NUMBER_OF_SLAB_CLASSES (POWER_LARGEST + 1)

2 Implementation of the allocation algorithm

(1) Initialization of the runtime state in main() in memcached.c

int main(int argc, char **argv)
{
    …
    settings_init();
    …
    // Apply the command-line options to settings
    while (-1 != (c = getopt(argc, argv, …)))
    {…}
    …
    // settings.factor is initialized to 1.25 and can be changed with the -f option
    slabs_init(settings.maxbytes, settings.factor, preallocate);
}

settings_init() initializes the global variable settings; it is implemented in memcached.c:

static void settings_init(void) {
    settings.use_cas = true;
    settings.access = 0700;
    settings.port = 11211;
    settings.udpport = 11211;
    /* By default this string should be NULL for getaddrinfo() */
    settings.inter = NULL;
    settings.maxbytes = 64 * 1024 * 1024; /* default is 64MB */
    settings.maxconns = 1024;         /* to limit connections-related memory to about 5MB */
    settings.verbose = 0;
    settings.oldest_live = 0;
    settings.evict_to_free = 1;       /* push old items out of cache when memory runs out */
    settings.socketpath = NULL;       /* by default, not using a unix socket */
    settings.factor = 1.25;
    settings.chunk_size = 48;         /* space for a modest key and value */
    settings.num_threads = 4;         /* N workers */
    settings.num_threads_per_udp = 0;
    settings.prefix_delimiter = ':';
    settings.detail_enabled = 0;
    settings.reqs_per_event = 20;
    settings.backlog = 1024;
    settings.binding_protocol = negotiating_prot;
    settings.item_size_max = 1024 * 1024; /* The famous 1MB upper limit. */
}

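This initialization function shows that settings.item_size_max = 1024 * 1024, so each slab is 1 MB by default, and that settings.factor = 1.25, so the item size grows from one class to the next by a default factor of 1.25. Once the command-line options have been applied to settings, main() calls slabs_init, which initializes slabclass from the configured settings. slabs_init is implemented in slabs.c: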

// Initializes the slab manager: limit defaults to 64 MB; prealloc defaults to false and can be set with the -L command-line option.
void slabs_init(const size_t limit, const double factor, const bool prealloc) {
    int i = POWER_SMALLEST - 1;  // POWER_SMALLEST is 1, so i starts at 0
    // item (struct _stritem) is the structure that stores items within memcached
    unsigned int size = sizeof(item) + settings.chunk_size;  // chunk_size defaults to 48
    mem_limit = limit;  // limit defaults to 64 MB
    // When preallocation is enabled:
    if (prealloc) {
        /* Allocate everything in a big chunk with malloc */
        mem_base = malloc(mem_limit);
        if (mem_base != NULL) {
            // mem_current: static variable that records the base address of the allocated memory block
            // mem_avail: amount of memory still available
            mem_current = mem_base;
            mem_avail = mem_limit;
        } else {
            fprintf(stderr, "Warning: Failed to allocate requested memory in"
                    " one large chunk.\nWill allocate in smaller chunks\n");
        }
    }
    // static slabclass_t slabclass[MAX_NUMBER_OF_SLAB_CLASSES];
    // #define MAX_NUMBER_OF_SLAB_CLASSES (POWER_LARGEST + 1)
    // #define POWER_LARGEST  200
    memset(slabclass, 0, sizeof(slabclass));
    // settings.item_size_max: maximum item size, and upper end for slabs; defaults to 1 MB
    // Core of the size-class computation
    while (++i < POWER_LARGEST && size <= settings.item_size_max / factor) {
        /* Make sure items are always n-byte aligned */
        // #define CHUNK_ALIGN_BYTES 8
        if (size % CHUNK_ALIGN_BYTES)    // round size up to the next multiple of CHUNK_ALIGN_BYTES
            size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);
        slabclass[i].size = size;
        slabclass[i].perslab = settings.item_size_max / slabclass[i].size;  // number of items per slab
        size *= factor;   // each iteration multiplies size by factor
        if (settings.verbose > 1) {
            fprintf(stderr, "slab class %3d: chunk size %9u perslab %7u\n",
                    i, slabclass[i].size, slabclass[i].perslab);
        }
    }
    // Add one final class whose chunk size is item_size_max
    power_largest = i;
    slabclass[power_largest].size = settings.item_size_max;
    slabclass[power_largest].perslab = 1;
    if (settings.verbose > 1) {
        fprintf(stderr, "slab class %3d: chunk size %9u perslab %7u\n",
                i, slabclass[i].size, slabclass[i].perslab);
    }
    /* for the test suite:  faking of how much we've already malloc'd */
    {
        char *t_initial_malloc = getenv("T_MEMD_INITIAL_MALLOC");
        if (t_initial_malloc) {
            mem_malloced = (size_t)atol(t_initial_malloc);
        }
    }
#ifndef DONT_PREALLOC_SLABS  // DONT_PREALLOC_SLABS is defined, so this block is compiled out
    {
        char *pre_alloc = getenv("T_MEMD_SLABS_ALLOC");

        if (pre_alloc == NULL || atoi(pre_alloc) != 0) {
            slabs_preallocate(power_largest);
        }
    }
#endif
}
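The sizing loop in slabs_init can be tried outside memcached. The sketch below mirrors that loop in a standalone function: it rounds each size up to a multiple of 8, records size and perslab, multiplies by the factor, and finishes with one class of the maximum item size. The `demo_` names are hypothetical, and the starting size of 96 bytes in the note below assumes sizeof(item) is 48 on the build in question, which is platform dependent.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_POWER_LARGEST 200   /* mirrors POWER_LARGEST */
#define DEMO_ALIGN_BYTES 8       /* mirrors CHUNK_ALIGN_BYTES */

struct demo_class {
    unsigned int size;      /* chunk size of this class */
    unsigned int perslab;   /* chunks per slab */
};

/* Fills classes[1..n] and returns n, the index of the largest class.
 * classes must have room for DEMO_POWER_LARGEST + 1 entries. */
int demo_build_classes(struct demo_class *classes, unsigned int first_size,
                       double factor, unsigned int item_size_max) {
    int i = 0;
    unsigned int size = first_size;
    assert(classes != NULL && factor > 1.0);
    assert(first_size > 0 && first_size <= item_size_max);
    while (++i < DEMO_POWER_LARGEST && size <= item_size_max / factor) {
        /* Round up to the next multiple of DEMO_ALIGN_BYTES. */
        if (size % DEMO_ALIGN_BYTES)
            size += DEMO_ALIGN_BYTES - (size % DEMO_ALIGN_BYTES);
        classes[i].size = size;
        classes[i].perslab = item_size_max / size;
        size = (unsigned int)(size * factor);
    }
    /* Final class: one chunk that spans the maximum item size. */
    classes[i].size = item_size_max;
    classes[i].perslab = 1;
    return i;
}
```

Starting from 96 bytes with a factor of 1.25 and a 1 MB maximum, the first classes come out as 96, 120, 152 and 192 bytes; the value 150 produced by 120 * 1.25 is rounded up to 152 by the alignment step.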

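memcached's memory manager uses slabclass, an array of type slabclass_t (declared in the "Key data structures" section above), to manage all slabs in one place.

Declaration of slabclass: static slabclass_t slabclass[MAX_NUMBER_OF_SLAB_CLASSES];

Each slab is divided into a number of chunks, and each chunk holds one item; an item comprises the item struct itself, the key, and the value (note that a memcached value is nothing more than a string of bytes). Slabs are grouped into lists by class id, and those lists hang off the slabclass array indexed by that id, so the overall structure resembles a two-dimensional array.

To find the class for an item, slabs_clsid is called with the item size and returns the class id: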

/*
 * Figures out which slab class (chunk size) is required to store an item of
 * a given size.
 * Given object size, return id to use when allocating/freeing memory for object
 * 0 means error: can't store such a large object
 */
unsigned int slabs_clsid(const size_t size) {
    int res = POWER_SMALLEST;
    if (size == 0)
        return 0;
    while (size > slabclass[res].size)
        if (res++ == power_largest)     /* won't fit in the biggest slab */
            return 0;  // no class is large enough for this size
    return res;  // index of the first class whose chunk size is at least size
}
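To see the first-fit search in isolation, the sketch below runs the same loop over a small hard-coded table. The four chunk sizes and the `demo_` names are assumptions for illustration; in memcached the table is the slabclass array built by slabs_init.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size table: index 0 is unused, classes are 1..4. */
static const unsigned int demo_sizes[] = {0, 96, 120, 152, 192};
#define DEMO_LARGEST 4

/* Returns the id of the smallest class whose chunks can hold size bytes,
 * or 0 if size is zero or larger than the biggest chunk. */
unsigned int demo_clsid(size_t size) {
    unsigned int res = 1;
    if (size == 0)
        return 0;
    while (size > demo_sizes[res])
        if (res++ == DEMO_LARGEST)  /* ran past the biggest class */
            return 0;
    return res;
}

/* Bytes of a chunk left unused when an item of size bytes is stored in it. */
size_t demo_unused_bytes(size_t size) {
    unsigned int id = demo_clsid(size);
    assert(id != 0);
    return demo_sizes[id] - size;
}
```

A 100-byte item maps to class 2 (120-byte chunks) and leaves 20 bytes of the chunk unused, while a 193-byte item does not fit any class in this table and yields 0.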

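The returned index locates the slabclass entry that can hold an item of that size. The source shows that the chunk size starts at sizeof(item) + settings.chunk_size (chunk_size is the minimum space reserved for key and value, 48 by default) and is multiplied by factor from one class to the next for as long as it does not exceed item_size_max / factor. The last slab class has a chunk size equal to item_size_max, the slab size, which is also the largest item memcached can allocate.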

This concludes the first part. The next part continues the analysis of memcached's storage mechanism and examines its strengths and weaknesses.

Note: this series is based on memcached-1.4.6.


Author: lgp88, published 2012-5-14 14:47:54
