An arena is jemalloc's top-level management block. A process can have multiple arenas, and the maximum number of arenas can be read from the static variable narenas_auto.
The pointers to all of the process's arenas can be obtained from the static array arenas (je_arenas in gdb):
(gdb) p narenas_auto
$359 = 2
(gdb) p *je_arenas@2
$360 = {0x7f93e02200, 0x7f93f12280}
This shows that the maximum number of arenas in the current process is 2, and that their pointers are 0x7f93e02200 and 0x7f93f12280.
arena is declared as follows:
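Besides gdb, similar information can be queried at run time through jemalloc's public mallctl() interface. Below is a minimal sketch, not part of the code being analyzed; it uses the "arenas.narenas" mallctl, whose exact meaning (number of arenas vs. the current arena limit) varies slightly between jemalloc versions:

#include <stdio.h>
#include <jemalloc/jemalloc.h>

int main(void)
{
    unsigned narenas;
    size_t sz = sizeof(narenas);

    /* Query how many arenas the allocator is using/allows. */
    if (mallctl("arenas.narenas", &narenas, &sz, NULL, 0) == 0)
        printf("narenas = %u\n", narenas);
    return 0;
}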
typedef struct arena_s arena_t;

struct arena_s {
    /* This arena's index within the arenas array. */
    unsigned ind;

    /*
     * Number of threads currently assigned to this arena.  This field is
     * synchronized via atomic operations.
     */
    unsigned nthreads;

    /*
     * There are three classes of arena operations from a locking
     * perspective:
     * 1) Thread assignment (modifies nthreads) is synchronized via atomics.
     * 2) Bin-related operations are protected by bin locks.
     * 3) Chunk- and run-related operations are protected by this mutex.
     */
    malloc_mutex_t lock;

    arena_stats_t stats;
    /*
     * List of tcaches for extant threads associated with this arena.
     * Stats from these are merged incrementally, and at exit if
     * opt_stats_print is enabled.
     */
    ql_head(tcache_t) tcache_ql;

    uint64_t prof_accumbytes;

    /*
     * PRNG state for cache index randomization of large allocation base
     * pointers.
     */
    uint64_t offset_state;

    dss_prec_t dss_prec;

    /*
     * In order to avoid rapid chunk allocation/deallocation when an arena
     * oscillates right on the cusp of needing a new chunk, cache the most
     * recently freed chunk.  The spare is left in the arena's chunk trees
     * until it is deleted.
     *
     * There is one spare chunk per arena, rather than one spare total, in
     * order to avoid interactions between multiple threads that could make
     * a single spare inadequate.
     */
    arena_chunk_t *spare;

    /* Minimum ratio (log base 2) of nactive:ndirty. */
    ssize_t lg_dirty_mult;

    /* True if a thread is currently executing arena_purge_to_limit(). */
    bool purging;

    /* Number of pages in active runs and huge regions. */
    size_t nactive;

    /*
     * Current count of pages within unused runs that are potentially
     * dirty, and for which madvise(... MADV_DONTNEED) has not been called.
     * By tracking this, we can institute a limit on how much dirty unused
     * memory is mapped for each arena.
     */
    size_t ndirty;

    /*
     * Unused dirty memory this arena manages.  Dirty memory is conceptually
     * tracked as an arbitrarily interleaved LRU of dirty runs and cached
     * chunks, but the list linkage is actually semi-duplicated in order to
     * avoid extra arena_chunk_map_misc_t space overhead.
     *
     *   LRU-----------------------------------------------------------MRU
     *
     *        /-- arena ---\
     *        |            |
     *        |            |
     *        |------------|                             /- chunk -\
     *   ...->|chunks_cache|<--------------------------->|  /----\ |<--...
     *        |------------|                             |  |node| |
     *        |            |                             |  |    | |
     *        |            |    /- run -\    /- run -\   |  |    | |
     *        |            |    |       |    |       |   |  |    | |
     *        |            |    |       |    |       |   |  |    | |
     *        |------------|    |-------|    |-------|   |  |----| |
     *   ...->|runs_dirty  |<-->|rd     |<-->|rd     |<---->|rd  |<----...
     *        |------------|    |-------|    |-------|   |  |----| |
     *        |            |    |       |    |       |   |  |    | |
     *        |            |    |       |    |       |   |  \----/ |
     *        |            |    \-------/    \-------/   |         |
     *        |            |                             |         |
     *        |            |                             |         |
     *        \------------/                             \---------/
     */
    arena_runs_dirty_link_t runs_dirty;
    extent_node_t chunks_cache;

    /*
     * Approximate time in seconds from the creation of a set of unused
     * dirty pages until an equivalent set of unused dirty pages is purged
     * and/or reused.
     */
    ssize_t decay_time;
    /* decay_time / SMOOTHSTEP_NSTEPS. */
    nstime_t decay_interval;
    /*
     * Time at which the current decay interval logically started.  We do
     * not actually advance to a new epoch until sometime after it starts
     * because of scheduling and computation delays, and it is even possible
     * to completely skip epochs.  In all cases, during epoch advancement we
     * merge all relevant activity into the most recently recorded epoch.
     */
    nstime_t decay_epoch;
    /* decay_deadline randomness generator. */
    uint64_t decay_jitter_state;
    /*
     * Deadline for current epoch.  This is the sum of decay_interval and
     * per epoch jitter which is a uniform random variable in
     * [0..decay_interval).  Epochs always advance by precise multiples of
     * decay_interval, but we randomize the deadline to reduce the
     * likelihood of arenas purging in lockstep.
     */
    nstime_t decay_deadline;
    /*
     * Number of dirty pages at beginning of current epoch.  During epoch
     * advancement we use the delta between decay_ndirty and ndirty to
     * determine how many dirty pages, if any, were generated, and record
     * the result in decay_backlog.
     */
    size_t decay_ndirty;
    /*
     * Memoized result of arena_decay_backlog_npages_limit() corresponding
     * to the current contents of decay_backlog, i.e. the limit on how many
     * pages are allowed to exist for the decay epochs.
     */
    size_t decay_backlog_npages_limit;
    /*
     * Trailing log of how many unused dirty pages were generated during
     * each of the past SMOOTHSTEP_NSTEPS decay epochs, where the last
     * element is the most recent epoch.  Corresponding epoch times are
     * relative to decay_epoch.
     */
    size_t decay_backlog[SMOOTHSTEP_NSTEPS];

    /* Extant huge allocations. */
    ql_head(extent_node_t) huge;
    /* Synchronizes all huge allocation/update/deallocation. */
    malloc_mutex_t huge_mtx;

    /*
     * Trees of chunks that were previously allocated (trees differ only in
     * node ordering).  These are used when allocating chunks, in an attempt
     * to re-use address space.  Depending on function, different tree
     * orderings are needed, which is why there are two trees with the same
     * contents.
     */
    extent_tree_t chunks_szad_cached;
    extent_tree_t chunks_ad_cached;
    extent_tree_t chunks_szad_retained;
    extent_tree_t chunks_ad_retained;

    malloc_mutex_t chunks_mtx;
    /* Cache of nodes that were allocated via base_alloc(). */
    ql_head(extent_node_t) node_cache;
    malloc_mutex_t node_cache_mtx;

    /* User-configurable chunk hook functions. */
    chunk_hooks_t chunk_hooks;

    /* bins is used to store trees of free regions. */
    arena_bin_t bins[NBINS];

    /*
     * Quantized address-ordered trees of this arena's available runs.  The
     * trees are used for first-best-fit run allocation.
     */
    arena_run_tree_t runs_avail[1]; /* Dynamically sized. */
};
We will not look at the other members for now. Here we first discuss bins, an array of arena_bin_t with NBINS (36) entries, corresponding to 36 region/run size classes.
binind is the index into bins; each binind corresponds to one fixed region size. The mapping is:
usize = index2size(binind);

size_t
index2size(szind_t index)
{
    return (index2size_lookup(index));
}

size_t
index2size_lookup(szind_t index)
{
    size_t ret = (size_t)index2size_tab[index];
    return (ret);
}
Here index2size_tab is a static table that records the mapping from index to size.
Given a binind, we can also obtain further information about the corresponding bin from the static variable arena_bin_info:
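For reference, on a common 64-bit configuration the first entries of index2size_tab are the small size classes 8, 16, 32, 48, 64, 80, 96, 112, 128, 160, ..., so binind 5 maps to 80 bytes. The sketch below is illustrative only; the real table is generated at build time and its values depend on the configuration:

#include <stdio.h>
#include <stddef.h>

/* Illustrative only: first few small size classes of a typical 64-bit
 * build; not the actual generated index2size_tab. */
static const size_t index2size_tab_sketch[] = {
    8, 16, 32, 48, 64, 80, 96, 112, 128, 160, 192, 224, 256
};

int main(void)
{
    unsigned binind = 5;
    printf("binind %u -> region size %zu\n",
        binind, index2size_tab_sketch[binind]); /* prints 80 */
    return 0;
}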
(gdb) p je_arena_bin_info[5]
$375 = {
  reg_size = 80,
  redzone_size = 0,
  reg_interval = 80,
  run_size = 20480,
  nregs = 256,
  bitmap_info = {
    nbits = 256,
    ngroups = 4
  },
  reg0_offset = 0
}
From this bin_info we can see that for the bin with index 5, the region size (reg_size) is 80 bytes, the run size (run_size) is 20 KB, each run holds 256 regions (nregs), and 4 bitmap groups (ngroups) are needed to track the regions.
arena_bin_t is declared as follows:
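These fields are internally consistent: 256 regions of 80 bytes each exactly fill the 20480-byte (5 x 4 KB pages) run, and tracking 256 regions at one bit each needs ceil(256/64) = 4 64-bit bitmap groups. A small sanity-check sketch using the numbers printed above:

#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned nregs = 256, reg_interval = 80;
    unsigned run_size = nregs * reg_interval;   /* 20480 bytes = 5 pages of 4 KB */
    unsigned ngroups = (nregs + 63) / 64;       /* one bit per region, 64 bits per group */

    assert(run_size == 20480);
    assert(ngroups == 4);
    printf("run_size=%u ngroups=%u\n", run_size, ngroups);
    return 0;
}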
typedef struct arena_bin_s arena_bin_t;

struct arena_bin_s {
    /*
     * All operations on runcur, runs, and stats require that lock be
     * locked.  Run allocation/deallocation are protected by the arena lock,
     * which may be acquired while holding one or more bin locks, but not
     * vise versa.
     */
    malloc_mutex_t lock;

    /*
     * Current run being used to service allocations of this bin's size
     * class.
     */
    arena_run_t *runcur;

    /*
     * Tree of non-full runs.  This tree is used when looking for an
     * existing run when runcur is no longer usable.  We choose the
     * non-full run that is lowest in memory; this policy tends to keep
     * objects packed well, and it can also help reduce the number of
     * almost-empty chunks.
     */
    arena_run_tree_t runs;

    /* Bin statistics. */
    malloc_bin_stats_t stats;
};
runcur is the run currently used to service allocations.
runs is a red-black tree linking all non-full runs of this size class in the arena; when runcur is full, an available run is looked up in runs.
stats holds statistics about the runs and regions of this bin.
Let's look at the arena_bin_t at index 5 of bins in a running process:
(gdb) p (*je_arenas[0])->bins[5]
$365 = {
  lock = {
    lock = {
      __private = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    }
  },
  runcur = 0x7f68408ad0,
  runs = {
    rbt_root = 0x7f78e06e38
  },
  stats = {
    nmalloc = 236529,
    ndalloc = 229379,
    nrequests = 1181919,
    curregs = 7150,
    nfills = 60225,
    nflushes = 42510,
    nruns = 64,
    reruns = 5402,
    curruns = 31
  }
}
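As a quick consistency check on the statistics: curregs = nmalloc - ndalloc = 236529 - 229379 = 7150, i.e. 7150 regions of this size class are currently allocated from this bin, spread over its curruns = 31 runs.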
The value of runcur is:
(gdb) p /x *(*je_arenas[0])->bins[5].runcur
$373 = {
  binind = 0x5,
  nfree = 0x87,
  bitmap = {0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd, 0x0, 0x0, 0x0, 0x0}
}
Here nfree is the number of free regions in the current run.
The relevant code on the allocation path (arena_malloc_small()) is as follows:
static void *
arena_malloc_small(tsd_t *tsd, arena_t *arena, szind_t binind, bool zero)
{
    void *ret;
    arena_bin_t *bin;
    size_t usize;
    arena_run_t *run;

    assert(binind < NBINS);
    bin = &arena->bins[binind];
    usize = index2size(binind);

    malloc_mutex_lock(&bin->lock);
    if ((run = bin->runcur) != NULL && run->nfree > 0)
        ret = arena_run_reg_alloc(run, &arena_bin_info[binind]);
    else
        ret = arena_bin_malloc_hard(arena, bin);
    ...
Each bit in bitmap represents the free state of the corresponding region: 0 means in use, 1 means free.
Since ngroups in arena_bin_info[5] is 4, only the first 4 elements of the bitmap array are meaningful for a run with binind 5.
So the valid part of the runcur bitmap above is:
0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd,
In binary:
0000000000000000000000000000000000000000000000000000000000000000
1110111111110111001000000000000000000000000000000000000000000000
1111111111111101111011111111111111111111111101111111110111110111
1111111111111111111111111111111111111101111011111111111111111101
There are 135 one-bits in the bitmap in total, i.e. 135 (0x87) regions are free, which matches nfree.
The code that actually carves a region out of the run is as follows:
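This count can be verified quickly with a popcount over the four valid groups; a throwaway sketch using the values printed by gdb above:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The four valid bitmap groups of the runcur shown above. */
    uint64_t groups[4] = {
        0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd
    };
    int nfree = 0;

    for (int i = 0; i < 4; i++)
        nfree += __builtin_popcountll(groups[i]);   /* count the 1 (free) bits */
    printf("nfree = %d (0x%x)\n", nfree, nfree);    /* 135 = 0x87 */
    return 0;
}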
JEMALLOC_INLINE_C void *
arena_run_reg_alloc(arena_run_t *run, arena_bin_info_t *bin_info)
{
    void *ret;
    size_t regind;
    arena_chunk_map_misc_t *miscelm;
    void *rpages;

    regind = (unsigned)bitmap_sfu(run->bitmap, &bin_info->bitmap_info);
    miscelm = arena_run_to_miscelm(run);
    rpages = arena_miscelm_to_rpages(miscelm);
    ret = (void *)((uintptr_t)rpages + (uintptr_t)bin_info->reg0_offset +
        (uintptr_t)(bin_info->reg_interval * regind));
    run->nfree--;
    return (ret);
}
Here bitmap_sfu() returns the position of the first 1 bit in the bitmap and clears it to 0, i.e. it marks that region as in use.
Next, the run's miscelm (arena_chunk_map_misc_t) is located from the run, and from the miscelm we obtain rpages, the starting address of the pages backing the run.
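The real bitmap_sfu() lives in bitmap.h and, depending on the build, either walks a multi-level summary tree or scans the groups linearly; either way it returns the lowest-numbered free bit. Below is a simplified, self-contained sketch of the linear behaviour (not jemalloc's actual code), driven by the runcur bitmap from the gdb output above:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Return the index of the lowest 1 (free) bit and clear it (mark the
 * region as in use).  The caller guarantees at least one free bit exists. */
static size_t bitmap_sfu_sketch(uint64_t *groups, size_t ngroups)
{
    for (size_t g = 0; g < ngroups; g++) {
        if (groups[g] != 0) {
            unsigned bit = (unsigned)__builtin_ctzll(groups[g]);
            groups[g] &= ~(UINT64_C(1) << bit);
            return g * 64 + bit;
        }
    }
    return (size_t)-1;   /* unreachable when a free region exists */
}

int main(void)
{
    uint64_t groups[4] = {
        0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd
    };
    printf("regind = %zu\n", bitmap_sfu_sketch(groups, 4));   /* prints 109 */
    return 0;
}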
Adding regind * reg_interval (equal to reg_size here, since there is no redzone) to rpages yields the actual address of the free region.
Finally, the run's nfree is decremented by one, and the allocation is complete.
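For example, with the runcur bitmap above, the lowest 1 bit is bit 45 of the second group, so bitmap_sfu() returns regind = 64 + 45 = 109 (the lowest-numbered free region), and the returned address is rpages + 0 + 109 * 80 = rpages + 8720.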
The steps in the middle of arena_run_reg_alloc() (arena_run_to_miscelm() and arena_miscelm_to_rpages()) require an understanding of the chunk header, which is covered separately.