An arena is jemalloc's top-level management block. A process can have multiple arenas, and the maximum number of arenas can be read from the static variable narenas_auto.
The pointers to all of the process's arenas can be obtained from the static array arenas (je_arenas in gdb):
(gdb) p narenas_auto
$359 = 2
(gdb) p *je_arenas@2
$360 = {0x7f93e02200, 0x7f93f12280}
This shows that the maximum number of arenas in the current process is 2, and that their pointers are 0x7f93e02200 and 0x7f93f12280.
arena is declared as follows:
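Besides gdb, similar information can be queried at run time through jemalloc's public mallctl() interface. Below is a minimal sketch, not part of the code being analyzed; it uses the "arenas.narenas" mallctl, whose exact meaning (number of arenas vs. the current arena limit) varies slightly between jemalloc versions:

#include <stdio.h>
#include <jemalloc/jemalloc.h>

int main(void)
{
    unsigned narenas;
    size_t sz = sizeof(narenas);

    /* Query how many arenas the allocator is using/allows. */
    if (mallctl("arenas.narenas", &narenas, &sz, NULL, 0) == 0)
        printf("narenas = %u\n", narenas);
    return 0;
}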
typedef struct arena_s arena_t;

struct arena_s {
    /* This arena's index within the arenas array. */
    unsigned ind;

    /*
     * Number of threads currently assigned to this arena.  This field is
     * synchronized via atomic operations.
     */
    unsigned nthreads;

    /*
     * There are three classes of arena operations from a locking
     * perspective:
     * 1) Thread assignment (modifies nthreads) is synchronized via atomics.
     * 2) Bin-related operations are protected by bin locks.
     * 3) Chunk- and run-related operations are protected by this mutex.
     */
    malloc_mutex_t lock;

    arena_stats_t stats;
    /*
     * List of tcaches for extant threads associated with this arena.
     * Stats from these are merged incrementally, and at exit if
     * opt_stats_print is enabled.
     */
    ql_head(tcache_t) tcache_ql;

    uint64_t prof_accumbytes;

    /*
     * PRNG state for cache index randomization of large allocation base
     * pointers.
     */
    uint64_t offset_state;

    dss_prec_t dss_prec;

    /*
     * In order to avoid rapid chunk allocation/deallocation when an arena
     * oscillates right on the cusp of needing a new chunk, cache the most
     * recently freed chunk.  The spare is left in the arena's chunk trees
     * until it is deleted.
     *
     * There is one spare chunk per arena, rather than one spare total, in
     * order to avoid interactions between multiple threads that could make
     * a single spare inadequate.
     */
    arena_chunk_t *spare;

    /* Minimum ratio (log base 2) of nactive:ndirty. */
    ssize_t lg_dirty_mult;

    /* True if a thread is currently executing arena_purge_to_limit(). */
    bool purging;

    /* Number of pages in active runs and huge regions. */
    size_t nactive;

    /*
     * Current count of pages within unused runs that are potentially
     * dirty, and for which madvise(... MADV_DONTNEED) has not been called.
     * By tracking this, we can institute a limit on how much dirty unused
     * memory is mapped for each arena.
     */
    size_t ndirty;

    /*
     * Unused dirty memory this arena manages.  Dirty memory is conceptually
     * tracked as an arbitrarily interleaved LRU of dirty runs and cached
     * chunks, but the list linkage is actually semi-duplicated in order to
     * avoid extra arena_chunk_map_misc_t space overhead.
     *
     *   LRU-----------------------------------------------------------MRU
     *
     *        /-- arena ---\
     *        |            |
     *        |            |
     *        |------------|                             /- chunk -\
     *   ...->|chunks_cache|<--------------------------->|  /----\ |<--...
     *        |------------|                             |  |node| |
     *        |            |                             |  |    | |
     *        |            |    /- run -\    /- run -\   |  |    | |
     *        |            |    |       |    |       |   |  |    | |
     *        |            |    |       |    |       |   |  |    | |
     *        |------------|    |-------|    |-------|   |  |----| |
     *   ...->|runs_dirty  |<-->|rd     |<-->|rd     |<---->|rd  |<----...
     *        |------------|    |-------|    |-------|   |  |----| |
     *        |            |    |       |    |       |   |  |    | |
     *        |            |    |       |    |       |   |  \----/ |
     *        |            |    \-------/    \-------/   |         |
     *        |            |                             |         |
     *        |            |                             |         |
     *        \------------/                             \---------/
     */
    arena_runs_dirty_link_t runs_dirty;
    extent_node_t chunks_cache;

    /*
     * Approximate time in seconds from the creation of a set of unused
     * dirty pages until an equivalent set of unused dirty pages is purged
     * and/or reused.
     */
    ssize_t decay_time;
    /* decay_time / SMOOTHSTEP_NSTEPS. */
    nstime_t decay_interval;
    /*
     * Time at which the current decay interval logically started.  We do
     * not actually advance to a new epoch until sometime after it starts
     * because of scheduling and computation delays, and it is even possible
     * to completely skip epochs.  In all cases, during epoch advancement we
     * merge all relevant activity into the most recently recorded epoch.
     */
    nstime_t decay_epoch;
    /* decay_deadline randomness generator. */
    uint64_t decay_jitter_state;
    /*
     * Deadline for current epoch.  This is the sum of decay_interval and
     * per epoch jitter which is a uniform random variable in
     * [0..decay_interval).  Epochs always advance by precise multiples of
     * decay_interval, but we randomize the deadline to reduce the
     * likelihood of arenas purging in lockstep.
     */
    nstime_t decay_deadline;
    /*
     * Number of dirty pages at beginning of current epoch.  During epoch
     * advancement we use the delta between decay_ndirty and ndirty to
     * determine how many dirty pages, if any, were generated, and record
     * the result in decay_backlog.
     */
    size_t decay_ndirty;
    /*
     * Memoized result of arena_decay_backlog_npages_limit() corresponding
     * to the current contents of decay_backlog, i.e. the limit on how many
     * pages are allowed to exist for the decay epochs.
     */
    size_t decay_backlog_npages_limit;
    /*
     * Trailing log of how many unused dirty pages were generated during
     * each of the past SMOOTHSTEP_NSTEPS decay epochs, where the last
     * element is the most recent epoch.  Corresponding epoch times are
     * relative to decay_epoch.
     */
    size_t decay_backlog[SMOOTHSTEP_NSTEPS];

    /* Extant huge allocations. */
    ql_head(extent_node_t) huge;
    /* Synchronizes all huge allocation/update/deallocation. */
    malloc_mutex_t huge_mtx;

    /*
     * Trees of chunks that were previously allocated (trees differ only in
     * node ordering).  These are used when allocating chunks, in an attempt
     * to re-use address space.  Depending on function, different tree
     * orderings are needed, which is why there are two trees with the same
     * contents.
     */
    extent_tree_t chunks_szad_cached;
    extent_tree_t chunks_ad_cached;
    extent_tree_t chunks_szad_retained;
    extent_tree_t chunks_ad_retained;

    malloc_mutex_t chunks_mtx;
    /* Cache of nodes that were allocated via base_alloc(). */
    ql_head(extent_node_t) node_cache;
    malloc_mutex_t node_cache_mtx;

    /* User-configurable chunk hook functions. */
    chunk_hooks_t chunk_hooks;

    /* bins is used to store trees of free regions. */
    arena_bin_t bins[NBINS];

    /*
     * Quantized address-ordered trees of this arena's available runs.  The
     * trees are used for first-best-fit run allocation.
     */
    arena_run_tree_t runs_avail[1]; /* Dynamically sized. */
};
We will not look at the other members for now. Here we first discuss bins, an array of arena_bin_t with NBINS (36) entries, corresponding to 36 region/run size classes.
binind is the index into bins; each binind corresponds to one fixed region size. The mapping is:
usize = index2size(binind);

size_t
index2size(szind_t index)
{
    return (index2size_lookup(index));
}

size_t
index2size_lookup(szind_t index)
{
    size_t ret = (size_t)index2size_tab[index];
    return (ret);
}
Here index2size_tab is a static table that records the mapping from index to size.
Given a binind, we can also obtain further information about the corresponding bin from the static variable arena_bin_info:
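For reference, on a common 64-bit configuration the first entries of index2size_tab are the small size classes 8, 16, 32, 48, 64, 80, 96, 112, 128, 160, ..., so binind 5 maps to 80 bytes. The sketch below is illustrative only; the real table is generated at build time and its values depend on the configuration:

#include <stdio.h>
#include <stddef.h>

/* Illustrative only: first few small size classes of a typical 64-bit
 * build; not the actual generated index2size_tab. */
static const size_t index2size_tab_sketch[] = {
    8, 16, 32, 48, 64, 80, 96, 112, 128, 160, 192, 224, 256
};

int main(void)
{
    unsigned binind = 5;
    printf("binind %u -> region size %zu\n",
        binind, index2size_tab_sketch[binind]); /* prints 80 */
    return 0;
}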
(gdb) p je_arena_bin_info[5]
$375 = {
  reg_size = 80,
  redzone_size = 0,
  reg_interval = 80,
  run_size = 20480,
  nregs = 256,
  bitmap_info = {
    nbits = 256,
    ngroups = 4
  },
  reg0_offset = 0
}
From this bin_info we can see that for the bin with index 5, the region size (reg_size) is 80 bytes, the run size (run_size) is 20 KB, each run holds 256 regions (nregs), and 4 bitmap groups (ngroups) are needed to track the regions.
arena_bin_t is declared as follows:
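These fields are internally consistent: 256 regions of 80 bytes each exactly fill the 20480-byte (5 x 4 KB pages) run, and tracking 256 regions at one bit each needs ceil(256/64) = 4 64-bit bitmap groups. A small sanity-check sketch using the numbers printed above:

#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned nregs = 256, reg_interval = 80;
    unsigned run_size = nregs * reg_interval;   /* 20480 bytes = 5 pages of 4 KB */
    unsigned ngroups = (nregs + 63) / 64;       /* one bit per region, 64 bits per group */

    assert(run_size == 20480);
    assert(ngroups == 4);
    printf("run_size=%u ngroups=%u\n", run_size, ngroups);
    return 0;
}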
typedef struct arena_bin_s arena_bin_t;

struct arena_bin_s {
    /*
     * All operations on runcur, runs, and stats require that lock be
     * locked.  Run allocation/deallocation are protected by the arena lock,
     * which may be acquired while holding one or more bin locks, but not
     * vise versa.
     */
    malloc_mutex_t lock;

    /*
     * Current run being used to service allocations of this bin's size
     * class.
     */
    arena_run_t *runcur;

    /*
     * Tree of non-full runs.  This tree is used when looking for an
     * existing run when runcur is no longer usable.  We choose the
     * non-full run that is lowest in memory; this policy tends to keep
     * objects packed well, and it can also help reduce the number of
     * almost-empty chunks.
     */
    arena_run_tree_t runs;

    /* Bin statistics. */
    malloc_bin_stats_t stats;
};
runcur is the run currently used to service allocations.
runs is a red-black tree linking all non-full runs of this size class in the arena; when runcur is full, an available run is looked up in runs.
stats holds statistics about the runs and regions of this bin.
Let's look at the arena_bin_t at index 5 of bins in a running process:
(gdb) p (*je_arenas[0])->bins[5]
$365 = {
  lock = {
    lock = {
      __private = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    }
  },
  runcur = 0x7f68408ad0,
  runs = {
    rbt_root = 0x7f78e06e38
  },
  stats = {
    nmalloc = 236529,
    ndalloc = 229379,
    nrequests = 1181919,
    curregs = 7150,
    nfills = 60225,
    nflushes = 42510,
    nruns = 64,
    reruns = 5402,
    curruns = 31
  }
}
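As a quick consistency check on the statistics: curregs = nmalloc - ndalloc = 236529 - 229379 = 7150, i.e. 7150 regions of this size class are currently allocated from this bin, spread over its curruns = 31 runs.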
The value of runcur is:
(gdb) p /x *(*je_arenas[0])->bins[5].runcur
$373 = {
  binind = 0x5,
  nfree = 0x87,
  bitmap = {0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd, 0x0, 0x0, 0x0, 0x0}
}
Here nfree is the number of free regions in the current run.
The relevant code on the allocation path (arena_malloc_small()) is as follows:
static void *
arena_malloc_small(tsd_t *tsd, arena_t *arena, szind_t binind, bool zero)
{
    void *ret;
    arena_bin_t *bin;
    size_t usize;
    arena_run_t *run;

    assert(binind < NBINS);
    bin = &arena->bins[binind];
    usize = index2size(binind);

    malloc_mutex_lock(&bin->lock);
    if ((run = bin->runcur) != NULL && run->nfree > 0)
        ret = arena_run_reg_alloc(run, &arena_bin_info[binind]);
    else
        ret = arena_bin_malloc_hard(arena, bin);
    ...
Each bit in bitmap represents the free state of the corresponding region: 0 means in use, 1 means free.
Since ngroups in arena_bin_info[5] is 4, only the first 4 elements of the bitmap array are meaningful for a run with binind 5.
So the valid part of the runcur bitmap above is:
0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd,
In binary:
0000000000000000000000000000000000000000000000000000000000000000
1110111111110111001000000000000000000000000000000000000000000000
1111111111111101111011111111111111111111111101111111110111110111
1111111111111111111111111111111111111101111011111111111111111101
There are 135 one-bits in the bitmap in total, i.e. 135 (0x87) regions are free, which matches nfree.
The code that actually carves a region out of the run is as follows:
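This count can be verified quickly with a popcount over the four valid groups; a throwaway sketch using the values printed by gdb above:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The four valid bitmap groups of the runcur shown above. */
    uint64_t groups[4] = {
        0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd
    };
    int nfree = 0;

    for (int i = 0; i < 4; i++)
        nfree += __builtin_popcountll(groups[i]);   /* count the 1 (free) bits */
    printf("nfree = %d (0x%x)\n", nfree, nfree);    /* 135 = 0x87 */
    return 0;
}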
JEMALLOC_INLINE_C void *
arena_run_reg_alloc(arena_run_t *run, arena_bin_info_t *bin_info)
{
    void *ret;
    size_t regind;
    arena_chunk_map_misc_t *miscelm;
    void *rpages;

    regind = (unsigned)bitmap_sfu(run->bitmap, &bin_info->bitmap_info);
    miscelm = arena_run_to_miscelm(run);
    rpages = arena_miscelm_to_rpages(miscelm);
    ret = (void *)((uintptr_t)rpages + (uintptr_t)bin_info->reg0_offset +
        (uintptr_t)(bin_info->reg_interval * regind));
    run->nfree--;
    return (ret);
}
Here bitmap_sfu() returns the position of the first 1 bit in the bitmap and clears it to 0, i.e. it marks that region as in use.
Next, the run's miscelm (arena_chunk_map_misc_t) is located from the run, and from the miscelm we obtain rpages, the starting address of the pages backing the run.
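The real bitmap_sfu() lives in bitmap.h and, depending on the build, either walks a multi-level summary tree or scans the groups linearly; either way it returns the lowest-numbered free bit. Below is a simplified, self-contained sketch of the linear behaviour (not jemalloc's actual code), driven by the runcur bitmap from the gdb output above:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Return the index of the lowest 1 (free) bit and clear it (mark the
 * region as in use).  The caller guarantees at least one free bit exists. */
static size_t bitmap_sfu_sketch(uint64_t *groups, size_t ngroups)
{
    for (size_t g = 0; g < ngroups; g++) {
        if (groups[g] != 0) {
            unsigned bit = (unsigned)__builtin_ctzll(groups[g]);
            groups[g] &= ~(UINT64_C(1) << bit);
            return g * 64 + bit;
        }
    }
    return (size_t)-1;   /* unreachable when a free region exists */
}

int main(void)
{
    uint64_t groups[4] = {
        0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd
    };
    printf("regind = %zu\n", bitmap_sfu_sketch(groups, 4));   /* prints 109 */
    return 0;
}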
Adding regind * reg_interval (equal to reg_size here, since there is no redzone) to rpages yields the actual address of the free region.
Finally, the run's nfree is decremented by one, and the allocation is complete.
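For example, with the runcur bitmap above, the lowest 1 bit is bit 45 of the second group, so bitmap_sfu() returns regind = 64 + 45 = 109 (the lowest-numbered free region), and the returned address is rpages + 0 + 109 * 80 = rpages + 8720.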
The steps in the middle of arena_run_reg_alloc() (arena_run_to_miscelm() and arena_miscelm_to_rpages()) require an understanding of the chunk header, which is covered separately.