2. Linux-3.14.12 Memory Management Notes 【The memblock Algorithm During System Boot (2)】


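With the preparation that precedes memblock initialization out of the way, we return to the initialization of the memblock algorithm and its implementation. memblock is a very simple algorithm.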

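The memblock implementation keeps all of its state in a single global variable, memblock, which is placed in the section selected by the __initdata_memblock attribute. Initialization, allocation, and freeing all work by changing the recorded state of memory blocks. So let's start from the data structures.

The global variable memblock is a struct memblock, defined as follows: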

【file:/include/linux/memblock.h】
struct memblock {
    bool bottom_up; /* is bottom up direction? */
    phys_addr_t current_limit;
    struct memblock_type memory;
    struct memblock_type reserved;
};

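The meaning of each member:

  • bottom_up: indicates whether the allocator hands out memory from low addresses toward high addresses (here and below, "low address" means the end of the kernel image) or from high addresses toward low addresses;
  • current_limit: the upper address limit applied to requests made through memblock_alloc() and memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE);
  • memory: the memory that is usable and available for allocation;
  • reserved: the memory that has already been allocated.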

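memory and reserved are the key data structures: memblock initialization, allocation, and freeing all revolve around them.

Next, the definition of struct memblock_type, the type of memory and reserved: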

【file:/include/linux/memblock.h】
struct memblock_type {
    unsigned long cnt; /* number of regions */
    unsigned long max; /* size of the allocated array */
    phys_addr_t total_size; /* size of all regions */
    struct memblock_region *regions;
};

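cnt and max are, respectively, the number of regions currently recorded for this type (memory or reserved) and the maximum number the regions array can hold. total_size is the combined size of all regions of this type. regions points to the array that stores the information for each region (base address, size, flags, and so on):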

【file:/include/linux/memblock.h】
struct memblock_region {
    phys_addr_t base;
    phys_addr_t size;
    unsigned long flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
    int nid;
#endif
};

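Those are all of the main data structures of the memblock algorithm. Their overall relationship is shown in the figure:

[Figure: relationship between struct memblock, struct memblock_type, and struct memblock_region]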

Now go back to the definition of the memblock variable, annotated with __initdata_memblock:

【file:/mm/memblock.c】
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
struct memblock memblock __initdata_memblock = {
    .memory.regions = memblock_memory_init_regions,
    .memory.cnt = 1, /* empty dummy entry */
    .memory.max = INIT_MEMBLOCK_REGIONS,
    
    .reserved.regions = memblock_reserved_init_regions,
    .reserved.cnt = 1, /* empty dummy entry */
    .reserved.max = INIT_MEMBLOCK_REGIONS,
 
    .bottom_up = false,
    .current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};

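This initializes some of the members: allocation runs from high addresses toward low addresses (bottom_up is false), and current_limit is set to MEMBLOCK_ALLOC_ANYWHERE, that is ~0 (0xFFFFFFFF when phys_addr_t is 32 bits wide). The two statically defined arrays also provide the initial storage for the memory and reserved region lists.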
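The initial state can be mirrored in a few lines of user-space C. This is only an illustrative sketch, not kernel code: the names region, region_type, boot_map, map_state, and type_is_empty() are made up for this example, phys_addr_t is assumed to be 64 bits wide, and the per-region flags and nid fields are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;               /* assumption: 64-bit physical addresses */

#define INIT_REGIONS   128                  /* stands in for INIT_MEMBLOCK_REGIONS */
#define ALLOC_ANYWHERE (~(phys_addr_t)0)    /* stands in for MEMBLOCK_ALLOC_ANYWHERE */

struct region {
    phys_addr_t base;
    phys_addr_t size;
};

struct region_type {
    unsigned long cnt;          /* number of entries in use */
    unsigned long max;          /* capacity of the regions array */
    phys_addr_t total_size;     /* sum of all region sizes */
    struct region *regions;
};

struct boot_map {
    bool bottom_up;
    phys_addr_t current_limit;
    struct region_type memory;
    struct region_type reserved;
};

/* Static backing storage, so nothing has to be allocated to get started. */
static struct region memory_init_regions[INIT_REGIONS];
static struct region reserved_init_regions[INIT_REGIONS];

static struct boot_map map_state = {
    .memory.regions   = memory_init_regions,
    .memory.cnt       = 1,                  /* one empty dummy entry */
    .memory.max       = INIT_REGIONS,
    .reserved.regions = reserved_init_regions,
    .reserved.cnt     = 1,                  /* one empty dummy entry */
    .reserved.max     = INIT_REGIONS,
    .bottom_up        = false,
    .current_limit    = ALLOC_ANYWHERE,
};

/* A list is "empty" when it holds only the zero-sized dummy entry. */
static bool type_is_empty(const struct region_type *t)
{
    return t->cnt == 1 && t->regions[0].size == 0;
}
```

Right after static initialization, type_is_empty() holds for both map_state.memory and map_state.reserved, which is the state the "special case for empty array" branches in the kernel code rely on.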

Next, let's look at how memblock is initialized. The initialization function is memblock_x86_fill(), and it is reached through this call chain:

start_kernel()                          #/init/main.c

└->setup_arch()                        #/arch/x86/kernel/setup.c

└->memblock_x86_fill()                #/arch/x86/kernel/e820.c

Its implementation:

【file:/arch/x86/kernel/e820.c】
void __init memblock_x86_fill(void)
{
    int i;
    u64 end;
 
    /*
     * EFI may have more than 128 entries
     * We are safe to enable resizing, beause memblock_x86_fill()
     * is rather later for x86
     */
    memblock_allow_resize();
 
    for (i = 0; i < e820.nr_map; i++) {
        struct e820entry *ei = &e820.map[i];
 
        end = ei->addr + ei->size;
        if (end != (resource_size_t)end)
            continue;
 
        if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
            continue;
 
        memblock_add(ei->addr, ei->size);
    }
 
    /* throw away partial pages */
    memblock_trim_memory(PAGE_SIZE);
 
    memblock_dump_all();
}

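In this function, memblock_allow_resize() merely sets memblock_can_resize. The for loop walks the e820 memory map and calls memblock_add() for every entry of type E820_RAM or E820_RESERVED_KERN, skipping entries whose end address does not fit in resource_size_t. After the loop, memblock_trim_memory() and memblock_dump_all() do the post-processing. First, the implementation of memblock_add():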
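The filtering done by the loop can be tried out in isolation. The following is a minimal sketch under stated assumptions: struct fw_entry, the ENTRY_* constants, and usable_bytes() are hypothetical names invented for this example, resource_size_t is assumed to be 32 bits wide, and the sketch sums the accepted sizes instead of calling memblock_add().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry types; the real e820 type values are not reproduced here. */
enum entry_type { ENTRY_RAM, ENTRY_RESERVED, ENTRY_RESERVED_KERN };

struct fw_entry {
    uint64_t addr;
    uint64_t size;
    enum entry_type type;
};

/* Sum the sizes of the entries that the fill loop would hand on. */
static uint64_t usable_bytes(const struct fw_entry *map, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = map[i].addr + map[i].size;

        /* Assumed 32-bit resource_size_t: skip entries ending above 4 GiB. */
        if (end != (uint32_t)end)
            continue;

        /* Only plain RAM and kernel-reserved RAM count as usable. */
        if (map[i].type != ENTRY_RAM && map[i].type != ENTRY_RESERVED_KERN)
            continue;

        total += map[i].size;
    }
    return total;
}
```

With a table that holds a RAM entry below 1 MiB, a reserved hole, a RAM entry ending at 1 GiB, and a RAM entry above 4 GiB, only the first and third entries contribute to the total.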

【file:/mm/memblock.c】
int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
{
    return memblock_add_region(&memblock.memory, base, size,
                   MAX_NUMNODES, 0);
}

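memblock_add() is a thin wrapper around memblock_add_region(). Note that it operates on memblock.memory (the usable, allocatable memory), so its purpose is evidently to add the e820 memory information there. Next, the implementation of memblock_add_region():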

【file:/mm/memblock.c】
/**
 * memblock_add_region - add new memblock region
 * @type: memblock type to add new region into
 * @base: base address of the new region
 * @size: size of the new region
 * @nid: nid of the new region
 * @flags: flags of the new region
 *
 * Add new memblock region [@base,@base+@size) into @type. The new region
 * is allowed to overlap with existing ones - overlaps don't affect already
 * existing regions. @type is guaranteed to be minimal (all neighbouring
 * compatible regions are merged) after the addition.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */
static int __init_memblock memblock_add_region(struct memblock_type *type,
                phys_addr_t base, phys_addr_t size,
                int nid, unsigned long flags)
{
    bool insert = false;
    phys_addr_t obase = base;
    phys_addr_t end = base + memblock_cap_size(base, &size);
    int i, nr_new;
 
    if (!size)
        return 0;
 
    /* special case for empty array */
    if (type->regions[0].size == 0) {
        WARN_ON(type->cnt != 1 || type->total_size);
        type->regions[0].base = base;
        type->regions[0].size = size;
        type->regions[0].flags = flags;
        memblock_set_region_node(&type->regions[0], nid);
        type->total_size = size;
        return 0;
    }
repeat:
    /*
     * The following is executed twice. Once with %false @insert and
     * then with %true. The first counts the number of regions needed
     * to accomodate the new area. The second actually inserts them.
     */
    base = obase;
    nr_new = 0;
 
    for (i = 0; i < type->cnt; i++) {
        struct memblock_region *rgn = &type->regions[i];
        phys_addr_t rbase = rgn->base;
        phys_addr_t rend = rbase + rgn->size;
 
        if (rbase >= end)
            break;
        if (rend <= base)
            continue;
        /*
         * @rgn overlaps. If it separates the lower part of new
         * area, insert that portion.
         */
        if (rbase > base) {
            nr_new++;
            if (insert)
                memblock_insert_region(type, i++, base,
                               rbase - base, nid,
                               flags);
        }
        /* area below @rend is dealt with, forget about it */
        base = min(rend, end);
    }
 
    /* insert the remaining portion */
    if (base < end) {
        nr_new++;
        if (insert)
            memblock_insert_region(type, i, base, end - base,
                           nid, flags);
    }
 
    /*
     * If this was the first round, resize array and repeat for actual
     * insertions; otherwise, merge and return.
     */
    if (!insert) {
        while (type->cnt + nr_new > type->max)
            if (memblock_double_array(type, obase, size) < 0)
                return -ENOMEM;
        insert = true;
        goto repeat;
    } else {
        memblock_merge_regions(type);
        return 0;
    }
}

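The behaviour of memblock_add_region() can be summarized as follows:

  1. If the region list is still empty, the new range is simply stored in the first entry;
  2. Otherwise, the function checks for overlap with existing regions; overlapping parts are skipped and only the non-overlapping parts are inserted;
  3. If the regions[] array is too small, memblock_double_array() enlarges it;
  4. Finally, memblock_merge_regions() merges regions that are directly adjacent.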
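The four steps above can be sketched in user-space C. This is a simplified model, not the kernel routine: region_list, insert_at(), merge_adjacent(), and list_add() are names invented here, the array has a fixed capacity instead of growing through memblock_double_array(), an empty list has cnt equal to 0 rather than a dummy entry, flags and node ids are ignored, and insertion happens in a single pass (so a failed insert can leave a partial result, which the kernel avoids by counting first).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t phys_addr_t;
#define MAX_REGIONS 16

struct region { phys_addr_t base, size; };
struct region_list { size_t cnt; struct region r[MAX_REGIONS]; };

/* Insert [base, base + size) at index idx, shifting later entries up. */
static int insert_at(struct region_list *l, size_t idx,
                     phys_addr_t base, phys_addr_t size)
{
    if (l->cnt >= MAX_REGIONS)
        return -1;
    memmove(&l->r[idx + 1], &l->r[idx], (l->cnt - idx) * sizeof(l->r[0]));
    l->r[idx].base = base;
    l->r[idx].size = size;
    l->cnt++;
    return 0;
}

/* Fold every pair of directly adjacent regions into one. */
static void merge_adjacent(struct region_list *l)
{
    size_t i = 0;

    while (i + 1 < l->cnt) {
        struct region *a = &l->r[i], *b = &l->r[i + 1];

        if (a->base + a->size != b->base) {
            i++;
            continue;
        }
        a->size += b->size;
        memmove(b, b + 1, (l->cnt - (i + 2)) * sizeof(*b));
        l->cnt--;
    }
}

/* Add [base, base + size); parts already covered by the list are skipped. */
static int list_add(struct region_list *l, phys_addr_t base, phys_addr_t size)
{
    phys_addr_t end = base + size;
    size_t i;

    if (!size)
        return 0;

    for (i = 0; i < l->cnt; i++) {
        phys_addr_t rbase = l->r[i].base;
        phys_addr_t rend = rbase + l->r[i].size;

        if (rbase >= end)
            break;              /* existing region lies entirely above */
        if (rend <= base)
            continue;           /* existing region lies entirely below */
        if (rbase > base) {
            /* the part of the new range below this region is uncovered */
            if (insert_at(l, i++, base, rbase - base))
                return -1;
        }
        base = rend < end ? rend : end;   /* everything below rend is handled */
    }
    if (base < end && insert_at(l, i, base, end - base))
        return -1;

    merge_adjacent(l);
    return 0;
}
```

For example, starting from [0x1000, 0x2000) and [0x3000, 0x4000), adding [0x1800, 0x3800) inserts only the missing piece [0x2000, 0x3000), and the merge step then leaves a single region [0x1000, 0x4000).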

Its purpose is now clear: it transfers the memory layout from the e820 map into memblock.memory, which marks that memory as usable.

Back in memblock_x86_fill(), two post-processing functions run after the for loop: memblock_trim_memory() and memblock_dump_all(). The implementation of memblock_trim_memory():

【file:/mm/memblock.c】
void __init_memblock memblock_trim_memory(phys_addr_t align)
{
    int i;
    phys_addr_t start, end, orig_start, orig_end;
    struct memblock_type *mem = &memblock.memory;
 
    for (i = 0; i < mem->cnt; i++) {
        orig_start = mem->regions[i].base;
        orig_end = mem->regions[i].base + mem->regions[i].size;
        start = round_up(orig_start, align);
        end = round_down(orig_end, align);
 
        if (start == orig_start && end == orig_end)
            continue;
 
        if (start < end) {
            mem->regions[i].base = start;
            mem->regions[i].size = end - start;
        } else {
            memblock_remove_region(mem, i);
            i--;
        }
    }
}

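This function tidies up memblock.memory: each region is trimmed to align boundaries (PAGE_SIZE in this call), and a region with no aligned part left is removed. Finally, memblock_dump_all() dumps the resulting information; it is not analyzed here.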
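The rounding applied to each region is easy to check by hand. Below is a minimal sketch for a single region; trim_region() is a name made up for this example, and align is assumed to be a power of two, as PAGE_SIZE is.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/*
 * Shrink [*base, *base + *size) so that both ends sit on align boundaries.
 * Returns false, leaving the region untouched, when no aligned part remains
 * (the case in which the kernel loop removes the region).
 */
static bool trim_region(phys_addr_t *base, phys_addr_t *size, phys_addr_t align)
{
    phys_addr_t mask = align - 1;                  /* align must be 2^k */
    phys_addr_t start = (*base + mask) & ~mask;    /* round the start up */
    phys_addr_t end = (*base + *size) & ~mask;     /* round the end down */

    if (start >= end)
        return false;

    *base = start;
    *size = end - start;
    return true;
}
```

With a 0x1000 alignment, the region [0x1234, 0x4234) becomes [0x2000, 0x4000), while [0x1100, 0x1300) contains no whole page and is dropped.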

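At this point memblock is initialized. Next come allocation and freeing. Under the memblock algorithm the corresponding interfaces are:

memblock_alloc() and memblock_free().

The implementation of memblock_alloc() (its parameters are the size to allocate and align, the required byte alignment):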

【file:/mm/memblock.c】
phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
{
    return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
}

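MEMBLOCK_ALLOC_ACCESSIBLE is passed as max_addr; it requests memory that is currently accessible, that is, memory below memblock.current_limit. The call is forwarded to memblock_alloc_base():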

【file:/mm/memblock.c】
phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
    phys_addr_t alloc;
 
    alloc = __memblock_alloc_base(size, align, max_addr);
 
    if (alloc == 0)
        panic("ERROR: Failed to allocate 0x%llx bytes below 0x%llx.\n",
              (unsigned long long) size, (unsigned long long) max_addr);
 
    return alloc;
}

Next is __memblock_alloc_base(), which wraps memblock_alloc_base_nid() and adds the argument NUMA_NO_NODE, meaning no particular NUMA node; after all, initialization has not got that far yet:

【file:/mm/memblock.c】
phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
    return memblock_alloc_base_nid(size, align, max_addr, NUMA_NO_NODE);
}

Next, memblock_alloc_base_nid():

【file:/mm/memblock.c】
static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
                    phys_addr_t align, phys_addr_t max_addr,
                    int nid)
{
    phys_addr_t found;
 
    if (!align)
        align = SMP_CACHE_BYTES;
 
    found = memblock_find_in_range_node(size, align, 0, max_addr, nid);
    if (found && !memblock_reserve(found, size))
        return found;
 
    return 0;
}

Two functions matter here: memblock_find_in_range_node() and memblock_reserve().

First, the implementation of memblock_find_in_range_node():

【file:/mm/memblock.c】
/**
 * memblock_find_in_range_node - find free area in given range and node
 * @size: size of free area to find
 * @align: alignment of free area to find
 * @start: start of candidate range
 * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
 * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
 *
 * Find @size free area aligned to @align in the specified range and node.
 *
 * When allocation direction is bottom-up, the @start should be greater
 * than the end of the kernel image. Otherwise, it will be trimmed. The
 * reason is that we want the bottom-up allocation just near the kernel
 * image so it is highly likely that the allocated memory and the kernel
 * will reside in the same node.
 *
 * If bottom-up allocation failed, will try to allocate memory top-down.
 *
 * RETURNS:
 * Found address on success, 0 on failure.
 */
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
                    phys_addr_t align, phys_addr_t start,
                    phys_addr_t end, int nid)
{
    int ret;
    phys_addr_t kernel_end;
 
    /* pump up @end */
    if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
        end = memblock.current_limit;
 
    /* avoid allocating the first page */
    start = max_t(phys_addr_t, start, PAGE_SIZE);
    end = max(start, end);
    kernel_end = __pa_symbol(_end);
 
    /*
     * try bottom-up allocation only when bottom-up mode
     * is set and @end is above the kernel image.
     */
    if (memblock_bottom_up() && end > kernel_end) {
        phys_addr_t bottom_up_start;
 
        /* make sure we will allocate above the kernel */
        bottom_up_start = max(start, kernel_end);
 
        /* ok, try bottom-up allocation first */
        ret = __memblock_find_range_bottom_up(bottom_up_start, end,
                              size, align, nid);
        if (ret)
            return ret;
 
        /*
         * we always limit bottom-up allocation above the kernel,
         * but top-down allocation doesn't have the limit, so
         * retrying top-down allocation may succeed when bottom-up
         * allocation failed.
         *
         * bottom-up allocation is expected to be fail very rarely,
         * so we use WARN_ONCE() here to see the stack trace if
         * fail happens.
         */
        WARN_ONCE(1, "memblock: bottom-up allocation failed, "
                 "memory hotunplug may be affected\n");
    }
 
    return __memblock_find_range_top_down(start, end, size, align, nid);
}

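A rough walk-through: the function first examines end. Following the call chain above, end is MEMBLOCK_ALLOC_ACCESSIBLE, so it is replaced by memblock.current_limit. start is then raised to at least PAGE_SIZE so that the first page is never allocated. memblock_bottom_up() returns memblock.bottom_up, which the static initializer set to false (this is not guaranteed; it may be set to true during NUMA initialization), so in the end __memblock_find_range_top_down() is called to search for memory. Its implementation: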

【file:/mm/memblock.c】
/**
 * __memblock_find_range_top_down - find free area utility, in top-down
 * @start: start of candidate range
 * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
 * @size: size of free area to find
 * @align: alignment of free area to find
 * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
 *
 * Utility called from memblock_find_in_range_node(), find free area top-down.
 *
 * RETURNS:
 * Found address on success, 0 on failure.
 */
static phys_addr_t __init_memblock
__memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
                   phys_addr_t size, phys_addr_t align, int nid)
{
    phys_addr_t this_start, this_end, cand;
    u64 i;
 
    for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
        this_start = clamp(this_start, start, end);
        this_end = clamp(this_end, start, end);
 
        if (this_end < size)
            continue;
 
        cand = round_down(this_end - size, align);
        if (cand >= this_start)
            return cand;
    }
 
    return 0;
}

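__memblock_find_range_top_down() uses the for_each_free_mem_range_reverse macro, which wraps __next_free_mem_range_rev(). That function takes the regions in memblock.memory one by one and checks them against the entries in memblock.reserved, so that the returned this_start and this_end never overlap reserved memory. Both values are then clamped into [start, end]. If the range is large enough, the function takes size bytes from the top of the range (this is the top-down policy), rounds the resulting start address down to align, and returns it, provided it does not fall below this_start. At that point a suitable block has been found.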
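The per-range arithmetic can be reproduced for one free range. This is a sketch, with clamp_addr() and candidate_top_down() as hypothetical helper names, align assumed to be a power of two, and the iteration over free ranges left out.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Limit v to the closed interval [lo, hi]. */
static phys_addr_t clamp_addr(phys_addr_t v, phys_addr_t lo, phys_addr_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Try to place size bytes at the top of the free range
 * [free_start, free_end), restricted to the window [start, end].
 * Returns the chosen base address, or 0 when the range cannot hold it.
 */
static phys_addr_t candidate_top_down(phys_addr_t free_start, phys_addr_t free_end,
                                      phys_addr_t start, phys_addr_t end,
                                      phys_addr_t size, phys_addr_t align)
{
    phys_addr_t this_start = clamp_addr(free_start, start, end);
    phys_addr_t this_end = clamp_addr(free_end, start, end);
    phys_addr_t cand;

    if (this_end < size)
        return 0;                               /* subtraction would underflow */

    cand = (this_end - size) & ~(align - 1);    /* round down to align */
    return cand >= this_start ? cand : 0;
}
```

For a free range [0x1000, 0x10000), a window of [0x1000, 0x8000], a size of 0x1800, and an alignment of 0x1000, the top of the usable range is 0x8000, so the candidate is 0x6800 rounded down to 0x6000.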

As an aside, __memblock_find_range_bottom_up() and __memblock_find_range_top_down() search for memory in essentially the same way; they differ only in the direction, bottom-up versus top-down.
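For comparison, here is the bottom-up counterpart for a single free range. The article does not list the kernel's bottom-up routine, so this is only a sketch of the mirrored arithmetic; clamp_range() and candidate_bottom_up() are names invented for this example, and align is assumed to be a power of two.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Limit v to the closed interval [lo, hi]. */
static phys_addr_t clamp_range(phys_addr_t v, phys_addr_t lo, phys_addr_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Try to place size bytes at the bottom of the free range
 * [free_start, free_end), restricted to the window [start, end].
 * Returns the chosen base address, or 0 when the range cannot hold it.
 */
static phys_addr_t candidate_bottom_up(phys_addr_t free_start, phys_addr_t free_end,
                                       phys_addr_t start, phys_addr_t end,
                                       phys_addr_t size, phys_addr_t align)
{
    phys_addr_t this_start = clamp_range(free_start, start, end);
    phys_addr_t this_end = clamp_range(free_end, start, end);
    phys_addr_t cand = (this_start + align - 1) & ~(align - 1);  /* round up */

    if (cand < this_end && this_end - cand >= size)
        return cand;
    return 0;
}
```

Where a top-down search would pick an address near the top of a large free range, this version returns the first aligned address at its bottom: for a free range [0x1234, 0x10000), a window of [0x1000, 0x8000], a size of 0x1800, and an alignment of 0x1000, the result is 0x2000.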

Now that a suitable block has been found, return to the other key function called by memblock_alloc_base_nid(), memblock_reserve():

【file:/mm/memblock.c】
int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
{
    return memblock_reserve_region(base, size, MAX_NUMNODES, 0);
}

Next, memblock_reserve_region():

【file:/mm/memblock.c】
static int __init_memblock memblock_reserve_region(phys_addr_t base,
                           phys_addr_t size,
                           int nid,
                           unsigned long flags)
{
    struct memblock_type *_rgn = &memblock.reserved;
 
    memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
             (unsigned long long)base,
             (unsigned long long)base + size - 1,
             flags, (void *)_RET_IP_);
 
    return memblock_add_region(_rgn, base, size, nid, flags);
}

As shown, memblock_reserve_region() uses memblock_add_region() to add the block to memblock.reserved.

Finally, the implementation of memblock_free():

【file:/mm/memblock.c】
int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
{
    memblock_dbg(" memblock_free: [%#016llx-%#016llx] %pF\n",
             (unsigned long long)base,
             (unsigned long long)base + size - 1,
             (void *)_RET_IP_);
 
    return __memblock_remove(&memblock.reserved, base, size);
}

This function is a wrapper around __memblock_remove(), applied to memblock.reserved.

Next, __memblock_remove():

【file:/mm/memblock.c】
static int __init_memblock __memblock_remove(struct memblock_type *type,
                         phys_addr_t base, phys_addr_t size)
{
    int start_rgn, end_rgn;
    int i, ret;
 
    ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
    if (ret)
        return ret;
 
    for (i = end_rgn - 1; i >= start_rgn; i--)
        memblock_remove_region(type, i);
    return 0;
}

This function calls two key functions, memblock_isolate_range() and memblock_remove_region(). First, memblock_isolate_range():

【file:/mm/memblock.c】
/**
 * memblock_isolate_range - isolate given range into disjoint memblocks
 * @type: memblock type to isolate range for
 * @base: base of range to isolate
 * @size: size of range to isolate
 * @start_rgn: out parameter for the start of isolated region
 * @end_rgn: out parameter for the end of isolated region
 *
 * Walk @type and ensure that regions don't cross the boundaries defined by
 * [@base,@base+@size). Crossing regions are split at the boundaries,
 * which may create at most two more regions. The index of the first
 * region inside the range is returned in *@start_rgn and end in *@end_rgn.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */
static int __init_memblock memblock_isolate_range(struct memblock_type *type,
                    phys_addr_t base, phys_addr_t size,
                    int *start_rgn, int *end_rgn)
{
    phys_addr_t end = base + memblock_cap_size(base, &size);
    int i;
 
    *start_rgn = *end_rgn = 0;
 
    if (!size)
        return 0;
 
    /* we'll create at most two more regions */
    while (type->cnt + 2 > type->max)
        if (memblock_double_array(type, base, size) < 0)
            return -ENOMEM;
 
    for (i = 0; i < type->cnt; i++) {
        struct memblock_region *rgn = &type->regions[i];
        phys_addr_t rbase = rgn->base;
        phys_addr_t rend = rbase + rgn->size;
 
        if (rbase >= end)
            break;
        if (rend <= base)
            continue;
 
        if (rbase < base) {
            /*
             * @rgn intersects from below. Split and continue
             * to process the next region - the new top half.
             */
            rgn->base = base;
            rgn->size -= base - rbase;
            type->total_size -= base - rbase;
            memblock_insert_region(type, i, rbase, base - rbase,
                           memblock_get_region_node(rgn),
                           rgn->flags);
        } else if (rend > end) {
            /*
             * @rgn intersects from above. Split and redo the
             * current region - the new bottom half.
             */
            rgn->base = end;
            rgn->size -= end - rbase;
            type->total_size -= end - rbase;
            memblock_insert_region(type, i--, rbase, end - rbase,
                           memblock_get_region_node(rgn),
                           rgn->flags);
        } else {
            /* @rgn is fully contained, record it */
            if (!*end_rgn)
                *start_rgn = i;
            *end_rgn = i + 1;
        }
    }
 
    return 0;
}

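As shown, memblock_isolate_range() splits any region that straddles a boundary of the given range, then returns through its out parameters the indices of the regions lying inside that range: start_rgn is the index of the first such region and end_rgn is one past the last. Next, the implementation of memblock_remove_region():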

【file:/mm/memblock.c】
static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
{
    type->total_size -= type->regions[r].size;
    memmove(&type->regions[r], &type->regions[r + 1],
        (type->cnt - (r + 1)) * sizeof(type->regions[r]));
    type->cnt--;
 
    /* Special case for empty arrays */
    if (type->cnt == 0) {
        WARN_ON(type->total_size != 0);
        type->cnt = 1;
        type->regions[0].base = 0;
        type->regions[0].size = 0;
        type->regions[0].flags = 0;
        memblock_set_region_node(&type->regions[0], MAX_NUMNODES);
    }
}

Its job is to remove the entry at the given index from the region list it is passed, which on this call path is memblock.reserved.

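The two are easier to understand together. Inside __memblock_remove(), memblock_isolate_range() takes the range being freed and splits any memblock.reserved region that crosses its start or end, so that the range is covered by whole regions; the index range of those regions is returned in start_rgn and end_rgn. __memblock_remove() then calls memblock_remove_region() for each index from end_rgn - 1 down to start_rgn, removing those entries from memblock.reserved. That completes the free.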
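The combined effect of isolating and removing can be sketched on a small sorted list. This is a simplified model, not the kernel code: region_list, insert_at(), remove_at(), and list_remove() are hypothetical names, the array has a fixed capacity, flags and node ids are ignored, and the split and the removal are fused into one pass instead of being done by two separate functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t phys_addr_t;
#define MAX_REGIONS 16

struct region { phys_addr_t base, size; };
struct region_list { size_t cnt; struct region r[MAX_REGIONS]; };

static int insert_at(struct region_list *l, size_t idx,
                     phys_addr_t base, phys_addr_t size)
{
    if (l->cnt >= MAX_REGIONS)
        return -1;
    memmove(&l->r[idx + 1], &l->r[idx], (l->cnt - idx) * sizeof(l->r[0]));
    l->r[idx].base = base;
    l->r[idx].size = size;
    l->cnt++;
    return 0;
}

static void remove_at(struct region_list *l, size_t idx)
{
    memmove(&l->r[idx], &l->r[idx + 1],
            (l->cnt - (idx + 1)) * sizeof(l->r[0]));
    l->cnt--;
}

/* Remove [base, base + size) from a sorted, non-overlapping list. */
static int list_remove(struct region_list *l, phys_addr_t base, phys_addr_t size)
{
    phys_addr_t end = base + size;
    size_t i = 0;

    while (i < l->cnt) {
        phys_addr_t rbase = l->r[i].base;
        phys_addr_t rend = rbase + l->r[i].size;

        if (rbase >= end)
            break;                      /* entirely above the range */
        if (rend <= base) {
            i++;                        /* entirely below the range */
            continue;
        }
        if (rbase < base) {
            /* straddles the lower boundary: keep [rbase, base) as its own entry */
            if (insert_at(l, i, rbase, base - rbase))
                return -1;
            l->r[i + 1].base = base;
            l->r[i + 1].size = rend - base;
            i++;                        /* revisit the upper part next */
        } else if (rend > end) {
            /* straddles the upper boundary: keep only [end, rend) */
            l->r[i].base = end;
            l->r[i].size = rend - end;
            break;
        } else {
            remove_at(l, i);            /* fully inside: drop, do not advance */
        }
    }
    return 0;
}
```

Freeing [0x2000, 0x3000) out of a single entry [0x1000, 0x5000) therefore leaves two entries, [0x1000, 0x2000) and [0x3000, 0x5000).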

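A short summary: the memblock algorithm tracks usable, allocatable memory in memblock.memory and allocated memory in memblock.reserved. Once a block has been added to memblock.reserved, it counts as allocated and in use. One key point follows from this: an allocation only adds the allocated block to memblock.reserved and never deletes or changes anything in memblock.memory, which is why allocation and freeing both operate on memblock.reserved. The algorithm is not particularly efficient, but that is reasonable: the boot stage has few complicated memory scenarios, and in many places memory is allocated once and kept for good.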
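The point that allocation and freeing only ever touch the reserved list can be shown with a toy model. This is a sketch under loud assumptions, not the kernel algorithm: struct boot_model, model_alloc(), and model_free() are invented for this example, there is a single usable range, align must be a power of two, adjacent reservations are not merged, and a reservation is freed as a whole by its base address rather than by an arbitrary range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t phys_addr_t;
#define MAX_RESERVED 16

struct range { phys_addr_t base, end; };       /* half-open: [base, end) */

struct boot_model {
    struct range memory;                       /* usable range, never modified */
    size_t nres;
    struct range reserved[MAX_RESERVED];       /* sorted, disjoint, inside memory */
};

/* Top-down: walk the gaps between reservations, highest gap first. */
static phys_addr_t model_alloc(struct boot_model *m, phys_addr_t size,
                               phys_addr_t align)
{
    phys_addr_t gap_end = m->memory.end;
    size_t i = m->nres;

    if (!size || m->nres >= MAX_RESERVED)
        return 0;

    for (;;) {
        phys_addr_t gap_start = i ? m->reserved[i - 1].end : m->memory.base;

        if (gap_end - gap_start >= size) {
            phys_addr_t cand = (gap_end - size) & ~(align - 1);

            if (cand >= gap_start) {
                /* record the block in reserved only; memory stays as it is */
                memmove(&m->reserved[i + 1], &m->reserved[i],
                        (m->nres - i) * sizeof(m->reserved[0]));
                m->reserved[i].base = cand;
                m->reserved[i].end = cand + size;
                m->nres++;
                return cand;
            }
        }
        if (!i)
            return 0;                          /* no gap was large enough */
        gap_end = m->reserved[i - 1].base;
        i--;
    }
}

/* Free the reservation that starts at base; returns -1 if there is none. */
static int model_free(struct boot_model *m, phys_addr_t base)
{
    for (size_t i = 0; i < m->nres; i++) {
        if (m->reserved[i].base != base)
            continue;
        memmove(&m->reserved[i], &m->reserved[i + 1],
                (m->nres - (i + 1)) * sizeof(m->reserved[0]));
        m->nres--;
        return 0;
    }
    return -1;
}
```

With memory set to [0x1000, 0x10000), two allocations followed by a free and another allocation change only nres and the reserved array; memory.base and memory.end keep their initial values throughout.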

