In the previous article I explained how new works, but skipped over the part related to memory allocation.
In this article I will explain the internal implementation of the GC memory allocator in detail.
Before reading this article, please first read "Garbage Collection Design" from Microsoft's BOTR documentation.
The original is at: https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md
For a translation, see the one by 知平軟件 or my own later translation.
Please make sure you have read "Garbage Collection Design" first; otherwise you will most likely not be able to follow what comes next.
Server GC and Workstation GC
The difference between server GC and workstation GC is already covered by plenty of material online, so this article will not repeat it.
Let's look instead at how the code for server GC and workstation GC is kept apart.
By default, building CoreCLR compiles the same code twice, once for server GC and once for workstation GC, into the SVR and WKS namespaces respectively:
Source code: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcsvr.cpp
#define SERVER_GC 1
namespace SVR {
#include "gcimpl.h"
#include "gc.cpp"
}
Source code: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcwks.cpp
#ifdef SERVER_GC
#undef SERVER_GC
#endif
namespace WKS {
#include "gcimpl.h"
#include "gc.cpp"
}
When SERVER_GC is defined, MULTIPLE_HEAPS is defined along with it.
Defining MULTIPLE_HEAPS makes the GC use multiple heaps: under server GC each CPU core gets its own heap (by default), while workstation GC uses a single global heap.
Source code: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcimpl.h
#ifdef SERVER_GC
#define MULTIPLE_HEAPS 1
#endif // SERVER_GC
Background GC is compiled in regardless of whether it is server GC or workstation GC, but the runtime will not necessarily enable it.
Source code: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcpriv.h
#define BACKGROUND_GC //concurrent background GC (requires WRITE_WATCH)
The CoreCLR packages downloaded from https://www.microsoft.com/net already include support for server GC and background GC, but neither is enabled by default.
To enable them, edit the runtimeOptions section in project.json, for example:
{
    "runtimeOptions": {
        "configProperties": {
            "System.GC.Server": true,
            "System.GC.Concurrent": true
        }
    }
}
After setting this and publishing the project you will see coreapp.runtimeconfig.json; at run time only this file is read.
Microsoft's official documentation: https://docs.microsoft.com/en-us/dotnet/articles/core/tools/project-json
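For reference, the published runtimeconfig.json carries these settings in the same runtimeOptions/configProperties shape as project.json. A minimal sketch of what the generated file might contain (the exact file name depends on your project name; the values simply mirror the example above):

```json
{
    "runtimeOptions": {
        "configProperties": {
            "System.GC.Server": true,
            "System.GC.Concurrent": true
        }
    }
}
```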
The GC-related classes and their relationships
I will first use two diagrams to explain how the GC-related classes relate to each other under server GC and workstation GC


The diagrams contain five types in total:
- GCHeap
  - Implements the IGCHeap interface, exposing the GC layer's API to the EE (Execution Engine) layer
  - Under workstation GC there is a single instance, not associated with any gc_heap object, because under workstation GC all members of gc_heap are defined as static variables
  - Under server GC there are 1 + (number of CPU cores) instances (by default); the first instance serves only as the interface, and each of the remaining per-core instances is associated with one gc_heap instance
- gc_heap
  - The heap type used internally, responsible for allocating and collecting memory
  - Under workstation GC there are no instances; all members are defined as static variables
  - Under workstation GC the generation_table member is not defined; the global variable generation_table is used instead
  - Under server GC there are as many instances as CPU cores (by default), each associated with one GCHeap instance
- generation
  - Stores the information of each generation, such as its address range and the segments it uses
  - Stored in generation_table; one generation_table holds 5 generation entries: the first four are generations 0, 1, 2 and 3, and the last one is never initialized or used
  - Under workstation GC there is only one generation_table: the global variable generation_table
  - Under server GC generation_table is a member of gc_heap, so there are as many generation_tables as there are gc_heaps
- heap_segment
  - A heap segment: a stretch of memory for the allocator to use, kept in a linked list
  - Each gc_heap has one or more segments
  - Each gc_heap has an ephemeral heap segment (holding the youngest objects)
  - Each gc_heap has a large heap segment (holding large objects)
  - Under workstation GC the default segment size is 256M (0x10000000 bytes)
  - Under server GC the default segment size is 4G (0x100000000 bytes)
- alloc_context
  - An allocation context: points at a range inside a segment and is what objects are actually allocated from
  - Each thread has its own allocation context; because the ranges they point at do not overlap, allocating an object needs no thread lock as long as the current range still has enough space
  - The default range of an allocation context is 8K, also called the allocation quantum
  - Small objects are allocated out of this 8K; large objects are allocated directly from a segment
  - Generation 0 (gen 0) also has a default allocation context for internal use, not tied to any thread
Excerpt from GCHeap's source:
The definition of GCHeap: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcimpl.h#L61
Here I only list the members this article touches.
// WKS::GCHeap or SVR::GCHeap inherits GCHeap from the global namespace
class GCHeap : public ::GCHeap
{
#ifdef MULTIPLE_HEAPS
// Under server GC each GCHeap instance is associated with one gc_heap instance, and vice versa
gc_heap* pGenGCHeap;
#else
// Under workstation GC all of gc_heap's fields and functions are static, so they can be accessed as ((gc_heap*)nullptr)->xxx
// Strictly speaking this is UB (undefined behavior), but it works in practice
#define pGenGCHeap ((gc_heap*)0)
#endif //MULTIPLE_HEAPS
};
The global GCHeap instance: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gc.h#L105
This is the 1.1.0 code; in 1.2.0 the global GCHeap is stored in both gcheaputilities.h (g_pGCHeap) and gc.cpp (g_theGCHeap), and both point to the same instance.
// equivalent to: extern GCHeap* g_pGCHeap;
GPTR_DECL(GCHeap, g_pGCHeap);
Excerpt from gc_heap's source:
The definition of gc_heap: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcpriv.h#L1079
This class has more than 300 members (starting from ephemeral_low); here I only list the members this article touches.
class gc_heap
{
#ifdef MULTIPLE_HEAPS
// The associated GCHeap instance
PER_HEAP GCHeap* vm_heap;
// Heap index
PER_HEAP int heap_number;
// Number of times a memory range has been assigned to an allocation context
PER_HEAP VOLATILE(int) alloc_context_count;
#else //MULTIPLE_HEAPS
// Under workstation GC this maps to the global GCHeap instance
#define vm_heap ((GCHeap*) g_pGCHeap)
// Under workstation GC the heap index is 0
#define heap_number (0)
#endif //MULTIPLE_HEAPS
#ifndef MULTIPLE_HEAPS
// The ephemeral heap segment currently in use (the segment new objects are allocated from)
SPTR_DECL(heap_segment,ephemeral_heap_segment);
#else
// Same as above
PER_HEAP heap_segment* ephemeral_heap_segment;
#endif // !MULTIPLE_HEAPS
// Global GC thread lock, a static variable
PER_HEAP_ISOLATED GCSpinLock gc_lock; //lock while doing GC
// Lock taken when an allocation context is exhausted and needs a new range assigned
PER_HEAP GCSpinLock more_space_lock; //lock while allocating more space
#ifdef MULTIPLE_HEAPS
// Stores the information of each generation
// NUMBERGENERATIONS+1 = 5: generations 0, 1, 2 and 3, the last element is never used
// Not defined under workstation GC, which uses the global variable generation_table instead
PER_HEAP generation generation_table [NUMBERGENERATIONS+1];
#endif
#ifdef MULTIPLE_HEAPS
// Number of global gc_heaps, a static variable
// Defaults to the number of CPU cores under server GC; 0 under workstation GC
SVAL_DECL(int, n_heaps);
// Global array of gc_heaps, a static variable
SPTR_DECL(PTR_gc_heap, g_heaps);
#endif
};
Excerpt from generation's source:
The definition of generation: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcpriv.h#L754
Here I only list the members this article touches.
class generation
{
public:
// The default allocation context
alloc_context allocation_context;
// The newest heap segment used for allocation
heap_segment* allocation_segment;
// The first heap segment
PTR_heap_segment start_segment;
// Pointer marking where this generation starts: objects after it belong to this generation or a younger one
uint8_t* allocation_start;
// Allocator that stores and hands out free objects (Free Object, also called Unused Array; think of them as fragmented space)
allocator free_list_allocator;
// Which generation this is
int gen_num;
};
Excerpt from heap_segment's source:
The definition of heap_segment: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcpriv.h#L4166
Here I only list the members this article touches.
class heap_segment
{
public:
// Address actually allocated up to (mem + allocated size)
// Updates to it may be delayed
uint8_t* allocated;
// Address committed to physical memory up to (this + SEGMENT_INITIAL_COMMIT)
uint8_t* committed;
// End of the reserved allocation range (this + size)
uint8_t* reserved;
// Address used up to (mem + allocated size - object header size)
uint8_t* used;
// Initial allocation address (with server GC enabled: this + OS_PAGE_SIZE, otherwise: this + sizeof(*this) + alignment)
uint8_t* mem;
// Next heap segment
PTR_heap_segment next;
// The gc_heap instance this segment belongs to
gc_heap* heap;
};
Excerpt from alloc_context's source:
The definition of alloc_context: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gc.h#L162
This is the 1.1.0 code; in 1.2.0 these members moved to gc_alloc_context in gcinterface.h, but the members are the same.
struct alloc_context
{
// Start address of the next object allocation
uint8_t* alloc_ptr;
// Last address allocation may reach
uint8_t* alloc_limit;
// Running total of small object bytes allocated
int64_t alloc_bytes; //Number of bytes allocated on SOH by this context
// Running total of large object bytes allocated
int64_t alloc_bytes_loh; //Number of bytes allocated on LOH by this context
#if defined(FEATURE_SVR_GC)
// GCHeap used when the current space is not enough and more must be acquired
// alloc_heap and home_heap are kept separate to balance usage across heaps, reducing the per-heap time differences during parallel collection
SVR::GCHeap* alloc_heap;
// The original GCHeap
SVR::GCHeap* home_heap;
#endif // defined(FEATURE_SVR_GC)
// Running count of object allocations
int alloc_count;
};
The physical layout of a heap segment
To better understand the code explained below, please look at these two pictures first


The code path for allocating object memory
Remember the AllocateObject function I mentioned in the previous article? It is called by JIT_New and is responsible for allocating an ordinary object.
Let's keep following what happens inside it:
The AllocateObject function: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/gchelpers.cpp#L931
The other versions of AllocateObject also call AllocAlign8 or Alloc, so their code is not quoted below.
OBJECTREF AllocateObject(MethodTable *pMT
#ifdef FEATURE_COMINTEROP
, bool fHandleCom
#endif
)
{
// Some code omitted......
Object *orObject = NULL;
// Call the GC helper to allocate memory: AllocAlign8 if 8-byte alignment is required, otherwise Alloc
if (pMT->RequiresAlign8())
{
// Some code omitted......
orObject = (Object *) AllocAlign8(baseSize,
pMT->HasFinalizer(),
pMT->ContainsPointers(),
pMT->IsValueType());
}
else
{
orObject = (Object *) Alloc(baseSize,
pMT->HasFinalizer(),
pMT->ContainsPointers());
}
// Some code omitted......
return UNCHECKED_OBJECTREF_TO_OBJECTREF(oref);
}
The Alloc function: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/gchelpers.cpp#L931
inline Object* Alloc(size_t size, BOOL bFinalize, BOOL bContainsPointers )
{
// Some code omitted......
// If allocation contexts are enabled, allocate from the current thread's allocation context
// Otherwise allocate from the default allocation context of the generation
// According to the official sources, allocation contexts are enabled in the vast majority of cases
// On the machine I tested, UseAllocationContexts returns true without doing any checks
if (GCHeap::UseAllocationContexts())
retVal = GCHeap::GetGCHeap()->Alloc(GetThreadAllocContext(), size, flags);
else
retVal = GCHeap::GetGCHeap()->Alloc(size, flags);
// Some code omitted......
return retVal;
}
The GetGCHeap function: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gc.h#L377
static GCHeap *GetGCHeap()
{
LIMITED_METHOD_CONTRACT;
// Return the global GCHeap instance
// Note this instance is used only as the interface; it is not associated with a specific gc_heap instance
_ASSERTE(g_pGCHeap != NULL);
return g_pGCHeap;
}
The GetThreadAllocContext function: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/gchelpers.cpp#L54
inline alloc_context* GetThreadAllocContext()
{
WRAPPER_NO_CONTRACT;
assert(GCHeap::UseAllocationContexts());
// Get the current thread and return the address of its m_alloc_context member
return & GetThread()->m_alloc_context;
}
The GCHeap::Alloc function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
Object*
GCHeap::Alloc(alloc_context* acontext, size_t size, uint32_t flags REQD_ALIGN_DCL)
{
// Some code omitted......
Object* newAlloc = NULL;
// If this allocation context is being used for the first time, associate it with a GCHeap instance via AssignHeap
#ifdef MULTIPLE_HEAPS
if (acontext->alloc_heap == 0)
{
AssignHeap (acontext);
assert (acontext->alloc_heap);
}
#endif //MULTIPLE_HEAPS
// Trigger a GC when necessary
#ifndef FEATURE_REDHAWK
GCStress<gc_on_alloc>::MaybeTrigger(acontext);
#endif // FEATURE_REDHAWK
// Server GC uses the gc_heap associated with the GCHeap; workstation GC uses nullptr
#ifdef MULTIPLE_HEAPS
gc_heap* hp = acontext->alloc_heap->pGenGCHeap;
#else
gc_heap* hp = pGenGCHeap;
// Some code omitted......
#endif //MULTIPLE_HEAPS
// Small objects are allocated with the allocate function, large objects with allocate_large_object
if (size < LARGE_OBJECT_SIZE)
{
#ifdef TRACE_GC
AllocSmallCount++;
#endif //TRACE_GC
// Allocate memory for a small object
newAlloc = (Object*) hp->allocate (size + ComputeMaxStructAlignPad(requiredAlignment), acontext);
#ifdef FEATURE_STRUCTALIGN
// Align the pointer
newAlloc = (Object*) hp->pad_for_alignment ((uint8_t*) newAlloc, requiredAlignment, size, acontext);
#endif // FEATURE_STRUCTALIGN
// ASSERT (newAlloc);
}
else
{
// Allocate memory for a large object
newAlloc = (Object*) hp->allocate_large_object (size + ComputeMaxStructAlignPadLarge(requiredAlignment), acontext->alloc_bytes_loh);
#ifdef FEATURE_STRUCTALIGN
// Align the pointer
newAlloc = (Object*) hp->pad_for_alignment_large ((uint8_t*) newAlloc, requiredAlignment, size);
#endif // FEATURE_STRUCTALIGN
}
// Some code omitted......
return newAlloc;
}
The code path for allocating small objects
Let's look at how memory for small objects is allocated.
The allocate function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function tries to allocate from the allocation context, and on failure calls allocate_more_space to assign the context a new range.
The first half of this also exists in an assembly version; see the JIT_TrialAllocSFastMP_InlineGetThread function analyzed in the previous article.
inline
CObjectHeader* gc_heap::allocate (size_t jsize, alloc_context* acontext)
{
size_t size = Align (jsize);
assert (size >= Align (min_obj_size));
{
retry:
// Try to allocate the object at alloc_ptr
uint8_t* result = acontext->alloc_ptr;
acontext->alloc_ptr+=size;
// If alloc_ptr + object size > alloc_limit, this allocation context is either being used for the first time or its remaining space is not enough
if (acontext->alloc_ptr <= acontext->alloc_limit)
{
// Allocation succeeded; the address returned is alloc_ptr from before the +=size
CObjectHeader* obj = (CObjectHeader*)result;
assert (obj != 0);
return obj;
}
else
{
// Allocation failed; subtract size back off
acontext->alloc_ptr -= size;
#ifdef _MSC_VER
#pragma inline_depth(0)
#endif //_MSC_VER
// Try to assign a new range to the allocation context
if (! allocate_more_space (acontext, size, 0))
return 0;
#ifdef _MSC_VER
#pragma inline_depth(20)
#endif //_MSC_VER
// Retry
goto retry;
}
}
}
The allocate_more_space function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
With multiple heaps, this function calls balance_heaps to balance usage across the heaps, then calls try_allocate_more_space.
BOOL gc_heap::allocate_more_space(alloc_context* acontext, size_t size,
int alloc_generation_number)
{
int status;
do
{
// With multiple heaps, balance their usage first to reduce the per-heap time differences during parallel collection
#ifdef MULTIPLE_HEAPS
if (alloc_generation_number == 0)
{
// Balance usage across the heaps
balance_heaps (acontext);
// Call try_allocate_more_space
status = acontext->alloc_heap->pGenGCHeap->try_allocate_more_space (acontext, size, alloc_generation_number);
}
else
{
// Balance usage across the heaps (large objects)
gc_heap* alloc_heap = balance_heaps_loh (acontext, size);
// Call try_allocate_more_space
status = alloc_heap->try_allocate_more_space (acontext, size, alloc_generation_number);
}
#else
// With a single heap, call try_allocate_more_space directly
status = try_allocate_more_space (acontext, size, alloc_generation_number);
#endif //MULTIPLE_HEAPS
}
while (status == -1);
return (status != 0);
}
The try_allocate_more_space function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function acquires the MSL lock, checks whether a GC needs to be triggered, and then calls allocate_small or allocate_large depending on the gen_number parameter.
int gc_heap::try_allocate_more_space (alloc_context* acontext, size_t size,
int gen_number)
{
// If a GC has already started, wait for it to finish and retry
// (the allocate function will reach retry and call this function again)
if (gc_heap::gc_started)
{
wait_for_gc_done();
return -1;
}
// Acquire the more_space_lock
// and record whether acquiring it took a long or a short time
#ifdef SYNCHRONIZATION_STATS
unsigned int msl_acquire_start = GetCycleCount32();
#endif //SYNCHRONIZATION_STATS
enter_spin_lock (&more_space_lock);
add_saved_spinlock_info (me_acquire, mt_try_alloc);
dprintf (SPINLOCK_LOG, ("[%d]Emsl for alloc", heap_number));
#ifdef SYNCHRONIZATION_STATS
unsigned int msl_acquire = GetCycleCount32() - msl_acquire_start;
total_msl_acquire += msl_acquire;
num_msl_acquired++;
if (msl_acquire > 200)
{
num_high_msl_acquire++;
}
else
{
num_low_msl_acquire++;
}
#endif //SYNCHRONIZATION_STATS
// This part of the code is commented out
// because holding the msl (more space lock) already prevents the problem
/*
// We are commenting this out 'cause we don't see the point - we already
// have checked gc_started when we were acquiring the msl - no need to check
// again. This complicates the logic in bgc_suspend_EE 'cause that one would
// need to release msl which causes all sorts of trouble.
if (gc_heap::gc_started)
{
#ifdef SYNCHRONIZATION_STATS
good_suspension++;
#endif //SYNCHRONIZATION_STATS
BOOL fStress = (g_pConfig->GetGCStressLevel() & EEConfig::GCSTRESS_TRANSITION) != 0;
if (!fStress)
{
//Rendez vous early (MP scaling issue)
//dprintf (1, ("[%d]waiting for gc", heap_number));
wait_for_gc_done();
#ifdef MULTIPLE_HEAPS
return -1;
#endif //MULTIPLE_HEAPS
}
}
*/
dprintf (3, ("requested to allocate %d bytes on gen%d", size, gen_number));
// Get the alignment constant
// 3 (0b11) or 7 (0b111) for small objects, 7 (0b111) for large objects
int align_const = get_alignment_constant (gen_number != (max_generation+1));
// Trigger a GC when necessary
if (fgn_maxgen_percent)
{
check_for_full_gc (gen_number, size);
}
// Check again and trigger a GC when necessary
if (!(new_allocation_allowed (gen_number)))
{
if (fgn_maxgen_percent && (gen_number == 0))
{
// We only check gen0 every so often, so take this opportunity to check again.
check_for_full_gc (gen_number, size);
}
// If background GC is running and physical memory usage is above 95%, wait for the background GC to finish
#ifdef BACKGROUND_GC
wait_for_bgc_high_memory (awr_gen0_alloc);
#endif //BACKGROUND_GC
#ifdef SYNCHRONIZATION_STATS
bad_suspension++;
#endif //SYNCHRONIZATION_STATS
dprintf (/*100*/ 2, ("running out of budget on gen%d, gc", gen_number));
// Trigger a GC in place when necessary
if (!settings.concurrent || (gen_number == 0))
{
vm_heap->GarbageCollectGeneration (0, ((gen_number == 0) ? reason_alloc_soh : reason_alloc_loh));
#ifdef MULTIPLE_HEAPS
// The MSL lock was released after triggering the GC and must be re-acquired
enter_spin_lock (&more_space_lock);
add_saved_spinlock_info (me_acquire, mt_try_budget);
dprintf (SPINLOCK_LOG, ("[%d]Emsl out budget", heap_number));
#endif //MULTIPLE_HEAPS
}
}
// Call a different function depending on the generation; it will assign a new range to the allocation context
// The gen_number parameter can only be 0 or 3
BOOL can_allocate = ((gen_number == 0) ?
allocate_small (gen_number, size, acontext, align_const) :
allocate_large (gen_number, size, acontext, align_const));
// On success, check whether an ETW (Event Tracing for Windows) event should be fired
if (can_allocate)
{
// Record how many bytes were given to the allocation context
//ETW trace for allocation tick
size_t alloc_context_bytes = acontext->alloc_limit + Align (min_obj_size, align_const) - acontext->alloc_ptr;
int etw_allocation_index = ((gen_number == 0) ? 0 : 1);
etw_allocation_running_amount[etw_allocation_index] += alloc_context_bytes;
// Fire the ETW event once the amount exceeds a threshold
if (etw_allocation_running_amount[etw_allocation_index] > etw_allocation_tick)
{
#ifdef FEATURE_REDHAWK
FireEtwGCAllocationTick_V1((uint32_t)etw_allocation_running_amount[etw_allocation_index],
((gen_number == 0) ? ETW::GCLog::ETW_GC_INFO::AllocationSmall : ETW::GCLog::ETW_GC_INFO::AllocationLarge),
GetClrInstanceId());
#else
// Unfortunately some of the ETW macros do not check whether the ETW feature is enabled.
// The ones that do are much less efficient.
#if defined(FEATURE_EVENT_TRACE)
if (EventEnabledGCAllocationTick_V2())
{
fire_etw_allocation_event (etw_allocation_running_amount[etw_allocation_index], gen_number, acontext->alloc_ptr);
}
#endif //FEATURE_EVENT_TRACE
#endif //FEATURE_REDHAWK
// Reset the running amount
etw_allocation_running_amount[etw_allocation_index] = 0;
}
}
return (int)can_allocate;
}
The allocate_small function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
It loops, trying various ways of reclaiming memory and calling soh_try_fit; the loop exits when soh_try_fit succeeds or all means are exhausted.
BOOL gc_heap::allocate_small (int gen_number,
size_t size,
alloc_context* acontext,
int align_const)
{
// Under workstation GC with background GC running, sleep once every 140 calls (bgc_alloc_spin_count), for 2ms (bgc_alloc_spin)
#if defined (BACKGROUND_GC) && !defined (MULTIPLE_HEAPS)
if (recursive_gc_sync::background_running_p())
{
background_soh_alloc_count++;
if ((background_soh_alloc_count % bgc_alloc_spin_count) == 0)
{
Thread* current_thread = GetThread();
add_saved_spinlock_info (me_release, mt_alloc_small);
dprintf (SPINLOCK_LOG, ("[%d]spin Lmsl", heap_number));
leave_spin_lock (&more_space_lock);
BOOL cooperative_mode = enable_preemptive (current_thread);
GCToOSInterface::Sleep (bgc_alloc_spin);
disable_preemptive (current_thread, cooperative_mode);
enter_spin_lock (&more_space_lock);
add_saved_spinlock_info (me_acquire, mt_alloc_small);
dprintf (SPINLOCK_LOG, ("[%d]spin Emsl", heap_number));
}
else
{
//GCToOSInterface::YieldThread (0);
}
}
#endif //BACKGROUND_GC && !MULTIPLE_HEAPS
gc_reason gr = reason_oos_soh;
oom_reason oom_r = oom_no_failure;
// No variable values should be "carried over" from one state to the other.
// That's why there are local variable for each state
allocation_state soh_alloc_state = a_state_start;
// Start the state-machine loop; watch soh_alloc_state
// If we can get a new seg it means allocation will succeed.
while (1)
{
dprintf (3, ("[h%d]soh state is %s", heap_number, allocation_state_str[soh_alloc_state]));
switch (soh_alloc_state)
{
// Exit the loop on success or failure
case a_state_can_allocate:
case a_state_cant_allocate:
{
goto exit;
}
// At the start, switch the state to a_state_try_fit
case a_state_start:
{
soh_alloc_state = a_state_try_fit;
break;
}
// Call the soh_try_fit function
// On success switch the state to a_state_can_allocate
// On failure switch the state to a_state_trigger_full_compact_gc or a_state_trigger_ephemeral_gc
case a_state_try_fit:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
can_use_existing_p = soh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p,
NULL);
soh_alloc_state = (can_use_existing_p ?
a_state_can_allocate :
(commit_failed_p ?
a_state_trigger_full_compact_gc :
a_state_trigger_ephemeral_gc));
break;
}
// Call soh_try_fit after the background GC has finished
// On success switch the state to a_state_can_allocate
// On failure switch the state to a_state_trigger_2nd_ephemeral_gc or a_state_trigger_full_compact_gc
case a_state_try_fit_after_bgc:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
BOOL short_seg_end_p = FALSE;
can_use_existing_p = soh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p,
&short_seg_end_p);
soh_alloc_state = (can_use_existing_p ?
a_state_can_allocate :
(short_seg_end_p ?
a_state_trigger_2nd_ephemeral_gc :
a_state_trigger_full_compact_gc));
break;
}
// Call soh_try_fit after a compacting GC has finished
// If allocation still fails after compaction, switch the state to a_state_cant_allocate
// On success switch the state to a_state_can_allocate
case a_state_try_fit_after_cg:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
BOOL short_seg_end_p = FALSE;
can_use_existing_p = soh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p,
&short_seg_end_p);
if (short_seg_end_p)
{
soh_alloc_state = a_state_cant_allocate;
oom_r = oom_budget;
}
else
{
if (can_use_existing_p)
{
soh_alloc_state = a_state_can_allocate;
}
else
{
#ifdef MULTIPLE_HEAPS
if (!commit_failed_p)
{
// some other threads already grabbed the more space lock and allocated
// so we should attemp an ephemeral GC again.
assert (heap_segment_allocated (ephemeral_heap_segment) < alloc_allocated);
soh_alloc_state = a_state_trigger_ephemeral_gc;
}
else
#endif //MULTIPLE_HEAPS
{
assert (commit_failed_p);
soh_alloc_state = a_state_cant_allocate;
oom_r = oom_cant_commit;
}
}
}
break;
}
// Wait for the background GC to finish
// If it performed compaction, switch the state to a_state_try_fit_after_cg
// Otherwise switch the state to a_state_try_fit_after_bgc
case a_state_check_and_wait_for_bgc:
{
BOOL bgc_in_progress_p = FALSE;
BOOL did_full_compacting_gc = FALSE;
bgc_in_progress_p = check_and_wait_for_bgc (awr_gen0_oos_bgc, &did_full_compacting_gc);
soh_alloc_state = (did_full_compacting_gc ?
a_state_try_fit_after_cg :
a_state_try_fit_after_bgc);
break;
}
// Trigger a GC of generations 0 and 1
// If it compacted, switch the state to a_state_try_fit_after_cg
// Otherwise retry soh_try_fit: on success switch to a_state_can_allocate, on failure wait for the background GC or trigger another GC
case a_state_trigger_ephemeral_gc:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
BOOL short_seg_end_p = FALSE;
BOOL bgc_in_progress_p = FALSE;
BOOL did_full_compacting_gc = FALSE;
did_full_compacting_gc = trigger_ephemeral_gc (gr);
if (did_full_compacting_gc)
{
soh_alloc_state = a_state_try_fit_after_cg;
}
else
{
can_use_existing_p = soh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p,
&short_seg_end_p);
#ifdef BACKGROUND_GC
bgc_in_progress_p = recursive_gc_sync::background_running_p();
#endif //BACKGROUND_GC
if (short_seg_end_p)
{
soh_alloc_state = (bgc_in_progress_p ?
a_state_check_and_wait_for_bgc :
a_state_trigger_full_compact_gc);
if (fgn_maxgen_percent)
{
dprintf (2, ("FGN: doing last GC before we throw OOM"));
send_full_gc_notification (max_generation, FALSE);
}
}
else
{
if (can_use_existing_p)
{
soh_alloc_state = a_state_can_allocate;
}
else
{
#ifdef MULTIPLE_HEAPS
if (!commit_failed_p)
{
// some other threads already grabbed the more space lock and allocated
// so we should attemp an ephemeral GC again.
assert (heap_segment_allocated (ephemeral_heap_segment) < alloc_allocated);
soh_alloc_state = a_state_trigger_ephemeral_gc;
}
else
#endif //MULTIPLE_HEAPS
{
soh_alloc_state = a_state_trigger_full_compact_gc;
if (fgn_maxgen_percent)
{
dprintf (2, ("FGN: failed to commit, doing full compacting GC"));
send_full_gc_notification (max_generation, FALSE);
}
}
}
}
}
break;
}
// Trigger a GC of generations 0 and 1 a second time
// If it compacted, switch the state to a_state_try_fit_after_cg
// Otherwise retry soh_try_fit: on success switch to a_state_can_allocate, on failure switch to a_state_trigger_full_compact_gc
case a_state_trigger_2nd_ephemeral_gc:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
BOOL short_seg_end_p = FALSE;
BOOL did_full_compacting_gc = FALSE;
did_full_compacting_gc = trigger_ephemeral_gc (gr);
if (did_full_compacting_gc)
{
soh_alloc_state = a_state_try_fit_after_cg;
}
else
{
can_use_existing_p = soh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p,
&short_seg_end_p);
if (short_seg_end_p || commit_failed_p)
{
soh_alloc_state = a_state_trigger_full_compact_gc;
}
else
{
assert (can_use_existing_p);
soh_alloc_state = a_state_can_allocate;
}
}
break;
}
// Trigger a compacting GC of generations 0, 1 and 2
// On success switch the state to a_state_try_fit_after_cg, on failure switch to a_state_cant_allocate
case a_state_trigger_full_compact_gc:
{
BOOL got_full_compacting_gc = FALSE;
got_full_compacting_gc = trigger_full_compact_gc (gr, &oom_r);
soh_alloc_state = (got_full_compacting_gc ? a_state_try_fit_after_cg : a_state_cant_allocate);
break;
}
default:
{
assert (!"Invalid state!");
break;
}
}
}
exit:
// On allocation failure, handle OOM (Out Of Memory)
if (soh_alloc_state == a_state_cant_allocate)
{
assert (oom_r != oom_no_failure);
handle_oom (heap_number,
oom_r,
size,
heap_segment_allocated (ephemeral_heap_segment),
heap_segment_reserved (ephemeral_heap_segment));
dprintf (SPINLOCK_LOG, ("[%d]Lmsl for oom", heap_number));
add_saved_spinlock_info (me_release, mt_alloc_small_cant);
leave_spin_lock (&more_space_lock);
}
return (soh_alloc_state == a_state_can_allocate);
}
The soh_try_fit function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function first tries a_fit_free_list_p to allocate from the free object list, then a_fit_segment_end_p to allocate from the end of the heap segment.
BOOL gc_heap::soh_try_fit (int gen_number,
size_t size,
alloc_context* acontext,
int align_const,
BOOL* commit_failed_p, // Out parameter: whether committing virtual memory to physical memory failed (out of physical memory)
BOOL* short_seg_end_p) // Out parameter: whether the end of the heap segment is too short
{
BOOL can_allocate = TRUE;
// If short_seg_end_p was passed in, set it to false first
if (short_seg_end_p)
{
*short_seg_end_p = FALSE;
}
// First try to allocate from the free object list
can_allocate = a_fit_free_list_p (gen_number, size, acontext, align_const);
if (!can_allocate)
{
// Can't allocate from the free object list; try the end of the heap segment
// Check whether there is enough space at the end of ephemeral_heap_segment
if (short_seg_end_p)
{
*short_seg_end_p = short_on_end_of_seg (gen_number, ephemeral_heap_segment, align_const);
}
// If there is enough space, or the caller did not pass short_seg_end_p (passed nullptr), call a_fit_segment_end_p
// If the caller doesn't care, we always try to fit at the end of seg;
// otherwise we would only try if we are actually not short at end of seg.
if (!short_seg_end_p || !(*short_seg_end_p))
{
can_allocate = a_fit_segment_end_p (gen_number, ephemeral_heap_segment, size,
acontext, align_const, commit_failed_p);
}
}
return can_allocate;
}
The a_fit_free_list_p function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function tries to find a large-enough space in the free object list; if one is found, it points the allocation context at that space.
inline
BOOL gc_heap::a_fit_free_list_p (int gen_number,
size_t size,
alloc_context* acontext,
int align_const)
{
BOOL can_fit = FALSE;
// Get the free object list of the given generation
generation* gen = generation_of (gen_number);
allocator* gen_allocator = generation_allocator (gen);
// The list is split into buckets by size (each bucket is a linked list)
// Bucket sizes double: if first_bucket's size is 256, the second bucket's size is 512
size_t sz_list = gen_allocator->first_bucket_size();
for (unsigned int a_l_idx = 0; a_l_idx < gen_allocator->number_of_buckets(); a_l_idx++)
{
if ((size < sz_list) || (a_l_idx == (gen_allocator->number_of_buckets()-1)))
{
uint8_t* free_list = gen_allocator->alloc_list_head_of (a_l_idx);
uint8_t* prev_free_item = 0;
while (free_list != 0)
{
dprintf (3, ("considering free list %Ix", (size_t)free_list));
size_t free_list_size = unused_array_size (free_list);
if ((size + Align (min_obj_size, align_const)) <= free_list_size)
{
dprintf (3, ("Found adequate unused area: [%Ix, size: %Id",
(size_t)free_list, free_list_size));
// When the size fits, pop it off this bucket's linked list
gen_allocator->unlink_item (a_l_idx, free_list, prev_free_item, FALSE);
// We ask for more Align (min_obj_size)
// to make sure that we can insert a free object
// in adjust_limit will set the limit lower
size_t limit = limit_from_size (size, free_list_size, gen_number, align_const);
uint8_t* remain = (free_list + limit);
size_t remain_size = (free_list_size - limit);
// If space remains after the allocation, create a free object in the remaining space and push it back onto the free object list
if (remain_size >= Align(min_free_list, align_const))
{
make_unused_array (remain, remain_size);
gen_allocator->thread_item_front (remain, remain_size);
assert (remain_size >= Align (min_obj_size, align_const));
}
else
{
//absorb the entire free list
limit += remain_size;
}
generation_free_list_space (gen) -= limit;
// Assign a new range to the allocation context
adjust_limit_clr (free_list, limit, acontext, 0, align_const, gen_number);
// Allocation succeeded, exit the loop
can_fit = TRUE;
goto end;
}
else if (gen_allocator->discard_if_no_fit_p())
{
assert (prev_free_item == 0);
dprintf (3, ("couldn't use this free area, discarding"));
generation_free_obj_space (gen) += free_list_size;
gen_allocator->unlink_item (a_l_idx, free_list, prev_free_item, FALSE);
generation_free_list_space (gen) -= free_list_size;
}
else
{
prev_free_item = free_list;
}
// Next free object in the same bucket
free_list = free_list_slot (free_list);
}
}
// The current bucket is too small; the next bucket is twice its size
sz_list = sz_list * 2;
}
end:
return can_fit;
}
The a_fit_segment_end_p function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function tries to find a large-enough space at the end of a heap segment; if one is found, it points the allocation context at that space.
BOOL gc_heap::a_fit_segment_end_p (int gen_number,
heap_segment* seg,
size_t size,
alloc_context* acontext,
int align_const,
BOOL* commit_failed_p)
{
*commit_failed_p = FALSE;
size_t limit = 0;
#ifdef BACKGROUND_GC
int cookie = -1;
#endif //BACKGROUND_GC
// The address allocation starts from
uint8_t*& allocated = ((gen_number == 0) ?
alloc_allocated :
heap_segment_allocated(seg));
size_t pad = Align (min_obj_size, align_const);
#ifdef FEATURE_LOH_COMPACTION
if (gen_number == (max_generation + 1))
{
pad += Align (loh_padding_obj_size, align_const);
}
#endif //FEATURE_LOH_COMPACTION
// Highest address that can be allocated = address committed to physical memory - alignment size
uint8_t* end = heap_segment_committed (seg) - pad;
// If there is enough space, jump to found_fit
if (a_size_fit_p (size, allocated, end, align_const))
{
limit = limit_from_size (size,
(end - allocated),
gen_number, align_const);
goto found_fit;
}
// The committed address is not enough; new memory must be committed
// Highest address that can be allocated = end of the segment's reserved range - alignment size
end = heap_segment_reserved (seg) - pad;
// If there is enough space, call grow_heap_segment
// If grow_heap_segment succeeds, jump to found_fit; otherwise set *commit_failed_p to true
if (a_size_fit_p (size, allocated, end, align_const))
{
limit = limit_from_size (size,
(end - allocated),
gen_number, align_const);
if (grow_heap_segment (seg, allocated + limit))
{
goto found_fit;
}
else
{
dprintf (2, ("can't grow segment, doing a full gc"));
*commit_failed_p = TRUE;
}
}
goto found_no_fit;
found_fit:
// If background GC is enabled and a large object is being allocated, check whether the background GC is marking objects
#ifdef BACKGROUND_GC
if (gen_number != 0)
{
cookie = bgc_alloc_lock->loh_alloc_set (allocated);
}
#endif //BACKGROUND_GC
uint8_t* old_alloc;
old_alloc = allocated;
// For generation 3 (large objects), add a free object in the alignment space
#ifdef FEATURE_LOH_COMPACTION
if (gen_number == (max_generation + 1))
{
size_t loh_pad = Align (loh_padding_obj_size, align_const);
make_unused_array (old_alloc, loh_pad);
old_alloc += loh_pad;
allocated += loh_pad;
limit -= loh_pad;
}
#endif //FEATURE_LOH_COMPACTION
// Clear the SyncBlock
// Normally unnecessary, because the previous object already zeroed and reserved the space
#if defined (VERIFY_HEAP) && defined (_DEBUG)
((void**) allocated)[-1] = 0; //clear the sync block
#endif //VERIFY_HEAP && _DEBUG
// Advance the address allocation starts from; the next allocation will begin here
// Note this is a reference, not a local variable
allocated += limit;
dprintf (3, ("found fit at end of seg: %Ix", old_alloc));
#ifdef BACKGROUND_GC
if (cookie != -1)
{
// If the background GC is marking objects, bgc_loh_alloc_clr must be called to assign the new range to the allocation context
// That function will be explained in the next section (the code path for allocating large objects)
bgc_loh_alloc_clr (old_alloc, limit, acontext, align_const, cookie, TRUE, seg);
}
else
#endif //BACKGROUND_GC
{
// Assign a new range to the allocation context
adjust_limit_clr (old_alloc, limit, acontext, seg, align_const, gen_number);
}
return TRUE;
found_no_fit:
return FALSE;
}
The adjust_limit_clr function: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function assigns a new range to the allocation context.
It is called whether the allocation came from the free list or from the end of a segment; when allocating from the free list the seg parameter is nullptr.
Once this function returns, the allocation context has enough space, and going back to retry in gc_heap::allocate will allocate the object's memory successfully.
void gc_heap::adjust_limit_clr (uint8_t* start, size_t limit_size,
alloc_context* acontext, heap_segment* seg,
int align_const, int gen_number)
{
size_t aligned_min_obj_size = Align(min_obj_size, align_const);
//probably should pass seg==0 for free lists.
if (seg)
{
assert (heap_segment_used (seg) <= heap_segment_committed (seg));
}
dprintf (3, ("Expanding segment allocation [%Ix, %Ix[", (size_t)start,
(size_t)start + limit_size - aligned_min_obj_size));
// If the allocation context's start address changed and its old range was not used up (just not big enough), a free object should be created in that space
// This is the case described in BOTR: when 30 bytes remain but 40 bytes are requested, a free object is created in the original 30 bytes
// If only the end address changed and the start address did not, this is not needed
if ((acontext->alloc_limit != start) &&
(acontext->alloc_limit + aligned_min_obj_size)!= start)
{
uint8_t* hole = acontext->alloc_ptr;
if (hole != 0)
{
size_t size = (acontext->alloc_limit - acontext->alloc_ptr);
dprintf (3, ("filling up hole [%Ix, %Ix[", (size_t)hole, (size_t)hole + size + Align (min_obj_size, align_const)));
// when we are finishing an allocation from a free list
// we know that the free area was Align(min_obj_size) larger
acontext->alloc_bytes -= size;
size_t free_obj_size = size + aligned_min_obj_size;
make_unused_array (hole, free_obj_size);
generation_free_obj_space (generation_of (gen_number)) += free_obj_size;
}
// Set the new start address
acontext->alloc_ptr = start;
}
// Set the new end address
acontext->alloc_limit = (start + limit_size - aligned_min_obj_size);
// Add to the allocated byte count
acontext->alloc_bytes += limit_size - ((gen_number < max_generation + 1) ? aligned_min_obj_size : 0);
#ifdef FEATURE_APPDOMAIN_RESOURCE_MONITORING
if (g_fEnableARM)
{
AppDomain* alloc_appdomain = GetAppDomain();
alloc_appdomain->RecordAllocBytes (limit_size, heap_number);
}
#endif //FEATURE_APPDOMAIN_RESOURCE_MONITORING
uint8_t* saved_used = 0;
if (seg)
{
saved_used = heap_segment_used (seg);
}
// If the seg parameter was passed in, adjust the position of heap_segment::used
if (seg == ephemeral_heap_segment)
{
//Sometimes the allocated size is advanced without clearing the
//memory. Let's catch up here
if (heap_segment_used (seg) < (alloc_allocated - plug_skew))
{
#ifdef MARK_ARRAY
#ifndef BACKGROUND_GC
clear_mark_array (heap_segment_used (seg) + plug_skew, alloc_allocated);
#endif //BACKGROUND_GC
#endif //MARK_ARRAY
heap_segment_used (seg) = alloc_allocated - plug_skew;
}
}
#ifdef BACKGROUND_GC
else if (seg)
{
uint8_t* old_allocated = heap_segment_allocated (seg) - plug_skew - limit_size;
#ifdef FEATURE_LOH_COMPACTION
old_allocated -= Align (loh_padding_obj_size, align_const);
#endif //FEATURE_LOH_COMPACTION
assert (heap_segment_used (seg) >= old_allocated);
}
#endif //BACKGROUND_GC
// Zero the assigned space
// plug_skew is in fact the size of the SyncBlock; the SyncBlock just before start is zeroed as well
// Zeroing a large block of memory is relatively slow, so the MSL lock is released before zeroing
if ((seg == 0) ||
(start - plug_skew + limit_size) <= heap_segment_used (seg))
{
dprintf (SPINLOCK_LOG, ("[%d]Lmsl to clear memory(1)", heap_number));
add_saved_spinlock_info (me_release, mt_clr_mem);
leave_spin_lock (&more_space_lock);
dprintf (3, ("clearing memory at %Ix for %d bytes", (start - plug_skew), limit_size));
memclr (start - plug_skew, limit_size);
}
else
{
uint8_t* used = heap_segment_used (seg);
heap_segment_used (seg) = start + limit_size - plug_skew;
dprintf (SPINLOCK_LOG, ("[%d]Lmsl to clear memory", heap_number));
add_saved_spinlock_info (me_release, mt_clr_mem);
leave_spin_lock (&more_space_lock);
if ((start - plug_skew) < used)
{
if (used != saved_used)
{
FATAL_GC_ERROR ();
}
dprintf (2, ("clearing memory before used at %Ix for %Id bytes",
(start - plug_skew), (plug_skew + used - start)));
memclr (start - plug_skew, used - (start - plug_skew));
}
}
// Set up the brick table
// The brick containing start is set to alloc_ptr's offset from the brick's start address
// The bricks after it, up to start + limit, are set to -1
//this portion can be done after we release the lock
if (seg == ephemeral_heap_segment)
{
#ifdef FFIND_OBJECT
if (gen0_must_clear_bricks > 0)
{
//set the brick table to speed up find_object
size_t b = brick_of (acontext->alloc_ptr);
set_brick (b, acontext->alloc_ptr - brick_address (b));
b++;
dprintf (3, ("Allocation Clearing bricks [%Ix, %Ix[",
b, brick_of (align_on_brick (start + limit_size))));
volatile short* x = &brick_table [b];
short* end_x = &brick_table [brick_of (align_on_brick (start + limit_size))];
for (;x < end_x;x++)
*x = -1;
}
else
#endif //FFIND_OBJECT
{
gen0_bricks_cleared = FALSE;
}
}
// verifying the memory is completely cleared.
//verify_mem_cleared (start - plug_skew, limit_size);
}
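The "fill the hole" rule at the top of adjust_limit_clr can be captured in a tiny sketch. The names below are simplified stand-ins, not the real CoreCLR declarations: when the context moves to a new region, the unused tail of the old region plus the Align(min_obj_size) slack it carried becomes one free object of the computed size.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only: mirrors the free_obj_size computation in adjust_limit_clr.
// aligned_min_obj_size stands for Align(min_obj_size, align_const).
size_t hole_free_obj_size(const uint8_t* alloc_ptr, const uint8_t* alloc_limit,
                          size_t aligned_min_obj_size) {
    size_t hole = (size_t)(alloc_limit - alloc_ptr);  // bytes left unused
    return hole + aligned_min_obj_size;               // free_obj_size in the code
}
```

This is why the heap stays walkable: every leftover gap is covered by a free (unused array) object instead of raw garbage bytes.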
Summary of the small object allocation code flow
- allocate: tries to allocate from the allocation context; on failure calls allocate_more_space to assign the context a new region
  - allocate_more_space: calls the try_allocate_more_space function
    - try_allocate_more_space: checks whether a GC should be triggered, then calls allocate_small or allocate_large depending on gen_number
      - allocate_small: loops over various memory-reclaiming measures and calls the soh_try_fit function
        - soh_try_fit: first tries a_fit_free_list_p to allocate from the free object list, then a_fit_segment_end_p to allocate from the end of a segment
          - a_fit_free_list_p: tries to find a large enough space in the free object list; if found, points the allocation context at it
            - adjust_limit_clr: sets the allocation context's new range
          - a_fit_segment_end_p: tries to find a large enough space at the end of a segment; if found, points the allocation context at it
            - adjust_limit_clr: sets the allocation context's new range
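The fast path that sits above this whole flow is a plain bump-pointer allocation from the context. The sketch below uses simplified, hypothetical names (AllocContext, try_fast_alloc), not the real CoreCLR declarations:

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for CoreCLR's alloc_context.
struct AllocContext {
    uint8_t* alloc_ptr;    // next free byte inside the assigned region
    uint8_t* alloc_limit;  // end of the region
};

// Fast path of gc_heap::allocate: bump the pointer if the region has room;
// otherwise the caller must fall into the allocate_more_space slow path.
bool try_fast_alloc(AllocContext& ctx, size_t size, uint8_t** result) {
    if (ctx.alloc_ptr + size <= ctx.alloc_limit) {
        *result = ctx.alloc_ptr;
        ctx.alloc_ptr += size;  // no lock, no GC interaction on this path
        return true;
    }
    return false;  // slow path: allocate_more_space / try_allocate_more_space
}
```

The whole chain above only runs when this bump fails, which is why small-object allocation is usually just a pointer comparison and an addition.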
Code flow for allocating large object memory
Let's look at how memory for large objects is allocated
For small objects we started tracing from gc_heap::allocate; here we start from gc_heap::allocate_large_object
Source of allocate_large_object: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
Unlike allocate, this function does not try to allocate from an allocation context; it allocates directly from a heap segment
CObjectHeader* gc_heap::allocate_large_object (size_t jsize, int64_t& alloc_bytes)
{
// Create an empty allocation context
//create a new alloc context because gen3context is shared.
alloc_context acontext;
acontext.alloc_ptr = 0;
acontext.alloc_limit = 0;
acontext.alloc_bytes = 0;
#ifdef MULTIPLE_HEAPS
acontext.alloc_heap = vm_heap;
#endif //MULTIPLE_HEAPS
#ifdef MARK_ARRAY
uint8_t* current_lowest_address = lowest_address;
uint8_t* current_highest_address = highest_address;
#ifdef BACKGROUND_GC
if (recursive_gc_sync::background_running_p())
{
current_lowest_address = background_saved_lowest_address;
current_highest_address = background_saved_highest_address;
}
#endif //BACKGROUND_GC
#endif // MARK_ARRAY
// Check whether the object size exceeds the maximum allowed object size
// If it does, the allocation fails
size_t maxObjectSize = (INT32_MAX - 7 - Align(min_obj_size));
#ifdef BIT64
if (g_pConfig->GetGCAllowVeryLargeObjects())
{
maxObjectSize = (INT64_MAX - 7 - Align(min_obj_size));
}
#endif
if (jsize >= maxObjectSize)
{
if (g_pConfig->IsGCBreakOnOOMEnabled())
{
GCToOSInterface::DebugBreak();
}
#ifndef FEATURE_REDHAWK
ThrowOutOfMemoryDimensionsExceeded();
#else
return 0;
#endif
}
// Compute the alignment
size_t size = AlignQword (jsize);
int align_const = get_alignment_constant (FALSE);
#ifdef FEATURE_LOH_COMPACTION
size_t pad = Align (loh_padding_obj_size, align_const);
#else
size_t pad = 0;
#endif //FEATURE_LOH_COMPACTION
// Call the allocate_more_space function
// Because the allocation context is empty, the space assigned to it here is exactly the space the large object will use
assert (size >= Align (min_obj_size, align_const));
#ifdef _MSC_VER
#pragma inline_depth(0)
#endif //_MSC_VER
if (! allocate_more_space (&acontext, (size + pad), max_generation+1))
{
return 0;
}
#ifdef _MSC_VER
#pragma inline_depth(20)
#endif //_MSC_VER
#ifdef FEATURE_LOH_COMPACTION
// The GC allocator made a free object already in this alloc context and
// adjusted the alloc_ptr accordingly.
#endif //FEATURE_LOH_COMPACTION
// The object is allocated at the start address of the space just obtained
uint8_t* result = acontext.alloc_ptr;
// The size of the space should equal the object size
assert ((size_t)(acontext.alloc_limit - acontext.alloc_ptr) == size);
// Return the result
CObjectHeader* obj = (CObjectHeader*)result;
#ifdef MARK_ARRAY
if (recursive_gc_sync::background_running_p())
{
// If the object is not in the scan range, clear its mark bit
if ((result < current_highest_address) && (result >= current_lowest_address))
{
dprintf (3, ("Clearing mark bit at address %Ix",
(size_t)(&mark_array [mark_word_of (result)])));
mark_array_clear_marked (result);
}
#ifdef BACKGROUND_GC
//the object has to cover one full mark uint32_t
assert (size > mark_word_size);
if (current_c_gc_state == c_gc_state_marking)
{
dprintf (3, ("Concurrent allocation of a large object %Ix",
(size_t)obj));
// If the object is in the scan range, set its mark bit to prevent it from being collected
//mark the new block specially so we know it is a new object
if ((result < current_highest_address) && (result >= current_lowest_address))
{
dprintf (3, ("Setting mark bit at address %Ix",
(size_t)(&mark_array [mark_word_of (result)])));
mark_array_set_marked (result);
}
}
#endif //BACKGROUND_GC
}
#endif //MARK_ARRAY
assert (obj != 0);
assert ((size_t)obj == Align ((size_t)obj, align_const));
alloc_bytes += acontext.alloc_bytes;
return obj;
}
We have already seen the allocate_more_space function; scroll back if you have forgotten it
That function calls the try_allocate_more_space function
When allocating a large object, try_allocate_more_space calls the allocate_large function
Source of allocate_large: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
Its structure is similar to allocate_small, but the internal details differ
BOOL gc_heap::allocate_large (int gen_number,
size_t size,
alloc_context* acontext,
int align_const)
{
// Background GC is running and not in its planning phase
// This used to run once per 16 allocations, but the if is now commented out
#ifdef BACKGROUND_GC
if (recursive_gc_sync::background_running_p() && (current_c_gc_state != c_gc_state_planning))
{
background_loh_alloc_count++;
//if ((background_loh_alloc_count % bgc_alloc_spin_count_loh) == 0)
{
// If appropriate, allocate the object before the background GC finishes
if (bgc_loh_should_allocate())
{
// If the recorded LOH (Large Object Heap) growth is large, this thread should pause and let other threads run first
// Release the MSL lock and call YieldThread; if the switchCount argument (bgc_alloc_spin_loh) is large it may also sleep for 1ms
if (!bgc_alloc_spin_loh)
{
Thread* current_thread = GetThread();
add_saved_spinlock_info (me_release, mt_alloc_large);
dprintf (SPINLOCK_LOG, ("[%d]spin Lmsl loh", heap_number));
leave_spin_lock (&more_space_lock);
BOOL cooperative_mode = enable_preemptive (current_thread);
GCToOSInterface::YieldThread (bgc_alloc_spin_loh);
disable_preemptive (current_thread, cooperative_mode);
enter_spin_lock (&more_space_lock);
add_saved_spinlock_info (me_acquire, mt_alloc_large);
dprintf (SPINLOCK_LOG, ("[%d]spin Emsl loh", heap_number));
}
}
// Otherwise wait for the background GC to finish
else
{
wait_for_background (awr_loh_alloc_during_bgc);
}
}
}
#endif //BACKGROUND_GC
gc_reason gr = reason_oos_loh;
generation* gen = generation_of (gen_number);
oom_reason oom_r = oom_no_failure;
size_t current_full_compact_gc_count = 0;
// No variable values should be "carried over" from one state to the other.
// That's why there are local variable for each state
allocation_state loh_alloc_state = a_state_start;
#ifdef RECORD_LOH_STATE
EEThreadId current_thread_id;
current_thread_id.SetToCurrentThread();
#endif //RECORD_LOH_STATE
// Start the state machine loop; keep an eye on loh_alloc_state
// If we can get a new seg it means allocation will succeed.
while (1)
{
dprintf (3, ("[h%d]loh state is %s", heap_number, allocation_state_str[loh_alloc_state]));
#ifdef RECORD_LOH_STATE
add_saved_loh_state (loh_alloc_state, current_thread_id);
#endif //RECORD_LOH_STATE
switch (loh_alloc_state)
{
// Break out of the loop on success or failure
case a_state_can_allocate:
case a_state_cant_allocate:
{
goto exit;
}
// At the start, switch the state to a_state_try_fit
case a_state_start:
{
loh_alloc_state = a_state_try_fit;
break;
}
// Call the loh_try_fit function
// On success, switch the state to a_state_can_allocate
// On failure, switch the state to a_state_trigger_full_compact_gc or a_state_acquire_seg
case a_state_try_fit:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
can_use_existing_p = loh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p, &oom_r);
loh_alloc_state = (can_use_existing_p ?
a_state_can_allocate :
(commit_failed_p ?
a_state_trigger_full_compact_gc :
a_state_acquire_seg));
assert ((loh_alloc_state == a_state_can_allocate) == (acontext->alloc_ptr != 0));
break;
}
// Call loh_try_fit after a new heap segment has been created
// On success, switch the state to a_state_can_allocate
// On failure, switch the state to a_state_try_fit
case a_state_try_fit_new_seg:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
can_use_existing_p = loh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p, &oom_r);
// Even after we got a new seg it doesn't necessarily mean we can allocate,
// another LOH allocating thread could have beat us to acquire the msl so
// we need to try again.
loh_alloc_state = (can_use_existing_p ? a_state_can_allocate : a_state_try_fit);
assert ((loh_alloc_state == a_state_can_allocate) == (acontext->alloc_ptr != 0));
break;
}
// A new heap segment was created successfully after a compacting GC; call loh_try_fit to allocate from it
// On success, switch the state to a_state_can_allocate
// On failure, if committing to physical memory failed (not enough physical memory), switch the state to a_state_cant_allocate
// Otherwise try creating a new heap segment once more
case a_state_try_fit_new_seg_after_cg:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
can_use_existing_p = loh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p, &oom_r);
// Even after we got a new seg it doesn't necessarily mean we can allocate,
// another LOH allocating thread could have beat us to acquire the msl so
// we need to try again. However, if we failed to commit, which means we
// did have space on the seg, we bail right away 'cause we already did a
// full compacting GC.
loh_alloc_state = (can_use_existing_p ?
a_state_can_allocate :
(commit_failed_p ?
a_state_cant_allocate :
a_state_acquire_seg_after_cg));
assert ((loh_alloc_state == a_state_can_allocate) == (acontext->alloc_ptr != 0));
break;
}
// No other state currently switches to this one
// Simply calls loh_try_fit; on success switch to a_state_can_allocate, on failure switch to a_state_cant_allocate
case a_state_try_fit_no_seg:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
can_use_existing_p = loh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p, &oom_r);
loh_alloc_state = (can_use_existing_p ? a_state_can_allocate : a_state_cant_allocate);
assert ((loh_alloc_state == a_state_can_allocate) == (acontext->alloc_ptr != 0));
assert ((loh_alloc_state != a_state_cant_allocate) || (oom_r != oom_no_failure));
break;
}
// Call loh_try_fit after a compacting GC has finished
// On success, switch the state to a_state_can_allocate
// If allocation still fails after compaction and committing to physical memory failed (not enough physical memory), switch the state to a_state_cant_allocate
// If allocation still fails after compaction but committing did not fail, try creating a new heap segment again
case a_state_try_fit_after_cg:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
can_use_existing_p = loh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p, &oom_r);
loh_alloc_state = (can_use_existing_p ?
a_state_can_allocate :
(commit_failed_p ?
a_state_cant_allocate :
a_state_acquire_seg_after_cg));
assert ((loh_alloc_state == a_state_can_allocate) == (acontext->alloc_ptr != 0));
break;
}
// Call loh_try_fit after the background GC has finished
// On success, switch the state to a_state_can_allocate
// If committing to physical memory failed (not enough physical memory), switch the state to a_state_trigger_full_compact_gc
// If committing did not fail, try creating a new heap segment
case a_state_try_fit_after_bgc:
{
BOOL commit_failed_p = FALSE;
BOOL can_use_existing_p = FALSE;
can_use_existing_p = loh_try_fit (gen_number, size, acontext,
align_const, &commit_failed_p, &oom_r);
loh_alloc_state = (can_use_existing_p ?
a_state_can_allocate :
(commit_failed_p ?
a_state_trigger_full_compact_gc :
a_state_acquire_seg_after_bgc));
assert ((loh_alloc_state == a_state_can_allocate) == (acontext->alloc_ptr != 0));
break;
}
// Try to create a new heap segment
// On success, switch the state to a_state_try_fit_new_seg
// On failure, switch the state to a_state_check_retry_seg if a compaction was performed, otherwise to a_state_check_and_wait_for_bgc
case a_state_acquire_seg:
{
BOOL can_get_new_seg_p = FALSE;
BOOL did_full_compacting_gc = FALSE;
current_full_compact_gc_count = get_full_compact_gc_count();
can_get_new_seg_p = loh_get_new_seg (gen, size, align_const, &did_full_compacting_gc, &oom_r);
loh_alloc_state = (can_get_new_seg_p ?
a_state_try_fit_new_seg :
(did_full_compacting_gc ?
a_state_check_retry_seg :
a_state_check_and_wait_for_bgc));
break;
}
// Try to create a new heap segment after a compacting GC
// On success, switch the state to a_state_try_fit_new_seg_after_cg
// On failure, switch the state to a_state_check_retry_seg
case a_state_acquire_seg_after_cg:
{
BOOL can_get_new_seg_p = FALSE;
BOOL did_full_compacting_gc = FALSE;
current_full_compact_gc_count = get_full_compact_gc_count();
can_get_new_seg_p = loh_get_new_seg (gen, size, align_const, &did_full_compacting_gc, &oom_r);
// Since we release the msl before we try to allocate a seg, other
// threads could have allocated a bunch of segments before us so
// we might need to retry.
loh_alloc_state = (can_get_new_seg_p ?
a_state_try_fit_new_seg_after_cg :
a_state_check_retry_seg);
break;
}
// Try to create a new heap segment after the background GC has finished
// On success, switch the state to a_state_try_fit_new_seg
// On failure, switch the state to a_state_check_retry_seg if a compaction was performed, otherwise to a_state_trigger_full_compact_gc
case a_state_acquire_seg_after_bgc:
{
BOOL can_get_new_seg_p = FALSE;
BOOL did_full_compacting_gc = FALSE;
current_full_compact_gc_count = get_full_compact_gc_count();
can_get_new_seg_p = loh_get_new_seg (gen, size, align_const, &did_full_compacting_gc, &oom_r);
loh_alloc_state = (can_get_new_seg_p ?
a_state_try_fit_new_seg :
(did_full_compacting_gc ?
a_state_check_retry_seg :
a_state_trigger_full_compact_gc));
assert ((loh_alloc_state != a_state_cant_allocate) || (oom_r != oom_no_failure));
break;
}
// Wait for the background GC to finish
// If the background GC is not running, switch the state to a_state_trigger_full_compact_gc
// If a compaction was performed, switch the state to a_state_try_fit_after_cg, otherwise to a_state_try_fit_after_bgc
case a_state_check_and_wait_for_bgc:
{
BOOL bgc_in_progress_p = FALSE;
BOOL did_full_compacting_gc = FALSE;
if (fgn_maxgen_percent)
{
dprintf (2, ("FGN: failed to acquire seg, may need to do a full blocking GC"));
send_full_gc_notification (max_generation, FALSE);
}
bgc_in_progress_p = check_and_wait_for_bgc (awr_loh_oos_bgc, &did_full_compacting_gc);
loh_alloc_state = (!bgc_in_progress_p ?
a_state_trigger_full_compact_gc :
(did_full_compacting_gc ?
a_state_try_fit_after_cg :
a_state_try_fit_after_bgc));
break;
}
// Trigger a compacting GC of generations 0, 1 and 2
// On success, switch the state to a_state_try_fit_after_cg; on failure, to a_state_cant_allocate
case a_state_trigger_full_compact_gc:
{
BOOL got_full_compacting_gc = FALSE;
got_full_compacting_gc = trigger_full_compact_gc (gr, &oom_r);
loh_alloc_state = (got_full_compacting_gc ? a_state_try_fit_after_cg : a_state_cant_allocate);
assert ((loh_alloc_state != a_state_cant_allocate) || (oom_r != oom_no_failure));
break;
}
// Check whether we should retry the GC or retry acquiring a new heap segment
// If the GC should be retried, switch the state to a_state_trigger_full_compact_gc
// If acquiring a new heap segment should be retried, switch the state to a_state_acquire_seg_after_cg
// Otherwise switch the state to a_state_cant_allocate
// If we could not get a new segment but a compacting GC was run on the existing segments, we should retry
case a_state_check_retry_seg:
{
BOOL should_retry_gc = retry_full_compact_gc (size);
BOOL should_retry_get_seg = FALSE;
if (!should_retry_gc)
{
size_t last_full_compact_gc_count = current_full_compact_gc_count;
current_full_compact_gc_count = get_full_compact_gc_count();
if (current_full_compact_gc_count > (last_full_compact_gc_count + 1))
{
should_retry_get_seg = TRUE;
}
}
loh_alloc_state = (should_retry_gc ?
a_state_trigger_full_compact_gc :
(should_retry_get_seg ?
a_state_acquire_seg_after_cg :
a_state_cant_allocate));
assert ((loh_alloc_state != a_state_cant_allocate) || (oom_r != oom_no_failure));
break;
}
default:
{
assert (!"Invalid state!");
break;
}
}
}
exit:
// Handle OOM (Out Of Memory) when the allocation failed
if (loh_alloc_state == a_state_cant_allocate)
{
assert (oom_r != oom_no_failure);
handle_oom (heap_number,
oom_r,
size,
0,
0);
add_saved_spinlock_info (me_release, mt_alloc_large_cant);
dprintf (SPINLOCK_LOG, ("[%d]Lmsl for loh oom", heap_number));
leave_spin_lock (&more_space_lock);
}
return (loh_alloc_state == a_state_can_allocate);
}
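The switch loop above is easier to digest as an abstract state machine. The sketch below is a deliberately reduced model with hypothetical names (LohState, run_loh_machine); the real allocate_large has more states (the *_after_cg / *_after_bgc variants) and retries loh_try_fit after a GC rather than succeeding directly:

```cpp
// Reduced model of allocate_large's state machine.
enum class LohState { Start, TryFit, AcquireSeg, TriggerGc, CanAllocate, CantAllocate };

// try_fit_ok, seg_ok and gc_ok stand for the outcomes of loh_try_fit,
// loh_get_new_seg and trigger_full_compact_gc respectively.
LohState run_loh_machine(bool try_fit_ok, bool seg_ok, bool gc_ok) {
    LohState s = LohState::Start;
    for (;;) {
        switch (s) {
        case LohState::Start:      s = LohState::TryFit; break;
        case LohState::TryFit:     s = try_fit_ok ? LohState::CanAllocate
                                                  : LohState::AcquireSeg; break;
        case LohState::AcquireSeg: s = seg_ok ? LohState::CanAllocate
                                              : LohState::TriggerGc; break;
        case LohState::TriggerGc:  s = gc_ok ? LohState::CanAllocate
                                             : LohState::CantAllocate; break;
        default: return s;  // CanAllocate or CantAllocate ends the loop
        }
    }
}
```

The real machine keeps local flags per state ("No variable values should be carried over") for exactly this reason: each transition depends only on the outcome of the call made in that state.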
Source of loh_try_fit: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
The handling is much like soh_try_fit: first try a_fit_free_list_large_p to allocate from the free object list, then loh_a_fit_segment_end_p to allocate from the end of a segment
BOOL gc_heap::loh_try_fit (int gen_number,
size_t size,
alloc_context* acontext,
int align_const,
BOOL* commit_failed_p,
oom_reason* oom_r)
{
BOOL can_allocate = TRUE;
// Try to allocate from the free object list
if (!a_fit_free_list_large_p (size, acontext, align_const))
{
// Try to allocate from the end of a segment
can_allocate = loh_a_fit_segment_end_p (gen_number, size,
acontext, align_const,
commit_failed_p, oom_r);
// While background GC is running, record how much was allocated at segment ends
#ifdef BACKGROUND_GC
if (can_allocate && recursive_gc_sync::background_running_p())
{
bgc_loh_size_increased += size;
}
#endif //BACKGROUND_GC
}
#ifdef BACKGROUND_GC
else
{
// While background GC is running, record how much was allocated from the free object list
if (recursive_gc_sync::background_running_p())
{
bgc_loh_allocated_in_free += size;
}
}
#endif //BACKGROUND_GC
return can_allocate;
}
Source of a_fit_free_list_large_p: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
The handling is basically the same as a_fit_free_list_p, but when LOH compaction is supported a padding object is created, and bgc_loh_alloc_clr may be called
BOOL gc_heap::a_fit_free_list_large_p (size_t size,
alloc_context* acontext,
int align_const)
{
// If the background GC is in its planning phase, wait for planning to finish
#ifdef BACKGROUND_GC
wait_for_background_planning (awr_loh_alloc_during_plan);
#endif //BACKGROUND_GC
// Get generation 3's free object list
BOOL can_fit = FALSE;
int gen_number = max_generation + 1;
generation* gen = generation_of (gen_number);
allocator* loh_allocator = generation_allocator (gen);
// When LOH compaction is supported, a padding object must be inserted before the large object
#ifdef FEATURE_LOH_COMPACTION
size_t loh_pad = Align (loh_padding_obj_size, align_const);
#endif //FEATURE_LOH_COMPACTION
#ifdef BACKGROUND_GC
int cookie = -1;
#endif //BACKGROUND_GC
// The list is divided into buckets by size (each bucket is a linked list)
// Bucket sizes double: if first_bucket's size is 256, the second bucket's size is 512
size_t sz_list = loh_allocator->first_bucket_size();
for (unsigned int a_l_idx = 0; a_l_idx < loh_allocator->number_of_buckets(); a_l_idx++)
{
if ((size < sz_list) || (a_l_idx == (loh_allocator->number_of_buckets()-1)))
{
uint8_t* free_list = loh_allocator->alloc_list_head_of (a_l_idx);
uint8_t* prev_free_item = 0;
while (free_list != 0)
{
dprintf (3, ("considering free list %Ix", (size_t)free_list));
size_t free_list_size = unused_array_size(free_list);
#ifdef FEATURE_LOH_COMPACTION
if ((size + loh_pad) <= free_list_size)
#else
if (((size + Align (min_obj_size, align_const)) <= free_list_size)||
(size == free_list_size))
#endif //FEATURE_LOH_COMPACTION
{
// If background GC is enabled and a large object is being allocated, check whether the background GC is currently marking objects
#ifdef BACKGROUND_GC
cookie = bgc_alloc_lock->loh_alloc_set (free_list);
#endif //BACKGROUND_GC
// If the size is sufficient, pop it off this bucket's linked list
//unlink the free_item
loh_allocator->unlink_item (a_l_idx, free_list, prev_free_item, FALSE);
// Substract min obj size because limit_from_size adds it. Not needed for LOH
size_t limit = limit_from_size (size - Align(min_obj_size, align_const), free_list_size,
gen_number, align_const);
// When LOH compaction is supported, a padding object must be inserted before the large object
#ifdef FEATURE_LOH_COMPACTION
make_unused_array (free_list, loh_pad);
limit -= loh_pad;
free_list += loh_pad;
free_list_size -= loh_pad;
#endif //FEATURE_LOH_COMPACTION
// If space remains after the allocation, create a free object in it and put it back on the free object list
uint8_t* remain = (free_list + limit);
size_t remain_size = (free_list_size - limit);
if (remain_size != 0)
{
assert (remain_size >= Align (min_obj_size, align_const));
make_unused_array (remain, remain_size);
}
if (remain_size >= Align(min_free_list, align_const))
{
loh_thread_gap_front (remain, remain_size, gen);
assert (remain_size >= Align (min_obj_size, align_const));
}
else
{
generation_free_obj_space (gen) += remain_size;
}
generation_free_list_space (gen) -= free_list_size;
dprintf (3, ("found fit on loh at %Ix", free_list));
#ifdef BACKGROUND_GC
if (cookie != -1)
{
// If the background GC is marking objects, call bgc_loh_alloc_clr to set the allocation context's new range
bgc_loh_alloc_clr (free_list, limit, acontext, align_const, cookie, FALSE, 0);
}
else
#endif //BACKGROUND_GC
{
// Set the allocation context's new range
adjust_limit_clr (free_list, limit, acontext, 0, align_const, gen_number);
}
//fix the limit to compensate for adjust_limit_clr making it too short
acontext->alloc_limit += Align (min_obj_size, align_const);
can_fit = TRUE;
goto exit;
}
// Next free object in the same bucket
prev_free_item = free_list;
free_list = free_list_slot (free_list);
}
}
// The current bucket is too small; the next bucket is twice the current bucket's size
sz_list = sz_list * 2;
}
exit:
return can_fit;
}
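The bucket-walking logic at the top of the function can be isolated into a small sketch. This is a simplified model with a hypothetical helper name (bucket_index_for); it only picks the first bucket whose list would be searched, whereas the real loop keeps walking later buckets if nothing on a list fits:

```cpp
#include <cstddef>

// Bucket sizes double starting from first_bucket_size; a request that fits
// no earlier bucket falls into the last one, mirroring the condition
// (size < sz_list) || (a_l_idx == number_of_buckets() - 1) in the loop above.
unsigned bucket_index_for(size_t size, size_t first_bucket_size, unsigned num_buckets) {
    size_t sz_list = first_bucket_size;
    for (unsigned idx = 0; idx < num_buckets; idx++) {
        if (size < sz_list || idx == num_buckets - 1)
            return idx;      // this bucket's free list is searched first
        sz_list *= 2;        // next bucket covers items twice as large
    }
    return num_buckets - 1;  // unreachable; kept for completeness
}
```

Doubling size classes keep the number of buckets small while guaranteeing that any free object on a matched bucket's list is at least in the right size neighborhood.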
We already saw adjust_limit_clr in the small object code flow
Here is bgc_loh_alloc_clr: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function is used to allocate large objects while a background GC is running, and must cooperate with it
#ifdef BACKGROUND_GC
void gc_heap::bgc_loh_alloc_clr (uint8_t* alloc_start,
size_t size,
alloc_context* acontext,
int align_const,
int lock_index,
BOOL check_used_p,
heap_segment* seg)
{
// Create a free object covering this space right away
// The lock in bgc_alloc_lock will be released before the memory is zeroed, so a free object is created first to stop the GC from using this space
// The free object is turned back into blank space at the end, after re-acquiring a lock
make_unused_array (alloc_start, size);
#ifdef FEATURE_APPDOMAIN_RESOURCE_MONITORING
if (g_fEnableARM)
{
AppDomain* alloc_appdomain = GetAppDomain();
alloc_appdomain->RecordAllocBytes (size, heap_number);
}
#endif //FEATURE_APPDOMAIN_RESOURCE_MONITORING
size_t size_of_array_base = sizeof(ArrayBase);
// Release the lock for this cookie (set the value at lock_index in the array to 0)
bgc_alloc_lock->loh_alloc_done_with_index (lock_index);
// Start zeroing the memory
// Compute the range to zero
// clear memory while not holding the lock.
size_t size_to_skip = size_of_array_base;
size_t size_to_clear = size - size_to_skip - plug_skew;
size_t saved_size_to_clear = size_to_clear;
if (check_used_p)
{
uint8_t* end = alloc_start + size - plug_skew;
uint8_t* used = heap_segment_used (seg);
if (used < end)
{
if ((alloc_start + size_to_skip) < used)
{
size_to_clear = used - (alloc_start + size_to_skip);
}
else
{
size_to_clear = 0;
}
// Adjust the position of heap_segment::used
dprintf (2, ("bgc loh: setting used to %Ix", end));
heap_segment_used (seg) = end;
}
dprintf (2, ("bgc loh: used: %Ix, alloc: %Ix, end of alloc: %Ix, clear %Id bytes",
used, alloc_start, end, size_to_clear));
}
else
{
dprintf (2, ("bgc loh: [%Ix-[%Ix(%Id)", alloc_start, alloc_start+size, size));
}
#ifdef VERIFY_HEAP
// since we filled in 0xcc for free object when we verify heap,
// we need to make sure we clear those bytes.
if (g_pConfig->GetHeapVerifyLevel() & EEConfig::HEAPVERIFY_GC)
{
if (size_to_clear < saved_size_to_clear)
{
size_to_clear = saved_size_to_clear;
}
}
#endif //VERIFY_HEAP
// Release the MSL lock and zero the memory
dprintf (SPINLOCK_LOG, ("[%d]Lmsl to clear large obj", heap_number));
add_saved_spinlock_info (me_release, mt_clr_large_mem);
leave_spin_lock (&more_space_lock);
memclr (alloc_start + size_to_skip, size_to_clear);
// Take another lock
// This lock is released in PublishObject
bgc_alloc_lock->loh_alloc_set (alloc_start);
// Set the range the allocation context points to
acontext->alloc_ptr = alloc_start;
acontext->alloc_limit = (alloc_start + size - Align (min_obj_size, align_const));
// Turn the free object back into a blank piece of space
// need to clear the rest of the object before we hand it out.
clear_unused_array(alloc_start, size);
}
#endif //BACKGROUND_GC
Source of loh_a_fit_segment_end_p: https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
This function walks generation 3's heap segment list, calling a_fit_segment_end_p on each segment to try the allocation
BOOL gc_heap::loh_a_fit_segment_end_p (int gen_number,
size_t size,
alloc_context* acontext,
int align_const,
BOOL* commit_failed_p,
oom_reason* oom_r)
{
*commit_failed_p = FALSE;
// Get the generation's first heap segment node for the following allocation
heap_segment* seg = generation_allocation_segment (generation_of (gen_number));
BOOL can_allocate_p = FALSE;
while (seg)
{
// Call a_fit_segment_end_p to try to allocate at the end of this segment
if (a_fit_segment_end_p (gen_number, seg, (size - Align (min_obj_size, align_const)),
acontext, align_const, commit_failed_p))
{
acontext->alloc_limit += Align (min_obj_size, align_const);
can_allocate_p = TRUE;
break;
}
else
{
if (*commit_failed_p)
{
// If the segment still has free space but it cannot be committed to physical memory, return an out-of-memory error
*oom_r = oom_cant_commit;
break;
}
else
{
// If the segment has no space left, look at the next segment in the list
seg = heap_segment_next_rw (seg);
}
}
}
return can_allocate_p;
}
Summary of the large object allocation code flow
- allocate_large_object: calls allocate_more_space to assign an empty allocation context a new region whose size equals the object's size
  - allocate_more_space: calls the try_allocate_more_space function
    - try_allocate_more_space: checks whether a GC should be triggered, then calls allocate_small or allocate_large depending on gen_number
      - allocate_large: loops over various memory-reclaiming measures and calls the loh_try_fit function
        - loh_try_fit: first tries a_fit_free_list_large_p to allocate from the free object list, then loh_a_fit_segment_end_p to allocate from the end of a segment
          - a_fit_free_list_large_p: tries to find a large enough space in the free object list; if found, points the allocation context at it
            - bgc_loh_alloc_clr: sets the allocation context's new range, cooperating with the background GC
            - adjust_limit_clr: sets the allocation context's new range
          - loh_a_fit_segment_end_p: walks generation 3's segment list, calling a_fit_segment_end_p on each segment to try the allocation
            - a_fit_segment_end_p: tries to find a large enough space at the end of a segment; if found, points the allocation context at it
              - bgc_loh_alloc_clr: sets the allocation context's new range, cooperating with the background GC
              - adjust_limit_clr: sets the allocation context's new range
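The segment walk inside loh_a_fit_segment_end_p is a plain first-fit scan over a linked list. The sketch below uses a hypothetical, heavily simplified segment node (the real heap_segment tracks allocated/committed/reserved pointers and more):

```cpp
#include <cstddef>

// Hypothetical, simplified segment node.
struct Seg {
    size_t free_at_end;  // room between the allocated tail and the committed end
    Seg* next;           // heap_segment_next_rw in the real code
};

// Return the first segment whose tail can hold the request, like
// loh_a_fit_segment_end_p, or nullptr when every segment is full
// (the caller must then acquire a new segment).
Seg* first_seg_with_room(Seg* seg, size_t size) {
    while (seg) {
        if (seg->free_at_end >= size)
            return seg;   // a_fit_segment_end_p would succeed here
        seg = seg->next;
    }
    return nullptr;
}
```

In the real code a failed fit is only fatal when commit_failed_p is set; otherwise the walk simply continues to the next segment, exactly as here.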
How CoreCLR manages system memory (Windows, Linux)
By now we know that the memory for allocation contexts, small objects, and large objects all comes from heap segments; so where does segment memory come from?
The GC creates the default heap segments at program startup; the call flow is init_gc_heap => get_initial_segment => make_heap_segment
If the default segments are not enough, new segments are created
Small object segments are created via gc1 => plan_phase => soh_get_segment_to_expand => get_segment => make_heap_segment
Large object segments are created via allocate_large => loh_get_new_seg => get_large_segment => get_segment_for_loh => get_segment => make_heap_segment
The default segments get their memory from next_initial_memory; that block is requested at startup by the reserve_initial_memory function
Both reserve_initial_memory and make_heap_segment call the virtual_alloc function
The call chains are long, so I won't paste every function here; trace them yourself if you are interested
The call flow of virtual_alloc is
virtual_alloc => GCToOSInterface::VirtualReserve => ClrVirtualAllocAligned => ClrVirtualAlloc =>
CExecutionEngine::ClrVirtualAlloc => EEVirtualAlloc => VirtualAlloc
On Windows, VirtualAlloc is the Windows API of the same name
On Linux or macOS, the flow is VirtualAlloc => VIRTUALReserveMemory => ReserveVirtualMemory
ReserveVirtualMemory calls the mmap function
Source of ReserveVirtualMemory: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/pal/src/map/virtual.cpp#L894
static LPVOID ReserveVirtualMemory(
IN CPalThread *pthrCurrent, /* Currently executing thread */
IN LPVOID lpAddress, /* Region to reserve or commit */
IN SIZE_T dwSize) /* Size of Region */
{
UINT_PTR StartBoundary = (UINT_PTR)lpAddress;
SIZE_T MemSize = dwSize;
TRACE( "Reserving the memory now.\n");
// Most platforms will only commit memory if it is dirtied,
// so this should not consume too much swap space.
int mmapFlags = 0;
#if HAVE_VM_ALLOCATE
// Allocate with vm_allocate first, then map at the fixed address.
int result = vm_allocate(mach_task_self(),
&StartBoundary,
MemSize,
((LPVOID) StartBoundary != nullptr) ? FALSE : TRUE);
if (result != KERN_SUCCESS)
{
ERROR("vm_allocate failed to allocated the requested region!\n");
pthrCurrent->SetLastError(ERROR_INVALID_ADDRESS);
return nullptr;
}
mmapFlags |= MAP_FIXED;
#endif // HAVE_VM_ALLOCATE
mmapFlags |= MAP_ANON | MAP_PRIVATE;
LPVOID pRetVal = mmap((LPVOID) StartBoundary,
MemSize,
PROT_NONE,
mmapFlags,
-1 /* fd */,
0 /* offset */);
if (pRetVal == MAP_FAILED)
{
ERROR( "Failed due to insufficient memory.\n" );
#if HAVE_VM_ALLOCATE
vm_deallocate(mach_task_self(), StartBoundary, MemSize);
#endif // HAVE_VM_ALLOCATE
pthrCurrent->SetLastError(ERROR_NOT_ENOUGH_MEMORY);
return nullptr;
}
/* Check to see if the region is what we asked for. */
if (lpAddress != nullptr && StartBoundary != (UINT_PTR)pRetVal)
{
ERROR("We did not get the region we asked for from mmap!\n");
pthrCurrent->SetLastError(ERROR_INVALID_ADDRESS);
munmap(pRetVal, MemSize);
return nullptr;
}
#if MMAP_ANON_IGNORES_PROTECTION
if (mprotect(pRetVal, MemSize, PROT_NONE) != 0)
{
ERROR("mprotect failed to protect the region!\n");
pthrCurrent->SetLastError(ERROR_INVALID_ADDRESS);
munmap(pRetVal, MemSize);
return nullptr;
}
#endif // MMAP_ANON_IGNORES_PROTECTION
return pRetVal;
}
When requesting memory from the system, CoreCLR uses VirtualAlloc, or a VirtualAlloc emulated with mmap
The request yields a block of virtual memory that is not yet fully committed to physical memory (note the protection is PROT_NONE, meaning the memory cannot be read, written, or executed, and the kernel need not set up its page tables)
If you are curious, look at CoreCLR's virtual memory footprint: the workstation GC occupies over 1GB at startup, and the server GC over 20GB
CoreCLR then gradually commits the used portions to physical memory as needed; the flow is
GCToOSInterface::VirtualCommit => ClrVirtualAlloc => CExecutionEngine::ClrVirtualAlloc =>
EEVirtualAlloc => VirtualAlloc
On Windows, VirtualAlloc is the Windows API of the same name; the address is given explicitly and the page protection is read-write (PAGE_READWRITE)
On Linux or macOS, VirtualAlloc calls VIRTUALCommitMemory, which internally calls mprotect to make the pages read-write (PROT_READ|PROT_WRITE)
When the GC has collected garbage objects and no longer needs part of the memory, it returns it to the system; for example, after collecting small objects the flow is
gc1 => decommit_ephemeral_segment_pages => decommit_heap_segment_pages => GCToOSInterface::VirtualDecommit
The call flow of GCToOSInterface::VirtualDecommit is
GCToOSInterface::VirtualDecommit => ClrVirtualFree => CExecutionEngine::ClrVirtualFree =>
EEVirtualFree => VirtualFree
On Windows, VirtualFree is the Windows API of the same name; it tells the kernel that this virtual memory is no longer used and its page tables can be reset
On Linux or macOS, VirtualFree is emulated: for MEM_DECOMMIT it remaps the range with mmap and PROT_NONE so the kernel knows the pages can be discarded
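The reserve/commit/decommit/release cycle described above can be reproduced on Linux with plain POSIX calls. This is a sketch of the pattern, not the PAL's actual implementation (the helper name reserve_commit_cycle is made up for this example):

```cpp
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Walk through the full cycle once; returns true if every step succeeds.
bool reserve_commit_cycle() {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t reserve_size = 16 * page;

    // Reserve: PROT_NONE mmap, like ReserveVirtualMemory - address space only,
    // no physical pages are consumed yet.
    void* base = mmap(nullptr, reserve_size, PROT_NONE,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) return false;

    // Commit one page: mprotect to read-write, like VIRTUALCommitMemory.
    if (mprotect(base, page, PROT_READ | PROT_WRITE) != 0) return false;
    memset(base, 0xAB, page);  // the page is now really backed and writable

    // Decommit: remap the page with MAP_FIXED and PROT_NONE, like the
    // MEM_DECOMMIT branch of the PAL's VirtualFree shown below.
    void* again = mmap(base, page, PROT_NONE,
                       MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
    if (again != base) return false;

    // Release: munmap the whole reservation, like the MEM_RELEASE branch.
    return munmap(base, reserve_size) == 0;
}
```

Remapping an anonymous PROT_NONE region over the old pages is what makes the decommit real: the old physical pages are dropped, not merely protected.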
Source of VirtualFree: https://github.com/dotnet/coreclr/blob/release/1.1.0/src/pal/src/map/virtual.cpp#L1291
BOOL
PALAPI
VirtualFree(
IN LPVOID lpAddress, /* Address of region. */
IN SIZE_T dwSize, /* Size of region. */
IN DWORD dwFreeType ) /* Operation type. */
{
BOOL bRetVal = TRUE;
CPalThread *pthrCurrent;
PERF_ENTRY(VirtualFree);
ENTRY("VirtualFree(lpAddress=%p, dwSize=%u, dwFreeType=%#x)\n",
lpAddress, dwSize, dwFreeType);
pthrCurrent = InternalGetCurrentThread();
InternalEnterCriticalSection(pthrCurrent, &virtual_critsec);
/* Sanity Checks. */
if ( !lpAddress )
{
ERROR( "lpAddress cannot be NULL. You must specify the base address of\
regions to be de-committed. \n" );
pthrCurrent->SetLastError( ERROR_INVALID_ADDRESS );
bRetVal = FALSE;
goto VirtualFreeExit;
}
if ( !( dwFreeType & MEM_RELEASE ) && !(dwFreeType & MEM_DECOMMIT ) )
{
        ERROR( "dwFreeType must contain one of the following: \
               MEM_RELEASE or MEM_DECOMMIT\n" );
        pthrCurrent->SetLastError( ERROR_INVALID_PARAMETER );
        bRetVal = FALSE;
        goto VirtualFreeExit;
    }

    /* You cannot release and decommit in one call.*/
    if ( dwFreeType & MEM_RELEASE && dwFreeType & MEM_DECOMMIT )
    {
        ERROR( "MEM_RELEASE cannot be combined with MEM_DECOMMIT.\n" );
        bRetVal = FALSE;
        goto VirtualFreeExit;
    }

    if ( dwFreeType & MEM_DECOMMIT )
    {
        UINT_PTR StartBoundary = 0;
        SIZE_T MemSize = 0;

        if ( dwSize == 0 )
        {
            ERROR( "dwSize cannot be 0. \n" );
            pthrCurrent->SetLastError( ERROR_INVALID_PARAMETER );
            bRetVal = FALSE;
            goto VirtualFreeExit;
        }

        /*
         * A two byte range straddling 2 pages caues both pages to be either
         * released or decommitted. So round the dwSize up to the next page
         * boundary and round the lpAddress down to the next page boundary.
         */
        MemSize = (((UINT_PTR)(dwSize) + ((UINT_PTR)(lpAddress) & VIRTUAL_PAGE_MASK)
                    + VIRTUAL_PAGE_MASK) & ~VIRTUAL_PAGE_MASK);
        StartBoundary = (UINT_PTR)lpAddress & ~VIRTUAL_PAGE_MASK;

        PCMI pUnCommittedMem;
        pUnCommittedMem = VIRTUALFindRegionInformation( StartBoundary );
        if (!pUnCommittedMem)
        {
            ASSERT( "Unable to locate the region information.\n" );
            pthrCurrent->SetLastError( ERROR_INTERNAL_ERROR );
            bRetVal = FALSE;
            goto VirtualFreeExit;
        }

        TRACE( "Un-committing the following page(s) %d to %d.\n",
               StartBoundary, MemSize );

        // Explicitly calling mmap instead of mprotect here makes it
        // that much more clear to the operating system that we no
        // longer need these pages.
        if ( mmap( (LPVOID)StartBoundary, MemSize, PROT_NONE,
                   MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0 ) != MAP_FAILED )
        {
#if (MMAP_ANON_IGNORES_PROTECTION)
            if (mprotect((LPVOID) StartBoundary, MemSize, PROT_NONE) != 0)
            {
                ASSERT("mprotect failed to protect the region!\n");
                pthrCurrent->SetLastError(ERROR_INTERNAL_ERROR);
                munmap((LPVOID) StartBoundary, MemSize);
                bRetVal = FALSE;
                goto VirtualFreeExit;
            }
#endif  // MMAP_ANON_IGNORES_PROTECTION

            SIZE_T index = 0;
            SIZE_T nNumOfPagesToChange = 0;

            /* We can now commit this memory by calling VirtualAlloc().*/
            index = (StartBoundary - pUnCommittedMem->startBoundary) / VIRTUAL_PAGE_SIZE;
            nNumOfPagesToChange = MemSize / VIRTUAL_PAGE_SIZE;
            VIRTUALSetAllocState( MEM_RESERVE, index,
                                  nNumOfPagesToChange, pUnCommittedMem );

            goto VirtualFreeExit;
        }
        else
        {
            ASSERT( "mmap() returned an abnormal value.\n" );
            bRetVal = FALSE;
            pthrCurrent->SetLastError( ERROR_INTERNAL_ERROR );
            goto VirtualFreeExit;
        }
    }

    if ( dwFreeType & MEM_RELEASE )
    {
        PCMI pMemoryToBeReleased =
            VIRTUALFindRegionInformation( (UINT_PTR)lpAddress );

        if ( !pMemoryToBeReleased )
        {
            ERROR( "lpAddress must be the base address returned by VirtualAlloc.\n" );
            pthrCurrent->SetLastError( ERROR_INVALID_ADDRESS );
            bRetVal = FALSE;
            goto VirtualFreeExit;
        }
        if ( dwSize != 0 )
        {
            ERROR( "dwSize must be 0 if you are releasing the memory.\n" );
            pthrCurrent->SetLastError( ERROR_INVALID_PARAMETER );
            bRetVal = FALSE;
            goto VirtualFreeExit;
        }

        TRACE( "Releasing the following memory %d to %d.\n",
               pMemoryToBeReleased->startBoundary, pMemoryToBeReleased->memSize );

        if ( munmap( (LPVOID)pMemoryToBeReleased->startBoundary,
                     pMemoryToBeReleased->memSize ) == 0 )
        {
            if ( VIRTUALReleaseMemory( pMemoryToBeReleased ) == FALSE )
            {
                ASSERT( "Unable to remove the PCMI entry from the list.\n" );
                pthrCurrent->SetLastError( ERROR_INTERNAL_ERROR );
                bRetVal = FALSE;
                goto VirtualFreeExit;
            }
            pMemoryToBeReleased = NULL;
        }
        else
        {
            ASSERT( "Unable to unmap the memory, munmap() returned an abnormal value.\n" );
            pthrCurrent->SetLastError( ERROR_INTERNAL_ERROR );
            bRetVal = FALSE;
            goto VirtualFreeExit;
        }
    }

VirtualFreeExit:

    LogVaOperation(
        (dwFreeType & MEM_DECOMMIT) ? VirtualMemoryLogging::VirtualOperation::Decommit
                                    : VirtualMemoryLogging::VirtualOperation::Release,
        lpAddress,
        dwSize,
        dwFreeType,
        0,
        NULL,
        bRetVal);

    InternalLeaveCriticalSection(pthrCurrent, &virtual_critsec);
    LOGEXIT( "VirtualFree returning %s.\n", bRetVal == TRUE ? "TRUE" : "FALSE" );
    PERF_EXIT(VirtualFree);
    return bRetVal;
}
As we can see, CoreCLR manages system memory at a fairly low level:
on Windows it uses VirtualAlloc and VirtualFree,
and on Linux it uses mmap and mprotect,
rather than the conventional malloc and new.
This gives better performance, but it also raises the cost of porting to other platforms.
Live-debugging the GC's object allocation
Reading the source alone is not enough to understand CoreCLR in depth; gc.cpp, where most of this material comes from, is close to 37,000 lines.
To really see how CoreCLR works, I built CoreCLR myself and debugged it locally with lldb. Here is the build-and-debug process.
I used Ubuntu 16.04 LTS, since setting up the build environment on Linux is much simpler than on Windows.
Download CoreCLR:
git clone https://github.com/dotnet/coreclr.git
Check out the version you are actually running; do not build the master branch directly:
git checkout v1.1.0
Install the required packages, following Microsoft's instructions:
# https://github.com/dotnet/coreclr/blob/master/Documentation/building/linux-instructions.md
echo "deb http://llvm.org/apt/trusty/ llvm-toolchain-trusty-3.6 main" | sudo tee /etc/apt/sources.list.d/llvm.list
wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install cmake llvm-3.5 clang-3.5 lldb-3.6 lldb-3.6-dev libunwind8 libunwind8-dev gettext libicu-dev liblttng-ust-dev libcurl4-openssl-dev libssl-dev uuid-dev
cd coreclr
./build.sh
build.sh downloads some files from Microsoft's servers; if the downloads keep failing, consider routing through a proxy.
The build takes tens of minutes; once it finishes, the output is under coreclr/bin/Product/Linux.x64.Debug.
Next, create a new executable project with dotnet and add a runtimes section to project.json:
{
  "runtimes": {
    "ubuntu.16.04-x64": {}
  }
}
Program.cs can contain whatever you want to test; here I wrote code that allocates memory on many threads and then releases it:
using System;
using System.Threading;
using System.Collections.Generic;

namespace ConsoleApplication
{
    public class A
    {
        public int a;
        public byte[] padding;
    }

    public class Program
    {
        public static void ThreadBody()
        {
            Thread.Sleep(1000);
            var list = new List<A>();
            for (long x = 0; x < 1000000; ++x) {
                list.Add(new A());
            }
        }

        public static void Main(string[] args)
        {
            var threads = new List<Thread>();
            for (var x = 0; x < 100; ++x)
            {
                var thread = new Thread(ThreadBody);
                threads.Add(thread);
                thread.Start();
            }
            foreach (var thread in threads)
            {
                thread.Join();
            }
            GC.Collect();
            Console.WriteLine("memory released");
            Console.ReadLine();
        }
    }
}
Then build and publish:
dotnet restore
dotnet publish
After publishing, the final files appear under bin/Debug/netcoreapp1.1/ubuntu.16.04-x64/publish.
Copy all of the files from your CoreCLR build output in coreclr/bin/Product/Linux.x64.Debug into the publish directory, overwriting the existing files.
Microsoft's official debugging documentation is at https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/building/debugging-instructions.md
Launch the process under lldb; my project is named coreapp, so the executable under publish is also called coreapp:
lldb-3.6 ./coreapp
Once the process is running you can enter commands to debug it; press Ctrl+C to interrupt (pause) execution.

The commands in the screenshot above:
b allocate_small
Sets a breakpoint on a function. Although the full name of allocate_small is SVR::gc_heap::allocate_small or WKS::gc_heap::allocate_small,
lldb lets you break on the short name; if several functions match, it breaks on all of them.
r
Runs the program; breakpoints left pending are materialized once their addresses can be resolved after the program starts.
bt
Shows the current call stack, so you can see which functions led to the one currently executing.

The commands in the screenshot above:
n
Step over; does not enter function calls. Use s to step into.
The instruction-level equivalents are ni (step over) and si (step into).
fr v
Shows the variables in the current stack frame,
that is, the arguments and local variables.
p acontext->alloc_ptr
p *acontext
Prints a global or local variable. This is the single most useful debugging command; it can evaluate full expressions, not just variables.

The commands in the screenshot above:
c
Continues the interrupted process until it exits or hits the next breakpoint.
br del
Deletes all previously set breakpoints.

This screenshot shows the allocation context of the first thread in the thread list. The 0x168 offset can be computed with p &((Thread*)nullptr)->m_Link (which is just offsetof).
The command in the screenshot:
me re -s4 -fx -c12 0x00007fff5c006f00
Reads the memory starting at 0x00007fff5c006f00, in units of 4 bytes, formatted as hex, displaying 12 units.

lldb can debug not only CoreCLR's own code
but also the user program's code, with the help of Microsoft's SOS plugin.
For details see the official documentation: https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/building/debugging-instructions.md
Finally, here are the lldb commands I used most often during this analysis.
To learn lldb, see the official Tutorial and "GDB and LLDB command examples".
plugin load libsosplugin.so
process launch -s
process handle -s false SIGUSR1 SIGUSR2
breakpoint set -n LoadLibraryExW
c
sos DumpHeap
bpmd coreapp.dll ConsoleApplication.Program.Main
p g_pGCHeap
p n_heaps
p g_heaps[0]
p *WKS::gc_heap::ephemeral_heap_segment
p g_heaps[0]->ephemeral_heap_segment
p s_pThreadStore->m_ThreadList
p &((Thread*)nullptr)->m_Link
p ((Thread*)((char*)s_pThreadStore->m_ThreadList.m_link.m_pNext-0x168))->m_alloc_context
p ((Thread*)((char*)s_pThreadStore->m_ThreadList.m_link.m_pNext->m_pNext-0x168))->m_alloc_context
me re -s4 -fx -c100 0x00007fff5c027fe0
p generation_table
p generation_table[0]
p generation_table[2].free_list_allocator
p generation_table[2].free_list_allocator.first_bucket.head
p (generation_table[2].free_list_allocator.buckets)->head
p (generation_table[2].free_list_allocator.buckets+1)->head
p *generation_table[2].free_list_allocator.buckets
wa s v generation_table[2].free_list_allocator.first_bucket.head
me re -s8 -fx -c3 0x00007fff5bfff018
Reference links
https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcsvr.cpp
https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcwks.cpp
https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcimpl.h
https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gcpriv.h
https://github.com/dotnet/coreclr/blob/release/1.1.0/src/gc/gc.h#L162
https://github.com/dotnet/coreclr/blob/release/1.1.0/src/vm/gchelpers.cpp#L931
https://raw.githubusercontent.com/dotnet/coreclr/release/1.1.0/src/gc/gc.cpp
https://github.com/dotnet/coreclr/blob/release/1.1.0/src/pal/src/map/virtual.cpp#L894
https://github.com/dotnet/coreclr/blob/master/Documentation/building/linux-instructions.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/building/debugging-instructions.md
https://docs.microsoft.com/en-us/dotnet/articles/core/tools/project-json
https://github.com/dotnet/coreclr/issues/8959
https://github.com/dotnet/coreclr/issues/8995
https://github.com/dotnet/coreclr/issues/9053
Because the GC code is huge and sparsely commented, this analysis required not only asking questions on the official GitHub but also resorting to lldb just to reach an initial understanding.
In the next article I will cover the internals of the GC's collector; it will likely take even longer, so please be patient.
