A Detailed Look at the Go Memory Allocator Source Code

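Reposted article, 2021-01-30 16:01, tag: go

Please credit the source when reposting. This article was published on luozhiyun's blog: https://www.luozhiyun.com

This article is based on the Go 1.15.7 source code.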

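Introduction

Go's memory allocator borrows its design from TCMalloc to allocate memory quickly. The core idea is a multi-level cache: objects are classified by size, and each class gets its own allocation strategy. For background on TCMalloc see http://goog-perftools.sourceforge.net/doc/tcmalloc.html.

In TCMalloc, a small object (<= 32 KB) is served from a per-thread cache of small objects, so the allocation needs no lock and is very cheap.

As shown below, objects are placed on linked lists grouped by size:

[Figure: small objects on per-size-class linked lists]

A large object (> 32 KB) is allocated from the page heap, as shown below:

[Figure: Large Object Allocation]

Although the Go allocator was originally based on TCMalloc, the two have since diverged considerably, so some of the structures above look a little different in Go. The rest of this article walks through them.

The allocator source is fairly complex, so before analyzing it, let's look at how to set breakpoints and step through the assembly, which makes the code easier to follow.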

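Debugging the Assembly with Breakpoints

Go currently supports several debuggers: GDB, LLDB, and Delve. Of these, only Delve was designed specifically for Go. Delve is itself written in Go and offers the same support on Windows. This section uses Delve to show briefly how to debug a Go program at the assembly level. Project page: https://github.com/go-delve/delve

Installation: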

go get github.com/go-delve/delve/cmd/dlv

First, write a small example, test.go:

package main

import "fmt"

type A struct {
	test string
}
func main() {
	a := new(A)
	fmt.Println(a)
}

Then change to the package directory on the command line and run dlv debug to start a debugging session:

PS C:\document\code\test_go\src> dlv debug
Type 'help' for list of commands.

Use the break command to set a breakpoint on main.main:

(dlv) break main.main
Breakpoint 1 set at 0x4bd30a for main.main() c:/document/code/test_go/src/test.go:8

Use breakpoints to list every breakpoint that has been set:

(dlv) breakpoints
Breakpoint runtime-fatal-throw at 0x4377e0 for runtime.fatalthrow() c:/software/go/src/runtime/panic.go:1162 (0)
Breakpoint unrecovered-panic at 0x437860 for runtime.fatalpanic() c:/software/go/src/runtime/panic.go:1189 (0)
        print runtime.curg._panic.arg
Breakpoint 1 at 0x4bd30a for main.main() c:/document/code/test_go/src/test.go:8 (0)

Use continue to run the program to the next breakpoint:

(dlv) continue
> main.main() c:/document/code/test_go/src/test.go:8 (hits goroutine(1):1 total:1) (PC: 0x4bd30a)
     3: import "fmt"
     4:
     5: type A struct {
     6:         test string
     7: }
=>   8: func main() {
     9:         a := new(A)
    10:         fmt.Println(a)
    11: }
    12:
    13:

Use disassemble to view the assembly generated for main:

(dlv) disassemble
TEXT main.main(SB) C:/document/code/test_go/src/test.go
        test.go:8       0x4bd2f0        65488b0c2528000000      mov rcx, qword ptr gs:[0x28]
        test.go:8       0x4bd2f9        488b8900000000          mov rcx, qword ptr [rcx]
        test.go:8       0x4bd300        483b6110                cmp rsp, qword ptr [rcx+0x10]
        test.go:8       0x4bd304        0f8697000000            jbe 0x4bd3a1
=>      test.go:8       0x4bd30a*       4883ec78                sub rsp, 0x78
        test.go:8       0x4bd30e        48896c2470              mov qword ptr [rsp+0x70], rbp
        test.go:8       0x4bd313        488d6c2470              lea rbp, ptr [rsp+0x70]
        test.go:9       0x4bd318        488d0581860100          lea rax, ptr [__image_base__+874912]
        test.go:9       0x4bd31f        48890424                mov qword ptr [rsp], rax
        test.go:9       0x4bd323        e8e800f5ff              call $runtime.newobject
        test.go:9       0x4bd328        488b442408              mov rax, qword ptr [rsp+0x8]
        test.go:9       0x4bd32d        4889442430              mov qword ptr [rsp+0x30], rax
        test.go:10      0x4bd332        4889442440              mov qword ptr [rsp+0x40], rax
        test.go:10      0x4bd337        0f57c0                  xorps xmm0, xmm0
        test.go:10      0x4bd33a        0f11442448              movups xmmword ptr [rsp+0x48], xmm0
        test.go:10      0x4bd33f        488d442448              lea rax, ptr [rsp+0x48]
        test.go:10      0x4bd344        4889442438              mov qword ptr [rsp+0x38], rax
        test.go:10      0x4bd349        8400                    test byte ptr [rax], al
        test.go:10      0x4bd34b        488b4c2440              mov rcx, qword ptr [rsp+0x40]
        test.go:10      0x4bd350        488d15099f0000          lea rdx, ptr [__image_base__+815712]
        test.go:10      0x4bd357        4889542448              mov qword ptr [rsp+0x48], rdx
        test.go:10      0x4bd35c        48894c2450              mov qword ptr [rsp+0x50], rcx
        test.go:10      0x4bd361        8400                    test byte ptr [rax], al
        test.go:10      0x4bd363        eb00                    jmp 0x4bd365
        test.go:10      0x4bd365        4889442458              mov qword ptr [rsp+0x58], rax
        test.go:10      0x4bd36a        48c744246001000000      mov qword ptr [rsp+0x60], 0x1
        test.go:10      0x4bd373        48c744246801000000      mov qword ptr [rsp+0x68], 0x1
        test.go:10      0x4bd37c        48890424                mov qword ptr [rsp], rax
        test.go:10      0x4bd380        48c744240801000000      mov qword ptr [rsp+0x8], 0x1
        test.go:10      0x4bd389        48c744241001000000      mov qword ptr [rsp+0x10], 0x1
        test.go:10      0x4bd392        e869a0ffff              call $fmt.Println
        test.go:11      0x4bd397        488b6c2470              mov rbp, qword ptr [rsp+0x70]
        test.go:11      0x4bd39c        4883c478                add rsp, 0x78
        test.go:11      0x4bd3a0        c3                      ret
        test.go:8       0x4bd3a1        e82a50faff              call $runtime.morestack_noctxt
        .:0             0x4bd3a6        e945ffffff              jmp $main.main

The disassembly shows that new(A) compiles to a call to runtime.newobject, so we can set a breakpoint on that function:

(dlv) break runtime.newobject
Breakpoint 2 set at 0x40d426 for runtime.newobject() c:/software/go/src/runtime/malloc.go:1164

Enter continue to jump to the breakpoint:

(dlv) continue
> runtime.newobject() c:/software/go/src/runtime/malloc.go:1164 (hits goroutine(1):1 total:1) (PC: 0x40d426)
Warning: debugging optimized function
  1159: }
  1160:
  1161: // implementation of new builtin
  1162: // compiler (both frontend and SSA backend) knows the signature
  1163: // of this function
=>1164: func newobject(typ *_type) unsafe.Pointer {
  1165:         return mallocgc(typ.size, typ, true)
  1166: }
  1167:
  1168: //go:linkname reflect_unsafe_New reflect.unsafe_New
  1169: func reflect_unsafe_New(typ *_type) unsafe.Pointer {

Use print to inspect typ:

(dlv) print typ
*runtime._type {size: 16, ptrdata: 8, hash: 875453117, tflag: tflagUncommon|tflagExtraStar|tflagNamed (7), align: 8, fieldAlign: 8, kind: 25, equal: runtime.strequal, gcdata: *1, str: 5418, ptrToThis: 37472}

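The printed size is 16 bytes because struct A has a single string field, and on a 64-bit platform a string header is a pointer plus a length (8 + 8 bytes).

After stepping into mallocgc, use args and locals to view the function's arguments and local variables: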

(dlv) args
size = (unreadable could not find loclist entry at 0x8b40 for address 0x40ca73)
typ = (*runtime._type)(0x4d59a0)
needzero = true
~r3 = (unreadable empty OP stack)
(dlv) locals
(no locals)

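Entry Points for Different Objects

The assembly shows that heap allocations enter through runtime.mallocgc, but two kinds of object deserve special attention:

int64 objects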

runtime.convT64

func convT64(val uint64) (x unsafe.Pointer) {
	if val < uint64(len(staticuint64s)) {
		x = unsafe.Pointer(&staticuint64s[val])
	} else {
		x = mallocgc(8, uint64Type, false)
		*(*uint64)(x) = val
	}
	return
}

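This code shows that when a 64-bit integer value smaller than 256 (the length of staticuint64s) is converted, the runtime uses the cached entry directly, so no memory is allocated for it.

string objects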

runtime.convTstring

func convTstring(val string) (x unsafe.Pointer) {
	if val == "" {
		x = unsafe.Pointer(&zeroVal[0])
	} else {
		x = mallocgc(unsafe.Sizeof(val), stringType, true)
		*(*string)(x) = val
	}
	return
}

This code shows that converting the empty string "" returns a fixed address, so no memory is allocated.
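Both special cases can be observed from user code by counting allocations while boxing values into an interface. The following is a minimal sketch, not part of the runtime: boxUint64Allocs, boxStringAllocs, and sink are hypothetical names, and it assumes the compiler lowers these conversions to runtime.convT64 and runtime.convTstring as in the Go version discussed here.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable, so anything stored in it escapes to the heap.
var sink interface{}

// boxUint64Allocs reports the average number of heap allocations made
// each time v is converted to an interface value.
//
//go:noinline
func boxUint64Allocs(v uint64) float64 {
	return testing.AllocsPerRun(100, func() { sink = v })
}

// boxStringAllocs does the same for a string value.
//
//go:noinline
func boxStringAllocs(s string) float64 {
	return testing.AllocsPerRun(100, func() { sink = s })
}

func main() {
	fmt.Println("uint64 100:      ", boxUint64Allocs(100))
	fmt.Println("uint64 100000:   ", boxUint64Allocs(100000))
	fmt.Println("empty string:    ", boxStringAllocs(""))
	fmt.Println("non-empty string:", boxStringAllocs("go"))
}
```

The functions are marked noinline so the arguments are not compile-time constants at the conversion site; a constant can be boxed statically by the compiler and would hide the runtime path.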

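Debugging Examples

You can also use the following example when debugging. Go divides object allocation into large, small, and tiny objects, so the three functions below are meant to exercise those three allocation paths.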

package main

import "fmt"

type smallobj struct {
	arr [1 << 10]byte
}

type largeobj struct {
	arr [1 << 26]byte
}

func tiny() {
	y := 100000
	fmt.Println(y)
}

func large() {
	large := largeobj{}
	println(&large)
}

func small() {
	small := smallobj{}
	print(&small)
}

func main() {
	//tiny()
	//small()
	//large() 
}

Analysis

Allocator Components

Memory is allocated by the memory allocator, which is built from four components: runtime.mspan, runtime.mcache, runtime.mcentral, and runtime.mheap.

runtime.mspan

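type mspan struct {
	// next span in the list
	next *mspan
	// previous span in the list
	prev *mspan
	// the span list this span belongs to
	list *mSpanList
	// address of the first byte of the span
	startAddr uintptr
	// number of pages in the span
	npages uintptr

	// Object n starts at address n*elemsize + (start << pageShift).
	// index at which to start scanning for the next free object
	freeindex uintptr
	// number of objects the span can hold
	nelems uintptr

	// bitmap cache used to find free slots quickly
	allocCache uint64
	// size in bytes of each object in the span
	elemsize uintptr
	// end address of the data in the span
	limit uintptr

	...
}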

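runtime.mspan is the smallest unit the memory manager works with; every object lives inside some mspan.

mspan is a node in a doubly linked list, with next and prev pointers;

npages is the number of heap pages the mspan manages;

freeindex is the index at which to start looking for the next free object;

nelems is the number of objects the mspan can hold, equal to (npages * pageSize) / elemsize;

allocCache is used to find unused slots quickly;

elemsize is the number of bytes one object occupies, equal to class_to_size[sizeclass]. Note that the size class is obtained from the span class through the sizeclass method, which returns spanclass >> 1;

limit is the end address of the span, equal to startAddr + npages*pageSize;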
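The two formulas above can be checked with a little arithmetic. This is a minimal sketch; spanGeometry is a hypothetical helper, and the input values are taken from the tinySpanClass span dump shown in the tiny-object section of this article.

```go
package main

import "fmt"

const pageSize = 8192 // 8 KB per page

// spanGeometry is a hypothetical helper that applies the two formulas:
// nelems = npages*pageSize/elemsize and limit = startAddr + npages*pageSize.
func spanGeometry(startAddr, npages, elemsize uint64) (nelems, limit uint64) {
	nelems = npages * pageSize / elemsize
	limit = startAddr + npages*pageSize
	return nelems, limit
}

func main() {
	// One page of 16-byte objects starting at 824635752448.
	nelems, limit := spanGeometry(824635752448, 1, 16)
	fmt.Println(nelems, limit) // 512 824635760640
}
```

Both results agree with the nelems and limit fields printed in that dump.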

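An example diagram:

[Figure: mcache]

In the diagram, alloc is an array of 134 mspan pointers (numSpanClasses). Each mspan manages a number of pages, each page is 8 KB, and the number of pages is determined by the span class.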

runtime.mcache

type mcache struct {
	...
	// start address of the current tiny-object block
	tiny uintptr
	// offset from tiny at which the next tiny object will be placed
	tinyoffset uintptr
	// number of tiny allocations performed
	local_tinyallocs uintptr // number of tiny allocs not counted in other stats
	// one mspan per span class; numSpanClasses = 134
	alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass
	...
}

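runtime.mcache is bound to a P in the GMP scheduling model. Tiny and small objects are allocated from runtime.mcache first. Every processor (P) is given its own runtime.mcache, so allocating from it requires no lock.

runtime.mcache contains an alloc array of runtime.mspan pointers; runtime.mspan is the basic unit of memory management in Go. Objects in the range [16 B, 32 KB] are allocated from these spans, so an allocation of any size in that range looks in the alloc array, as analyzed later.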

runtime.mcentral

type mcentral struct {
	// span class ID
	spanclass spanClass
	// spans that still have free objects
	partial [2]spanSet // list of spans with a free object
	// spans with no free objects
	full [2]spanSet // list of spans with no free objects

	// cumulative count of objects allocated from this mcentral
	nmalloc uint64
}

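When a runtime.mcache runs out of space, it requests an mspan of the required class from runtime.mcentral. The mspan is taken from the partial and full sets, and this is done without holding a lock.

Each runtime.mcentral has a spanclass field that identifies its type. As we will see below, objects in the [16 B, 32 KB] range are sorted into the size classes listed in class_to_size, which has 67 entries (class 0 has size 0 and is the class used for large-object spans):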

var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}

Each runtime.mcentral therefore serves exactly one span class.
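A span class combines a size class with a noscan flag, which is why there are twice as many span classes as size classes. The sketch below is a simplified re-implementation written for illustration, not the runtime source; it assumes the size class sits in the upper bits and noscan in the lowest bit (consistent with sizeclass being spanclass >> 1), and that the tiny size class is 2, which matches the tinySpanClass (5) value printed later in this article.

```go
package main

import "fmt"

type spanClass uint8

const numSizeClasses = 67 // number of entries in class_to_size

// makeSpanClass packs a size class and the noscan flag into one value.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

// sizeclass recovers the size class by dropping the noscan bit.
func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }

// noscan reports whether spans of this class hold pointer-free objects.
func (sc spanClass) noscan() bool { return sc&1 != 0 }

func main() {
	tiny := makeSpanClass(2, true) // assumed tiny size class
	fmt.Println(tiny, tiny.sizeclass(), tiny.noscan()) // 5 2 true
	fmt.Println(numSizeClasses << 1)                   // 134
}
```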

Its spans are held in two pairs of spanSets: partial holds spans that still have free objects, and full holds spans that are completely in use.

type headTailIndex uint64

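type spanSet struct {
	// lock protecting growth of the spine
	spineLock mutex
	// pointer to the array of block pointers
	spine unsafe.Pointer // *[N]*spanSetBlock, accessed atomically
	// length of the spine array
	spineLen uintptr // Spine array length, accessed atomically
	// capacity of the spine array
	spineCap uintptr // Spine array cap, accessed under lock

	// packed head and tail indexes: head in the upper 32 bits, tail in the lower 32 bits
	index headTailIndex
}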

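spanSet keeps a head and a tail index packed into index. pop takes an entry from the head and push adds one at the tail. spine points to the array of data blocks, and the head and tail positions determine which block and which slot an entry lives in. A data block is represented by spanSetBlock: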

const spanSetBlockEntries = 512
type spanSetBlock struct {
	...
	spans [spanSetBlockEntries]*mspan
}

spanSetBlock is a data block that holds mspans; it contains an array of 512 mspan pointers. The overall layout of mcentral is therefore:

[Figure: mcentral]
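The index packing and the block lookup described above can be sketched as follows. This is an illustrative model only: makeHeadTailIndex and locate are hypothetical helper names, and the sketch leaves out the atomic operations the runtime uses.

```go
package main

import "fmt"

const spanSetBlockEntries = 512

type headTailIndex uint64

// makeHeadTailIndex packs head into the upper 32 bits and tail into the lower 32 bits.
func makeHeadTailIndex(head, tail uint32) headTailIndex {
	return headTailIndex(uint64(head)<<32 | uint64(tail))
}

func (h headTailIndex) head() uint32 { return uint32(h >> 32) }
func (h headTailIndex) tail() uint32 { return uint32(h) }

// locate maps a logical position to (block index in the spine, slot inside that block).
func locate(i uint32) (block, slot uint32) {
	return i / spanSetBlockEntries, i % spanSetBlockEntries
}

func main() {
	idx := makeHeadTailIndex(3, 1030)
	fmt.Println(idx.head(), idx.tail()) // 3 1030
	fmt.Println(locate(idx.tail()))     // 2 6
}
```

Packing both counters into one 64-bit word lets a single atomic operation read or update head and tail together.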

runtime.mheap

type mheap struct { 
	lock      mutex
	pages     pageAlloc // page allocation data structure 
	
	// two-level array of heapArena pointers
	arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

	// one mcentral for each span class
	central [numSpanClasses]struct {
		mcentral mcentral
		pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
	}
	...
}

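For runtime.mheap, the fields to focus on are central and arenas. central holds one mcentral per span class and is filled in during initialization by looping over the span classes; arenas is a two-level array used to manage the address space. It is made up of runtime.heapArena entries, each of which manages 64 MB of memory: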

const (
	pageSize             = 8192                       // 8KB
	heapArenaBytes       = 67108864                   // 64MB 
	pagesPerArena        = heapArenaBytes / pageSize  // 8192
)

type heapArena struct {
	bitmap [heapArenaBitmapBytes]byte
	spans [pagesPerArena]*mspan
	pageInUse [pagesPerArena / 8]uint8
	pageMarks [pagesPerArena / 8]uint8
	zeroedBase uintptr
}

Note that heapArenaBytes is 64 MB only on 64-bit platforms other than Windows; on Windows it is 4 MB. The official comment gives the details:

	//       Platform  Addr bits  Arena size  L1 entries   L2 entries
	// --------------  ---------  ----------  ----------  -----------
	//       */64-bit         48        64MB           1    4M (32MB)
	// windows/64-bit         48         4MB          64    1M  (8MB)
	//       */32-bit         32         4MB           1  1024  (4KB)
	//     */mips(le)         31         4MB           1   512  (2KB)

L1 entries and L2 entries are the sizes of the first and second dimensions of arenas in runtime.mheap.

[Figure: mheap]
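The entries in that table follow from dividing the address space by the arena size. The sketch below reproduces the arithmetic; arenaLayout is a hypothetical helper, not a runtime function.

```go
package main

import "fmt"

const pageSize = 8192 // 8 KB

// arenaLayout returns how many arenas are needed to cover an address space of
// addrBits bits, and how many pages each arena contains.
func arenaLayout(addrBits uint, arenaBytes uint64) (arenas, pagesPerArena uint64) {
	arenas = (uint64(1) << addrBits) / arenaBytes
	pagesPerArena = arenaBytes / pageSize
	return arenas, pagesPerArena
}

func main() {
	fmt.Println(arenaLayout(48, 64<<20)) // */64-bit:       4194304 8192
	fmt.Println(arenaLayout(48, 4<<20))  // windows/64-bit: 67108864 512
	fmt.Println(arenaLayout(32, 4<<20))  // */32-bit:       1024 512
}
```

4194304 is the 4M L2 entries of the first row, 67108864 is 64 L1 entries times 1M L2 entries for Windows, and 1024 matches the 32-bit row.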

Allocating Memory for Objects

Disassembling a program shows that objects created on the heap with new are allocated through runtime.newobject, which in turn calls runtime.mallocgc:

// newobject creates a new object
func newobject(typ *_type) unsafe.Pointer {
	// typ.size is the size of the object
	return mallocgc(typ.size, typ, true)
}

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	dataSize := size
	// get the mcache, which serves tiny and small allocations
	c := gomcache()
	var x unsafe.Pointer
	// noscan is true when the object contains no pointers
	noscan := typ == nil || typ.ptrdata == 0
	// maxSmallSize = 32768 (32 KB)
	if size <= maxSmallSize {
		// maxTinySize = 16 bytes
		if noscan && size < maxTinySize {
			...
		} else {
			...
		}
	} else {
		// allocations larger than 32 KB are served by mheap
		...
	}
	... 
	return x
}

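The mallocgc code shows that allocation is split into three tiers by object size:

tiny objects, smaller than 16 bytes and containing no pointers;
small objects, between 16 bytes and 32 KB;
large objects, larger than 32 KB;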
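The three tiers can be written as a small decision function. This is a sketch that mirrors the two conditions in mallocgc; classify is a hypothetical name, and objects under 16 bytes that contain pointers fall into the small tier because the tiny path requires noscan.

```go
package main

import "fmt"

const (
	maxTinySize  = 16
	maxSmallSize = 32768
)

// classify returns the allocation tier mallocgc would pick for an object.
func classify(size uintptr, hasPointers bool) string {
	switch {
	case size < maxTinySize && !hasPointers:
		return "tiny"
	case size <= maxSmallSize:
		return "small"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(classify(8, false))     // tiny
	fmt.Println(classify(8, true))      // small
	fmt.Println(classify(1024, false))  // small
	fmt.Println(classify(1<<26, false)) // large
}
```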

Large Object Allocation

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer { 
	...  
	var s *mspan
	shouldhelpgc = true
	systemstack(func() {
		s = largeAlloc(size, needzero, noscan)
	})
	s.freeindex = 1
	s.allocCount = 1
	x = unsafe.Pointer(s.base())
	size = s.elemsize
	... 
	return x
}

As shown above, an allocation larger than 32 KB calls largeAlloc directly to obtain an mspan.

func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
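	// _PageSize = 8 KB; this check catches overflow when size is too large
	if size+_PageSize < size {
		throw("out of memory")
	}
	// _PageShift == 13; compute the number of pages needed
	npages := size >> _PageShift
	// round up if size is not a whole number of pages
	if size&_PageMask != 0 {
		npages++
	}
	...
	// allocate from the heap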
	s := mheap_.alloc(npages, makeSpanClass(0, noscan), needzero)
	if s == nil {
		throw("out of memory")
	}
	...
	return s
}

Memory is allocated in pages of _PageSize (8 KB) each. largeAlloc works out how many pages the requested size needs and then calls alloc to allocate them from the heap.
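The page rounding in largeAlloc is a shift followed by a remainder check. The sketch below isolates that calculation; pagesFor is a hypothetical helper, and the constants copy the values quoted in the comments above.

```go
package main

import "fmt"

const (
	pageShift = 13
	pageSize  = 1 << pageShift // 8192
	pageMask  = pageSize - 1
)

// pagesFor returns the number of 8 KB pages needed to hold size bytes.
func pagesFor(size uintptr) uintptr {
	npages := size >> pageShift
	if size&pageMask != 0 {
		npages++ // a partial page still needs a whole page
	}
	return npages
}

func main() {
	fmt.Println(pagesFor(32769)) // 5: four full pages plus one byte
	fmt.Println(pagesFor(65536)) // 8
	fmt.Println(pagesFor(1 << 26)) // 8192
}
```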

func (h *mheap) alloc(npages uintptr, spanclass spanClass, needzero bool) *mspan {
	var s *mspan
	systemstack(func() { 
		if h.sweepdone == 0 {
			// reclaim some memory first
			h.reclaim(npages)
		}
		// perform the allocation
		s = h.allocSpan(npages, false, spanclass, &memstats.heap_inuse)
	}) 
	...
	return s
}

Next, the implementation of allocSpan:

const pageCachePages = 8 * unsafe.Sizeof(pageCache{}.cache)

func (h *mheap) allocSpan(npages uintptr, manual bool, spanclass spanClass, sysStat *uint64) (s *mspan) {
	// Function-global state.
	gp := getg()
	base, scav := uintptr(0), uintptr(0)
 
	pp := gp.m.p.ptr()
	// for small requests, try to allocate from the P's page cache (pcache)
	if pp != nil && npages < pageCachePages/4 {
		c := &pp.pcache
 
		if c.empty() {
			lock(&h.lock)
			*c = h.pages.allocToCache()
			unlock(&h.lock)
		} 

		base, scav = c.alloc(npages)
		if base != 0 {
			s = h.tryAllocMSpan()

			if s != nil && gcBlackenEnabled == 0 && (manual || spanclass.sizeclass() != 0) {
				goto HaveSpan
			} 
		}
	} 
	lock(&h.lock)
	// the request is large or the P's page cache is exhausted; allocate from mheap's pages
	if base == 0 { 
		base, scav = h.pages.alloc(npages)
		// still not enough memory, so grow the heap
		if base == 0 {
			if !h.grow(npages) {
				unlock(&h.lock)
				return nil
			}
			// retry the allocation
			base, scav = h.pages.alloc(npages)
			// still no memory: throw a fatal error
			if base == 0 {
				throw("grew heap, but no adequate free space found")
			}
		}
	}
	if s == nil { 
		// allocate an mspan object
		s = h.allocMSpanLocked()
	}
 
	unlock(&h.lock)

HaveSpan: 
	// initialize the span
	s.init(base, npages)
	...
	// record in mheap which span owns these pages
	h.setSpans(s.base(), npages, s)
	...
	return s
}

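allocSpan makes a further decision based on how much memory is requested:

if fewer than pageCachePages/4 = 64/4 = 16 pages are needed, it first tries the P's page cache (pcache);
if the request is larger or the page cache cannot satisfy it, it allocates from the page heap through runtime.pageAlloc.alloc;
if the page heap is also short of memory, it calls mheap's grow method to request memory from the operating system and then calls pageAlloc's alloc again;

Next, how grow requests memory from the operating system: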

func (h *mheap) grow(npage uintptr) bool {
	// We must grow the heap in whole palloc chunks.
	ask := alignUp(npage, pallocChunkPages) * pageSize

	totalGrowth := uintptr(0)
	nBase := alignUp(h.curArena.base+ask, physPageSize)
	// not enough room in the current arena, so call sysAlloc to request more
	if nBase > h.curArena.end { 
		av, asize := h.sysAlloc(ask)
		if av == nil {
			print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
			return false
		}
		// update curArena
		if uintptr(av) == h.curArena.end { 
			h.curArena.end = uintptr(av) + asize
		} else { 
			if size := h.curArena.end - h.curArena.base; size != 0 {
				h.pages.grow(h.curArena.base, size)
				totalGrowth += size
			} 
			h.curArena.base = uintptr(av)
			h.curArena.end = uintptr(av) + asize
		} 
		nBase = alignUp(h.curArena.base+ask, physPageSize)
	} 
	...
	return true
}

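grow uses the end of curArena to decide whether it must ask the system for memory: if end is less than nBase, it calls runtime.mheap.sysAlloc to request more memory from the operating system;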
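The first line of grow rounds the request up to whole palloc chunks before converting it to bytes. The sketch below shows that rounding. growAsk is a hypothetical helper, and the value 512 for pallocChunkPages is an assumption about the Go version discussed here; it is not shown in the excerpts above.

```go
package main

import "fmt"

const (
	pageSize         = 8192
	pallocChunkPages = 512 // assumed chunk size in pages (4 MB chunks)
)

// alignUp rounds n up to a multiple of a; a must be a power of two.
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

// growAsk returns how many bytes grow would ask for when npage pages are needed.
func growAsk(npage uintptr) uintptr {
	return alignUp(npage, pallocChunkPages) * pageSize
}

func main() {
	fmt.Println(growAsk(5))   // 4194304: even a 5-page request grows by one 4 MB chunk
	fmt.Println(growAsk(513)) // 8388608: just over one chunk rounds up to two
}
```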

func (h *mheap) sysAlloc(n uintptr) (v unsafe.Pointer, size uintptr) {
	n = alignUp(n, heapArenaBytes)
 
	// try to carve usable space out of the pre-reserved region
	v = h.arena.alloc(n, heapArenaBytes, &memstats.heap_sys)
	if v != nil {
		size = n
		goto mapped
	} 
	// try to extend the heap at the addresses suggested by the heap's arenaHints
	for h.arenaHints != nil {
		hint := h.arenaHints
		p := hint.addr
		if hint.down {
			p -= n
		}
		if p+n < p {
			// We can't use this, so don't ask.
			v = nil
		} else if arenaIndex(p+n-1) >= 1<<arenaBits {
			// Outside addressable heap. Can't use.
			v = nil
		} else {
			// reserve address space from the operating system
			v = sysReserve(unsafe.Pointer(p), n)
		}
		if p == uintptr(v) {
			// Success. Update the hint.
			if !hint.down {
				p += n
			}
			hint.addr = p
			size = n
			break
		} 
		if v != nil {
			sysFree(v, n, nil)
		}
		h.arenaHints = hint.next
		h.arenaHintAlloc.free(unsafe.Pointer(hint))
	}  
	...
	// move the memory from the Reserved state to the Prepared state
	sysMap(v, size, &memstats.heap_sys)

mapped:
	// Create arena metadata.
	// initialize heapArena structures to manage the newly obtained memory
	for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
		l2 := h.arenas[ri.l1()]
		if l2 == nil { 
			l2 = (*[1 << arenaL2Bits]*heapArena)(persistentalloc(unsafe.Sizeof(*l2), sys.PtrSize, nil))
			if l2 == nil {
				throw("out of memory allocating heap arena map")
			}
			atomic.StorepNoWB(unsafe.Pointer(&h.arenas[ri.l1()]), unsafe.Pointer(l2))
		}  
		var r *heapArena
		r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gc_sys))
		...  
		// add the new heapArena to the arenas index
		h.allArenas = h.allArenas[:len(h.allArenas)+1]
		h.allArenas[len(h.allArenas)-1] = ri
		atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
	}
	return
}

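sysAlloc first calls runtime.linearAlloc.alloc to carve usable space out of the pre-reserved region. If that fails, it calls sysReserve to obtain address space from the operating system. Finally it initializes a heapArena to manage the new memory and records it in the arenas index.

That completes the allocation path for large objects.

Small Object Allocation

Objects between 16 bytes and 32 KB are allocated as follows: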

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	dataSize := size
	// get the mcache, which serves tiny and small allocations
	c := gomcache()
	var x unsafe.Pointer
	// noscan is true when the object contains no pointers
	noscan := typ == nil || typ.ptrdata == 0
	// maxSmallSize = 32768 (32 KB)
	if size <= maxSmallSize {
		// maxTinySize = 16 bytes
		if noscan && size < maxTinySize {
			...
		} else {
			var sizeclass uint8
			// compute the size class
			// smallSizeMax = 1024
			if size <= smallSizeMax-8 {
				// smallSizeDiv = 8
				sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
			} else {
				// largeSizeDiv = 128, smallSizeMax = 1024
				sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
			}
			size = uintptr(class_to_size[sizeclass])
			spc := makeSpanClass(sizeclass, noscan)
			span := c.alloc[spc]
			// allocate one object from the matching span
			v := nextFreeFast(span)
			if v == 0 {
				// the span in the mcache is full; refill the mcache from mcentral
				v, span, shouldhelpgc = c.nextFree(spc)
			}
			x = unsafe.Pointer(v)
			if needzero && span.needzero != 0 {
				memclrNoHeapPointers(unsafe.Pointer(v), size)
			}
		} 
		...
	}  
	...
	return x
}

The first step is to compute the size class, using two predefined lookup tables, size_to_class8 and size_to_class128. Sizes up to 1024 - 8 = 1016 bytes (smallSizeMax = 1024) use size_to_class8; larger sizes use size_to_class128.

For example, to allocate 20 bytes, sizeclass = size_to_class8[(20+7)/8] = size_to_class8[3] = 3. class_to_size[3] is 32, so 32 bytes are actually allocated.
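The rounding effect of the size-class lookup can be reproduced from the class_to_size table alone. The sketch below deliberately swaps the runtime's size_to_class8 and size_to_class128 lookup tables for a plain linear search, which is slower but returns the same class for a given size with this table; sizeClassFor and classToSize are hypothetical names.

```go
package main

import "fmt"

// classToSize is copied from the class_to_size table shown earlier.
var classToSize = []uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256,
	288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792,
	2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240,
	10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}

// sizeClassFor returns the smallest size class that fits size, together with
// the rounded-up size. It returns class 0 when size exceeds the largest class.
func sizeClassFor(size uintptr) (class int, rounded uintptr) {
	for c := 1; c < len(classToSize); c++ {
		if size <= uintptr(classToSize[c]) {
			return c, uintptr(classToSize[c])
		}
	}
	return 0, size // larger than 32 KB: handled by the large-object path
}

func main() {
	fmt.Println(sizeClassFor(20))   // 3 32
	fmt.Println(sizeClassFor(1025)) // 32 1152
}
```

A 20-byte request wastes 12 bytes in a 32-byte slot; that internal fragmentation is the price of keeping every object in a span the same size.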

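Next, a span pointer is taken from the alloc array and nextFreeFast tries to allocate from the mcache. If the span in the mcache is full, nextFree is called to bring a span from mcentral into the mcache.

Now for nextFreeFast: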

func nextFreeFast(s *mspan) gclinkptr {
	// count the trailing zero bits of allocCache to find the first free slot
	theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
	if theBit < 64 {
		result := s.freeindex + uintptr(theBit)
		if result < s.nelems {
			freeidx := result + 1
			if freeidx%64 == 0 && freeidx != s.nelems {
				return 0
			}
			s.allocCache >>= uint(theBit + 1)
			s.freeindex = freeidx
			s.allocCount++
			return gclinkptr(result*s.elemsize + s.base())
		}
	}
	return 0
}

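allocCache is initialized to ^uint64(0). In its binary form a 0 bit means the slot is taken and a 1 bit means it is free, so allocCache makes it quick to locate the next slot to allocate:

[Figure: allocCache]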
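The bitmap search can be modelled in a few lines with math/bits. This is a simplified sketch rather than the runtime code: miniSpan is a hypothetical stand-in for mspan that covers at most 64 slots, so it omits the refill step that nextFreeFast leaves to the slow path when a 64-slot window is used up.

```go
package main

import (
	"fmt"
	"math/bits"
)

// miniSpan is a hypothetical, simplified model of the fields nextFreeFast uses.
type miniSpan struct {
	allocCache uint64 // 1 bit = free slot; bit 0 corresponds to slot freeindex
	freeindex  uint
	nelems     uint
}

func newMiniSpan(nelems uint) *miniSpan {
	return &miniSpan{allocCache: ^uint64(0), nelems: nelems}
}

// nextFree returns the index of the next free slot and marks it as consumed.
func (s *miniSpan) nextFree() (uint, bool) {
	theBit := uint(bits.TrailingZeros64(s.allocCache))
	if theBit == 64 {
		return 0, false // no free bit left in the cache
	}
	result := s.freeindex + theBit
	if result >= s.nelems {
		return 0, false // the free bit lies past the end of the span
	}
	s.allocCache >>= theBit + 1 // drop the consumed slot and everything before it
	s.freeindex = result + 1
	return result, true
}

func main() {
	s := newMiniSpan(8)
	s.allocCache &^= 0x5 // pretend slots 0 and 2 are already in use
	for {
		i, ok := s.nextFree()
		if !ok {
			break
		}
		fmt.Print(i, " ") // 1 3 4 5 6 7
	}
	fmt.Println()
}
```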

func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
	s = c.alloc[spc]
	shouldhelpgc = false
	// find the next free index in the current span
	freeIndex := s.nextFreeIndex()
	// the current span is full
	if freeIndex == s.nelems { 
		if uintptr(s.allocCount) != s.nelems {
			println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
			throw("s.allocCount != s.nelems && freeIndex == s.nelems")
		}
		// get a usable span from mcentral and install it in the mcache in place of the current one
		c.refill(spc)
		shouldhelpgc = true
		s = c.alloc[spc]
		// look for a free index again in the new span
		freeIndex = s.nextFreeIndex()
	}

	if freeIndex >= s.nelems {
		throw("freeIndex is not valid")
	}
	// compute the object's address and update the span's bookkeeping
	v = gclinkptr(freeIndex*s.elemsize + s.base())
	s.allocCount++
	if uintptr(s.allocCount) > s.nelems {
		println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
		throw("s.allocCount > s.nelems")
	}
	return
}

nextFree checks whether the current span is full. If it is, it calls refill to get a usable span from mcentral and replaces the span currently held in the mcache.

func (c *mcache) refill(spc spanClass) { 
	s := c.alloc[spc]
	...
	s = mheap_.central[spc].mcentral.cacheSpan()
	if s == nil {
		throw("out of memory")
	} 
	...
	c.alloc[spc] = s
}

refill obtains a span for the given span class and installs it in the mcache as the new span for that class.

func (c *mcentral) cacheSpan() *mspan {
	...
	sg := mheap_.sweepgen
	spanBudget := 100

	var s *mspan
 
	// look for a usable span in the set of swept spans that have free space
	if s = c.partialSwept(sg).pop(); s != nil {
		goto havespan
	} 
	for ; spanBudget >= 0; spanBudget-- {
		// look for a usable span in the set of unswept spans that have free objects
		s = c.partialUnswept(sg).pop()
		if s == nil {
			break
		}
		if atomic.Load(&s.sweepgen) == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
			// found a span that needs sweeping; sweep it now
			s.sweep(true)
			goto havespan
		}
	}
	for ; spanBudget >= 0; spanBudget-- {
		// look for a usable span in the set of unswept spans with no free space
		s = c.fullUnswept(sg).pop()
		if s == nil {
			break
		}
		if atomic.Load(&s.sweepgen) == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
			s.sweep(true)
			freeIndex := s.nextFreeIndex()
			if freeIndex != s.nelems {
				s.freeindex = freeIndex
				goto havespan
			}
			c.fullSwept(sg).push(s)
		}
	}
	// request a new span from the heap
	s = c.grow()
	if s == nil {
		return nil
	} 
havespan:
	n := int(s.nelems) - int(s.allocCount)
	if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
		throw("span has no free objects")
	} 
	// update nmalloc
	atomic.Xadd64(&c.nmalloc, int64(n))
	usedBytes := uintptr(s.allocCount) * s.elemsize
	atomic.Xadd64(&memstats.heap_live, int64(spanBytes)-int64(usedBytes))
	if trace.enabled {
		// heap_live changed.
		traceHeapAlloc()
	}
	if gcBlackenEnabled != 0 {
		// heap_live changed.
		gcController.revise()
	}
	freeByteBase := s.freeindex &^ (64 - 1)
	whichByte := freeByteBase / 8 
	// refill allocCache
	s.refillAllocCache(whichByte)
	// shift allocCache so that s.freeindex corresponds to its low bit
	s.allocCache >>= s.freeindex % 64 
	return s
}

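cacheSpan looks for a usable span in mcentral's spanSets; if none is found, it calls grow to request a new span from the heap.

Once a span has been obtained, fields such as nmalloc and allocCache are updated.

runtime.mcentral.grow requests new memory from the heap: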

func (c *mcentral) grow() *mspan {
	// number of pages to allocate
	npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
	size := uintptr(class_to_size[c.spanclass.sizeclass()])
	// obtain a new span
	s := mheap_.alloc(npages, c.spanclass, true)
	if s == nil {
		return nil
	}

	// Use division by multiplication and shifts to quickly compute:
	// n := (npages << _PageShift) / size
	n := (npages << _PageShift) >> s.divShift * uintptr(s.divMul) >> s.divShift2
	// set limit
	s.limit = s.base() + size*n
	heapBitsForAddr(s.base()).initSpan(s)
	return s
}

grow calls runtime.mheap.alloc to obtain the span. That method was covered in the large-object section above.

That completes the small-object allocation path.

Tiny Object Allocation

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	dataSize := size
	// get the mcache, which serves tiny and small allocations
	c := gomcache()
	var x unsafe.Pointer
	// noscan is true when the object contains no pointers
	noscan := typ == nil || typ.ptrdata == 0
	// maxSmallSize = 32768 (32 KB)
	if size <= maxSmallSize {
		// maxTinySize = 16 bytes
		if noscan && size < maxTinySize {
			off := c.tinyoffset
			// align the offset according to the object's size
			if size&7 == 0 {
				off = alignUp(off, 8)
			} else if size&3 == 0 {
				off = alignUp(off, 4)
			} else if size&1 == 0 {
				off = alignUp(off, 2)
			}
			// check whether the object still fits in the current 16-byte block
			if off+size <= maxTinySize && c.tiny != 0 {
				// the object starts at the free position inside the tiny block
				x = unsafe.Pointer(c.tiny + off)
				// advance the offset
				c.tinyoffset = off + size
				// update the statistics
				c.local_tinyallocs++
				mp.mallocing = 0
				releasem(mp)
				return x
			}
			// allocate a new 16-byte block
			span := c.alloc[tinySpanClass]
			v := nextFreeFast(span)
			if v == 0 {
				v, _, shouldhelpgc = c.nextFree(tinySpanClass)
			}
			x = unsafe.Pointer(v)
			// zero the whole block
			(*[2]uint64)(x)[0] = 0
			(*[2]uint64)(x)[1] = 0
			// if the new block has more free space left than the old one, keep it as the tiny block; tinyoffset records how much of it is used
			if size < c.tinyoffset || c.tiny == 0 {
				c.tiny = uintptr(x)
				c.tinyoffset = size
			}
			size = maxTinySize
		}  
		...
	}  
	...
	return x
}

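mallocgc checks the object: if it is smaller than 16 bytes and contains no pointers, it is treated as a tiny object.

To allocate a tiny object, mallocgc first checks whether the block tiny points to has enough room. If the size fits in the space remaining in that block, the object is carved out of it and returned immediately;

[Figure: mcache2]

The mcache diagram from earlier helps explain this. First the matching span is found in the mcache's alloc array. The span for tinySpanClass has the following properties: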

startAddr: 824635752448,
npages: 1,
manualFreeList: 0,
freeindex: 128,
nelems: 512,
elemsize: 16,
limit: 824635760640,
allocCount: 128,
spanclass: tinySpanClass (5),
...

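The mspan for tinySpanClass has a single page that can hold 512 elements (nelems). Each object in the page is 16 bytes (elemsize), and 128 objects have been allocated so far (allocCount). The diagram cannot show that many, so it is only schematic.

The diagram also shows one object in the page with 12 bytes used and 4 bytes still free, so tinyoffset and tiny are updated accordingly.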

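Summary

This article first showed how to debug Go at the assembly level and then explained Go memory allocation in three tiers. For objects up to 32 KB, Go can usually take memory straight from the mcache without a lock; if the mcache is out of space it goes to mcentral, and only then to mheap. Large objects (> 32 KB) are requested directly from mheap, with one optimization: when fewer than 16 pages are needed they are served from the P's pageCache, and otherwise from the page heap.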

Reference

https://chai2010.cn/advanced-go-programming-book/ch3-asm/ch3-09-debug.html

https://deepu.tech/memory-management-in-golang/

https://medium.com/@ankur_anand/a-visual-guide-to-golang-memory-allocator-from-ground-up-e132258453ed

http://goog-perftools.sourceforge.net/doc/tcmalloc.html

https://draveness.me/golang/docs/part3-runtime/ch07-memory/golang-memory-allocator
