Understanding Go in Depth (04): The Scheduler, GPM Source Code Analysis


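The previous article briefly introduced Go's scheduling model, the GPM model, and the role each of its parts plays. This article looks at how G, P, and M are defined in the runtime source.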

Go version: go1.13.9

The M struct

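The M struct is an abstraction of an OS thread. Its main job is to run Gs while attached to a P.
It has close to 60 fields, so we only look at the meaning of the main ones.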
/src/runtime/runtime2.go

type m struct {
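	// A g managed by the runtime and used while executing scheduler code.
	// Scheduler code runs on g0's stack (the stack that belongs to the thread), not on a
	// user goroutine's stack, so the M switches to g0 before picking the next goroutine to run.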
	g0      *g     // goroutine with scheduling stack
	morebuf gobuf  // gobuf arg to morestack
	divmod  uint32 // div/mod denominator for arm - known to liblink

	// Fields not known to debuggers.
	procid        uint64       // for debuggers, but offset not hard-coded
	// the g that handles signals
	gsignal       *g           // signal-handling g
	goSigStack    gsignalStack // Go-allocated signal handling stack
	sigmask       sigset       // storage for saved signal mask
	// thread-local storage (TLS); this is the key to how an OS thread is tied to its M
	tls           [6]uintptr   // thread-local storage (for x86 extern register)
	// function the M runs when it starts (sysmon, for example); this is not the function passed to a go statement
    mstartfn      func()
	// the g of the user goroutine currently running on this M
	curg          *g       // current running goroutine
	caughtsig     guintptr // goroutine running during fatal signal
    
	// the P attached to this worker thread; nil if there is none
	p             puintptr // attached p for executing go code (nil if not executing go code)
	// a P tentatively associated with this M, held until it is attached
    nextp         puintptr
	// the P this M was attached to before it entered a system call
	oldp          puintptr // the p that was attached before executing a syscall
	id            int64
	mallocing     int32
	throwing      int32
	// non-empty when preemption is disabled for this M
	preemptoff    string // if != "", keep curg running on this m
	locks         int32
	dying         int32
	profilehz     int32
	// spinning state: true means the M is out of work and actively looking for Gs to steal from other Ps; false only means it is not spinning
	spinning      bool // m is out of work and is actively looking for work
	blocked       bool // m is blocked on a note
	newSigstack   bool // minit on C thread called sigaltstack
	printlock     int8
	incgo         bool   // m is executing a cgo call
	freeWait      uint32 // if == 0, safe to free g0 and delete m (atomic)
	fastrand      [2]uint32
	needextram    bool
	traceback     uint8
	ncgocall      uint64      // number of cgo calls in total
	ncgo          int32       // number of cgo calls currently in progress
	cgoCallersUse uint32      // if non-zero, cgoCallers in use temporarily
	cgoCallers    *cgoCallers // cgo traceback if crashing in cgo call
	// when there is no goroutine to run, the worker thread sleeps on park;
	// other threads wake it up through this note
	park          note // note the M sleeps on
	// links every worker thread into the allm list
	alllink       *m // on allm
	schedlink     muintptr
	// local memory-allocation cache for this thread
	mcache        *mcache
	// the G locked to this M, if any
	lockedg       guintptr
	createstack   [32]uintptr // stack that created this thread.
	lockedExt     uint32      // tracking for external LockOSThread
	lockedInt     uint32      // tracking for internal lockOSThread
	nextwaitm     muintptr    // next m waiting for lock
	waitunlockf   func(*g, unsafe.Pointer) bool
	waitlock      unsafe.Pointer
	waittraceev   byte
	waittraceskip int
	startingtrace bool
	syscalltick   uint32
	// operating-system thread handle
	thread        uintptr // thread handle
	freelink      *m      // on sched.freem

	// these are here because they are too large to be on the stack
	// of low-level NOSPLIT functions.
	libcall   libcall
	libcallpc uintptr // for cpu profiler
	libcallsp uintptr
	libcallg  guintptr
	syscall   libcall // stores syscall parameters on windows

	vdsoSP uintptr // SP for traceback while in VDSO call (0 if not in call)
	vdsoPC uintptr // PC for traceback while in VDSO call

	dlogPerM

	mOS
}

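A few of the more important fields:
g0: the g whose stack is used to run scheduler code
gsignal: the g used for signal handling
tls: thread-local storage
p: the P attached to this M, which provides the local resources needed to run goroutines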


The P struct

An M must be attached to a P before it can run goroutines. When an M blocks, its P is handed over to another M.

/src/runtime/runtime2.go

type p struct {
	// index into allp
	id          int32
	// status of the P
	status      uint32 // one of pidle/prunning/...
	link        puintptr
	schedtick   uint32     // incremented on every scheduler call
	syscalltick uint32     // incremented on every system call
	sysmontick  sysmontick // last tick observed by sysmon
	// points to the attached M; nil when the P is idle
	m           muintptr   // back-link to associated m (nil if idle)
	mcache      *mcache
	raceprocctx uintptr

	// pools of available defer structs of different sizes
	deferpool    [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
	deferpoolbuf [5][32]*_defer

	// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
	goidcache    uint64
	goidcacheend uint64

	// local run queue; it can be accessed without a lock
	// Queue of runnable goroutines. Accessed without lock.
	runqhead uint32 // head of the queue
	runqtail uint32 // tail of the queue
	// circular queue backed by a fixed-size array
	runq     [256]guintptr
    
	// runnext, if non-nil, is a runnable G that was ready'd by
	// the current G and should be run next instead of what's in
	// runq if there's time remaining in the running G's time
	// slice. It will inherit the time left in the current time
	// slice. If a set of goroutines is locked in a
	// communicate-and-wait pattern, this schedules that set as a
	// unit and eliminates the (potentially large) scheduling
	// latency that otherwise arises from adding the ready'd
	// goroutines to the end of the run queue.
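	// In short: when runnext is non-nil it holds a runnable G that the
	// current G has just made ready, and it has priority over the Gs in runq.
	// If the current G still has time left in its time slice, this G runs next
	// and inherits the remaining time.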
	runnext guintptr

	// Available G's (status == Gdead)
	// free Gs kept for reuse
	gFree struct {
		gList
		n int32
	}

	sudogcache []*sudog
	sudogbuf   [128]*sudog

	tracebuf traceBufPtr

	// traceSweep indicates the sweep events should be traced.
	// This is used to defer the sweep start event until a span
	// has actually been swept.
	traceSweep bool
	// traceSwept and traceReclaimed track the number of bytes
	// swept and reclaimed by sweeping in the current sweep loop.
	traceSwept, traceReclaimed uintptr

	palloc persistentAlloc // per-P to avoid mutex

	_ uint32 // Alignment for atomic fields below

	// Per-P GC state
	gcAssistTime         int64    // Nanoseconds in assistAlloc
	gcFractionalMarkTime int64    // Nanoseconds in fractional mark worker (atomic)
	gcBgMarkWorker       guintptr // (atomic)
	gcMarkWorkerMode     gcMarkWorkerMode

	// gcMarkWorkerStartTime is the nanotime() at which this mark
	// worker started.
	gcMarkWorkerStartTime int64

	// gcw is this P's GC work buffer cache. The work buffer is
	// filled by write barriers, drained by mutator assists, and
	// disposed on certain GC state transitions.
	gcw gcWork

	// wbBuf is this P's GC write barrier buffer.
	//
	// TODO: Consider caching this in the running G.
	wbBuf wbBuf

	runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point

	pad cpu.CacheLinePad
}

The remaining fields mostly hold GC, trace, and debug state.
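
The runqhead, runqtail, and runq fields together form a fixed-size circular queue. The idea can be sketched as below. This is a simplified, single-threaded illustration only: localQueue, put, and get are hypothetical names, plain goroutine IDs stand in for guintptr values, and the atomic operations the real runtime needs (so that other Ps can steal work) are left out.

```go
package main

import "fmt"

// runqSize mirrors the length of p.runq in the struct above.
const runqSize = 256

// localQueue is a hypothetical stand-in for the runqhead/runqtail/runq
// fields of p. It is not safe for concurrent use.
type localQueue struct {
	head uint32
	tail uint32
	buf  [runqSize]int // goroutine IDs instead of guintptr
}

// put appends id at the tail and reports false when the queue is full.
func (q *localQueue) put(id int) bool {
	if q.tail-q.head >= runqSize {
		return false
	}
	q.buf[q.tail%runqSize] = id
	q.tail++
	return true
}

// get removes the element at the head; ok is false when the queue is empty.
func (q *localQueue) get() (id int, ok bool) {
	if q.head == q.tail {
		return 0, false
	}
	id = q.buf[q.head%runqSize]
	q.head++
	return id, true
}

func main() {
	var q localQueue
	for i := 1; i <= 3; i++ {
		q.put(i)
	}
	for {
		id, ok := q.get()
		if !ok {
			break
		}
		fmt.Println("run g", id)
	}
}
```

Because head and tail only ever increase and are reduced modulo the array length on access, the difference tail-head is always the number of queued entries, even after the counters wrap around.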


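The G struct

A G is a goroutine. The struct holds all of a goroutine's state, including its stack bounds and a gobuf, which stores the register values needed so that the goroutine knows where to resume the next time it is scheduled.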

/src/runtime/runtime2.go

type stack struct {
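	lo uintptr // lower bound of the stack (low address); the stack grows down toward it
	hi uintptr // upper bound of the stack (high address), where the stack begins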
}

type g struct {
	// Stack parameters.
	// stack describes the actual stack memory: [stack.lo, stack.hi).
	// stackguard0 is the stack pointer compared in the Go stack growth prologue.
	// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
	// stackguard1 is the stack pointer compared in the C stack growth prologue.
	// It is stack.lo+StackGuard on g0 and gsignal stacks.
	// It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
	// the stack used by this goroutine
    stack       stack   // offset known to runtime/cgo
    
	// the two fields below are used by the stack-overflow check that implements automatic stack growth; stackguard0 is also used for preemption
    stackguard0 uintptr // offset known to liblink
	stackguard1 uintptr // offset known to liblink

	_panic         *_panic // innermost panic - offset known to liblink
	_defer         *_defer // innermost defer
    
	// the worker thread (M) currently executing this goroutine
	m              *m      // current m; offset known to arm liblink
	// scheduling context: saved when the G is switched out; see the gobuf struct below for what is saved
	sched          gobuf
	syscallsp      uintptr        // if status==Gsyscall, syscallsp = sched.sp to use during gc
	syscallpc      uintptr        // if status==Gsyscall, syscallpc = sched.pc to use during gc
	stktopsp       uintptr        // expected sp at top of stack, to check in traceback
	param          unsafe.Pointer // passed parameter on wakeup
	// status: Gidle, Grunnable, Grunning, Gsyscall, Gwaiting, Gdead
    atomicstatus   uint32
	stackLock      uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
	goid           int64
    
	// schedlink points to the next g in the global run queue;
	// all gs in the global run queue form a linked list
	schedlink      guintptr
	waitsince      int64      // approx time when the g become blocked
	waitreason     waitReason // if status==Gwaiting, the reason the g is blocked
	// preemption signal, duplicating stackguard0 = stackpreempt; set to true when the g should be preempted
	preempt        bool       // preemption signal, duplicates stackguard0 = stackpreempt
	paniconfault   bool       // panic (instead of crash) on unexpected fault address
	preemptscan    bool       // preempted g does scan for gc
	gcscandone     bool       // g has scanned stack; protected by _Gscan bit in status
	gcscanvalid    bool       // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
	throwsplit     bool       // must not split stack
	raceignore     int8       // ignore race detection events
	sysblocktraced bool       // StartTrace has emitted EvGoInSyscall about this goroutine
	sysexitticks   int64      // cputicks when syscall has returned (for tracing)
	traceseq       uint64     // trace event sequencer
	tracelastp     puintptr   // last P emitted an event for this goroutine
	// if LockOSThread has been called, this g is locked to a particular m
    lockedm        muintptr
	sig            uint32
	writebuf       []byte
	sigcode0       uintptr
	sigcode1       uintptr
	sigpc          uintptr
	// pc of the go statement that created this goroutine
	gopc           uintptr         // pc of go statement that created this goroutine
	ancestors      *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
	startpc        uintptr         // pc of goroutine function
	racectx        uintptr
	waiting        *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
	cgoCtxt        []uintptr      // cgo traceback context
	labels         unsafe.Pointer // profiler labels
	timer          *timer         // cached timer for time.Sleep
	selectDone     uint32         // are we participating in a select and did someone win the race?

	// Per-G GC state

	// gcAssistBytes is this G's GC assist credit in terms of
	// bytes allocated. If this is positive, then the G has credit
	// to allocate gcAssistBytes bytes without assisting. If this
	// is negative, then the G must correct this by performing
	// scan work. We track this in bytes to make it fast to update
	// and check for debt in the malloc hot path. The assist ratio
	// determines how this corresponds to scan work debt.
	gcAssistBytes int64
}
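
The lockedm field of g (and lockedg on the M side) is what ties a goroutine to one thread after runtime.LockOSThread is called. From user code this can be sketched as follows; runLocked is a hypothetical helper name, and the example only shows the public API, not how the runtime maintains these fields internally.

```go
package main

import (
	"fmt"
	"runtime"
)

// runLocked runs f on a new goroutine that stays locked to a single
// OS thread for the duration of f, then waits for it to finish.
func runLocked(f func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runtime.LockOSThread()         // lock this goroutine to its current thread
		defer runtime.UnlockOSThread() // release the lock when f returns
		f()
	}()
	<-done
}

func main() {
	runLocked(func() {
		fmt.Println("running on a locked OS thread")
	})
}
```

This pattern is commonly used for code that depends on thread-local state, such as some C libraries called through cgo.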

gobuf

The gobuf struct saves a goroutine's scheduling context, mainly the values of a few CPU registers.


/src/runtime/runtime2.go

type gobuf struct {
	// The offsets of sp, pc, and g are known to (hard-coded in) libmach.
	//
	// ctxt is unusual with respect to GC: it may be a
	// heap-allocated funcval, so GC needs to track it, but it
	// needs to be set and cleared from assembly, where it's
	// difficult to have write barriers. However, ctxt is really a
	// saved, live register, and we only ever exchange it between
	// the real register and the gobuf. Hence, we treat it as a
	// root during stack scanning, which means assembly that saves
	// and restores it doesn't need write barriers. It's still
	// typed as a pointer so that any other writes from Go get
	// write barriers.
	sp   uintptr      // saved value of the CPU's rsp register
	pc   uintptr      // saved value of the CPU's rip register
	g    guintptr     // the goroutine this gobuf belongs to
	ctxt unsafe.Pointer
    
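	// holds the return value of a system call: if the P is taken by another worker thread
	// while the call is in progress, the goroutine is put on the global run queue after the call returns
	// and is scheduled by another thread, which needs to know the return value
	ret  sys.Uintreg  // system call return value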
	lr   uintptr
    
	// saved value of the CPU's rbp register (frame pointer)
	bp   uintptr // for GOEXPERIMENT=framepointer
}

The scheduler struct, sched

All goroutines are scheduled by the scheduler, which holds the global resources.

sched

/src/runtime/runtime2.go

type schedt struct {
	// accessed atomically. keep at top to ensure alignment on 32-bit systems.
	// These fields must be accessed atomically.
	// They are kept at the top of the struct so that they stay aligned on 32-bit systems.
	goidgen  uint64
	lastpoll uint64

	lock mutex

	// When increasing nmidle, nmidlelocked, nmsys, or nmfreed, be
	// sure to call checkdead().
	
	// linked list of idle worker threads
	midle        muintptr // idle m's waiting for work
	// number of idle worker threads
	nmidle       int32    // number of idle m's waiting for work
	// number of idle Ms that are locked
	nmidlelocked int32    // number of locked m's waiting for work
	// number of Ms created so far, which is also the next M ID
	mnext        int64    // number of m's that have been created and next M ID
	// maximum number of M threads allowed
	maxmcount    int32    // maximum number of m's allowed (or die)
	nmsys        int32    // number of system m's not counted for deadlock
	// cumulative number of Ms that have been freed
	nmfreed      int64    // cumulative number of freed m's

	// number of system goroutines; updated atomically
	ngsys uint32 // number of system goroutines; updated atomically
	
	// linked list of idle p structs
	pidle      puintptr // idle p's
	// number of idle p structs
	npidle     uint32
	nmspinning uint32 // See "Worker thread parking/unparking" comment in proc.go.

	// Global runnable queue.
	// the global run queue of Gs
	runq     gQueue // gQueue is defined in proc.go
	// number of elements in runq
	runqsize int32

	// disable controls selective disabling of the scheduler.
	//
	// Use schedEnableUser to control this.
	//
	// disable is protected by sched.lock.
	disable struct {
		// user disables scheduling of user goroutines.
		user     bool
		runnable gQueue // pending runnable Gs
		n        int32  // length of runnable
	}

	// Global cache of dead G's.
	gFree struct {
		lock    mutex
		stack   gList // Gs with stacks
		noStack gList // Gs without stacks
		n       int32
	}

	// Central cache of sudog structs.
	sudoglock  mutex
	sudogcache *sudog

	// Central pool of available defer structs of different sizes.
	deferlock mutex
	deferpool [5]*_defer

	// freem is the list of m's waiting to be freed when their
	// m.exited is set. Linked through m.freelink.
	freem *m

	gcwaiting  uint32 // gc is waiting to run
	stopwait   int32
	stopnote   note
	sysmonwait uint32
	sysmonnote note

	// safepointFn should be called on each P at the next GC
	// safepoint if p.runSafePointFn is set.
	safePointFn   func(*p)
	safePointWait int32
	safePointNote note

	profilehz int32 // cpu profiling rate

	procresizetime int64 // nanotime() of last change to gomaxprocs
	totaltime      int64 // ∫gomaxprocs dt up to procresizetime
}
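
The maxmcount limit in the struct above can be observed and changed from user code through debug.SetMaxThreads in the runtime/debug package, which returns the previous setting. A minimal sketch, assuming nothing else in the program has changed the limit; currentMaxThreads is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// currentMaxThreads reads the thread limit by setting it and then
// restoring it, because runtime/debug offers no plain getter.
func currentMaxThreads() int {
	prev := debug.SetMaxThreads(10000) // returns the previous setting
	debug.SetMaxThreads(prev)          // put the original limit back
	return prev
}

func main() {
	fmt.Println("max OS threads:", currentMaxThreads())
}
```

If a program ever needs more threads than this limit allows, the runtime crashes it, which is what the "(or die)" in the field comment refers to.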

gQueue

/src/runtime/proc.go

type gQueue struct {
	head guintptr // head of the queue
	tail guintptr // tail of the queue
}
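
gQueue stores only a head and a tail; the links between elements live in each g's schedlink field. A simplified sketch of that arrangement follows. The names node, queue, pushBack, and pop are hypothetical, ordinary pointers replace guintptr, and the sketch is not meant to match the runtime's methods line for line.

```go
package main

import "fmt"

// node stands in for g; next plays the role of g.schedlink.
type node struct {
	id   int
	next *node
}

// queue mirrors gQueue: it stores only the two ends of the list.
type queue struct {
	head *node
	tail *node
}

// pushBack appends n at the tail of the queue.
func (q *queue) pushBack(n *node) {
	n.next = nil
	if q.tail != nil {
		q.tail.next = n
	} else {
		q.head = n
	}
	q.tail = n
}

// pop removes and returns the head of the queue, or nil if it is empty.
func (q *queue) pop() *node {
	n := q.head
	if n != nil {
		q.head = n.next
		if q.head == nil {
			q.tail = nil
		}
		n.next = nil
	}
	return n
}

func main() {
	var q queue
	q.pushBack(&node{id: 1})
	q.pushBack(&node{id: 2})
	for n := q.pop(); n != nil; n = q.pop() {
		fmt.Println("dequeued g", n.id)
	}
}
```

Keeping the link inside the element means the queue itself needs no allocation, which suits code that runs inside the scheduler.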

Some important global variables

/src/runtime/proc.go

m0 m            // represents the main thread
g0 g            // the g0 bound to m0, i.e. m0.g0 = &g0


allgs []*g      // holds every g

/src/runtime/runtime2.go

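allm *m              // linked list of every m, including m0 above
allp []*p            // holds every p; len(allp) == gomaxprocs

sched schedt         // the scheduler struct, holding the scheduler's state

ncpu       int32     // number of CPU cores, initialized by the runtime at program start
gomaxprocs int32     // maximum number of Ps; defaults to ncpu and can be changed through GOMAXPROCS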

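When the program starts, all of these variables hold their zero values: pointers are nil, slices are nil slices, ints are 0, and every field of a struct is set to the zero value of its own type.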
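
The ncpu and gomaxprocs globals listed above are not exported, but their values are visible through the public runtime API once the runtime has initialized them. A small sketch; procInfo is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// procInfo returns the number of logical CPUs and the current
// GOMAXPROCS setting. Passing 0 to runtime.GOMAXPROCS queries the
// value without changing it.
func procInfo() (ncpu, procs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	ncpu, procs := procInfo()
	fmt.Println("ncpu:", ncpu)
	fmt.Println("gomaxprocs:", procs)
}
```

On the Go version discussed here the two numbers match unless the GOMAXPROCS environment variable is set or the program calls runtime.GOMAXPROCS with a positive value.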


Scheduler initialization

Scheduler initialization is driven by one main function, schedinit(), in /src/runtime/proc.go.
The comment above the function also lists the bootstrap order:

// The bootstrap sequence is:
//
//  call osinit
//  call schedinit
//  make & queue new G
//  call runtime·mstart
//
// The new G calls runtime·main.

func schedinit() {
	// raceinit must be the first call to race detector.
	// In particular, it must be done before mallocinit below calls racemapshadow.
	_g_ := getg() // getg() is declared in src/runtime/stubs.go; the actual code is generated by the compiler
	if raceenabled {
		_g_.racectx, raceprocctx0 = raceinit()
	}
	
	// set the maximum number of Ms
	sched.maxmcount = 10000

	tracebackinit()
	moduledataverify()
	// initialize the lists used to manage stack memory
	stackinit()
	mallocinit()
	// initialize the current m
	mcommoninit(_g_.m)
	cpuinit()       // must run before alginit
	alginit()       // maps must not be used before this call
	modulesinit()   // provides activeModules
	typelinksinit() // uses maps, activeModules
	itabsinit()     // uses activeModules

	msigsave(_g_.m)
	initSigmask = _g_.m.sigmask

	goargs()
	goenvs()
	parsedebugvars()
	gcinit()

	sched.lastpoll = uint64(nanotime())
	// the number of Ps defaults to the number of CPU cores, unless GOMAXPROCS overrides it
	procs := ncpu
	if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
		procs = n
	}
	// resize the set of Ps
	// every P is newly created here, so no P with local work should be returned
	if procresize(procs) != nil {
		throw("unknown runnable goroutine during bootstrap")
	}

	// For cgocheck > 1, we turn on the write barrier at all times
	// and check all pointer writes. We can't do this until after
	// procresize because the write barrier needs a P.
	if debug.cgocheck > 1 {
		writeBarrier.cgo = true
		writeBarrier.enabled = true
		for _, p := range allp {
			p.wbBuf.reset()
		}
	}

	if buildVersion == "" {
		// Condition should never trigger. This code just serves
		// to ensure runtime·buildVersion is kept in the resulting binary.
		buildVersion = "unknown"
	}
	if len(modinfo) == 1 {
		// Condition should never trigger. This code just serves
		// to ensure runtime·modinfo is kept in the resulting binary.
		modinfo = ""
	}
}
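
The way schedinit picks the number of Ps can be sketched in ordinary Go. The helper name resolveProcs is hypothetical, and strconv.ParseInt stands in for the runtime's internal atoi32, so edge cases in parsing may differ from the real runtime.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// resolveProcs mirrors the decision in schedinit: start from the CPU
// count and use the GOMAXPROCS value only if it parses as a positive integer.
func resolveProcs(ncpu int32, env string) int32 {
	procs := ncpu
	if n, err := strconv.ParseInt(env, 10, 32); err == nil && n > 0 {
		procs = int32(n)
	}
	return procs
}

func main() {
	ncpu := int32(runtime.NumCPU())
	fmt.Println("procs:", resolveProcs(ncpu, os.Getenv("GOMAXPROCS")))
}
```

An empty, non-numeric, zero, or negative value leaves the default in place, which matches the `ok && n > 0` condition in the listing above.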

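The function begins by calling getg() to obtain the g that is currently running. Jumping to its definition shows only the declaration func getg() *g, with no body. The reason is that getg() is merely declared in src/runtime/stubs.go; the actual code is generated by the compiler.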

// getg returns the pointer to the current g.
// The compiler rewrites calls to this function into instructions
// that fetch the g directly (from TLS or from the dedicated register).
func getg() *g

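As the comment says, getg returns a pointer to the currently running goroutine. It reads tls[0] from thread-local storage, which holds the address of the currently running goroutine. The compiler inserts code similar to the following: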

get_tls(CX) 
MOVQ g(CX), BX // the BX register now holds the address of the current g struct

So that is what the bodiless declaration means.

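Rough outline of scheduler initialization:

M initialization   -->   P initialization   -->   G initialization
mcommoninit              procresize               newproc
------------------------------------------------------------------
allm pool                allp pool                g.sched (execution context)
                                                  p.runq (run queue)

In other words, M, P, and G are initialized by mcommoninit, procresize, and newproc respectively. These set up the M pool (allm), the P pool (allp), the G's execution context (g.sched), and the run queue (p.runq).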


 

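The scheduling loop

Once all initialization is finished, the runtime is ready to start running: the last step is to call mstart.
This is also visible in the assembly:

/src/runtime/asm_amd64.s  (on Linux)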

TEXT runtime·rt0_go(SB),NOSPLIT,$0

  ... ... ...

  MOVL	16(SP), AX		// copy argc
	MOVL	AX, 0(SP)
	MOVQ	24(SP), AX		// copy argv
	MOVQ	AX, 8(SP)
	CALL	runtime·args(SB)  
	CALL	runtime·osinit(SB)    // OS initialization
	CALL	runtime·schedinit(SB) // scheduler initialization

	// create a new goroutine to start program
	MOVQ	$runtime·mainPC(SB), AX		// entry
	PUSHQ	AX
	PUSHQ	$0			// arg size
	CALL	runtime·newproc(SB)       // G initialization
	POPQ	AX
	POPQ	AX

	// start this M
	CALL	runtime·mstart(SB)

	CALL	runtime·abort(SB)	// mstart should never return
	RET


