Source Code Analysis of k8s 1.13: Logic Before the Scheduler Starts

This article is a repost (2019-02-22 15:29) from the series "Source Code Analysis of k8s 1.13" (《k8s-1.13版本源碼分析》).

Original address of this article (gitbook format): https://farmer-hutao.github.io/k8s-source-code-analysis/core/scheduler/before-scheduler-run.html

GitHub repository for this project: https://github.com/farmer-hutao/k8s-source-code-analysis

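Logic Before the Scheduler Starts
1. Overview

As mentioned earlier, the scheduler program can be divided into three layers. The first layer is the logic that runs before the scheduler starts: command-line argument parsing, argument validation, scheduler initialization, and so on. I won't cover this part in much detail, because this code sits in front of the scheduling framework and is relatively dry; dwelling on it would only wear down everyone's interest in the source code.

2. cobra and main

A small spoiler first: if you haven't used cobra before, then after your first encounter with it, the programs and small tools you write from then on will quite likely all end up cobra-style. Every command-line program I've written in the past half year has been built on cobra+pflag. Just how elegant is cobra? Let me walk you through it.

2.1. What is cobra

We can find this project on GitHub. As of today it has more than ten thousand stars and over a hundred contributors, so it is clearly a serious project. Cobra's official description is: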

Cobra is both a library for creating powerful modern CLI applications as well as a program to generate applications and command files.

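In other words, Cobra is a library for building powerful, modern command-line programs, and also a program for generating applications and command files. Many popular Go projects use Cobra, including, of course, the two we know best, k8s and Docker. A rough list: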
- Kubernetes
- Hugo
- rkt
- etcd
- Moby (former Docker)
- Docker (distribution)
- OpenShift
- Delve
- GopherJS
- CockroachDB
- Bleve
- ProjectAtomic (enterprise)
- Giant Swarm's gsctl
- Nanobox/Nanopack
- rclone
- nehm
- Pouch
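
If you are an engineer working in cloud computing, at least half of the projects above should sound familiar.

2.2. Using cobra

Let's try cobra hands-on. First download the project and build it: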
```
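# If your network is fast, the following command is all you need:
go get -u github.com/spf13/cobra/cobra
# If your network is slow, download the cobra zip package, put it in the
# corresponding directory under GOPATH, resolve the dependencies, and then build.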
```
This gives us the cobra executable along with the project source code.

Let's try this command: cobra init ${project-name}
```
[root@farmer-hutao src]# cobra init myapp
Your Cobra application is ready at /root/go/src/myapp
Give it a try by going there and running `go run main.go`.
Add commands to it by running `cobra add [cmdname]`.
[root@farmer-hutao src]# ls myapp/
cmd  LICENSE  main.go
[root@farmer-hutao src]# pwd
/root/go/src
```
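As shown above, a main.go and a cmd directory now exist locally. Doesn't this cmd look a lot like the cmd directory in the k8s source tree?

The code in main.go is very compact: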

main.go
```
package main

import "myapp/cmd"

func main() {
	cmd.Execute()
}
```
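Note that this calls the Execute() function of the cmd package. Let's see what cmd is.

main.go imports myapp/cmd, which is where the root.go file lives, so the Execute() function is easy to find. Execute calls rootCmd.Execute(), and rootCmd is of type *cobra.Command. This is the type to pay attention to.

Next, let's use the cobra command to add a subcommand to myapp.

After that, our program can use the version subcommand. Let's see what changed in the source:

There is now a new version.go. In this source file's init() function there is a call to rootCmd.AddCommand(versionCmd), which, as you might guess, adds a subcommand under the root command. The root command is what runs when we execute the binary directly, and the subcommand is version. Put together, it feels just like running kubectl version.

Also note that the Run field here is an anonymous function that prints "version called". In other words, running the version subcommand actually invokes this Run.

Finally, let's try multi-level subcommands.

The pattern is the same: after calling serverCmd.AddCommand(createCmd), createCmd (of type *cobra.Command) becomes a subcommand of serverCmd, and using it feels like running kubectl get pods.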
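
To make the root command / subcommand / Run relationship concrete, here is a minimal sketch using only the Go standard library. The Command type below is a hypothetical, stripped-down stand-in that merely imitates the Use / Run / AddCommand shape described above; it is not cobra's implementation, and names such as newRoot and the recorded output strings are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Command is a hypothetical, simplified stand-in for cobra.Command.
type Command struct {
	Use      string
	Run      func(cmd *Command, args []string)
	children []*Command
}

// AddCommand registers sub as a child of c.
func (c *Command) AddCommand(sub *Command) {
	c.children = append(c.children, sub)
}

// find returns the child whose Use matches name, or nil.
func (c *Command) find(name string) *Command {
	for _, sub := range c.children {
		if sub.Use == name {
			return sub
		}
	}
	return nil
}

// Execute walks down the tree while the next argument names a child,
// then calls Run on the deepest command it reached.
func (c *Command) Execute(args []string) error {
	cur := c
	for len(args) > 0 {
		next := cur.find(args[0])
		if next == nil {
			break
		}
		cur, args = next, args[1:]
	}
	if cur.Run == nil {
		return errors.New(cur.Use + ": no Run function defined")
	}
	cur.Run(cur, args)
	return nil
}

// newRoot builds the tree myapp -> {version, server -> create}.
// Every Run appends "<Use> called" to log instead of printing.
func newRoot(log *[]string) *Command {
	record := func(cmd *Command, args []string) {
		*log = append(*log, cmd.Use+" called")
	}
	root := &Command{Use: "myapp", Run: record}
	version := &Command{Use: "version", Run: record}
	server := &Command{Use: "server", Run: record}
	create := &Command{Use: "create", Run: record}
	server.AddCommand(create)
	root.AddCommand(version)
	root.AddCommand(server)
	return root
}

func main() {
	var log []string
	root := newRoot(&log)
	if err := root.Execute(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, line := range log {
		fmt.Println(line)
	}
}
```

Running this sketch with the arguments server create dispatches to the Run of create, in the same way that kubectl get pods reaches the Run attached to the innermost command.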

OK, with that in mind, let's go back to the scheduler source, where the main logic is now easy to find.

3. The Scheduler's main

Open cmd/kube-scheduler/scheduler.go to find the scheduler's main() function. It is very short; with the side branches trimmed, it looks like this:

cmd/kube-scheduler/scheduler.go:34
```
func main() {
	command := app.NewSchedulerCommand()

	if err := command.Execute(); err != nil {
		fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}
```
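At this point you can probably guess that when the kube-scheduler binary runs, it ends up calling the Run behind command.Execute(). That Run is hidden inside the NewSchedulerCommand() function called on the line command := app.NewSchedulerCommand(), and that function must return an object of type *cobra.Command. Let's step into it and check: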

cmd/kube-scheduler/app/server.go:70
```
// NewSchedulerCommand creates a *cobra.Command object with default parameters
func NewSchedulerCommand() *cobra.Command {
	// opts is constructed earlier in the full function; that part is elided here.
	cmd := &cobra.Command{
		Use: "kube-scheduler",
		Long: `The Kubernetes scheduler is a policy-rich, topology-aware,
workload-specific function that significantly impacts availability, performance,
and capacity. The scheduler needs to take into account individual and collective
resource requirements, quality of service requirements, hardware/software/policy
constraints, affinity and anti-affinity specifications, data locality, inter-workload
interference, deadlines, and so on. Workload-specific requirements will be exposed
through the API as necessary.`,
		Run: func(cmd *cobra.Command, args []string) {
			if err := runCommand(cmd, args, opts); err != nil {
				fmt.Fprintf(os.Stderr, "%v\n", err)
				os.Exit(1)
			}
		},
	}

	return cmd
}
```
Again I have removed some side-branch code. What remains shows clearly that the scheduler calls runCommand(cmd, args, opts) on startup. Where is that function? Let's keep following:

cmd/kube-scheduler/app/server.go:117
```
// runCommand runs the scheduler.
func runCommand(cmd *cobra.Command, args []string, opts *options.Options) error {
	c, err := opts.Config()
	// (error handling elided in this excerpt)

	stopCh := make(chan struct{})

	// Get the completed config
	cc := c.Complete()

	return Run(cc, stopCh)
}
```
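As shown above, after handling the configuration it calls a Run() function. Run() starts the scheduler based on the given configuration and only returns on error or when the channel stopCh is closed. The main part of the code is: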

cmd/kube-scheduler/app/server.go:167
```
// Run executes the scheduler based on the given configuration. It only return on error or when stopCh is closed.
func Run(cc schedulerserverconfig.CompletedConfig, stopCh <-chan struct{}) error {
	// Create the scheduler.
	sched, err := scheduler.New(cc.Client,
		cc.InformerFactory.Core().V1().Nodes(),
		stopCh,
		scheduler.WithName(cc.ComponentConfig.SchedulerName))
	// (error handling elided in this excerpt)

	// Prepare a reusable runCommand function.
	run := func(ctx context.Context) {
		sched.Run()
		<-ctx.Done()
	}

	ctx, cancel := context.WithCancel(context.TODO())
	defer cancel()

	go func() {
		select {
		case <-stopCh:
			cancel()
		case <-ctx.Done():
		}
	}()

	// Leader election is disabled, so runCommand inline until done.
	run(ctx)
	return fmt.Errorf("finished without leader elect")
}
```
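You can see that what ultimately runs is sched.Run(), which starts the scheduler. sched.Run() already lives under pkg, specifically at pkg/scheduler/scheduler.go:276, and that is where the scheduler framework's real logic runs. So, starting from main, we have found the entry point of the scheduler's main framework. We will analyze the scheduler logic itself in detail in the next part.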
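
The stop-channel handling in Run() can be tried in isolation. The following is a minimal sketch using only the standard library, assuming the worker returns right after launching its background work. The names runUntilStopped, worker, and demo are hypothetical and do not appear in the Kubernetes source. Closing stopCh cancels the context, which in turn releases the blocked caller.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runUntilStopped follows the shape of the Run excerpt: start the worker,
// then block until stopCh is closed. worker is a hypothetical stand-in
// for sched.Run().
func runUntilStopped(worker func(ctx context.Context), stopCh <-chan struct{}) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Bridge: closing stopCh cancels ctx; if ctx ends first, the goroutine exits.
	go func() {
		select {
		case <-stopCh:
			cancel()
		case <-ctx.Done():
		}
	}()

	worker(ctx)  // returns after launching its background work
	<-ctx.Done() // block here until a stop is requested
	return errors.New("finished without leader elect")
}

// demo closes stopCh after a short delay and reports whether the worker
// was started, plus the value returned by runUntilStopped.
func demo() (bool, error) {
	stopCh := make(chan struct{})
	started := false
	worker := func(ctx context.Context) {
		started = true
		go func() {
			<-ctx.Done() // a real worker would loop here until cancelled
		}()
	}
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(stopCh)
	}()
	err := runUntilStopped(worker, stopCh)
	return started, err
}

func main() {
	started, err := demo()
	fmt.Println("worker started:", started)
	fmt.Println("returned:", err)
}
```

In this sketch, as in the excerpt, the function always returns a non-nil error, so a caller cannot distinguish a requested stop from a failure by the return value alone.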

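Finally, let's look at the definition of sched. On Linux we often see software named something-d, where d stands for daemon: a program that keeps running in the background. I read sched here in the same spirit, as "scheduler daemon". sched is actually of type *Scheduler, defined at: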

pkg/scheduler/scheduler.go:58
```
// Scheduler watches for new unscheduled pods. It attempts to find
// nodes that they fit on and writes bindings back to the api server.
type Scheduler struct {
	config *factory.Config
}
```
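As shown above, the comment is clear: the Scheduler watches for newly created, unscheduled pods, tries to find a suitable node for each, and writes a binding back to the API server. You can feel the daemon nature here as well. In the k8s clusters we set up, there is a daemon process called kube-scheduler, and what this always-running process does is exactly what the comment describes. In the program, it corresponds to this one object: Scheduler.

Let's also take a quick look at the Config object held in the Scheduler struct: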

pkg/scheduler/factory/factory.go:96
```
// Config is an implementation of the Scheduler's configured input data.
type Config struct {
	// It is expected that changes made via SchedulerCache will be observed
	// by NodeLister and Algorithm.
	SchedulerCache schedulerinternalcache.Cache
	// Ecache is used for optimistically invalid affected cache items after
	// successfully binding a pod
	Ecache     *equivalence.Cache
	NodeLister algorithm.NodeLister
	Algorithm  algorithm.ScheduleAlgorithm
	GetBinder  func(pod *v1.Pod) Binder
	// PodConditionUpdater is used only in case of scheduling errors. If we succeed
	// with scheduling, PodScheduled condition will be updated in apiserver in /bind
	// handler so that binding and setting PodCondition it is atomic.
	PodConditionUpdater PodConditionUpdater
	// PodPreemptor is used to evict pods and update pod annotations.
	PodPreemptor PodPreemptor

	// NextPod should be a function that blocks until the next pod
	// is available. We don't use a channel for this, because scheduling
	// a pod may take some amount of time and we don't want pods to get
	// stale while they sit in a channel.
	NextPod func() *v1.Pod

	// SchedulingQueue holds pods to be scheduled
	SchedulingQueue internalqueue.SchedulingQueue
}
```
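Again I have kept only the fields that are easy to understand. Skimming through, we see fields such as SchedulingQueue, NextPod, and NodeLister, whose meaning is easy to infer from their names. They are the objects the Scheduler needs while doing its job, that is, scheduling.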
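
To show how fields like these might cooperate, here is a deliberately tiny sketch, assuming a blocking queue behind NextPod and a round-robin node choice. Everything in it (Pod, podQueue, the three-field Config, Bind, scheduleOne's signature, scheduleAll) is a hypothetical simplification for illustration, and the round-robin choice is a placeholder rather than the real scheduling algorithm.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical stand-in for *v1.Pod.
type Pod struct{ Name string }

// podQueue is a small blocking FIFO standing in for SchedulingQueue.
type podQueue struct {
	mu     sync.Mutex
	cond   *sync.Cond
	pods   []*Pod
	closed bool
}

func newPodQueue() *podQueue {
	q := &podQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

func (q *podQueue) Add(p *Pod) {
	q.mu.Lock()
	q.pods = append(q.pods, p)
	q.mu.Unlock()
	q.cond.Signal()
}

// Close wakes blocked Pop calls; Pop returns nil once the queue is drained.
func (q *podQueue) Close() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
	q.cond.Broadcast()
}

// Pop blocks until a pod is available, or returns nil if closed and empty.
func (q *podQueue) Pop() *Pod {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.pods) == 0 && !q.closed {
		q.cond.Wait()
	}
	if len(q.pods) == 0 {
		return nil
	}
	p := q.pods[0]
	q.pods = q.pods[1:]
	return p
}

// Config plays the role of factory.Config with far fewer fields.
type Config struct {
	NodeLister func() []string             // lists candidate node names
	NextPod    func() *Pod                 // blocks until the next pod is available
	Bind       func(pod *Pod, node string) // records the pod-to-node binding
}

type Scheduler struct{ config *Config }

// scheduleOne handles a single pod and reports whether it bound one.
func (s *Scheduler) scheduleOne(seq int) bool {
	pod := s.config.NextPod()
	nodes := s.config.NodeLister()
	if pod == nil || len(nodes) == 0 {
		return false
	}
	s.config.Bind(pod, nodes[seq%len(nodes)]) // round-robin placeholder
	return true
}

// scheduleAll wires the pieces together and returns pod name -> node name.
func scheduleAll(podNames, nodes []string) map[string]string {
	q := newPodQueue()
	bindings := map[string]string{}
	s := &Scheduler{config: &Config{
		NodeLister: func() []string { return nodes },
		NextPod:    q.Pop,
		Bind:       func(p *Pod, node string) { bindings[p.Name] = node },
	}}
	for _, name := range podNames {
		q.Add(&Pod{Name: name})
	}
	q.Close()
	for i := 0; s.scheduleOne(i); i++ {
	}
	return bindings
}

func main() {
	fmt.Println(scheduleAll([]string{"a", "b", "c"}, []string{"node1", "node2"}))
}
```

Passing NextPod as a function rather than exposing the queue directly keeps the scheduling loop unaware of how pods are stored, which matches the intent stated in the NextPod comment in the excerpt.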

OK, in the next part we start looking at how the Scheduler actually works!