Docker source code analysis, part 6 (based on version 1.8.2): the Docker run startup process

Reposted from the original article. 2016-01-29 15:21, docker

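The previous article gave a rough picture of how a docker container is created. It looked at this mostly from the filesystem point of view: creating a container involves setting up the RootFS, the volumes, and so on. This article analyzes how a container, once created, is started.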

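This article focuses on how the docker daemon side implements this. From the earlier articles it is easy to find that the client code lives in api/client/run.go. Roughly, the client first creates a container with the createContainer() method from the previous article. It then starts that container by calling cli.call("POST", "/containers/"+createResponse.ID+"/start", nil, nil). In api/server/server.go the route for this request is "/containers/{name:.*}/start": s.postContainersStart. The handler postContainersStart is in api/server/container.go: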

func (s *Server) postContainersStart(version version.Version, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
	if vars == nil {
		return fmt.Errorf("Missing parameter")
	}

	var hostConfig *runconfig.HostConfig
	if r.Body != nil && (r.ContentLength > 0 || r.ContentLength == -1) {
		if err := checkForJSON(r); err != nil {
			return err
		}

		c, err := runconfig.DecodeHostConfig(r.Body)
		if err != nil {
			return err
		}

		hostConfig = c
	}

	if err := s.daemon.ContainerStart(vars["name"], hostConfig); err != nil {
		if err.Error() == "Container already started" {
			w.WriteHeader(http.StatusNotModified)
			return nil
		}
		return err
	}

	w.WriteHeader(http.StatusNoContent)
	return nil
}

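The logic is very simple. The handler parses the parameters from the request and calls s.daemon.ContainerStart(vars["name"], hostConfig) to start the container. It then writes the result back to the response: 304 Not Modified if the container is already running, 204 No Content on success. The real work is done in s.daemon.ContainerStart(vars["name"], hostConfig), which lives in daemon/start.go: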

func (daemon *Daemon) ContainerStart(name string, hostConfig *runconfig.HostConfig) error {
	container, err := daemon.Get(name)
	if err != nil {
		return err
	}

	if container.IsPaused() {
		return fmt.Errorf("Cannot start a paused container, try unpause instead.")
	}

	if container.IsRunning() {
		return fmt.Errorf("Container already started")
	}

	// Windows does not have the backwards compatibility issue here.
	if runtime.GOOS != "windows" {
		// This is kept for backward compatibility - hostconfig should be passed when
		// creating a container, not during start.
		if hostConfig != nil {
			if err := daemon.setHostConfig(container, hostConfig); err != nil {
				return err
			}
		}
	} else {
		if hostConfig != nil {
			return fmt.Errorf("Supplying a hostconfig on start is not supported. It should be supplied on create")
		}
	}

	// check if hostConfig is in line with the current system settings.
	// It may happen cgroups are umounted or the like.
	if _, err = daemon.verifyContainerSettings(container.hostConfig, nil); err != nil {
		return err
	}

	if err := container.Start(); err != nil {
		return fmt.Errorf("Cannot start container %s: %s", name, err)
	}

	return nil
}

It first looks up the container by the name passed in, through daemon.Get() (daemon/daemon.go):

func (daemon *Daemon) Get(prefixOrName string) (*Container, error) {
	if containerByID := daemon.containers.Get(prefixOrName); containerByID != nil {
		// prefix is an exact match to a full container ID
		return containerByID, nil
	}

	// GetByName will match only an exact name provided; we ignore errors
	if containerByName, _ := daemon.GetByName(prefixOrName); containerByName != nil {
		// prefix is an exact match to a full container Name
		return containerByName, nil
	}

	containerId, indexError := daemon.idIndex.Get(prefixOrName)
	if indexError != nil {
		return nil, indexError
	}
	return daemon.containers.Get(containerId), nil
}

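It first checks daemon.containers for an entry whose key is exactly the given string, that is, an exact match on the full container ID (daemon.containers is keyed by ID, not by name). daemon.containers is a contStore struct, defined as: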

type contStore struct {
	s map[string]*Container
	sync.Mutex
}
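Here is a minimal sketch of how such a store is typically used. It assumes Add/Get/Delete methods with these names; the real contStore has similar helpers, but treat the exact method set here as an assumption. Container is a hypothetical stand-in with only two fields. The point is that Get is an exact-key lookup guarded by the embedded mutex, so a short ID prefix never matches at this stage.

```go
package main

import (
	"fmt"
	"sync"
)

// Container is a hypothetical, trimmed-down stand-in for daemon.Container.
type Container struct {
	ID   string
	Name string
}

// contStore mirrors the struct above: a map keyed by full container ID,
// guarded by an embedded mutex so concurrent API handlers can share it.
type contStore struct {
	s map[string]*Container
	sync.Mutex
}

func newContStore() *contStore {
	return &contStore{s: make(map[string]*Container)}
}

// Add registers a container under its full ID.
func (c *contStore) Add(id string, cont *Container) {
	c.Lock()
	c.s[id] = cont
	c.Unlock()
}

// Get returns nil when the ID is unknown; it does no prefix matching.
func (c *contStore) Get(id string) *Container {
	c.Lock()
	defer c.Unlock()
	return c.s[id]
}

// Delete removes a container by full ID.
func (c *contStore) Delete(id string) {
	c.Lock()
	delete(c.s, id)
	c.Unlock()
}

func main() {
	store := newContStore()
	store.Add("4f1c2a", &Container{ID: "4f1c2a", Name: "/web"})
	fmt.Println(store.Get("4f1c2a").Name) // /web
	fmt.Println(store.Get("4f1c") == nil) // true: exact IDs only
}
```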

Next it tries GetByName, which is also in daemon/daemon.go:

func (daemon *Daemon) GetByName(name string) (*Container, error) {
	fullName, err := GetFullContainerName(name)
	if err != nil {
		return nil, err
	}
	entity := daemon.containerGraph.Get(fullName)
	if entity == nil {
		return nil, fmt.Errorf("Could not find entity for %s", name)
	}
	e := daemon.containers.Get(entity.ID())
	if e == nil {
		return nil, fmt.Errorf("Could not find container for entity id %s", entity.ID())
	}
	return e, nil
}

daemon.containerGraph is of type graphdb.Database (in pkg/graphdb/graphdb.go):

type Database struct {
	conn *sql.DB
	mux  sync.RWMutex
}

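Database is a database that stores containers and the relationships between them. It is currently a SQLite3 database. The code below joins config.Root (/var/lib/docker by default) with "linkgraph.db", so the file is /var/lib/docker/linkgraph.db. The database is opened while NewDaemon instantiates the daemon: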

graphdbPath := filepath.Join(config.Root, "linkgraph.db")
graph, err := graphdb.NewSqliteConn(graphdbPath)
if err != nil {
	return nil, err
}
d.containerGraph = graph

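The database has two main tables, entity and edge. Each container corresponds to an entity stored in the entity table. The edge table stores named edges between entities:

- a container's name is an edge from the root entity;
- a link between two containers is an edge from the parent (linking) container to the linked one.

Each table also has a corresponding struct in the code: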

// Entity with a unique id.
type Entity struct {
	id string
}

// An Edge connects two entities together.
type Edge struct {
	EntityID string
	Name     string
	ParentID string
}

The CREATE TABLE statements may make this more intuitive:

createEntityTable = `
	CREATE TABLE IF NOT EXISTS entity (
		id text NOT NULL PRIMARY KEY
	);`

createEdgeTable = `
	CREATE TABLE IF NOT EXISTS edge (
		"entity_id" text NOT NULL,
		"parent_id" text NULL,
		"name" text NOT NULL,
		CONSTRAINT "parent_fk" FOREIGN KEY ("parent_id") REFERENCES "entity" ("id"),
		CONSTRAINT "entity_fk" FOREIGN KEY ("entity_id") REFERENCES "entity" ("id")
	);
	`
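The following in-memory sketch shows how a name such as /web, or a link alias such as /web/db, is resolved through these two tables. It walks edge rows one path segment at a time. It swaps SQLite for a plain slice. The root ID "0" and the helper names getFullContainerName and resolve are assumptions for illustration; getFullContainerName only adds the leading "/" that container names carry.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// edge mirrors a row of the edge table: (entity_id, name, parent_id).
type edge struct {
	EntityID, Name, ParentID string
}

// rootID is a hypothetical ID for the root entity that top-level names hang off.
const rootID = "0"

// getFullContainerName prefixes "/" onto plain names.
func getFullContainerName(name string) (string, error) {
	if name == "" {
		return "", errors.New("container name cannot be empty")
	}
	if !strings.HasPrefix(name, "/") {
		name = "/" + name
	}
	return name, nil
}

// resolve walks a path such as "/web/db" one edge at a time, starting at the root,
// and returns the entity ID at the end, or "" if any segment is missing.
func resolve(edges []edge, fullPath string) string {
	id := rootID
	for _, part := range strings.Split(strings.Trim(fullPath, "/"), "/") {
		next := ""
		for _, e := range edges {
			if e.ParentID == id && e.Name == part {
				next = e.EntityID
				break
			}
		}
		if next == "" {
			return ""
		}
		id = next
	}
	return id
}

func main() {
	edges := []edge{
		{EntityID: "c1", Name: "web", ParentID: rootID}, // container named /web
		{EntityID: "c2", Name: "db", ParentID: rootID},  // container named /db
		{EntityID: "c2", Name: "db", ParentID: "c1"},    // /web linked to db under alias "db"
	}
	name, _ := getFullContainerName("web")
	fmt.Println(name, resolve(edges, name)) // /web c1
	fmt.Println(resolve(edges, "/web/db"))  // c2
}
```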

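After GetByName, the last step is a lookup through daemon.idIndex.Get(). This idIndex works the same way as the image idIndex from the previous article: it is a trie that resolves a unique prefix of an ID to the full ID.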
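Putting the three lookups of daemon.Get() together, a minimal sketch could look like the following. registry is a hypothetical stand-in for daemon.containers, containerGraph and idIndex. The prefix step uses a linear scan instead of a trie. This gives the same answers (a unique prefix resolves, an ambiguous or unknown prefix fails) but with worse complexity. The error messages are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// registry is a hypothetical stand-in for daemon.containers + containerGraph + idIndex.
type registry struct {
	ids   map[string]bool   // full container IDs (daemon.containers)
	names map[string]string // "/name" -> full ID (containerGraph)
}

// get follows the order of daemon.Get: exact ID, then exact name, then unique ID prefix.
// The prefix step is a linear scan standing in for the trie-based idIndex.
func (r *registry) get(prefixOrName string) (string, error) {
	if r.ids[prefixOrName] {
		return prefixOrName, nil
	}
	name := prefixOrName
	if !strings.HasPrefix(name, "/") {
		name = "/" + name
	}
	if id, ok := r.names[name]; ok {
		return id, nil
	}
	if prefixOrName == "" {
		return "", fmt.Errorf("prefix can't be empty")
	}
	match := ""
	for id := range r.ids {
		if strings.HasPrefix(id, prefixOrName) {
			if match != "" {
				return "", fmt.Errorf("multiple IDs match prefix %q", prefixOrName)
			}
			match = id
		}
	}
	if match == "" {
		return "", fmt.Errorf("no such container: %s", prefixOrName)
	}
	return match, nil
}

func main() {
	r := &registry{
		ids:   map[string]bool{"abc123": true, "abd456": true},
		names: map[string]string{"/web": "abc123"},
	}
	for _, q := range []string{"abc123", "web", "abd", "ab"} {
		id, err := r.get(q)
		fmt.Println(q, id, err)
	}
}
```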

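Back in ContainerStart(): after obtaining the container, it checks whether the container is paused or already running. If it is neither, it performs some parameter validation: port mapping settings, the exec driver, and whether the kernel supports CPU shares, IO weight and so on. It then calls container.Start() (daemon/container.go) to start the container: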

func (container *Container) Start() (err error) {
	container.Lock()
	defer container.Unlock()

	if container.Running {
		return nil
	}

	if container.removalInProgress || container.Dead {
		return fmt.Errorf("Container is marked for removal and cannot be started.")
	}

	// if we encounter an error during start we need to ensure that any other
	// setup has been cleaned up properly
	defer func() {
		if err != nil {
			container.setError(err)
			// if no one else has set it, make sure we don't leave it at zero
			if container.ExitCode == 0 {
				container.ExitCode = 128
			}
			container.toDisk()
			container.cleanup()
			container.LogEvent("die")
		}
	}()

	if err := container.Mount(); err != nil {
		return err
	}

	// Make sure NetworkMode has an acceptable value. We do this to ensure
	// backwards API compatibility.
	container.hostConfig = runconfig.SetDefaultNetModeIfBlank(container.hostConfig)

	if err := container.initializeNetworking(); err != nil {
		return err
	}
	linkedEnv, err := container.setupLinkedContainers()
	if err != nil {
		return err
	}
	if err := container.setupWorkingDirectory(); err != nil {
		return err
	}
	env := container.createDaemonEnvironment(linkedEnv)
	if err := populateCommand(container, env); err != nil {
		return err
	}

	mounts, err := container.setupMounts()
	if err != nil {
		return err
	}

	container.command.Mounts = mounts
	return container.waitForStart()
}

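The deferred func() cleans up if starting the container fails. It records the error and sets ExitCode to 128 if nothing else has set it. It then writes the state to disk, cleans up, and logs a "die" event;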

container.Mount() mounts the container's root filesystem through the storage driver (aufs in this setup);

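initializeNetworking() initializes networking. Docker's network modes include:

- bridge: each container gets its own network stack;
- host: the container shares the host's network stack;
- container: the container shares another container's network stack. This is the mode Kubernetes uses for pods, whose containers join the network namespace of a shared infrastructure container;
- none: the container gets only a loopback interface.

The network mode is determined from the parameters in config and hostConfig, and then the libnetwork package is called to set up the network. Docker networking will get its own article later;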

container.setupLinkedContainers() collects information from the containers connected via --link. It returns that information as environment variables: a []string in which each element has the form "NAME=xxxx";

setupWorkingDirectory() creates the working directory in which the container's command runs;

createDaemonEnvironment() appends the container's own environment variables and the linkedEnv from the previous step into one slice and returns it;

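populateCommand(container, env) mainly prepares the settings for the container's execdriver, the component that ultimately launches the container. It sets:

- the network mode;
- the namespaces (pid, ipc, uts);
- the resource limits;
- the Command to run inside the container, which includes the startup command of the container's process;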

container.setupMounts() returns all of the container's mount points;

Finally, container.waitForStart() is called to start the container:

func (container *Container) waitForStart() error {
	container.monitor = newContainerMonitor(container, container.hostConfig.RestartPolicy)

	// block until we either receive an error from the initial start of the container's
	// process or until the process is running in the container
	select {
	case <-container.monitor.startSignal:
	case err := <-promise.Go(container.monitor.Start):
		return err
	}

	return nil
}

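It first creates a containerMonitor. The monitor watches the first process in the container. If that process fails, the monitor can restart it according to the restart policy (RestartPolicy). waitForStart() then blocks until one of two things happens first: the monitor signals that the process is running (startSignal), or monitor.Start returns an error.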

Looking at Start() in the monitor (daemon/monitor.go), the most important part is:

m.container.daemon.Run(m.container, pipes, m.callback)

The Run method, in daemon/daemon.go:

func (daemon *Daemon) Run(c *Container, pipes *execdriver.Pipes, startCallback execdriver.StartCallback) (execdriver.ExitStatus, error) {
	return daemon.execDriver.Run(c.command, pipes, startCallback)
}

Docker has two exec drivers, lxc and native. lxc is the older one; native is the default and uses libcontainer. So Run ultimately calls the Run() method in daemon/execdriver/native/driver.go:

func (d *Driver) Run(c *execdriver.Command, pipes *execdriver.Pipes, startCallback execdriver.StartCallback) (execdriver.ExitStatus, error) {
	// take the Command and populate the libcontainer.Config from it
	container, err := d.createContainer(c)
	if err != nil {
		return execdriver.ExitStatus{ExitCode: -1}, err
	}

	p := &libcontainer.Process{
		Args: append([]string{c.ProcessConfig.Entrypoint}, c.ProcessConfig.Arguments...),
		Env:  c.ProcessConfig.Env,
		Cwd:  c.WorkingDir,
		User: c.ProcessConfig.User,
	}

	if err := setupPipes(container, &c.ProcessConfig, p, pipes); err != nil {
		return execdriver.ExitStatus{ExitCode: -1}, err
	}

	cont, err := d.factory.Create(c.ID, container)
	if err != nil {
		return execdriver.ExitStatus{ExitCode: -1}, err
	}
	d.Lock()
	d.activeContainers[c.ID] = cont
	d.Unlock()
	defer func() {
		cont.Destroy()
		d.cleanContainer(c.ID)
	}()

	if err := cont.Start(p); err != nil {
		return execdriver.ExitStatus{ExitCode: -1}, err
	}

	if startCallback != nil {
		pid, err := p.Pid()
		if err != nil {
			p.Signal(os.Kill)
			p.Wait()
			return execdriver.ExitStatus{ExitCode: -1}, err
		}
		startCallback(&c.ProcessConfig, pid)
	}

	oom := notifyOnOOM(cont)
	waitF := p.Wait
	if nss := cont.Config().Namespaces; !nss.Contains(configs.NEWPID) {
		// we need such hack for tracking processes with inherited fds,
		// because cmd.Wait() waiting for all streams to be copied
		waitF = waitInPIDHost(p, cont)
	}
	ps, err := waitF()
	if err != nil {
		execErr, ok := err.(*exec.ExitError)
		if !ok {
			return execdriver.ExitStatus{ExitCode: -1}, err
		}
		ps = execErr.ProcessState
	}
	cont.Destroy()
	_, oomKill := <-oom
	return execdriver.ExitStatus{ExitCode: utils.ExitStatus(ps.Sys().(syscall.WaitStatus)), OOMKilled: oomKill}, nil
}

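d.createContainer(c) builds, from the command, the configuration the container needs: capabilities, namespaces, cgroups, mount points, etc. It first generates a fixed configuration from a template (daemon/execdriver/native/template/default_template.go). It then sets up the container-specific namespaces according to the command;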

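Next it instantiates a libcontainer.Process{}. Its Args field is the combination of the user's entrypoint and cmd, which is part of what the container's first process (initProcess) will eventually run;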

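setupPipes(container, &c.ProcessConfig, p, pipes) binds the standard input and output of the container (pipes) to the libcontainer.Process. That Process is the future init process in the container, i.e. the variable p. This binding lets the daemon capture the initial process's input and output;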

cont, err := d.factory.Create(c.ID, container) calls the driver's factory (vendor/src/github.com/opencontainers/runc/libcontainer/factory_linux.go) to instantiate a Linux container, structured as follows:

linuxContainer{
	id:            id,
	root:          containerRoot,
	config:        config,
	initPath:      l.InitPath,
	initArgs:      l.InitArgs,
	criuPath:      l.CriuPath,
	cgroupManager: l.NewCgroupsManager(config.Cgroups, nil),
}

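This linuxContainer type is different from the earlier Container type; it belongs to the exec driver. The important fields are:

- id: the container ID;
- initPath: the path to dockerinit;
- initArgs: dockerinit's arguments;
- criuPath: used to checkpoint the container with CRIU;
- cgroupManager: manages the cgroup resources of the container's processes.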

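dockerinit deserves a few words. dockerinit is a fixed binary, and it is the first executable run once the container starts. It sets up mounts inside the new namespaces, initializes the network stack, and so on. It is also responsible for executing the user's entrypoint and cmd. It runs them in the same process as itself, because it replaces its own program image with the user's command (execve) rather than forking a child;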
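The "same process" point comes from execve. libcontainer's init ends with an execve (syscall.Exec in Go) of the user's command instead of forking a child, so the PID does not change. The sketch below demonstrates that property with the shell's exec builtin rather than a real init, which keeps it harmless to run. It assumes a POSIX sh on PATH, and pidsAcrossExec is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// pidsAcrossExec starts a shell that prints its PID, then uses the shell's exec
// builtin (execve) to replace itself with a second shell that prints its own PID.
// execve swaps the program image but keeps the process, so both PIDs match.
func pidsAcrossExec() (before, after string, err error) {
	out, err := exec.Command("sh", "-c", `echo $$; exec sh -c 'echo $$'`).Output()
	if err != nil {
		return "", "", err
	}
	f := strings.Fields(string(out))
	if len(f) != 2 {
		return "", "", fmt.Errorf("unexpected output %q", out)
	}
	return f[0], f[1], nil
}

func main() {
	before, after, err := pidsAcrossExec()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before == after, before, after)
}
```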

cont.Start(p) runs the libcontainer.Process through the linuxContainer; this step is explained in detail below;

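The rest is routine:

- call the start callback;
- watch, via the cgroup manager, for the container running out of memory (OOM);
- wait with p.Wait() for the libcontainer.Process to finish;
- call Destroy() to tear down the linuxContainer;
- return the exit status.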

Next, let's look in detail at how linuxContainer's Start works (vendor/src/github.com/opencontainers/runc/libcontainer/container_linux.go):

func (c *linuxContainer) Start(process *Process) error {
	c.m.Lock()
	defer c.m.Unlock()
	status, err := c.currentStatus()
	if err != nil {
		return err
	}
	doInit := status == Destroyed
	parent, err := c.newParentProcess(process, doInit)
	if err != nil {
		return newSystemError(err)
	}
	if err := parent.start(); err != nil {
		// terminate the process to ensure that it properly is reaped.
		if err := parent.terminate(); err != nil {
			logrus.Warn(err)
		}
		return newSystemError(err)
	}
	process.ops = parent
	if doInit {
		c.updateState(parent)
	}
	return nil
}

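This Start() function launches the container's first process, initProcess. When the docker daemon starts a new container, it really forks a new process. That process has its own namespaces, which isolates it from other containers. It is also the container's init process, and it carries out dockerinit, the entrypoint, the cmd and the rest;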

status, err := c.currentStatus() first checks whether the container's init process already exists. If it does not, the status is Destroyed, so doInit is true;

parent, err := c.newParentProcess(process, doInit) creates the new process. Here is the relevant code:

func (c *linuxContainer) newParentProcess(p *Process, doInit bool) (parentProcess, error) {
	parentPipe, childPipe, err := newPipe()
	if err != nil {
		return nil, newSystemError(err)
	}
	cmd, err := c.commandTemplate(p, childPipe)
	if err != nil {
		return nil, newSystemError(err)
	}
	if !doInit {
		return c.newSetnsProcess(p, cmd, parentPipe, childPipe), nil
	}
	return c.newInitProcess(p, cmd, parentPipe, childPipe)
}

func (c *linuxContainer) commandTemplate(p *Process, childPipe *os.File) (*exec.Cmd, error) {
	cmd := &exec.Cmd{
		Path: c.initPath,
		Args: c.initArgs,
	}
	cmd.Stdin = p.Stdin
	cmd.Stdout = p.Stdout
	cmd.Stderr = p.Stderr
	cmd.Dir = c.config.Rootfs
	if cmd.SysProcAttr == nil {
		cmd.SysProcAttr = &syscall.SysProcAttr{}
	}
	cmd.ExtraFiles = append(p.ExtraFiles, childPipe)
	cmd.Env = append(cmd.Env, fmt.Sprintf("_LIBCONTAINER_INITPIPE=%d", stdioFdCount+len(cmd.ExtraFiles)-1))
	if c.config.ParentDeathSignal > 0 {
		cmd.SysProcAttr.Pdeathsig = syscall.Signal(c.config.ParentDeathSignal)
	}
	return cmd, nil
}

These two functions are related: the first one calls the second.

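newParentProcess first calls parentPipe, childPipe, err := newPipe(), which creates a socket pair that works as a pipe. This pipe is the channel between the docker daemon and the future dockerinit process. As mentioned above, dockerinit initializes important resources inside the new namespaces. Some of those resources, such as the veth pair, must be created by the docker daemon on the host. The daemon creates them in its own namespace and then hands the relevant data to dockerinit over this pipe.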
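Below is a minimal sketch of such a socket pair using the syscall package. It creates two connected Unix-domain stream sockets wrapped in *os.File, readable and writable in both directions. It is modeled on runc's newPipe but is not its code. As far as I know the runc version also passes SOCK_CLOEXEC; that flag is left out here because it is Linux-only. The sketch runs on Unix-like systems only.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// newPipe creates two connected, bidirectional Unix-domain sockets
// and wraps them as *os.File (parent end, child end).
func newPipe() (parent, child *os.File, err error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	return os.NewFile(uintptr(fds[1]), "parent"), os.NewFile(uintptr(fds[0]), "child"), nil
}

func main() {
	parent, child, err := newPipe()
	if err != nil {
		panic(err)
	}
	defer parent.Close()
	defer child.Close()

	buf := make([]byte, 64)
	// daemon -> init
	parent.Write([]byte("config"))
	n, _ := child.Read(buf)
	fmt.Println("child got:", string(buf[:n]))
	// init -> daemon, over the same socket pair in the other direction
	child.Write([]byte("ready"))
	n, _ = parent.Read(buf)
	fmt.Println("parent got:", string(buf[:n]))
}
```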

Next comes cmd, err := c.commandTemplate(p, childPipe). Its main job is to wrap dockerinit and its arguments into a Go exec.Cmd:

&exec.Cmd{
	Path: c.initPath,
	Args: c.initArgs,
}

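This Cmd is the process that will actually run. commandTemplate also does two other things:

- it binds the Cmd's standard input and output to those of the libcontainer.Process, which were bound to the container earlier;
- it attaches the childPipe end of the pipe to the Cmd's extra open files (cmd.ExtraFiles) and records that file descriptor's number in the _LIBCONTAINER_INITPIPE environment variable.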

Back in newParentProcess, it returns newInitProcess(p, cmd, parentPipe, childPipe), which in essence returns an initProcess (vendor/src/github.com/opencontainers/runc/libcontainer/process_linux.go):

initProcess{
	cmd:        cmd,
	childPipe:  childPipe,
	parentPipe: parentPipe,
	manager:    c.cgroupManager,
	config:     c.newInitConfig(p),
}

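The fields are:

- cmd: the exec.Cmd built earlier;
- childPipe: already attached to cmd's file descriptors;
- parentPipe: the other end of the pipe;
- manager: controls resources through cgroups;
- config: the libcontainer.Process settings (including entrypoint and cmd) converted into init configuration. This configuration is sent over parentPipe to cmd's childPipe and is finally acted on by dockerinit, as described next.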

Back in Start(), parent is an initProcess, and the next step is to call its start() method:

func (p *initProcess) start() error {
	defer p.parentPipe.Close()
	err := p.cmd.Start()
	p.childPipe.Close()
	if err != nil {
		return newSystemError(err)
	}
	fds, err := getPipeFds(p.pid())
	if err != nil {
		return newSystemError(err)
	}
	p.setExternalDescriptors(fds)
	if err := p.manager.Apply(p.pid()); err != nil {
		return newSystemError(err)
	}
	// Note: this closure checks the outer err declared by "err := p.cmd.Start()".
	// The later "if err := ...; err != nil" statements declare a new, shadowed err,
	// and the result is unnamed, so failures after this point do not trigger
	// p.manager.Destroy() here.
	defer func() {
		if err != nil {
			// TODO: should not be the responsibility to call here
			p.manager.Destroy()
		}
	}()
	if err := p.createNetworkInterfaces(); err != nil {
		return newSystemError(err)
	}
	if err := p.sendConfig(); err != nil {
		return newSystemError(err)
	}
	// wait for the child process to fully complete and receive an error message
	// if one was encoutered
	var ierr *genericError
	if err := json.NewDecoder(p.parentPipe).Decode(&ierr); err != nil && err != io.EOF {
		return newSystemError(err)
	}
	if ierr != nil {
		return newSystemError(ierr)
	}
	return nil
}

The key steps: p.cmd.Start() first runs cmd;

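p.manager.Apply(p.pid()): once cmd is running, it is a new process, the first one in the container, with its own PID. This PID is added to the cgroup configuration. As a result, child processes later forked from the init process also obey the cgroup resource limits;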

createNetworkInterfaces() sets up the process's network configuration and puts it into the config;

p.sendConfig() sends the configuration (network config, entrypoint, cmd, etc.) over parentPipe to the cmd process, where dockerinit acts on it;

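json.NewDecoder(p.parentPipe).Decode(&ierr) waits to find out whether cmd ran into a problem. io.EOF means the child closed its end without sending anything, which counts as success;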

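In short, the main startup path is:

1. docker packs the container's main configuration into a Command and hands it to the execdriver (libcontainer).
2. libcontainer turns the Command into a libcontainer.Process and a linuxContainer.
3. The linuxContainer runs the libcontainer.Process. Running it means building an exec.Cmd (from os/exec) that contains dockerinit and starting dockerinit, which then runs the entrypoint and cmd.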

That's as far as I'll go before the New Year holiday. Next I plan to look at swarm, Kubernetes, and docker networking.
