Leader election in Kubernetes Controller-Manager and Scheduler


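The Kubernetes control plane consists of the apiserver, controller-manager, scheduler and etcd. When you build a highly available cluster, some of these components need leader election. etcd stores all of the cluster's state. It serves reads and writes and replicates data among etcd members, so it has strict consistency requirements and uses the relatively complex Raft algorithm to elect the leader that commits data. The apiserver is the cluster's entry point and a stateless web server, so requests are simply load-balanced across several apiserver instances and no leader election is needed. The controller-manager and scheduler, by contrast, are task-style components: the controllers built into controller-manager watch the apiserver for changes to Kubernetes objects and reconcile actual state toward desired state, and the scheduler watches for pods not yet bound to a node and chooses nodes for them. Running several copies of these loops at the same time is redundant and risks conflicting actions, so controller-manager and scheduler also need leader election. Their election differs from etcd's, however: it only has to ensure that exactly one controller-manager (or scheduler) instance becomes active, with no need for data consistency or replication between the instances.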

 

kube-scheduler's leader-election flags, as listed by its help output:

```
/ # kube-scheduler -h 2>&1 | grep -i leader
      --leader-elect                                                      Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability. (default true)
      --leader-elect-lease-duration duration                              The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 15s)
      --leader-elect-renew-deadline duration                              The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. (default 10s)
      --leader-elect-resource-lock endpoints                              The type of resource object that is used for locking during leader election. Supported options are endpoints (default) and `configmaps`. (default "endpoints")
      --leader-elect-retry-period duration                                The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 2s)
```

 

The analysis below is based on the Kubernetes 1.11 source code, with Endpoints as the lock resource.

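1. When the scheduler starts, it first runs leader election; only after becoming leader does it call back into the scheduler's run function and enter the scheduling logic.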

```go
// https://sourcegraph.com/github.com/kubernetes/kubernetes@release-1.11/-/blob/cmd/kube-scheduler/app/server.go

func Run(c schedulerserverconfig.CompletedConfig, stopCh <-chan struct{}) error {
	// ......
	// Prepare a reusable run function.
	run := func(stopCh <-chan struct{}) {
		sched.Run()
		<-stopCh
	}

	// If leader election is enabled, run via LeaderElector until done and exit.
	if c.LeaderElection != nil {
		c.LeaderElection.Callbacks = leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			OnStoppedLeading: func() {
				utilruntime.HandleError(fmt.Errorf("lost master"))
			},
		}
		leaderElector, err := leaderelection.NewLeaderElector(*c.LeaderElection)
		if err != nil {
			return fmt.Errorf("couldn't create leader elector: %v", err)
		}
		// Run blocks until leadership is lost.
		leaderElector.Run()
		return fmt.Errorf("lost lease")
	}

	// Leader election is disabled, so run inline until done.
	run(stopCh)
	return fmt.Errorf("finished without leader elect")
}
```

 

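2. LeaderElector.Run calls acquire directly to try to become leader. acquire blocks until this instance holds the lock; Run then starts the OnStartedLeading callback in a goroutine and calls renew, which keeps renewing the lease until a renewal fails. At that point the stop channel is closed and the deferred OnStoppedLeading callback runs.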

```go
// Run starts the leader election loop
func (le *LeaderElector) Run() {
	defer func() {
		runtime.HandleCrash()
		le.config.Callbacks.OnStoppedLeading()
	}()
	le.acquire()
	stop := make(chan struct{})
	go le.config.Callbacks.OnStartedLeading(stop)
	le.renew()
	close(stop)
}
```

 

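3. acquire calls tryAcquireOrRenew in a loop, waiting leader-elect-retry-period between attempts. Here le.config.Lock is an EndpointsLock: EndpointsLock.Identity() returns this instance's identity (derived from its hostname), and EndpointsLock.Get asks the apiserver for the election record stored in etcd.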

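If the election record does not exist yet in the apiserver, the client creates it with itself as the holder and becomes leader; any other error from the apiserver simply makes this attempt fail until the next retry.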

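Measured from the time it last observed a change to the record (observedTime), if the lease (15s by default) has not yet expired and this instance is not the leader, it may not take over leadership, so there is nothing more to do in this round.

If this instance already is the leader, it renews using the current time regardless of whether the lease has expired, keeping the original acquireTime and the leader-transition count (LeaderTransitions) unchanged; if it is taking over from another holder, LeaderTransitions is incremented by 1.

Finally it sends an update of the Endpoints election record to the apiserver. The apiserver makes the update atomic across clients by comparing resourceVersion (which corresponds to the object's modification revision in etcd, called modifiedIndex in etcd v2): only one client's update succeeds, and the others receive 409 Conflict.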

Lock is initialized as an EndpointsLock:

```go
type EndpointsLock struct {
	// EndpointsMeta should contain a Name and a Namespace of an
	// Endpoints object that the LeaderElector will attempt to lead.
	EndpointsMeta metav1.ObjectMeta
	Client        corev1client.EndpointsGetter
	LockConfig    ResourceLockConfig
	e             *v1.Endpoints
}

// Get returns the election record from a Endpoints Annotation
func (el *EndpointsLock) Get() (*LeaderElectionRecord, error) {
	var record LeaderElectionRecord
	var err error
	el.e, err = el.Client.Endpoints(el.EndpointsMeta.Namespace).Get(el.EndpointsMeta.Name, metav1.GetOptions{})
	if err != nil {
		// Propagate errors (e.g. NotFound) so the caller can decide whether to create the record.
		return nil, err
	}
	if recordBytes, found := el.e.Annotations[LeaderElectionRecordAnnotationKey]; found {
		if err := json.Unmarshal([]byte(recordBytes), &record); err != nil {
			return nil, err
		}
	}
	return &record, nil
}
```
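Because the Endpoints lock keeps the election record as a JSON string in an annotation, Get is essentially an unmarshal step. Here is a minimal sketch of that round trip, assuming the annotation key control-plane.alpha.kubernetes.io/leader and the JSON field names of client-go's LeaderElectionRecord (to the best of our knowledge). electionRecord, putRecord and getRecord are hypothetical, and time.Time stands in for metav1.Time.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Assumed to match client-go's LeaderElectionRecordAnnotationKey.
const leaderElectionAnnotationKey = "control-plane.alpha.kubernetes.io/leader"

// electionRecord mirrors the JSON shape we believe LeaderElectionRecord has,
// with time.Time in place of metav1.Time (an assumption for this sketch).
type electionRecord struct {
	HolderIdentity       string    `json:"holderIdentity"`
	LeaseDurationSeconds int       `json:"leaseDurationSeconds"`
	AcquireTime          time.Time `json:"acquireTime"`
	RenewTime            time.Time `json:"renewTime"`
	LeaderTransitions    int       `json:"leaderTransitions"`
}

// putRecord stores the record in an annotations map, as a Create/Update would.
func putRecord(annotations map[string]string, r electionRecord) error {
	b, err := json.Marshal(r)
	if err != nil {
		return err
	}
	annotations[leaderElectionAnnotationKey] = string(b)
	return nil
}

// getRecord reads it back; as in EndpointsLock.Get, a missing annotation
// yields an empty record rather than an error.
func getRecord(annotations map[string]string) (*electionRecord, error) {
	var r electionRecord
	if raw, found := annotations[leaderElectionAnnotationKey]; found {
		if err := json.Unmarshal([]byte(raw), &r); err != nil {
			return nil, err
		}
	}
	return &r, nil
}

func main() {
	ann := map[string]string{}
	now := time.Date(2018, 7, 1, 0, 0, 0, 0, time.UTC)
	rec := electionRecord{HolderIdentity: "master-1_abc", LeaseDurationSeconds: 15, AcquireTime: now, RenewTime: now}
	if err := putRecord(ann, rec); err != nil {
		panic(err)
	}
	fmt.Println(ann[leaderElectionAnnotationKey])

	got, err := getRecord(ann)
	if err != nil {
		panic(err)
	}
	fmt.Println(got.HolderIdentity, got.LeaseDurationSeconds) // master-1_abc 15
}
```

Storing the record in an annotation means any object kind with metadata can serve as the lock, which is presumably why both Endpoints and ConfigMaps are offered.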

```go
// If this instance is not the leader, try to become leader; if it already is, try to renew the lease.
// tryAcquireOrRenew tries to acquire a leader lease if it is not already acquired,
// else it tries to renew the lease if it has already been acquired. Returns true
// on success else returns false.
func (le *LeaderElector) tryAcquireOrRenew() bool {
	now := metav1.Now()
	// Identity() returns this instance's hostname + "_" + string(uuid.NewUUID()).
	// Build a record naming ourselves as leader, ready to be written if we win.
	leaderElectionRecord := rl.LeaderElectionRecord{
		HolderIdentity:       le.config.Lock.Identity(),
		LeaseDurationSeconds: int(le.config.LeaseDuration / time.Second),
		RenewTime:            now,
		AcquireTime:          now,
	}

	// 1. obtain or create the ElectionRecord
	oldLeaderElectionRecord, err := le.config.Lock.Get()
	if err != nil {
		// Only a missing record lets us create it with ourselves as leader.
		if !errors.IsNotFound(err) {
			glog.Errorf("error retrieving resource lock %v: %v", le.config.Lock.Describe(), err)
			return false
		}
		if err = le.config.Lock.Create(leaderElectionRecord); err != nil {
			glog.Errorf("error initially creating leader election record: %v", err)
			return false
		}
		le.observedRecord = leaderElectionRecord
		le.observedTime = le.clock.Now()
		return true
	}

	// 2. Record obtained, check the Identity & Time
	// The record in the apiserver differs from the one we last saw: update our copy and observation time.
	if !reflect.DeepEqual(le.observedRecord, *oldLeaderElectionRecord) {
		le.observedRecord = *oldLeaderElectionRecord
		le.observedTime = le.clock.Now()
	}
	// Measured from our own observedTime: if the lease (15s by default) has not expired
	// and we are not the leader, there is nothing more we can do.
	if le.observedTime.Add(le.config.LeaseDuration).After(now.Time) &&
		oldLeaderElectionRecord.HolderIdentity != le.config.Lock.Identity() {
		return false
	}

	// 3. We're going to try to update. The leaderElectionRecord is set to it's default
	// here. Let's correct it before updating.
	// Reaching here means: (1) we are not the leader but the lease has expired;
	// (2) we are the leader and the lease has not expired; or (3) we are the leader and it has expired.
	// In cases 2 and 3 we renew with the current time, keeping AcquireTime and LeaderTransitions;
	// in case 1 LeaderTransitions is incremented.
	if oldLeaderElectionRecord.HolderIdentity == le.config.Lock.Identity() {
		leaderElectionRecord.AcquireTime = oldLeaderElectionRecord.AcquireTime
		leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions
	} else {
		leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions + 1
	}

	// update the lock itself
	// The apiserver's resourceVersion check ensures only one concurrent client succeeds.
	if err = le.config.Lock.Update(leaderElectionRecord); err != nil {
		glog.Errorf("Failed to update lock: %v", err)
		return false
	}
	le.observedRecord = leaderElectionRecord
	le.observedTime = le.clock.Now()
	return true
}
```

 

