Understanding SuperEdge Distributed Health Checks in One Article (Cloud Side)


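Du Yanghao is a senior engineer at Tencent Cloud with a keen interest in open source, containers, and Kubernetes. He currently works mainly on image registries, Kubernetes cluster high availability and backup/restore, and edge computing.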

Preface

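SuperEdge's distributed health check consists of edge-health-daemon on the edge side and edge-health-admission on the cloud side:

  • edge-health-daemon: runs distributed health checks against the edge nodes in the same region and sends the health-status voting result to the apiserver (by annotating the node)
  • edge-health-admission: continuously uses the node's edge-health annotation to adjust the node taints set by kube-controller-manager (removing the NoExecute taint) and the endpoints (moving pods on disconnected nodes from the endpoint subsets' notReadyAddresses to addresses), so that the cloud and the edge jointly decide node state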

The overall architecture is shown below:

[Figure: overall architecture of SuperEdge distributed health checking]

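The cloud-side edge-health-admission component exists because, when the cloud and the edge are disconnected, kube-controller-manager does the following:

  • The disconnected node is set to the ConditionUnknown state and is given NoSchedule and NoExecute taints
  • Pods on the disconnected node are removed from the Service's Endpoint list

When edge-health-daemon determines from its edge-side health checks that the node is healthy, it updates the node and removes the NoExecute taint. However, after that update succeeds, kube-controller-manager overwrites it again (re-adding the NoExecute taint). A Kubernetes mutating admission webhook, namely edge-health-admission, is therefore required to adjust the changes kube-controller-manager makes to the node API resource, which is what finally makes distributed health checking effective.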

Before diving into the source code, here is a brief introduction to Kubernetes Admission Controllers.

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. The controllers consist of the list below, are compiled into the kube-apiserver binary, and may only be configured by the cluster administrator. In that list, there are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These execute the mutating and validating (respectively) admission control webhooks which are configured in the API.

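Kubernetes Admission Controllers are one stage in how kube-apiserver handles an API request. They are invoked after the request has been authenticated and authorized but before the object is persisted, and they validate the request, modify it, or both.

Kubernetes Admission Controllers include many kinds of admission plugin, most of which are compiled into kube-apiserver. The MutatingAdmissionWebhook and ValidatingAdmissionWebhook controllers are special: they call externally built mutating admission control webhooks and validating admission control webhooks, respectively.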

Admission webhooks are HTTP callbacks that receive admission requests and do something with them. You can define two types of admission webhooks, validating admission webhook and mutating admission webhook. Mutating admission webhooks are invoked first, and can modify objects sent to the API server to enforce custom defaults. After all object modifications are complete, and after the incoming object is validated by the API server, validating admission webhooks are invoked and can reject requests to enforce custom policies.

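An admission webhook is an HTTP callback service that receives AdmissionReview requests and processes them. Depending on how they process requests, admission webhooks fall into two categories:

  • validating admission webhook: configured through ValidatingWebhookConfiguration; performs admission validation on API requests but cannot modify the requested object
  • mutating admission webhook: configured through MutatingWebhookConfiguration; performs admission validation on API requests and can also modify the requested object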

Both types of webhook define the following fields, including those used for matching requests:

  • admissionReviewVersions: the list of AdmissionReview API resource versions the webhook accepts (API servers send the first AdmissionReview version in the admissionReviewVersions list they support)
  • name: the webhook name (if one WebhookConfiguration defines several webhooks, the names must be unique)
  • clientConfig: the address of the webhook server (url or service) and the CA bundle (optionally include a custom CA bundle to use to verify the TLS connection)
  • namespaceSelector: a labelSelector restricting the namespaces of the resources whose requests are matched
  • objectSelector: a labelSelector restricting the requested resources themselves
  • rules: restricts the operations, apiGroups, apiVersions, resources, and resource scope of matched requests, as follows:
    • operations: the list of request operations (Can be "CREATE", "UPDATE", "DELETE", "CONNECT", or "*" to match all.)
    • apiGroups: the list of API groups of the requested resources ("" is the core API group. "*" matches all API groups.)
    • apiVersions: the list of API versions of the requested resources ("*" matches all API versions.)
    • resources: the requested resource types (nodes, deployments, etc.)
    • scope: the scope of the requested resources (Cluster, Namespaced, or *)
  • timeoutSeconds: the timeout for the webhook's response; if it is exceeded, the request is handled according to failurePolicy
  • failurePolicy: how the apiserver handles a failed call to the admission webhook:
    • Ignore:means that an error calling the webhook is ignored and the API request is allowed to continue.
    • Fail:means that an error calling the webhook causes the admission to fail and the API request to be rejected.
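  • matchPolicy: how rules are matched against incoming API requests, as follows:
    • Exact: the request must match the rules list exactly
    • Equivalent: the request also matches if the requested resource can be converted (the apiserver can convert objects between versions) into one covered by the rules list, in which case it is sent to the admission webhook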
  • reinvocationPolicy:In v1.15+, to allow mutating admission plugins to observe changes made by other plugins, built-in mutating admission plugins are re-run if a mutating webhook modifies an object, and mutating webhooks can specify a reinvocationPolicy to control whether they are reinvoked as well.
    • Never: the webhook must not be called more than once in a single admission evaluation
    • IfNeeded: the webhook may be called again as part of the admission evaluation if the object being admitted is modified by other admission plugins after the initial webhook call.
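  • sideEffects: besides modifying the content of the AdmissionReview, some webhooks also modify other resources ("side effects"). sideEffects indicates whether the webhook has such side effects, and takes the following values: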
    • None: calling the webhook will have no side effects.
    • NoneOnDryRun: calling the webhook will possibly have side effects, but if a request with dryRun: true is sent to the webhook, the webhook will suppress the side effects (the webhook is dryRun-aware).

For reference, here is the MutatingWebhookConfiguration used by edge-health-admission:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: edge-health-admission
webhooks:
  - admissionReviewVersions:
      - v1
    clientConfig:
      caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNwRENDQVl3Q0NRQ2RaL0w2akZSSkdqQU5CZ2txaGtpRzl3MEJBUXNGQURBVU1SSXdFQVlEVlFRRERBbFgKYVhObE1tTWdRMEV3SGhjTk1qQXdOekU0TURRek9ERTNXaGNOTkRjeE1qQTBNRFF6T0RFM1dqQVVNUkl3RUFZRApWUVFEREFsWGFYTmxNbU1nUTBFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUNSCnhHT2hrODlvVkRHZklyVDBrYVkwajdJQVJGZ2NlVVFmVldSZVhVcjh5eEVOQkF6ZnJNVVZyOWlCNmEwR2VFL3cKZzdVdW8vQWtwUEgrbzNQNjFxdWYrTkg1UDBEWHBUd1pmWU56VWtyaUVja3FOSkYzL2liV0o1WGpFZUZSZWpidgpST1V1VEZabmNWOVRaeTJISVF2UzhTRzRBTWJHVmptQXlDMStLODBKdDI3QUl4YmdndmVVTW8xWFNHYnRxOXlJCmM3Zk1QTXJMSHhaOUl5aTZla3BwMnJrNVdpeU5YbXZhSVA4SmZMaEdnTU56YlJaS1RtL0ZKdDdyV0dhQ1orNXgKV0kxRGJYQ2MyWWhmbThqU1BqZ3NNQTlaNURONDU5ellJSkVhSTFHeFI3MlhaUVFMTm8zdE5jd3IzVlQxVlpiTgo1cmhHQlVaTFlrMERtd25vWTBCekFnTUJBQUV3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQUhuUDJibnJBcWlWCjYzWkpMVzM0UWFDMnRreVFScTNVSUtWR3RVZHFobWRVQ0I1SXRoSUlleUdVRVdqVExpc3BDQzVZRHh4YVdrQjUKTUxTYTlUY0s3SkNOdkdJQUdQSDlILzRaeXRIRW10aFhiR1hJQ3FEVUVmSUVwVy9ObUgvcnBPQUxhYlRvSUVzeQpVNWZPUy9PVVZUM3ZoSldlRjdPblpIOWpnYk1SZG9zVElhaHdQdTEzZEtZMi8zcEtxRW1Cd1JkbXBvTExGbW9MCmVTUFQ4SjREZExGRkh2QWJKalFVbjhKQTZjOHUrMzZJZDIrWE1sTGRZYTdnTnhvZTExQTl6eFJQczRXdlpiMnQKUXZpbHZTbkFWb0ZUSVozSlpjRXVWQXllNFNRY1dKc3FLMlM0UER1VkNFdlg0SmRCRlA2NFhvU08zM3pXaWhtLworMXg3OXZHMUpFcz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
      service:
        namespace: kube-system
        name: edge-health-admission
        path: /node-taint
    failurePolicy: Ignore
    matchPolicy: Exact
    name: node-taint.k8s.io
    namespaceSelector: {}
    objectSelector: {}
    reinvocationPolicy: Never
    rules:
      - apiGroups:
          - '*'
        apiVersions:
          - '*'
        operations:
          - UPDATE
        resources:
          - nodes
        scope: '*'
    sideEffects: None
    timeoutSeconds: 5
  - admissionReviewVersions:
      - v1
    clientConfig:
      caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNwRENDQVl3Q0NRQ2RaL0w2akZSSkdqQU5CZ2txaGtpRzl3MEJBUXNGQURBVU1SSXdFQVlEVlFRRERBbFgKYVhObE1tTWdRMEV3SGhjTk1qQXdOekU0TURRek9ERTNXaGNOTkRjeE1qQTBNRFF6T0RFM1dqQVVNUkl3RUFZRApWUVFEREFsWGFYTmxNbU1nUTBFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUNSCnhHT2hrODlvVkRHZklyVDBrYVkwajdJQVJGZ2NlVVFmVldSZVhVcjh5eEVOQkF6ZnJNVVZyOWlCNmEwR2VFL3cKZzdVdW8vQWtwUEgrbzNQNjFxdWYrTkg1UDBEWHBUd1pmWU56VWtyaUVja3FOSkYzL2liV0o1WGpFZUZSZWpidgpST1V1VEZabmNWOVRaeTJISVF2UzhTRzRBTWJHVmptQXlDMStLODBKdDI3QUl4YmdndmVVTW8xWFNHYnRxOXlJCmM3Zk1QTXJMSHhaOUl5aTZla3BwMnJrNVdpeU5YbXZhSVA4SmZMaEdnTU56YlJaS1RtL0ZKdDdyV0dhQ1orNXgKV0kxRGJYQ2MyWWhmbThqU1BqZ3NNQTlaNURONDU5ellJSkVhSTFHeFI3MlhaUVFMTm8zdE5jd3IzVlQxVlpiTgo1cmhHQlVaTFlrMERtd25vWTBCekFnTUJBQUV3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQUhuUDJibnJBcWlWCjYzWkpMVzM0UWFDMnRreVFScTNVSUtWR3RVZHFobWRVQ0I1SXRoSUlleUdVRVdqVExpc3BDQzVZRHh4YVdrQjUKTUxTYTlUY0s3SkNOdkdJQUdQSDlILzRaeXRIRW10aFhiR1hJQ3FEVUVmSUVwVy9ObUgvcnBPQUxhYlRvSUVzeQpVNWZPUy9PVVZUM3ZoSldlRjdPblpIOWpnYk1SZG9zVElhaHdQdTEzZEtZMi8zcEtxRW1Cd1JkbXBvTExGbW9MCmVTUFQ4SjREZExGRkh2QWJKalFVbjhKQTZjOHUrMzZJZDIrWE1sTGRZYTdnTnhvZTExQTl6eFJQczRXdlpiMnQKUXZpbHZTbkFWb0ZUSVozSlpjRXVWQXllNFNRY1dKc3FLMlM0UER1VkNFdlg0SmRCRlA2NFhvU08zM3pXaWhtLworMXg3OXZHMUpFcz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
      service:
        namespace: kube-system
        name: edge-health-admission
        path: /endpoint
    failurePolicy: Ignore
    matchPolicy: Exact
    name: endpoint.k8s.io
    namespaceSelector: {}
    objectSelector: {}
    reinvocationPolicy: Never
    rules:
      - apiGroups:
          - '*'
        apiVersions:
          - '*'
        operations:
          - UPDATE
        resources:
          - endpoints
        scope: '*'
    sideEffects: None
    timeoutSeconds: 5

kube-apiserver sends an AdmissionReview (apiGroup: admission.k8s.io, apiVersion: v1 or v1beta1) to the webhooks, serialized as JSON. An example:

# This example shows the data contained in an AdmissionReview object for a request to update the scale subresource of an apps/v1 Deployment
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    # Random uid uniquely identifying this admission call
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    # Fully-qualified group/version/kind of the incoming object
    "kind": {"group":"autoscaling","version":"v1","kind":"Scale"},
    # Fully-qualified group/version/kind of the resource being modified
    "resource": {"group":"apps","version":"v1","resource":"deployments"},
    # subresource, if the request is to a subresource
    "subResource": "scale",
    # Fully-qualified group/version/kind of the incoming object in the original request to the API server.
    # This only differs from `kind` if the webhook specified `matchPolicy: Equivalent` and the
    # original request to the API server was converted to a version the webhook registered for.
    "requestKind": {"group":"autoscaling","version":"v1","kind":"Scale"},
    # Fully-qualified group/version/kind of the resource being modified in the original request to the API server.
    # This only differs from `resource` if the webhook specified `matchPolicy: Equivalent` and the
    # original request to the API server was converted to a version the webhook registered for.
    "requestResource": {"group":"apps","version":"v1","resource":"deployments"},
    # subresource, if the request is to a subresource
    # This only differs from `subResource` if the webhook specified `matchPolicy: Equivalent` and the
    # original request to the API server was converted to a version the webhook registered for.
    "requestSubResource": "scale",
    # Name of the resource being modified
    "name": "my-deployment",
    # Namespace of the resource being modified, if the resource is namespaced (or is a Namespace object)
    "namespace": "my-namespace",
    # operation can be CREATE, UPDATE, DELETE, or CONNECT
    "operation": "UPDATE",
    "userInfo": {
      # Username of the authenticated user making the request to the API server
      "username": "admin",
      # UID of the authenticated user making the request to the API server
      "uid": "014fbff9a07c",
      # Group memberships of the authenticated user making the request to the API server
      "groups": ["system:authenticated","my-admin-group"],
      # Arbitrary extra info associated with the user making the request to the API server.
      # This is populated by the API server authentication layer and should be included
      # if any SubjectAccessReview checks are performed by the webhook.
      "extra": {
        "some-key":["some-value1", "some-value2"]
      }
    },
    # object is the new object being admitted.
    # It is null for DELETE operations.
    "object": {"apiVersion":"autoscaling/v1","kind":"Scale",...},
    # oldObject is the existing object.
    # It is null for CREATE and CONNECT operations.
    "oldObject": {"apiVersion":"autoscaling/v1","kind":"Scale",...},
    # options contains the options for the operation being admitted, like meta.k8s.io/v1 CreateOptions, UpdateOptions, or DeleteOptions.
    # It is null for CONNECT operations.
    "options": {"apiVersion":"meta.k8s.io/v1","kind":"UpdateOptions",...},
    # dryRun indicates the API request is running in dry run mode and will not be persisted.
    # Webhooks with side effects should avoid actuating those side effects when dryRun is true.
    # See http://k8s.io/docs/reference/using-api/api-concepts/#make-a-dry-run-request for more details.
    "dryRun": false
  }
}
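
As a rough illustration of how a webhook might read such a request, the Go sketch below decodes a trimmed AdmissionReview using only the standard library. The struct types are simplified stand-ins for the admission.k8s.io/v1 types (a real webhook would normally use the types from k8s.io/api/admission/v1), and parseReview is a hypothetical helper name chosen for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the admission.k8s.io/v1 types (assumption:
// only the fields used in this sketch are declared).
type groupVersionResource struct {
	Group    string `json:"group"`
	Version  string `json:"version"`
	Resource string `json:"resource"`
}

type admissionRequest struct {
	UID       string               `json:"uid"`
	Resource  groupVersionResource `json:"resource"`
	Name      string               `json:"name"`
	Operation string               `json:"operation"`
	Object    json.RawMessage      `json:"object"`
}

type admissionReview struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Request    *admissionRequest `json:"request,omitempty"`
}

// parseReview (hypothetical helper) decodes the JSON body sent by
// kube-apiserver and rejects a review that carries no request.
func parseReview(body []byte) (*admissionReview, error) {
	var ar admissionReview
	if err := json.Unmarshal(body, &ar); err != nil {
		return nil, err
	}
	if ar.Request == nil {
		return nil, fmt.Errorf("AdmissionReview has no request")
	}
	return &ar, nil
}

func main() {
	body := []byte(`{
	  "apiVersion": "admission.k8s.io/v1",
	  "kind": "AdmissionReview",
	  "request": {
	    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
	    "resource": {"group": "", "version": "v1", "resource": "nodes"},
	    "name": "edge-node-1",
	    "operation": "UPDATE",
	    "object": {"apiVersion": "v1", "kind": "Node"}
	  }
	}`)
	ar, err := parseReview(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(ar.Request.Operation, ar.Request.Resource.Resource, ar.Request.Name)
}
```

Keeping object as json.RawMessage defers decoding of the embedded resource until the handler knows which type to expect.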

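The webhook must reply to kube-apiserver with an AdmissionReview of the same version, also serialized as JSON, containing the following key fields:

  • uid: copied from the request.uid field of the AdmissionReview sent to the webhook
  • allowed: true admits the request; false rejects it
  • status: when the request is rejected, status can carry the reason (HTTP code and message)
  • patch: base64-encoded; contains the series of JSON patch operations the mutating admission webhook applies to the requested object
  • patchType: currently only JSONPatch is supported

An example: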

# a webhook response whose JSON patch sets spec.replicas to 3 would be:
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "<value from request.uid>",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiAiYWRkIiwgInBhdGgiOiAiL3NwZWMvcmVwbGljYXMiLCAidmFsdWUiOiAzfV0="
  }
}

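edge-health-admission is in fact a mutating admission webhook that selectively modifies endpoints and node UPDATE requests. Its workings are analyzed in detail below.

edge-health-admission Source Code Analysis

edge-health-admission is written closely following the official example. The listening entry point is as follows: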

func (eha *EdgeHealthAdmission) Run(stopCh <-chan struct{}) {
    if !cache.WaitForNamedCacheSync("edge-health-admission", stopCh, eha.cfg.NodeInformer.Informer().HasSynced) {
        return
    }
    http.HandleFunc("/node-taint", eha.serveNodeTaint)
    http.HandleFunc("/endpoint", eha.serveEndpoint)
    server := &http.Server{
        Addr: eha.cfg.Addr,
    }
    go func() {
        if err := server.ListenAndServeTLS(eha.cfg.CertFile, eha.cfg.KeyFile); err != http.ErrServerClosed {
            klog.Fatalf("ListenAndServeTLS err %+v", err)
        }
    }()
    // Block until a stop signal arrives, then shut the server down gracefully.
    <-stopCh
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        klog.Errorf("Server: program exit, server exit error %+v", err)
    }
}

Two route handlers are registered here:

  • node-taint: handled by serveNodeTaint, which modifies node UPDATE requests
  • endpoint: handled by serveEndpoint, which modifies endpoints UPDATE requests

Both handlers call the serve function, shown below:

// serve handles the http portion of a request prior to handing to an admit function
func serve(w http.ResponseWriter, r *http.Request, admit admitFunc) {
    var body []byte
    if r.Body != nil {
        if data, err := ioutil.ReadAll(r.Body); err == nil {
            body = data
        }
    }
    // verify the content type is accurate
    contentType := r.Header.Get("Content-Type")
    if contentType != "application/json" {
        klog.Errorf("contentType=%s, expect application/json", contentType)
        return
    }
    klog.V(4).Info(fmt.Sprintf("handling request: %s", body))
    // The AdmissionReview that was sent to the webhook
    requestedAdmissionReview := admissionv1.AdmissionReview{}
    // The AdmissionReview that will be returned
    responseAdmissionReview := admissionv1.AdmissionReview{}
    deserializer := codecs.UniversalDeserializer()
    if _, _, err := deserializer.Decode(body, nil, &requestedAdmissionReview); err != nil {
        klog.Error(err)
        responseAdmissionReview.Response = toAdmissionResponse(err)
    } else {
        // pass to admitFunc
        responseAdmissionReview.Response = admit(requestedAdmissionReview)
    }
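    // Return the same UID; skip this when decoding failed or admit returned nil,
    // since either pointer would then be nil
    if responseAdmissionReview.Response != nil && requestedAdmissionReview.Request != nil {
        responseAdmissionReview.Response.UID = requestedAdmissionReview.Request.UID
    }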
    klog.V(4).Info(fmt.Sprintf("sending response: %+v", responseAdmissionReview.Response))
    respBytes, err := json.Marshal(responseAdmissionReview)
    if err != nil {
        klog.Error(err)
    }
    if _, err := w.Write(respBytes); err != nil {
        klog.Error(err)
    }
}

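The serve logic is as follows:

  • Parse request.Body into an AdmissionReview object and assign it to requestedAdmissionReview
  • Run the admit function on the AdmissionReview object and assign the result to responseAdmissionReview
  • Set responseAdmissionReview.Response.UID to the UID of the incoming AdmissionReview.Request

The admit functions for serveNodeTaint and serveEndpoint are mutateNodeTaint and mutateEndpoint, respectively. They are analyzed in turn below: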

1. mutateNodeTaint

mutateNodeTaint modifies node UPDATE requests according to the distributed health check result:

func (eha *EdgeHealthAdmission) mutateNodeTaint(ar admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
    klog.V(4).Info("mutating node taint")
    nodeResource := metav1.GroupVersionResource{Group: "", Version: "v1", Resource: "nodes"}
    if ar.Request.Resource != nodeResource {
        klog.Errorf("expect resource to be %s", nodeResource)
        return nil
    }
    var node corev1.Node
    deserializer := codecs.UniversalDeserializer()
    if _, _, err := deserializer.Decode(ar.Request.Object.Raw, nil, &node); err != nil {
        klog.Error(err)
        return toAdmissionResponse(err)
    }
    reviewResponse := admissionv1.AdmissionResponse{}
    reviewResponse.Allowed = true
    if index, condition := util.GetNodeCondition(&node.Status, v1.NodeReady); index != -1 && condition.Status == v1.ConditionUnknown {
        if node.Annotations != nil {
            var patches []*patch
            if healthy, existed := node.Annotations[common.NodeHealthAnnotation]; existed && healthy == common.NodeHealthAnnotationPros {
                if index, existed := util.TaintExistsPosition(node.Spec.Taints, common.UnreachableNoExecuteTaint); existed {
                    patches = append(patches, &patch{
                        OP:   "remove",
                        Path: fmt.Sprintf("/spec/taints/%d", index),
                    })
                    klog.V(4).Infof("UnreachableNoExecuteTaint: remove %d taints %s", index, node.Spec.Taints[index])
                }
            }
            if len(patches) > 0 {
                patchBytes, _ := json.Marshal(patches)
                reviewResponse.Patch = patchBytes
                pt := admissionv1.PatchTypeJSONPatch
                reviewResponse.PatchType = &pt
            }
        }
    }
    return &reviewResponse
}

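The main logic is as follows:

  • Check that AdmissionReview.Request.Resource is the group/version/resource of the node resource
  • Decode AdmissionReview.Request.Object.Raw into a node object
  • Set AdmissionReview.Response.Allowed to true, so the request is admitted regardless of whether a patch is applied
  • Run the core logic that assists the edge-side health check: if the node is in the ConditionUnknown state and the distributed health check result is healthy, and the node carries the NoExecute (node.kubernetes.io/unreachable) taint, remove that taint

In short, mutateNodeTaint keeps correcting the node state updated by kube-controller-manager by removing the NoExecute (node.kubernetes.io/unreachable) taint, so that pods on the node are not evicted.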
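
The decision at the heart of mutateNodeTaint can be sketched in isolation. The Go example below uses only the standard library; the taint type, the unreachableTaintPatch helper, and the votedHealthy flag (standing in for the annotation lookup, whose concrete key and value are not shown in this article) are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taint is a simplified stand-in for corev1.Taint.
type taint struct {
	Key    string `json:"key"`
	Effect string `json:"effect"`
}

// patchOp is one JSON Patch operation.
type patchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

const (
	unreachableKey  = "node.kubernetes.io/unreachable"
	noExecuteEffect = "NoExecute"
)

// unreachableTaintPatch (hypothetical helper) returns a patch removing the
// unreachable NoExecute taint, but only when the node's Ready condition is
// Unknown and the edge-side vote says the node is healthy.
func unreachableTaintPatch(readyStatus string, votedHealthy bool, taints []taint) []patchOp {
	if readyStatus != "Unknown" || !votedHealthy {
		return nil
	}
	for i, t := range taints {
		if t.Key == unreachableKey && t.Effect == noExecuteEffect {
			return []patchOp{{Op: "remove", Path: fmt.Sprintf("/spec/taints/%d", i)}}
		}
	}
	return nil
}

func main() {
	taints := []taint{
		{Key: unreachableKey, Effect: "NoSchedule"},
		{Key: unreachableKey, Effect: noExecuteEffect},
	}
	ops := unreachableTaintPatch("Unknown", true, taints)
	raw, _ := json.Marshal(ops)
	fmt.Println(string(raw))
}
```

Note that the NoSchedule taint with the same key is left in place: only the taint that triggers eviction is targeted.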

2. mutateEndpoint

mutateEndpoint modifies endpoints UPDATE requests according to the distributed health check result:

func (eha *EdgeHealthAdmission) mutateEndpoint(ar admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
    klog.V(4).Info("mutating endpoint")
    endpointResource := metav1.GroupVersionResource{Group: "", Version: "v1", Resource: "endpoints"}
    if ar.Request.Resource != endpointResource {
        klog.Errorf("expect resource to be %s", endpointResource)
        return nil
    }
    var endpoint corev1.Endpoints
    deserializer := codecs.UniversalDeserializer()
    if _, _, err := deserializer.Decode(ar.Request.Object.Raw, nil, &endpoint); err != nil {
        klog.Error(err)
        return toAdmissionResponse(err)
    }
    reviewResponse := admissionv1.AdmissionResponse{}
    reviewResponse.Allowed = true
    for epSubsetIndex, epSubset := range endpoint.Subsets {
        for notReadyAddrIndex, EndpointAddress := range epSubset.NotReadyAddresses {
            if node, err := eha.nodeLister.Get(*EndpointAddress.NodeName); err == nil {
                if index, condition := util.GetNodeCondition(&node.Status, v1.NodeReady); index != -1 && condition.Status == v1.ConditionUnknown {
                    if node.Annotations != nil {
                        var patches []*patch
                        if healthy, existed := node.Annotations[common.NodeHealthAnnotation]; existed && healthy == common.NodeHealthAnnotationPros {
                            // TODO: handle readiness probes failure
                            // Remove address on node from endpoint notReadyAddresses
                            patches = append(patches, &patch{
                                OP:   "remove",
                                Path: fmt.Sprintf("/subsets/%d/notReadyAddresses/%d", epSubsetIndex, notReadyAddrIndex),
                            })
                            // Add address on node to endpoint readyAddresses
                            TargetRef := map[string]interface{}{}
                            TargetRef["kind"] = EndpointAddress.TargetRef.Kind
                            TargetRef["namespace"] = EndpointAddress.TargetRef.Namespace
                            TargetRef["name"] = EndpointAddress.TargetRef.Name
                            TargetRef["uid"] = EndpointAddress.TargetRef.UID
                            TargetRef["apiVersion"] = EndpointAddress.TargetRef.APIVersion
                            TargetRef["resourceVersion"] = EndpointAddress.TargetRef.ResourceVersion
                            TargetRef["fieldPath"] = EndpointAddress.TargetRef.FieldPath
                            patches = append(patches, &patch{
                                OP:   "add",
                                Path: fmt.Sprintf("/subsets/%d/addresses/0", epSubsetIndex),
                                Value: map[string]interface{}{
                                    "ip":        EndpointAddress.IP,
                                    "hostname":  EndpointAddress.Hostname,
                                    "nodeName":  EndpointAddress.NodeName,
                                    "targetRef": TargetRef,
                                },
                            })
                            if len(patches) != 0 {
                                patchBytes, _ := json.Marshal(patches)
                                reviewResponse.Patch = patchBytes
                                pt := admissionv1.PatchTypeJSONPatch
                                reviewResponse.PatchType = &pt
                            }
                        }
                    }
                }
            } else {
                klog.Errorf("Get pod's node err %+v", err)
            }
        }
    }
    return &reviewResponse
}

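The main logic is as follows:

  • Check that AdmissionReview.Request.Resource is the group/version/resource of the endpoints resource
  • Decode AdmissionReview.Request.Object.Raw into an endpoints object
  • Set AdmissionReview.Response.Allowed to true, so the request is admitted regardless of whether a patch is applied
  • Iterate over endpoints.Subset.NotReadyAddresses; if the node hosting an EndpointAddress is in the ConditionUnknown state and the distributed health check result is healthy, move that EndpointAddress from endpoints.Subset.NotReadyAddresses to endpoints.Subset.Addresses

In short, mutateEndpoint keeps correcting the endpoints state updated by kube-controller-manager by moving the workloads on nodes that pass the distributed health check from endpoints.Subset.NotReadyAddresses to endpoints.Subset.Addresses, so that the service stays available.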
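
The move itself can be illustrated on plain in-memory data. The Go sketch below is not the JSON-patch approach used by mutateEndpoint; it is a simplified alternative that rebuilds the two address lists directly. The address and subset types drop most fields of the real corev1 types (ports, hostname, targetRef), and the promote helper and healthy predicate (standing in for the node lookup and annotation check) are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// address and subset are heavily simplified stand-ins for
// corev1.EndpointAddress and corev1.EndpointSubset.
type address struct {
	IP       string `json:"ip"`
	NodeName string `json:"nodeName,omitempty"`
}

type subset struct {
	Addresses         []address `json:"addresses,omitempty"`
	NotReadyAddresses []address `json:"notReadyAddresses,omitempty"`
}

// promote (hypothetical helper) returns a copy of subsets in which every
// not-ready address whose node passes the healthy predicate has been moved
// to the ready list, and reports whether anything changed.
func promote(subsets []subset, healthy func(node string) bool) ([]subset, bool) {
	changed := false
	out := make([]subset, len(subsets))
	for i, s := range subsets {
		ready := append([]address(nil), s.Addresses...) // copy, leave input intact
		var notReady []address
		for _, a := range s.NotReadyAddresses {
			if a.NodeName != "" && healthy(a.NodeName) {
				ready = append(ready, a)
				changed = true
			} else {
				notReady = append(notReady, a)
			}
		}
		out[i] = subset{Addresses: ready, NotReadyAddresses: notReady}
	}
	return out, changed
}

func main() {
	subsets := []subset{{
		Addresses: []address{{IP: "10.0.0.1", NodeName: "edge-a"}},
		NotReadyAddresses: []address{
			{IP: "10.0.0.2", NodeName: "edge-b"},
			{IP: "10.0.0.3", NodeName: "edge-c"},
		},
	}}
	// Assumption: only edge-b is Unknown in the cloud yet voted healthy at the edge.
	healthy := func(node string) bool { return node == "edge-b" }
	updated, changed := promote(subsets, healthy)
	raw, _ := json.Marshal(updated)
	fmt.Println(changed, string(raw))
}
```

Addresses without a node name are left where they are, since there is no node whose health vote could justify promoting them.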

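Summary

  • SuperEdge's distributed health check consists of edge-health-daemon on the edge side and edge-health-admission on the cloud side:
    • edge-health-daemon: runs distributed health checks against the edge nodes in the same region and sends the health-status voting result to the apiserver (by annotating the node)
    • edge-health-admission: continuously uses the node's edge-health annotation to adjust the node taints set by kube-controller-manager (removing the NoExecute taint) and the endpoints (moving pods on disconnected nodes from the endpoint subsets' notReadyAddresses to addresses), so that the cloud and the edge jointly decide node state
  • The cloud-side edge-health-admission component exists because, when the cloud and the edge are disconnected, kube-controller-manager sets the disconnected node to the ConditionUnknown state and adds NoSchedule and NoExecute taints; it also removes pods on the disconnected node from the Service's Endpoint list. When edge-health-daemon determines from its edge-side health checks that the node is healthy, it updates the node and removes the NoExecute taint. However, after that update succeeds, kube-controller-manager overwrites it again (re-adding the NoExecute taint). A Kubernetes mutating admission webhook, namely edge-health-admission, is therefore required to adjust the changes kube-controller-manager makes to the node API resource, which is what finally makes distributed health checking effective
  • Kubernetes Admission Controllers are one stage in how kube-apiserver handles an API request. They are invoked after the request has been authenticated and authorized but before the object is persisted, and they validate the request, modify it, or both. They include many kinds of admission plugin, most of which are compiled into kube-apiserver. The MutatingAdmissionWebhook and ValidatingAdmissionWebhook controllers are special: they call externally built mutating admission control webhooks and validating admission control webhooks, respectively
  • An admission webhook is an HTTP callback service that receives AdmissionReview requests and processes them. Depending on how they process requests, admission webhooks fall into two categories:
    • validating admission webhook: configured through ValidatingWebhookConfiguration; performs admission validation on API requests but cannot modify the requested object
    • mutating admission webhook: configured through MutatingWebhookConfiguration; performs admission validation on API requests and can also modify the requested object
  • kube-apiserver sends an AdmissionReview (apiGroup: admission.k8s.io, apiVersion: v1 or v1beta1) to the webhooks, serialized as JSON; the webhook must reply to kube-apiserver with an AdmissionReview of the same version, also serialized as JSON, containing the following key fields:
    • uid: copied from the request.uid field of the AdmissionReview sent to the webhook
    • allowed: true admits the request; false rejects it
    • status: when the request is rejected, status can carry the reason (HTTP code and message)
    • patch: base64-encoded; contains the series of JSON patch operations the mutating admission webhook applies to the requested object
    • patchType: currently only JSONPatch is supported
  • edge-health-admission is in fact a mutating admission webhook that selectively modifies endpoints and node UPDATE requests, with the following handling logic:
    • mutateNodeTaint: keeps correcting the node state updated by kube-controller-manager by removing the NoExecute (node.kubernetes.io/unreachable) taint, so that pods on the node are not evicted
    • mutateEndpoint: keeps correcting the endpoints state updated by kube-controller-manager by moving the workloads on nodes that pass the distributed health check from endpoints.Subset.NotReadyAddresses to endpoints.Subset.Addresses, so that the service stays available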

