Kubernetes Elastic Scaling

Reposted article, 2020-05-28 14:41.

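In the Kubernetes ecosystem, different components cover different scaling scenarios, across several dimensions and layers.

There are three kinds of elastic scaling:

CA (Cluster Autoscaler): automatic scale-out/scale-in at the Node level, implemented by the cluster-autoscaler component
HPA (Horizontal Pod Autoscaler): automatic scaling of the number of Pods
VPA (Vertical Pod Autoscaler): automatic scaling of Pod resource configuration, mainly CPU and memory, implemented by the addon-resizer component

On a cloud platform, the recommended approach is to combine HPA with cluster-autoscaler to manage the elastic scaling of the cluster.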

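Node autoscaling

Scale-out: Cluster Autoscaler periodically checks whether the cluster has enough resources to schedule newly created Pods. When resources are insufficient, it calls the cloud provider to create a new Node.

Scale-in: Cluster Autoscaler also periodically monitors the resource usage of each Node. When a Node's resource utilization stays low (below 50%) for a long time, it automatically deletes the underlying virtual machine from the cloud provider. The Pods that were running there are then rescheduled onto other Nodes.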
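The scale-in check described above can be sketched as follows. This is a simplified illustration rather than the Cluster Autoscaler's actual code: it assumes that utilization is measured as the sum of Pod CPU requests divided by the Node's allocatable CPU, it uses the 50% threshold quoted above, and it leaves out the "for a long time" condition. The class, field, and node names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

SCALE_DOWN_THRESHOLD = 0.5  # the 50% figure quoted in the text


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int          # allocatable CPU in millicores
    pod_cpu_requests_m: List[int]   # CPU requests of the Pods on the node


def utilization(node: Node) -> float:
    """Requested CPU as a fraction of allocatable CPU (an assumption)."""
    return sum(node.pod_cpu_requests_m) / node.allocatable_cpu_m


def scale_down_candidates(nodes: List[Node]) -> List[str]:
    """Names of the nodes whose utilization is below the threshold."""
    return [n.name for n in nodes if utilization(n) < SCALE_DOWN_THRESHOLD]


nodes = [
    Node("node-a", 4000, [500, 250]),         # 750m of 4000m requested
    Node("node-b", 4000, [1500, 1000, 500]),  # 3000m of 4000m requested
]
print(scale_down_candidates(nodes))
```

Only node-a falls below the threshold here, so it alone would be considered for removal.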

Supported cloud providers:

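Scaling out Nodes with Ansible

1. Trigger the addition of a new Node
2. Run the Ansible scripts to deploy the components
3. Check that the services are available
4. Call the API to join the new Node to the cluster, or enable automatic Node joining
5. Watch the status of the new Node
6. Node scale-out is complete and the Node accepts new Pods

Node scale-in procedure: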

# List the nodes
kubectl get node

# Mark the node unschedulable
kubectl cordon $node_name

# Evict the Pods from the node
kubectl drain $node_name --ignore-daemonsets

# Remove the node
kubectl delete node $node_name
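The same sequence can be wrapped in a small script so that it always runs in the same order and stops at the first failure. The sketch below is one possible way to do it, assuming kubectl is on the PATH and already configured for the cluster. The function names and the node name k8s-node3 are hypothetical, and by default the script only prints the commands.

```python
import subprocess
from typing import List


def scale_in_commands(node_name: str) -> List[List[str]]:
    """Return the kubectl invocations for removing one node, in order."""
    return [
        ["kubectl", "cordon", node_name],
        ["kubectl", "drain", node_name, "--ignore-daemonsets"],
        ["kubectl", "delete", "node", node_name],
    ]


def remove_node(node_name: str, dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for argv in scale_in_commands(node_name):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # raises on the first failure


remove_node("k8s-node3")  # hypothetical node name; prints the commands only
```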

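Pod autoscaling (HPA)

Horizontal Pod Autoscaler (HPA) automatically adjusts the number of replicas of a replication controller, Deployment, or ReplicaSet based on resource utilization or custom metrics, so that the size of the deployment tracks the actual load on the service. HPA does not apply to objects that cannot be scaled, such as a DaemonSet.

How HPA works

The Metrics Server in Kubernetes continuously collects metrics for all Pod replicas. The HPA controller fetches this data through the Metrics Server API (the Heapster API or the aggregated API) and applies the user-defined scaling rules to compute the target number of Pod replicas. When the target differs from the current number of replicas, the HPA controller issues a scale operation against the Pods' replica controller (Deployment, RC, or ReplicaSet), which adjusts the replica count and completes the scaling.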
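The calculation of the target replica count can be sketched as follows. The formula, desired = ceil(current replicas × current metric value / target metric value), and the default tolerance of 0.1 are taken from the Kubernetes HPA documentation. The function name and parameters are hypothetical, and the sketch ignores details such as Pods that are not ready or have no metrics.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Target replica count for one metric, clamped to [min, max]."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas            # close enough: do not scale
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(3, 90, 50, 1, 10))   # 3 * 90/50 = 5.4, rounded up
print(desired_replicas(3, 52, 50, 1, 10))   # ratio 1.04 is within tolerance
```

When several metrics are configured, the HPA evaluates each one separately and uses the largest resulting replica count.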

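Cooldown periods are unavoidable in any discussion of elastic scaling. Because the metrics being evaluated change dynamically, the number of replicas may fluctuate continuously, an effect sometimes called thrashing. A cooldown time after each scale-out or scale-in operation limits this.

In HPA, the default scale-out cooldown is 3 minutes and the default scale-in cooldown is 5 minutes.

The cooldown times can be set through startup flags of the kube-controller-manager component (these flags apply to older releases; newer releases use --horizontal-pod-autoscaler-downscale-stabilization instead, so check the reference for your version):

--horizontal-pod-autoscaler-downscale-delay: scale-in cooldown
--horizontal-pod-autoscaler-upscale-delay: scale-out cooldown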
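The cooldown rule can be sketched as a simple time check. This is a minimal illustration that assumes the 3-minute and 5-minute defaults quoted above and measures both delays from the last scaling operation. The names are hypothetical and this is not the controller's actual code.

```python
UPSCALE_DELAY_S = 3 * 60     # default scale-out cooldown quoted above
DOWNSCALE_DELAY_S = 5 * 60   # default scale-in cooldown quoted above


def scaling_allowed(direction: str, now_s: float, last_scale_s: float) -> bool:
    """True if enough time has passed since the last scaling operation."""
    delay = UPSCALE_DELAY_S if direction == "up" else DOWNSCALE_DELAY_S
    return now_s - last_scale_s >= delay


# The last scaling operation happened at t=0 and it is now t=200s.
print(scaling_allowed("up", 200, 0))     # 200s is past the 180s cooldown
print(scaling_allowed("down", 200, 0))   # 200s is still inside the 300s cooldown
```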

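The evolution of HPA:

HPA currently supports three major API versions: autoscaling/v1, autoscaling/v2beta1, and autoscaling/v2beta2.

Most people are familiar with autoscaling/v1, which supports scaling on a single metric, CPU.

autoscaling/v2beta1 added support for custom metrics, and autoscaling/v2beta2 additionally added support for external metrics.

These changes reflect how the Kubernetes community's understanding of monitoring and monitoring metrics has shifted. From the early Heapster to Metrics Server, and then to drawing boundaries between categories of metrics, the monitoring ecosystem has kept growing.

Examples:

# v1 version: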
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50


# v2beta2 version:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: worker_tasks
      target:
        type: AverageValue
        averageValue: 30

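Scaling based on CPU metrics

Kubernetes API Aggregation

Kubernetes 1.7 introduced the aggregation layer, which lets third-party applications register themselves with kube-apiserver so that their new APIs can still be accessed and operated on through the API Server's HTTP URLs. To implement this, Kubernetes added an API Aggregation Layer to the kube-apiserver service that forwards requests for extension APIs to the user's service.

When you access apis/metrics.k8s.io/v1beta1, you are actually talking to a proxy called kube-aggregator. kube-apiserver is one backend of this proxy, and Metrics Server is another. This makes it easy to extend the Kubernetes API.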
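Conceptually, the aggregator acts as a router keyed on the API group in the request path. The toy sketch below illustrates the idea only. The real aggregator resolves backends from registered APIService objects rather than a hard-coded table, and the table and function name here are hypothetical.

```python
# Hypothetical routing table: API path prefix -> backend that serves it.
BACKENDS = {
    "/apis/metrics.k8s.io/": "metrics-server",
    "/apis/custom.metrics.k8s.io/": "prometheus-adapter",
}


def route(path: str) -> str:
    """Pick the backend for a request path; default to kube-apiserver."""
    for prefix, backend in BACKENDS.items():
        if path.startswith(prefix):
            return backend
    return "kube-apiserver"


print(route("/apis/metrics.k8s.io/v1beta1/nodes"))
print(route("/api/v1/namespaces/default/pods"))
```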

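If the cluster was deployed with kubeadm, the aggregation layer is enabled by default. If it was deployed from binaries, add the following startup flags to kube-apiserver: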

# vi /opt/kubernetes/cfg/kube-apiserver.conf
...
--requestheader-client-ca-file=/opt/kubernetes/ssl/ca.pem \
--proxy-client-cert-file=/opt/kubernetes/ssl/server.pem \
--proxy-client-key-file=/opt/kubernetes/ssl/server-key.pem \
--requestheader-allowed-names=kubernetes \
--requestheader-extra-headers-prefix=X-Remote-Extra- \
--requestheader-group-headers=X-Remote-Group \
--requestheader-username-headers=X-Remote-User \
--enable-aggregator-routing=true \
...

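After changing the configuration, restart the kube-apiserver service to enable API aggregation.

Deploying Metrics Server

Metrics Server is a cluster-wide aggregator of resource usage data. It is deployed in the cluster as an application.

Metrics Server collects metrics from the Summary API exposed by the kubelet on each node.

Metrics Server is registered with the master API server through the Kubernetes aggregator.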

Deployment manifests: https://github.com/kubernetes-sigs/metrics-server

# git clone https://github.com/kubernetes-sigs/metrics-server

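Edit the deployment.yaml file to work around the following cluster issues:
Issue 1: by default, metrics-server uses the node hostname to fetch data from the kubelet on port 10250, but CoreDNS (10.96.0.10:53) has no record for that hostname and cannot resolve it. Add the flag --kubelet-preferred-address-types=InternalIP to the metrics-server command so that it uses the node IP address directly.

Issue 2: the kubelet's port 10250 uses HTTPS, and the connection requires TLS certificate verification. Add the flag --kubelet-insecure-tls to the metrics-server command to skip verification of the kubelet's serving certificate.

Issue 3: the image address in the YAML file, k8s.gcr.io/metrics-server-amd64:v0.3.0, is not reachable from mainland China without a proxy and needs to be replaced with an accessible one, for example a mirror on Aliyun. This article uses the Google mirror image on hub.docker.com: image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1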

kubectl apply -f .
kubectl get pod -n kube-system

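Resource usage metrics, such as container CPU and memory usage, are available in Kubernetes through the Metrics API. These metrics can be accessed directly by users (for example, with the kubectl top command) or used by controllers in the cluster (for example, the Horizontal Pod Autoscaler) to make decisions.

Test: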

kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl top node
kubectl get apiservice |grep metrics
kubectl describe apiservice v1beta1.metrics.k8s.io

autoscaling/v1 (CPU metric in practice)

The autoscaling/v1 version supports only one metric, CPU.

Create the HPA policy:

# kubectl get pod
NAME                         READY   STATUS    RESTARTS   AGE
java-demo-8548998c57-d4wkp   1/1     Running   0          12m
java-demo-8548998c57-w24x6   1/1     Running   0          11m
java-demo-8548998c57-wbnrs   1/1     Running   0          11m
# kubectl autoscale deployment java-demo --cpu-percent=50 --min=3 --max=10 --dry-run -o yaml > hpa-v1.yaml
# cat hpa-v1.yaml 
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: java-demo
spec:
  maxReplicas: 10
  minReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-demo
  targetCPUUtilizationPercentage: 50
# kubectl apply -f hpa-v1.yaml
# kubectl get hpa
NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
java-demo   Deployment/java-demo   1%/50%    3         10        3          10m
# kubectl describe hpa java-demo

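scaleTargetRef: identifies the object to be scaled.

targetCPUUtilizationPercentage: scale-out is triggered when the average CPU utilization across the Pods, measured relative to their CPU requests, exceeds 50%.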
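The percentage compared against targetCPUUtilizationPercentage can be illustrated with a small calculation. The sketch assumes that every Pod has the same CPU request and that utilization is total usage divided by total requests. The usage figures and the 200m request are hypothetical values, not measurements from the cluster in this article.

```python
from typing import List


def cpu_utilization_percent(usage_m: List[int], request_m: int) -> float:
    """Total CPU usage of the Pods as a percentage of their total CPU requests."""
    return 100 * sum(usage_m) / (request_m * len(usage_m))


# Hypothetical numbers: three Pods, each requesting 200m of CPU.
usage = [150, 100, 110]
percent = cpu_utilization_percent(usage, 200)
print(percent, percent > 50)   # utilization and whether it exceeds the 50% target
```

Because the percentage is relative to requests, a Deployment whose Pods declare no CPU request cannot be scaled on CPU utilization.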

Start the load test:

# yum install httpd-tools -y
# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
java-demo    ClusterIP   10.0.0.215   <none>        80/TCP    171m
# ab -n 100000 -c 100 http://10.0.0.215/index
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.0.0.215 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
apr_socket_recv: Connection refused (111)
Total of 85458 requests completed

Check the scaling status

# kubectl get hpa 
NAME        REFERENCE              TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
java-demo   Deployment/java-demo   1038%/50%   3         10        10         165m

# kubectl get pod
NAME                         READY   STATUS    RESTARTS   AGE
java-demo-77d4f5cdcf-4chv4   1/1     Running   0          56s
java-demo-77d4f5cdcf-9bkz7   1/1     Running   0          56s
java-demo-77d4f5cdcf-bk9mk   1/1     Running   0          156m
java-demo-77d4f5cdcf-bv68j   1/1     Running   0          41s
java-demo-77d4f5cdcf-khhlv   1/1     Running   0          41s
java-demo-77d4f5cdcf-nvdjh   1/1     Running   0          56s
java-demo-77d4f5cdcf-pqxvb   1/1     Running   0          41s
java-demo-77d4f5cdcf-pxgl9   1/1     Running   0          41s
java-demo-77d4f5cdcf-qqk6q   1/1     Running   0          156m
java-demo-77d4f5cdcf-tkct6   1/1     Running   0          156m

# kubectl top pod
NAME                         CPU(cores)   MEMORY(bytes)   
java-demo-77d4f5cdcf-4chv4   2m           269Mi           
java-demo-77d4f5cdcf-bk9mk   2m           246Mi           
java-demo-77d4f5cdcf-cwzwz   2m           177Mi           
java-demo-77d4f5cdcf-cz7hj   3m           220Mi           
java-demo-77d4f5cdcf-fb9zl   3m           197Mi           
java-demo-77d4f5cdcf-ftjht   3m           194Mi           
java-demo-77d4f5cdcf-qdxqf   2m           174Mi           
java-demo-77d4f5cdcf-qx52w   2m           175Mi           
java-demo-77d4f5cdcf-rfrlh   3m           220Mi           
java-demo-77d4f5cdcf-xjzjt   2m           176Mi

Workflow: hpa -> apiserver -> kube aggregation -> metrics-server -> kubelet (cAdvisor)

autoscaling/v2beta2 (multiple metrics)

To meet more requirements, HPA also has the autoscaling/v2beta1 and autoscaling/v2beta2 versions.

The difference between the two is that autoscaling/v2beta1 supports Resource Metrics (CPU) and Custom Metrics (application metrics), while autoscaling/v2beta2 additionally supports External Metrics.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: java-demo
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

This has the same effect as the v1 version above; only the format differs.

v2 also supports two other types of metrics: Pods and Object.

type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k

type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k

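The type field under metrics takes one of four values: Object, Pods, Resource, and External.

Resource: the CPU and memory metrics of the Pods belonging to the scaled object. Only Utilization and AverageValue targets are supported.
Object: a metric of a specified object inside Kubernetes. The data must be provided by a third-party adapter. Only Value and AverageValue targets are supported.
Pods: a metric of the Pods belonging to the scaled object. The data must be provided by a third-party adapter. Only AverageValue targets are allowed.
External: a metric from outside Kubernetes. The data must also be provided by a third-party adapter. Only Value and AverageValue targets are supported.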
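The rules above can be written down as a lookup table and used to sanity-check a metric entry before applying a manifest. The sketch below operates on a metric entry that has already been loaded into a Python dict. The table restates the rules listed above, and the function name is hypothetical.

```python
# Allowed target types per metric type, as listed in the text above.
ALLOWED_TARGET_TYPES = {
    "Resource": {"Utilization", "AverageValue"},
    "Pods": {"AverageValue"},
    "Object": {"Value", "AverageValue"},
    "External": {"Value", "AverageValue"},
}


def check_metric(metric: dict) -> None:
    """Raise ValueError if the target type is not allowed for the metric type."""
    metric_type = metric["type"]
    if metric_type not in ALLOWED_TARGET_TYPES:
        raise ValueError(f"unknown metric type: {metric_type}")
    body = metric[metric_type.lower()]   # e.g. "Pods" -> the "pods" field
    target_type = body["target"]["type"]
    if target_type not in ALLOWED_TARGET_TYPES[metric_type]:
        raise ValueError(
            f"{metric_type} metrics do not support {target_type} targets")


ok = {"type": "Pods",
      "pods": {"metric": {"name": "packets-per-second"},
               "target": {"type": "AverageValue", "averageValue": "1k"}}}
bad = {"type": "Pods",
       "pods": {"metric": {"name": "packets-per-second"},
                "target": {"type": "Utilization", "averageUtilization": 50}}}

check_metric(ok)          # passes silently
try:
    check_metric(bad)
except ValueError as err:
    print(err)
```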

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: java-demo
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k

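Scaling based on Prometheus custom metrics

Resource metrics cover only CPU and memory, which is usually enough. To drive HPA from custom metrics such as request QPS or the number of 5xx errors, however, you need custom metrics. The most mature implementation at present is Prometheus Custom Metrics: the metrics are provided by Prometheus and then aggregated into the API server by k8s-prometheus-adapter, achieving the same effect as the core metrics (metrics-server).

Workflow: hpa -> apiserver -> kube aggregation -> prometheus-adapter -> prometheus -> pods

Deploying the Custom Metrics Adapter

The metrics collected by Prometheus cannot be used by Kubernetes directly because the two data formats are incompatible. Another component, k8s-prometheus-adapter, converts the Prometheus metrics into a format that the Kubernetes API can understand. Because this is a custom API, it must also be registered with the main API server through the Kubernetes aggregator so that it can be accessed directly under /apis/.

https://github.com/DirectXMan12/k8s-prometheus-adapter

The Prometheus Adapter has a stable Helm chart, which we use directly.

First prepare the Helm environment: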

wget https://get.helm.sh/helm-v3.0.0-linux-amd64.tar.gz
tar zxvf helm-v3.0.0-linux-amd64.tar.gz 
mv linux-amd64/helm /usr/bin/
helm repo add stable http://mirror.azure.cn/kubernetes/charts
helm repo update
helm repo list

Deploy prometheus-adapter, specifying the Prometheus address:

# helm install prometheus-adapter stable/prometheus-adapter --namespace kube-system --set prometheus.url=http://prometheus.kube-system,prometheus.port=9090
# helm list -n kube-system
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS        CHART                    APP VERSION
prometheus-adapter      kube-system     1               2020-05-28 11:38:35.156622425 +0800 CST deployed      prometheus-adapter-2.3.1 v0.6.0

Make sure the adapter is registered with the API server:

# kubectl get apiservices |grep custom 
# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"

QPS metric in practice

Deploy an application that exposes a Prometheus metrics endpoint; the metrics can be seen by accessing the Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: metrics-app
  name: metrics-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metrics-app
  template:
    metadata:
      labels:
        app: metrics-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "80"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - image: lizhenliang/metrics-app
        name: metrics-app
        ports:
        - name: web
          containerPort: 80
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-app
  labels:
    app: metrics-app
spec:
  ports:
  - name: web
    port: 80
    targetPort: 80
  selector:
    app: metrics-app


# curl 10.99.15.240/metrics
# HELP http_requests_total The amount of requests in total
# TYPE http_requests_total counter
http_requests_total 86
# HELP http_requests_per_second The amount of requests per second the latest ten seconds
# TYPE http_requests_per_second gauge
http_requests_per_second 0.5

Create the HPA policy

Use the metric provided through Prometheus to test autoscaling on a custom metric (QPS).

# vi app-hpa-v2.yml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: metrics-app-hpa 
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metrics-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 800m   # 800m means 0.8 requests per second
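Values such as 800m, 1k, and 256Mi use the Kubernetes quantity notation, in which a suffix scales the number. The sketch below converts a few common suffixes to plain numbers. It is a simplified illustration that covers only the suffixes in its table, not the full quantity grammar, and the function name is hypothetical.

```python
from decimal import Decimal

# A subset of the quantity suffixes: decimal (m, k, M, G) and binary (Ki, Mi, Gi).
SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '800m' or '1k' to a number."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)   # no suffix: a plain number


for value in ["800m", "1k", "10k", "30", "256Mi"]:
    print(value, "=", parse_quantity(value))
```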

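Configure the adapter to collect a specific metric

Creating the HPA is not the end of it: the adapter does not yet know which metric you want (http_requests_per_second), so the HPA cannot obtain the metric from the Pods.

Edit the prometheus-adapter ConfigMap in the kube-system namespace and add a new seriesQuery at the top of the rules: section: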

# kubectl edit cm prometheus-adapter -n kube-system
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus-adapter
    chart: prometheus-adapter-v0.1.2
    heritage: Tiller
    release: prometheus-adapter
  name: prometheus-adapter
data:
  config.yaml: |
    rules:
    - seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
      resources:
        overrides:
          kubernetes_namespace: {resource: "namespace"}
          kubernetes_pod_name: {resource: "pod"}
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
...

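This rule exposes http_requests_total as http_requests_per_second, computed as the per-second request rate of each Pod over a 2-minute window.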
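The two parts of the rule, the name rewrite and the rate calculation, can be imitated in a few lines. The regular expression is the one from the rule's name section. The rate function is a deliberate simplification of the PromQL rate() function (no extrapolation and no handling of counter resets), and the sample data and function names are hypothetical.

```python
import re
from typing import List, Tuple


def exposed_name(series: str) -> str:
    """Apply the rule's name mapping: '^(.*)_total' becomes '<prefix>_per_second'."""
    return re.sub(r"^(.*)_total", r"\1_per_second", series)


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase between the first and last (timestamp, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter samples over a 2-minute window, one series per Pod.
pod_samples = {
    "pod-a": [(0, 100), (60, 160), (120, 220)],
    "pod-b": [(0, 50), (60, 80), (120, 110)],
}
per_pod = {pod: simple_rate(s) for pod, s in pod_samples.items()}
average = sum(per_pod.values()) / len(per_pod)

print(exposed_name("http_requests_total"))
print(per_pod, average)
```

The HPA compares the average across Pods (0.75 in this made-up data) with the averageValue target of 800m.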

Test the API:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"

Load test

ab -n 100000 -c 100  http://10.99.15.240/metrics

Check the HPA status

kubectl get hpa
kubectl describe hpa metrics-app-hpa

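Summary

Each Pod's http_requests_total metric is collected through /metrics;
Prometheus aggregates the collected data;
The API server periodically queries Prometheus (through prometheus-adapter) to obtain the http_requests_per_second data;
HPA periodically queries the API server to check whether the configured autoscaler rule is met;
If the rule is met, the replica count of the Deployment's ReplicaSet is changed to scale the application.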
