Introduction to the HPA Controller
When resource usage gets too high, we can scale the Pods out (or in) with the following command:
$ kubectl -n luffy scale deployment myblog --replicas=2
However, that is a manual operation. In a real project we want the system to detect the load and scale automatically. Kubernetes provides a resource object for exactly this: Horizontal Pod Autoscaling, or HPA for short.
Basic principle: the HPA controller monitors the load of all Pods managed by the target controller and decides whether the number of replicas needs to be adjusted.
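The core scaling rule, as documented for the HPA controller, boils down to the following (the numbers are only an illustration):

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )
# e.g. 2 replicas averaging 200m CPU against a 100m target: ceil(2 * 200 / 100) = 4 replicas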
There are two API versions of the HPA:
- autoscaling/v1: supports scaling on the CPU metric only; stable
- autoscaling/v2beta1: also supports scaling on memory and custom metrics
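You can check which autoscaling API versions your cluster actually serves (the output depends on the cluster version):

$ kubectl api-versions | grep autoscaling
# on recent clusters this typically lists autoscaling/v1, autoscaling/v2beta1 and autoscaling/v2beta2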
How do we get the Pods' monitoring data?
- k8s below 1.8: heapster (completely deprecated in 1.11)
- k8s 1.8 and above: metrics-server
Question: why was heapster used in the past, and why was that project deprecated in favour of metrics-server?
In the heapster era, the apiserver forwarded metric requests directly to the in-cluster heapster service via the apiserver proxy:

- http://kubernetes_master_address/api/v1/namespaces/namespace_name/services/service_name[:port_name]/proxy

This proxy approach is problematic:

- the proxy only forwards requests; it is generally used for troubleshooting, is not stable enough, and the version is not controllable
- heapster's API does not have the complete authentication/authorization and client integration that the apiserver has
- Pod monitoring data is a core metric (used for HPA scheduling) and should have the same status as the Pod itself, i.e. metrics should exist as a resource, e.g. in the form metrics.k8s.io, called the Metrics API
So starting with version 1.8 the project gradually deprecated heapster and introduced the Metrics API concept described above. metrics-server is the official implementation of that concept: it fetches metrics from the kubelet and replaces heapster.
Metrics Server
It can expose the monitoring data through the standard Kubernetes API, for example fetching the metrics of a specific Pod:
https://192.168.136.10:6443/apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
# https://192.168.136.10:6443/api/v1/namespaces/luffy/pods?limit=500
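Once metrics-server is installed (see below), the same data is also reachable with plain kubectl, for example:

$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>"
$ kubectl -n <namespace-name> top pod <pod-name>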
The current collection flow:
Metric Server
...
Metric server collects metrics from the Summary API, exposed by Kubelet on each node.
Metrics Server registered in the main API server through Kubernetes aggregator, which was introduced in Kubernetes 1.7
...
Installation
Official repository: https://github.com/kubernetes-sigs/metrics-server
Depending on your cluster setup, you may also need to change flags passed to the Metrics Server container. Most useful flags:
- --kubelet-preferred-address-types - The priority of node address types used when determining an address for connecting to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
- --kubelet-insecure-tls - Do not verify the CA of serving certificates presented by Kubelets. For testing purposes only.
- --requestheader-client-ca-file - Specify a root certificate bundle for verifying client certificates on incoming requests.
$ wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
Modify the args:
...
      containers:
      - name: metrics-server
        image: registry.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-insecure-tls
          - --kubelet-preferred-address-types=InternalIP
...
Run the installation:
$ kubectl create -f components.yaml
$ kubectl -n kube-system get pods
$ kubectl top nodes
Metric collection by the kubelet
Whether it is heapster or metrics-server, both only relay and aggregate data, and both fetch it by calling the kubelet API. Inside the kubelet the actual metric collection is done by the cAdvisor module; you can fetch the monitoring data on a node via port 10250:
- Kubelet Summary metrics: https://127.0.0.1:10250/metrics, exposes node- and pod-level summary data
- cAdvisor metrics: https://127.0.0.1:10250/metrics/cadvisor, exposes container-level data
Example call:
$ curl -k -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6InhXcmtaSG5ZODF1TVJ6dUcycnRLT2c4U3ZncVdoVjlLaVRxNG1wZ0pqVmcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi10b2tlbi1xNXBueiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJhZG1pbiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImViZDg2ODZjLWZkYzAtNDRlZC04NmZlLTY5ZmE0ZTE1YjBmMCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbiJ9.iEIVMWg2mHPD88GQ2i4uc_60K4o17e39tN0VI_Q_s3TrRS8hmpi0pkEaN88igEKZm95Qf1qcN9J5W5eqOmcK2SN83Dd9dyGAGxuNAdEwi0i73weFHHsjDqokl9_4RGbHT5lRY46BbIGADIphcTeVbCggI6T_V9zBbtl8dcmsd-lD_6c6uC2INtPyIfz1FplynkjEVLapp_45aXZ9IMy76ljNSA8Uc061Uys6PD3IXsUD5JJfdm7lAt0F7rn9SdX1q10F2lIHYCMcCcfEpLr4Vkymxb4IU4RCR8BsMOPIO_yfRVeYZkG4gU2C47KwxpLsJRrTUcUXJktSEPdeYYXf9w" https://localhost:10250/metrics
Although the kubelet exposes the metric endpoints, the actual monitoring logic is handled by the built-in cAdvisor module. cAdvisor used to run as a standalone component; starting with k8s 1.12 its standalone listening port was removed from Kubernetes, and all monitoring data is served solely through the kubelet API.
When cAdvisor collects metrics it actually calls the runc/libcontainer library, and libcontainer is a wrapper around the cgroup files. In other words, cAdvisor is also just a forwarder: its data comes from the cgroup files.
The values in the cgroup files are the ultimate source of the monitoring data. For example, for the mem usage value:

- For a docker container it comes from /sys/fs/cgroup/memory/docker/[containerId]/memory.usage_in_bytes
- For a pod it is /sys/fs/cgroup/memory/kubepods/besteffort/pod[podId]/memory.usage_in_bytes or /sys/fs/cgroup/memory/kubepods/burstable/pod[podId]/memory.usage_in_bytes
- If no memory limit is set, Limit = machine_mem; otherwise it comes from /sys/fs/cgroup/memory/docker/[id]/memory.limit_in_bytes
- Memory utilization = memory.usage_in_bytes / memory.limit_in_bytes
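As a minimal sketch, you can reproduce this calculation by hand on a node (assuming cgroup v1 and a docker container; replace [containerId] with a real ID):

$ cd /sys/fs/cgroup/memory/docker/[containerId]
$ usage=$(cat memory.usage_in_bytes); limit=$(cat memory.limit_in_bytes)
# utilization = usage / limit * 100
$ awk -v u=$usage -v l=$limit 'BEGIN { printf "mem utilization: %.2f%%\n", u / l * 100 }'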
Metrics data flow:
Question:
Metrics Server is a standalone service that only implements its own internal API. How does it get exposed in the standard Kubernetes API format?
kube-aggregator
The kube-aggregator and the metrics-server implementation
kube-aggregator is an extension mechanism for the apiserver's API: it lets developers write their own service and register it into the Kubernetes API, i.e. extend the API.
Define an APIService object:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.luffy.k8s.io
spec:
  group: luffy.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: service-A      # must be accessed over HTTPS
    namespace: luffy
    port: 443
  version: v1beta1
  versionPriority: 100
Kubernetes will then automatically proxy requests with the following URL prefix for us:
proxyPath := "/apis/" + apiService.Spec.Group + "/" + apiService.Spec.Version
That is, https://192.168.136.10:6443/apis/luffy.k8s.io/v1beta1/xxxx is forwarded to our service-A, and service-A only needs to serve https://service-A/apis/luffy.k8s.io/v1beta1/xxxx (the original request path is passed through unchanged, as the metrics-server paths below show).
Let's see how metrics-server implements this:
$ kubectl get apiservice
NAME SERVICE AVAILABLE
v1beta1.metrics.k8s.io kube-system/metrics-server True
$ kubectl get apiservice v1beta1.metrics.k8s.io -oyaml
...
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
...
$ kubectl -n kube-system get svc metrics-server
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
metrics-server ClusterIP 10.110.111.146 <none> 443/TCP 11h
$ curl -k -H "Authorization: Bearer xxxx" https://10.110.111.146
{
  "paths": [
    "/apis",
    "/apis/metrics.k8s.io",
    "/apis/metrics.k8s.io/v1beta1",
    "/healthz",
    "/healthz/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/generic-apiserver-start-informers",
    "/metrics",
    "/openapi/v2",
    "/version"
  ]
}
# https://192.168.136.10:6443/apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
#
$ curl -k -H "Authorization: Bearer xxxx" https://10.110.111.146/apis/metrics.k8s.io/v1beta1/namespaces/luffy/pods/myblog-5d9ff54d4b-4rftt
$ curl -k -H "Authorization: Bearer xxxx" https://192.168.136.10:6443/apis/metrics.k8s.io/v1beta1/namespaces/luffy/pods/myblog-5d9ff54d4b-4rftt
HPA in practice
CPU-based autoscaling
Create the HPA object:
# Option 1
$ cat hpa-myblog.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-myblog-cpu
  namespace: luffy
spec:
  maxReplicas: 3
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myblog
  targetCPUUtilizationPercentage: 10
# Option 2
$ kubectl -n luffy autoscale deployment myblog --cpu-percent=10 --min=1 --max=3
The Deployment must have requests configured; otherwise the HPA cannot obtain the monitoring data and cannot scale dynamically.
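For example (a sketch only; the container name matches the myblog Deployment used above and the values are purely illustrative), the Pod template needs something like:

spec:
  template:
    spec:
      containers:
      - name: myblog
        resources:
          requests:          # HPA computes CPU utilization as usage / request
            cpu: 100m
            memory: 100Mi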
Verify:
$ yum -y install httpd-tools
$ kubectl -n luffy get svc myblog
myblog ClusterIP 10.104.245.225 <none> 80/TCP 6d18h
# To see the effect sooner, scale down to 1 replica first
$ kubectl -n luffy scale deploy myblog --replicas=1
# Simulate 1000 concurrent users hitting the page 100,000 times
$ ab -n 100000 -c 1000 http://10.104.245.225/blog/index/
$ kubectl get hpa
$ kubectl -n luffy get pods
After the load drops, scale-down waits for a default 5-minute window, which can be configured with the following kube-controller-manager flag:
--horizontal-pod-autoscaler-downscale-stabilization
The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
Scale-down is a gradual process: this value is how long the autoscaler waits after one scale-down completes before it can perform the next. For example, going from 3 replicas down to 1 takes roughly 2 * 5min = 10 minutes.
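On a kubeadm-installed cluster (an assumption; adjust the path for other setups), the flag can be changed in the kube-controller-manager static Pod manifest, e.g. shortening the window to 2 minutes:

# /etc/kubernetes/manifests/kube-controller-manager.yaml (kubeadm layout assumed)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # ...existing flags...
    - --horizontal-pod-autoscaler-downscale-stabilization=2m0s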
Memory-based autoscaling
Create the HPA object:
$ cat hpa-demo-mem.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-mem
  namespace: luffy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-mem
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 30
Memory-pressure demo script:
$ cat increase-mem-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: increase-mem-config
  namespace: luffy
data:
  increase-mem.sh: |
    #!/bin/bash
    mkdir /tmp/memory
    mount -t tmpfs -o size=40M tmpfs /tmp/memory
    dd if=/dev/zero of=/tmp/memory/block
    sleep 60
    rm /tmp/memory/block
    umount /tmp/memory
    rmdir /tmp/memory
Test Deployment:
$ cat hpa-demo-mem-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-mem
  namespace: luffy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
      - name: increase-mem-script
        configMap:
          name: increase-mem-config
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: increase-mem-script
          mountPath: /etc/script
        resources:
          requests:
            memory: 50Mi
            cpu: 50m
        securityContext:
          privileged: true
Test:
$ kubectl create -f increase-mem-config.yaml
$ kubectl create -f hpa-demo-mem.yaml
$ kubectl create -f hpa-demo-mem-deploy.yaml
$ kubectl -n luffy exec -ti hpa-demo-mem-7fc75bf5c8-xx424 sh
/ # sh /etc/script/increase-mem.sh
# Watch the HPA and the Pods
$ kubectl -n luffy get hpa
$ kubectl -n luffy get po
Autoscaling on custom metrics
Besides autoscaling on CPU and memory, we can also scale on custom monitoring metrics. For that we need the Prometheus Adapter: Prometheus monitors the application load and the cluster's own metrics, and the Prometheus Adapter lets us use the metrics collected by Prometheus to drive scaling policies. These metrics are exposed through the APIServer, so the HPA resource object can consume them directly.
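As an illustrative sketch only (the metric name http_requests_per_second is hypothetical, and this assumes the Prometheus Adapter is installed and serves that metric through the custom metrics API), a custom-metric HPA looks like this:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-custom-metric
  namespace: luffy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myblog
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Pods
    pods:
      metricName: http_requests_per_second    # hypothetical metric exposed by the adapter
      targetAverageValue: 100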
Architecture diagram: