Environment:
kubernetes 1.11+ / openshift 3.11
How custom-metric HPA works:
First, an apiservice (custom metrics API) must be registered. When the HPA requests metrics, kube-aggregator (the controller for apiservices) forwards the request to the adapter. The adapter, running as a pod in the Kubernetes cluster, implements the Kubernetes resource metrics API and custom metrics API: based on the configured rules it scrapes metrics from Prometheus, processes them (e.g. renames them), and returns them to the HPA through the custom metrics API. Finally, the HPA uses the returned metric values to scale the Deployment/ReplicaSet up or down.
The adapter acts as an extension-apiserver (i.e. a self-implemented pod), proxying kube-apiserver requests to Prometheus.
Below is the apiservice definition for k8s-prometheus-adapter; kube-aggregator forwards requests to the adapter through the service below. v1beta1.custom.metrics.k8s.io is hard-coded in the k8s-prometheus-adapter source, so it cannot be changed arbitrarily.
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: custom-metrics
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
Deployment:
- Download k8s-prometheus-adapter from github
- Deploy the adapter following the official documentation:
- Pull the image directxman12/k8s-prometheus-adapter:latest, retag it, and push it to the local image registry
- Generate certificates: run the following shell script (from the official repo) to generate cm-adapter-serving-certs.yaml, and copy it into the manifests/ directory. This certificate is used by kube-aggregator to authenticate the adapter when communicating with it. Note that the certificate below is valid for 5 years (43800h), and note the authorized domain names.

#!/usr/bin/env bash
# exit immediately when a command fails
set -e
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# error on unset variables
set -u

# Detect if we are on mac or should use GNU base64 options
case $(uname) in
Darwin)
    b64_opts='-b=0'
    ;;
*)
    b64_opts='--wrap=0'
esac

go get -v -u github.com/cloudflare/cfssl/cmd/...

export PURPOSE=metrics
echo '{"signing":{"default":{"expiry":"43800h","usages":["signing","key encipherment","'${PURPOSE}'"]}}}' > "ca-config.json"

export SERVICE_NAME=custom-metrics-apiserver
export ALT_NAMES='"custom-metrics-apiserver.custom-metrics","custom-metrics-apiserver.custom-metrics.svc"'

echo "{\"CN\":\"${SERVICE_NAME}\", \"hosts\": [${ALT_NAMES}], \"key\": {\"algo\": \"rsa\",\"size\": 2048}}" | \
    cfssl gencert -ca=ca.crt -ca-key=ca.key -config=ca-config.json - | cfssljson -bare apiserver

cat <<-EOF > cm-adapter-serving-certs.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cm-adapter-serving-certs
data:
  serving.crt: $(base64 ${b64_opts} < apiserver.pem)
  serving.key: $(base64 ${b64_opts} < apiserver-key.pem)
EOF
When insecureSkipTLSVerify: true is set in custom-metrics-apiservice.yaml, kube-aggregator does not verify the adapter's certificate above. To enable verification, add the openshift cluster's CA certificate to caBundle (a self-signed certificate on a non-openshift cluster would be treated as untrusted): base64-encode /etc/origin/master/ca.crt from an openshift master node and paste the result into the caBundle field:
base64 ca.crt
Alternatively, paste the clusters.cluster.certificate-authority-data field from /root/.kube/config on an openshift master node.
- Create the namespace:
kubectl create namespace custom-metrics
- kube-system on openshift may not contain the role extension-apiserver-authentication-reader; if it does not exist, create it:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: extension-apiserver-authentication-reader
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resourceNames:
  - extension-apiserver-authentication
  resources:
  - configmaps
  verbs:
  - get
- Modify the --prometheus-url field in custom-metrics-apiserver-deployment.yaml to point at the correct prometheus
- Create the remaining components:
kubectl create -f manifests/
The deployment creates a clusterRole named custom-metrics-resource-reader, which authorizes the adapter to read kubernetes cluster resources; as shown below, it may read namespaces/pods/services:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  verbs:
  - get
  - list
Deploying the demo:
- Deploy the official demo:

# cat sample-app.deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - image: docker-local.art.aliocp.csvw.com/openshift3/autoscale-demo:v0.1.2
        name: metrics-provider
        ports:
        - name: http
          containerPort: 8080
- Create the service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: sample-app
  name: sample-app
  namespace: custom-metrics
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: sample-app
  type: ClusterIP

Verify in the custom-metrics namespace that metrics can be fetched:
curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
- Deploy the serviceMonitor:
Since the HPA needs kubernetes resource information such as namespace and pod, the metrics must be registered through a servicemonitor so that these labels are attached.
- The openshift Prometheus operator restricts servicemonitors as follows:

serviceMonitorNamespaceSelector:
  matchExpressions:
  - key: openshift.io/cluster-monitoring
    operator: Exists
serviceMonitorSelector:
  matchExpressions:
  - key: k8s-app
    operator: Exists
- Therefore, label the custom-metrics namespace:
oc label namespace custom-metrics openshift.io/cluster-monitoring=true
- Create the service-monitor in the openshift-monitoring namespace:

# cat service-monitor.yaml
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: sample-app
  labels:
    k8s-app: testsample
    app: sample-app
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: sample-app
  endpoints:
  - port: http
- Grant permissions:
oc adm policy add-cluster-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-k8s
oc adm policy add-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-k8s -n custom-metrics
Testing the HPA:
- Create the HPA, which starts scaling out when requests exceed 0.5 per second:

# cat sample-app-hpa.yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sample-app
spec:
  scaleTargetRef:
    # point the HPA at the sample application
    # you created above
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  # autoscale between 1 and 10 replicas
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # use a "Pods" metric, which takes the average of the
  # given metric across all pods controlled by the autoscaling target
  - type: Pods
    pods:
      # use the metric that you used above: pods/http_requests
      metricName: http_requests_per_second
      # target 500 milli-requests per second,
      # which is 1 request every two seconds
      targetAverageValue: 500m
Check that the hpa is running correctly with oc describe hpa sample-app.
- Generate load by repeatedly running:
curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
- Watch the metric value with:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second"
When the value exceeds 500m, the HPA starts scaling out:

# oc get pod
NAME                          READY     STATUS    RESTARTS   AGE
sample-app-6d55487cdd-dc6qz   1/1       Running   0          18h
sample-app-6d55487cdd-w6bbb   1/1       Running   0          5m
sample-app-6d55487cdd-zbdbr   1/1       Running   0          5m
- After a while, when the value returned by
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second"
stays below 500m, the HPA scales back in. The scale-in delay is controlled by --horizontal-pod-autoscaler-downscale-stabilization, 5 minutes by default. The TARGETS column of oc get hpa shows the current/target ratio:

# oc get hpa
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
sample-app   Deployment/sample-app   66m/500m   1         10        1          3h
Adapter config
Before deploying the adapter, its rules must be configured; they preprocess the metrics. The default configuration is manifests/custom-metrics-config-map.yaml. The adapter configuration has 4 parts:
- Discovery: selects the Prometheus metrics to be processed. seriesQuery picks the candidate set of metrics, and seriesFilters can narrow it further. seriesQuery can match by labels (as below) or directly by metric name:

seriesQuery: '{__name__=~"^container_.*_total",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
- isNot: "^container_.*_seconds_total"

seriesFilters:
  is: <regex>, keeps metrics matching the regular expression.
  isNot: <regex>, keeps metrics not matching the regular expression.
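The is/isNot semantics above can be sketched as a plain regex filter; a minimal illustration in Python (the metric names here are examples, not taken from a real cluster):

```python
import re

def apply_series_filters(series, filters):
    """Keep only series names that pass every is/isNot regex filter."""
    for f in filters:
        if "is" in f:
            series = [s for s in series if re.search(f["is"], s)]
        if "isNot" in f:
            series = [s for s in series if not re.search(f["isNot"], s)]
    return series

metrics = ["container_cpu_usage_seconds_total",
           "container_network_receive_bytes_total"]
# drop cumulative *_seconds_total series, as in the example filter above
print(apply_series_filters(metrics, [{"isNot": "^container_.*_seconds_total"}]))
# → ['container_network_receive_bytes_total']
```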
- Association: maps metric labels to kubernetes resources (list them with kubectl api-resources). overrides associates a Prometheus metric label with a kubernetes resource (a deployment in the example below). Note that the label must reference a real kubernetes resource: a metric's pod_name label can be mapped to the kubernetes pod resource, but container_image cannot; a wrong mapping means the custom metrics API will not return correct values. This also implies that the metric must carry a label holding a real resource name that can be mapped to a kubernetes resource.

resources:
  overrides:
    microservice: {group: "apps", resource: "deployment"}
- Naming: converts the prometheus metric name into the name exposed through the custom metrics API. It does not change the underlying metric name, i.e.
curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
still returns the old metric name. This step can be skipped if not needed.

# match turn any name <name>_total to <name>_per_second
# e.g. http_requests_total becomes http_requests_per_second
name:
  matches: "^(.*)_total$"
  as: "${1}_per_second"
In this example, the HPA can subsequently fetch the metric from
/apis/{APIService-name}/v1beta1/namespaces/{namespaces-name}/pods/*/http_requests_per_second
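The matches/as rename behaves like a regex substitution; a minimal sketch in Python (the adapter itself implements this in Go — this only mirrors the behavior):

```python
import re

def rename_metric(name, matches, as_template):
    """Apply an adapter naming rule: rename a series name if it matches."""
    m = re.match(matches, name)
    if not m:
        return name
    # adapter templates reference capture groups as ${1}, ${2}, ...
    return re.sub(r"\$\{(\d+)\}", lambda g: m.group(int(g.group(1))), as_template)

print(rename_metric("http_requests_total", r"^(.*)_total$", "${1}_per_second"))
# → http_requests_per_second
```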
- Querying: builds the query whose result becomes the metric value returned through the custom metrics API; this value is what the HPA ultimately uses for scaling.

# convert cumulative cAdvisor metrics into rates calculated over 2 minutes
metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)"
The metricsQuery field uses a Go template to turn the URL request into a Prometheus query. It extracts the fields of the custom metrics API request and splits them into a metric name, a group-resource, and one or more objects of that group-resource, exposed as:
Series: the metric name
LabelMatchers: a comma-separated list of label matchers for the requested objects; currently this is the label for the particular group-resource, plus the namespace label if the group-resource is namespaced
GroupBy: a comma-separated list of labels to group by; currently the group-resource label from LabelMatchers
Suppose the metric http_requests_per_second has the following series:

http_requests_per_second{pod="pod1",service="nginx1",namespace="somens"}
http_requests_per_second{pod="pod2",service="nginx2",namespace="somens"}

When calling
kubectl get --raw "/apis/{APIService-name}/v1beta1/namespaces/somens/pods/*/http_requests_per_second"
the template fields of metricsQuery are filled in as:

Series: "http_requests_total"
LabelMatchers: pod=~"pod1|pod2",namespace="somens"
GroupBy: pod
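The expansion above can be sketched by substituting the three fields into the metricsQuery template (a simplified stand-in for the adapter's actual Go template handling):

```python
def expand_metrics_query(template, series, label_matchers, group_by):
    """Fill the <<.Series>>, <<.LabelMatchers>> and <<.GroupBy>> placeholders."""
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", label_matchers)
            .replace("<<.GroupBy>>", group_by))

q = expand_metrics_query(
    'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)',
    "http_requests_total",
    'pod=~"pod1|pod2",namespace="somens"',
    "pod")
print(q)
# → sum(rate(http_requests_total{pod=~"pod1|pod2",namespace="somens"}[2m])) by (pod)
```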
The adapter uses the rules and externalRules fields for custom metrics and external metrics respectively, as in this example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    externalRules:
    - seriesQuery: '{namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[22m])) by (<<.GroupBy>>)
    rules:
    - seriesQuery: '{namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
HPA configuration
Based on the metric type, the HPA pulls metrics from the resource paths of the aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io).
The HPA supports 4 metric types (shown below in v2beta2 format):
- resource: currently only cpu and memory are supported. The target can be a value (targetAverageValue) or a utilization ratio (targetAverageUtilization). The HPA fetches resource metrics from metrics.k8s.io.
- pods: custom metrics describing pods. The target only supports scaling by a value (targetAverageValue), which is compared against the average of the metric across all related pods:

type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k

The HPA fetches custom metrics from custom.metrics.k8s.io.
- object: custom metrics describing a (non-pod) object in the same namespace. The target supports value and AverageValue: the former compares the metric directly with the target, the latter compares metric / number of related pods with the target:

type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
- external: kubernetes 1.10+. These metrics are not tied to the kubernetes cluster (pods and object metrics must be associated with some kubernetes type). Like object, the target supports value and AverageValue. Since external tries to match metrics across all kubernetes resources, this type is not recommended in practice. The HPA fetches external metrics from external.metrics.k8s.io.

- type: External
  external:
    metric:
      name: queue_messages_ready
      selector: "queue=worker_tasks"
    target:
      type: AverageValue
      averageValue: 30
- Kubernetes 1.6+ supports scaling on multiple metrics: as soon as one metric reaches its scale-out threshold, new pod replicas are created (while the current replica count is below maxReplicas).
Note: one unit of a target value can be divided into 1000 parts, each denoted m; e.g. 500m means 1/2 a unit. See Quantity.
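A minimal sketch of how such milli-quantities relate to plain numbers (illustrative only; Kubernetes' actual Quantity type supports many more suffixes):

```python
def parse_milli_quantity(q):
    """Parse a plain or milli ("m") quantity string into a float."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000.0
    return float(q)

print(parse_milli_quantity("500m"))  # → 0.5
print(parse_milli_quantity("2"))     # → 2.0
```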
The kubernetes HPA algorithm is:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

With targetAverageValue or targetAverageUtilization, currentMetricValue is the average of the metric across all pods targeted by the HPA.
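The formula above in executable form (a direct transcription; the real controller additionally applies tolerances, stabilization windows, and min/max clamping):

```python
import math

def desired_replicas(current_replicas, current_metric_value, desired_metric_value):
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    return math.ceil(current_replicas * (current_metric_value / desired_metric_value))

# e.g. 1 replica observing 1200m against a 500m target scales out to 3 replicas
print(desired_replicas(1, 1.2, 0.5))  # → 3
```

With the earlier oc get hpa output (66m observed vs 500m target), ceil(replicas * 66/500) stays at 1, so no scaling occurs.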
Fetching Kubernetes metrics
Assume the registered APIService is custom.metrics.k8s.io/v1beta1. Once it is registered, the HorizontalPodAutoscaler controller fetches metrics from paths rooted at /apis/custom.metrics.k8s.io/v1beta1. Metric API paths are either namespaced or non-namespaced. Verify that the HPA can fetch metrics as follows:
namespaced
- Fetch metrics for a named object of a given type in a given namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}"
e.g. the start_time_seconds metric of the pod named grafana in the monitor namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/grafana/start_time_seconds"
- Fetch metrics for all objects of a given type in a given namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}"
e.g. the start_time_seconds metric of all pods in the monitor namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/*/start_time_seconds"
- Use labelSelector to select objects carrying a particular label:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}?labelSelector={label-name}"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}?labelSelector={label-name}"
non-namespaced
Similar to namespaced; mainly node, namespace, PersistentVolume, etc. Some non-namespaced access patterns differ from what the custom metrics API describes.
- Access a namespace object:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/metrics/{metric-name...}"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/{metric-name...}"
- Access a node:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/nodes/{node-name}/{metric-name...}"
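The namespaced path scheme above can be captured in a small helper (illustrative; the function and parameter names are placeholders, not part of any kubernetes client library):

```python
def custom_metric_path(metric, object_type="pods", object_name="*",
                       namespace=None, api="custom.metrics.k8s.io/v1beta1"):
    """Build a raw API path for the custom metrics API."""
    if namespace:
        return f"/apis/{api}/namespaces/{namespace}/{object_type}/{object_name}/{metric}"
    return f"/apis/{api}/{object_type}/{object_name}/{metric}"

print(custom_metric_path("start_time_seconds", namespace="monitor",
                         object_name="grafana"))
# → /apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/grafana/start_time_seconds
```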
DEBUG:
- List all rules discovered by the registered APIService:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
If this fails, check the status and message fields of:
oc get apiservice v1beta1.custom.metrics.k8s.io -oyaml
If the returned resources are empty, verify that the Prometheus url in the deploy is correct and that the adapter has the necessary permissions.
- Inspect the full request flow with --v=8:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}" --v=8
- If the above works but the returned items are empty:
  - First make sure the k8s-prometheus-adapter parameter --metrics-relist-interval is set to a value larger than Prometheus's scrape_interval
  - Make sure the seriesQuery rules of k8s-prometheus-adapter can match data in Prometheus
  - Make sure the metricsQuery rules of k8s-prometheus-adapter can compute data; note that if the query calculates over a time window, a window that is too short may produce no data
TIPS:
- The official End-to-end walkthrough requires the scraped metrics to carry pod and namespace labels; otherwise, with the default configuration, no metrics are collected.
- The Configuration Walkthroughs explain step by step how to write the adapter config.
- The following parameters can be used to debug the adapter remotely in goland:

--secure-port=6443
--tls-cert-file=D:\adapter\serving.crt
--tls-private-key-file=D:\adapter\serving.key
--logtostderr=true
--prometheus-url=${prometheus-url}
--metrics-relist-interval=70s
--v=10
--config=D:\adapter\config.yaml
--lister-kubeconfig=D:\adapter\k8s-config.yaml
--authorization-kubeconfig=D:\adapter\k8s-config.yaml
--authentication-kubeconfig=D:\adapter\k8s-config.yaml
References:
Kubernetes pod autoscaler using custom metrics
Kubernetes API Aggregation Setup — Nuts & Bolts