I. Auto-Discovery Types

When monitoring a StatefulSet service, I had defined an Endpoints object in the Service manifest and hard-coded the pod IPs in the configuration file. As a result, whenever a pod restarted and its IP changed, the metrics could no longer be scraped, which is clearly not workable.
And if our Kubernetes cluster contains a large number of Services/Pods, do we really have to create a corresponding ServiceMonitor object for every one of them by hand? That would be just as tedious.
To solve these problems, Prometheus Operator provides an additional scrape configuration mechanism. Through this extra configuration we can discover and monitor Kubernetes resources (pods, services, nodes, and so on).
Prometheus supports many kinds of service discovery.

Among them, kubernetes_sd_configs does exactly what we need: it discovers the various cluster resources for monitoring. The Kubernetes SD configuration retrieves scrape targets from the Kubernetes REST API and always stays synchronized with the cluster state. Any of the following role types can be configured to discover the objects we are interested in (the descriptions below are translated from the official documentation).
1. Node
The node role discovers one target per cluster node, with the address defaulting to the kubelet's HTTP port. The target address defaults to the first existing address of the Kubernetes node object, in the address type order NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, NodeHostName.
Available meta labels:
- __meta_kubernetes_node_name: The name of the node object.
- __meta_kubernetes_node_label_<labelname>: Each label from the node object.
- __meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
- __meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
- __meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
- __meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.
In addition, the instance label of the node will be set to the node name as retrieved from the API server.
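As an illustration (not part of the original article), a minimal node-role scrape job could look like the sketch below; it authenticates against the kubelet with the in-cluster service account token and copies every node label onto the target via labelmap:
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # expose every node label (e.g. kubernetes.io/hostname) as a target label
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)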
2. Service
The service role discovers a target for each service port of each service. This is generally useful for blackbox monitoring of a service. The address will be set to the Kubernetes DNS name of the service and the respective service port.
Available meta labels:
- __meta_kubernetes_namespace: The namespace of the service object.
- __meta_kubernetes_service_annotation_<annotationname>: Each annotation from the service object.
- __meta_kubernetes_service_annotationpresent_<annotationname>: "true" for each annotation of the service object.
- __meta_kubernetes_service_cluster_ip: The cluster IP address of the service. (Does not apply to services of type ExternalName)
- __meta_kubernetes_service_external_name: The DNS name of the service. (Applies to services of type ExternalName)
- __meta_kubernetes_service_label_<labelname>: Each label from the service object.
- __meta_kubernetes_service_labelpresent_<labelname>: true for each label of the service object.
- __meta_kubernetes_service_name: The name of the service object.
- __meta_kubernetes_service_port_name: Name of the service port for the target.
- __meta_kubernetes_service_port_protocol: Protocol of the service port for the target.
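For example, a rough sketch (not from the original article) of a service-role job used together with a blackbox exporter; the exporter address blackbox-exporter.monitoring:9115 is an assumption and should be adjusted to your deployment:
- job_name: 'kubernetes-services-probe'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # probe the service's DNS name and port through the blackbox exporter
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter.monitoring:9115
  - source_labels: [__param_target]
    target_label: instance
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_service_name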
3. Pod
The pod role discovers all pods and exposes their containers as targets. For each declared port of a container, a single target is generated. If a container has no declared ports, a port-free target per container is created, and a port can then be attached manually via relabeling.
Available meta labels:
- __meta_kubernetes_namespace: The namespace of the pod object.
- __meta_kubernetes_pod_name: The name of the pod object.
- __meta_kubernetes_pod_ip: The pod IP of the pod object.
- __meta_kubernetes_pod_label_<labelname>: Each label from the pod object.
- __meta_kubernetes_pod_labelpresent_<labelname>: true for each label from the pod object.
- __meta_kubernetes_pod_annotation_<annotationname>: Each annotation from the pod object.
- __meta_kubernetes_pod_annotationpresent_<annotationname>: true for each annotation from the pod object.
- __meta_kubernetes_pod_container_init: true if the container is an InitContainer
- __meta_kubernetes_pod_container_name: Name of the container the target address points to.
- __meta_kubernetes_pod_container_port_name: Name of the container port.
- __meta_kubernetes_pod_container_port_number: Number of the container port.
- __meta_kubernetes_pod_container_port_protocol: Protocol of the container port.
- __meta_kubernetes_pod_ready: Set to true or false for the pod's ready state.
- __meta_kubernetes_pod_phase: Set to Pending, Running, Succeeded, Failed or Unknown in the lifecycle.
- __meta_kubernetes_pod_node_name: The name of the node the pod is scheduled onto.
- __meta_kubernetes_pod_host_ip: The current host IP of the pod object.
- __meta_kubernetes_pod_uid: The UID of the pod object.
- __meta_kubernetes_pod_controller_kind: Object kind of the pod controller.
- __meta_kubernetes_pod_controller_name: Name of the pod controller.
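Besides matching on pod labels (as the worked example in part II does), a common pattern is to let pods opt in via an annotation. A minimal sketch (not from the original article, and assuming the prometheus.io/scrape: "true" annotation is set on the pod template):
- job_name: 'kubernetes-pods-annotated'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: kubernetes_pod_name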
4. Endpoints
The endpoints role discovers targets from the listed endpoints of a service. For each endpoint address, one target is discovered per port. If the endpoint is backed by a pod, all additional container ports of the pod that are not bound to an endpoint port are discovered as targets as well.
Available meta labels:
- __meta_kubernetes_namespace: The namespace of the endpoints object.
- __meta_kubernetes_endpoints_name: The names of the endpoints object.
For all targets discovered directly from the endpoints list (those not additionally inferred from underlying pods), the following labels are attached:
- __meta_kubernetes_endpoint_hostname: Hostname of the endpoint.
- __meta_kubernetes_endpoint_node_name: Name of the node hosting the endpoint.
- __meta_kubernetes_endpoint_ready: Set to true or false for the endpoint's ready state.
- __meta_kubernetes_endpoint_port_name: Name of the endpoint port.
- __meta_kubernetes_endpoint_port_protocol: Protocol of the endpoint port.
- __meta_kubernetes_endpoint_address_target_kind: Kind of the endpoint address target.
- __meta_kubernetes_endpoint_address_target_name: Name of the endpoint address target.
If the endpoints belong to a service, all labels of the role: service discovery are attached. For all targets backed by a pod, all labels of the role: pod discovery are attached.
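A minimal sketch of an endpoints-role job (not from the original article; the port name "metrics" and the opt-in annotation are assumptions) that scrapes only services that explicitly opt in:
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # keep only endpoints whose backing service is annotated prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # keep only the endpoint port named "metrics"
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    action: keep
    regex: metrics
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_service_name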
5. Ingress
The ingress role discovers a target for each path of each ingress. This is generally useful for blackbox monitoring of an ingress. The address will be set to the host specified in the ingress spec.
Available meta labels:
- __meta_kubernetes_namespace: The namespace of the ingress object.
- __meta_kubernetes_ingress_name: The name of the ingress object.
- __meta_kubernetes_ingress_label_<labelname>: Each label from the ingress object.
- __meta_kubernetes_ingress_labelpresent_<labelname>: true for each label from the ingress object.
- __meta_kubernetes_ingress_annotation_<annotationname>: Each annotation from the ingress object.
- __meta_kubernetes_ingress_annotationpresent_<annotationname>: true for each annotation from the ingress object.
- __meta_kubernetes_ingress_scheme: Protocol scheme of ingress, https if TLS config is set. Defaults to http.
- __meta_kubernetes_ingress_path: Path from ingress spec. Defaults to /.
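A rough sketch of an ingress-role blackbox probe (not from the original article; again, the exporter address blackbox-exporter.monitoring:9115 is an assumption):
- job_name: 'kubernetes-ingresses-probe'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: ingress
  relabel_configs:
  # assemble scheme://host/path as the probe target
  - source_labels: [__meta_kubernetes_ingress_scheme, __address__, __meta_kubernetes_ingress_path]
    regex: (.+);(.+);(.+)
    replacement: ${1}://${2}${3}
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter.monitoring:9115
  - source_labels: [__param_target]
    target_label: instance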
II. Configuring Pod Auto-Discovery
For example, suppose the business runs a microservice deployed as a StatefulSet that starts two pod replicas, each exposing metrics at http://pod_ip:7000/metrics. Since the pod IPs change on every restart, the only practical way to collect the data is through automatic discovery.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    run: jx3recipe
  name: jx3recipe
  annotations:
    prometheus.io/scrape: "true"
spec:
  selector:
    matchLabels:
      app: jx3recipe
  serviceName: jx3recipe-service
  replicas: 2
  template:
    metadata:
      labels:
        app: jx3recipe
        appCluster: jx3recipe-cluster
    spec:
      terminationGracePeriodSeconds: 20
      containers:
      - image: hub.kce.ooo.com/jx3pvp/jx3recipe:qa-latest
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1000
        name: jx3recipe
        lifecycle:
          preStop:
            exec:
              command: ["kill","-s","SIGINT","1"]
        volumeMounts:
        - name: config-volume
          mountPath: /data/conf.yml
          subPath: conf.yml
        resources:
          requests:
            cpu: "100m"
            memory: "500Mi"
        env:
        - name: JX3PVP_ENV
          value: "qa"
        - name: JX3PVP_RUN_MODE
          value: "k8s"
        - name: JX3PVP_SERVICE_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: JX3PVP_LOCAL_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: JX3PVP_CONSUL_IP
          value: $(CONSUL_AGENT_SERVICE_HOST)
        ports:
        - name: biz
          containerPort: 8000
          protocol: "TCP"
        - name: admin
          containerPort: 7000
          protocol: "TCP"
      volumes:
      - name: config-volume
        configMap:
          name: app-configure-file-jx3recipe
          items:
          - key: jx3recipe.yml
            path: conf.yml
1. Create the discovery rules
Define the pod discovery rules in a file named prometheus-additional.yaml:
- the container name of the pod is written into a label called jx3recipe
- the pod's appCluster label must match jx3recipe-cluster
- the scrape address must be on port 7000, i.e. of the form http://.*:7000/metrics
- job_name: 'kubernetes-service-pod'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_name]
    action: replace
    target_label: jx3recipe
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: ["__meta_kubernetes_pod_label_appCluster"]
    regex: "jx3recipe-cluster"
    action: keep
  - source_labels: [__address__]
    action: keep
    regex: '(.*):7000'
2. Create the corresponding Secret object
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
Once created, the configuration above is base64 encoded and stored as the value of the prometheus-additional.yaml key:
apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogJ2t1YmVybmV0ZXMtc2VydmljZS1wb2QnCiAga3ViZXJuZXRlc19zZF9jb25maWdzOgogIC0gcm9sZTogcG9kCiAgcmVsYWJlbF9jb25maWdzOgogIC0gc291cmNlX2xhYmVsczogW19fbWV0YV9rdWJlcm5ldGVzX3BvZF9jb250YWluZXJfbmFtZV0KICAgIGFjdGlvbjogcmVwbGFjZQogICAgdGFyZ2V0X2xhYmVsOiBqeDNyZWNpcGUKICAtIGFjdGlvbjogbGFiZWxtYXAKICAgIHJlZ2V4OiBfX21ldGFfa3ViZXJuZXRlc19wb2RfbGFiZWxfKC4rKQogIC0gc291cmNlX2xhYmVsczogIFsiX19tZXRhX2t1YmVybmV0ZXNfcG9kX2xhYmVsX2FwcENsdXN0ZXIiXQogICAgcmVnZXg6ICJqeDNyZWNpcGUtY2x1c3RlciIKICAgIGFjdGlvbjoga2VlcAogIC0gc291cmNlX2xhYmVsczogW19fYWRkcmVzc19fXQogICAgYWN0aW9uOiBrZWVwCiAgICByZWdleDogJyguKik6NzAwMCcK
kind: Secret
metadata:
  creationTimestamp: "2019-09-10T09:32:22Z"
  name: additional-configs
  namespace: monitoring
  resourceVersion: "1004681"
  selfLink: /api/v1/namespaces/monitoring/secrets/additional-configs
  uid: e455d657-d3ad-11e9-95b4-fa163e3c10ff
type: Opaque
Then we only need to reference this additional configuration in the manifest that declares the Prometheus resource object (prometheus-prometheus.yaml).
3. Update the Prometheus resource object
Edit the prometheus-prometheus.yaml file:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.5.0
The newly added part is this block:
additionalScrapeConfigs:
  name: additional-configs
  key: prometheus-additional.yaml
4. Apply the configuration
kubectl apply -f prometheus-prometheus.yaml
After a short while, refresh the config page in the Prometheus UI and the newly added kubernetes-service-pod scrape configuration will show up.

5. Add permissions
On the configuration page of the Prometheus dashboard we can see that the corresponding configuration is there, but when we switch to the targets page the expected monitoring job is nowhere to be found. Checking the Prometheus pod logs:

The logs contain many errors of the form "xxx is forbidden", which points to an RBAC permission problem. From the configuration of the prometheus resource object we know that Prometheus is bound to a ServiceAccount named prometheus-k8s, which in turn is bound to a ClusterRole named prometheus-k8s (prometheus-clusterRole.yaml).
Change it to:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
Apply the updated ClusterRole resource object and then recreate all of the Prometheus pods; once they are back up, the targets page should show the kubernetes-service-pod monitoring job.
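For example (assuming the default kube-prometheus naming, where the two Prometheus pods are prometheus-k8s-0 and prometheus-k8s-1 in the monitoring namespace), this could be done with:
kubectl apply -f prometheus-clusterRole.yaml
kubectl delete pod -n monitoring prometheus-k8s-0 prometheus-k8s-1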

At this point, the configuration for automatically discovering pods is complete. Other resources (service, endpoints, ingress, node) can be monitored through automatic discovery in exactly the same way.
