Adding custom monitoring and alerting to Prometheus (etcd as an example)


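I. Steps and notes (prerequisite: Prometheus is already deployed; see the deployment article)

  1. etcd clusters usually have HTTPS authentication enabled, so accessing etcd requires the matching client certificates.
  2. Create a secret for etcd from those certificates.
  3. Mount the etcd secret into Prometheus.
  4. Create a ServiceMonitor object for etcd (it matches Services in the kube-system namespace that carry the label k8s-app=etcd).
  5. Create a Service that points at the monitored targets.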

II. Hands-on steps (default etcd certificate path: /etc/kubernetes/pki/etcd/)

1. Create the etcd secret

cd /etc/kubernetes/pki/etcd/
kubectl create secret generic etcd-certs --from-file=healthcheck-client.crt --from-file=healthcheck-client.key --from-file=ca.crt -n monitoring
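
Before running the command above, it can help to confirm that the three certificate files actually exist in the directory. The following Python sketch does that check; the function name missing_cert_files is made up for illustration, and the file names are simply the ones passed to --from-file above.

```python
from pathlib import Path

# File names taken from the kubectl command above.
REQUIRED_FILES = ("healthcheck-client.crt", "healthcheck-client.key", "ca.crt")


def missing_cert_files(cert_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in cert_dir."""
    base = Path(cert_dir)
    return [name for name in required if not (base / name).is_file()]


# Hypothetical usage on a control-plane node:
# print(missing_cert_files("/etc/kubernetes/pki/etcd/"))
```

An empty list means all three files are present and the secret can be created from them.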

2. Add the secret to the Prometheus object named k8s (either run kubectl edit prometheus k8s -n monitoring, or modify the YAML file and re-apply the resource)

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
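
Secrets listed under spec.secrets are mounted by the Prometheus Operator into the Prometheus pods at /etc/prometheus/secrets/<secret-name>/, which is where the certificate paths used in the next step come from. A minimal Python sketch of that path layout, with a hypothetical helper name:

```python
import posixpath

# Mount root used by the Prometheus Operator for entries in spec.secrets.
SECRETS_ROOT = "/etc/prometheus/secrets"


def secret_file_path(secret_name, file_name):
    """Build the in-pod path of one file from a mounted secret."""
    return posixpath.join(SECRETS_ROOT, secret_name, file_name)


# Paths for the three files stored in the etcd-certs secret.
for name in ("ca.crt", "healthcheck-client.crt", "healthcheck-client.key"):
    print(secret_file_path("etcd-certs", name))
```

If the secret is given a different name, the directory component changes with it, so the tlsConfig paths in the ServiceMonitor have to be adjusted to match.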

3. Create the ServiceMonitor object

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system

4. Create the Service and define custom Endpoints (etcd may be deployed outside the Kubernetes cluster, so the Endpoints are defined by hand)

apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP

---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 1.1.1.11
  - ip: 1.1.1.12
  - ip: 1.1.1.13
    nodeName: etcd-master
  ports:
  - name: port
    port: 2379
    protocol: TCP
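
To make the link between the ServiceMonitor in step 3 and the Service in step 4 concrete, here is a simplified Python model of the equality-based label matching and of how jobLabel produces the job name. The function names are made up for illustration and this is not the Prometheus Operator's code; it ignores matchExpressions and other selector features.

```python
def service_selected(match_labels, match_namespaces, svc_namespace, svc_labels):
    """True if a Service satisfies namespaceSelector.matchNames and selector.matchLabels."""
    if svc_namespace not in match_namespaces:
        return False
    # Every key/value pair in matchLabels must be present on the Service.
    return all(svc_labels.get(key) == value for key, value in match_labels.items())


def job_name(job_label, svc_labels, svc_name):
    """Value of the job label: the Service label named by jobLabel, else the Service name."""
    return svc_labels.get(job_label, svc_name)


# Values copied from the manifests in steps 3 and 4.
svc_labels = {"k8s-app": "etcd"}
print(service_selected({"k8s-app": "etcd"}, ["kube-system"], "kube-system", svc_labels))
print(job_name("k8s-app", svc_labels, "etcd-k8s"))
```

With the manifests as written, the Service is selected and the scraped series carry job="etcd", which is the job name the alert rule in section III filters on.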

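At this point the etcd monitoring data should be visible in the Prometheus web UI.

If the targets report the error "connection refused", edit the etcd.yaml file under /etc/kubernetes/manifests so that etcd listens on an address Prometheus can reach:

Option 1: --listen-client-urls=https://0.0.0.0:2379

Option 2: --listen-client-urls=https://127.0.0.1:2379,https://1.1.1.11:2379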
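
A quick way to tell a listen-address problem apart from a certificate problem is to test plain TCP reachability of port 2379 from a machine that can reach the etcd nodes. The Python sketch below only opens and closes a TCP connection; it does not perform a TLS handshake or present the client certificate. The function name can_connect is hypothetical.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


# Hypothetical usage with the example endpoint addresses from step 4:
# for ip in ("1.1.1.11", "1.1.1.12", "1.1.1.13"):
#     print(ip, can_connect(ip, 2379))
```

If this returns False for a node IP while etcd is running, the listen address is the likely cause; if it returns True but the target is still down, the TLS settings are the next thing to check.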

 

III. Create a custom alert

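  1. After a PrometheusRule resource is created, the corresponding alert rule file is generated inside the Prometheus pod.
  2. Note: the labels on the PrometheusRule must match the ruleSelector of the Prometheus object (prometheus: k8s and role: alert-rules above).
  3. Alert condition: an etcd cluster stays available as long as more than half of its members are up. The rule fires when losing one more member would break that majority.
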
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd-exporter.rules
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2-1)
      for: 3m
      labels:
        severity: critical
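
The arithmetic behind the expression is easier to follow with concrete member counts. The Python sketch below mirrors the comparison in expr and a plain majority check; the function names are made up for illustration, and the down == 0 guard models the fact that count() over an empty result yields no sample, so the rule cannot fire when nothing is down.

```python
def alert_fires(total, down):
    """Mirror of: count(up == 0) > (count(up) / 2 - 1)."""
    if down == 0:
        # No series match up == 0, so the left-hand side is empty and nothing fires.
        return False
    return down > total / 2 - 1


def has_quorum(total, down):
    """True while more than half of the members are still up."""
    return (total - down) > total / 2


# Walk through a 3-member and a 5-member cluster.
for total in (3, 5):
    for down in range(total + 1):
        print(total, down, alert_fires(total, down), has_quorum(total, down))
```

For a 3-member cluster the rule already fires with one member down, and for a 5-member cluster with two members down. In both cases the cluster still has quorum at that point, which matches the description "If one more etcd peer goes down the cluster will be unavailable".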

