prometheus-operator: A Detailed Summary (One-Step Install with Helm)


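1. Introduction to prometheus-operator
2. Checking and configuring RBAC authorization
3. Installing prometheus-operator with Helm
4. Configuring monitoring of Kubernetes components
5. Adding a new data source in Grafana
6. Monitoring MySQL
7. Alertmanager configuration
Finally: uninstalling prometheus-operator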

 

1. Overview

The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes the set of targets that Prometheus monitors.

   

Service

  

ServiceMonitor

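  Matches Services through its selector. Note the team: frontend label here; it comes up again below. Endpoints are selected by label, which is what makes service discovery dynamic.

  port: web  # corresponds to the port name in the Service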
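The label relationship described above can be sketched as a pair of manifests. This is a minimal illustration, not the author's actual files: the name example-app, the app: example-app label, port 8080, the file name, and the RUN_AGAINST_CLUSTER switch are all assumptions. Only team: frontend and the port name web come from the text.

```shell
# Write a hypothetical Service and a ServiceMonitor that selects it by label.
cat > frontend-servicemonitor.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: example-app            # hypothetical name
  labels:
    app: example-app           # the ServiceMonitor selector matches this label
spec:
  selector:
    app: example-app
  ports:
  - name: web                  # the ServiceMonitor refers to this port by name
    port: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend             # the Prometheus resource selects on this label
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
EOF

# Apply only when explicitly requested, so the script also runs without a cluster.
if [ "${RUN_AGAINST_CLUSTER:-0}" = "1" ]; then
  kubectl apply -f frontend-servicemonitor.yaml
fi
```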

   

Prometheus

  Matches the labels of ServiceMonitors through matchLabels.

  

  Rule binding: ruleSelector (matching the label prometheus: service-prometheus) selects the PrometheusRule objects whose labels include prometheus: service-prometheus.
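A Prometheus resource carrying both selectors might look roughly like the sketch below. The field names serviceMonitorSelector and ruleSelector belong to the Prometheus custom resource. Reusing the name service-prometheus and the prometheus ServiceAccount from elsewhere in this article, as well as the file name and the RUN_AGAINST_CLUSTER switch, is an assumption for illustration.

```shell
# Write a hypothetical Prometheus resource that picks up ServiceMonitors
# labelled team=frontend and PrometheusRules labelled prometheus=service-prometheus.
cat > service-prometheus.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: service-prometheus
  labels:
    prometheus: service-prometheus
spec:
  serviceAccountName: prometheus     # assumed: the ServiceAccount from section 2.1
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      prometheus: service-prometheus
EOF

# Apply only when explicitly requested, so the script also runs without a cluster.
if [ "${RUN_AGAINST_CLUSTER:-0}" = "1" ]; then
  kubectl apply -f service-prometheus.yaml
fi
```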

  

PrometheusRule

      Rule configuration
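For illustration, a PrometheusRule that the ruleSelector above would select could be sketched as follows. The rule group, the alert name TargetDown, its expression, the file name, and the RUN_AGAINST_CLUSTER switch are hypothetical; only the prometheus: service-prometheus label comes from the text.

```shell
# Write a hypothetical PrometheusRule carrying the label that ruleSelector matches.
cat > example-rules.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules                  # hypothetical name
  labels:
    prometheus: service-prometheus     # must match the Prometheus ruleSelector
spec:
  groups:
  - name: example.rules
    rules:
    - alert: TargetDown                # hypothetical alert
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: A scrape target has been down for 5 minutes
EOF

# Apply only when explicitly requested, so the script also runs without a cluster.
if [ "${RUN_AGAINST_CLUSTER:-0}" = "1" ]; then
  kubectl apply -f example-rules.yaml
fi
```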

  

With the setup above, the frontend team can create new ServiceMonitors and Services, which allows Prometheus to be reconfigured dynamically.

Alertmanager

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  generation: 1
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-0.1.27
    heritage: Tiller
    release: my-release
  name: my-release-prometheus-oper-alertmanager
  namespace: default
spec:
  baseImage: quay.io/prometheus/alertmanager
  externalUrl: http://my-release-prometheus-oper-alertmanager.default:9093
  listenLocal: false
  logLevel: info
  paused: false
  replicas: 1
  retention: 120h
  routePrefix: /
  serviceAccountName: my-release-prometheus-oper-alertmanager
  version: v0.15.2

 

 

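2. Checking and configuring RBAC authorization (by default none of the following needs to be configured)

  If RBAC authorization is enabled, RBAC rules must be created for both prometheus and prometheus-operator. For prometheus-operator, a ClusterRole and a ClusterRoleBinding are created.

  2.1 Grant the prometheus ServiceAccount the required permissions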

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

  2.2 Grant the prometheus-operator ServiceAccount the required permissions. See the official documentation for details; it is not reproduced here.

    https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html

 

3. Installing prometheus-operator with Helm

Official GitHub link

  https://github.com/helm/charts/tree/master/stable/prometheus-operator

Install command

  $ helm install --name my-release stable/prometheus-operator

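Install with specific parameters, for example changing the Prometheus Service type from the default ClusterIP to NodePort. (The prometheus-operator Service manifest in the official documentation sets clusterIP: None, so the type cannot be changed directly; the workaround is to deploy first and then switch to NodePort with kubectl apply -f service.yaml.)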

  $ helm install --name my-release stable/prometheus-operator --set prometheus.service.type=NodePort  --set prometheus.service.nodePort=30090

Or install with the values taken from YAML files

  $ helm install --name my-release stable/prometheus-operator -f values1.yaml,values2.yaml
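As a sketch, the two --set flags shown earlier could instead be kept in a values file. The keys simply mirror those flags; the file name values1.yaml is taken from the command above, while its content and the RUN_AGAINST_CLUSTER switch are assumptions.

```shell
# Write a values file equivalent to
#   --set prometheus.service.type=NodePort --set prometheus.service.nodePort=30090
cat > values1.yaml <<'EOF'
prometheus:
  service:
    type: NodePort
    nodePort: 30090
EOF

# Install only when explicitly requested, so the script also runs without a cluster.
if [ "${RUN_AGAINST_CLUSTER:-0}" = "1" ]; then
  helm install --name my-release stable/prometheus-operator -f values1.yaml
fi
```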

 

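4. Configuring monitoring of Kubernetes components

  4.1 Configure kubelet monitoring. It is not scraped out of the box, because the ServiceMonitor named kubelet accesses the endpoint on port 10255 over HTTP, whereas on my Rancher-built cluster the kubelet serves HTTPS on port 10250. The default configuration is as follows: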

   

  Following the official documentation at https://coreos.com/operators/prometheus/docs/latest/user-guides/cluster-monitoring.html , modify the ServiceMonitor as follows:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  labels:
    k8s-app: kubelet
spec:
  jobLabel: k8s-app
  endpoints: # the default uses plain HTTP without TLS; change it to the following
  - port: https-metrics
    scheme: https
    interval: 30s
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system

  Apply the change with kubectl apply -f <the file above>.yaml

  4.2 Configure kube-controller-manager monitoring

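  In my deployment kube-controller-manager is not started as a pod but directly as a container, so the Service selector cannot select a matching pod. As a result the Endpoints object has no address under subsets, and the Prometheus target cannot scrape any data. I therefore edited the Endpoints manifest (adding the address fields, with ip set to the IP of the host running the master) and removed the selector from the Service, as follows: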
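A hand-maintained Endpoints object of the kind described above might look roughly like this. It is a sketch under stated assumptions, not the author's file: the object name and namespace are guessed from the chart's naming pattern and must equal those of the Service, 192.0.2.10 is a placeholder for the master host IP, and the port name http-metrics, port 10252, the file name, and the RUN_AGAINST_CLUSTER switch are assumptions as well.

```shell
# Write a hypothetical Endpoints object that points at the master host directly.
cat > kube-controller-manager-endpoints.yaml <<'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  name: my-release-prometheus-oper-kube-controller-manager   # assumed; must equal the Service name
  namespace: kube-system                                      # assumed namespace of that Service
subsets:
- addresses:
  - ip: 192.0.2.10            # placeholder: IP of the host running the master
  ports:
  - name: http-metrics        # assumed port name; must match the Service port name
    port: 10252
    protocol: TCP
EOF

# Apply only when explicitly requested, so the script also runs without a cluster.
if [ "${RUN_AGAINST_CLUSTER:-0}" = "1" ]; then
  kubectl apply -f kube-controller-manager-endpoints.yaml
fi
```

The selector has to be removed from the Service as well, because Kubernetes keeps rewriting the Endpoints of any Service that has a selector and would discard the manual address.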

    

 

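     kubectl apply -f <the file above>.yaml

    Run kubectl edit svc my-release-prometheus-oper-kube-scheduler (substitute the Service of the component being configured). The editor opens the Service as shown below; delete the selector and save with :wq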

    

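  4.3 Configure kube-scheduler in the same way, changing the port (10252 in my notes; the upstream default insecure metrics ports are 10251 for kube-scheduler and 10252 for kube-controller-manager, so check what your cluster exposes). Details omitted.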

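  4.4 Configure etcd

   Service configuration:

     

    ServiceMonitor configuration:

      

  4.5 What jobLabel does:

    I set the Service's jobLabel to kube-schedulerservi

      

    The targets page then shows the following (refresh the page and wait a while before the result appears):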

        

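5. Adding a new data source in Grafana (there is one data source by default; to keep application monitoring separate from the default monitoring, add another one for applications)

  5.1 Define the Prometheus resource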

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app: prometheus
    prometheus: service-prometheus
  name: service-prometheus
  namespace: monitoring
spec:
    ....

  5.2 View the default configuration of the grafana-datasource ConfigMap

kubectl  get configmap my-release-prometheus-oper-grafana-datasource -o yaml
apiVersion: v1
data:
  datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: service-prometheus
      type: prometheus
      url: http://service-ip:9090/ # not tested yet; to be looked into later
      access: proxy
      isDefault: true
kind: ConfigMap

  5.3 Modify the grafana-datasource ConfigMap

 

6. Monitoring MySQL

  The defaults to change in values.yaml are as follows:

mysqlRootPassword: testing
mysqlUser: mysqlu
mysqlPassword: mysql123
mysqlDatabase: mydb

metrics:
  enabled: true
  image: prom/mysqld-exporter
  imageTag: v0.10.0
  imagePullPolicy: IfNotPresent
  resources: {}
  annotations: {}
    # prometheus.io/scrape: "true"
    # prometheus.io/port: "9104"
  livenessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 5
  readinessProbe:
    initialDelaySeconds: 5
    timeoutSeconds: 1

  6.1 Install MySQL

  helm install --name my-release2 -f values.yaml stable/mysql

  6.2 Create the PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-release2-mysql
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: /data

  6.3 Create the ServiceMonitor for MySQL

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: my-release2-mysql
    heritage: Tiller
    release: my-release
  name: my-release2-mysql
  namespace: default
spec:
  endpoints:
  - interval: 15s
    port: metrics
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: my-release2-mysql
      release: my-release2

  6.4 Grafana configuration

  Download the JSON template from https://grafana.com/dashboards/6239

  Then import it into Grafana; selecting the default data source is fine.

 

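7. Alertmanager configuration (no configuration needed by default)

  7.1 How does the Prometheus resource find Alertmanager? Through the alerting field of the Prometheus resource, which matches the Alertmanager Service, as follows:

  Prometheus instance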

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app: prometheus-operator-prometheus
  name: my-release-prometheus-oper-prometheus
  namespace: default
spec:
  alerting:
    alertmanagers:
    - name: my-release-prometheus-oper-alertmanager # matches the Service named my-release-prometheus-oper-alertmanager
      namespace: default
      pathPrefix: /
      port: web
  ruleSelector:   # selects PrometheusRule objects carrying the labels below
    matchLabels:
      app: prometheus-operator
      release: my-release

  Alertmanager instance

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-0.1.27
    heritage: Tiller
    release: my-release
  name: my-release-prometheus-oper-alertmanager # the Secret name is derived from this name
  namespace: default
spec:
  baseImage: quay.io/prometheus/alertmanager
  externalUrl: http://my-release-prometheus-oper-alertmanager.default:9093
  listenLocal: false
  logLevel: info
  paused: false
  replicas: 1
  retention: 120h
  routePrefix: /
  serviceAccountName: my-release-prometheus-oper-alertmanager
  version: v0.15.2

  7.2 How does the Alertmanager instance re-read its configuration file? Through the --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1 argument in prometheus-operator/deployment.yaml.

  secret22.yaml 

apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IERlYWRNYW5zU3dpdGNoCiAgICByZWNlaXZlcjogIm51bGwiCg==
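kind: Secret # the encoded value is the Alertmanager configuration; it is base64-encoded, not encrypted, and on Linux can be decoded with echo "<the data value above>" | base64 -d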
metadata:
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-0.1.27
    heritage: Tiller
    release: my-release
  name: alertmanager-my-release-prometheus-oper-alertmanager # must be alertmanager-<Alertmanager name>
  namespace: default
type: Opaque
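The encoding step can be checked with a small round trip. This sketch encodes a two-line sample configuration and decodes it again. The sample text and the RUN_AGAINST_CLUSTER switch are assumptions for illustration; the Secret name is the one used in this article.

```shell
# A tiny sample configuration standing in for alertmanager.yaml.
config='global:
  resolve_timeout: 5m'

# Encode it the way a Secret's data field stores it, then decode it again.
encoded=$(printf '%s\n' "$config" | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
printf '%s\n' "$decoded"

# Against a live cluster, read the stored configuration straight from the Secret.
if [ "${RUN_AGAINST_CLUSTER:-0}" = "1" ]; then
  kubectl get secret alertmanager-my-release-prometheus-oper-alertmanager \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
fi
```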

  Details: https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus-operator/deployment.yaml

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: my-release-prometheus-oper-operator
  namespace: default
spec:
  template:
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/my-release-prometheus-oper-kubelet
        - --localhost=127.0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.25.0
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1 # this image reloads the Alertmanager configuration; the official site does not describe how
        image: quay.io/coreos/prometheus-operator:v0.25.0

  Loading rules through PrometheusRule

  all.rules.yaml      Reference: https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/alertmanager/rules/all.rules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-operator
  labels:
    app: prometheus-operator  # the ruleSelector of the Prometheus resource selects this label

  

  7.3 Key point: reloading the Alertmanager configuration, as follows:

  7.3.1: Define the alertmanager.yaml file

global:
  resolve_timeout: 5m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanagerwh:30500/'
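Note: do not use tabs for indentation, or an error is reported.

  7.3.2: Delete and then recreate the Secret named alertmanager-{ALERTMANAGER_NAME}, where {ALERTMANAGER_NAME} is the name of the Alertmanager instance (my-release-prometheus-oper-alertmanager in the example above).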

kubectl delete secret alertmanager-my-release-prometheus-oper-alertmanager
kubectl create secret generic alertmanager-my-release-prometheus-oper-alertmanager --from-file=alertmanager.yaml
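Steps 7.3.1 and 7.3.2 can be combined into one small script. In this sketch the script rewrites alertmanager.yaml with the content from 7.3.1, rejects tab characters before touching the cluster, and derives the Secret name from the Alertmanager name. The RUN_AGAINST_CLUSTER switch and the tab check are additions for illustration, not part of the original procedure.

```shell
ALERTMANAGER_NAME="my-release-prometheus-oper-alertmanager"
SECRET_NAME="alertmanager-${ALERTMANAGER_NAME}"

# Same configuration as in 7.3.1; this overwrites any existing alertmanager.yaml.
cat > alertmanager.yaml <<'EOF'
global:
  resolve_timeout: 5m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanagerwh:30500/'
EOF

# Fail early if the file contains any tab character.
if grep -q "$(printf '\t')" alertmanager.yaml; then
  echo "alertmanager.yaml contains tab characters" >&2
  exit 1
fi

# Recreate the Secret only when explicitly requested.
if [ "${RUN_AGAINST_CLUSTER:-0}" = "1" ]; then
  kubectl delete secret "$SECRET_NAME" --ignore-not-found
  kubectl create secret generic "$SECRET_NAME" --from-file=alertmanager.yaml
fi
```

Because --from-file uses the file name as the key, the file has to be called alertmanager.yaml for the key in the Secret to match the one shown in secret22.yaml.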

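  7.3.3: Check that it took effect

      Wait a few seconds, then check the Status page of the Alertmanager UI to see whether the new configuration is active. For other settings see https://prometheus.io/docs/alerting/configuration/

    WeChat alerting method   https://www.cnblogs.com/jiuchongxiao/p/9024211.html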

 

Finally: how to uninstall prometheus-operator (also useful as a reference when reinstalling)

  1. Delete the release directly with helm delete

     $ helm delete my-release

  2. Delete the related CRDs (helm install created the CRD resources automatically)

kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com

  3. Remove the my-release record from Helm

  helm del --purge my-release
