Installing and Deploying prometheus-operator

Reposted article. Published 2021-11-29 11:46. Tag: docker

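We run an Alibaba Cloud managed Kubernetes cluster, version 1.21, with kube-prometheus 0.9. If you also use Alibaba Cloud managed ACK, open a support ticket in advance to have authorization management enabled; otherwise the installation will fail because the RoleBinding resources cannot be found.

References: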

http://www.servicemesher.com/blog/prometheus-operator-manual/
https://github.com/coreos/prometheus-operator
https://github.com/coreos/kube-prometheus
https://www.cnblogs.com/twobrother/p/11165417.html

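1. Overview

1.1 Ways to deploy Prometheus monitoring in Kubernetes

There are generally three ways to deploy Prometheus monitoring in Kubernetes:

Manual deployment with YAML manifests
Deployment with an Operator
Deployment with a Helm chart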

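1.2 What is Prometheus Operator

At its core, Prometheus Operator is a set of user-defined CRDs plus a controller implementation. It watches these custom resources for changes and, based on their definitions, automates management tasks such as deploying Prometheus Server and managing its configuration.

[Figure: Prometheus Operator architecture diagram]

Before configuring prometheus-operator to monitor a JVM, we need to understand its four CRDs. They work as follows: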

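Prometheus: A Prometheus Server cluster deployed by the Operator according to the contents of a custom resource of kind: Prometheus. This custom resource can be thought of as a special StatefulSet dedicated to managing Prometheus Server.
ServiceMonitor: A Kubernetes custom resource (a CRD, like kind: Prometheus) that describes the target list for Prometheus Server. The Operator watches this resource for changes, dynamically updates the scrape targets of Prometheus Server, and has Prometheus Server reload its configuration (Prometheus exposes an HTTP endpoint for this, /-/reload). The resource uses a selector to pick the endpoints of matching Services by label, and Prometheus Server then pulls metrics through those Services. Targets must expose data in the Prometheus metrics format at an HTTP URL, and the ServiceMonitor can also specify the metrics URL path of the target.
Alertmanager: Prometheus Operator manages and deploys not only Prometheus Server but also Alertmanager. It is likewise described by a custom resource, kind: Alertmanager, and the Operator deploys an Alertmanager cluster based on that description.
PrometheusRule: With plain Prometheus, alerting rule files must be created by hand and loaded by declaring them in the Prometheus configuration. With Prometheus Operator, alerting rules become a resource created declaratively through the Kubernetes API. Once a rule is created, Prometheus picks it up through ruleSelector, which selects the PrometheusRule objects to associate by label, just as ServiceMonitors are selected.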
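To make the relationship between these resources concrete, here is a minimal sketch. It is not the set of manifests shipped with kube-prometheus. The application name example-app, its namespace default, the port name http-metrics, the alert name, and the label values used by ruleSelector are all assumptions made for this illustration.

```yaml
# Prometheus: tells the Operator which ServiceMonitors and rules to load.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector: {}           # empty selector: select all ServiceMonitors
  serviceMonitorNamespaceSelector: {}  # in all namespaces
  ruleSelector:
    matchLabels:                       # assumed label values
      prometheus: k8s
      role: alert-rules
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-main
      port: web
---
# ServiceMonitor: selects Services by label and names the port to scrape.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                    # hypothetical application
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: example-app                 # must match the labels on the Service
  endpoints:
  - port: http-metrics                 # name of the Service port, not a number
    path: /metrics
    interval: 30s
---
# PrometheusRule: carries the labels that ruleSelector above matches.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-app-rules
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: example-app.rules
    rules:
    - alert: ExampleAppDown            # hypothetical alert
      expr: up{job="example-app"} == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        message: 'example-app target has been down for 5 minutes'
```

The labels do the wiring in both directions: ServiceMonitor.spec.selector must match the Service labels, and the PrometheusRule metadata labels must match Prometheus.spec.ruleSelector.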

2. Installation and deployment

1. Download the deployment package

wget -c https://github.com/prometheus-operator/kube-prometheus/archive/v0.7.0.zip

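2. Modify the files

For the kubelet metrics port, 10250 serves HTTPS and 10255 serves HTTP.

For the kube-scheduler metrics port, 10259 serves HTTPS and 10251 serves HTTP.

For the kube-controller-manager metrics port, 10257 serves HTTPS and 10252 serves HTTP.

Test: run curl against the /metrics path of the relevant port on the host to fetch the metrics. For example, to get the kubelet metrics, run curl http://127.0.0.1:10255/metrics.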

kubernetes-serviceMonitorKubeScheduler.yaml
kubernetes-serviceMonitorKubeControllerManager.yaml
kubernetes-serviceMonitorKubelet.yaml

By default these YAML files scrape over the HTTPS port (for example 10250 for the kubelet), so we change port to http-metrics and scheme to http.

Reference: https://www.cnblogs.com/xinbat/p/15116903.html
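As a sketch of what the edited file might look like for kube-scheduler, assuming the plain-HTTP port is still enabled on your control plane (newer Kubernetes releases disable or remove these insecure ports) and that a Service exposing a port named http-metrics exists in kube-system:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-scheduler
  namespace: monitoring
  labels:
    k8s-app: kube-scheduler
spec:
  jobLabel: k8s-app
  endpoints:
  - interval: 30s
    port: http-metrics      # changed from the HTTPS port name
    scheme: http            # changed from https; TLS and token settings removed
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
```

The port field refers to the name of a port on the selected Service, so the Service must define a port with exactly this name.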

3. Deploy

# cd kube-prometheus/manifests/setup
# kubectl apply -f .
# cd ..
# kubectl apply -f .

Create an Ingress for Prometheus, Grafana, and Alertmanager:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-alertmangaer-grafana-ingress
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/connection-proxy-header: "keep-alive"
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
    nginx.ingress.kubernetes.io/proxy-body-size: 80m
spec:
  tls:
  - hosts:
    - 'prometheus.xxx.com'
    secretName: xxx-com-secret
  - hosts:
    - 'grafana.xxx.com'
    secretName: xxx-com-secret
  - hosts:
    - 'alertmanager.xxx.com'
    secretName: xxx-com-secret
  rules:
  - host: prometheus.xxx.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s
          servicePort: 9090
  - host: grafana.xxx.com
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
  - host: alertmanager.xxx.com
    http:
      paths:
      - path: /
        backend:
          serviceName: alertmanager-main
          servicePort: 9093
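The extensions/v1beta1 Ingress API used above is deprecated and is no longer served from Kubernetes 1.22 onward. On newer clusters the same routing can be written against networking.k8s.io/v1. The sketch below covers only the Prometheus host; the resource name and the ingressClassName value nginx are assumptions for this example.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress          # hypothetical name
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
spec:
  ingressClassName: nginx           # assumed ingress class
  tls:
  - hosts:
    - prometheus.xxx.com
    secretName: xxx-com-secret
  rules:
  - host: prometheus.xxx.com
    http:
      paths:
      - path: /
        pathType: Prefix            # required in networking.k8s.io/v1
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
```

The Grafana and Alertmanager hosts follow the same pattern with service names grafana (port 3000) and alertmanager-main (port 9093).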

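Resolving the Watchdog, ControllerManager, and Scheduler monitoring issues

Watchdog is an alert that fires under normal conditions. Its purpose is this: if Alertmanager or Prometheus itself goes down, no alerts can be sent, so one usually either watches Prometheus with a second monitoring system or defines a continuously firing alert notification. The day that notification stops arriving, something is wrong with the monitoring. Prometheus Operator already accounts for this and ships a Watchdog alert to monitor itself.

To disable it, delete or comment out the Watchdog section in the following file: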

prometheus-rules.yaml

...
  - name: general.rules
    rules:
    - alert: TargetDown
      annotations:
        message: 'xxx'
      expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
      for: 10m
      labels:
        severity: warning
#    - alert: Watchdog
#      annotations:
#        message: |
#          This is an alert meant to ensure that the entire alerting pipeline is functional.
#          This alert is always firing, therefore it should always be firing in Alertmanager
#          and always fire against a receiver. There are integrations with various notification
#          mechanisms that send a notification when this alert is not firing. For example the
#          "DeadMansSnitch" integration in PagerDuty.
#      expr: vector(1)
#      labels:
#        severity: none

If a ServiceMonitor was created specifically for Watchdog, it can be deleted as well.
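An alternative to deleting the rule is to leave Watchdog firing and route it to a receiver that sends nothing, so the heartbeat stays available for later use. The following is a minimal sketch of an Alertmanager configuration for that; the receiver names default and "null" are assumptions. In kube-prometheus this configuration lives in the alertmanager-main Secret.

```yaml
route:
  receiver: default
  group_by: ['alertname']
  routes:
  - match:
      alertname: Watchdog
    receiver: "null"        # quoted so YAML does not read it as a null value
receivers:
- name: default             # attach your real notification settings here
- name: "null"              # no integrations configured: alerts are dropped
```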

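Resolving KubeControllerManagerDown and KubeSchedulerDown

The cause is that prometheus-serviceMonitorKubeControllerManager.yaml contains the following selector, but a default cluster installation does not create a Service for the kube-controller-manager system component: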

  selector:
    matchLabels:
      k8s-app: kube-controller-manager

Change the listen address of kube-controller-manager:

# vim /etc/kubernetes/manifests/kube-controller-manager.yaml
...
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=0.0.0.0


# netstat -lntup|grep kube-contro                                      
tcp6       0      0 :::10257                :::*                    LISTEN      38818/kube-controll
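kube-scheduler presumably needs the same change. A sketch of the corresponding fragment, assuming a kubeadm-style static Pod manifest at /etc/kubernetes/manifests/kube-scheduler.yaml (the path and the surrounding flags are assumptions and may differ on your cluster):

```yaml
# /etc/kubernetes/manifests/kube-scheduler.yaml (fragment)
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=0.0.0.0   # listen on all interfaces so Prometheus can reach it
```

The kubelet watches the static Pod manifest directory and restarts the Pod after the file is saved.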

Create prometheus-kube-controller-manager-service.yaml and prometheus-kube-scheduler-service.yaml so that the ServiceMonitors have Services to select:

# cat  prometheus-kube-controller-manager-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

# cat  prometheus-kube-scheduler-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP

# 10251 is the port serving kube-scheduler metrics; 10252 is the port serving kube-controller-manager metrics.

In the labels and selector sections above, the labels must match the selector of the corresponding ServiceMonitor object, while the Service's own selector is set to component=kube-scheduler. Why this label? Run describe on the kube-scheduler Pod to see:

#  kubectl describe pod kube-scheduler-k8s-master -n kube-system
Name:                 kube-scheduler-k8s-master
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 k8s-master/10.6.76.25
Start Time:           Thu, 29 Aug 2019 09:21:01 +0800
Labels:               component=kube-scheduler
                      tier=control-plane

#  kubectl describe pod kube-controller-manager-k8s-master -n kube-system
Name:                 kube-controller-manager-k8s-master
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 k8s-master/10.6.76.25
Start Time:           Thu, 29 Aug 2019 09:21:01 +0800
Labels:               component=kube-controller-manager
                      tier=control-plane
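If the insecure HTTP ports are not available on your control plane, one alternative is to leave the ServiceMonitors on their default HTTPS settings and expose the secure ports instead. The sketch below is a variant of the two Services above. It assumes the shipped ServiceMonitors refer to a port named https-metrics; check the port field in your copies of the files before applying it.

```yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager   # matched by the ServiceMonitor selector
spec:
  selector:
    component: kube-controller-manager # matches the Pod label shown above
  ports:
  - name: https-metrics                # assumed port name in the ServiceMonitor
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
```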

Access through the Ingress in a browser

https://prometheus.xxx.com/

https://alertmanager.xxx.com/

https://grafana.xxx.com/
The default Grafana username and password are admin / admin; you must reset the password on first login.

References:

https://www.cnblogs.com/huss2016/p/14865316.html

http://t.zoukankan.com/ssgeek-p-14441149.html

https://www.cnblogs.com/xinbat/p/15116903.html

https://www.cnblogs.com/zhangrui153169/p/13609172.html

https://blog.csdn.net/twingao/article/details/105261641

https://www.cnblogs.com/twobrother/p/11165417.html

https://blog.csdn.net/qq_43164571/article/details/119990724

https://www.kococ.cn/20210302/cid=697.html
