Installing Prometheus Operator on Kubernetes with Helm


Prerequisites:
You should be comfortable with basic use of prometheus/alertmanager/grafana and know roughly what their configuration files do; otherwise some of the concepts and UI steps below may not make sense. I will not explain those basics in detail here.

1. System environment
  • OS: CentOS 7.6
  • Docker client 18.09.7, server 18.09.7
  • Kubernetes v1.16.2
  • Helm client v2.13.1, Tiller (server) v2.13.1
 
Confirm the helm chart repositories and update the local repo cache
[root@ops1 test]# helm repo add stable http://mirror.azure.cn/kubernetes/charts/
[root@ops1 test]# helm repo list
NAME URL
local           http://127.0.0.1:8879/charts 
stable          http://mirror.azure.cn/kubernetes/charts/
incubator       http://mirror.azure.cn/kubernetes/charts-incubator/
[root@ops1 test]# helm repo update
 
2. Install Prometheus Operator
Search for and pull the prometheus-operator chart archive; take a look inside if you are curious:
[root@ops1 test]# helm search prometheus
stable/prometheus-operator 8.12.0 0.37.0 Provides easy monitoring definitions for Kubernetes servi...
[root@ops1 test]# helm fetch stable/prometheus-operator --version 8.12.0
[root@ops1 test]# tar -zxf prometheus-operator-8.12.0.tgz 
tar: prometheus-operator/Chart.yaml: implausibly old time stamp 1970-01-01 08:00:00
[root@ops1 test]# ls prometheus-operator
charts  Chart.yaml CONTRIBUTING.md crds README.md requirements.lock requirements.yaml templates values.yaml
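Before writing a values file it helps to see everything the chart lets you override. On Helm 2 you can dump the chart's full default values (the output is long):
helm inspect values stable/prometheus-operator --version 8.12.0 | less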
 
 
Install Prometheus Operator with helm; all of its resources are installed into the monitoring namespace:
[root@ops1 test]# cat <<EOF > prometheus-operator-values.yaml
 
alertmanager:
  service: # expose alertmanager via NodePort so it can be reached externally for testing
    nodePort: 30091
    type: NodePort 
 
          
  alertmanagerSpec:
    storage: # I use persistent storage here; omit this block if you are only testing
      volumeClaimTemplate:
        spec:
          storageClassName: prometheus-k8s
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
 
grafana:
  service: # expose grafana via NodePort so it can be reached externally for testing
    type: NodePort
    nodePort: 30092
 
prometheus:
  service: # expose prometheus via NodePort so it can be reached externally for testing
    nodePort: 30090
    type: NodePort 
  prometheusSpec:
    storageSpec: # I use persistent storage here; omit this block if you are only testing
       volumeClaimTemplate:
         spec:
           storageClassName: prometheus-k8s
           accessModes: ["ReadWriteOnce"]
           resources:
             requests:
               storage: 20Gi
kubeEtcd:
  service: # on kubernetes 1.16.2 etcd exposes its metrics on port 2381
    port: 2381
    targetPort: 2381
EOF
[root@ops1 test]# helm install --name prometheus-operator --version=8.12.0 -f prometheus-operator-values.yaml \
    --namespace=monitoring stable/prometheus-operator
 
NAME: prometheus-operator
...... .......
NOTES:
The Prometheus Operator has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus-operator"
 
Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.
 
[root@ops1 test]# kubectl get crd | grep monitoring
alertmanagers.monitoring.coreos.com 2020-04-08T02:59:54Z
podmonitors.monitoring.coreos.com 2020-04-08T02:59:57Z
prometheuses.monitoring.coreos.com 2020-04-08T02:59:57Z
prometheusrules.monitoring.coreos.com 2020-04-08T03:00:00Z
servicemonitors.monitoring.coreos.com 2020-04-08T03:00:02Z
thanosrulers.monitoring.coreos.com 2020-04-08T03:00:05Z
 
 
[root@ops1 prometheus-operator]# kubectl get svc -n monitoring
 
 
 
1. Prometheus UI: http://192.168.70.122:30090/graph#/alerts

2. Alertmanager UI: http://192.168.70.122:30091/#/alerts

3. Grafana UI: http://192.168.70.122:30092/dashboards , default account/password: admin/prom-operator
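If the default Grafana login does not work (the password can be overridden in the values file), it can usually be read back from the Secret created by the Grafana subchart; the secret and key names below assume the subchart defaults for a release named prometheus-operator:
kubectl get secret prometheus-operator-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 -d ; echo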
 
3. Configure prometheus monitoring and alerting rules
 
Let's look at the Prometheus StatefulSet first:
[root@ops1 test]# kubectl get sts prometheus-prometheus-operator-prometheus -n monitoring -o yaml
      - args:
        - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
 
        volumeMounts:
        - mountPath: /etc/prometheus/config_out
          name: config-out
          readOnly: true
 
        - emptyDir: {}
          name: config-out
 
Notice that Prometheus's configuration file lives inside the pod (an emptyDir volume) rather than in any pre-existing ConfigMap. This brings us to the key concept: the Prometheus configuration is generated and managed by the Operator, as shown in the figure.
The figure above is the official Prometheus-Operator architecture diagram. The Operator is the core component: acting as a controller, it creates the Prometheus, ServiceMonitor, Alertmanager and PrometheusRule CRD resources, and then continuously watches and reconciles the state of these four resource types.
The Prometheus resource it creates corresponds to the Prometheus server itself, while a ServiceMonitor is an abstraction over exporters. As we learned earlier, an exporter is a tool that exposes a metrics endpoint; Prometheus pulls data from the metrics endpoints that ServiceMonitors point at. Likewise, the Alertmanager resource is the abstraction of Alertmanager, and PrometheusRule holds the alerting rule files used by the Prometheus instances.
So deciding what to monitor in the cluster becomes a matter of creating Kubernetes resource objects, which is much more convenient. Both Service and ServiceMonitor in the diagram are Kubernetes resources: a ServiceMonitor selects a class of Services via a labelSelector, and Prometheus in turn selects multiple ServiceMonitors via its own labelSelector.
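You can see this pattern in the cluster already: the chart has created a number of ServiceMonitor and PrometheusRule objects for the built-in targets (node-exporter, kubelet, apiserver and so on):
kubectl get servicemonitors -n monitoring
kubectl get prometheusrules -n monitoring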
 
 
[root@ops1 test]# kubectl get prometheus
NAME VERSION REPLICAS AGE
prometheus-operator-prometheus v2.15.2 1 21m
[root@ops1 test]# kubectl get prometheus prometheus-operator-prometheus -o yaml
 
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: prometheus-operator-alertmanager
      namespace: monitoring
      pathPrefix: /
      port: web
  baseImage: quay.io/prometheus/prometheus
  enableAdminAPI: false
  externalUrl: http://prometheus-operator-prometheus.monitoring:9090
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector: # selection rule: PodMonitor objects (crd podmonitors) carrying the label below are picked up
    matchLabels:
      release: prometheus-operator
  portName: web
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector: # alerting rules: PrometheusRule objects (crd prometheusrules) carrying these two labels are picked up
    matchLabels:
      app: prometheus-operator
      release: prometheus-operator
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-operator-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator
  storage: # the storage we defined in the values file earlier; empty if not defined
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: prometheus-k8s
  version: v2.15.2
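The practical takeaway from this object is the selectors: a ServiceMonitor or PrometheusRule we create will simply be ignored unless it carries the matching labels. A quick way to pull just the selectors out (plain kubectl jsonpath, nothing chart-specific):
kubectl get prometheus prometheus-operator-prometheus -n monitoring -o jsonpath='{.spec.serviceMonitorSelector}'
kubectl get prometheus prometheus-operator-prometheus -n monitoring -o jsonpath='{.spec.ruleSelector}'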
 
Let's configure service monitoring first.
We start by deploying two tomcats that expose a metrics endpoint; any other service with metrics works just as well.
 
[root@ops1 test]# kubectl create ns tomcat
namespace/tomcat created
[root@ops1 test]# cat <<EOF > tomcat-test1.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-test1
  namespace: tomcat
  labels:
    k8s.eip.work/layer: svc
    k8s.eip.work/name: tomcat-test1
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s.eip.work/layer: svc
      k8s.eip.work/name: tomcat-test1
  template:
    metadata:
      labels:
        k8s.eip.work/layer: svc
        k8s.eip.work/name: tomcat-test1
    spec:
      containers:
        - name: tomcat-test1
          image: 'registry.cn-beijing.aliyuncs.com/wangzt/k8s/tomcat:v1.3'
 
---
apiVersion: v1
kind: Service
metadata:
  name: tomcat-test1
  namespace: tomcat
  labels:
    k8s.eip.work/layer: svc
    k8s.eip.work/name: tomcat-test1
spec:
  selector:
    k8s.eip.work/layer: svc
    k8s.eip.work/name: tomcat-test1
  type: NodePort
  ports:
    - name: tomcat-web
      port: 80
      targetPort: 8080
    - name: metrics
      port: 9090
      targetPort: 9090
EOF
[root@ops1 test]# kubectl apply -f tomcat-test1.yaml
[root@ops1 test]# cp tomcat-test1.yaml tomcat-test2.yaml && sed -i 's&tomcat-test1&tomcat-test2&' tomcat-test2.yaml && \
     sed -i 's&v1.3&v0.8&' tomcat-test2.yaml && kubectl apply -f tomcat-test2.yaml
 
Now tomcat-test1 is healthy while tomcat-test2 (the v0.8 image) is broken, which gives us something to compare:
[root@ops1 test]# curl http://10.100.33.236:9090/metrics
# HELP tomcat_bytesreceived_total Tomcat global bytesReceived
# TYPE tomcat_bytesreceived_total counter
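The 10.100.33.236 address in the curl above is just the cluster-internal IP of tomcat-test1 in this cluster; in yours it will differ. You can look it up, and check that both tomcats actually have endpoints, with:
kubectl get svc,endpoints -n tomcat -o wide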
 
         
 
Monitor every Service in the tomcat namespace that carries the label k8s.eip.work/layer: svc:
[root@ops1 test]# cat <<EOF > prometheus-serviceMonitorTomcatTest.yaml 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor # handled by the ServiceMonitor CRD
metadata:
  labels:
    app: prometheus-operator-tomcat-test
    chart: prometheus-operator-8.12.3
    release: prometheus-operator # Prometheus selects ServiceMonitors by this label
  name: prometheus-operator-tomcat-test
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s # scrape every 30s
    path: /metrics # path on the Service to scrape
    port: metrics # name of the Service port to scrape
  jobLabel: k8s.eip.work/layer
  namespaceSelector: # match Services in a specific namespace; use any: true to match across all namespaces
    matchNames:
    - tomcat 
  selector: # labels the Service must have; with matchLabels every listed label must match, with matchExpressions every listed expression must be satisfied
    matchLabels:
      k8s.eip.work/layer: svc # match Services carrying this label
EOF
[root@ops1 test]# kubectl apply -f prometheus-serviceMonitorTomcatTest.yaml 
servicemonitor.monitoring.coreos.com/prometheus-operator-tomcat-test created
 
 
If we now open the Prometheus targets page, it shows one tomcat target up and one down.
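Besides the web UI, the same check can be made against the Prometheus HTTP API on the NodePort we exposed earlier; tomcat-test1 should come back with value 1 and tomcat-test2 with 0:
curl -sG 'http://192.168.70.122:30090/api/v1/query' --data-urlencode 'query=up{namespace="tomcat"}'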
 
4. Configure alert trigger rules

Next we configure an alerting rule: alert when roughly half (or more) of a service's instances are down. Let's look at the PromQL expression it uses first; a breakdown follows below.
The rule adds the label alertManagerRule: node, and alertmanager will later route the alert based on this label.
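My reading of that expression, piece by piece (assuming both tomcat Services end up under the same job label, since the ServiceMonitor above uses k8s.eip.work/layer as jobLabel and both Services carry the value svc):

count(up{namespace="tomcat"} == 0) by (job)        # targets in namespace tomcat that are currently down, per job
  >
count(up{namespace="tomcat"}) by (job) / 2 - 1     # half of all targets in that job, minus one

With our two test targets the right-hand side is 2/2 - 1 = 0, so the alert starts pending as soon as one target reports up == 0, and fires after the 2 minute 'for' duration.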
[root@ops1 test]# cat <<EOF > prometheus-operator-tomcat-rules.yaml 
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    chart: prometheus-operator-8.12.3
    heritage: Tiller
    app: prometheus-operator
    release: prometheus-operator
  name: prometheus-operator-tomcat-test.rules
  namespace: monitoring
spec:
  groups:
  - name: tomcat-test.rules
    rules:
    - alert: tomcat-down
      expr: count( up{namespace="tomcat"} == 0 )by (job) > ( count(up{namespace="tomcat"})by (job) / 2 - 1)
      for: 2m
      labels:
        alertManagerRule: node # note this line: alertmanager routes the alert by this label
      annotations:
        description: "{{$labels.instance}}: Tomcat Service Is Down"
EOF
[root@ops1 test]# kubectl apply -f prometheus-operator-tomcat-rules.yaml
prometheusrule.monitoring.coreos.com/prometheus-operator-tomcat-test.rules created
 
 
We can exec into the Prometheus container and see that the rule has been loaded:
[root@ops1 test]# kubectl exec -it prometheus-prometheus-operator-prometheus-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-prometheus-operator-prometheus-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/rules/prometheus-prometheus-operator-prometheus-rulefiles-0/
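The Operator renders each PrometheusRule object into an ordinary rule file in that directory, so you can confirm our rule made it in with something like:
cat /etc/prometheus/rules/prometheus-prometheus-operator-prometheus-rulefiles-0/*.yaml | grep -B2 -A6 tomcat-down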
 
After that the alert shows up under Pending at http://192.168.70.122:30090/alerts .
 
The page now shows the alerting rule we just defined, together with its state. Over its lifecycle an alert is in one of three states:
  • inactive: the alert is currently neither pending nor firing
  • pending: the alert condition has been met, but not yet for the configured duration
  • firing: the alert condition has been met for longer than the configured duration

Once the duration is reached (2 minutes in our rule), the state changes to Firing, the rule is triggered, and the alert is sent to alertmanager.
 
At this point the triggered alert is visible in alertmanager.
Next, let's configure email and DingTalk notifications.
 
5. Alertmanager alerting
 
Let's first look at how alertmanager is configured; its configuration file turns out to come from a Secret.
[root@ops1 test]# kubectl get sts alertmanager-prometheus-operator-alertmanager -n monitoring -o yaml
 
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
 
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
 
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-prometheus-operator-alertmanager # the Secret that holds the config file
 
          
[root@ops1 test]# kubectl get secret alertmanager-prometheus-operator-alertmanager -n monitoring -o yaml
apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICByZWNlaXZlcjogIm51bGwiCg==
 
[root@ops1 test]# echo "Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICByZWNlaXZlcjogIm51bGwiCg==" | base64 -d
global:
  resolve_timeout: 5m
receivers:
- name: "null"
route:
  group_by:
  - job
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 12h
  routes:
  - match:
      alertname: Watchdog
    receiver: "null"
 
 
Add email alerting: alertmanager.yaml
As we just saw, the default alertmanager config contains very little, so we write a new one. It defines two kinds of receivers: email and DingTalk.
[root@ops1 test]# cat <<EOF > alertmanager.yaml 
global:
  # how long to wait with no new alerts before marking an alert as resolved
  resolve_timeout: 5m
  # email sending settings
  smtp_smarthost: 'smtp.exmail.qq.com:25'
  smtp_from: 'xueting@ikongjian.com'
  smtp_auth_username: 'xueting@ikongjian.com'
  smtp_auth_password: "${mima}"
  smtp_hello: 'xueting@ikongjian.com'
  smtp_require_tls: false
# the root route that every alert enters; it decides how alerts are dispatched
route:
  # labels used to regroup incoming alerts; e.g. alerts that all carry cluster=A and alertname=LatencyHigh are aggregated into a single group
  group_by: ['alertname', 'cluster']
  # when a new alert group is created, wait at least group_wait before the first notification, so alerts for the same group can be batched and sent together
  group_wait: 30s
 
  # after the first notification, wait group_interval before notifying about new alerts added to the group
  group_interval: 30s
 
  # if an alert has already been sent successfully, wait repeat_interval before sending it again
  repeat_interval: 2m
 
  # default receiver: used when an alert matches no child route
  receiver: default
 
  # all properties above are inherited by child routes and can be overridden per route
  routes:
  - receiver: email
    group_wait: 10s
    match:
      alertManagerRule: node # alerts carrying this label are routed to this receiver
  # - receiver: webhook
  #   match:
  #     alertManagerRule: node # alerts carrying this label would be routed to the webhook receiver
 
receivers:
- name: 'default'
  email_configs:
  - to: 'xueting@ikongjian.com'
    send_resolved: true
- name: 'email'
  email_configs:
  - to: 'wangzt@ikongjian.com'
    send_resolved: true
  webhook_configs:
  - url: 'http://dingtalk-hook:5000'
    send_resolved: true
- name: 'webhook'
  webhook_configs:
  - url: 'http://dingtalk-hook:5000'
    send_resolved: true
EOF
[root@ops1 test]# kubectl delete secret alertmanager-prometheus-operator-alertmanager -n monitoring
secret "alertmanager-prometheus-operator-alertmanager" deleted
[root@ops1 test]# kubectl create secret generic alertmanager-prometheus-operator-alertmanager --from-file=alertmanager.yaml -n monitoring
secret/alertmanager-prometheus-operator-alertmanager created 
 
Wait a minute or so for the configuration to take effect, then open the alertmanager status page at http://192.168.70.122:30091/#/status ;
the new configuration should be shown there.
Now we simply wait and check whether the alert email arrives.
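If you don't want to wait for the tomcat rule to fire again, you can push a synthetic alert straight into Alertmanager's v2 API to exercise the email route; the alertManagerRule: node label is what the route above matches on, everything else here is made up for the test:
curl -XPOST 'http://192.168.70.122:30091/api/v2/alerts' \
    -H 'Content-Type: application/json' \
    -d '[{"labels":{"alertname":"MailRouteTest","alertManagerRule":"node"},"annotations":{"description":"testing the email route"}}]'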
 
Add DingTalk alerting
Our config also contains a second receiver, name: 'webhook', which delivers alerts to a webhook URL; here that is a DingTalk robot. Test the robot token first:

curl 'https://oapi.dingtalk.com/robot/send?access_token='$token'' \
    -H 'Content-Type: application/json' \
    -d '{"msgtype": "text", "text": { "content": "我就是我, 是不一樣的煙火2"}}'
 
[root@ops1 test]# kubectl create secret generic dingtalk-secret --from-literal=token=$token -n monitoring
secret/dingtalk-secret created # the robot token is stored in a Secret
[root@ops1 test]# cat <<EOF > dingtalk-hook.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dingtalk-hook
  template:
    metadata:
      labels:
        app: dingtalk-hook
    spec:
      containers:
      - name: dingtalk-hook
        image: registry.cn-beijing.aliyuncs.com/wangzt/k8s/dingtalk-hook:0.1
        # modified from cnych/alertmanager-dingtalk-hook:v0.2, with the json removed
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
          name: http
        env:
        - name: ROBOT_TOKEN
          valueFrom:
            secretKeyRef:
              name: dingtalk-secret
              key: token
        resources:
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  selector:
    app: dingtalk-hook
  ports:
  - name: hook
    port: 5000
    targetPort: http
EOF
[root@ops1 test]# kubectl apply -f dingtalk-hook.yaml 
deployment.apps/dingtalk-hook created
service/dingtalk-hook created
 
 
Now the alerts show up in DingTalk as well.
 
Adding etcd certificates to prometheus
[root@ops1 prometheus]# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
secret/etcd-certs created
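Creating the secret by itself is not enough; it only shows up inside the Prometheus pod because the chart is told to mount it, and the etcd ServiceMonitor has to be told to use it if you want to scrape etcd over TLS on its client port instead of the plain-HTTP metrics port 2381 used elsewhere in this article. A sketch of the extra values (field names as in the stable/prometheus-operator chart; double-check them against your chart version) that would go into prometheus-operator-values.yaml, followed by a helm upgrade:

prometheus:
  prometheusSpec:
    secrets:
    - etcd-certs            # mounted read-only at /etc/prometheus/secrets/etcd-certs/
kubeEtcd:
  service:                  # when scraping over TLS, target etcd's client port instead of 2381
    port: 2379
    targetPort: 2379
  serviceMonitor:
    scheme: https
    caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key

helm upgrade prometheus-operator stable/prometheus-operator --version 8.12.0 -f prometheus-operator-values.yaml --namespace monitoring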
 
 
[root@ops1 test]# kubectl exec -it prometheus-prometheus-operator-prometheus-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-prometheus-operator-prometheus-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.crt healthcheck-client.crt healthcheck-client.key
 
 
 
6. Collecting Java metrics with prometheus

The tomcat_* metrics we scraped earlier are the kind produced by the jmx_exporter java agent; to instrument your own Java services, first prepare its config.yaml, which maps Tomcat MBeans to metrics:
 
[root@dev3_worker bin]# cat <<EOF > config.yaml 
---
lowercaseOutputLabelNames: true
lowercaseOutputName: true
rules:
- pattern: 'Catalina<type=GlobalRequestProcessor, name=\"(\w+-\w+)-(\d+)\"><>(\w+):'
  name: tomcat_$3_total
  labels:
    port: "$2"
    protocol: "$1"
  help: Tomcat global $3
  type: COUNTER
- pattern: 'Catalina<j2eeType=Servlet, WebModule=//([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), name=([-a-zA-Z0-9+/$%~_-|!.]*), J2EEApplication=none, J2EEServer=none><>(requestCount|maxTime|processingTime|errorCount):'
  name: tomcat_servlet_$3_total
  labels:
    module: "$1"
    servlet: "$2"
  help: Tomcat servlet $3 total
  type: COUNTER
- pattern: 'Catalina<type=ThreadPool, name="(\w+-\w+)-(\d+)"><>(currentThreadCount|currentThreadsBusy|keepAliveCount|pollerThreadCount|connectionCount):'
  name: tomcat_threadpool_$3
  labels:
    port: "$2"
    protocol: "$1"
  help: Tomcat threadpool $3
  type: GAUGE
- pattern: 'Catalina<type=Manager, host=([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), context=([-a-zA-Z0-9+/$%~_-|!.]*)><>(processingTime|sessionCounter|rejectedSessions|expiredSessions):'
  name: tomcat_session_$3_total
  labels:
    context: "$2"
    host: "$1"
  help: Tomcat session $3 total
  type: COUNTER
EOF
 
         
 
Collecting tomcat data
Jar applications
Start the application the way the jmx_exporter GitHub page describes:
java -javaagent:./jmx_prometheus_javaagent-0.12.0.jar=8080:config.yaml -jar yourJar.jar

Tomcat (war) applications
Go into $TOMCAT_HOME/bin, copy the jmx_prometheus_javaagent jar and config.yaml there, then edit the catalina.sh script: find JAVA_OPTS and add the agent option below.
If you run several tomcats, put jmx_prometheus_javaagent and config.yaml in one fixed directory and use absolute paths in each $TOMCAT_HOME/bin/catalina.sh.
# edit bin/catalina.sh and add: JAVA_OPTS="-javaagent:bin/jmx_prometheus_javaagent-0.12.0.jar=39081:bin/config.yaml"
 
For a war application you may also need to point the JVM at Tomcat's logging configuration, for example:
-Djava.util.logging.config.file=/path/to/logging.properties
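Instead of editing catalina.sh itself, an alternative that survives Tomcat upgrades is to put the option into a setenv.sh next to it, which catalina.sh sources automatically if present; the /opt/jmx path here is only an example of the "fixed directory" suggestion above:

# $TOMCAT_HOME/bin/setenv.sh
CATALINA_OPTS="$CATALINA_OPTS -javaagent:/opt/jmx/jmx_prometheus_javaagent-0.12.0.jar=39081:/opt/jmx/config.yaml"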
 
7. Fixing targets that prometheus cannot scrape by default
1. prometheus-operator-kube-etcd and prometheus-operator-kube-proxy are down

1. prometheus-operator-kube-etcd: looking at the etcd static pod manifest, its metrics endpoint listens only on 127.0.0.1:2381 by default
curl http://127.0.0.1:2381/metrics | head
. Edit /etc/kubernetes/manifests/etcd.yaml and change
- --listen-metrics-urls=http://127.0.0.1:2381
so that it listens on the node address as well, e.g.
- --listen-metrics-urls=http://0.0.0.0:2381
. kubectl edit svc prometheus-operator-kube-etcd -n kube-system and make sure the Service's port/targetPort is 2381 (we already set this in the values file)
 
2. prometheus-operator-kube-proxy is down
Checking it shows that kube-proxy also binds its metrics endpoint to 127.0.0.1 only.
[root@ops1 manifests]# kubectl get svc prometheus-operator-kube-proxy -o yaml -n kube-system

[root@ops1 manifests]# kubectl get ds kube-proxy -o yaml -n kube-system
kubectl edit cm kube-proxy -n kube-system
Change it to listen on all addresses, as shown in the sketch below.
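Concretely, the field to change is metricsBindAddress in the config.conf key of that ConfigMap, after which the kube-proxy pods have to be restarted to pick it up (a sketch; verify the key name in your ConfigMap):

# inside kubectl edit cm kube-proxy -n kube-system, under config.conf:
#   metricsBindAddress: ""            # empty means 127.0.0.1:10249
# change to:
#   metricsBindAddress: 0.0.0.0:10249
kubectl rollout restart ds kube-proxy -n kube-system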
 
8. Deleting the prometheus services
If the CRDs are no longer needed, delete them too:
helm del --purge prometheus-operator
# remove the CRDs
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
kubectl get crd | grep monitoring

