Prometheus Monitoring on K8s



Container Monitoring and Alerting

Container monitoring differs considerably from monitoring virtual machines or physical hosts. In a k8s environment, containers can be scaled out and back in at any time, so the monitoring service must automatically start monitoring newly created containers and promptly drop deleted ones. The traditional zabbix approach requires installing and starting an agent inside every container, and it has no good mechanism for automatic container discovery, registration, and template association.
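As a rough sketch of what this auto-discovery looks like in Prometheus, a kubernetes_sd_configs scrape job can discover every pod through the k8s API and keep only annotated ones (the job name and relabeling rule below are illustrative assumptions, not this environment's actual configuration):

scrape_configs:
- job_name: 'kubernetes-pods'        # hypothetical job name
  kubernetes_sd_configs:
  - role: pod                        # discover all pods via the k8s API
  relabel_configs:
  # keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true

Newly created pods that match the rule are scraped automatically, and deleted pods drop out of the target list without any manual deregistration.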

role               host                      port
Prometheus         master2 (10.203.104.21)   9090
node exporter      master/node               9100
Grafana            master3 (10.203.104.22)   3000
cadvisor           node                      8080
alertmanager       master3                   9093
haproxy_exporter   HA1 (10.203.104.30)       9101

Prometheus

Early k8s releases monitored pods and nodes through the heapster component. Starting with k8s 1.8, monitoring moved to the metrics API, and heapster was formally replaced in 1.11. Since then, metrics-server provides the core monitoring metrics, such as node CPU and memory utilization, while the rest of the monitoring is handled by a separate component, Prometheus.
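With metrics-server in place, those core metrics can be read straight from the metrics API, e.g. (a minimal check, assuming metrics-server is already deployed in the cluster):

# node-level CPU and memory utilization served by metrics-server
kubectl top node
# pod-level core metrics across all namespaces
kubectl top pod --all-namespaces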

Introduction to prometheus

https://prometheus.io/docs/ # official documentation

https://github.com/prometheus # GitHub

Prometheus is an open-source combination of monitoring, alerting, and a time-series database, written in Go and originally developed at SoundCloud. It is the second project, after kubernetes, to graduate from the CNCF (Cloud Native Computing Foundation), and it is widely used in the container and microservices world. Its main characteristics are:

Stores data in a multi-dimensional key-value format (see the PromQL example after this list)
Uses a time-series database (currently TSDB) rather than a traditional database such as MySQL
Supports third-party dashboards for richer graphing, e.g. grafana (version 2.5.0 and above)
Componentized functionality
No hard dependency on external storage; data can be kept locally or remotely
Automatic service discovery
A powerful query language (PromQL, Prometheus Query Language)
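For example, under the key-value data model every time series is identified by a metric name plus a set of labels, and PromQL selects and aggregates on those labels (label values below are illustrative):

node_cpu_seconds_total{instance="10.203.104.26:9100", mode="idle"}   # select series by labels
rate(node_cpu_seconds_total{mode="system"}[5m])                      # per-second rate over a 5m window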

prometheus system architecture

prometheus server: the main service; accepts external http requests, and collects, stores, and queries data
prometheus targets: statically configured targets whose data is collected
service discovery: dynamic discovery of targets
prometheus alerting: alert notification
pushgateway: data collection proxy (similar to zabbix proxy)
data visualization and export: data visualization and export (client access)

prometheus installation methods

https://prometheus.io/download/ # official binary download and installation; prometheus server listens on port 9090
https://prometheus.io/docs/prometheus/latest/installation/ # run directly from the docker image
https://github.com/coreos/kube-prometheus # operator deployment
Installing prometheus as a container

In this environment prometheus is installed on master2 (10.203.104.21)

Run the prometheus container

root@master2:~# docker run \
    -p 9090:9090 \
    prom/prometheus
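The container above runs with the image's built-in default configuration; to supply a custom one, a prometheus.yml on the host can be bind-mounted over the image's config path (a sketch; the host path is an assumption):

root@master2:~# docker run -d \
    -p 9090:9090 \
    -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
    --name prometheus \
    prom/prometheus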

Test prometheus by browsing to port 9090 on the master2 node

Operator deployment

https://github.com/coreos/kube-prometheus

Clone the project (the release-0.4 code is used in this walkthrough, hence the kube-prometheus-release-0.4 directory below; a plain git clone of master produces a directory named kube-prometheus)
root@master1:/usr/local/src# git clone https://github.com/coreos/kube-prometheus.git
root@master1:/usr/local/src# cd kube-prometheus-release-0.4/
root@master1:/usr/local/src/kube-prometheus-release-0.4# ls
build.sh            DCO   example.jsonnet  experimental  go.sum  jsonnet           jsonnetfile.lock.json  LICENSE   manifests  OWNERS     scripts                            tests
code-of-conduct.md  docs  examples         go.mod        hack    jsonnetfile.json  kustomization.yaml     Makefile  NOTICE     README.md  sync-to-internal-registry.jsonnet  test.sh

root@master1:/usr/local/src/kube-prometheus-release-0.4# cd manifests/
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# ls

Create the namespace, CRDs, and RBAC objects
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f setup/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
Create prometheus
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f .
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
Set up port forwarding
$ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/grafana 3000:3000
$ kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-k8s 9090:9090

Test by browsing to port 3000 on the master1 node (http://10.203.104.20:3000)

Exposing the services via NodePort

grafana

root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# cat grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: 3000
    nodePort: 33000
  selector:
    app: grafana
    
root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f grafana-service.yaml

Test by browsing to NodePort 33000 on the master1 node (http://10.203.104.20:33000)

prometheus

root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# cat prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP

root@master1:/usr/local/src/kube-prometheus-release-0.4/manifests# kubectl apply -f prometheus-service.yaml
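After applying both manifests, the assigned NodePorts can be confirmed before browsing to them:

root@master1:~# kubectl get svc -n monitoring grafana prometheus-k8s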
Binary installation

In this environment prometheus is installed on master2

Unpack the binary tarball
root@master2:/usr/local/src# ls
prometheus-2.17.1.linux-amd64.tar.gz

root@master2:/usr/local/src# tar -zxvf prometheus-2.17.1.linux-amd64.tar.gz
prometheus-2.17.1.linux-amd64/
prometheus-2.17.1.linux-amd64/NOTICE
prometheus-2.17.1.linux-amd64/LICENSE
prometheus-2.17.1.linux-amd64/prometheus.yml
prometheus-2.17.1.linux-amd64/prometheus
prometheus-2.17.1.linux-amd64/promtool
prometheus-2.17.1.linux-amd64/console_libraries/
prometheus-2.17.1.linux-amd64/console_libraries/menu.lib
prometheus-2.17.1.linux-amd64/console_libraries/prom.lib
prometheus-2.17.1.linux-amd64/consoles/
prometheus-2.17.1.linux-amd64/consoles/prometheus-overview.html
prometheus-2.17.1.linux-amd64/consoles/index.html.example
prometheus-2.17.1.linux-amd64/consoles/node-cpu.html
prometheus-2.17.1.linux-amd64/consoles/node-overview.html
prometheus-2.17.1.linux-amd64/consoles/node.html
prometheus-2.17.1.linux-amd64/consoles/node-disk.html
prometheus-2.17.1.linux-amd64/consoles/prometheus.html
prometheus-2.17.1.linux-amd64/tsdb
Create a symlink for the prometheus directory
root@master2:/usr/local/src# ln -sv /usr/local/src/prometheus-2.17.1.linux-amd64 /usr/local/prometheus
'/usr/local/prometheus' -> '/usr/local/src/prometheus-2.17.1.linux-amd64'
root@master2:/usr/local/src# cd /usr/local/prometheus
root@master2:/usr/local/prometheus# ls
console_libraries  consoles  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  tsdb
Create the prometheus systemd unit
root@master2:/usr/local/prometheus# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus/
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml

[Install]
WantedBy=multi-user.target
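Before starting the service, the configuration can be validated with the bundled promtool (the same tool used later to check alert rules); on the stock configuration it should report success:

root@master2:/usr/local/prometheus# ./promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 0 rule files found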
Start the prometheus service
root@master2:/usr/local/prometheus# systemctl start prometheus
root@master2:/usr/local/prometheus# systemctl status prometheus
root@master2:/usr/local/prometheus# systemctl enable prometheus
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.
Access the prometheus web UI

Browse to port 9090 on the prometheus node

node exporter

Collects monitoring metrics from each k8s node (master and node); listens on port 9100

Binary installation of node exporter (master/node)

Unpack the binary tarball

root@node1:/usr/local/src# ls
node_exporter-0.18.1.linux-amd64.tar.gz

root@node1:/usr/local/src# tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz 
node_exporter-0.18.1.linux-amd64/
node_exporter-0.18.1.linux-amd64/node_exporter
node_exporter-0.18.1.linux-amd64/NOTICE
node_exporter-0.18.1.linux-amd64/LICENSE

Create a symlink for the node_exporter directory

root@node1:/usr/local/src# ln -sv /usr/local/src/node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
'/usr/local/node_exporter' -> '/usr/local/src/node_exporter-0.18.1.linux-amd64'

root@node1:/usr/local/src# cd /usr/local/node_exporter
root@node1:/usr/local/node_exporter# ls
LICENSE  node_exporter  NOTICE
Create the node exporter systemd unit
root@node1:/usr/local/node_exporter# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target
Start the node exporter service
root@node1:/usr/local/node_exporter# systemctl start node-exporter
root@node1:/usr/local/node_exporter# systemctl status node-exporter
root@node1:/usr/local/node_exporter# systemctl enable node-exporter
Created symlink /etc/systemd/system/multi-user.target.wants/node-exporter.service → /etc/systemd/system/node-exporter.service.
Access the node exporter web UI

Test access to port 9100 on each of the k8s master and node machines
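Besides the browser, the raw metrics endpoint can be checked from the command line:

root@node1:~# curl -s http://localhost:9100/metrics | grep '^node_load'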

prometheus scraping node metrics

Configure prometheus to collect monitoring metrics via node exporter

prometheus configuration file

The prometheus.yml file on the prometheus server

root@master2:/usr/local/prometheus# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # static_configs:
    # - targets: ['localhost:9090']

  # IPs and ports scraped via node exporter
  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']

  - job_name: 'prometheus-master'
    static_configs:
    - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']
Restart the prometheus service
root@master2:/usr/local/prometheus# systemctl restart prometheus
Verify node target status in prometheus

Verify node monitoring data in prometheus
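In the prometheus web UI, Status -> Targets should list all six master/node targets as UP, and node data can be queried with expressions such as (illustrative examples):

up{job="prometheus-node"}                                      # 1 = target reachable
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes    # fraction of memory available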

Grafana

https://grafana.com/docs/ # official installation docs

Pulls data from prometheus for more polished visualization

Install grafana

Install grafana v6.7.2 on master3 (10.203.104.22)

root@master3:/usr/local/src# apt-get install -y adduser libfontconfig1
root@master3:/usr/local/src# wget https://dl.grafana.com/oss/release/grafana_6.7.2_amd64.deb
root@master3:/usr/local/src# dpkg -i grafana_6.7.2_amd64.deb

Configuration file

root@master3:~# vim /etc/grafana/grafana.ini
[server]
# Protocol (http, https, socket)
protocol = http

# The ip address to bind to, empty will bind to all interfaces
http_addr = 0.0.0.0

# The http port to use
http_port = 3000

Start grafana

root@master3:~# systemctl start grafana-server.service
root@master3:~# systemctl enable grafana-server.service
Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.

grafana web UI

Login page

Add a prometheus data source

Import dashboard templates

Template download site

https://grafana.com/grafana/dashboards

Click the desired template

Download the template

Import via template ID

Confirm the template details

Verify the graphs

The pie chart plugin is not installed by default and must be installed beforehand:
https://grafana.com/grafana/plugins/grafana-piechart-panel

Online installation:
# grafana-cli plugins install grafana-piechart-panel

Offline installation:
root@master3:/var/lib/grafana/plugins# pwd
/var/lib/grafana/plugins

root@master3:/var/lib/grafana/plugins# ls
grafana-piechart-panel-v1.5.0-0-g3234d63.zip

root@master3:/var/lib/grafana/plugins# unzip grafana-piechart-panel-v1.5.0-0-g3234d63.zip
root@master3:/var/lib/grafana/plugins# mv grafana-piechart-panel-3234d63/ grafana-piechart-panel
root@master3:/var/lib/grafana/plugins# systemctl restart grafana-server

Monitoring pod resources

cadvisor must be installed on every node

cadvisor, open-sourced by Google, not only collects information about all containers running on a machine but also provides a basic query UI and an http interface that makes it easy for other components such as Prometheus to scrape its data. cAdvisor monitors a node's resources and containers in real time, collecting performance data that covers CPU usage, memory usage, network throughput, and filesystem usage.

Before k8s 1.12, cadvisor was integrated into the kubelet service on each node; starting with version 1.12 the two were split into separate components, so cadvisor must be deployed on the nodes separately.

https://github.com/google/cadvisor

Prepare the cadvisor image

# docker load -i cadvisor_v0.36.0.tar.gz
# docker tag gcr.io/google-containers/cadvisor:v0.36.0 harbor.linux.com/baseimages/cadvisor:v0.36.0
# docker push harbor.linux.com/baseimages/cadvisor:v0.36.0

Start the cadvisor container

# docker run \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --volume=/dev/disk/:/dev/disk:ro \
    --publish=8080:8080 \
    --detach=true \
    --name=cadvisor \
    harbor.linux.com/baseimages/cadvisor:v0.36.0

Verify the cadvisor web UI:

Browse to the cadvisor listening port on a node: http://10.203.104.26:8080/
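The same data behind the web UI is exposed for Prometheus on the /metrics path:

# curl -s http://10.203.104.26:8080/metrics | grep container_cpu_usage_seconds_total | head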

prometheus scraping cadvisor data

root@master2:~# cat /usr/local/prometheus/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # static_configs:
    # - targets: ['localhost:9090']

  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']

  - job_name: 'prometheus-master'
    static_configs:
    - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']

  - job_name: 'prometheus-pod-cadvisor'
    static_configs:
    - targets: ['10.203.104.26:8080','10.203.104.27:8080','10.203.104.28:8080']

Restart prometheus

root@master2:~# systemctl restart prometheus
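Once the cadvisor targets show as UP, container-level series can be queried in prometheus; the expression below is the same one the CPU alert rule later in this document is built on:

(sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100)   # per-container CPU usage in %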

Add a pod monitoring template in grafana


prometheus alerting setup

How prometheus fires an alert:

prometheus ---> threshold crossed ---> duration exceeded ---> alertmanager ---> grouping|inhibition|silencing ---> notification type ---> email|DingTalk|WeChat, etc.

Grouping: merges alerts of a similar nature into a single notification.
Silences: a simple mechanism for muting alerts during a specific time window; e.g. a silence can be set to cover a planned server maintenance period.
Inhibition: once an alert fires, stop repeatedly sending the other alerts it triggers, i.e. merge the multiple alerts caused by a single failure to eliminate redundant notifications.
  • The alertmanager host IP is 10.203.104.22, hostname master3

Download and unpack the alerting component alertmanager

root@master3:/usr/local/src# ls
alertmanager-0.20.0.linux-amd64.tar.gz  grafana_6.7.2_amd64.deb  node_exporter-0.18.1.linux-amd64.tar.gz

root@master3:/usr/local/src# tar -zxvf alertmanager-0.20.0.linux-amd64.tar.gz 
alertmanager-0.20.0.linux-amd64/
alertmanager-0.20.0.linux-amd64/LICENSE
alertmanager-0.20.0.linux-amd64/alertmanager
alertmanager-0.20.0.linux-amd64/amtool
alertmanager-0.20.0.linux-amd64/NOTICE
alertmanager-0.20.0.linux-amd64/alertmanager.yml

root@master3:/usr/local/src# ln -sv /usr/local/src/alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager
'/usr/local/alertmanager' -> '/usr/local/src/alertmanager-0.20.0.linux-amd64'

root@master3:/usr/local/src# cd /usr/local/alertmanager
root@master3:/usr/local/alertmanager# ls
alertmanager  alertmanager.yml  amtool  LICENSE  NOTICE

Configure alertmanager

https://prometheus.io/docs/alerting/configuration/ # official configuration docs

root@master3:/usr/local/alertmanager# cat alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '2973707860@qq.com'
  smtp_auth_username: '2973707860@qq.com'
  smtp_auth_password: 'udwthyyxtstcdhcj'
  smtp_hello: '@qq.com'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  #webhook_configs:
  #- url: 'http://127.0.0.1:5001/'
  email_configs:
    - to: '2973707860@qq.com'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
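The syntax of alertmanager.yml can be checked with the bundled amtool before starting the service; a valid file reports SUCCESS along with a summary of what was found:

root@master3:/usr/local/alertmanager# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml'  SUCCESS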

Start the alertmanager service

Start from the binary directly

root@master3:/usr/local/alertmanager# ./alertmanager --config.file=./alertmanager.yml

systemd unit file

root@master3:/usr/local/alertmanager# cat /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alertmanager
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target

Start the service

root@master3:/usr/local/alertmanager# systemctl start alertmanager.service
root@master3:/usr/local/alertmanager# systemctl enable alertmanager.service
Created symlink /etc/systemd/system/multi-user.target.wants/alertmanager.service → /etc/systemd/system/alertmanager.service.

Test web access on port 9093

Configure prometheus alerting rules

root@master2:/usr/local/prometheus# cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 10.203.104.22:9093   # alertmanager address

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/danran_rule.yml"   #指定規則文件
  # - "first_rules.yml"
  # - "second_rules.yml"

Create the alert rules file

root@master2:/usr/local/prometheus# cat danran_rule.yml 
groups:
  - name: danran_pod.rules
    rules:
    - alert: Pod_all_cpu_usage
      expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 75
      for: 5m
      labels:
        severity: critical
        service: pods
      annotations:
        description: Container {{ $labels.name }} CPU usage is above 75% (current value is {{ $value }})
        summary: Dev CPU load alert

    - alert: Pod_all_memory_usage
      expr: avg by(name)(container_memory_usage_bytes{name!=""}) > 2*1024^3
      for: 10m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} memory usage is above 2G (current value is {{ $value }})
        summary: Dev memory load alert

    - alert: Pod_all_network_receive_usage
      expr: sum by (name)(irate(container_network_receive_bytes_total{container_name="POD"}[1m])) > 1024*1024*50
      for: 10m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} network receive rate is above 50M (current value is {{ $value }})

Validate the alert rules

root@master2:/usr/local/prometheus# ./promtool check rules danran_rule.yml
Checking danran_rule.yml
  SUCCESS: 3 rules found

Restart prometheus

root@master2:/usr/local/prometheus# systemctl restart prometheus

Verify alert rule matching

10.203.104.22 is the alertmanager host

root@master3:/usr/local/alertmanager# ./amtool alert --alertmanager.url=http://10.203.104.22:9093
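amtool can also create the silences described earlier, e.g. to mute an alert during a maintenance window (the alert name and duration here are illustrative):

root@master3:/usr/local/alertmanager# ./amtool silence add \
    --alertmanager.url=http://10.203.104.22:9093 \
    --comment="planned maintenance" --duration=2h \
    alertname=Pod_all_cpu_usage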

prometheus home page status

Verify the alert rules in the prometheus web UI

prometheus monitoring haproxy

haproxy_exporter is installed on the HA1 node (10.203.104.30)

Deploy haproxy_exporter

root@ha1:/usr/local/src# ls
haproxy_exporter-0.10.0.linux-amd64.tar.gz
root@ha1:/usr/local/src# tar -zxvf haproxy_exporter-0.10.0.linux-amd64.tar.gz 
haproxy_exporter-0.10.0.linux-amd64/
haproxy_exporter-0.10.0.linux-amd64/LICENSE
haproxy_exporter-0.10.0.linux-amd64/NOTICE
haproxy_exporter-0.10.0.linux-amd64/haproxy_exporter

root@ha1:/usr/local/src# ln -sv /usr/local/src/haproxy_exporter-0.10.0.linux-amd64 /usr/local/haproxy_exporter
'/usr/local/haproxy_exporter' -> '/usr/local/src/haproxy_exporter-0.10.0.linux-amd64'
root@ha1:/usr/local/src# cd /usr/local/haproxy_exporter

Start haproxy_exporter

root@ha1:/usr/local/haproxy_exporter# ./haproxy_exporter --haproxy.scrape-uri=unix:/run/haproxy/admin.sock
or start it against haproxy's stats page instead:
root@ha1:/usr/local/haproxy_exporter# ./haproxy_exporter --haproxy.scrape-uri="http://haadmin:danran@10.203.104.30:9999/haproxy-status;csv"

Check the haproxy stats page configuration
root@ha1:/usr/local/src# cat /etc/haproxy/haproxy.cfg
listen stats
    mode http
    bind 0.0.0.0:9999
    stats enable
    log global
    stats uri /haproxy-status
    stats auth haadmin:danran
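The CSV endpoint that haproxy_exporter scrapes can be tested directly with the stats credentials above:

root@ha1:~# curl -s -u haadmin:danran "http://10.203.104.30:9999/haproxy-status;csv" | head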

Create the systemd unit

root@ha1:~# cat /etc/systemd/system/haproxy-exporter.service
[Unit]
Description=Prometheus Haproxy Exporter
After=network.target

[Service]
ExecStart=/usr/local/haproxy_exporter/haproxy_exporter --haproxy.scrape-uri=unix:/run/haproxy/admin.sock

[Install]
WantedBy=multi-user.target

root@ha1:~# systemctl restart haproxy-exporter.service
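The exporter listens on port 9101; its haproxy_up metric reports whether the scrape of haproxy itself is succeeding (1 = up):

root@ha1:~# curl -s http://10.203.104.30:9101/metrics | grep '^haproxy_up'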

Verify the data in the web UI

Add haproxy data collection on the prometheus server

root@master2:/usr/local/prometheus# cat prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # static_configs:
    # - targets: ['localhost:9090']

  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['10.203.104.26:9100','10.203.104.27:9100','10.203.104.28:9100']

  - job_name: 'prometheus-master'
    static_configs:
    - targets: ['10.203.104.20:9100','10.203.104.21:9100','10.203.104.22:9100']

  - job_name: 'prometheus-pod'
    static_configs:
    - targets: ['10.203.104.26:8080','10.203.104.27:8080','10.203.104.28:8080']

  - job_name: 'prometheus-haproxy'
    static_configs:
    - targets: ['10.203.104.30:9101']

Restart prometheus

root@master2:~# systemctl restart prometheus

Add a dashboard template in grafana

Get a template
https://grafana.com/grafana/dashboards?dataSource=prometheus&direction=asc&orderBy=name&search=haproxy

In grafana, import the downloaded template by ID or JSON file

Verify the haproxy monitoring data

