Prometheus + Grafana + Alertmanager: DingTalk Alerting

Reposted from the original article. 2018-08-28 15:26. Tags: grafana / prometheus / alertmanager

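I. Introduction to Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit, originally built at SoundCloud.

Features:

A multi-dimensional data model (time series identified by a metric name and key/value pairs)
A flexible query language (PromQL)
No dependency on distributed storage
Time series are collected with a pull model over HTTP
Pushing time series is supported through an intermediary gateway
Monitoring targets are found through service discovery or static configuration
Multiple modes of graphing and dashboard support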

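Components:

Prometheus: the main program, i.e. the Prometheus server. It scrapes and stores data and exposes an API for querying it; it is responsible for storage, scraping, aggregation and querying.
Alertmanager: a separate program responsible for handling alerts and sending notifications.
Pushgateway: a program that receives metrics pushed by clients; the Prometheus server then scrapes them from the gateway at its configured interval.
*_exporter: similar to a traditional agent on the monitored host, except that it never pushes data to the server. Instead it waits for the server to scrape it periodically, i.e. the server actively pulls (the so-called active monitoring model).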
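To make the pull model concrete, here is a minimal sketch in Python (for illustration only; real exporters normally use an official client library) of what an exporter does: it renders its current values in the Prometheus text exposition format only when /metrics is requested, and never pushes anything. The metric names `demo_up` and `demo_requests_total` and the port in the usage note are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render {(name, label_pairs): value} in the Prometheus text exposition format.

    Simplification: label values are not escaped and HELP/TYPE lines are omitted.
    """
    lines = []
    for (name, labels), value in sorted(samples.items()):
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metrics; a real exporter would read them from the system.
SAMPLES = {
    ("demo_up", ()): 1,
    ("demo_requests_total", (("method", "get"),)): 42,
}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The exporter only answers when Prometheus scrapes it (pull model).
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    # Blocks forever; Prometheus would scrape http://<host>:<port>/metrics.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve(8000)` (an arbitrary port) and listing `localhost:8000` under `targets` would let Prometheus scrape it exactly like node_exporter.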

Architecture: [figure: Prometheus architecture diagram]

II. Deploying Prometheus

Download from the official Prometheus site: https://prometheus.io/download/

1. Download & deploy

# Download
[root@prometheus src]# cd /usr/local/src/
[root@prometheus src]# wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz

# Deploy to /usr/local/
# Prometheus needs no compilation: the extracted directory already contains the config file and the binary

[root@prometheus src]# tar zxf prometheus-2.3.2.linux-amd64.tar.gz -C /usr/local/

[root@prometheus src]# cd /usr/local/
[root@prometheus local]# mv prometheus-2.3.2.linux-amd64/ prometheus/
# Verify
[root@prometheus local]# cd prometheus/
[root@prometheus prometheus]# ./prometheus --version

2. Configuration file

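# prometheus.yml is in the extracted directory
# For a simple test most settings keep the shipped defaults; the modified/added parts are in scrape_configs
[root@prometheus prometheus]# vim prometheus.yml
# Global configuration
global:
  scrape_interval:     15s # How often to scrape (pull) targets; the default is 1m
  evaluation_interval: 15s # How often to evaluate rules; the default is 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration: not used yet, left at the default
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Rule files are loaded once and evaluated periodically per evaluation_interval; not used yet, left at the default
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape (pull) configuration, i.e. the monitoring targets
# By default only Prometheus itself is monitored
scrape_configs:
  # job_name is attached as the label job=<job_name> to every time series scraped by this job.
  # A job is a logical group of targets, not a particular host; one host can expose several targets.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # Overrides the global scrape interval for this job: 5s instead of 15s.
    scrape_interval: 5s

    # Targets are listed statically; no service discovery is used here
    static_configs:
      - targets: ['localhost:9090']
        # (optional) an extra label identifying the monitored host
        labels:
          instance: prometheus

  - job_name: 'linux'
    scrape_interval: 10s
    static_configs:
      # 9100 is node_exporter's default port.
      # Each host gets its own target group: putting both hosts in one group with
      # instance: node1 would give their series identical labels.
      - targets: ['192.168.233.131:9100']
        labels:
          instance: node1
      - targets: ['172.20.1.212:9100']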
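The label handling in scrape_configs can be sketched with a simplified Python model (assumption: relabeling is ignored): every series gets job=<job_name>, and instance defaults to the target's host:port unless a `labels:` entry overrides it. The model also shows why two hosts that share one instance label value end up with identical label sets.

```python
def target_labels(job_name, address, extra_labels=None):
    """Simplified model of the labels Prometheus attaches to a scraped target."""
    labels = {"job": job_name, "instance": address}  # instance defaults to host:port
    labels.update(extra_labels or {})  # static_configs labels override the defaults
    return labels


# Two hosts in one group, both labelled instance: node1.
a = target_labels("linux", "172.20.1.212:9100", {"instance": "node1"})
b = target_labels("linux", "192.168.233.131:9100", {"instance": "node1"})
print(a == b)  # True: the series of the two hosts cannot be told apart

# One group per host: only node1 gets an explicit instance label.
c = target_labels("linux", "192.168.233.131:9100", {"instance": "node1"})
d = target_labels("linux", "172.20.1.212:9100")
print(c, d)
```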

3. Create a user

# Add a user; the service will run under this account
[root@prometheus prometheus]# groupadd prometheus
[root@prometheus prometheus]# useradd -g prometheus -s /sbin/nologin prometheus

# Set ownership
[root@prometheus prometheus]# cd ~
[root@prometheus ~]# chown -R prometheus:prometheus /usr/local/prometheus/

# Create the Prometheus data directory
[root@prometheus ~]# mkdir -p /var/lib/prometheus
[root@prometheus ~]# chown -R prometheus:prometheus /var/lib/prometheus/

4. Start on boot (systemd)

# Keep the unit file owned by root: if the prometheus user could edit it, it could change User= and have commands run as root
[root@prometheus ~]# vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
# With Type=notify the service keeps restarting, so Type=simple is used
Type=simple
User=prometheus
# --storage.tsdb.path is optional; by default data is stored in ./data under the working directory
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure

[Install]
WantedBy=multi-user.target

# Reload systemd, enable start on boot and start the service
[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl enable prometheus
[root@prometheus ~]# systemctl start prometheus

5. Start and verify

1) Check the service status

[root@prometheus ~]# systemctl status prometheus

[root@prometheus ~]# netstat -tunlp | grep 9090
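netstat may be missing on minimal installs (`ss -tunlp` is the usual replacement). As another option, here is a small Python sketch that checks whether something accepts TCP connections on a port; localhost:9090 is assumed to be the Prometheus address.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("prometheus listening:", port_is_listening("localhost", 9090))
```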

2) Web UI

Prometheus ships with a simple web UI at http://localhost:9090

The Status menu contains Configuration, Rules, Targets and other pages.

Status --> Configuration shows the configuration loaded from prometheus.yml.

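III. Deploying node_exporter

node_exporter collects system metrics from a host. Here we use the official exporter from the Prometheus project. Besides node_exporter, official exporters exist for Consul, memcached, HAProxy, mysqld and more; see the official website, or GitHub for exporters not listed there (e.g. an exporter for Windows).

The following steps are performed on the monitored node.

1. Download & deploy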

# Download
[root@node1 ~]# cd /usr/local/src/
[root@node1 src]# wget https://github.com/prometheus/node_exporter/releases/download/v0.16.0/node_exporter-0.16.0.linux-amd64.tar.gz

# Deploy
[root@node1 src]# tar -zxvf node_exporter-0.16.0.linux-amd64.tar.gz -C /usr/local/
[root@node1 src]# cd /usr/local/
[root@node1 local]# mv node_exporter-0.16.0.linux-amd64/ node_exporter/

2. Create a user

[root@node1 ~]# groupadd prometheus
[root@node1 ~]# useradd -g prometheus -s /sbin/nologin prometheus
[root@node1 ~]# chown -R prometheus:prometheus /usr/local/node_exporter/

3. Start on boot (systemd)

[root@node1 ~]# vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

[root@node1 ~]# systemctl enable node_exporter
[root@node1 ~]# systemctl start node_exporter

4. Verify

Open the Prometheus UI (Status --> Targets) and check that node1 is now being monitored.

5. Graphing

Open http://192.168.233.131:9100/metrics to see exactly which metrics the exporter exposes.
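The /metrics page is plain text with one sample per line. A minimal Python sketch of reading it, assuming simple lines (no trailing timestamps, no escaped characters inside label values); the sample text is illustrative, not captured from node1.

```python
def parse_metrics(text):
    """Parse simple Prometheus exposition lines into {series: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)  # the value is the last field
        samples[series] = float(value)
    return samples


SAMPLE = (
    "# HELP node_load1 1m load average.\n"
    "# TYPE node_load1 gauge\n"
    "node_load1 0.05\n"
    'node_filesystem_avail_bytes{device="/dev/sda1",fstype="xfs",mountpoint="/"} 1.2e+10\n'
)
print(parse_metrics(SAMPLE))
```

Against a live exporter the text would come from `urllib.request.urlopen("http://192.168.233.131:9100/metrics").read().decode()`.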

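In the Prometheus UI, enter any metric the exporter exposes into the expression box, click "Execute" and switch to the "Graph" tab to see a graph of the scraped data; the time range and unit can be adjusted there.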
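The same query can be run without the UI through the Prometheus HTTP API (`GET /api/v1/query`). A Python sketch using only the standard library, assuming Prometheus at localhost:9090; the sample response is illustrative, shaped like the API's instant-vector result.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base, expr):
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def vector_values(response):
    """Map each series' instance label to its value for an instant-vector result."""
    if response.get("status") != "success":
        raise ValueError(response.get("error", "query failed"))
    return {
        r["metric"].get("instance", ""): float(r["value"][1])
        for r in response["data"]["result"]
    }


def query(base, expr):
    # Network call: only works against a reachable Prometheus server.
    with urllib.request.urlopen(build_query_url(base, expr), timeout=5) as resp:
        return vector_values(json.load(resp))


# Illustrative response for the expression `up`.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "node1", "job": "linux"},
             "value": [1535441100.0, "1"]},
        ],
    },
}
```

For example, `query("http://localhost:9090", "up")` would return one entry per target, with 1.0 for targets that are up.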

IV. Deploying Grafana

Deploy Grafana on the Prometheus & Grafana server node.

1. Download & install

# Download
[root@prometheus ~]# cd /usr/local/src/
[root@prometheus src]# wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2-1.x86_64.rpm

# Install
[root@prometheus src]# yum localinstall grafana-5.2.2-1.x86_64.rpm

2. Configuration file

The configuration file is /etc/grafana/grafana.ini; the defaults are fine for now.

3. Start on boot

[root@prometheus src]# systemctl enable grafana-server
[root@prometheus src]# systemctl start grafana-server

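4. Add a data source

1) Log in

Open http://localhost:3000; the default username/password is admin/admin.

2) Add the data source

On the home page, click "Configuration --> Data Sources" to open the data source page, and configure it as follows: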

Name: prometheus

Type: prometheus

URL: http://localhost:9090/

Access: Server

Uncheck "Default", leave everything else at the defaults, and click "Add".

On the "Dashboards" tab, "Import" the bundled dashboards.
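The same data source can also be created through Grafana's HTTP API (`POST /api/datasources`). A Python sketch that only builds the request; the admin/admin credentials and localhost:3000 come from the steps above, and "Access: Server" corresponds to `"access": "proxy"` in the API.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, user, password):
    """Build (but do not send) the request that creates the Prometheus data source."""
    payload = {
        "name": "prometheus",
        "type": "prometheus",
        "url": "http://localhost:9090/",
        "access": "proxy",   # "Server" in the UI; "direct" would be "Browser"
        "isDefault": False,  # matches unchecking "Default"
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = datasource_request("http://localhost:3000", "admin", "admin")
# urllib.request.urlopen(req)  # send it to a running Grafana
```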

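5. Import a dashboard

Download a dashboard from the Grafana website, e.g. https://grafana.com/dashboards/1860

Upload: upload the JSON file you downloaded

Grafana.com Dashboard: enter the URL of a dashboard on grafana.com (e.g. https://grafana.com/dashboards/1860)

Either works: download the JSON and upload it, or paste the link directly without downloading.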

6. View the dashboards

Grafana home --> Dashboards --> Home: the Home drop-down lists the two dashboards added above, "Prometheus Stats" and "Node Exporter Full"; select either one.

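Notes

If grafana.com doesn't have the dashboard you want, try GitHub.

Many dashboards can't be used as-is: panels show "No data", or the graph doesn't match what it claims, e.g. a panel meant to show free disk space shows used space instead. You can fix this by editing the panel's metric query (the PromQL expression).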

Select a panel that doesn't work and click Edit.


Click Add Query to add another metric query.
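As an illustration of the kind of formula change meant here: node_exporter 0.16 (the version installed above) renamed many metrics, e.g. filesystem metrics gained a `_bytes` suffix (`node_filesystem_avail_bytes`, `node_filesystem_size_bytes`), so dashboards written for the older names show "No data". Below is a Python sketch of a free-space and a used-space formula side by side, with the PromQL strings they mirror; the `mountpoint="/"` filter is an assumption, and this "used" definition (based on available space) is one common choice rather than the only one.

```python
# PromQL for the two panels (mountpoint="/" is an assumed filter).
FREE_PCT = ('node_filesystem_avail_bytes{mountpoint="/"}'
            ' / node_filesystem_size_bytes{mountpoint="/"} * 100')
USED_PCT = ('(1 - node_filesystem_avail_bytes{mountpoint="/"}'
            ' / node_filesystem_size_bytes{mountpoint="/"}) * 100')


def free_pct(avail_bytes, size_bytes):
    """Same arithmetic as FREE_PCT for one filesystem."""
    return avail_bytes / size_bytes * 100


def used_pct(avail_bytes, size_bytes):
    """Same arithmetic as USED_PCT; the two always add up to 100."""
    return (1 - avail_bytes / size_bytes) * 100


print(free_pct(25, 100), used_pct(25, 100))
```

A panel that "shows used instead of free" is usually just using the other of these two expressions.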

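V. Deploying Alertmanager with DingTalk alerts

Grafana also has alerting, but after trying it I didn't find it convenient: Grafana alerts can't use template variables, and alert rules are cumbersome to set up. After comparing the two, I decided to send DingTalk alerts through Alertmanager. Alertmanager is not limited to DingTalk; it can also notify via WeChat, email and more.

1. Download & install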

[root@localhost src]# wget https://github.com/prometheus/alertmanager/releases/download/v0.15.2/alertmanager-0.15.2.linux-amd64.tar.gz
[root@localhost src]# tar zxf alertmanager-0.15.2.linux-amd64.tar.gz

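2. Configuration file

Alertmanager has no built-in DingTalk receiver; DingTalk is integrated through Alertmanager's generic webhook. DingTalk is strict about the message format, so a plugin (set up in step 6) converts Alertmanager's webhook payload into a DingTalk message.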

cat alertmanager.yml
global:
  resolve_timeout: 5m
route:
  receiver: webhook
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [alertname]
  routes:
  - receiver: webhook
    group_wait: 10s
    match:
      team: node
receivers:
- name: webhook
  webhook_configs:
  - url: http://localhost:8060/dingtalk/ops_dingding/send 
    send_resolved: true

3. Start Alertmanager

cd alertmanager-0.15.2.linux-amd64
nohup ./alertmanager --config.file=alertmanager.yml > alertmanager.log 2>&1 &
# Check the port
netstat -anpt | grep 9093

4. Alert rules

Alert when a monitored host is down (up == 0 for 2 minutes):

cd /usr/local/prometheus
cat rules.yml
groups:
    - name: test-rule
      rules:
      - alert: HostStatus
        expr: up == 0
        for: 2m
        labels:
          status: warning
        annotations:
          summary: "{{$labels.instance}}: server is down"
          description: "{{$labels.instance}}: server is down"
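The `for: 2m` clause means `up == 0` must hold on every evaluation for two minutes before the alert fires; until then it is "pending". A simplified Python model of that state machine, assuming the 15s evaluation_interval from prometheus.yml (real Prometheus tracks this per series).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    for_seconds: float
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, now, condition_true):
        """Return 'inactive', 'pending' or 'firing' after one rule evaluation."""
        if not condition_true:
            self.active_since = None  # condition cleared: alert resolves
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# node1 goes down at t=0 and the rule is evaluated every 15s.
state = AlertState(for_seconds=120)
history = [state.evaluate(t, True) for t in range(0, 135, 15)]
print(history)  # eight "pending" results, then "firing" at t=120
```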

5. Update the Prometheus configuration

Edit the alerting and rule_files sections.

rule_files can list several rule files.

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules.yml"
  # - "second_rules.yml"

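Validate the configuration (promtool also checks the referenced rule files) and restart Prometheus:

[root@prometheus prometheus]# ./promtool check config prometheus.yml
[root@prometheus prometheus]# systemctl restart prometheus

6. Connect DingTalk to the Prometheus Alertmanager webhook

Reference: http://theo.im/blog/2017/10/16/release-prometheus-alertmanager-webhook-for-dingtalk/

Plugin: https://github.com/timonwong/prometheus-webhook-dingtalk

Install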

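First install a Go environment.

Tip: set the host name to the host's IP address (or a domain name), so that the URLs included in alert messages can be opened.

mkdir -p /root/go/src/github.com/timonwong/
cd /root/go/src/github.com/timonwong/

git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git

cd prometheus-webhook-dingtalk
make    (errors printed here can be ignored)
If the prometheus-webhook-dingtalk binary was not produced, create the directory below, cd into it, git clone the repository again and rebuild:

mkdir -p /usr/lib/golang/src/github.com/timonwong/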

Start

If you don't know how to add a custom robot to a DingTalk group, look it up online first.

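--ding.profile maps a profile name to the DingTalk robot's webhook URL. The profile name (ops_dingding here) must match the path in alertmanager.yml: http://localhost:8060/dingtalk/ops_dingding/send.

nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=xxx" > dingding.log 2>&1 &
netstat -anpt | grep 8060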
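What the plugin does in between can be sketched like this: Alertmanager POSTs a JSON document with a `status` and a list of `alerts` (each with `labels` and `annotations`) to the webhook, and the plugin turns it into a DingTalk robot message. The Python below is a hypothetical, simplified conversion to DingTalk's `markdown` message type, not the plugin's actual template; the sample payload is illustrative.

```python
import json


def to_dingtalk_markdown(payload):
    """Convert an Alertmanager webhook payload into a DingTalk markdown message."""
    status = payload.get("status", "unknown").upper()
    alerts = payload.get("alerts", [])
    lines = [f"### [{status}] {len(alerts)} alert(s)"]
    for alert in alerts:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"- **{labels.get('alertname', '')}** ({alert.get('status', '')}): "
            f"{annotations.get('description', '')}"
        )
    return {
        "msgtype": "markdown",
        "markdown": {"title": f"[{status}] alert", "text": "\n".join(lines)},
    }


SAMPLE = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HostStatus", "instance": "node1", "status": "warning"},
            "annotations": {"description": "node1: server is down"},
        }
    ],
}
print(json.dumps(to_dingtalk_markdown(SAMPLE), ensure_ascii=False))
```

The resulting JSON is what would be POSTed to the robot URL (`https://oapi.dingtalk.com/robot/send?access_token=xxx`, token left as a placeholder).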

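7. Test

Stop the exporter on a monitored host, or shut the host down. Once up == 0 has held for 2 minutes (for: 2m) plus Alertmanager's group_wait, the alert should appear in the DingTalk group.

Start the exporter again: because send_resolved: true is set, a "resolved" notification follows.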
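To exercise the Alertmanager --> plugin --> DingTalk path without stopping a host, a test alert can be posted directly to Alertmanager's v1 API (`POST /api/v1/alerts`, available in 0.15.2). A Python sketch that builds the request; the alertname `TestAlert` and its labels are hypothetical, and nothing is sent unless the last line is uncommented.

```python
import json
import urllib.request


def test_alert_request(am_url="http://localhost:9093"):
    """Build a POST /api/v1/alerts request carrying one hypothetical test alert."""
    alerts = [
        {
            "labels": {"alertname": "TestAlert", "instance": "test", "status": "warning"},
            "annotations": {"description": "test alert sent by hand"},
        }
    ]
    return urllib.request.Request(
        f"{am_url.rstrip('/')}/api/v1/alerts",
        data=json.dumps(alerts).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = test_alert_request()
# urllib.request.urlopen(req, timeout=5)  # uncomment to send to a running Alertmanager
```

Since the alert carries no `endsAt`, Alertmanager should consider it resolved once resolve_timeout (5m in alertmanager.yml) passes without it being re-sent.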

