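Grafana's alerting support is quite limited, whereas the alerting system that Prometheus provides is far more capable.
Prometheus splits data collection and alerting into two modules. Alerting rules are configured on the Prometheus servers, which send alerts to Alertmanager. Alertmanager then manages those alerts (including silencing and inhibition), groups them, and sends notifications through email, PagerDuty, HipChat, Slack, and other channels.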
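To make inhibition concrete, here is a minimal sketch of an inhibit_rules section for the Alertmanager configuration. It is not part of the configuration shown later in this post; the severity label and its values are assumptions for illustration. The idea is that while a critical alert is firing, warning alerts with the same alertname and instance are not notified.

```yaml
# Sketch only: assumes alerts carry a `severity` label (not used elsewhere in this post)
inhibit_rules:
  - source_match:
      severity: 'critical'      # the alert that suppresses others
    target_match:
      severity: 'warning'       # the alerts that get suppressed
    # inhibition applies only when these labels have equal values on both alerts
    equal: ['alertname', 'instance']
```

Silencing, by contrast, is usually created at runtime through the Alertmanager web UI rather than in the configuration file.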
Getting Alertmanager up and running comes down to three steps:
1. Install and configure Alertmanager
2. Configure Prometheus to talk to Alertmanager
3. Create alerting rules in Prometheus
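Step 2 can be sketched as the following excerpt from prometheus.yml. The target address assumes Alertmanager runs on the same host on its default port 9093, and the rule file path is a hypothetical location; adjust both to your setup.

```yaml
# prometheus.yml (excerpt)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'          # assumed address; 9093 is Alertmanager's default port

# Rule files loaded by Prometheus (step 3); the path is hypothetical
rule_files:
  - '/data/prometheus/rules/*.yml'
```

Prometheus has to reload its configuration before the change takes effect.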
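During its lifecycle, an alert is in one of the following three states:
1. inactive: the alert is neither firing nor pending
2. pending: the alert condition is true, but has not yet held for the configured duration (the rule's `for` value)
3. firing: the alert condition has held for longer than the configured duration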
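The three states map onto the `for` field of an alerting rule. Below is a minimal sketch of a rule file; the group name, alert name, and severity label are made up for illustration, while the `team: operations` label is chosen so that the alert would match the ops route in the Alertmanager configuration in this post.

```yaml
groups:
  - name: example-rules                 # hypothetical group name
    rules:
      - alert: InstanceDown             # hypothetical alert name
        expr: up == 0                   # condition: the scrape target is unreachable
        for: 5m                         # pending until the condition has held for 5 minutes, then firing
        labels:
          team: operations              # routed to the receiver that matches team=operations
          severity: critical            # assumed label, not required by the routes in this post
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

While `up == 0` is false the alert is inactive; once it becomes true the alert is pending, and if it stays true for the full 5 minutes the alert becomes firing and is sent to Alertmanager.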
Alertmanager configuration file
global:
  resolve_timeout: 5m
  # SMTP settings
  smtp_from: "prom-alert@example.com"
  smtp_smarthost: 'email-smtp.us-west-2.amazonaws.com:465'
  smtp_auth_username: "user"
  smtp_auth_password: "pass"
  smtp_require_tls: true
templates:
  - '/data/alertmanager/templates/*.tmpl'
route:
  # default receiver for alerts that match no child route
  receiver: test1
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [alertname]
  routes:
    # ads webhook
    - receiver: test1
      group_wait: 10s
      match:
        team: ads
    # ops webhook
    - receiver: test2
      group_wait: 10s
      match:
        team: operations
receivers:
  - name: test1
    email_configs:
      - to: '9935226@qq.com'
        headers: { Subject: "[ads] Alert email" }  # subject line of the notification email
    webhook_configs:
      - url: http://localhost:8060/dingtalk/ads/send
  - name: test2
    email_configs:
      - to: '9935226@qq.com,deniss.wang@gmail.com'
        send_resolved: true
        headers: { Subject: "[ops] Alert email" }  # subject line of the notification email
    webhook_configs:
      - url: http://localhost:8060/dingtalk/ops/send
    # WeChat Work settings
    wechat_configs:
      - corp_id: 'wwxxxxxxxxxxxxxx'
        api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
        send_resolved: true
        to_party: '2'
        agent_id: '1000002'
        api_secret: '1FvHxuGbbG35FYsuW0YyI4czWY/.2'
Connecting DingTalk to the Prometheus Alertmanager webhook
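First, create a DingTalk robot in DingTalk. Its webhook URL, including the access_token, is what the service definition further down passes to --ding.profile.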
Install the prometheus-webhook-dingtalk plugin from the binary release
cd /usr/local/src/
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
tar -zxvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 /data/alertmanager/webhook-dingtalk
# Create the systemd service for webhook-dingtalk
cat > /etc/systemd/system/webhook-dingtalk.service << EOF
[Unit]
Description=webhook-dingding
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/data/alertmanager/webhook-dingtalk/prometheus-webhook-dingtalk \
--ding.profile="ads=https://oapi.dingtalk.com/robot/send?access_token=284de68124e97420a2ee8ae1b8f12fabe3213213213" \
--ding.profile="ops=https://oapi.dingtalk.com/robot/send?access_token=8bce3bd11f7040d57d44caa5b6ef9417eab24e1123123123213"
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# Enable and start the service
systemctl enable webhook-dingtalk
systemctl start webhook-dingtalk
systemctl status webhook-dingtalk
# Check that the service is listening on port 8060
netstat -anplt|grep 8060
tcp6 0 0 :::8060 :::* LISTEN 1635/prometheus-web