Important: avoid sending notifications too quickly or too often, otherwise recipients get flooded with alert emails.
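Download alertmanager
Download URL: https://prometheus.io/download/
After downloading and extracting the archive, double-click the exe file to start it, then open http://localhost:9093. Once Prometheus has been configured, restart and wait a moment.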
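A quick way to confirm that Alertmanager is up, without opening the browser, is to poll its health endpoint. The sketch below assumes the default address http://localhost:9093 and the /-/healthy management endpoint; the function names are made up for illustration.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the URL of Alertmanager's health endpoint from its base URL."""
    return base_url.rstrip("/") + "/-/healthy"


def is_healthy(base_url: str = "http://localhost:9093", timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling is_healthy() after starting the exe should return True; if it returns False, check that the process is still running and that port 9093 is not taken by something else.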
Edit alertmanager.yml
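global:
  resolve_timeout: 5m
  smtp_from: 'xxxxxxxx@qq.com'
  # Port 465 uses implicit TLS, so STARTTLS (smtp_require_tls) is turned off.
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: 'xxxxxxxxxxx@qq.com'
  smtp_auth_password: 'xxxxxxxxxxxxxxx'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  # Wait before sending the first notification for a new group.
  group_wait: 5s
  # Wait before notifying about new alerts added to an existing group.
  group_interval: 5s
  # Wait before re-sending a notification for alerts that are still firing.
  repeat_interval: 5m
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxxxxxxxxx@qq.com'
        # Also send an email when the alert resolves.
        send_resolved: true
inhibit_rules:
  # Mute warning alerts while a critical alert with the same labels is firing.
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']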
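To get a feel for how the route timings relate to notification flooding, here is a rough back-of-the-envelope sketch. It assumes a single alert group that keeps firing for 24 hours, and it simplifies Alertmanager's behaviour by treating repeats as landing on group_interval ticks. The function names and the second set of values are hypothetical.

```python
def effective_repeat_seconds(repeat_interval: int, group_interval: int) -> int:
    """Approximate gap between repeat notifications, in seconds.

    Assumption: repeats only go out on group_interval ticks, so the gap is
    repeat_interval rounded up to a multiple of group_interval.
    """
    ticks = (repeat_interval + group_interval - 1) // group_interval
    return ticks * group_interval


def repeats_per_day(repeat_interval: int, group_interval: int) -> int:
    """Rough count of repeat notifications in 24 hours for one alert group."""
    return 86400 // effective_repeat_seconds(repeat_interval, group_interval)


# Values from the config above: repeat_interval 5m, group_interval 5s.
print(repeats_per_day(300, 5))          # 288
# Hypothetical, more conservative values: repeat_interval 4h, group_interval 5m.
print(repeats_per_day(4 * 3600, 300))   # 6
```

With the values shown in alertmanager.yml, one unresolved alert could produce on the order of 288 emails a day, so the short intervals are probably best kept for testing.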
Edit prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
rule_files:
  - "machine_alert_rules.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_liux_70'
    static_configs:
      - targets: ['10.0.0.70:9100']
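After restarting Prometheus, one way to check that the alerting section was picked up is to ask Prometheus which Alertmanagers it knows about. The sketch below assumes Prometheus listens on http://localhost:9090 and uses its /api/v1/alertmanagers endpoint; the function names are illustrative and the sample response is hand-written, not captured output.

```python
import json
import urllib.request


def active_alertmanager_urls(payload: dict) -> list:
    """Extract active Alertmanager URLs from an /api/v1/alertmanagers response."""
    data = payload.get("data", {})
    return [am["url"] for am in data.get("activeAlertmanagers", [])]


def fetch_alertmanagers(prometheus_url: str = "http://localhost:9090") -> dict:
    """Query Prometheus for the Alertmanagers it has discovered."""
    url = prometheus_url.rstrip("/") + "/api/v1/alertmanagers"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Sample response shape, written by hand for illustration.
sample = {
    "status": "success",
    "data": {
        "activeAlertmanagers": [{"url": "http://127.0.0.1:9093/api/v2/alerts"}],
        "droppedAlertmanagers": [],
    },
}
print(active_alertmanager_urls(sample))
```

Against a live server, active_alertmanager_urls(fetch_alertmanagers()) should list an entry for 127.0.0.1:9093; an empty list suggests the alerting block was not loaded.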
Add machine_alert_rules.yml
groups:
  - name: simulator-alert-rule
    rules:
      - alert: check_node_liux_70
        expr: sum(up{job="node_liux_70"}) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Has been down or offline for more than 1 minute."
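To test the email path without taking the node down, an alert can be pushed to Alertmanager by hand. The sketch below assumes Alertmanager's v2 API at http://localhost:9093/api/v2/alerts; the function names, the "manual-test" instance label, and the annotation text are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(now: datetime, minutes: int = 5) -> list:
    """Build a one-alert payload for Alertmanager's POST /api/v2/alerts."""
    return [{
        "labels": {
            "alertname": "check_node_liux_70",  # same name as the rule above
            "severity": "critical",
            "instance": "manual-test",          # hypothetical label value
        },
        "annotations": {"description": "Manual test alert."},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(alerts: list, base_url: str = "http://localhost:9093") -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Fixed timestamp so the printed payload is reproducible.
payload = build_test_alert(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(json.dumps(payload, indent=2))
```

For a real test, build the payload with datetime.now(timezone.utc) and pass it to post_alerts(); the alert should then show up at http://localhost:9093 and, after group_wait, trigger an email.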