Alertmanager DingTalk Alerts


Connecting DingTalk to the Prometheus Alertmanager webhook

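  • Create a DingTalk custom robot
    In DingTalk, click your avatar, then Robots --> Add Robot. Choose "Custom Robot", follow the prompts, and note down the Webhook URL and the signing secret.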

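Test the DingTalk alert robot (if signature verification is enabled on the robot, the URL also needs timestamp and sign parameters):

curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'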
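
The robot setup yields both a webhook URL and a signing secret. When signing is enabled, DingTalk's documentation describes appending timestamp and sign query parameters, where the signature is the Base64-encoded HMAC-SHA256 of the string timestamp + "\n" + secret, keyed with the secret and then URL-encoded. The following is a minimal shell sketch under that assumption. The function names dingtalk_sign and dingtalk_signed_url are made up for illustration, and the sketch assumes openssl and sed are installed.

```shell
#!/bin/sh
# Hypothetical helpers; they are not part of DingTalk or the plugin.

# Print the URL-encoded signature for a millisecond timestamp and a secret.
dingtalk_sign() {
  ts=$1
  secret=$2
  # The string to sign is "<timestamp>\n<secret>", keyed with the secret itself.
  raw=$(printf '%s\n%s' "$ts" "$secret" \
    | openssl dgst -sha256 -hmac "$secret" -binary \
    | openssl base64 -A)
  # Base64 output may contain '+', '/' and '=', which must be percent-encoded.
  printf '%s\n' "$raw" | sed -e 's/+/%2B/g' -e 's,/,%2F,g' -e 's/=/%3D/g'
}

# Print the full signed webhook URL for an access token and a secret.
dingtalk_signed_url() {
  token=$1
  secret=$2
  ts=$(( $(date +%s) * 1000 ))   # current time in milliseconds
  sign=$(dingtalk_sign "$ts" "$secret")
  printf 'https://oapi.dingtalk.com/robot/send?access_token=%s&timestamp=%s&sign=%s\n' \
    "$token" "$ts" "$sign"
}

# Usage (placeholders; substitute the real token and secret):
# curl -H "Content-Type: application/json" \
#   -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' \
#   "$(dingtalk_signed_url xxxx SECxxxx)"
```

The prometheus-webhook-dingtalk plugin configured later takes the secret in its config.yml, so this manual signing is only needed when calling the DingTalk API directly.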


  • Deploy the DingTalk alerting plugin
# Download the DingTalk alerting plugin
cd /opt/alertmanager/

curl -O -L https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.0.0/prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz

# Extract and rename
tar -zxvf prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz

mv prometheus-webhook-dingtalk-2.0.0.linux-amd64/ prometheus-webhook-dingtalk

chown -R prometheus:prometheus *

  • Edit the plugin's configuration file
cd prometheus-webhook-dingtalk
cp config.example.yml config.yml
chown -R prometheus:prometheus *
vim config.yml
# config.yml: set the robot's webhook URL and signing secret
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    # secret for signature
    secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Start the DingTalk alerting plugin
cd /opt/alertmanager/prometheus-webhook-dingtalk

nohup ./prometheus-webhook-dingtalk --config.file=./config.yml &
  • Run the plugin as a systemd service
vim /usr/lib/systemd/system/prometheus-webhook-dingtalk.service

[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/opt/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/opt/alertmanager/prometheus-webhook-dingtalk/config.yml

[Install]
WantedBy=multi-user.target
# Reload systemd, start and enable the service, then check the port and logs
systemctl daemon-reload
systemctl start prometheus-webhook-dingtalk
systemctl status prometheus-webhook-dingtalk
systemctl enable prometheus-webhook-dingtalk
ss -tnl | grep 8060
journalctl -u prometheus-webhook-dingtalk -fn 200
  • Test the plugin
curl http://localhost:8060/dingtalk/webhook1/send -H 'Content-Type: application/json' -d '{"msgtype": "text","text": {"content": "monitoring alert"}}'
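
The request above uses a DingTalk-style body, whereas Alertmanager itself POSTs its own webhook JSON format to this endpoint. To exercise the same path Alertmanager will use, a hand-built payload in that format can be sent instead. The sketch below assumes the plugin decodes the standard Alertmanager webhook message. The alert name, labels, URLs, and timestamps are invented sample values, and sample_alert_payload is a hypothetical helper.

```shell
#!/bin/sh
# Print a sample Alertmanager-style webhook payload (all values are made up).
sample_alert_payload() {
  cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "dingtalk",
  "groupLabels": {"alertname": "InstanceDown"},
  "commonLabels": {"alertname": "InstanceDown", "severity": "critical"},
  "commonAnnotations": {"summary": "Instance down (sample)"},
  "externalURL": "http://localhost:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "InstanceDown", "instance": "192.168.1.1:9080", "severity": "critical"},
      "annotations": {"summary": "Instance down (sample)"},
      "startsAt": "2024-01-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://localhost:9090/graph"
    }
  ]
}
EOF
}

# Usage: pipe the payload to the plugin endpoint used in this article.
# sample_alert_payload | curl -s -H 'Content-Type: application/json' \
#   --data-binary @- http://localhost:8060/dingtalk/webhook1/send
```

If the plugin and robot are set up correctly, the DingTalk group should receive a message rendered from these sample labels and annotations.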
  • Point Alertmanager at the plugin (alertmanager.yml)
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m # while an alert keeps firing, resend the notification every 10 minutes
  #receiver: 'web.hook'
  receiver: 'dingtalk'
#receivers:
#- name: 'web.hook'
#  webhook_configs:
#  - url: 'http://127.0.0.1:5001/'

receivers:
- name: 'dingtalk'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
    send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
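Alert states:

1. Inactive
The alert threshold has not been reached, i.e. the rule's expr does not hold.

2. Pending
The threshold has been reached (expr holds), but the condition has not yet lasted for the duration set by for.

3. Firing
The threshold has been reached and the condition has lasted for the full for duration.
Testing shows that once an alert has reached Firing, no second alert is created for the same condition until the first one is resolved. Notifications for the still-firing alert are resent according to repeat_interval.

Example:
If the service at 192.168.1.1:9080 has been down for more than 1 minute and a Firing alert has been created, no new identical alert is created while the machine stays down.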
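
The for duration mentioned above is set in the Prometheus alerting rule, not in Alertmanager. As an illustration of the example, the sketch below writes a rule file in which the alert stays Pending until up == 0 has held for 1 minute and then becomes Firing. The file path, the group name, the alert name InstanceDown, and the labels are assumptions for illustration and are not taken from the original setup.

```shell
#!/bin/sh
# Hypothetical location; point RULE_FILE at a path listed under rule_files in prometheus.yml.
RULE_FILE="${RULE_FILE:-${TMPDIR:-/tmp}/instance_down.rules.yml}"

# The quoted 'EOF' keeps the shell from expanding $labels inside the template.
cat > "$RULE_FILE" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0          # Inactive while this returns nothing
        for: 1m                # Pending until the condition has held for 1 minute
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF

echo "wrote $RULE_FILE"

# Optional syntax check, if promtool is installed:
# promtool check rules "$RULE_FILE"
```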

