Installing and configuring Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local
cd /usr/local
mv alertmanager-0.21.0.linux-amd64/ alertmanager
Create the systemd unit file
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alert-test.yml --storage.path=/usr/local/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
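The Alertmanager installation directory ships with a default alertmanager.yml configuration file. You can create a new configuration file instead and point to it at startup with --config.file (the unit file above uses alert-test.yml).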
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'aa@qq.com'
  smtp_auth_username: 'aa@qq.com'
  smtp_auth_password: 'aa'
  smtp_require_tls: false
templates:
  - '/usr/local/alertmanager/template/*.tmpl'   # email alert templates
# route: how alerts are grouped and dispatched
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'mail'
receivers:
  - name: 'mail'
    email_configs:
      - to: 'dd5@qq.com'
        send_resolved: true   # also send a mail when the alert resolves
        # which template to render; the trailing dot passes the alert data to it
        html: '{{ template "default-monitor.html" . }}'
        # mail subject; if headers is omitted, the subject comes from the
        # default email.default.subject template, which can be overridden in a template file
        headers: {Subject: "[WARN] Alert email test"}
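- smtp_smarthost: the SMTP server address and port of the mailbox used to send mail;
- smtp_auth_password: the authorization code of the sending mailbox, not its login password;
- smtp_require_tls: defaults to true when unset; with this setup, true produces a STARTTLS error, so it is set to false here for simplicity;
- templates: the path to the email template files;
- html (under receivers): the name of the template used for the mail body, here "default-monitor.html", which is defined in one of the files under the template path;
- headers: the mail headers, used here to set the subject.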
Configuring alert rules
Configure rule.yml
groups:
  - name: node_alerts
    rules:
      - alert: node-up-alert
        expr: up == 0
        for: 10s
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} has been down for more than 10s"
Reference the rule file path in prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093 # added: the Alertmanager address

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  #- "/usr/local/prometheus/rules/*_alerts.yml"
  - "rules/*_alerts.yml" # added; the rule file must sit at a path that matches this pattern

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['xxxxxxxxxxx:9090']
  - job_name: 'xxxxxxxxxx'
    static_configs:
      - targets: ['xxxxxxxxxxxxx:9100']
        labels:
          instance: test
Restart the Prometheus service:
chown -R prometheus:prometheus /usr/local/prometheus/rule.yml
systemctl restart prometheus
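Writing the email template
Note: the file name must end in .tmpl so that it matches the templates path in the Alertmanager configuration.
Alert template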
vi /usr/local/alertmanager/template/mail.tmpl
{{ define "default-monitor.html" }}
{{ range .Alerts }}
<pre>
=============start===========
Alert source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert type: {{ .Labels.alertname }}
Host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
==============end============
</pre>
{{ end }}
{{ end }}
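Alert and recovery template
This version goes in the same file and defines the same template name, "default-monitor.html", so keep only one of the two definitions.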
vi /usr/local/alertmanager/template/mail.tmpl
{{ define "default-monitor.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
@Alert
<pre>
Type: {{ .Labels.alertname }}
Instance: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Time: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
</pre>
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
@Resolved
<pre>
Type: {{ .Labels.alertname }}
Instance: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Started: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
Resolved: {{ .EndsAt.Format "2006-01-02 15:04:05" }}
</pre>
{{ end }}{{ end -}}
{{- end }}
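Alertmanager's template data exposes .Alerts.Firing and .Alerts.Resolved. Ranging over those subsets, rather than over .Alerts, keeps each section limited to its own alerts. The sketch below shows this with Go's text/template. The Alert and Alerts types are simplified stand-ins written for this example, not Alertmanager's real types, and the instance names are invented.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Alert and Alerts are simplified stand-ins for the data Alertmanager
// passes to templates (assumption: only the fields used here).
type Alert struct {
	Status string
	Labels map[string]string
}

type Alerts []Alert

// Firing returns only the alerts whose status is "firing".
func (as Alerts) Firing() Alerts { return as.filter("firing") }

// Resolved returns only the alerts whose status is "resolved".
func (as Alerts) Resolved() Alerts { return as.filter("resolved") }

func (as Alerts) filter(status string) Alerts {
	var out Alerts
	for _, a := range as {
		if a.Status == status {
			out = append(out, a)
		}
	}
	return out
}

// Each range walks one subset, so an alert shows up in one section only.
const tmplText = `{{ range .Alerts.Firing }}FIRING {{ .Labels.instance }}
{{ end }}{{ range .Alerts.Resolved }}RESOLVED {{ .Labels.instance }}
{{ end }}`

// render executes the template against the given alerts.
func render(alerts Alerts) (string, error) {
	t, err := template.New("mail").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, struct{ Alerts Alerts }{alerts})
	return sb.String(), err
}

func main() {
	out, err := render(Alerts{
		{Status: "firing", Labels: map[string]string{"instance": "node-a"}},
		{Status: "resolved", Labels: map[string]string{"instance": "node-b"}},
	})
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Print(out)
}
```

If both loops ranged over .Alerts instead, node-a and node-b would each appear under both the firing and the resolved heading.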