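I wanted to properly understand how route rules in Alertmanager are interpreted, so I took the opportunity to walk through the official demo file directly. The relevant part of the file is as follows: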
routes:
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-mails
    routes:
      - match:
          severity: critical
        receiver: team-X-pager
  - match:
      service: files
    receiver: team-Y-mails
    routes:
      - match:
          severity: critical
        receiver: team-Y-pager
  - match:
      service: database
    receiver: team-DB-pager
    # Also group alerts by affected database.
    group_by: [alertname, cluster, database]
    routes:
      - match:
          owner: team-X
        receiver: team-X-pager
        continue: true
      - match:
          owner: team-Y
        receiver: team-Y-pager
Breaking the file down piece by piece
- match_re:
    service: ^(foo1|foo2|baz)$
  receiver: team-X-mails
  routes:
    - match:
        severity: critical
      receiver: team-X-pager
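When an alert comes from service foo1, foo2, or baz, it is sent to team-X-pager if its severity label is critical; if the sub-route does not match (severity is anything else, or missing), the alert falls back to the parent route and is sent to team-X-mails by default.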
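To make the fallback concrete, here is a small sketch of a few alert label sets and the receiver each one should reach under this block. The alerts are hypothetical and invented for illustration; only the receiver names come from the demo file.

```yaml
# Hypothetical alerts (label sets only), not part of the official demo file.
# Each comment gives the receiver expected under the route block above.
- labels: {service: foo1, severity: critical}  # child route matches -> team-X-pager
- labels: {service: foo2, severity: warning}   # no child matches    -> team-X-mails
- labels: {service: baz}                       # no severity label   -> team-X-mails
```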
- match:
    service: database
  receiver: team-DB-pager
  # Also group alerts by affected database.
  group_by: [alertname, cluster, database]
  routes:
    - match:
        owner: team-X
      receiver: team-X-pager
      continue: true
    - match:
        owner: team-Y
      receiver: team-Y-pager
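When an alert comes from service database, it is sent to team-X-pager if its owner label is team-X. Because continue: true is set on that sub-route, evaluation does not stop there: the alert is also checked against the next sibling route, which sends alerts whose owner label is team-Y to team-Y-pager. If neither sub-route matches, the alert goes to the default receiver, team-DB-pager.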
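The same kind of sketch for the database branch, again with hypothetical label sets. A label holds a single value, so an alert owned by team-X passes through continue: true, is checked against the team-Y route, and simply fails to match it.

```yaml
# Hypothetical alerts; the receiver names come from the demo file.
- labels: {service: database, owner: team-X}  # -> team-X-pager (then checked against team-Y: no match)
- labels: {service: database, owner: team-Y}  # -> team-Y-pager
- labels: {service: database, owner: team-Z}  # no child matches -> team-DB-pager
- labels: {service: database}                 # no owner label   -> team-DB-pager
```

If amtool is installed, expectations like these can be checked against a real configuration file with its config routes test subcommand, passing the label pairs on the command line.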
Explanation of the grouping-related settings
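Alertmanager can group alert notifications, merging multiple alerts into a single notification. The grouping rule is defined with group_by: alerts that share the same values for the label names listed in group_by are merged into one notification and sent to the receiver together.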
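As an illustration, assume the group_by: [alertname, cluster, database] setting from the demo file and three hypothetical alerts that all reach the database route. The alert name, cluster, database, and instance values below are invented for this sketch.

```yaml
# Hypothetical alerts. The group key is the set of values of
# alertname, cluster, and database; other labels are ignored for grouping.
- labels: {service: database, alertname: HighLatency, cluster: eu-1, database: orders, instance: db-01}  # group A
- labels: {service: database, alertname: HighLatency, cluster: eu-1, database: orders, instance: db-02}  # group A (instance is not in group_by)
- labels: {service: database, alertname: HighLatency, cluster: eu-1, database: users, instance: db-03}   # group B (different database)
```

The first two alerts would be delivered in one notification, and the third in a separate one.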
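Sometimes it is useful to collect and send more related information in one go. The group_wait parameter sets how long to wait before sending the first notification for a new group; alerts that arrive for that group during the wait are merged into the same notification to the receiver.
The group_interval setting defines the interval between notifications for the same group, that is, how long Alertmanager waits before notifying about new alerts added to a group that has already been notified.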
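A minimal sketch of where these settings sit in a route. The durations are illustrative assumptions rather than values taken from the excerpt above, and the receiver name is simply reused from the demo file.

```yaml
route:
  receiver: team-DB-pager
  group_by: [alertname, cluster, database]
  # Wait this long after the first alert of a new group before sending,
  # so alerts arriving shortly afterwards can join the same notification.
  group_wait: 30s
  # Once a group has been notified, wait at least this long before
  # sending a notification about alerts newly added to that group.
  group_interval: 5m
```

A shorter group_wait gets the first notification out sooner at the cost of less batching, so the right values depend on how bursty the alerts are.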
