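【0】Approach
Requirement: every day I need to mute certain alerts during a fixed time window.
Example: an off-site backup runs at 4:00 a.m. and triggers a traffic alert, so I want to mute that alert from 4:00 to 4:30 every day.
Approach:
There is no direct operation for this, so we take a workaround:
(1)a scheduled job that creates a silence at 4:00 a.m. every day
(2)the scheduled job uses the Alertmanager API to create it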
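【1】API
Reference: https://www.kancloud.cn/pshizhsysu/prometheus/1872669
API Version
Alertmanager has two APIs, v1 and v2. Their internal logic is largely the same, but v1 has no documentation, is deprecated, and has been removed from recent Alertmanager releases, so prefer v2 where you can. Documentation is available for v2.
The Swagger file for API v2 is at:
https://github.com/prometheus/alertmanager/blob/master/api/v2/openapi.yaml
Paste the contents of that file into https://editor.swagger.io to browse the API. The full list of v2 endpoints is: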
# Alert
GET /api/v2/alerts
POST /api/v2/alerts
# AlertGroup
GET /api/v2/alerts/groups
# General
GET /api/v2/status
# Receiver
GET /api/v2/receivers
# Silence
GET /api/v2/silences
POST /api/v2/silences
GET /api/v2/silence/{silenceID}
DELETE /api/v2/silence/{silenceID}
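The most important of these are the three Alert and AlertGroup endpoints, which are covered in detail below.
(1.1)Get existing alerts
Requesting http://127.0.0.1:9090/api/v1/alerts returns the alerts as JSON, which shows the format to expect. (Port 9090 is the Prometheus default; Alertmanager usually listens on 9093, and the sample below has the shape of Alertmanager's GET /api/v2/alerts response.)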
[
  {
    "annotations": {
      "description": "localhost:9100 of job exporter has been down for more than 5 minutes.",
      "summary": "Instance localhost:9100 down"
    },
    "endsAt": "2021-11-25T08:13:56.026Z",
    "fingerprint": "d44e90ffc89b2ea1",
    "receivers": [{"name": "mail"}],
    "startsAt": "2021-11-25T07:36:11.026Z",
    "status": {"inhibitedBy": [], "silencedBy": [], "state": "active"},
    "updatedAt": "2021-11-25T16:09:56.030+08:00",
    "generatorURL": "http://localhost.localdomain:9090/graph?g0.expr=up+%3D%3D+0\u0026g0.tab=1",
    "labels": {"alertname": "InstanceDown", "instance": "localhost:9100", "job": "exporter", "severity": "page"}
  }
]
(1.2)Send alerts manually
alerts1='[
  {
    "labels": {"alertname": "DiskRunningFull", "dev": "sda1", "instance": "example1"},
    "annotations": {"info": "The disk sda1 is running full", "summary": "please check the instance example1"}
  },
  {
    "labels": {"alertname": "DiskRunningFull", "dev": "sdb2", "instance": "example2"},
    "annotations": {"info": "The disk sdb2 is running full", "summary": "please check the instance example2"}
  },
  {
    "labels": {"alertname": "DiskRunningFull", "dev": "sda1", "instance": "example3", "severity": "critical"}
  },
  {
    "labels": {"alertname": "DiskRunningFull", "dev": "sda1", "instance": "example3", "severity": "warning"}
  }
]'
# Send the same alerts to each of the three Alertmanager instances
curl -XPOST -d"$alerts1" http://localhost:9093/api/v1/alerts
curl -XPOST -d"$alerts1" http://localhost:9094/api/v1/alerts
curl -XPOST -d"$alerts1" http://localhost:9095/api/v1/alerts
In practice, a prepared JSON body can be tested with curl. The body must be wrapped in single quotes so the shell passes it through unchanged:
curl -i -k -H "Content-type: application/json" -X POST \
  -d '[{"annotations": {"description": "localhost:9100 of job exporter has been down for more than 5 minutes.", "summary":"Instance localhost:9100 down"}, "endsAt":"2021-11-25T08:13:56.026Z", "fingerprint":"d44e90ffc89b2ea1", "receivers":[{"name":"mail"}], "startsAt":"2021-11-25T07:36:11.026Z", "status":{"inhibitedBy":[],"silencedBy":[],"state":"active"}, "updatedAt":"2021-11-25T16:09:56.030+08:00", "generatorURL":"http://localhost.localdomain:9090/graph?g0.expr=up+%3D%3D+0\u0026g0.tab=1", "labels":{"alertname":"InstanceDown","instance":"localhost:9100","job":"exporter","severity":"page"} }]' \
  "http://192.168.217.22:9093/api/v1/alerts"
POST /api/v2/alerts
An example request body:
[
  {
    "labels": {"label": "value", ...},
    "annotations": {"label": "value", ...},
    "generatorURL": "string",
    "startsAt": "2020-01-01T00:00:00.000+08:00",  # optional
    "endsAt": "2020-01-01T01:00:00.000+08:00"     # optional
  },
  ...
]
The body is an array of alerts. startsAt and endsAt are optional and must use the format shown above (RFC 3339); a Unix timestamp is not accepted.
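Because Unix timestamps are not accepted, a small conversion helper can be handy when the time comes from another tool. This is a minimal sketch assuming GNU date (for the -d "@..." form); to_rfc3339 is a hypothetical helper name, not part of Alertmanager.

```shell
# Hypothetical helper: convert a Unix timestamp (seconds) into the
# RFC 3339 form used by startsAt / endsAt. Assumes GNU date.
to_rfc3339() {
  date -u -d "@$1" '+%Y-%m-%dT%H:%M:%S.000Z'
}

# 1577836800 is 2020-01-01 00:00:00 UTC, so this prints
# 2020-01-01T00:00:00.000Z
to_rfc3339 1577836800
```

The result is always expressed in UTC (the trailing Z), which Alertmanager accepts just like the +08:00 form in the example body.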
GET /api/v2/alerts
The query parameters below filter the returned alerts:
Parameter | Type | Default | Required | Notes |
---|---|---|---|---|
active | bool | true | optional | - |
silenced | bool | true | optional | - |
inhibited | bool | true | optional | - |
unprocessed | bool | true | optional | - |
filter | array[string] | none | optional | - |
receiver | string | none | optional | - |
The response looks like this:
[
  {
    "labels": {"label": "value", ...},
    "annotations": {"label": "value", ...},
    "generatorURL": "string",
    "startsAt": "2020-01-01T00:00:00.000+08:00",
    "endsAt": "2020-01-01T01:00:00.000+08:00",
    "updatedAt": "2020-01-01T01:00:00.000+08:00",
    "fingerprint": "string",
    "receivers": [{"name": "string"}, ...],
    "status": {
      "state": "active",  # active, unprocessed, ...
      "silencedBy": ["string", ...],
      "inhibitedBy": ["string", ...]
    }
  },
  ...
]
GET /api/v2/alerts/groups
The query parameters below filter the alerts in the returned groups:
Parameter | Type | Default | Required | Notes |
---|---|---|---|---|
active | bool | true | optional | - |
silenced | bool | true | optional | - |
inhibited | bool | true | optional | - |
unprocessed | bool | true | optional | - |
filter | array[string] | none | optional | - |
receiver | string | none | optional | - |
The response looks like this:
[
  {
    "labels": {"label": "value", ...},
    "receiver": {"name": "string"},  # note the difference from an alert's "receivers"
    "alerts": [alert1, alert2, ...]  # each alert has the same JSON structure as in the `GET /api/v2/alerts` response
  },
  ...
]
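The query parameters in the tables above can be passed with curl -G, which turns -d and --data-urlencode pairs into a query string. The sketch below is an illustration only: the address 127.0.0.1:9093, the AM_URL and AM_CURL variables, and the list_active_alerts helper are assumptions made for this example.

```shell
# Assumed Alertmanager address; adjust for your deployment.
AM_URL="${AM_URL:-http://127.0.0.1:9093}"
# Command used for the HTTP call; overridable so the helper can be dry-run.
AM_CURL="${AM_CURL:-curl}"

# Hypothetical helper: list active alerts for one alertname, leaving out
# silenced and inhibited ones. $1 = alertname label value.
list_active_alerts() {
  "$AM_CURL" -sS -G "$AM_URL/api/v2/alerts" \
    -d active=true -d silenced=false -d inhibited=false \
    --data-urlencode "filter=alertname=\"$1\""
}

# Usage (needs a reachable Alertmanager):
#   list_active_alerts InstanceDown
```

The filter value is a label matcher such as alertname="InstanceDown"; --data-urlencode takes care of escaping the quotes and the equals sign. The same parameters apply to GET /api/v2/alerts/groups.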
【2】Using the API in practice
Reference: https://www.kancloud.cn/pshizhsysu/prometheus/1874907
(2.1)Basic example
Build test data:
# Build test data (field names follow the v2 schema shown earlier)
aa='[
  {
    "labels": {
      "alertname": "NodeCpuPressure",
      "IP": "192.168.2.101"
    },
    "annotations": {
      "summary": "NodeCpuPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
    },
    "startsAt": "2020-02-17T23:00:00.000+08:00",
    "endsAt": "2023-02-18T23:00:00.000+08:00"
  }
]'
# Send the POST
curl http://127.0.0.1:9093/api/v2/alerts -XPOST -H'Content-Type: application/json' -d"$aa"
With v1, the -H'Content-Type: application/json' header can be omitted.
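How do you cancel the alert? Change its end time.
Fetch the alert with curl -X GET localhost:9093/api/v2/alerts,
then POST it again with the same labels and endsAt set to a time that has already passed; Alertmanager then treats it as resolved.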
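As a rough illustration of changing the end time, the sketch below builds a body that re-posts the test alert from (2.1) with the same labels and an endsAt of the current time. The resolve_payload helper is hypothetical, and the label names alertname and IP are simply the ones used in the test data; leaving out startsAt is an assumption that Alertmanager fills it in on its own.

```shell
# Hypothetical helper: build a body that re-posts an alert with the same
# labels and an endsAt of "now", so it is no longer considered firing.
# $1 = alertname, $2 = IP label value, $3 = optional endsAt (default: now, UTC).
# Values are inserted verbatim, so they must not contain quotes or backslashes.
resolve_payload() {
  local ends_at="${3:-$(date -u '+%Y-%m-%dT%H:%M:%S.000Z')}"
  printf '[{"labels":{"alertname":"%s","IP":"%s"},"endsAt":"%s"}]\n' "$1" "$2" "$ends_at"
}

# Usage (needs a reachable Alertmanager at the assumed address):
#   curl -sS -XPOST http://127.0.0.1:9093/api/v2/alerts \
#     -H 'Content-Type: application/json' \
#     -d "$(resolve_payload NodeCpuPressure 192.168.2.101)"
```

The label set has to match the original alert exactly; a different label set is treated as a different alert.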
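Before the operation above: an active alert that has not been silenced. [screenshot omitted]
After the operation: [screenshot omitted]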
【3】Best practice (scheduled silences)
Note: the times that a silence is created with and displayed in are UTC.
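Since silence times are in UTC, a local window has to be converted first. The sketch below does this with GNU date and an explicit UTC offset; local_to_utc is a hypothetical helper name, and the +0800 offset is an assumption based on the +08:00 timestamps used elsewhere in this post.

```shell
# Hypothetical helper: convert a local wall-clock time with an explicit
# UTC offset into a UTC timestamp. Assumes GNU date.
# $1 = "YYYY-MM-DD HH:MM:SS", $2 = offset such as +0800
local_to_utc() {
  date -u -d "$1 $2" '+%Y-%m-%dT%H:%M:%S.000Z'
}

# 04:00 at UTC+8 is 20:00 UTC on the previous day, so this prints
# 2022-03-16T20:00:00.000Z
local_to_utc '2022-03-17 04:00:00' '+0800'
```

Note that the date changes as well as the hour: a 04:00 to 04:30 window at UTC+8 falls on the previous calendar day in UTC.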
(3.1)Trigger a firing alert right away
(3.2)Create the silence and capture the request that creates it
Final result:
curl '127.0.0.1:9093/api/v2/silences' \
  -H 'Connection: keep-alive' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36' \
  -H 'Content-Type: application/json' \
  -H 'Accept: */*' \
  -H 'Origin: http://127.0.0.1:9093' \
  -H 'Referer: http://127.0.0.1:9093/' \
  -H 'Accept-Language: zh-CN,zh;q=0.9' \
  -H 'Cookie: grafana_session=a504e4a78501efe7009fa8b7587d5fb4' \
  --data-raw '{"matchers":[{"name":"alertname","value":"磁盤讀吞吐過高","isRegex":false},{"name":"instance","value":"127.0.0.1:9182","isRegex":false},{"name":"job","value":"測試_win","isRegex":false},{"name":"name","value":"測試數據庫鴨[47.103.57.124]","isRegex":false},{"name":"severity","value":"warning","isRegex":false}],"startsAt":"2022-03-17T07:44:45.446Z","endsAt":"2022-03-17T09:44:45.446Z","createdBy":"guochaoqun","comment":"tmp","id":null}' \
  --compressed \
  --insecure
(3.3)Create the silence from the captured request
v1:
curl -X POST http://127.0.0.1:9093/api/v1/silences \
  -d '{"matchers":[{"name":"alertname","value":"磁盤讀吞吐過高","isRegex":false},{"name":"instance","value":"47.103.57.124:9182","isRegex":false},{"name":"job","value":"test","isRegex":false},{"name":"name","value":"test庫[47.103.57.124]","isRegex":false},{"name":"severity","value":"warning","isRegex":false}],"startsAt":"2022-03-17T07:44:45.446Z","endsAt":"2022-03-17T09:44:45.446Z","createdBy":"guochaoqun","comment":"tmp","id":null}'
v2 (the -H 'Content-Type: application/json' header is required):
curl '127.0.0.1:9093/api/v2/silences' \
  -H 'Content-Type: application/json' \
  --data '{"matchers":[{"name":"alertname","value":"磁盤讀吞吐過高","isRegex":false},{"name":"instance","value":"127.0.0.1:9182","isRegex":false},{"name":"job","value":"測試_win","isRegex":false},{"name":"name","value":"測試庫[47.103.57.124]","isRegex":false},{"name":"severity","value":"warning","isRegex":false}],"startsAt":"2022-03-17T07:44:45.446Z","endsAt":"2022-03-17T09:44:45.446Z","createdBy":"guochaoqun","comment":"tmp","id":null}' \
  --compressed \
  --insecure
(3.4)Automation script
# Yesterday's local date. startsAt/endsAt below are UTC, so assuming the
# server clock is UTC+8, yesterday 20:10Z is 04:10 local time today.
now_date=$(date +%F --date="-1 day")
curl 'http://127.0.0.1:9093/api/v2/silences' \
  -H 'Content-Type: application/json' \
  --data '{"matchers":[{"name":"alertname","value":"磁盤讀吞吐過高","isRegex":false},{"name":"instance","value":"47.103.57.124:9182","isRegex":false},{"name":"job","value":"金游世界_win","isRegex":false},{"name":"name","value":"8833主庫_詳細對局[47.103.57.124]","isRegex":false},{"name":"severity","value":"warning","isRegex":false}],"startsAt":"'"${now_date}"'T20:10:45.446Z","endsAt":"'"${now_date}"'T20:40:45.446Z","createdBy":"guochaoqun","comment":"tmp","id":null}' \
  --compressed --insecure
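A variant that avoids the date arithmetic is to let cron decide when the window starts and have the script create a silence from "now" for a fixed number of minutes. The sketch below is one way to do that under stated assumptions: GNU date, Alertmanager at 127.0.0.1:9093, and a single alertname matcher. The AM_URL variable, the silence_payload and silence_for_minutes helpers, the createdBy and comment values, and the script path in the crontab line are all hypothetical.

```shell
# Assumed Alertmanager address; adjust for your deployment.
AM_URL="${AM_URL:-http://127.0.0.1:9093}"

# Hypothetical helper: build a silence body for a single alertname matcher.
# $1 = alertname, $2 = startsAt, $3 = endsAt (both RFC 3339, UTC).
# Values are inserted verbatim, so they must not contain quotes or backslashes.
silence_payload() {
  printf '{"matchers":[{"name":"alertname","value":"%s","isRegex":false}],' "$1"
  printf '"startsAt":"%s","endsAt":"%s","createdBy":"cron","comment":"daily backup window"}\n' "$2" "$3"
}

# Hypothetical helper: create a silence that starts now and lasts $2 minutes.
silence_for_minutes() {
  local start end
  start="$(date -u '+%Y-%m-%dT%H:%M:%S.000Z')"
  end="$(date -u -d "+$2 minutes" '+%Y-%m-%dT%H:%M:%S.000Z')"
  curl -sS -XPOST "$AM_URL/api/v2/silences" \
    -H 'Content-Type: application/json' \
    --data "$(silence_payload "$1" "$start" "$end")"
}

# Example crontab entry (placeholder path), run at 04:00 local time:
#   0 4 * * * /path/to/silence.sh
# where silence.sh sources these functions and calls:
#   silence_for_minutes '磁盤讀吞吐過高' 30
```

Matching on alertname alone is broader than the five matchers in the script above; add more matcher objects to the body if the silence should only cover one instance.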