References: https://www.bbsmax.com/A/gGdXbgXmJ4/
https://www.deathearth.com/333.html
https://www.cnblogs.com/amyzhu/p/10193557.html
Once ELK is up and running, how do you alert on the data it collects? One option is the Sentinl plugin for Kibana.
I. Installation environment
1. System environment
2. Software versions
java 1.8.0_171
elasticsearch 6.2.4
kibana 6.2.4
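II. Installation
1. Install ELK
Omitted.
2. Install the Sentinl plugin
Download the plugin release that matches your ELK version. This walkthrough uses 6.2.4.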
https://github.com/sirensolutions/sentinl/releases/
/usr/share/kibana/bin/kibana-plugin install file:///nas/nas/softs/elk/6.2.4/sentinl-v6.2.4-1.zip
Check the plugin after installation.
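One way to run this check from a script is to read the output of kibana-plugin list. The sketch below assumes that command prints one name@version entry per line and that Kibana lives at the path used in the install command; the helper names plugin_installed and list_plugins are illustrative, not part of Kibana.

```python
import subprocess

# Path taken from the install command above.
KIBANA_PLUGIN = "/usr/share/kibana/bin/kibana-plugin"


def plugin_installed(listing: str, name: str = "sentinl") -> bool:
    """Return True if `name` appears in `kibana-plugin list` output."""
    for line in listing.splitlines():
        # Assumed line format: "<name>@<version>", e.g. "sentinl@6.2.4".
        if line.strip().split("@", 1)[0] == name:
            return True
    return False


def list_plugins() -> str:
    """Run `kibana-plugin list` and return its standard output."""
    result = subprocess.run(
        [KIBANA_PLUGIN, "list"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Calling plugin_installed(list_plugins()) on the Kibana host should return True once the install has succeeded.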
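Configure email: edit the Kibana configuration file /etc/kibana/kibana.yml and append the following to the end of the file.
sentinl:
  settings:
    email:
      active: true
      user: xxx@xxx.com          # mailbox address
      password: xxxx             # mailbox password or authorization code
      host: smtp.exmail.qq.com   # outgoing mail server
      ssl: true                  # adjust to your setup; if set to false, change port to 25. Alibaba Cloud blocks port 25, so SSL is required there
      port: 465
    report:
      active: true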
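Before restarting Kibana it can help to confirm that the mailbox credentials actually work. The following is a minimal sketch using Python's standard smtplib; it is independent of Sentinl, and the function names are made up for illustration. It mirrors the port rule from the comments above (465 with SSL, 25 without).

```python
import smtplib


def default_port(use_ssl: bool) -> int:
    """Port choice described above: 465 with SSL, 25 without."""
    return 465 if use_ssl else 25


def check_smtp_login(host: str, user: str, password: str, use_ssl: bool = True) -> None:
    """Log in to the SMTP server once.

    Raises smtplib.SMTPException or OSError if the connection or login fails.
    """
    port = default_port(use_ssl)
    smtp_class = smtplib.SMTP_SSL if use_ssl else smtplib.SMTP
    with smtp_class(host, port, timeout=10) as server:
        server.login(user, password)
```

For example, check_smtp_login("smtp.exmail.qq.com", "xxx@xxx.com", "xxxx") returns silently when the server accepts the login.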
Restart Kibana:
systemctl restart kibana
Open head (the elasticsearch-head UI) and you should see that an index named watcher_alarms has been created.
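If head is not available, the same check can be made against the Elasticsearch _cat/indices API. This is a sketch under assumptions: Elasticsearch is reachable at http://localhost:9200 without authentication, and the Sentinl indices start with "watcher". The function names are illustrative.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed Elasticsearch address


def sentinl_indices(cat_rows: list) -> list:
    """Pick index names starting with 'watcher' from _cat/indices JSON rows."""
    return sorted(
        row["index"] for row in cat_rows if row["index"].startswith("watcher")
    )


def fetch_indices(es_url: str = ES_URL) -> list:
    """Fetch all indices through the _cat/indices API in JSON form."""
    url = es_url + "/_cat/indices?format=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

sentinl_indices(fetch_indices()) then lists whatever Sentinl indices exist on the cluster.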
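The Kibana menu now shows a Sentinl entry.
Create a new watcher.
After saving your changes you can edit or test it.
Click to run a test.
View the alarm information.
Set the query and the alert condition in the Advanced tab. A fairly complete configuration looks like this: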
{
  "actions": {
    "Email_alarm_773206d5-2977-465e-882d-762a7d69fe68": {
      "name": "Email alarm",
      "throttle_period": "15m",
      "email": {
        "priority": "low",
        "stateless": false,
        "body": "Find error log {{payload.hits.total}}",  # email body; reports how many times the keyword error matched
        "to": "xxx@xxx.com",  # recipient, set as needed
        "from": "xxx@xxx.com"  # sender; the mailbox configured in the Kibana configuration file
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "system-log-*"  # index name
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "@timestamp": {  # time window to match
                      "gte": "now-5m/m",  # greater than or equal to now minus 5 minutes
                      "lte": "now/m",  # less than or equal to now
                      "format": "epoch_millis"
                    }
                  }
                }
              ],
              "filter": [
                {
                  "multi_match": {
                    "type": "best_fields",
                    "query": "error",  # match logs that contain the keyword error
                    "lenient": true
                  }
                }
              ]
            }
          },
          "size": 0,
          "aggs": {
            "dateAgg": {
              "date_histogram": {
                "field": "@timestamp",
                "time_zone": "Asia/Shanghai",
                "interval": "1m",
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total>1"  # trigger the alert action when there is more than 1 match
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 5 minutes"  # run every five minutes
    }
  },
  "disable": false,
  "report": false,
  "title": "system-log error log monitoring alert",
  "wizard": {},
  "save_payload": false,
  "spy": false,
  "impersonate": false
}
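Note: the comments are there only to make the configuration easier to read. The actual configuration must not contain comments.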
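Since the annotated version is not valid JSON, the comments have to be removed before the configuration is pasted into Sentinl. A minimal sketch of one way to do that, assuming comments start with # and run to the end of the line, and that a # inside a quoted string must be preserved. The function names are illustrative.

```python
import json


def strip_hash_comments(text: str) -> str:
    """Remove '#' comments that sit outside JSON string literals."""
    cleaned_lines = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        kept = []
        for ch in line:
            if in_string:
                kept.append(ch)
                if escaped:
                    escaped = False      # this char was escaped, consume it
                elif ch == "\\":
                    escaped = True       # next char is escaped
                elif ch == '"':
                    in_string = False    # closing quote
            elif ch == '"':
                in_string = True         # opening quote
                kept.append(ch)
            elif ch == "#":
                break                    # rest of the line is a comment
            else:
                kept.append(ch)
        cleaned_lines.append("".join(kept).rstrip())
    return "\n".join(cleaned_lines)


def load_annotated_watcher(text: str) -> dict:
    """Strip comments, then parse; json.loads raises ValueError if still invalid."""
    return json.loads(strip_hash_comments(text))
```

Running json.dumps(load_annotated_watcher(text), indent=2) on the annotated text gives a clean version that can be pasted into the editor.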
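This watcher checks whether the keyword error appeared in the matching logs during the last five minutes, and sends an email alert if it matched more than once.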
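When a watcher does not fire as expected, it can be useful to run the same search directly against Elasticsearch and apply the condition by hand. The sketch below rebuilds the query part of the watcher in Python and mirrors the payload.hits.total>1 condition. It assumes Elasticsearch 6.x (where hits.total is a plain integer) and an unauthenticated endpoint; the function names and the es_url parameter are illustrative.

```python
import json
import urllib.request


def build_error_query(keyword: str = "error", window: str = "5m") -> dict:
    """Search body equivalent to the watcher's query, without the aggregation."""
    time_range = {
        "gte": "now-%s/m" % window,
        "lte": "now/m",
        "format": "epoch_millis",
    }
    match = {"type": "best_fields", "query": keyword, "lenient": True}
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"range": {"@timestamp": time_range}}],
                "filter": [{"multi_match": match}],
            }
        },
    }


def should_alert(total_hits: int, threshold: int = 1) -> bool:
    """Mirror of the watcher condition payload.hits.total > 1."""
    return total_hits > threshold


def count_hits(es_url: str, index_pattern: str, body: dict) -> int:
    """POST the search and return hits.total (an integer on Elasticsearch 6.x)."""
    req = urllib.request.Request(
        "%s/%s/_search" % (es_url, index_pattern),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["hits"]["total"]
```

For example, should_alert(count_hits("http://localhost:9200", "system-log-*", build_error_query())) tells you whether the watcher would trigger right now.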
Appending a few lines containing error to the monitored log file is enough to trigger the alert.
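A small helper for producing such test lines is sketched here. The log path is not given in this walkthrough, so log_path is a placeholder you must point at the file your log shipper actually collects; the line format is also made up for the test.

```python
from datetime import datetime


def append_error_lines(log_path: str, count: int = 3) -> None:
    """Append `count` lines containing the keyword 'error' to the log file."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        for i in range(count):
            stamp = datetime.now().isoformat()
            log_file.write("%s error test line %d\n" % (stamp, i + 1))
```

Because the condition is "more than 1 match", at least two lines must land inside the same five-minute window.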
The email looks like this:
Next, a watcher configuration that alerts on CPU usage:
{
  "actions": {
    "HTML_email_alarm_5fbf1925-81fc-4d73-a37e-b6ac8b9bfc06": {
      "name": "HTML email alarm",
      "throttle_period": "1m",
      "email_html": {
        "html": "CPU usage exceeded 10% {{ payload.hits.total }} times in the last five minutes",
        "priority": "low",
        "stateless": false,
        "to": "xxx@xxx.com",
        "from": "xxx@xxx.com"
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "metricbeat-*"
        ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "system.cpu.total.pct": {
                      "gt": 0.1
                    }
                  }
                }
              ],
              "must": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m/m",
                      "lte": "now/m",
                      "format": "epoch_millis"
                    }
                  }
                }
              ]
            }
          },
          "size": 0,
          "aggs": {
            "dateAgg": {
              "date_histogram": {
                "field": "@timestamp",
                "time_zone": "Europe/Amsterdam",
                "interval": "1m",
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total >=1"
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 5 minutes"
    }
  },
  "disable": false,
  "report": false,
  "title": "metricber",
  "wizard": {},
  "save_payload": true,
  "spy": false,
  "impersonate": false
}
This watcher alerts when CPU usage exceeds 10%. system.cpu.total.pct is a floating-point fraction, so a comparison against 0.1 means "greater than 10%".
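The percent-to-fraction conversion is easy to get wrong when changing the threshold, so a tiny helper can make it explicit. This sketch only builds the range filter used in the watcher above; the function name is illustrative.

```python
def cpu_range_filter(percent: float) -> dict:
    """Build the range filter for CPU usage above `percent` (given as 0-100).

    system.cpu.total.pct stores a fraction, so 10 percent becomes 0.1.
    """
    return {"range": {"system.cpu.total.pct": {"gt": percent / 100}}}
```

cpu_range_filter(10) reproduces the filter in the configuration above, and cpu_range_filter(80) would raise the threshold to 80%.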