Custom Prometheus Operator Monitoring Targets - nginx/mysql
Besides the resource objects, nodes, and components of the Kubernetes cluster itself, we sometimes need to add custom monitoring targets driven by actual business requirements. Adding one takes three simple steps (a minimal sketch follows the list):
- Step 1: create a ServiceMonitor object, which adds the scrape target to Prometheus
- Step 2: point the ServiceMonitor at a Service object that fronts the metrics endpoint
- Step 3: make sure the Service can actually serve the metrics data
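A minimal sketch of the pattern, with placeholder names (my-app, http-metrics) that you would replace with your own; full working examples follow in the sections below:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                  # placeholder name
  namespace: monitoring
  labels:
    app: my-app
spec:
  selector:
    matchLabels:
      app: my-app               # must match the labels on the metrics Service
  namespaceSelector:
    matchNames:
    - default                   # namespace where the application's Service lives
  endpoints:
  - port: http-metrics          # the *name* of the Service port that serves /metrics
    interval: 15s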
1 Custom Nginx Monitoring
Steps
1. Deploy the collector (exporter) with a Deployment
2. Create a Service for the collector ## clusterIP: None
3. Create a ServiceMonitor
1.1 Deploy Nginx with Helm first
[root@k8s-master helm]# helm create nginx
Creating nginx
[root@k8s-master helm]# helm install nginx
NAME:   guiding-dachshund
LAST DEPLOYED: Fri Sep 27 11:37:08 2019
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Deployment
NAME                     READY  UP-TO-DATE  AVAILABLE  AGE
guiding-dachshund-nginx  0/1    0           0          0s

==> v1/Pod(related)
NAME                                      READY  STATUS             RESTARTS  AGE
guiding-dachshund-nginx-54475b65c8-sl78p  0/1    ContainerCreating  0         0s

==> v1/Service
NAME                     TYPE       CLUSTER-IP      EXTERNAL-IP  PORT(S)  AGE
guiding-dachshund-nginx  ClusterIP  10.101.205.141  <none>       80/TCP   0s

NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=nginx,app.kubernetes.io/instance=guiding-dachshund" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl port-forward $POD_NAME 8080:80

[root@k8s-master helm]# kubectl get pod,svc | grep nginx
pod/guiding-dachshund-nginx-54475b65c8-sl78p   1/1     Running   0          15s
service/guiding-dachshund-nginx   ClusterIP   10.101.205.141   <none>        80/TCP    15s
[root@k8s-master helm]#
[root@k8s-master helm]# curl -I 10.101.205.141
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Fri, 27 Sep 2019 03:37:56 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 13 Aug 2019 10:05:00 GMT
Connection: keep-alive
ETag: "5d528b4c-264"
Accept-Ranges: bytes
[root@k8s-master helm]#
1.2 Nginx monitoring
https://blog.51cto.com/billy98/2357919
Prometheus is strict about the format of the data it scrapes: only metrics exposed in the format below are collected correctly, so the application has to expose its key monitoring data in this format.
nginx_http_connections{state="active"} 2 nginx_http_connections{state="reading"} 0 nginx_http_connections{state="waiting"} 1 nginx_http_connections{state="writing"} 1 nginx_http_request_bytes_sent{host="10.46.0.4"} 11055968 nginx_http_request_bytes_sent{host="testservers"} 4640 nginx_http_request_time_bucket{host="10.46.0.4",le="00.005"} 3960
How the metrics get exposed may require changes in the application itself. The Prometheus community also provides a large number of official and third-party exporters, which let adopters quickly cover key business workloads as well as infrastructure.
See the linked list of official and third-party exporters.
Here we recommend registering the metrics endpoint directly with a Prometheus client library. Prometheus clients cover most programming languages; see the supported-language list referenced below.
List of languages supported by the Prometheus clients.
This walkthrough uses an nginx demo image.
1. Create the Deployment and Service
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-demo
  labels:
    app: nginx-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx-demo
        image: billy98/nginx-prometheus-metrics:latest
        ports:
        - name: http-metrics
          containerPort: 9527
        - name: web
          containerPort: 80
        - name: test
          containerPort: 1314
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-demo
  name: nginx-demo
  namespace: default
spec:
  ports:
  - name: http-metrics
    port: 9527
    protocol: TCP
    targetPort: 9527
  - name: web
    port: 80
    protocol: TCP
    targetPort: 80
  - name: test
    port: 1314
    protocol: TCP
    targetPort: 1314
  selector:
    app: nginx-demo
  type: ClusterIP
2. Create the ServiceMonitor
Because this Prometheus instance is configured to scrape only ServiceMonitors labelled release: p, our application's ServiceMonitor must carry that label as well.
[root@node-01 ~]# kubectl -n monitoring get prometheus p-prometheus -o yaml
...
  serviceMonitorSelector:
    matchLabels:
      release: p
...(rest omitted)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: nginx-demo
    release: p
  name: nginx-demo
  namespace: monitoring      # the namespace Prometheus runs in
spec:
  endpoints:
  - interval: 15s
    port: http-metrics
  namespaceSelector:
    matchNames:
    - default                # the namespace of the nginx demo
  selector:
    matchLabels:
      app: nginx-demo
A few words of explanation. ServiceMonitor is an abstraction introduced by the Prometheus Operator: it replaces hand-written scrape-target configuration with dynamic discovery. A ServiceMonitor hooks into the Service that fronts a Deployment, selecting Services by label selector and automatically discovering the Pods behind them. Note that in this setup the ServiceMonitor itself always lives in the monitoring namespace, while namespaceSelector selects the namespace the application runs in.
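Before creating the ServiceMonitor it is worth confirming that the two selectors will actually match something; a quick check, using the object names from above:

# the Service must carry the label the ServiceMonitor selects on
kubectl -n default get svc nginx-demo --show-labels

# and the Prometheus object only picks up ServiceMonitors labelled release: p
kubectl -n monitoring get prometheus p-prometheus -o jsonpath='{.spec.serviceMonitorSelector}'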
Once it is created, the endpoints are visible:
[root@k8s-master mysql]# kubectl get ep | grep nginx
guiding-dachshund-nginx   10.254.2.251:80                                        119m
nginx-demo                10.254.1.189:9527,10.254.1.189:80,10.254.1.189:1314    115m
[root@k8s-master mysql]#
Then we hit 10.254.1.189:1314 to generate some test metrics:
[root@k8s-master mysql]# curl 10.254.1.189:1314
hello world
[root@k8s-master mysql]#
Check the monitoring data:
[root@k8s-master mysql]# curl 10.254.1.189:9527/metrics
# HELP nginx_http_connections Number of HTTP connections
# TYPE nginx_http_connections gauge
nginx_http_connections{state="active"} 3
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="waiting"} 2
nginx_http_connections{state="writing"} 1
# HELP nginx_http_request_bytes_sent Number of HTTP request bytes sent
# TYPE nginx_http_request_bytes_sent counter
nginx_http_request_bytes_sent{host="10.254.1.189"} 1165650
nginx_http_request_bytes_sent{host="testservers"} 160
# HELP nginx_http_request_time HTTP request time
# TYPE nginx_http_request_time histogram
nginx_http_request_time_bucket{host="10.254.1.189",le="00.005"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.010"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.020"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.030"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.050"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.075"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.100"} 417
3. Verify
Open Prometheus and verify data collection: under Status → Service Discovery, if the number of active targets equals the number of Pods, scraping is working.
Open the Graph page, select the metric name we just exposed, and click Execute to see the collected samples. For query syntax, refer to the Prometheus query examples.
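For example, using the metric names from the exporter output above, queries like the following can be pasted into the Graph page (a sketch):

# current connection-state gauges
nginx_http_connections

# 95th percentile request time over the last 5 minutes, per host, from the histogram buckets
histogram_quantile(0.95, sum(rate(nginx_http_request_time_bucket[5m])) by (le, host))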
1.3 Adding Nginx to Grafana
There is no particularly good ready-made dashboard for this exporter; create one yourself, pick Prometheus as the data source, paste in the PromQL queries, and adjust as needed.
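For instance, two panel queries built from the metrics this exporter exposes (a sketch; adjust labels and ranges to taste):

# active connections
nginx_http_connections{state="active"}

# bytes sent per second over the last 5 minutes, per host
rate(nginx_http_request_bytes_sent[5m])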
2 Custom MySQL Monitoring
https://blog.csdn.net/travellersY/article/details/84632679
The same three steps as before.
Steps
1. Deploy the collector (exporter) with a Deployment
2. Create a Service for the collector ## clusterIP: None
3. Create a ServiceMonitor
2.1 Deploy a MySQL instance as a monitoring example
The simplest possible MySQL deployment on Kubernetes:
# cat mysql-deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment                    # Deployment controller
metadata:
  name: mysql                       # name of the Deployment, globally unique
spec:
  replicas: 1                       # desired number of Pod replicas
  template:                         # Pod template used to create the replicas
    metadata:
      labels:
        app: mysql                  # Pod label, matched by the Deployment's selector
    spec:
      containers:                   # container definitions inside the Pod
      - name: mysql                 # container name
        image: mysql:5.7            # Docker image
        ports:
        - containerPort: 3306       # port the application in the container listens on
        env:                        # environment variables injected into the container
        - name: MYSQL_ROOT_PASSWORD # initial root password
          value: "123456"

# cat mysql-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  type: NodePort
  ports:
  - port: 3306
    nodePort: 30001
  selector:
    app: mysql
Connection test
[root@k8s-master ~]# kubectl get pod,svc | grep mysql
pod/mysql-94f6bbcfd-9nl7w   1/1     Running   0          116m
service/mysql   NodePort   10.106.33.138   <none>        3306:30001/TCP   110m
[root@k8s-master ~]# mysql -uroot -p123456 -h10.106.33.138 -P3306
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 656
Server version: 5.7.27 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> Bye
[root@k8s-master ~]# mysql -uroot -p123456 -h10.6.76.23 -P30001
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 661
Server version: 5.7.27 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>
2.2 Download prometheus-mysql-exporter
[root@k8s-master helm]# mkdir helm_chart
[root@k8s-master helm]# cd helm_chart/
[root@k8s-master helm_chart]# ls
[root@k8s-master helm_chart]# git clone https://github.com/helm/charts.git
Cloning into 'charts'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 84313 (delta 2), reused 2 (delta 0), pack-reused 84306
Receiving objects: 100% (84313/84313), 23.63 MiB | 80.00 KiB/s, done.
Resolving deltas: 100% (61834/61834), done.
[root@k8s-master helm_chart]#
[root@k8s-master helm_chart]# cd charts/stable/prometheus-mysql-exporter/
[root@k8s-master prometheus-mysql-exporter]# ls
a.yaml  Chart.yaml  OWNERS  README.md  templates  values.yaml
[root@k8s-master prometheus-mysql-exporter]#
2.3 Create the prometheus-mysql-exporter service
Add the MySQL exporter, prometheus-mysql-exporter, to Kubernetes. We install it with Helm following the steps on GitHub, changing the datasource in values.yaml to the address of the MySQL instance running in the cluster.
[root@k8s-master prometheus-mysql-exporter]# cat values.yaml
...
mysql:
  db: ""
  host: "10.106.33.138"
  param: ""
  pass: "123456"
  port: 3306
  protocol: ""
  user: "root"
Install:
helm install --name my-release -f values.yaml ../prometheus-mysql-exporter
[root@k8s-master prometheus-mysql-exporter]# kubectl get pod,svc | grep mysql
pod/my-release-prometheus-mysql-exporter-75cb8bffc7-qqckz   1/1     Running   0          107m
pod/mysql-94f6bbcfd-9nl7w                                   1/1     Running   0          120m
service/my-release-prometheus-mysql-exporter   ClusterIP   10.104.90.123   <none>        9104/TCP         107m
service/mysql                                  NodePort    10.106.33.138   <none>        3306:30001/TCP   113m
[root@k8s-master prometheus-mysql-exporter]#
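If you prefer not to edit values.yaml, the same overrides can be passed on the command line; a sketch assuming Helm v2 (as used above) and the chart checked out locally, with the value keys taken from the values.yaml shown earlier:

helm install --name my-release \
  --set mysql.host=10.106.33.138,mysql.port=3306,mysql.user=root,mysql.pass=123456 \
  ../prometheus-mysql-exporter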
Test the connection to MySQL
A mysql_up value of 1 means monitoring data is being fetched normally; if it is not 1, check the MySQL connection settings, permissions, and logs.
[root@k8s-master prometheus-mysql-exporter]# curl 10.104.90.123:9104/metrics | grep mysql_up
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  177k  100  177k    0     0  3176k      0 --:--:-- --:--:-- --:--:-- 3231k
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
[root@k8s-master prometheus-mysql-exporter]#
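Since the exporter exposes mysql_up, a simple availability rule can be built on it; a hedged sketch in the same format as the alerting rules shown in section 2.6 below (the threshold and wording are illustrative, not part of the shipped rules):

- alert: MySQLDown
  expr: mysql_up == 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: mysql_up is 0"
    description: "{{$labels.instance}}: the exporter cannot reach MySQL (current value is: {{ $value }})"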
2.4 Create the ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor                     # the resource type is ServiceMonitor
metadata:
  labels:
    prometheus: kube-prometheus          # by default Prometheus discovers ServiceMonitors via the label prometheus: kube-prometheus; with this label the Prometheus server will pick this ServiceMonitor up
  name: prometheus-exporter-mysql
  namespace: monitoring
spec:
  jobLabel: app                          # the value of the label named here becomes job_name in the generated scrape_config (i.e. the target); if omitted, the Service name is used
  selector:
    matchLabels:                         # labels of the Services this ServiceMonitor matches; with matchLabels every listed label must match, with matchExpressions a Service matching at least one expression is selected
      app: prometheus-mysql-exporter     # the mysql-exporter Service shown earlier carries the label app: prometheus-mysql-exporter, so this selects it
  namespaceSelector:
    #any: true                           # match in every namespace; to restrict to specific namespaces use matchNames: []
    matchNames:
    - default
  endpoints:
  - port: mysql-exporter                 # the mysql-exporter Service exposes the metrics on Port: mysql-exporter 9104/TCP, so the port name mysql-exporter goes here
    interval: 30s                        # scrape every 30s
    # path: /metrics                     # HTTP path to scrape for metrics, defaults to /metrics
    honorLabels: true
[root@k8s-master prometheus-mysql-exporter]# kubectl get servicemonitors.monitoring.coreos.com -n monitoring
NAME AGE
alertmanager 3d3h
coredns 3d3h
grafana 3d3h
kube-apiserver              3d3h
kube-controller-manager     3d3h
kube-scheduler              3d3h
kube-state-metrics          3d3h
kubelet                     3d2h
node-exporter               3d3h
prometheus-exporter-mysql   89m
prometheus-operator         3d3h
[root@k8s-master prometheus-mysql-exporter]#
Some useful metrics:
Query rate:
    mysql_global_status_questions

Write-operation rate:
    sum(rate(mysql_global_status_commands_total{command=~"insert|update|delete"}[2m])) without (command)

MySQL's default maximum number of connections is 151. To raise it temporarily:
    SET GLOBAL max_connections = 200;
To make the change permanent, add the following to the MySQL configuration file my.cnf:
    max_connections = 200

Remaining connections:
    mysql_global_variables_max_connections - mysql_global_status_threads_connected

Currently aborted connections:
    mysql_global_status_aborted_connects

Rate of disk read requests over the last 2 minutes:
    rate(mysql_global_status_innodb_buffer_pool_reads[2m])
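For example, the current connection usage as a percentage of the configured limit can be derived from two of the metrics above (a sketch):

mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100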
2.5 Add to Grafana
We use dashboard template 7362.
2.6 Alerting rules
[root@k8s-master manifests]# pwd
/root/prometheus/operator/kube-prometheus/manifests
[root@k8s-master manifests]# tail -78 prometheus-rules.yaml
  ############
  - name: MySQL
    rules:
    - alert: Restarted within the last 3 minutes
      expr: mysql_global_status_uptime < 180
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Instance_Reboot detected"
        description: "{{$labels.instance}}: Mysql_Instance_Reboot in 3 minute (up to now is: {{ $value }} seconds"
    - alert: Queries per second
      expr: rate(mysql_global_status_questions[5m]) > 500
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_High_QPS detected"
        description: "{{$labels.instance}}: Mysql operation is more than 500 per second ,(current value is: {{ $value }})"
    - alert: Connections per second
      expr: rate(mysql_global_status_connections[5m]) > 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql Too Many Connections detected"
        description: "{{$labels.instance}}: Mysql Connections is more than 100 per second ,(current value is: {{ $value }})"
    - alert: MySQL receive rate (Mbps)
      expr: rate(mysql_global_status_bytes_received[3m]) * 1024 * 8 > 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_High_Recv_Rate detected"
        description: "{{$labels.instance}}: Mysql_Receive_Rate is more than 100Mbps ,(current value is: {{ $value }})"
    - alert: MySQL send rate (Mbps)
      expr: rate(mysql_global_status_bytes_sent[3m]) * 1024 * 8 > 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_High_Send_Rate detected"
        description: "{{$labels.instance}}: Mysql data Send Rate is more than 100Mbps ,(current value is: {{ $value }})"
    - alert: Slow queries
      expr: rate(mysql_global_status_slow_queries[30m]) > 3
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Too_Many_Slow_Query detected"
        description: "{{$labels.instance}}: Mysql current Slow_Query Sql is more than 3 ,(current value is: {{ $value }})"
    - alert: Deadlock
      expr: mysql_global_status_innodb_deadlocks > 0
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Deadlock detected"
        description: "{{$labels.instance}}: Mysql Deadlock was found ,(current value is: {{ $value }})"
    - alert: Active threads below 30%
      expr: mysql_global_status_threads_running / mysql_global_status_threads_connected * 100 < 30
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Too_Many_sleep_threads detected"
        description: "{{$labels.instance}}: Mysql_sleep_threads percent is more than {{ $value }}, please clean the sleeping threads"
    - alert: InnoDB cache using over 80% of the buffer pool
      expr: (mysql_global_status_innodb_page_size * on (instance) mysql_global_status_buffer_pool_pages{state="data"} + on (instance) mysql_global_variables_innodb_log_buffer_size + on (instance) mysql_global_variables_innodb_additional_mem_pool_size + on (instance) mysql_global_status_innodb_mem_dictionary + on (instance) mysql_global_variables_key_buffer_size + on (instance) mysql_global_variables_query_cache_size + on (instance) mysql_global_status_innodb_mem_adaptive_hash ) / on (instance) mysql_global_variables_innodb_buffer_pool_size * 100 > 80
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_innodb_Cache_insufficient detected"
        description: "{{$labels.instance}}: Mysql innodb_Cache was used more than 80% ,(current value is: {{ $value }})"
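For reference, in kube-prometheus this file defines a PrometheusRule object, and its labels have to match the ruleSelector of the Prometheus object or the new group is never loaded. A hedged sketch of the surrounding structure (the exact labels and object name may differ in your deployment); after editing, re-apply it with kubectl apply -f prometheus-rules.yaml:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s        # assumption: whatever labels your Prometheus ruleSelector expects
    role: alert-rules
  name: prometheus-k8s-rules
  namespace: monitoring
spec:
  groups:
  - name: MySQL
    rules: []              # the MySQL rules shown above go here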
3 Configure alert delivery
3.1 Review the relevant configuration files
To hook alerting rules up to receivers, configure the receivers in the AlertManager configuration file.
First change the alertmanager-main Service to type NodePort; once that is done, the AlertManager configuration can be viewed under the Status page of its web UI:
[root@k8s-master manifests]# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
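Applying the change and finding the assigned NodePort (a sketch, run from the manifests directory above):

kubectl apply -f alertmanager-service.yaml
kubectl -n monitoring get svc alertmanager-main
# then open http://<node-ip>:<nodeport>/#/status to see the running configuration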
This configuration actually comes from the alertmanager-secret.yaml file we created earlier in the prometheus-operator/contrib/kube-prometheus/manifests directory:
[root@k8s-master manifests]# cat alertmanager-secret.yaml
apiVersion: v1
data:
  alertmanager.yaml: Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gImpvYiIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
The value of the alertmanager.yaml key can be base64-decoded:
[root@k8s-master manifests]# echo "Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gImpvYiIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==" | base64 -d
"global":
  "resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
  "group_by":
  - "job"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "null"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "Watchdog"
    "receiver": "null"
[root@k8s-master manifests]#
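Equivalently, the running configuration can be decoded straight from the cluster without copying the base64 string by hand (a sketch):

kubectl -n monitoring get secret alertmanager-main -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d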
3.2 Prepare a DingTalk robot
This works the same way as before.
Unfortunately DingTalk happened to be in the middle of an upgrade and new robots could not be created, so we reuse the one left over from the earlier Jenkins setup.
3.3 Configure DingTalk alert delivery
[root@k8s-master manifests]# cat dingtalk.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  template:
    metadata:
      labels:
        app: dingtalk-hook
    spec:
      containers:
      - name: dingtalk-hook
        image: cnych/alertmanager-dingtalk-hook:v0.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
          name: http
        env:
        - name: ROBOT_TOKEN
          valueFrom:
            secretKeyRef:
              name: dingtalk-secret
              key: token
        resources:
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  selector:
    app: dingtalk-hook
  ports:
  - name: hook
    port: 5000
    targetPort: http
Note that we declare a ROBOT_TOKEN environment variable above. Since it is relatively sensitive, we read it from a Secret object: create a Secret named dingtalk-secret with the command below, then deploy the resources above.
[root@k8s-master alertmanager]# kubectl create secret generic dingtalk-secret --from-literal=token=17549607d838b3015d183384ffe53333b13df0a98563150df241535808e10781 -n monitoring
secret/dingtalk-secret created
[root@k8s-master alertmanager]# kubectl create -f dingtalk-hook.yaml
deployment.extensions/dingtalk-hook created
service/dingtalk-hook created
[root@k8s-master manifests]# kubectl -n monitoring get secrets | grep dingtalk
dingtalk-secret   Opaque   1      61m
[root@k8s-master manifests]# kubectl -n monitoring get pod,svc | grep dingtalk
pod/dingtalk-hook-686ddd6976-pq4fk   1/1     Running   0          59m
service/dingtalk-hook   ClusterIP   10.111.250.130   <none>        5000/TCP   59m
[root@k8s-master manifests]#
3.4 Configure alert receivers
With the relay deployed, we can now configure a webhook for AlertManager by adding a route and receiver to the configuration above:
[root@k8s-master manifests]# cat alertmanager.yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'w.jjwx@163.com'
  smtp_auth_username: 'w.jjwx@163.com'
  smtp_auth_password: '<password>'
  smtp_hello: '163.com'
  smtp_require_tls: false
route:
  group_by: ['job', 'severity']
  group_wait: 30s          # short intervals for testing
  group_interval: 1m
  repeat_interval: 2m
  #group_interval: 5m
  #repeat_interval: 12h
  receiver: default
  # receiver: webhook
  routes:
  - receiver: webhook
    match:
      alertname: CPUThrottlingHigh
receivers:
- name: 'default'
  email_configs:
  - to: '314144952@qq.com'
    send_resolved: true
- name: 'webhook'
  webhook_configs:
  - url: 'http://dingtalk-hook.monitoring:5000'
    send_resolved: true
Save the file above as alertmanager.yaml, then create a Secret object from it:
# delete the previous secret object first
kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
Reload the configuration:
[root@k8s-master manifests]# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       NodePort    10.109.59.250    <none>        9093:30583/TCP               13m
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   9m
dingtalk-hook           ClusterIP   10.111.250.130   <none>        5000/TCP                     151m
grafana                 NodePort    10.100.31.73     <none>        3000:32339/TCP               4d2h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            4d2h
node-exporter           ClusterIP   None             <none>        9100/TCP                     4d2h
prometheus-adapter      ClusterIP   10.97.88.175     <none>        443/TCP                      4d2h
prometheus-k8s          NodePort    10.97.199.239    <none>        9090:31466/TCP               4d2h
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     4d2h
prometheus-operator     ClusterIP   None             <none>        8080/TCP                     4d2h
[root@k8s-master manifests]# curl -X POST "http://10.109.59.250:9093/-/reload"
[root@k8s-master manifests]# curl -X POST "http://10.97.199.239:9090/-/reload"
[root@k8s-master manifests]#
If there is a mistake in the configuration file, the reload reports it:
[root@k8s-master manifests]# curl -X POST "http://10.109.59.250:9093/-/reload"
failed to reload config: undefined receiver "webhook" used in route
[root@k8s-master manifests]#
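Once the reload succeeds, you can push a synthetic alert straight into AlertManager to exercise the routing end to end; a sketch using the v1 alerts API and the alertname matched by the webhook route above:

curl -H "Content-Type: application/json" -X POST "http://10.109.59.250:9093/api/v1/alerts" \
  -d '[{"labels": {"alertname": "CPUThrottlingHigh", "severity": "warning"}, "annotations": {"summary": "routing test"}}]'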
We defined two receivers: the default one delivers alerts by email, while the CPUThrottlingHigh alert is routed to the webhook, which is the DingTalk relay server defined in the earlier lesson. Soon after completing the steps above, a DingTalk message arrives:
DingTalk