【阿圓實驗】Alertmanager HA high-availability configuration

Reposted from the original article · 2019-04-30 15:02 · Tags: prometheus monitoring / monitoring

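Note: if you do not use the supervisor process manager, follow only the configuration and skip the supervisor-related commands. Alertmanager must be version 0.15.2 or later. Older versions do not support this cluster configuration.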

I. Alertmanager high availability

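The setup below uses supervisor. You can also put the same flags on a single command line and run it directly on the server. If you do, remember to append & so that it runs in the background.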

1. Configure the Alertmanager cluster

1.1 Modify alertmanager.yml on each node

cd /data/yy-monitor-server/etc

vim alertmanager.yml

 
    # The root route on which each incoming alert enters.
    route:
      routes:
        # (existing child routes, unchanged)
      group_wait: 15s
      group_interval: 15s

1.2 Modify the startup file

On the first node (10.22.0.1002), run vim /etc/supervisord.d/yy-monitor-server.ini

 
    [program:alertmanager]
    priority = 3
    user = yy
    ; --cluster.listen-address: this node's IP plus a port of your choice for cluster traffic
    command = /usr/bin/alertmanager --cluster.listen-address="10.22.0.1002:12001" --log.level=debug

Configuration on the other nodes (10.22.0.1001 is shown; on each node use its own IP and cluster port):

 
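    [program:alertmanager]
    priority = 3
    user = yy
    ; --cluster.listen-address: this node's IP plus a port of your choice for cluster traffic
    ; --cluster.peer: the cluster address of one existing node to join
    command = /usr/bin/alertmanager --cluster.listen-address="10.22.0.1001:12002" --cluster.peer=10.22.0.1002:12001 --log.level=debug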
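Across the three nodes, the supervisor sections above differ in only two ways: the listen address, and whether --cluster.peer is present. The following Python sketch generates the [program:alertmanager] section for each node so the files stay consistent. It is a hypothetical helper and not part of the original setup. The node list mirrors the article's addresses, and port 12003 for 10.22.0.1000 is taken from the log excerpt later in the article. Any other flags your installation needs, such as the config file path, would have to be added.

```python
# Hypothetical helper: node list mirrors the article's addresses.
# The first entry is the node the others join (it gets no --cluster.peer).
NODES = [
    ("10.22.0.1002", 12001),
    ("10.22.0.1001", 12002),
    ("10.22.0.1000", 12003),  # port taken from the article's log excerpt
]


def alertmanager_command(index: int, nodes=NODES, binary="/usr/bin/alertmanager") -> str:
    """Build the supervisor `command =` value for nodes[index]."""
    ip, port = nodes[index]
    args = [binary, f"--cluster.listen-address={ip}:{port}"]
    if index != 0:
        # Any existing member can serve as the join point; the article uses the first node.
        seed_ip, seed_port = nodes[0]
        args.append(f"--cluster.peer={seed_ip}:{seed_port}")
    args.append("--log.level=debug")
    return " ".join(args)


def supervisor_section(index: int, nodes=NODES) -> str:
    """Render the [program:alertmanager] section for one node."""
    return "\n".join([
        "[program:alertmanager]",
        "priority = 3",
        "user = yy",
        f"command = {alertmanager_command(index, nodes)}",
    ])


if __name__ == "__main__":
    for i, (ip, _) in enumerate(NODES):
        print(f"; --- {ip} ---")
        print(supervisor_section(i))
        print()
```

The comment lines are emitted with `;` on their own lines. Supervisor does not treat a `#` inside a `command` value as a comment, so such a `#` would otherwise be passed to Alertmanager as an argument.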

Restart so that the new configuration takes effect:

systemctl restart supervisord

supervisorctl restart alertmanager

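2. Check the logs

cd /data/yy-monitor-server/log

tail -f alertmanager.log

For example, the Alertmanager log on machine 1002 shows it exchanging state (push/pull sync) with 1001 and 1000: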

 
    level=debug ts=2018-08-28T08:58:44.75092899Z caller=cluster.go:287 component=cluster memberlist="2018/08/28 16:58:44 [DEBUG] memberlist: Initiating push/pull sync with: 10.22.0.1001:12002\n"
    level=debug ts=2018-08-28T08:59:21.675338872Z caller=cluster.go:287 component=cluster memberlist="2018/08/28 16:59:21 [DEBUG] memberlist: Stream connection from=10.22.0.1001:42736\n"
    level=debug ts=2018-08-28T08:59:44.754235616Z caller=cluster.go:287 component=cluster memberlist="2018/08/28 16:59:44 [DEBUG] memberlist: Initiating push/pull sync with: 10.22.0.1000:12003\n"
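To confirm from a long log that a node is syncing with every expected peer, you can collect the addresses from the "Initiating push/pull sync with:" messages in the excerpt above. The "Stream connection from=" lines are skipped on purpose: they carry the remote side's ephemeral source port (42736 above), not its cluster port. This is a hypothetical helper and assumes the log format shown in the excerpt.

```python
import re
import sys
from typing import Iterable

# Matches e.g. "memberlist: Initiating push/pull sync with: 10.22.0.1001:12002"
SYNC_RE = re.compile(r"Initiating push/pull sync with: (\S+?:\d+)")


def sync_peers(lines: Iterable[str]) -> set[str]:
    """Return the peer addresses this node has initiated push/pull sync with."""
    peers = set()
    for line in lines:
        match = SYNC_RE.search(line)
        if match:
            peers.add(match.group(1))
    return peers


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python sync_peers.py /data/yy-monitor-server/log/alertmanager.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        for peer in sorted(sync_peers(fh)):
            print(peer)
```

On machine 1002, the result should contain the cluster addresses of 1001 and 1000. If one is missing, that node has not joined.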

Once all nodes are up, open http://localhost:9093/#/status on any Alertmanager node to see the current state of the Alertmanager cluster.
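The same information can be read without a browser. The following sketch assumes Alertmanager 0.16 or later, which serves the v2 API at /api/v2/status; the 0.15.x line only has /api/v1/status, whose JSON layout differs. It prints the cluster state (for example "ready") and the address of every peer the node knows about.

```python
import json
import sys
import urllib.request

# Assumption: Alertmanager >= 0.16 (v2 API). Adjust for /api/v1/status on 0.15.x.


def fetch_status(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/api/v2/status and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/api/v2/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def cluster_summary(status: dict) -> tuple[str, list[str]]:
    """Return (cluster status, sorted peer addresses) from a v2 status document."""
    cluster = status.get("cluster", {})
    peers = sorted(peer["address"] for peer in cluster.get("peers", []))
    return cluster.get("status", "unknown"), peers


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python am_status.py http://10.22.0.1002:9093
    state, peers = cluster_summary(fetch_status(sys.argv[1]))
    print(f"cluster status: {state}")
    for address in peers:
        print(f"  peer {address}")
```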

3. Modify prometheus.yml on each node

cd /data/yy-monitor-server/etc

vi prometheus.yml

Machine 1002

 
    global:
      scrape_interval:     5s
      scrape_timeout:      5s
      evaluation_interval: 5s
      # The labels to add to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        dc: europe1

    # Alertmanager configuration
    alerting:
      alert_relabel_configs:
        - source_labels: [dc]
          regex: (.+)\d+
          target_label: dc
      alertmanagers:
        - static_configs:
            - targets: ['10.22.0.1000:9093', '10.22.0.1001:9093', '10.22.0.1002:9093']

Machine 1001

 
    global:
      scrape_interval:     5s
      scrape_timeout:      5s
      evaluation_interval: 5s
      # Note that this is different only by the trailing number.
      external_labels:
        dc: europe2

    # Alertmanager configuration
    alerting:
      alert_relabel_configs:
        - source_labels: [dc]
          regex: (.+)\d+
          target_label: dc
      alertmanagers:
        - static_configs:
            - targets: ['10.22.0.1000:9093', '10.22.0.1001:9093', '10.22.0.1002:9093']

Machine 1000

 
    global:
      scrape_interval:     5s
      scrape_timeout:      5s
      evaluation_interval: 5s
      external_labels:
        dc: europe3

    # Alertmanager configuration
    alerting:
      alert_relabel_configs:
        - source_labels: [dc]
          regex: (.+)\d+
          target_label: dc
      alertmanagers:
        - static_configs:
            - targets: ['10.22.0.1000:9093', '10.22.0.1001:9093', '10.22.0.1002:9093']
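Each Prometheus attaches a different dc value (europe1/2/3), which keeps their series distinguishable. The alert_relabel_configs rule strips the trailing digits before alerts are sent, so the same alert fired by all three servers reaches Alertmanager with an identical label set and can be deduplicated. The following Python sketch mimics that rule. It assumes Prometheus's default relabel action "replace" and replacement "$1", and it uses fullmatch because Prometheus anchors relabel regexes at both ends. The alertname is hypothetical.

```python
import re

# Mirrors: source_labels [dc], regex (.+)\d+, target_label dc,
# with the default action "replace" and replacement "$1".
DC_REGEX = re.compile(r"(.+)\d+")


def relabel_dc(labels: dict[str, str]) -> dict[str, str]:
    """Return a copy of the alert labels with dc rewritten the way the rule does."""
    out = dict(labels)
    match = DC_REGEX.fullmatch(labels.get("dc", ""))
    if match:  # no match: the replace action leaves the labels untouched
        out["dc"] = match.group(1)
    return out


# The same alert as fired by the three Prometheus servers (hypothetical alertname).
alerts = [{"alertname": "InstanceDown", "dc": dc} for dc in ("europe1", "europe2", "europe3")]
relabeled = [relabel_dc(a) for a in alerts]
print(relabeled[0])  # {'alertname': 'InstanceDown', 'dc': 'europe'}
```

Because `(.+)` is greedy, a value such as europe12 becomes europe1, not europe. Only the final digit is stripped, which matters if you number more than nine servers this way.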

4. Restart Prometheus

On 1000, 1001 and 1002:

 
    # supervisorctl restart prometheus
    prometheus: stopped
    prometheus: started

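II. Alertmanager proxy configuration

1. nginx configuration

Pick one host to run the proxy (e.g. 10.22.0.1002). The proxy is only for browsing the UI. Prometheus should keep sending alerts directly to all Alertmanagers, as configured in prometheus.yml above (see the "Important" note in the appendix).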

cd /data/yy-monitor-server/etc

vi nginx.conf

 
    # Alertmanager
    upstream alert {
        server 10.22.0.1002:9093;
        server 10.22.0.1001:9093;
        server 10.22.0.1000:9093;
    }

    server {
        # alertmanager
        location /alertmanager/ {
            proxy_pass http://alert/;
        }
    }

Restart nginx:

 
    # supervisorctl restart nginx
    nginx: stopped
    nginx: started

2. Verify the configuration

Stop Alertmanager on two of the three nodes:

 
    On 1002:
    # supervisorctl stop alertmanager
    alertmanager: stopped

    On 1001:
    # supervisorctl stop alertmanager
    alertmanager: stopped

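The UI is still reachable through the proxy because nginx passes requests to the remaining node (10.22.0.1000). This confirms that the proxy configuration works.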
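The following Python sketch repeats this check from a script instead of a browser. It is hypothetical and not part of the original setup. It probes the proxied UI plus each node's /-/healthy endpoint. The proxy URL assumes nginx listens on its default port 80 on 10.22.0.1002; adjust it to your actual listen port. The check function is injectable so the logic can be exercised offline.

```python
import sys
import urllib.request
from typing import Callable

# Hypothetical URL list; the proxy entry assumes nginx on port 80 of 10.22.0.1002.
URLS = [
    "http://10.22.0.1002/alertmanager/",    # through the nginx proxy
    "http://10.22.0.1002:9093/-/healthy",
    "http://10.22.0.1001:9093/-/healthy",
    "http://10.22.0.1000:9093/-/healthy",
]


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def probe_all(urls, check: Callable[[str], bool] = http_ok) -> dict[str, bool]:
    """Probe every URL with `check` and return {url: reachable}."""
    return {url: check(url) for url in urls}


if __name__ == "__main__" and "--run" in sys.argv:
    for url, ok in probe_all(URLS).items():
        print(f"{'UP  ' if ok else 'DOWN'} {url}")
```

With 1002 and 1001 stopped, you would expect the proxy and 1000 to report UP and the two stopped nodes to report DOWN.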

Appendix: https://github.com/prometheus/alertmanager#high-availability

To create a highly available cluster of the Alertmanager the instances need to be configured to communicate with each other. This is configured using the --cluster.* flags.

--cluster.listen-address string: cluster listen address (default "0.0.0.0:9094")
--cluster.advertise-address string: cluster advertise address
--cluster.peer value: initial peers (repeat flag for each additional peer)
--cluster.peer-timeout value: peer timeout period (default "15s")
--cluster.gossip-interval value: cluster message propagation speed (default "200ms")
--cluster.pushpull-interval value: lower values will increase convergence speeds at expense of bandwidth (default "1m0s")
--cluster.settle-timeout value: maximum time to wait for cluster connections to settle before evaluating notifications.
--cluster.tcp-timeout value: timeout value for tcp connections, reads and writes (default "10s")
--cluster.probe-timeout value: time to wait for ack before marking node unhealthy (default "500ms")
--cluster.probe-interval value: interval between random node probes (default "1s")

The chosen port in the cluster.listen-address flag is the port that needs to be specified in the cluster.peer flag of the other peers.

To start a cluster of three peers on your local machine use goreman and the Procfile within this repository.

goreman start

To point your Prometheus 1.4, or later, instance to multiple Alertmanagers, configure them in your prometheus.yml configuration file, for example:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager1:9093
      - alertmanager2:9093
      - alertmanager3:9093

Important: Do not load balance traffic between Prometheus and its Alertmanagers, but instead point Prometheus to a list of all Alertmanagers. The Alertmanager implementation expects all alerts to be sent to all Alertmanagers to ensure high availability.
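The "send to every Alertmanager" pattern can be illustrated with a small Python sketch. Prometheus already performs this fan-out itself when given the target list above, so this is only an illustration (for example, for pushing a manual test alert). It assumes Alertmanager 0.16 or later (POST /api/v2/alerts). The alert labels are hypothetical. Each replica receives the full alert list, and a failure on one node does not stop delivery to the others. Deduplication of the resulting notifications is left to the cluster's gossiped notification log.

```python
import json
import sys
import urllib.request
from datetime import datetime, timezone

# Assumptions: Alertmanager >= 0.16; targets mirror the prometheus.yml above.
TARGETS = ["10.22.0.1000:9093", "10.22.0.1001:9093", "10.22.0.1002:9093"]


def build_requests(targets, alerts) -> list[urllib.request.Request]:
    """One POST per Alertmanager, each carrying the full alert list."""
    body = json.dumps(alerts).encode("utf-8")
    return [
        urllib.request.Request(
            f"http://{target}/api/v2/alerts",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for target in targets
    ]


def send_to_all(targets, alerts, timeout: float = 5.0) -> dict[str, str]:
    """Send to every target; record the HTTP status or the error per node."""
    results = {}
    for target, req in zip(targets, build_requests(targets, alerts)):
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[target] = str(resp.status)
        except OSError as exc:
            results[target] = f"error: {exc}"
    return results


# Hypothetical manual test alert.
test_alerts = [{
    "labels": {"alertname": "HATest", "dc": "europe"},
    "annotations": {"summary": "manual HA test"},
    "startsAt": datetime.now(timezone.utc).isoformat(),
}]

if __name__ == "__main__" and "--send" in sys.argv:
    for target, result in send_to_all(TARGETS, test_alerts).items():
        print(target, result)
```

Placing a load balancer in front of the three Alertmanagers would collapse this into a single POST to one arbitrary replica, which is exactly what the note above warns against.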
