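Prometheus Overview
Prometheus is an open-source combination of monitoring, alerting, and a time series database, originally developed at SoundCloud. As more companies and organizations adopted it and its community grew very active, it became a standalone open-source project maintained independently of any single company. The Google SRE book also mentions Prometheus as an implementation similar to Google's Borgmon monitoring system. Kubernetes, today's most common container management system, is usually paired with Prometheus for monitoring.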
https://prometheus.io
https://github.com/prometheus
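Prometheus features:
• Multi-dimensional data model: time series identified by a metric name and key/value pairs
• PromQL: a flexible query language that uses this dimensionality for complex queries
• No dependency on distributed storage; a single server node works on its own
• Time series are collected by pulling over HTTP
• Pushing time series is supported through the Pushgateway component
• Targets are found through service discovery or static configuration
• Multiple graphing modes and dashboard support (Grafana)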
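As a rough illustration of how PromQL is used against a running server, the sketch below wraps the server's HTTP instant-query endpoint in a small shell function. The function name `prom_query` and the `PROM_URL` variable are illustrative names, and `localhost:9090` assumes a default single-host install.

```shell
# Address of the Prometheus server (assumption: default port on this host).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_query EXPR: run an instant PromQL query and print the raw JSON reply.
# -G makes curl send the URL-encoded expression as a GET query string.
prom_query() {
  curl -sS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `prom_query 'up'` asks for the `up` series, which Prometheus records for every scrape target (1 when the last scrape succeeded, 0 when it failed).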
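Prometheus components and architecture:
• Prometheus Server: scrapes metrics, stores time series data, and provides a query interface
• Client Library: client libraries for instrumenting application code
• Push Gateway: holds metrics for a short time; mainly used for short-lived jobs
• Exporters: collect metrics from existing third-party services and expose them as metrics endpoints
• Alertmanager: handles alerts
• Web UI: a simple web console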
Prometheus Deployment
Binary deployment: https://prometheus.io/docs/prometheus/latest/getting_started/
Docker deployment: https://prometheus.io/docs/prometheus/latest/installation/
The deployment below uses two machines:
Host | IP address | Software
master | 192.168.1.128 | prometheus + grafana, node_exporter
node | 192.168.1.129 | node_exporter, mysql
• Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.17.1/prometheus-2.17.1.linux-amd64.tar.gz
tar -xf prometheus-2.17.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/ && mv prometheus-2.17.1.linux-amd64 prometheus && cd prometheus/
cat /usr/lib/systemd/system/prometheus.service
[Unit]
Description=https://prometheus.io/
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus
netstat -lntp | grep prometheus
• Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
tar -xf node_exporter-0.18.1.linux-amd64.tar.gz -C /usr/local
cd /usr/local && mv node_exporter-0.18.1.linux-amd64 node_exporter
cat /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=Open node_exporter server daemon
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|sshd|nginx).service
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter
ps -ef | grep node_exporter
• Install mysqld_exporter
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
tar -xf mysqld_exporter-0.12.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/ && mv mysqld_exporter-0.12.1.linux-amd64/ mysqld_exporter && cd mysqld_exporter/
mysqld_exporter connects to MySQL and therefore needs MySQL privileges, so first create a user for it and grant the required privileges.
mysql> CREATE USER 'exporter'@'localhost' IDENTIFIED BY '123456';
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
Create the .my.cnf file:
vim /usr/local/mysqld_exporter/.my.cnf
[client]
user=exporter
password=123456
Start the service:
cd /usr/local/mysqld_exporter
./mysqld_exporter --config.my-cnf=.my.cnf &   # start mysqld_exporter in the background
• Edit the configuration file prometheus.yml
Inline target lists belong under static_configs. file_sd_configs (dynamic discovery) instead takes a files list plus an optional refresh_interval and reads its targets from those files.
vim prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:  # static discovery
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
  - job_name: 'master'
    static_configs:
      - targets: ['192.168.1.128:9100']  # node_exporter on the master host; 9100 is node_exporter's port
        labels:
          instance: master_pro  # custom name; pick something descriptive
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.129:9100']  # node_exporter on another machine, addressed by IP
        labels:
          instance: node1  # custom name; pick something descriptive
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.1.129:9104']
        labels:
          instance: db1
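If file-based discovery is wanted, the targets move out of prometheus.yml into separate files that Prometheus re-reads when they change. The following is a minimal sketch under stated assumptions: it is run from the directory that holds prometheus.yml, and the `targets/` directory and `node.yml` file name are illustrative choices, not part of the original setup.

```shell
# Run from the directory holding prometheus.yml (assumed /usr/local/prometheus).
# The targets/ directory and node.yml file name are illustrative.
mkdir -p targets

# Target file: a YAML list of target groups, each with optional labels.
cat > targets/node.yml <<'EOF'
- targets: ['192.168.1.129:9100']
  labels:
    instance: node1
EOF

# Matching scrape job to place under scrape_configs (printed, not applied).
cat <<'EOF'
  - job_name: 'node'
    file_sd_configs:
      - files: ['targets/*.yml']
        refresh_interval: 5s
EOF
```

With this layout, adding or removing a host only means editing a file under `targets/`; prometheus.yml itself stays unchanged.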
• Check that the prometheus.yml configuration is valid, then reload it
[root@k8s-master prometheus]# pwd
/usr/local/prometheus
[root@k8s-master prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 0 rule files found
[root@k8s-master prometheus]# ps -ef | grep prometheus
root 8740 1 0 Apr09 ? 00:03:34 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
root 12856 12012 0 16:45 pts/0 00:00:00 grep --color=auto prometheus
[root@k8s-master prometheus]# kill -HUP 8740
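The check-then-reload sequence can be wrapped so that a broken configuration is never sent to the running server. This is a sketch, not the author's procedure: the function name `reload_prometheus` and the `PROM_DIR` and `PROMTOOL` variables are illustrative, and it assumes Prometheus runs as the systemd unit `prometheus` installed earlier.

```shell
# Paths follow the install layout used above; override via environment if needed.
PROM_DIR="${PROM_DIR:-/usr/local/prometheus}"
PROMTOOL="${PROMTOOL:-$PROM_DIR/promtool}"

# reload_prometheus: reload only when the config passes promtool's check.
reload_prometheus() {
  if "$PROMTOOL" check config "$PROM_DIR/prometheus.yml"; then
    # SIGHUP makes the running server re-read its configuration.
    systemctl kill -s HUP prometheus
  else
    echo "config check failed; not reloading" >&2
    return 1
  fi
}
```

Calling `reload_prometheus` after each edit avoids looking up the PID by hand, since systemd delivers the signal to the service's process.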
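• Check that Prometheus has picked up the targets
Once installation is complete, check whether Prometheus has recognized the monitored targets: open http://localhost:9090/targets (Status -> Targets). If each target's state shows UP, the setup succeeded.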
Grafana Deployment
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-6.7.2-1.x86_64.rpm
yum install grafana-enterprise-6.7.2-1.x86_64.rpm
systemctl start grafana-server.service
systemctl enable grafana-server.service
netstat -lntp | grep grafana-server
• Add a data source: click Add data source and select Prometheus
• Fill in the settings: enter the Prometheus URL and click "Save & Test"; a green message indicates success
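The same data source can also be registered without the web UI, through Grafana's provisioning files. The sketch below is one possible way to do it; the function name `write_datasource`, the data source name, and the URL are assumptions for a setup where Grafana and Prometheus share a host.

```shell
# write_datasource DIR: write a provisioning file that registers Prometheus.
# The data source name and URL are assumptions for a single-host setup.
write_datasource() {
  cat > "$1/prometheus.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

On an RPM install the provisioning directory is typically /etc/grafana/provisioning/datasources, so `write_datasource /etc/grafana/provisioning/datasources` followed by `systemctl restart grafana-server` should make the data source appear.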
Configure the Grafana node_exporter dashboard
• Import the Prometheus dashboard (Import dashboard)
https://grafana.com/grafana/download
https://grafana.com/dashboards/8919
• Open the dashboard to view it
• Import the Grafana mysqld_exporter dashboard in the same way as above.
mysqld_exporter: collects MySQL performance metrics.
https://grafana.com/dashboards/7362
https://github.com/prometheus/mysqld_exporter
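Alertmanager Monitoring and Alerting
Download link 1: https://prometheus.io/download/
Download link 2: https://github.com/prometheus/alertmanager/releases
Alerting in Prometheus goes through the Alertmanager component: alerting rules are written on the Prometheus server, and the mailbox is configured in Alertmanager.
Alertmanager and Prometheus are two separate components. The Prometheus server evaluates the alerting rules and sends the resulting alerts to Alertmanager, which applies silencing, inhibition, and aggregation and then sends notifications through channels such as email, DingTalk, and HipChat.
Alertmanager handles alerts sent by clients such as the Prometheus server. It deduplicates and groups them and routes them to the correct receiver, such as email, Slack, or DingTalk (DingTalk is reached through a webhook receiver rather than a built-in integration). Alertmanager also supports grouping, silencing, and inhibition of alerts.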
• Install Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz   # download Alertmanager
tar xvf alertmanager-0.20.0.linux-amd64.tar.gz -C /usr/local/   # extract into the target directory
cd /usr/local/ && mv alertmanager-0.20.0.linux-amd64 alertmanager
cd alertmanager/
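• Edit the alertmanager.yml configuration file
smtp_auth_password takes the authorization code issued by the mail provider for third-party clients, not the mailbox account's login password.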
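For orientation, a minimal email-only alertmanager.yml could look like the sketch below. Every host name, address, and the `AUTHORIZATION_CODE` value are placeholders, and the receiver name `mail` and the timing values are illustrative. The command is meant to be run in the Alertmanager directory and overwrites the sample alertmanager.yml shipped in the tarball.

```shell
# Writes ./alertmanager.yml; all SMTP values below are placeholders to replace.
cat > alertmanager.yml <<'EOF'
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'    # placeholder SMTP host:port
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'AUTHORIZATION_CODE'  # provider-issued code, not the login password
route:
  receiver: 'mail'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 1h
receivers:
  - name: 'mail'
    email_configs:
      - to: 'ops@example.com'
EOF
```

Here `group_by` controls which alerts are batched into one notification, and `repeat_interval` sets how long Alertmanager waits before re-sending a notification for an alert that is still firing.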
• Edit the Prometheus configuration file
vim /usr/local/prometheus/prometheus.yml
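Two things are normally needed on the Prometheus side: a rule file, and `alerting` plus `rule_files` entries pointing at Alertmanager and at that file. The sketch below assumes it is run from /usr/local/prometheus and that Alertmanager listens on its default port 9093 on the same host. The `rules/` path, the group name, and the `InstanceDown` rule are illustrative examples, not the author's rules.

```shell
# Run from /usr/local/prometheus (assumption); the rules/ path is illustrative.
mkdir -p rules
cat > rules/node_alerts.yml <<'EOF'
groups:
  - name: node_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF

# Fragment to merge into prometheus.yml (printed, not applied automatically).
cat <<'EOF'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
rule_files:
  - "rules/*.yml"
EOF
```

The rule file can be validated on its own with `./promtool check rules rules/node_alerts.yml` before Prometheus is restarted.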
• Start Alertmanager
./amtool check-config alertmanager.yml   # check that the configuration is valid
./alertmanager --config.file=alertmanager.yml &   # start with this config file, in the background
Restart Prometheus:
systemctl restart prometheus
• Open http://localhost:9090/alerts to view the rules
• Check the alert email
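To confirm mail delivery without waiting for a real rule to fire, a synthetic alert can be posted straight to Alertmanager. This is a sketch that assumes Alertmanager's v2 HTTP API on the default port 9093; the function name `send_test_alert`, the `AM_URL` variable, and the `severity: test` label are illustrative.

```shell
# Address of Alertmanager (assumption: default port on this host).
AM_URL="${AM_URL:-http://localhost:9093}"

# send_test_alert NAME: post one synthetic alert to Alertmanager's v2 API.
send_test_alert() {
  curl -sS -X POST "${AM_URL}/api/v2/alerts" \
    -H 'Content-Type: application/json' \
    -d "[{\"labels\":{\"alertname\":\"$1\",\"severity\":\"test\"}}]"
}
```

Running `send_test_alert TestAlert` should produce a notification once the route's group_wait has elapsed, provided the SMTP settings are correct.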
Further reading:
https://blog.csdn.net/liukuan73/article/details/78881008
https://blog.csdn.net/aixiaoyang168/article/details/98474494