Prometheus Monitoring Installation

Architecture:
Server: 192.168.0.204
Client: 192.168.0.206

Environment preparation: install the Go language environment on all nodes
rz    # upload go1.12.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.12.linux-amd64.tar.gz
cat >> /etc/profile <<'EOF'
export PATH=$PATH:/usr/local/go/bin
EOF
source /etc/profile
go version

1. Server deployment

1.1 Package preparation
cd /usr/local/src
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz    # deployed on both server and client
wget https://github.com/prometheus/prometheus/releases/download/v2.7.1/prometheus-2.7.1.linux-amd64.tar.gz    # deployed on the server only
tar xf prometheus-2.7.1.linux-amd64.tar.gz
tar xf node_exporter-0.17.0.linux-amd64.tar.gz

1.2 Start node_exporter
# Verification: using Prometheus's own data as an example, query an expression in the web UI and view the result as a graph.
mv prometheus-2.7.1.linux-amd64 /usr/local/
mv node_exporter-0.17.0.linux-amd64 /usr/local/
ln -s /usr/local/prometheus-2.7.1.linux-amd64/ /usr/local/prometheus
ln -s /usr/local/node_exporter-0.17.0.linux-amd64/ /usr/local/node_exporter
cd /usr/local/node_exporter
./node_exporter &
netstat -lntp | grep 9100
Open in a browser: http://192.168.0.204:9100/metrics

1.3 Start Prometheus
cd /usr/local/prometheus
vi prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  - 'prometheus_rules.yml'    # must be defined (created below)

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          alias: prometheus
  - job_name: 'linux1'
    static_configs:
      - targets: ['192.168.0.204:9100']    # IP address of the node running node_exporter
        labels:
          alias: linux-node1
  - job_name: 'linux2'
    static_configs:
      - targets: ['192.168.0.206:9100']    # IP address of the node running node_exporter
        labels:
          alias: linux-node2

##############################################################
# Add alert rules. The heredoc delimiter is quoted so the shell does not expand $labels and $value.
cat > prometheus_rules.yml <<'EOF'
groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
EOF

Start Prometheus
cd /usr/local/prometheus
./prometheus
Open in a browser: http://192.168.0.204:9090/targets

2. Client deployment

2.1 Deploy node_exporter
Use the Prometheus web UI to verify that the client's Node Exporter data is being collected: memory, CPU load, disk, and other performance metrics.
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz    # client side; monitors at the hardware level
tar xf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/
ln -s /usr/local/node_exporter-0.17.0.linux-amd64/ /usr/local/node_exporter
cd /usr/local/node_exporter
./node_exporter &
netstat -lntp | grep 9100
Open in a browser: http://192.168.0.206:9100/metrics

# Custom metrics
Interceptor/filter: collects statistics on all application requests
Custom Collector: collects monitoring data related to the application's business capabilities

2.3 Monitoring MySQL (not done)
https://www.hi-linux.com/posts/27014.html    # for reference
cd /usr/local/src/
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.10.0/mysqld_exporter-0.10.0.linux-amd64.tar.gz    # deploy on the MySQL server; also deploy node_exporter there (see above)
tar xf mysqld_exporter-0.10.0.linux-amd64.tar.gz
mv mysqld_exporter-0.10.0.linux-amd64 /usr/local/
ln -s /usr/local/mysqld_exporter-0.10.0.linux-amd64/ /usr/local/mysqld_exporter

Load mysqld_exporter
Add a configuration file (requires an authorized MySQL user). mysqld_exporter connects to MySQL, so the user must be granted privileges first:
mysql> grant replication client, process on *.* to prometheus@"localhost" identified by "123456";
mysql> grant select on performance_schema.* to prometheus@"localhost";
cd /usr/local/mysqld_exporter/
vim .my.cnf

[client]
user=prometheus
password=123456

nohup ./mysqld_exporter --config.my-cnf=.my.cnf &    # start

2.4 Monitoring nginx (not done)
cd /usr/local
git clone https://github.com/vozlt/nginx-module-vts.git    # run on the nginx host
./configure --prefix=/usr/local/nginx-1.12.2 --user=nginx --group=nginx --with-http_stub_status_module --with-http_ssl_module --add-module=/usr/local/nginx-module-vts
make
nginx -s stop
\cp ./objs/nginx /usr/local/nginx/sbin/
vim nginx.conf

http {
    .....
    ### Prometheus configuration ##
    vhost_traffic_status_zone;
    vhost_traffic_status_filter_by_host on;    # enable per-vhost filtering
    ### Prometheus configuration ##
    .....
    server {
        location /status {
            #vhost_traffic_status off;
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }
    }
}

##############################################################
wget -c https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.9.1/nginx-vts-exporter-0.9.1.linux-amd64.tar.gz
tar xf nginx-vts-exporter-0.9.1.linux-amd64.tar.gz
./nginx-vts-exporter -nginx.scrape_timeout 10 -nginx.scrape_uri http://10.10.16.107/status/format/json &    # start the nginx Vhost Traffic exporter
http://10.10.16.107/status    # view the status of each node on the nginx host

3. Alertmanager alerting (installed on the server)

3.1 Download the alertmanager package
cd /usr/local
wget https://github.com/prometheus/alertmanager/releases/download/v0.16.1/alertmanager-0.16.1.linux-amd64.tar.gz
tar -axvf alertmanager-0.16.1.linux-amd64.tar.gz

3.2 Configure the default alertmanager startup yml file
mkdir -p /usr/local/alertmanager-0.16.1.linux-amd64/template/
cd /usr/local/alertmanager-0.16.1.linux-amd64/
cat > /usr/local/alertmanager-0.16.1.linux-amd64/simple.yml <<'EOF'
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '15613691030@163.com'
  smtp_auth_username: '15613691030'
  smtp_auth_password: 'Shaochuan@5tgb'
  smtp_require_tls: false
templates:
  - '/usr/local/alertmanager-0.16.1.linux-amd64/template/*.html'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '15613691030@163.com'
    html: '{{ template "alert.html" . }}'
    headers: { Subject: "[WARN] alert email test" }
EOF

3.3 Configure the alert email template
# The template file must be created. The heredoc delimiter is quoted so the shell does not expand $i and $alert.
cat > /usr/local/alertmanager-0.16.1.linux-amd64/template/alert.html <<'EOF'
{{ define "alert.html" }}
<table>
<tr><td>Alert name</td><td>Start time</td></tr>
{{ range $i, $alert := .Alerts }}
<tr><td>{{ index $alert.Labels "alertname" }}</td><td>{{ $alert.StartsAt }}</td></tr>
{{ end }}
</table>
{{ end }}
EOF

3.4 Configure alert.html
cat > /usr/local/alertmanager-0.16.1.linux-amd64/alert.html <<'EOF'
{{ define "alert.html" }}
<table>
<tr><td>Alert name</td><td>Start time</td></tr>
{{ range $i, $alert := .Alerts }}
<tr><td>{{ index $alert.Labels "alertname" }}</td><td>{{ $alert.StartsAt }}</td></tr>
{{ end }}
</table>
{{ end }}
EOF

3.5 Start the alertmanager service
./alertmanager --config.file=simple.yml    # start alertmanager

4. Grafana installation and startup (installed on the server)
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.3-1.x86_64.rpm
yum install -y urw-fonts
rpm -i grafana-5.2.3-1.x86_64.rpm
/sbin/chkconfig --add grafana-server
systemctl start grafana-server.service
Open in a browser: http://192.168.0.204:3000 (default account/password: admin/admin)
After logging in you are asked to change the password. Then click add datasource and select Prometheus 2.0 Stats, and the monitoring dashboard is displayed.

6. Prometheus monitoring summary

6.1 Keep NTP time synchronized
Prometheus depends heavily on accurate system time. The clock on the Prometheus server must stay in sync with the clocks on the monitored hosts at all times.
References:
https://blog.csdn.net/csolo/article/details/82460539
http://www.cnblogs.com/qianjingchen/articles/9578341.html
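
The time-sync requirement in 6.1 can be spot-checked from the server. The following is only a minimal sketch, not part of the original notes: the helper names skew_seconds and skew_ok and the 1-second tolerance are hypothetical, and the commented usage assumes node_exporter's time collector is enabled so that /metrics exposes node_time_seconds.

```shell
#!/bin/sh
# Hypothetical helpers for checking clock skew between two hosts.

# Print the absolute difference, in whole seconds, between two epoch timestamps.
skew_seconds() {
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

# Exit status 0 when the skew between $1 and $2 is at most $3 seconds.
skew_ok() {
    [ "$(skew_seconds "$1" "$2")" -le "$3" ]
}

# Usage against the client from this document (commented out: needs network access):
#   remote=$(curl -s http://192.168.0.206:9100/metrics \
#            | awk '/^node_time_seconds / {printf "%.0f", $2}')
#   skew_ok "$(date +%s)" "$remote" 1 || echo "clock skew exceeds 1s, check NTP"
```

The check reads the client's clock through the same exporter Prometheus scrapes, so it needs no extra access to the client. The result includes the time the HTTP request takes, so treat it as a rough indicator only.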