Building a Prometheus + Grafana Monitoring System with Docker

Reprinted from the original article (link at the end). 2021-06-04 15:22, tags: Docker / Monitoring

I. Introduction to Prometheus

1. Overview

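Prometheus is an open-source monitoring and alerting system and time series database (TSDB), originally developed at SoundCloud.
Prometheus is written in Go and was inspired by Google's internal Borgmon monitoring system. In 2016 it joined the Cloud Native Computing Foundation (CNCF), the Linux Foundation project launched with Google's backing, as the foundation's second hosted project after Kubernetes. The Prometheus open-source community is very active.
Compared with Heapster (a Kubernetes subproject, now retired, that collected cluster performance data), Prometheus is more complete and full-featured, and it performs well enough to support clusters of more than ten thousand machines.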

2. How it works

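Prometheus periodically scrapes the state of monitored components over HTTP. Any component that exposes the corresponding HTTP endpoint can be monitored, with no SDK or other integration work required.
This makes it a good fit for monitoring virtualized environments such as VMs, Docker and Kubernetes. The HTTP endpoint that exposes a component's metrics is called an exporter. Most components commonly used at internet companies already have ready-made exporters, for example Varnish, HAProxy, Nginx, MySQL, and Linux system information (disk, memory, CPU, network and so on).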
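To make the pull model concrete, here is a minimal sketch of a hand-written exporter using only Python's standard library. The metric name `demo_requests_total` and its value are hypothetical; real exporters are normally built with an official Prometheus client library rather than by hand.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counter maintained by the "monitored component".
REQUESTS_TOTAL = 42


def render_metrics() -> str:
    """Render the metrics in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Total requests handled (hypothetical metric).\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {REQUESTS_TOTAL}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def start_exporter(port: int = 0) -> HTTPServer:
    """Serve /metrics in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_exporter()
    url = f"http://127.0.0.1:{srv.server_address[1]}/metrics"
    # A Prometheus scrape is nothing more than this plain HTTP GET.
    print(urllib.request.urlopen(url).read().decode())
    srv.shutdown()
    srv.server_close()
```

Prometheus would scrape such a process simply by listing its host:port under `static_configs` in prometheus.yml.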

3. Architecture

Components

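Prometheus Server: the core of Prometheus, responsible for collecting, storing and querying monitoring data.
Prometheus Server can manage its monitoring targets through static configuration, or dynamically through Service Discovery, and pulls data from those targets. It then stores the collected data: Prometheus Server is itself a time series database that keeps the samples on local disk. It provides its own query language, PromQL, for querying and analyzing the data. In addition, through federation a Prometheus Server can pull data from other Prometheus Server instances.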

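Exporter: exposes a metrics endpoint as an HTTP service; by requesting that endpoint, Prometheus Server obtains the monitoring data it needs. Exporters fall into two categories:
   Direct collection: the target has built-in Prometheus support and exposes a metrics endpoint itself, for example cAdvisor, Kubernetes, etcd and Go kit.
   Indirect collection: the target does not support Prometheus natively, so a separate collector is written for it using a Prometheus client library, for example the MySQL exporter, JMX exporter and Consul exporter.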
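Whichever kind of exporter is used, what Prometheus Server fetches from the endpoint is plain text in the Prometheus exposition format. The parser below is an illustrative sketch, not Prometheus's real parser: it assumes one sample per line, no timestamps and no escaped quotes inside label values.

```python
import re
from typing import Dict, List, Tuple

# metric_name{label="value",...} value   (the {...} part is optional)
# Simplifying assumptions: no timestamps, no escaped quotes in label values.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Turn exposition-format text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank, # HELP and # TYPE lines
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() also accepts NaN and +Inf
    return samples


EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""

if __name__ == "__main__":
    for sample in parse_exposition(EXAMPLE):
        print(sample)
```

Each (name, label set) pair identifies one time series; Prometheus adds the scrape timestamp and appends the value to that series.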

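Service Discovery: Prometheus supports many service discovery mechanisms: files, DNS, Consul, Kubernetes, OpenStack, EC2 and more. The process is simple: Prometheus queries a third-party interface for the list of targets to monitor, then polls those targets for metrics.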
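With file-based discovery, Prometheus watches the files listed under `file_sd_configs` in prometheus.yml and updates the target list whenever they change; each file holds a JSON (or YAML) list of target groups. A minimal sketch that writes such a file, assuming a hypothetical file path and a hypothetical `env` label:

```python
import json
import os
import tempfile
from typing import Dict, List

TargetGroup = Dict[str, object]  # {"targets": [...], "labels": {...}}


def write_file_sd(path: str, groups: List[TargetGroup]) -> None:
    """Write a file_sd target file atomically.

    The data goes to a temporary file in the same directory first and is then
    renamed over the destination, so Prometheus never reads a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(groups, f, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


# Hypothetical target groups for the two hosts used in this article.
GROUPS = [
    {"targets": ["192.168.1.20:9100"], "labels": {"env": "server"}},
    {"targets": ["192.168.1.30:9100"], "labels": {"env": "client"}},
]

if __name__ == "__main__":
    write_file_sd("nodes.json", GROUPS)
```

Prometheus would then be pointed at the file with something like `file_sd_configs: - files: ['/etc/prometheus/targets/*.json']` (a hypothetical path; the directory would also have to be mounted into the container).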

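AlertManager: Prometheus Server supports alerting rules written in PromQL; when a rule's condition is met, an alert is generated. When Alertmanager receives alerts from Prometheus Server, it deduplicates and groups them and routes them to the appropriate receiver, which sends the notification. Common receivers include email, PagerDuty and webhooks.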
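The deduplication and grouping steps can be pictured with a toy model. This is a simplified sketch, not Alertmanager's actual implementation: each alert is reduced to its label set, identical label sets count as duplicates, and the rest are bucketed by the `group_by` labels (the alert names below are hypothetical).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert reduced to its label set (simplification)


def dedup_and_group(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Drop alerts with identical label sets, then bucket the rest by group_by labels."""
    seen = set()
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        fingerprint = tuple(sorted(alert.items()))  # same labels => same alert
        if fingerprint in seen:
            continue  # duplicate, e.g. sent again on the next evaluation
        seen.add(fingerprint)
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert)  # one notification would go out per group
    return dict(groups)


ALERTS = [
    {"alertname": "InstanceDown", "instance": "192.168.1.30:9100"},
    {"alertname": "InstanceDown", "instance": "192.168.1.30:9100"},  # duplicate
    {"alertname": "InstanceDown", "instance": "192.168.1.30:8088"},
    {"alertname": "HighCPU", "instance": "192.168.1.20:9100"},
]

if __name__ == "__main__":
    for key, members in dedup_and_group(ALERTS, ["alertname"]).items():
        print(key, len(members))
```

With `group_by: ['alertname']`, as in the Alertmanager configuration later in this article, the two distinct InstanceDown alerts would be reported together in one notification.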

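PushGateway: Prometheus collects data by having Prometheus Server pull from exporters, so when the network does not allow Prometheus Server to talk to an exporter directly, PushGateway can act as a relay. Monitoring data from the internal network is pushed to the gateway, and Prometheus Server then pulls it from PushGateway exactly as it pulls from an exporter.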
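Pushing is a plain HTTP request to the Pushgateway's `/metrics/job/<job>` path, optionally followed by `/<label>/<value>` grouping pairs, with a body in the text exposition format. A minimal sketch that builds such a request; the gateway address (Pushgateway's default port 9091 on the server host), the job name and the metric are all hypothetical, since this article does not deploy a Pushgateway:

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_push_request(gateway: str, job: str, body: str,
                       grouping: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build a POST to the Pushgateway's /metrics/job/<job>{/<label>/<value>} endpoint.

    Label values containing '/' need the Pushgateway's special base64 label
    encoding, which this sketch does not handle.
    """
    if not body.endswith("\n"):
        body += "\n"  # the text format requires every line to end with a line feed
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for name, value in (grouping or {}).items():
        path += f"/{name}/{urllib.parse.quote(value, safe='')}"
    return urllib.request.Request(
        url=gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        # POST replaces only metrics with the same names in this group;
        # PUT would replace every metric in the group.
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical batch job reporting when it last succeeded.
REQ = build_push_request(
    "http://192.168.1.20:9091",
    job="backup_job",
    body="backup_last_success_timestamp_seconds 1622790000",
    grouping={"instance": "db1"},
)
# urllib.request.urlopen(REQ) would actually send it.
```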

Workflow

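1 Prometheus server periodically pulls metrics from the configured jobs or exporters, pulls metrics that short-lived jobs have pushed to the Pushgateway, or pulls metrics from other Prometheus servers.
2 Prometheus server stores the collected metrics locally and evaluates the defined rules, recording new time series or pushing alerts to Alertmanager.
3 Alertmanager processes the received alerts according to its configuration file and sends the notifications.
4 The collected data is visualized in a graphical interface (such as Grafana).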

Commonly used exporters

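node-exporter: monitors host resources on compute nodes; it needs to be deployed on every compute node.
kube-state-metrics: the exporter through which Prometheus collects Kubernetes resource data; it covers most built-in Kubernetes resources such as pods, deployments and services. It also exposes metrics about itself, mainly the number of resources collected and the number of errors during collection.
cAdvisor (Container Advisor): monitors resource usage and performance of running containers.
    https://github.com/google/cadvisor
Blackbox_exporter: checks whether service containers are alive; it supports HTTP, DNS, TCP and ICMP probes.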

II. Prerequisites

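1. Two Docker hosts: server 192.168.1.20, client 192.168.1.30
2. An nginx service: 192.168.1.10
3. The monitoring server needs four services:
    Prometheus Server (the main Prometheus monitoring server)
    Node Exporter (collects host hardware and operating system information)
    cAdvisor (collects information about the containers running on the host)
    Grafana (displays the Prometheus monitoring dashboards)
4. Monitored hosts only need two:
    Node Exporter (collects host hardware and operating system information)
    cAdvisor (collects information about the containers running on the host)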

III. Deploy node_exporter (on both server and client)

docker pull prom/node-exporter

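Node Exporter is meant to observe the host, so the node_exporter README recommends running it with the host's network and PID namespaces and the host root filesystem mounted read-only (with --net="host" no -p mapping is needed; it listens on port 9100 of the host directly):

docker run --name=node-exporter -d \
--net="host" --pid="host" \
-v "/:/host:ro,rslave" \
prom/node-exporter \
--path.rootfs=/host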

Visit http://192.168.1.20:9100 and http://192.168.1.30:9100 to view the node information.

IV. Install Prometheus server (on the server)

mkdir -p /server/docker/prometheus/{server,client,grafana,alert}
touch /server/docker/prometheus/server/rules.yml
Edit prometheus.yml and add the client information:

vim /server/docker/prometheus/server/prometheus.yml

global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
# Configuration of the scrape targets
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s  # overrides the global scrape interval, from 15s down to 5s
    static_configs:
      - targets: ['localhost:9090','192.168.1.20:9100']
  - job_name: 'client-node1'
    static_configs:
      - targets: ['192.168.1.30:9100']

Start Prometheus with Docker:

docker pull prom/prometheus

docker run --name prometheus -p 9090:9090 \
-v /server/docker/prometheus/server/prometheus.yml:/etc/prometheus/prometheus.yml \
-v /server/docker/prometheus/server/rules.yml:/etc/prometheus/rules.yml \
-itd prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--web.enable-lifecycle

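Notes: --web.enable-lifecycle enables reloading the configuration file remotely, without a restart (hot reload)
       --config.file specifies the configuration file to load at startup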

Open Prometheus in a browser: http://192.168.1.20:9090
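Because the container was started with --web.enable-lifecycle, a configuration change can be applied by sending POST /-/reload instead of restarting the container (the shell equivalent is `curl -X POST http://192.168.1.20:9090/-/reload`). A minimal sketch, assuming the server address used in this article:

```python
import urllib.request


def build_reload_request(base_url: str = "http://192.168.1.20:9090") -> urllib.request.Request:
    """POST /-/reload asks Prometheus to re-read its configuration file.

    The endpoint only exists when Prometheus runs with --web.enable-lifecycle.
    """
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")


def reload_prometheus(base_url: str = "http://192.168.1.20:9090", timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status.

    urlopen raises urllib.error.HTTPError if Prometheus rejects the new config.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

After a reload, the metric `prometheus_config_last_reload_successful` shows whether it worked. With the single-file bind mounts used here, an editor that saves by replacing the file rather than rewriting it in place can leave the container seeing the old version, so a reload may not pick up the change; mounting the whole directory avoids this.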

V. Install Grafana for visualization (on the server)

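Grafana is an open-source application for visualizing large volumes of measurement data; it offers a powerful and elegant way to create, share and explore data.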

1. Start a test Grafana container first

docker pull grafana/grafana

docker run --name=grafana -p 3000:3000 -itd grafana/grafana

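Copy the configuration file to the host, then remove the test container:
    docker cp grafana:/etc/grafana/grafana.ini /server/docker/prometheus/grafana/
    docker rm -f grafana
Edit grafana.ini and configure the SMTP mail settings (used later for alerting):

vim /server/docker/prometheus/grafana/grafana.ini

In the [smtp] section, set enabled = true and fill in host, user, password and from_address for your own mail account; the SMTP host and the authorization password can be found after logging in to your mailbox.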

2. Start the production Grafana container

docker run -p 3000:3000 --name grafana \
-v /server/docker/prometheus/grafana/grafana.ini:/etc/grafana/grafana.ini \
-v /server/docker/prometheus/grafana/data:/var/lib/grafana \
-e "GF_SECURITY_ADMIN_PASSWORD=grafana123" \
-itd grafana/grafana

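Note: -e "GF_SECURITY_ADMIN_PASSWORD=grafana123" sets the admin password for the Grafana login page; without it, the default username and password are admin/admin. The setting only takes effect the first time Grafana initializes its database in the data directory.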
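The -e option works because Grafana lets any grafana.ini setting be overridden by an environment variable named GF_<SECTION>_<KEY>, upper-cased, with "." and "-" replaced by "_". A small sketch of that naming rule:

```python
def grafana_env_var(section: str, key: str) -> str:
    """Name of the environment variable that overrides grafana.ini [section] key."""

    def norm(part: str) -> str:
        # Upper-case, and replace '.' and '-' with '_' as Grafana's convention requires.
        return part.upper().replace(".", "_").replace("-", "_")

    return f"GF_{norm(section)}_{norm(key)}"


if __name__ == "__main__":
    print(grafana_env_var("security", "admin_password"))  # GF_SECURITY_ADMIN_PASSWORD
    print(grafana_env_var("smtp", "host"))                # GF_SMTP_HOST
```

By the same rule, the [smtp] settings could be passed as -e GF_SMTP_ENABLED=true, -e GF_SMTP_HOST=... and so on instead of editing grafana.ini.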

Visit http://192.168.1.20:3000 and log in with admin/grafana123.

3. Access Grafana through a domain name with nginx

server {
    server_name grafana.aa.com;
    listen 80;
    
    location / {
        proxy_pass http://192.168.1.20:3000;
    }
}

Visit http://grafana.aa.com

4. Add Prometheus as a data source
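In the web UI this is done from Grafana's data source settings by choosing Prometheus and entering the URL http://192.168.1.20:9090. The same step can be scripted against Grafana's HTTP API (`POST /api/datasources`). A minimal sketch, assuming basic auth with the admin/grafana123 account set above; the data source name "Prometheus" is chosen for illustration:

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, user: str, password: str,
                             prometheus_url: str) -> urllib.request.Request:
    """Build a request that creates a Prometheus data source in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana backend queries Prometheus, not the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = build_datasource_request("http://192.168.1.20:3000", "admin", "grafana123",
                                   "http://192.168.1.20:9090")
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())
```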

5. Add dashboard templates (host monitoring)

https://www.jianshu.com/p/367d52fe1171         #collection of commonly used Grafana monitoring templates
https://grafana.com/grafana/dashboards/        #official Grafana dashboard templates

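Download the templates you need from these sites and import them into Grafana.

You can also import a template directly by its ID from the official site.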

VI. Install cAdvisor to monitor containers (on both server and client)

1. Install cAdvisor with Docker

docker pull google/cadvisor

docker run -p 8088:8080 --name cadvisor \
-v /:/rootfs:ro \
-v /var/run:/var/run:rw \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
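-itd google/cadvisor:latest

Note: the google/cadvisor image on Docker Hub is no longer updated; newer cAdvisor releases are published as gcr.io/cadvisor/cadvisor.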

Add the IPs and ports to prometheus.yml, then restart the Prometheus service:

global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
# Configuration of the scrape targets
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s  # overrides the global scrape interval, from 15s down to 5s
    static_configs:
      - targets: ['localhost:9090','192.168.1.20:9100','192.168.1.20:8088']
  - job_name: 'client-node1'
    static_configs:
      - targets: ['192.168.1.30:9100','192.168.1.30:8088']

docker restart prometheus     #restart Prometheus so the configuration takes effect
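Once Prometheus is scraping cAdvisor, the container metrics can also be queried through Prometheus's HTTP API (`/api/v1/query`). A minimal sketch that builds an instant query for per-container CPU usage and parses the JSON response; the PromQL expression is one reasonable choice for illustration, relying on cAdvisor's `container_cpu_usage_seconds_total` metric and its `name` label:

```python
import json
import urllib.parse
from typing import Dict

# Per-container CPU usage in cores over the last minute.
# name!="" skips cgroups that are not named Docker containers.
CPU_BY_CONTAINER = 'sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[1m]))'


def query_url(prometheus: str, promql: str) -> str:
    """URL for an instant query against Prometheus's HTTP API."""
    return prometheus.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> Dict[str, float]:
    """Map each series' 'name' label to its value in an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result: Dict[str, float] = {}
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # values arrive as strings
        result[series["metric"].get("name", "")] = float(value)
    return result


if __name__ == "__main__":
    import urllib.request

    url = query_url("http://192.168.1.20:9090", CPU_BY_CONTAINER)
    print(parse_vector(urllib.request.urlopen(url).read().decode()))
```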

2. Add a dashboard template to monitor Docker containers

Choose a template from the two sites listed above.

VII. Monitor ports with blackbox_exporter (on the server)

1. Install blackbox_exporter with Docker

docker pull prom/blackbox-exporter

docker run --name blackbox -p 9115:9115 -itd prom/blackbox-exporter

Add the probe jobs to prometheus.yml under scrape_configs (note: Prometheus reports an error if the YAML formatting is wrong):

#Monitor whether host ports are up
  - job_name: 'prometheus_port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['192.168.1.20:8088','192.168.1.20:9100','192.168.1.20:8091','192.168.1.30:8088','192.168.1.30:9100','192.168.1.30:8091']
        labels:
          instance: 'port_status'
          group: 'tcp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.20:9115
#Monitor whether hosts are alive
  - job_name: 'node_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['192.168.1.20','192.168.1.30']
        labels:
          instance: 'node_status'
          group: 'node'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 192.168.1.20:9115
#Monitor website status
  - job_name: 'web_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['https://www.baidu.com']
        labels:
          instance: 'web_status'
          group: 'web'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 192.168.1.20:9115
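The relabel_configs above turn each listed target into a URL parameter for the blackbox exporter and redirect the scrape to the exporter itself. The sketch below mimics Prometheus's default `replace` relabel action in simplified form (source values joined with ";", fully anchored regex, default regex `(.*)` and replacement `$1`, only `$N` references supported) to show the labels one target of the first job ends up with:

```python
import re
import urllib.parse
from typing import Dict, List, Optional

Labels = Dict[str, str]


def relabel_replace(labels: Labels, target_label: str,
                    source_labels: Optional[List[str]] = None,
                    regex: str = "(.*)", replacement: str = "$1") -> Labels:
    """Simplified model of Prometheus's default relabel action, 'replace'."""
    out = dict(labels)
    # Source label values are joined with ';' and the regex must match the whole string.
    value = ";".join(out.get(name, "") for name in (source_labels or []))
    match = re.fullmatch(regex, value)
    if match is None:
        return out  # no match: labels are left unchanged
    # Translate Prometheus-style $1 into Python's \g<1> before expanding.
    new_value = match.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))
    if new_value:
        out[target_label] = new_value
    else:
        out.pop(target_label, None)  # an empty result removes the label
    return out


def probe_url(labels: Labels, module: str) -> str:
    """URL that ends up being scraped (simplified: module passed in directly)."""
    params = {"module": module, "target": labels["__param_target"]}
    return f"http://{labels['__address__']}/probe?{urllib.parse.urlencode(params)}"


# One target of the 'prometheus_port_status' job, with its static labels.
labels = {"__address__": "192.168.1.30:9100", "instance": "port_status", "group": "tcp"}
labels = relabel_replace(labels, "__param_target", ["__address__"])
labels = relabel_replace(labels, "instance", ["__param_target"])
labels = relabel_replace(labels, "__address__", replacement="192.168.1.20:9115")

if __name__ == "__main__":
    print(labels)
    print(probe_url(labels, "tcp_connect"))
```

The static `instance: 'port_status'` label is overwritten by the second rule. The node_status and web_status jobs have no rule copying __param_target into instance, so all of their targets keep the static instance label ('node_status' or 'web_status').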

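To display blackbox_exporter data, Grafana needs the grafana-piechart-panel plugin. Either download the plugin and upload it to /server/docker/prometheus/grafana/data/plugins, or install it inside the container with grafana-cli; then restart Grafana:
docker exec -it grafana grafana-cli plugins install grafana-piechart-panel  #install the plugin with grafana-cli
docker restart grafana    #restart Grafana so it loads the plugin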

2. Add a dashboard template in Grafana (template ID: 9965)

This dashboard does not show all of the monitoring information; I have not yet found a more suitable one.

VIII. Alerting (two approaches)

1. Alerting with Alertmanager (on the server)

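Alertmanager is a standalone alerting component that receives alerts from clients such as Prometheus.
Prometheus alerting has two parts:
    Alerting: alerting rules in the Prometheus server send alerts to Alertmanager.
    Notification: Alertmanager manages these alerts and sends notifications via email, HipChat and other channels.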

Run a test Alertmanager container:

docker pull prom/alertmanager

docker run --name alert -p 9093:9093 -itd prom/alertmanager

docker cp alert:/etc/alertmanager/alertmanager.yml /server/docker/prometheus/alert/

Write the Alertmanager configuration for email alerts: vim /server/docker/prometheus/alert/alertmanager.yml

# Global configuration
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '166xxxxxxx@qq.com'
  smtp_auth_username: '166xxxxxxx@qq.com'
  smtp_auth_password: 'bxxxxxxxxxxacdif'
  smtp_require_tls: true
# Routing tree
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mail'
# Alert receivers
receivers:
- name: 'mail'
  email_configs:
  - to: '136xxxxxxx@qq.com'
    send_resolved: true

Remove the test Alertmanager and start the production one:

docker rm -f alert

docker run -p 9093:9093 --name alert \
-v /server/docker/prometheus/alert/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
-itd prom/alertmanager
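To check the email receiver without waiting for a real failure, a test alert can be posted straight to Alertmanager's v2 API (`POST /api/v2/alerts`, which takes a JSON list of alerts). A minimal sketch; the `MailTest` alert name and its labels are hypothetical:

```python
import json
import urllib.request
from typing import Dict


def build_test_alert(alertmanager: str, labels: Dict[str, str],
                     annotations: Dict[str, str]) -> urllib.request.Request:
    """Build a POST of a single alert to Alertmanager's /api/v2/alerts endpoint."""
    payload = [{"labels": labels, "annotations": annotations}]  # the API takes a list
    return urllib.request.Request(
        alertmanager.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


REQ = build_test_alert(
    "http://192.168.1.20:9093",
    labels={"alertname": "MailTest", "severity": "test"},  # hypothetical labels
    annotations={"summary": "Test alert to check the email receiver"},
)
# urllib.request.urlopen(REQ) sends it.
```

With group_wait: 10s the email should arrive shortly after sending if the SMTP settings are correct. Because no endsAt is given, Alertmanager should treat the alert as resolved after resolve_timeout (5m here), which with send_resolved: true triggers a second, "resolved" email.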

Write the alerting rules: vim /server/docker/prometheus/server/rules.yml (reference configuration)

groups:
    - name: test-rules
      rules:
      - alert: HttpprometheusDown
        expr: up == 0
        for: 1m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: has been down"
          description: "{{$labels.instance}}: job {{$labels.job}} has been down"
          value: "{{$value}}"

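Parameter notes:
    - alert: # name of the alert
    expr: # the condition that triggers the alert, written in PromQL (tip: test with a low threshold first, then change it to the value you actually need once it works)
    for: # how long the condition must keep holding before the alert is sent
    labels: # extra labels attached to the alert, here team: node
    annotations: # descriptive fields that explain the alert in detail
    {{$labels.instance}} shows the IP and port of the failed target
    {{ $value }} is the current value of the expression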
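The `for` clause works per evaluation: the first time `expr` returns a result the alert becomes pending, and it only fires once the condition has held for the full duration. Because this configuration does not set `evaluation_interval`, rules are evaluated at the default interval of 1m. A simplified model of that behaviour (a sketch, not Prometheus's code):

```python
from typing import List, Optional, Tuple


def rule_states(evaluations: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """Simplified model of an alerting rule's `for` clause.

    evaluations: (evaluation time in seconds, whether `expr` returned a result).
    The alert is 'pending' while the condition has held for less than
    `for_seconds`, then 'firing'; any evaluation without a result resets it.
    """
    states: List[str] = []
    active_since: Optional[float] = None
    for t, condition in evaluations:
        if not condition:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t  # first evaluation where the condition holds
        states.append("firing" if t - active_since >= for_seconds else "pending")
    return states


# `up == 0` for a target that stays down, evaluated every 60s, with for: 1m.
print(rule_states([(0, True), (60, True), (120, True)], for_seconds=60))
# ['pending', 'firing', 'firing']
```

So with these settings an alert with `for: 1m` needs at least two consecutive evaluations, roughly one to two minutes after the target goes down, before it is sent to Alertmanager.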

Reference the rules file and the Alertmanager address in the Prometheus configuration file:

global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - rules.yml
# Configuration of the scrape targets
scrape_configs:
  # Every time series scraped by this job automatically gets the label job="prometheus"
  - job_name: 'prometheus'
    scrape_interval: 5s  # overrides the global scrape interval, from 15s down to 5s
    static_configs:
      - targets: ['localhost:9090','192.168.1.20:9100','192.168.1.20:8088']
  - job_name: 'node1'
    static_configs:
      - targets: ['192.168.1.30:9100','192.168.1.30:8088']
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.1.20:9093']

docker restart prometheus     #restart Prometheus so the configuration takes effect
Visit http://192.168.1.20:9093/#/alerts to see the alerts that have been received.

2. Grafana's built-in alerting (configured in the Grafana web UI)

This works, but Grafana's built-in alerting is limited and cannot handle complex alerting requirements.

Original article: https://www.cnblogs.com/cfzy/p/14750004.html
