Prometheus + Grafana Monitoring

Reposted article. Originally published 2019-12-09 11:20.

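What is Prometheus?

Prometheus is an open-source monitoring and alerting system with a built-in time series database (TSDB), originally developed at SoundCloud. It is written in Go and was inspired by Borgmon, Google's internal monitoring system.
In 2016 Prometheus joined the Cloud Native Computing Foundation (CNCF), a Linux Foundation project that Google helped found, as its second hosted project after Kubernetes.
Prometheus has a very active open-source community.
Compared with Heapster (a Kubernetes subproject for collecting cluster performance data), Prometheus is more complete and more general-purpose. Its performance is also sufficient for clusters on the order of ten thousand machines.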

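Features of Prometheus

A multi-dimensional data model.
A flexible query language (PromQL).
No dependency on distributed storage; each server node is autonomous.
Time series are collected by pulling over HTTP.
Pushing time series is supported through an intermediary gateway.
Targets are discovered through service discovery or static configuration.
Many kinds of graphing and dashboards are supported, for example Grafana.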

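How Prometheus monitoring works

Prometheus periodically scrapes the state of monitored components over HTTP. Any component can be monitored as long as it exposes a suitable HTTP endpoint; no SDK or other integration work is required. This makes it a good fit for virtualized environments such as VMs, Docker, and Kubernetes. A program that exposes a component's metrics over such an HTTP endpoint is called an exporter. Exporters already exist for most components commonly used at internet companies, such as Varnish, HAProxy, Nginx, MySQL, and Linux system information (disk, memory, CPU, network, and so on).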
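
To make the scrape model concrete, here is a minimal sketch of what an exporter's HTTP response looks like and how a single value can be read out of it. The sample text is hard-coded and its values are made up for illustration; metric_value is a hypothetical helper, not part of Prometheus.

```shell
#!/bin/sh
# Illustrative sample of the text exposition format an exporter serves over HTTP.
# The values below are invented for the example.
sample_metrics() {
cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_memory_MemFree_bytes Memory information field MemFree_bytes.
# TYPE node_memory_MemFree_bytes gauge
node_memory_MemFree_bytes 1.234e+09
EOF
}

# metric_value NAME: read exposition text on stdin and print the value of the
# label-less sample called NAME. Comment lines (# HELP / # TYPE) never match
# because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

sample_metrics | metric_value node_load1
```

Against a real exporter, the same helper could be fed from curl instead of the hard-coded sample, for example curl -s http://HOST:9100/metrics | metric_value node_load1.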

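How the Prometheus services fit together

The Prometheus daemon scrapes metrics from its targets on a schedule; each target must expose an HTTP endpoint for it to scrape. Targets can be specified through the configuration file, text files, ZooKeeper, Consul, DNS SRV lookups, and other mechanisms. Prometheus monitors by pulling: the server either pulls data directly from a target or pulls it from an intermediary gateway to which the target has pushed.
Prometheus stores all scraped data locally. It also evaluates rules over that data on a schedule and writes the results to new time series.
The collected data can be queried with PromQL and visualized in several ways, for example with Grafana, with PromDash (since deprecated in favor of Grafana), or with Prometheus's own template engine. Prometheus also provides an HTTP API for queries, so you can build whatever output you need.
The Pushgateway lets clients actively push metrics to it, while Prometheus simply scrapes the gateway on its usual schedule.
Alertmanager is a component separate from the Prometheus server. Alert conditions are written as PromQL expressions in rules evaluated by the server; Alertmanager receives the resulting alerts and handles grouping, routing, and notification, which makes alerting very flexible.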
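
As a rough sketch of the Pushgateway flow described above, a short-lived job can push one sample with curl. The gateway address is an assumption (9091 is the Pushgateway's default port), and the job and metric names in the example call are hypothetical.

```shell
#!/bin/sh
# Assumed Pushgateway address; override by exporting PUSHGATEWAY.
PUSHGATEWAY="${PUSHGATEWAY:-http://localhost:9091}"

# push_url JOB: build the push endpoint for a job name.
push_url() {
  printf '%s/metrics/job/%s\n' "$PUSHGATEWAY" "$1"
}

# push_metric JOB NAME VALUE: send one sample in the text exposition format.
push_metric() {
  printf '%s %s\n' "$2" "$3" |
    curl --silent --fail --data-binary @- "$(push_url "$1")"
}

# Example call (needs a running Pushgateway, so it is left commented out):
#   push_metric nightly_backup backup_duration_seconds 42
```

Prometheus then picks the sample up the next time it scrapes the gateway, so the job itself does not need to stay alive until the scrape happens.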

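The three main Prometheus components

Server: collects and stores data, and provides the PromQL query language.
Alertmanager: manages alerts and sends notifications.
Push Gateway: an intermediary gateway that lets short-lived jobs push their metrics.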

1. Installing Prometheus Server

1.1 Create the service user

groupadd prometheus
useradd -g prometheus -m -d /opt/prometheus/ -s /sbin/nologin prometheus

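1.2 Install Prometheus server

# The URL below is the author's internal mirror; the same tarball is published on the Prometheus GitHub releases page.
wget http://10.200.77.3:90/Monitor/prometheus/prometheus-2.14.0.linux-amd64.tar.gz
# Unpack straight into /opt/prometheus, the path the systemd unit in 1.6 expects.
tar xzf prometheus-2.14.0.linux-amd64.tar.gz -C /opt/prometheus --strip-components=1
chown -R prometheus:prometheus /opt/prometheus
cd /opt/prometheus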

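1.3 Validate the Prometheus configuration

Validate the syntax every time you change the Prometheus configuration, so that a bad file does not prevent Prometheus server from starting.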

./promtool check config prometheus.yml

1.4 Start Prometheus

For now, start Prometheus server with the default configuration to take a look at the UI. Monitoring Linux servers is covered later.

./prometheus --config.file=prometheus.yml

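1.5 Open Prometheus in a browser

On the Targets page (the web UI listens on port 9090 by default), only the Prometheus server itself is listed, because no other targets have been added yet. Adding them is covered below; later posts will cover monitoring common services such as Redis, RabbitMQ, Kafka, Nginx, and Java applications.

Default Prometheus configuration: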

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
    - targets: ['localhost:9090']

1.6 Run Prometheus as a systemd service and enable it at boot

# Unit files should stay owned by root, so no chown is needed here.
touch /usr/lib/systemd/system/prometheus.service
vim /usr/lib/systemd/system/prometheus.service

Write the following into prometheus.service:

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
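# --storage.tsdb.path is optional; by default data is stored in ./data under the working directory.
# --storage.tsdb.retention.time replaces the older --storage.tsdb.retention flag, which is deprecated.
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --web.enable-lifecycle --storage.tsdb.path=/opt/prometheus/data --storage.tsdb.retention.time=60d
Restart=on-failure

[Install]
WantedBy=multi-user.target

Prometheus startup flags

--config.file -- path to the Prometheus configuration file
--web.enable-lifecycle -- enables the HTTP endpoints for reloading the configuration and shutting down, so configuration changes can be hot-reloaded
--storage.tsdb.path -- directory where monitoring data is stored
--storage.tsdb.retention.time -- how long data is kept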

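Enable at boot

systemctl daemon-reload
systemctl enable prometheus.service
systemctl start prometheus.service
systemctl status prometheus.service

Note: since Prometheus 2.0, the HTTP reload endpoint is disabled by default. Without it, a configuration change only takes effect after you send SIGHUP to the process or restart Prometheus server, and restarting the monitoring system for every change is not acceptable in production. That is why we enable hot reloading.

Starting Prometheus with the --web.enable-lifecycle flag enables the endpoint. After changing the configuration, reload it with: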

curl -X POST  http://localhost:9090/-/reload
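
Since a broken configuration file should never reach a running server, the validation step from 1.3 and the reload call can be combined. The following is only a sketch: safe_reload is a hypothetical helper, and the paths and URL assume the layout used in this article.

```shell
#!/bin/sh
# Paths and URL follow the layout used in this article; adjust as needed.
PROMTOOL="${PROMTOOL:-/opt/prometheus/promtool}"
CONFIG="${CONFIG:-/opt/prometheus/prometheus.yml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

# safe_reload: trigger a reload only when promtool accepts the configuration.
safe_reload() {
  if "$PROMTOOL" check config "$CONFIG"; then
    curl --silent --fail -X POST "$PROM_URL/-/reload" && echo "reloaded"
  else
    echo "config check failed, not reloading" >&2
    return 1
  fi
}

# Example call (needs promtool and a running server started with --web.enable-lifecycle):
#   safe_reload
```

The design choice here is simply to fail closed: if the check fails, the running server keeps its last good configuration.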

2. Monitoring other Linux hosts with Prometheus

2.1 Install and configure node_exporter

# Create the service user. -M skips creating the home directory, so the symlink created below can take the /usr/local/node_exporter path.

groupadd prometheus
useradd -g prometheus -M -d /usr/local/node_exporter/ -s /sbin/nologin prometheus

# Download node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

# Unpack into place and delete the downloaded archive
tar -zxf node_exporter-0.18.1.linux-amd64.tar.gz
mv node_exporter-0.18.1.linux-amd64 /usr/local/
ln -sv /usr/local/node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
rm -f node_exporter-0.18.1.linux-amd64.tar.gz

# Set up the systemd service for node_exporter (the unit file itself stays owned by root)
touch /usr/lib/systemd/system/node_exporter.service
chown -R prometheus:prometheus /usr/local/node_exporter*
vim /usr/lib/systemd/system/node_exporter.service

Add the following to node_exporter.service:

[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

Start the node_exporter service and enable it at boot

systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service
systemctl status node_exporter.service
# Later, to restart or stop the service:
# systemctl restart node_exporter.service
# systemctl stop node_exporter.service

Once node_exporter is running, its metrics are available at the URL below (replace node_exporter_server_ip with the IP address of the node_exporter host and open it in a browser).

http://node_exporter_server_ip:9100/metrics

For a better view of the data, next we add this endpoint to Prometheus server and display it with Grafana.

Add the following job under scrape_configs in prometheus.yml:

  - job_name: 'Linux'
    file_sd_configs:
    - files: ['/opt/prometheus/sd_cfg/Linux.yml']
      refresh_interval: 5s

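Then put the following into /opt/prometheus/sd_cfg/Linux.yml:

- targets: ['IP_ADDRESS:9100']
  labels:
    name: Linux-node1  # give each host a meaningful label so it is easy to identify

If you configured things as above but promtool still rejects the Prometheus configuration, the file almost certainly contains a syntax error and is not valid YAML. Check it carefully, paying attention to indentation.

The benefit of this layout is that it makes it easier to automate and standardize monitoring later: each class of target lives in its own file, which is easier to maintain.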
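
As one possible take on that automation, the target file can be generated from a plain host list. This is a sketch under assumptions: sd_entry and gen_sd_file are hypothetical helpers, and the addresses are documentation-range placeholders rather than real hosts.

```shell
#!/bin/sh
# sd_entry ADDRESS NAME: print one file_sd target group in YAML.
sd_entry() {
  printf '%s\n' "- targets: ['$1']" "  labels:" "    name: $2"
}

# gen_sd_file: read "address name" pairs on stdin, one per line,
# and print one target group per pair. Empty lines are skipped.
gen_sd_file() {
  while read -r address name; do
    if [ -n "$address" ]; then
      sd_entry "$address" "$name"
    fi
  done
}

# Hypothetical hosts, for illustration only.
gen_sd_file <<'EOF'
192.0.2.10:9100 Linux-node1
192.0.2.11:9100 Linux-node2
EOF
```

Redirecting the output to /opt/prometheus/sd_cfg/Linux.yml would be enough for Prometheus to pick up the new targets, because file-based discovery re-reads the file on its own.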

Of course, if you have only a few servers and components to monitor, you can put everything in the main configuration file prometheus.yml instead, for example:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'Linux'
    static_configs:
    - targets: ['10.199.111.110:9100']
      labels:
        group: 'client-node-exporter'

Reload the Prometheus configuration

curl -X POST  http://localhost:9090/-/reload

3. Installing and configuring Grafana for visualization

Download page: https://grafana.com/grafana/download

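wget https://dl.grafana.com/oss/release/grafana-6.5.1-1.x86_64.rpm
sudo yum localinstall grafana-6.5.1-1.x86_64.rpm
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

Grafana listens on port 3000 by default; open http://localhost:3000/ in a browser.
The initial username and password are admin/admin; you can change them after logging in.
Add a data source under Data sources -> Add data source -> Prometheus and fill in its details, mainly the name and URL.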
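
The data source can also be created without clicking through the UI, using Grafana's HTTP API. The sketch below assumes a fresh install that still uses the default admin/admin credentials and a Prometheus server on localhost:9090; datasource_json and add_datasource are hypothetical helper names.

```shell
#!/bin/sh
# Assumed Grafana address; override by exporting GRAFANA_URL.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# datasource_json NAME URL: JSON body describing a Prometheus data source.
# Note: the arguments are inserted verbatim, so they must not contain quotes.
datasource_json() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}\n' "$1" "$2"
}

# add_datasource NAME URL: create the data source through the HTTP API,
# authenticating with the default admin/admin login.
add_datasource() {
  datasource_json "$1" "$2" |
    curl --silent --fail -u admin:admin \
      -H 'Content-Type: application/json' \
      -X POST --data-binary @- "$GRAFANA_URL/api/datasources"
}

# Example call (needs a running Grafana, so it is left commented out):
#   add_datasource Prometheus http://localhost:9090
```

The name passed here is the one to select later when importing a dashboard.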

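Add a dashboard: Dashboard -> New Dashboard -> Import Dashboard, enter 11074 to import a Linux monitoring template, and select the Prometheus data source, that is, the name from the previous step.
After saving, you get an impressive dashboard of host metrics, including uptime, memory and CPU configuration, CPU, memory, disk, and network traffic usage, as well as disk I/O, CPU temperature, and more.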

References:

Official site: https://prometheus.io/
GitHub: https://github.com/prometheus
Chinese translation of the official docs: https://github.com/Alrights/prometheus
Official list of exporters: https://prometheus.io/docs/instrumenting/exporters/
