Setting Up a Prometheus + Grafana Monitoring System

Reposted article, 2020-02-13 16:13. Category: Monitoring

This article describes how to monitor Linux servers with Prometheus + Grafana.

I. Prometheus overview (omitted)

Comparison with other monitoring systems

1 Prometheus vs. Zabbix

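Zabbix is written in C and PHP, while Prometheus is written in Go; overall, Prometheus runs somewhat faster.
Zabbix is a traditional host-monitoring system, used mainly for physical hosts, switches, and networks. Prometheus handles host monitoring too, and also suits cloud, SaaS, OpenStack, and container monitoring.
For traditional host monitoring, Zabbix has a richer set of plugins.
Zabbix lets you configure many things in its web GUI, whereas Prometheus requires editing configuration files by hand.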

2 Prometheus vs. Nagios

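Nagios data does not support custom labels or queries, and its alerts support neither noise reduction nor grouping. It has no data storage, so querying historical state requires installing a plugin.
Nagios is a monitoring system from the 1990s and is better suited to small clusters or static systems. It is dated and lacks many features; Prometheus is considerably stronger.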

3 Prometheus vs. Sensu

Broadly speaking, Sensu is an upgraded Nagios that fixes many of Nagios's problems. If you know Nagios well, Sensu is a good choice.
Sensu depends on RabbitMQ and Redis, and its data storage scales better.

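4 Prometheus vs. InfluxDB

InfluxDB is an open-source time-series database used mainly for storing data; to build a monitoring and alerting system on it you need other components.
InfluxDB does better at horizontal storage scaling and high availability, since a database is its core.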

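II. Installing the Prometheus server

Prometheus can be installed in several ways, including Docker, Ansible, Chef, Puppet, and SaltStack. Below are the two simplest: running the precompiled binary, which works out of the box, and using the Docker image.

2.1 Out-of-the-box binary

First get the latest Prometheus version and its download URL from the download page on the official site. At the time of writing the latest version is 2.4.3 (October 2018). Run the following commands to download and extract it:

$ wget https://github.com/prometheus/prometheus/releases/download/v2.4.3/prometheus-2.4.3.linux-amd64.tar.gz

$ tar xvfz prometheus-2.4.3.linux-amd64.tar.gz

Then change into the extracted directory and check the Prometheus version: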

$ cd prometheus-2.4.3.linux-amd64

$ ./prometheus --version

prometheus, version 2.4.3 (branch: HEAD, revision: 167a4b4e73a8eca8df648d2d2043e21bdb9a7449)

build user: root@1e42b46043e9

build date: 20181004-08:42:02

go version: go1.11.1

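Run the Prometheus server:

$ ./prometheus --config.file=prometheus.yml

2.2 Using the Docker image

Installing Prometheus with Docker is even simpler; just run:

$ sudo docker run -d -p 9090:9090 prom/prometheus

Usually we also specify where the configuration file lives:

$ sudo docker run -d -p 9090:9090 \
    -v ~/docker/prometheus/:/etc/prometheus/ \
    prom/prometheus

We keep the configuration file locally at ~/docker/prometheus/prometheus.yml so it is easy to edit and view, and use the -v flag to mount that local directory at /etc/prometheus/, which is where Prometheus loads its configuration from by default inside the container. If you are not sure where the default configuration file is, first run the command above without -v, then use the docker inspect command to see which arguments the container runs with by default (the Args field below):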

$ sudo docker inspect 0c

[...]

"Id": "0c4c2d0eed938395bcecf1e8bb4b6b87091fc4e6385ce5b404b6bb7419010f46",

"Created": "2018-10-15T22:27:34.56050369Z",

"Path": "/bin/prometheus",

"Args": [

"--config.file=/etc/prometheus/prometheus.yml",

"--storage.tsdb.path=/prometheus",

"--web.console.libraries=/usr/share/prometheus/console_libraries",

"--web.console.templates=/usr/share/prometheus/consoles"

],

[...]

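2.3 Configuring Prometheus

As the two sections above show, Prometheus has a configuration file, specified with the --config.file flag and written in YAML. Open the default configuration file prometheus.yml to see what it contains: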

/etc/prometheus $ cat prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

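The default Prometheus configuration file has four main blocks:

global block: Prometheus's global settings. For example, scrape_interval is how often Prometheus scrapes data, and evaluation_interval is how often it evaluates alerting rules.
alerting block: the Alertmanager configuration, which we will look at later.
rule_files block: the alerting rules, which we will also look at later.
scrape_configs block: defines the targets Prometheus scrapes. A job named prometheus is configured by default, because Prometheus exposes its own metrics over HTTP when it starts, so in effect Prometheus monitors itself. That is of little use in real deployments, but it is a handy example for learning how Prometheus works. Visit http://localhost:9090/metrics to see which metrics Prometheus exposes.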

III. Learning PromQL

With Prometheus installed as above, we can start trying it out. Prometheus provides a web UI for this; just open http://localhost:9090/, which redirects to the Graph page by default.

The page can be bewildering on a first visit, so look at the other menus first: Alerts shows all defined alerting rules, and Status shows various kinds of Prometheus state, including Runtime & Build Information, Command-Line Flags, Configuration, Rules, Targets, and Service Discovery.

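The Graph page is in fact where Prometheus is most powerful. Here we can query monitoring data with Prometheus's own expression language, PromQL (Prometheus Query Language). PromQL queries can be run not only on the Graph page but also through the HTTP API that Prometheus provides. Query results can be shown as a list or as a graph (the Console and Graph tabs on the page).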
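
To make the HTTP API route concrete, here is a minimal Python sketch that builds an instant-query URL for a PromQL expression. It assumes the server from this article at http://localhost:9090 and the /api/v1/query endpoint of the Prometheus HTTP API; the function names build_query_url and run_query are illustrative and are not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression (URL-encoded)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def run_query(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Send the query and return the decoded JSON body.

    Needs a reachable Prometheus server, so it is not called here.
    """
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)


# Assumed base URL: the local server used throughout this article.
print(build_query_url("http://localhost:9090", "promhttp_metric_handler_requests_total"))
```

Calling run_query("http://localhost:9090", "up") against a running server should return the same data the Console tab displays, as JSON.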

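As mentioned above, Prometheus exposes many metrics about itself, and these can be queried on the Graph page too. Open the drop-down next to the Execute button to see the metric names and pick any one, for example promhttp_metric_handler_requests_total. This metric counts requests to the /metrics page, which is the page Prometheus scrapes to collect its own monitoring data. Query it in the Console tab to see the current value.

The configuration file above sets scrape_interval to 15s, so Prometheus requests the /metrics page every 15 seconds. Refresh the page after 15 seconds and the value will have increased. The Graph tab makes this more obvious.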
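
Since this counter grows by roughly one per scrape, it is a natural example for turning a counter into a per-second rate. The sketch below is a simplified illustration of that idea in Python, not a reimplementation of PromQL's rate() function (which also extrapolates to the edges of the time window). The sample values are made up for illustration.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when fewer than two samples or no elapsed time.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. process restart),
        # so the new value itself is the increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Illustrative samples taken 15 s apart (matching scrape_interval: 15s).
print(per_second_rate([(0, 106), (15, 107), (30, 108)]))
```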

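3.1 Data model

To learn PromQL we first need to understand the Prometheus data model. A Prometheus sample consists of a metric name and N labels (N >= 0), as in this example:

promhttp_metric_handler_requests_total{code="200",instance="192.168.0.107:9090",job="prometheus"} 106

The metric name here is promhttp_metric_handler_requests_total, it carries three labels (code, instance, and job), and the value of this record is 106. As noted earlier, Prometheus is a time-series database: samples with the same metric name and the same labels form one time series. To think of it in relational-database terms, treat the metric name as the table name, the labels as columns, the timestamp as the primary key, and add one float64 column for the value (Prometheus stores every value as a float64).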
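
The name/labels/value structure can be illustrated by splitting the sample line above into its parts. This is a minimal Python sketch, not a full parser for the Prometheus text format: it ignores optional timestamps and does not unescape label values. The function name parse_sample is illustrative.

```python
import re

# metric name, optional {label set}, then the value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# one label: name="value" (value may contain backslash escapes)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric_name, labels_dict, float_value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, label_text, value = match.groups()
    labels = dict(LABEL_RE.findall(label_text or ""))
    return name, labels, float(value)


line = ('promhttp_metric_handler_requests_total'
        '{code="200",instance="192.168.0.107:9090",job="prometheus"} 106')
name, labels, value = parse_sample(line)
print(name, labels, value)
```

The (name, labels) pair identifies the time series; the value is always read as a float, matching the float64 storage described above.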

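This data model is quite similar to OpenTSDB's; for details see Data model in the official documentation. The official site also gives guidance on naming metrics and labels; see Metric and label naming.

Although every value stored in Prometheus is a single float64, the data can be divided into four types: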

Counter
Gauge
Histogram
Summary

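A Counter counts things, such as requests served, tasks completed, or errors; its value only increases and never decreases. A Gauge is an ordinary value that can go up or down, such as temperature or memory usage. A Histogram tracks the size of events, such as request durations or response sizes; what sets it apart is that it groups observations into buckets and also provides a count and a sum. A Summary is very similar to a Histogram and also tracks the size of events; the difference is that it provides quantiles. For example, the 0.95 quantile is the value that 95% of the sampled observations fall at or below. See Metric types in the official documentation for more. Summary and Histogram are easy to confuse and are the more advanced metric types; the Histograms and summaries page explains them.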
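
To show what "buckets plus count and sum" and "quantile" mean in practice, here is a small self-contained Python sketch. It is a toy model for illustration only and is not how the Prometheus client libraries are implemented; the bucket bounds and observed values are made up.

```python
import math
from bisect import bisect_left


class ToyHistogram:
    """Toy histogram: per-bucket counts plus a running count and sum."""

    def __init__(self, upper_bounds):
        self.bounds = sorted(upper_bounds) + [math.inf]  # last bucket catches everything
        self.bucket_counts = [0] * len(self.bounds)
        self.count = 0
        self.sum = 0.0

    def observe(self, value):
        # First bucket whose upper bound is >= value ("less than or equal").
        self.bucket_counts[bisect_left(self.bounds, value)] += 1
        self.count += 1
        self.sum += value

    def cumulative(self):
        """(upper_bound, observations <= upper_bound) pairs."""
        total, out = 0, []
        for bound, n in zip(self.bounds, self.bucket_counts):
            total += n
            out.append((bound, total))
        return out


def quantile(samples, q):
    """Nearest-rank quantile: smallest sample with at least q of the data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


h = ToyHistogram([0.1, 0.5, 1.0])  # illustrative bounds, e.g. request seconds
for seconds in (0.05, 0.1, 0.3, 2.0):
    h.observe(seconds)
print(h.cumulative(), h.count, h.sum)
```

A histogram keeps only the bucket counters, so quantiles have to be estimated from them at query time, whereas a summary computes quantiles from the observations on the client side.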

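These four types are distinguished only on the side that provides the metrics, that is, by the Exporter. If you need to write your own Exporter, or expose metrics for Prometheus to scrape from an existing system, you can use the Prometheus client libraries, and that is when you have to think about which type each metric should be. If you only use ready-made Exporters and query their metrics in Prometheus, you need not worry much about this, although understanding the types still helps you write correct and sensible PromQL.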
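
As a rough picture of what an Exporter does, the sketch below serves one gauge in the Prometheus text format using only the Python standard library. It is a hand-rolled illustration, not the official client library; the metric name demo_load1, the function names, and port 8000 are all assumptions made for this example. os.getloadavg() is only available on Unix-like systems.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(load1: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_load1 1-minute load average (demo metric).",
        "# TYPE demo_load1 gauge",
        f"demo_load1 {load1}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(os.getloadavg()[0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given (hypothetical) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()


print(render_metrics(0.5), end="")
```

Calling serve() and adding a target such as 'localhost:8000' to scrape_configs would let Prometheus scrape the demo gauge. For real use, the official client libraries handle types, labels, and escaping for you.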

IV. Installing Grafana

Download: wget https://dl.grafana.com/oss/release/grafana-6.3.3-1.x86_64.rpm

Install: sudo yum localinstall grafana-6.3.3-1.x86_64.rpm -y

Start: systemctl enable grafana-server.service

systemctl start grafana-server.service

# The web UI listens on port 3000; login: admin/admin

# Install the plugin

grafana-cli plugins install grafana-piechart-panel

systemctl restart grafana-server

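Once installation is complete, open http://localhost:3000/ to reach the Grafana login page and enter the default username and password (admin/admin).

The first step in using Grafana is of course to configure a data source, which tells Grafana where to fetch data from. Click Add data source to open the data source configuration page.

Fill in the following fields: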

Name: prometheus
Type: Prometheus
URL: http://localhost:9090
Access: Browser

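Note that Access here means how Grafana reaches the data source; there are two modes, Browser and Proxy. In Browser mode, when a user opens a Grafana dashboard, the browser accesses the data source directly at the URL. In Proxy mode, the browser calls a proxy endpoint on Grafana (at /api/datasources/proxy/) and the Grafana server accesses the data source URL. Proxy mode is very useful when the data source is deployed on an internal network that the user's browser cannot reach directly.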
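
The same four fields can also be expressed as a JSON body for Grafana's data source HTTP API (POST /api/datasources). The Python sketch below only builds that body. It assumes, based on Grafana's data source API and provisioning settings, that the Browser option in the UI corresponds to the access value "direct" and Proxy to "proxy"; check this against the documentation for your Grafana version. The function name is illustrative.

```python
import json

# Assumed mapping from the UI labels to the API's "access" values.
ACCESS_MODES = {"Browser": "direct", "Proxy": "proxy"}


def datasource_payload(name: str, url: str, access_label: str) -> dict:
    """Build the JSON body describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": ACCESS_MODES[access_label],
        "basicAuth": False,
    }


payload = datasource_payload("prometheus", "http://localhost:9090", "Browser")
print(json.dumps(payload, indent=2))
```

Sending this body to the API requires admin credentials, which is deliberately left out of the sketch.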

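Once the data source is configured, Grafana offers several ready-made dashboards. Three are provided by default: Prometheus Stats, Prometheus 2.0 Stats, and Grafana metrics. Click Import to import and use one.

Import the Prometheus 2.0 Stats dashboard and the monitoring dashboard appears. If your company can manage it, ask for a large display on the wall and put this dashboard on it to watch the production system in real time, which is pretty cool.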

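V. Collecting metrics with Exporters

So far the metrics we have seen are of little practical use. To really use Prometheus in production we need to watch all sorts of metrics, such as server CPU load, memory usage, I/O overhead, and inbound and outbound network traffic. As mentioned above, Prometheus collects metrics by pulling. For Prometheus to get data from a target, a metrics-collection program must first be installed on the target and expose an HTTP endpoint for Prometheus to query. This program is called an Exporter. Different metrics need different Exporters, and a large number are already available, covering almost all commonly used systems and software. The official site lists commonly used Exporters. Exporters follow a port convention to avoid conflicts, with ports allocated incrementally from 9100; the site also has the full Exporter port list. Note too that some software needs no Exporter because it can expose Prometheus-format metrics itself, for example Kubernetes, Grafana, Etcd, and Ceph.

In this section we collect some useful data.

5.1 Collecting server metrics

First we collect server metrics, which requires installing node_exporter. This exporter collects metrics from systems with *NIX kernels; if your server runs Windows, use the WMI exporter instead.

Like the Prometheus server, node_exporter works out of the box: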

$ wget https://github.com/prometheus/node_exporter/releases/download/v0.16.0/node_exporter-0.16.0.linux-amd64.tar.gz

$ tar xvfz node_exporter-0.16.0.linux-amd64.tar.gz

$ cd node_exporter-0.16.0.linux-amd64

$ ./node_exporter

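or: $ nohup /usr/local/node_exporter/node_exporter &   (keeps it running in the background after you log out)

Check the port: ss -naltp | grep 9100

Once node_exporter is running, request the /metrics endpoint to check that server metrics are served correctly:

$ curl http://localhost:9100/metrics

If all is well, edit the Prometheus configuration file and add the server to scrape_configs: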
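
The same check can be scripted. The Python sketch below fetches the /metrics page and extracts the label-free samples, skipping the # HELP and # TYPE comment lines. It assumes node_exporter is on localhost:9100 as in this article; the sample text in the example is a made-up excerpt, and the function names are illustrative.

```python
from urllib.request import urlopen


def fetch_metrics(url: str = "http://localhost:9100/metrics", timeout: float = 5.0) -> str:
    """Download the raw metrics page (needs a running exporter; not called here)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def unlabelled_values(text: str) -> dict:
    """Map metric name -> float for sample lines that carry no labels."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        name, _, rest = line.partition(" ")
        if "{" in name:
            continue  # labelled series are skipped in this sketch
        values[name] = float(rest.split()[0])
    return values


# Made-up excerpt in the shape of node_exporter output.
sample_text = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""
print(unlabelled_values(sample_text))
```

With the exporter running, unlabelled_values(fetch_metrics()) would return the live values instead.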

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.0.107:9090']

  - job_name: 'server'
    static_configs:
      - targets: ['192.168.0.107:9100']
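
When there are many servers, a fragment like this can be generated rather than typed by hand. The following Python sketch renders a scrape_configs section from a mapping of job names to targets. It is a plain string template for illustration, with no YAML library and no escaping, so it assumes job names and targets contain no quote characters. The function name is illustrative.

```python
def render_scrape_configs(jobs: dict) -> str:
    """Render a scrape_configs YAML fragment from {job_name: [targets]}."""
    lines = ["scrape_configs:"]
    for job, targets in jobs.items():
        quoted = ", ".join(f"'{t}'" for t in targets)
        lines += [
            f"  - job_name: '{job}'",
            "    static_configs:",
            f"      - targets: [{quoted}]",
        ]
    return "\n".join(lines) + "\n"


fragment = render_scrape_configs({
    "prometheus": ["192.168.0.107:9090"],
    "server": ["192.168.0.107:9100"],
})
print(fragment, end="")
```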

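After changing the configuration, restart the Prometheus service, or send it a HUP signal to make it reload the configuration:

$ killall -HUP prometheus

The newly added server now appears under Status -> Targets in the Prometheus web UI.

The metric drop-down on the Graph page now lists many metrics whose names start with node. For example, enter node_load1 to watch the server load.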
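
The result of a query such as node_load1 can also be read programmatically from the HTTP API's JSON response. The Python sketch below decodes an instant-query response of result type "vector" into a mapping from the instance label to the value. The response body shown is a hand-written example in the shape of the API's output, not captured from a real server, and the function name is illustrative.

```python
import json


def instant_values(body: str) -> dict:
    """Map the instance label -> float value from an instant-query response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each result carries its labels and a [timestamp, "value-as-string"] pair.
    return {r["metric"].get("instance", ""): float(r["value"][1]) for r in data["result"]}


# Hand-written example response for the query node_load1.
example_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "192.168.0.107:9100", "job": "server"},
                "value": [1539642454.0, "0.21"],
            }
        ],
    },
})
print(instant_values(example_body))
```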

VI. Some common monitoring examples

1. Monitoring Linux machines (node-exporter)

https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz

(1) Install node-exporter on the monitored machine

tar -xvf node_exporter-0.17.0.linux-amd64.tar.gz -C /usr/local/

(2) Start node-exporter

/usr/local/node_exporter-0.17.0.linux-amd64/node_exporter &

(3) Add the scrape target to the Prometheus configuration file

vim /usr/local/Prometheus/prometheus.yml

The default node-exporter port is 9100.

  - job_name: 'Prometheus'
    static_configs:
      - targets: ['192.168.0.102:9100']
        labels:
          instance: Prometheus

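Restart Prometheus or reload the configuration file: killall -HUP prometheus

(4) Import a ready-made dashboard into Grafana

Dashboard JSON:

Link: https://pan.baidu.com/s/1Dlm0IHTgRmc0q2P82cDjKg Extraction code: myv6

Change the name, select the data source created earlier, and click Import.

If nothing is displayed, Grafana is missing the piechart plugin that the dashboard needs. Grafana's default plugin directory is /var/lib/grafana/plugins; extract the downloaded plugin into that directory and restart Grafana.

piechart plugin:

Link: https://pan.baidu.com/s/1tvZWI9vhAqvJhojKmDlmew Extraction code: tlyl

service grafana-server restart

/usr/sbin/grafana-cli plugins ls # list installed plugins

Refresh the Grafana page and the node monitoring dashboard we just set up appears.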

2. Monitoring Windows machines (wmi-exporter)

https://github.com/martinlindhe/wmi_exporter/releases

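(1) Install wmi-exporter on the monitored Windows machine; it automatically creates a service that starts at boot

(2) Add the scrape target to the Prometheus configuration file

vim /usr/local/Prometheus/prometheus.yml

The default wmi-exporter port is 9182.

  - job_name: 'Prometheus'
    static_configs:
      - targets: ['192.168.0.102:9182']

Restart Prometheus

(3) Import a ready-made dashboard into Grafana and select the Prometheus data source

Link: https://pan.baidu.com/s/1nfTE2dqcr6NYldlBm_lnfw Extraction code: ohv4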

VII. Some useful links

1. https://grafana.com/grafana/dashboards (dashboard template downloads)

References

1. https://blog.csdn.net/wshl1234567/article/details/100107167

2. https://juejin.im/post/5d79d804e51d453b7779d5ce#heading-26

3. https://blog.csdn.net/ywd1992/article/details/85989259
