Prometheus Quick Start Tutorial (1): Getting Started with Prometheus

Reposted on 2020-10-15 09:20.

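Prometheus is a tool every senior engineer is expected to know. So how do you deploy a Prometheus monitoring system from scratch? This article starts with how Prometheus works and then walks you step by step through deploying a monitoring setup using the simplest possible example.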

How It Works

The basic architecture of Prometheus is shown in the diagram below:

As the diagram shows, Prometheus can be divided into four major parts:

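Prometheus Server

Prometheus Server is the core component of Prometheus. It is responsible for collecting, storing, and querying monitoring data.

NodeExporter (Data Source)

Data reaches Prometheus Server in one of two ways. In the usual pull model, Prometheus Server periodically scrapes metrics over HTTP from data sources such as NodeExporter. In the push model, which suits short-lived jobs, the job pushes its metrics to a Pushgateway, and Prometheus Server then scrapes the Pushgateway.

AlertManager (Alert Manager)

Prometheus evaluates the alerting rules you configure. When a rule's condition is met, Prometheus sends the alert to AlertManager, which handles it.

Visualization UI

Once Prometheus has collected the data, a web UI displays it as charts. We can build this display ourselves with a custom client that calls the API, or use Grafana directly.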

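In short, the Prometheus architecture is not complicated: it collects data, processes it, visualizes it, and analyzes it to trigger alerts. Its real value is that it offers a complete, workable solution backed by a whole ecosystem, which can greatly reduce our development cost.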
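To make the pull model concrete, here is a minimal sketch of what a data source does: it exposes its current metrics as plain text at /metrics and waits for Prometheus to fetch them. The metric name demo_load1, the fixed value, and port 8000 are all assumptions made for illustration; a real exporter such as NodeExporter does far more.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(load1: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP demo_load1 1m load average (illustrative metric).\n"
        "# TYPE demo_load1 gauge\n"
        f"demo_load1 {load1}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer scrape requests for /metrics; everything else is a 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # A real exporter would read the value from the system here.
        body = render_metrics(load1=0.5).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, waiting for a scraper to pull from this process."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/metrics in a browser shows the same kind of text that Prometheus Server would scrape. The data source never contacts Prometheus itself, which is the defining trait of the pull model.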

This article was first published on the WeChat official account 陳樹義. Original article: https://mp.weixin.qq.com/s/ZXlBPHGcWeYh2hjBzacc3A

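Quick Start

Below we use a simple example to monitor hardware information such as a server's CPU and memory.

Install and Run the Prometheus Server

The Prometheus server is responsible for collecting data, so we install and run Prometheus Server first.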

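Find the latest Prometheus Server package at https://prometheus.io/download/:

After downloading and extracting it, you will see the following directory structure:

The data directory is where data is stored; you can choose a different location at startup with the --storage.tsdb.path="data/" flag. prometheus.yml is the Prometheus configuration file, and prometheus is the executable.

Start the Prometheus service. By default it loads the prometheus.yml file in the current directory, but we can also specify the configuration file explicitly: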

./prometheus --config.file=prometheus.yml

If everything is working, you will see output like the following:

level=info ts=2020-07-18T06:48:52.454Z caller=main.go:694 fs_type=18
level=info ts=2020-07-18T06:48:52.454Z caller=main.go:695 msg="TSDB started"
level=info ts=2020-07-18T06:48:52.454Z caller=main.go:799 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2020-07-18T06:48:53.056Z caller=main.go:827 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2020-07-18T06:48:53.056Z caller=main.go:646 msg="Server is ready to receive web requests."

Open http://localhost:9090/graph in a browser and you will see the following page, which is the management UI built into Prometheus.
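Besides opening the page in a browser, you can check from a script that the server is up. The sketch below assumes Prometheus is listening on http://localhost:9090 as in this tutorial and uses its /-/ready endpoint; the function name is_ready and the injectable opener parameter are illustrative choices, not part of Prometheus.

```python
import urllib.request


def is_ready(base_url: str = "http://localhost:9090",
             opener=urllib.request.urlopen) -> bool:
    """Return True if GET <base_url>/-/ready answers with HTTP 200.

    opener is injectable so the check can be exercised without a server.
    """
    try:
        with opener(f"{base_url}/-/ready", timeout=3) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses,
        # since urllib's URLError and HTTPError both derive from OSError.
        return False
```

Calling is_ready() right after starting ./prometheus should return True once the "Server is ready to receive web requests." log line has appeared.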

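Run the NodeExporter Data Source

NodeExporter is an application provided by the Prometheus project that collects host information such as CPU, memory, and disk metrics.

Download the latest Node Exporter binary package from https://prometheus.io/download/.

After downloading and extracting it, run Node Exporter. Here we run it on port 8080: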

./node_exporter --web.listen-address 127.0.0.1:8080

Once it starts successfully, you will see the following output:

level=info ts=2020-07-18T06:52:42.132Z caller=node_exporter.go:191 msg="Listening on" address=127.0.0.1:8080
level=info ts=2020-07-18T06:52:42.132Z caller=tls_config.go:170 msg="TLS is disabled and it cannot be enabled on the fly." http2=false

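Visit http://localhost:8080/ and you will see the following page:

Visit http://localhost:8080/metrics to see all the monitoring data that Node Exporter has collected for the current host, as shown below:

Each metric is preceded by a block of information like this: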

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.7890625
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 3.0703125

HELP explains what the metric means, and TYPE states the metric's data type.
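The HELP/TYPE/sample layout is simple enough to read with a few lines of code. The following is a minimal sketch of a parser for the lines shown above; it is not a complete implementation of the exposition format (for example, it ignores timestamps), and the names parse_metrics and SAMPLE are made up for this example.

```python
import re

# name, optional {labels}, then the value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

SAMPLE = """\
# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.7890625
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 3.0703125
"""


def parse_metrics(text):
    """Return (types, samples) parsed from text-format metrics.

    types maps metric name -> declared type; samples is a list of
    (name, labels_dict, float_value) tuples.
    """
    types, samples = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            _, _, name, metric_type = line.split(None, 3)
            types[name] = metric_type
        elif line.startswith("#"):
            continue  # HELP lines and other comments
        else:
            match = SAMPLE_RE.match(line)
            if match:
                labels = dict(LABEL_RE.findall(match.group("labels") or ""))
                samples.append(
                    (match.group("name"), labels, float(match.group("value")))
                )
    return types, samples
```

Running parse_metrics(SAMPLE) yields one type entry per metric and one tuple per sample line, with the labels cpu and mode attached to the node_cpu sample.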

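In the example above, the HELP comment for node_cpu says the metric is the total time cpu0 has spent in idle mode. (In Node Exporter 0.16 and later, this metric is named node_cpu_seconds_total.) CPU time only ever increases, and the TYPE line confirms that node_cpu is a counter, which matches what the metric actually measures.

Similarly, node_load1 reflects the host's load over the last minute. Load changes as system resources are used, so node_load1 describes the current state and can go up or down. The TYPE line shows that this metric is a gauge, which again matches what it measures.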
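The difference matters when you read the numbers. A gauge such as node_load1 is meaningful as is, while a counter is usually read as a rate of change between two scrapes. Here is a simplified sketch of that calculation with made-up sample values; PromQL's rate() function also handles counter resets and extrapolation, which this sketch deliberately does not.

```python
def per_second_rate(samples):
    """Average per-second increase between the first and last sample.

    samples is a list of (unix_timestamp, counter_value) pairs in time order.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    if v1 < v0:
        raise ValueError("counter went backwards (reset); not handled here")
    return (v1 - v0) / (t1 - t0)


# Two hypothetical scrapes of an idle-time counter, taken 60 seconds apart.
idle_rate = per_second_rate([(0, 100.0), (60, 157.0)])
# Share of that minute during which the CPU was not idle.
busy_fraction = 1 - idle_rate
```

With these assumed values, the CPU accumulated 57 idle seconds in a 60-second window, so it was busy for roughly 5% of that minute.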

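Beyond these, depending on the host's operating system, you may also see the following metrics on this page:

node_boot_time: system boot time
node_cpu: system CPU usage
node_disk_*: disk I/O
node_filesystem_*: filesystem usage
node_load1: system load
node_memory_*: memory usage
node_network_*: network bandwidth
node_time: current system time
go_*: Go runtime metrics for Node Exporter
process_*: metrics about the Node Exporter process itself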

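Configure the Prometheus Data Source

We now have both the Prometheus server and the NodeExporter data source running. However, Prometheus cannot get any data from Node Exporter yet. We still need to edit prometheus.yml so that Prometheus pulls data from Node Exporter.

Edit prometheus.yml and add the following under the scrape_configs node, so that the Prometheus server scrapes the data source at regular intervals: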

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Scrape Node Exporter metrics
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:8080']

The configuration above defines two jobs. The job named prometheus reads data from localhost:9090, and the job named node reads data from localhost:8080.

After saving the configuration, restart Prometheus:

./prometheus --config.file=prometheus.yml
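After the restart, you can confirm that both jobs are being scraped by asking Prometheus for its target list. The sketch below assumes the server is at http://localhost:9090 and reads the /api/v1/targets endpoint of the HTTP API; the helper names summarize_targets and fetch_targets are illustrative.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Map each active target's (job, instance) pair to its reported health."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected API status: {payload.get('status')!r}")
    summary = {}
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        key = (labels.get("job"), labels.get("instance"))
        summary[key] = target.get("health", "unknown")
    return summary


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch /api/v1/targets from a running Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

With the configuration above, summarize_targets(fetch_targets()) should list the prometheus and node jobs. A health of "up" for the node job means Prometheus is reaching Node Exporter on port 8080; a target can briefly report "unknown" before its first scrape.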

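Query Monitoring Data

With the data source configured, Prometheus can now get data from Node Exporter. So how do we look at that data? The answer is the Prometheus UI.

The Prometheus UI is a visual management interface built into Prometheus. It is available at http://localhost:9090.

The Prometheus UI lets us query the data Prometheus has collected. Prometheus defines its own language for querying monitoring data, PromQL, which plays a role similar to SQL.

Next, visit http://localhost:9090 to open Prometheus Server. Enter "up" and click the Execute button, and you will see the following result:

The Element column contains several records. The record whose instance is localhost:8080 has a value of 1, which means the corresponding application is alive.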

up{instance="localhost:8080",job="node"}	1

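For example, to see the memory usage of the machine where NodeExporter is running, enter node_memory_active_bytes/(1024*1024*1024), which shows the value in GiB.

To see the 1-minute load average of the machine where NodeExporter is running, enter node_load1.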
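The same expressions can be run outside the UI through the Prometheus HTTP API, which is also how a custom display client would get its data. This is a minimal sketch assuming the server is at http://localhost:9090; the helper names are made up for illustration, and only instant-vector results are handled.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """Return the instant-query URL for a PromQL expression.

    urlencode escapes characters such as '/', '(' and '*' in the expression.
    """
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_instant_vector(payload):
    """Turn an instant-vector response into (labels, float_value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each value is a pair: [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(expr, base_url="http://localhost:9090"):
    """Execute an instant query against a running Prometheus server."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        return parse_instant_vector(json.load(resp))
```

For instance, run_query("up") returns one entry per scrape target, and run_query("node_memory_active_bytes/(1024*1024*1024)") returns the active memory in GiB, provided that metric exists on your platform.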

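At this point we have essentially completed the data collection process: data flows from the data source into Prometheus, and we have also learned how to use the console built into Prometheus.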

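Summary

In this article we started from how Prometheus works, walked through its architecture, and built a monitoring setup from scratch with a simple example. Through the Prometheus UI, we can see in real time whether a machine is up, along with its CPU and memory information.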
