Monitoring an Apache Hadoop Cluster with Prometheus + Grafana

Reposted article, 2021-03-16 00:45. Tag: hadoop

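Introduction

Prometheus: From Metrics to Insight

An open-source monitoring solution

Dimensional data model
- A single metric can be analyzed along multiple dimensions.
- Metrics + multiple dimensions: time series are described concisely by key-value pairs.
PromQL
- A query language built on the dimensional model
Visualization
- A built-in expression browser
- Grafana integration
- A console template language
Efficient storage
- Stores time series in memory and on local disk
- Scales through sharding and federation
Simple operation
- Each server is independent and reliable, relying only on local storage
- Written in Go; all binaries are statically linked and easy to deploy
Precise alerting
- Alerts are defined in PromQL; Alertmanager handles notifications and silencing
Many client libraries
- Client libraries are easy to embed in services, and custom libraries are easy to implement
- More than ten languages are supported
Many integrations
- Existing exporters bridge third-party data into Prometheus, for example operating system statistics, Docker, HAProxy, and JMX metrics.
100% open source and community driven. All components are freely available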

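Prometheus ecosystem components

Prometheus Server

The main server, which scrapes and stores time series data

Client libraries

Instrument application code, embedding metrics in the monitored application

Pushgateway

A push gateway that supports short-lived jobs

Exporters

Special-purpose data ingestion components developed for specific systems, for example HAProxy, StatsD, and Graphite.

Alertmanager

The component dedicated to handling alerts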

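Architecture

Because Prometheus is designed around a dimensional storage model, it can be thought of as an OLAP-like system.

Storage and compute layer

Prometheus Server contains the storage engine and the query engine.
Retrieval is the scraping component; it actively pulls metrics from the Pushgateway or from exporters.
Service discovery dynamically discovers the targets to monitor.
TSDB is the core data store and query layer.
HTTP server exposes the HTTP API.

Collection layer

The collection layer handles two kinds of jobs: short-lived jobs and long-lived jobs.

Short-lived jobs: push their metrics to the Pushgateway through its API when they exit.
Long-lived jobs: the Retrieval component pulls data directly from the job or from an exporter.

Application layer

The application layer has two main parts: Alertmanager and data visualization.

Alertmanager
- Integrates with PagerDuty, a paid alerting service that can, for example, send SMS alerts, phone if nobody acknowledges within 5 minutes, and escalate to the on-call manager if there is still no acknowledgment.
- Email
- And so on
Data visualization
- Prometheus built-in web UI
- Grafana
- Other clients built on the API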

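Installing Prometheus

Download

https://prometheus.io/download/

The version installed here is 2.25.0.

Create the prometheus user

useradd prometheus
passwd prometheus

# Grant sudo privileges: run visudo and add the line below
visudo
prometheus    ALL=(ALL)    NOPASSWD:ALL

Upload and extract the server package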

# Upload
[prometheus@ha-node1 ~]$ ll -h
total 64M
-rw-r--r-- 1 root root 64M Mar  8 14:07 prometheus-2.25.0.linux-amd64.tar.gz

# Extract
[prometheus@ha-node1 ~]$ tar -xvzf prometheus-2.25.0.linux-amd64.tar.gz -C /opt/

# Create a symbolic link
[prometheus@ha-node1 ~]$ cd /opt/
[prometheus@ha-node1 opt]$ ln -s /opt/prometheus-2.25.0.linux-amd64/ /opt/prometheus

# Enter the installation directory
[prometheus@ha-node1 opt]$ cd prometheus
[prometheus@ha-node1 prometheus]$ ll
total 167984
drwxr-xr-x 2 prometheus prometheus       38 Feb 18 00:11 console_libraries
drwxr-xr-x 2 prometheus prometheus      173 Feb 18 00:11 consoles
-rw-r--r-- 1 prometheus prometheus    11357 Feb 18 00:11 LICENSE
-rw-r--r-- 1 prometheus prometheus     3420 Feb 18 00:11 NOTICE
-rwxr-xr-x 1 prometheus prometheus 91044140 Feb 17 22:19 prometheus
-rw-r--r-- 1 prometheus prometheus      926 Feb 18 00:11 prometheus.yml
-rwxr-xr-x 1 prometheus prometheus 80948693 Feb 17 22:21 promtool

# Check the version
[prometheus@ha-node1 prometheus]$ ./prometheus --version
prometheus, version 2.25.0 (branch: HEAD, revision: a6be548dbc17780d562a39c0e4bd0bd4c00ad6e2)
  build user:       root@615f028225c9
  build date:       20210217-14:17:24
  go version:       go1.15.8
  platform:         linux/amd64

As the output shows, Prometheus is built with Go.

Configuration

Prometheus Server needs one important configuration file at startup: prometheus.yml.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

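The configuration file consists mainly of three blocks (the sample above also has an alerting block that points at Alertmanager):

global
rule_files
scrape_configs

The global block

The global block controls the server-wide settings of Prometheus. In this example it has two settings:

scrape_interval: how often targets are scraped; the default is 1 minute.
evaluation_interval: how often rules are evaluated (which is what produces alerts); the default is 1 minute.

The rule_files block

Lists the rule files to load.

The scrape_configs block

Defines the scrape targets, that is, what Prometheus monitors. Prometheus exposes its own runtime metrics over HTTP, so it can monitor itself.

job_name: the name of the scrape job
static_configs: a static target configuration, meaning data is always pulled from a fixed set of targets
targets: the targets to monitor, that is, where data is pulled from. Here Prometheus pulls from http://localhost:9090/metrics.

Prometheus can reload its configuration at runtime. Sending SIGHUP to the process triggers a reload; starting the server with --web.enable-lifecycle additionally allows a reload through an HTTP POST to /-/reload.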

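Starting

./prometheus --config.file=prometheus.yml

http://ha-node1:9090/metrics

The /metrics endpoint shows the metrics that Prometheus exposes about itself.

Expression browser

http://ha-node1:9090/graph

Query metrics here (you can switch between the Table and Graph views).

Use PromQL to query the per-second rate of successful (HTTP 200) requests to the metrics handler, averaged over the last minute: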

rate(promhttp_metric_handler_requests_total{code="200"}[1m])

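Concepts

Data model

Prometheus stores all data as time series: streams of timestamped values that belong to the same metric and the same set of labeled dimensions. For example, with a 5s scrape interval, each series receives one new sample every 5 seconds. Besides stored time series, Prometheus can generate temporary derived time series as query results, for example by rolling the 5s samples up into hourly or half-day aggregates.

Time series

A time series is the sequence of timestamped samples that share one metric name and one set of labels. In the exposition format each series appears as one line carrying its current value, for example: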

net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0

Metric names and labels

Every time series is identified by its metric name and an optional set of key-value pairs called labels. For example:

net_conntrack_dialer_conn_failed_total{dialer_name="alertmanager",reason="timeout"} 0
go_gc_duration_seconds{quantile="0"} 2.9347e-05
go_gc_duration_seconds{quantile="0.25"} 5.3684e-05
go_gc_duration_seconds{quantile="0.5"} 9.8082e-05

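# net_conntrack_dialer_conn_failed_total and go_gc_duration_seconds are metric names
# quantile="0", dialer_name="alertmanager", and reason="timeout" are labels
# 0 and 2.9347e-05 are metric values

Labels are what make up Prometheus's dimensional model. Series that share a metric name can be combined along any of their labels (much like dimensional modeling in a data warehouse), and PromQL can aggregate and filter along those dimensions.

Note that changing a label value, adding a label, or removing a label creates a new time series.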
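As a rough illustration of why any label change yields a new series, the sketch below treats a series identity as the metric name plus its sorted label pairs. The helper names series_key and render are hypothetical and are not part of any Prometheus library; the metric and label values are made up.

```python
from typing import Dict, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Identity of a series: the metric name plus the sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def render(name: str, labels: Dict[str, str]) -> str:
    """Render a series in the <metric name>{<label>="<value>", ...} notation."""
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}}" if pairs else name


# Same name and same labels (in any order) identify the same series.
a = series_key("http_requests_total", {"job": "api", "code": "200"})
b = series_key("http_requests_total", {"code": "200", "job": "api"})
# Changing one label value identifies a different series.
c = series_key("http_requests_total", {"job": "api", "code": "500"})

print(a == b, a == c)
print(render("http_requests_total", {"job": "api", "code": "200"}))
```

Because every distinct label combination is its own series, labels with many possible values multiply the number of series that Prometheus has to store.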

Syntax

<metric name>{<label name>=<label value>, ...}

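Metric types

Metric types come from the client side: the Prometheus client libraries offer four core metric types. The types are currently distinguished only in the client libraries and in the wire protocol; the Prometheus server does not yet use the type information and flattens all data into untyped time series. The four types are:

Counter
Gauge
Histogram
Summary

The metric type determines which aggregations and functions are meaningful for a series, much as count and sum in SQL are computed differently.

Counter

A counter is a cumulative metric that represents a single monotonically increasing value. Its value can only increase or be reset to zero. Typical uses are counting requests served, tasks completed, or errors.

Do not use a counter for a value that can decrease, such as the number of currently running processes. Use a gauge for that.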
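The "increase or reset to zero" rule is what lets a query compute a meaningful rate from a counter. The following is a simplified sketch of that idea, not Prometheus's actual implementation: the real rate() function also extrapolates to the edges of the query window, which is omitted here. The sample data and function names are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def increase(samples: List[Sample]) -> float:
    """Total growth of a counter across samples, treating any drop as a reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero,
        # so the whole current value counts as new growth.
        total += (cur - prev) if cur >= prev else cur
    return total


def per_second_rate(samples: List[Sample]) -> float:
    """Average per-second growth between the first and the last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed if elapsed > 0 else 0.0


# The counter restarts between t=15 and t=30 (130 -> 10).
samples = [(0, 100), (15, 130), (30, 10), (45, 40)]
print(increase(samples))         # 30 + 10 + 30
print(per_second_rate(samples))  # increase divided by 45 seconds
```

A gauge gives no such guarantee, which is why a value that can legitimately decrease must not be modeled as a counter.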

Java API

Add the Maven dependencies

<!-- The client -->
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient</artifactId>
  <version>0.10.0</version>
</dependency>
<!-- Hotspot JVM metrics-->
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient_hotspot</artifactId>
  <version>0.10.0</version>
</dependency>
<!-- Exposition HTTPServer-->
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient_httpserver</artifactId>
  <version>0.10.0</version>
</dependency>
<!-- Pushgateway exposition-->
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient_pushgateway</artifactId>
  <version>0.10.0</version>
</dependency>

Client code

import io.prometheus.client.Counter;

class YourClass {
  static final Counter requests = Counter.build()
     .name("requests_total").help("Total requests.").register();

  void processRequest() {
    requests.inc();
    // Your code here.
  }
}

Python API

Install the package

pip install prometheus-client

from prometheus_client import Counter
c = Counter('my_failures', 'Description of counter')
c.inc()     # Increment by 1
c.inc(1.6)  # Increment by given value

Gauge

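A gauge represents a single numeric value that can go up and down arbitrarily. Use it for measured values such as temperature or current memory usage, and for counts that rise and fall, such as the number of concurrent requests.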

Java API

import io.prometheus.client.Gauge;

class YourClass {
  static final Gauge inprogressRequests = Gauge.build()
     .name("inprogress_requests").help("Inprogress requests.").register();

  void processRequest() {
    inprogressRequests.inc();
    // Your code here.
    inprogressRequests.dec();
  }
}

Python API

from prometheus_client import Gauge
g = Gauge('my_inprogress_requests', 'Description of gauge')
g.inc()      # Increment by 1
g.dec(10)    # Decrement by given value
g.set(4.2)   # Set to a given value

Histogram

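A histogram samples observations (typically request durations or response sizes) and counts them in configurable buckets. It also provides the sum of all observed values.

A histogram with the base metric name <basename> exposes several time series during a scrape:

Cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper bound>"}
The total sum of all observed values, exposed as <basename>_sum
The count of observed events, exposed as <basename>_count

Use histogram_quantile() to calculate quantiles from a histogram.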
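To make the cumulative buckets concrete, here is a simplified sketch of how a quantile can be estimated from them by linear interpolation inside the bucket that contains the target rank. It follows the general idea behind histogram_quantile() but is not the server's code; it assumes non-negative bucket bounds, a +Inf bucket, and 0 <= q <= 1, and the bucket values are invented.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    # At least one finite bucket plus the +Inf bucket is required.
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The quantile lies beyond the last finite bound; report that bound.
        return buckets[-2][0]
    lower, below = buckets[index - 1] if index > 0 else (0.0, 0.0)
    if count == below:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


buckets = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
print(histogram_quantile(0.5, buckets))
print(histogram_quantile(0.9, buckets))
```

The estimate is only as precise as the bucket layout: every observation inside a bucket is assumed to be evenly distributed between its bounds.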

Summary

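Like a histogram, a summary samples observations. It also provides the total count of observations and the sum of observed values, and it can calculate configurable quantiles over a sliding time window.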

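Jobs and instances

Prometheus calls an endpoint that it can scrape an instance, which usually corresponds to a single process. A collection of instances with the same purpose is called a job. For example, a Hadoop cluster runs many NodeManagers, and a single job can contain all of the NodeManager instances.

For example, a job for a YARN cluster with 4 NodeManager nodes: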

job: NodeManager
	instance_1: ha-node1:8042
	instance_2: ha-node2:8042
	instance_3: ha-node3:8042
	instance_4: ha-node4:8042

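Automatically generated labels and time series

When Prometheus scrapes a target, it automatically attaches labels to the scraped time series so that the target can be identified:

job: the configured job name that the target belongs to
instance: the host:port of the target that was scraped

For every instance scrape, Prometheus stores the following time series:

# 1 if the scrape succeeded, 0 if the scrape failed
up{job="<job-name>", instance="<instance-id>"}
# Duration of the scrape
scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}
# Number of samples remaining after metric relabeling was applied
scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}
# Number of samples the target exposed
scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}
# Approximate number of new series in this scrape
scrape_series_added{job="<job-name>", instance="<instance-id>"}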

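Prometheus configuration reference

prometheus.yml reference

Option	Description
global_config	Global parameters that are valid in all other configuration contexts; they also serve as defaults for other configuration sections.
scrape_config	Specifies a set of targets and the parameters describing how to scrape them. In general, one scrape_config corresponds to one job.
tls_config	TLS (Transport Layer Security) connection settings.
azure_sd_config	Service discovery for Microsoft Azure virtual machines.
consul_sd_config	Service discovery through Consul (HashiCorp's open-source service discovery tool).
digitalocean_sd_config	Service discovery for DigitalOcean (a cloud hosting provider).
dockerswarm_sd_config	Service discovery for Docker Swarm (Docker's cluster management engine).
dns_sd_config	DNS-based service discovery.
ec2_sd_config	Service discovery for AWS EC2 instances.
openstack_sd_config	Service discovery for OpenStack.
file_sd_config	File-based service discovery: a more generic way to configure static targets, which also serves as an interface for plugging in custom discovery mechanisms.
gce_sd_config	Service discovery for GCE instances on GCP (Google Cloud Platform).
hetzner_sd_config	Service discovery for Hetzner Cloud.
kubernetes_sd_config	Service discovery for Kubernetes.
marathon_sd_config	Service discovery for Marathon.
nerve_sd_config	Service discovery for Airbnb's Nerve.
serverset_sd_config	Service discovery for Serversets stored in ZooKeeper.
triton_sd_config	Service discovery for Triton's Container Monitor.
eureka_sd_config	Service discovery for Eureka.
static_config	Specifies a list of targets and a common label set for them; this is how static targets are configured.
relabel_config	As the name suggests, rewrites the label set of a target.
metric_relabel_configs	Applied to samples as the last step before ingestion; it does not apply to automatically generated time series such as up.
alert_relabel_configs	As the name suggests, rewrites the labels of alerts.
alertmanager_config	Specifies the Alertmanager instances that the Prometheus server sends alerts to.
remote_write	Sends samples to a remote endpoint; the relabeling in write_relabel_configs is applied to the samples before they are sent.
remote_read	Reads samples from a remote endpoint.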
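Defining rules

Prometheus supports two types of rules:

Recording rules
Alerting rules

Rules are loaded through rule_files, and rule files can be reloaded at runtime.

Recording rules

A recording rule precomputes an expression. Expressions that are expensive to compute can be placed in a recording rule, and their result is saved as a new set of time series. Querying the precomputed result is much faster than evaluating the original expression every time, which is especially useful for dashboards that query the same expression on every refresh.

Configuration example: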

groups:
  - name: example
    rules:
    - record: job:http_inprogress_requests:sum
      # PromQL expression
      expr: sum by (job) (http_inprogress_requests)
      # Optional labels to add or overwrite before the result is stored
      # labels:
      #   <labelname>: <labelvalue>

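Recording rules and alerting rules are defined in rule groups. The rules within a group are run sequentially at a regular interval. The name of a recording rule must be a valid metric name, and the name of an alerting rule must be a valid label value.

Alerting rules

An alerting rule defines an alert condition as a Prometheus expression and sends notifications about firing alerts to an external service. Whenever the expression yields one or more vector elements at a given point in time, the alert counts as active for the label sets of those elements.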

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    # Labels to attach to the alert
    labels:
      severity: page
    # Annotations carry longer descriptive information.
    annotations:
      summary: High request latency

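The for clause makes Prometheus wait until the expression has returned an element continuously for the given duration before the alert fires. In the rule above, Prometheus checks on every evaluation during 10 minutes that the alert is still active. Elements that are active but not yet firing are in the pending state.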
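The pending and firing states can be pictured as a small state machine driven by each rule evaluation. The sketch below is a simplified model under stated assumptions: evaluations happen at a fixed interval, and an alert that stops being active goes straight back to inactive. The function name and the evaluation results are hypothetical.

```python
from typing import List, Optional


def alert_states(active: List[bool], interval_s: int, for_s: int) -> List[str]:
    """Return the alert state after each evaluation of the rule expression.

    active[i] says whether the expression returned the element at evaluation i.
    """
    states = []
    active_since: Optional[int] = None
    for i, is_active in enumerate(active):
        now = i * interval_s
        if not is_active:
            # The condition cleared, so the waiting period starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# Evaluate every 60s with "for: 2m"; the condition briefly clears at the third evaluation.
print(alert_states([True, True, False, True, True, True], interval_s=60, for_s=120))
```

The brief recovery resets the timer, so the alert only fires once the condition has held for the full duration without interruption.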

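Instrumenting the monitored program

"Instrumenting" literally means fitting something with measuring instruments; in the Prometheus context it means embedding metric-collection code in the monitored program. This article calls it client-side instrumentation.

Client libraries

Before a program can be monitored, it has to be instrumented with a Prometheus client library. These libraries implement the four metric types described earlier. There are four officially supported client libraries: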

Go
Java / Scala
Python
Ruby

Third-party client libraries:

Bash
C
C++
Common Lisp
Dart
Elixir
Erlang
Haskell
Lua for Nginx
Lua for Tarantool
.NET / C#
Node.js
Perl
PHP
R
Rust

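When Prometheus scrapes an instance's HTTP endpoint, the client library sends the current state of all tracked metrics to the server.

Exporters and integrations

A number of libraries and servers export existing metrics from third-party systems as Prometheus metrics, which greatly lowers the cost of adopting Prometheus. For example, the JMX exporter can export metrics from JVM-based applications such as Kafka and Cassandra.

A large number of exporters are available, both officially maintained and open-sourced on GitHub.

See: https://prometheus.io/docs/instrumenting/exporters/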

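Pushing metrics

Short-lived jobs are hard to monitor by scraping, because they may start and finish between two scrapes. Such jobs can push their metrics to the Pushgateway, an intermediary that holds the metrics so that the Prometheus server can scrape them from it. Because a push is a plain HTTP request, it works even without a client library.

For details, see: https://prometheus.io/docs/instrumenting/pushing/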
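As a rough sketch of what such a push looks like on the wire, the snippet below builds (but does not send) an HTTP request that carries metrics in the text exposition format to a Pushgateway. The gateway address ha-node1:9091, the job name, and the metric are assumptions made for illustration; 9091 is the Pushgateway's default port, and no Pushgateway is installed elsewhere in this article.

```python
import urllib.request
from typing import Dict


def build_push_request(gateway: str, job: str, metrics: Dict[str, float]) -> urllib.request.Request:
    """Build a PUT request that replaces all metrics pushed for the given job."""
    # Text exposition format: one "name value" line per metric, newline-terminated.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return urllib.request.Request(
        url=f"http://{gateway}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="PUT",
    )


# Hypothetical batch job reporting its duration when it exits.
req = build_push_request("ha-node1:9091", "nightly_batch", {"batch_duration_seconds": 12.5})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"), end="")

# To actually send it (requires a running Pushgateway):
# urllib.request.urlopen(req, timeout=5)
```

Prometheus would then need a scrape job that targets the Pushgateway itself, in the same way as the other static targets in this article.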

Monitoring the Hadoop cluster

Monitoring ZooKeeper metrics

Edit zoo.cfg

vim /opt/apache-zookeeper-3.6.1-bin/conf/zoo.cfg

## Metrics Providers
#
# https://prometheus.io Metrics Exporter
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
metricsProvider.httpPort=7000
metricsProvider.exportJvmInfo=true

Distribute zoo.cfg to every node

scp /opt/apache-zookeeper-3.6.1-bin/conf/zoo.cfg ha-node2:/opt/apache-zookeeper-3.6.1-bin/conf; \
scp /opt/apache-zookeeper-3.6.1-bin/conf/zoo.cfg ha-node3:/opt/apache-zookeeper-3.6.1-bin/conf; \
scp /opt/apache-zookeeper-3.6.1-bin/conf/zoo.cfg ha-node4:/opt/apache-zookeeper-3.6.1-bin/conf; \
scp /opt/apache-zookeeper-3.6.1-bin/conf/zoo.cfg ha-node5:/opt/apache-zookeeper-3.6.1-bin/conf

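Start the ZooKeeper cluster

Test fetching the ZooKeeper metrics

curl ha-node1:7000/metrics

A large number of metrics appear right away. Note that the sample output below was fetched from port 9505 and comes from a JMX exporter Java agent (see jmx_exporter_build_info), not from the built-in metrics provider on port 7000 configured above.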

[zookeeper@ha-node1 bin]$ curl ha-node1:9505/metrics
# HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
# TYPE jmx_exporter_build_info gauge
jmx_exporter_build_info{version="0.15.0",name="jmx_prometheus_javaagent",} 1.0
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 41.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 15.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 41.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 43.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
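Output like the above is plain text, so it is easy to inspect programmatically. The following is a minimal parsing sketch, not a complete parser for the exposition format: it skips comment lines, tolerates the trailing comma that the JMX exporter emits inside the braces, and ignores lines that carry an optional timestamp. The function name is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# metric_name, optional {labels}, then a single value token
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse 'name{labels} value' lines, skipping # HELP / # TYPE comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # e.g. lines with a timestamp, which this sketch ignores
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


text = """# TYPE jvm_threads_current gauge
jvm_threads_current 41.0
jmx_exporter_build_info{version="0.15.0",name="jmx_prometheus_javaagent",} 1.0
"""
for sample in parse_metrics(text):
    print(sample)
```

For real use, the parser that ships with the official Python client library is a better choice than a hand-written regular expression.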

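Enable scraping in Prometheus

Edit prometheus.yml and add a job under scrape_configs (the targets must use the port that actually serves the metrics, 9505 in this setup):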

  - job_name: 'zk_cluster'
    static_configs:
    - targets: ['ha-node1:9505','ha-node2:9505','ha-node3:9505','ha-node4:9505','ha-node5:9505']

Start Prometheus:

./prometheus --config.file=prometheus.yml

Monitoring Hadoop runtime metrics

Create the prometheus_client user

Create the prometheus_client user on all nodes.

useradd prometheus_client
passwd prometheus_client

Upload and extract the Hadoop exporter

[prometheus_client@ha-node1 ~]$ ll
total 17228
-rw-r--r-- 1 root root 17639070 Mar  9 15:58 hadoop_jmx_exporter.tar.gz

Extract to the target directory

tar -xvzf hadoop_jmx_exporter.tar.gz -C /opt

Edit the cluster node file

vim /opt/hadoop_jmx_exporter/apache-tomcat-8.5.63/webapps/ROOT/cluster_config.json

Start Tomcat on the first node

cd /opt/hadoop_jmx_exporter/apache-tomcat-8.5.63
bin/startup.sh

Distribute to every node

scp -r /opt/hadoop_jmx_exporter ha-node2:/opt; \
scp -r /opt/hadoop_jmx_exporter ha-node3:/opt; \
scp -r /opt/hadoop_jmx_exporter ha-node4:/opt; \
scp -r /opt/hadoop_jmx_exporter ha-node5:/opt

Start the exporter on all nodes

export PYTHONPATH=${PYTHONPATH}:/opt/hadoop_jmx_exporter/hadoop_exporter/modules
python /opt/hadoop_jmx_exporter/hadoop_exporter/hadoop_exporter.py -host "0.0.0.0" -P 9131 -s "ha-node1:9035"

Configure Prometheus

  - job_name: 'hadoop'
    static_configs:
    - targets: ['ha-node1:9131','ha-node2:9131','ha-node3:9131','ha-node4:9131','ha-node5:9131']

Start Prometheus

./prometheus --config.file=prometheus.yml

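PromQL syntax

PromQL lets users select and aggregate time series data in real time. Results can be shown as a graph, viewed as a table, or consumed by external systems through the HTTP API.

Quick start

Select all series of a metric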

http_requests_total

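Select with label matchers

http_requests_total{job="apiserver", handler="/api/comments"}

Select a time range

http_requests_total{job="apiserver", handler="/api/comments"}[5m]

Regex matching

# =~ selects label values that match the regular expression
http_requests_total{job=~".*server"}

Negative matching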

http_requests_total{status!~"4.."}
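The four matcher operators (=, !=, =~, !~) can be sketched as a filter over a label set. This is an illustration only: Prometheus uses RE2 regular expressions, whereas the sketch uses Python's re module, and the function name is hypothetical. One detail worth showing is that regex matchers are fully anchored, so the pattern must match the whole label value.

```python
import re
from typing import Dict, List, Tuple

Matcher = Tuple[str, str, str]  # (label name, operator, value or pattern)


def matches(labels: Dict[str, str], matchers: List[Matcher]) -> bool:
    """Return True if the label set satisfies every matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


selector = [("job", "=~", ".*server"), ("status", "!~", "4..")]
print(matches({"job": "apiserver", "status": "200"}, selector))
print(matches({"job": "apiserver", "status": "404"}, selector))
# Anchoring: "server" alone does not match the value "apiserver".
print(matches({"job": "apiserver"}, [("job", "=~", "server")]))
```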

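Subqueries

Return the 5-minute rate of http_requests_total over the past 30 minutes, at a resolution of 1 minute.

rate(http_requests_total[5m])[30m:1m]

Nested subquery

max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:])

Functions and operators

Return the per-second rate, measured over the last 5 minutes, for every time series of the http_requests_total metric.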

rate(http_requests_total[5m])

Sum of those rates, grouped by job.

sum by (job) (
  rate(http_requests_total[5m])
)
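Conceptually, sum by (job) groups the input series by the value of the job label, drops all other labels, and adds up the values in each group. A minimal sketch of that grouping step, with made-up series values and a hypothetical function name:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, current value)
GroupKey = Tuple[Tuple[str, str], ...]


def sum_by(series: List[Series], by: List[str]) -> Dict[GroupKey, float]:
    """Group series by the listed labels and sum the values of each group."""
    totals: Dict[GroupKey, float] = defaultdict(float)
    for labels, value in series:
        # Only the "by" labels survive; everything else is aggregated away.
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return dict(totals)


# Per-instance request rates, as rate(http_requests_total[5m]) might return them.
rates = [
    ({"job": "api", "instance": "ha-node1:8080"}, 1.5),
    ({"job": "api", "instance": "ha-node2:8080"}, 2.5),
    ({"job": "db", "instance": "ha-node3:5432"}, 4.0),
]
print(sum_by(rates, ["job"]))
```

The instance label disappears from the result, which is why the aggregated output has one series per job rather than one per target.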

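Operators (between different metrics that have the same label dimensions); this expression returns the unused memory in MiB

(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024

The same expression, summed by application and process type.

sum by (app, proc) (
  instance_memory_limit_bytes - instance_memory_usage_bytes
) / 1024 / 1024

For more functions and operators, see: https://prometheus.io/docs/prometheus/latest/querying/basics/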

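Syntax check

To make troubleshooting easier, rule files can be syntax-checked without starting the Prometheus server.

promtool check rules /path/to/example.rules.yml

An exit status of 1 means the file has syntax errors; an exit status of 0 means it is valid.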

Integrating Prometheus with Grafana

Download: https://grafana.com/grafana/download

Installing Grafana

Create the grafana user

useradd grafana
passwd grafana

Upload and extract Grafana

# Upload...
[grafana@ha-node1 ~]$ ll
total 50068
-rw-r--r-- 1 root root 51268825 Mar  9 16:29 grafana-7.4.3.linux-amd64.tar.gz

# Extract
[grafana@ha-node1 ~]$ tar -xvzf grafana-7.4.3.linux-amd64.tar.gz -C /opt/

# Create a symbolic link
[grafana@ha-node1 ~]$ ln -s /opt/grafana-7.4.3/ /opt/grafana

Start Grafana

[grafana@ha-node1 grafana]$ pwd
/opt/grafana

[grafana@ha-node1 grafana]$ ./bin/grafana-server web

Access Grafana

http://ha-node1:3000/login
# The default username and password are admin/admin

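Add a metrics data source

Grafana displays the data in Prometheus as charts. To display data, Grafana first needs to know where to query it, that is, the address of the Prometheus server (TSDB).

Click the gear icon > Data Sources

Add a data source

Select Prometheus

Set the Prometheus URL, here: http://ha-node1:9090

Click Save & test

Querying metrics in Grafana

Click the Explore icon, then select the Prometheus data source.

You can then query metrics with PromQL.

Creating a dashboard

1. Create a dashboard

2. Click Add panel in the upper-right corner

3. Configure the panel

Grafana expression syntax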

Expression syntax	Example	Renders to	Explanation
`${__field.displayName}`	Same as syntax	`Temp {Loc="PBI", Sensor="3"}`	Displays the field name, and labels in `{}` if they are present. If there is only one label key in the response, then for the label portion, Grafana displays the value of the label without the enclosing braces.
`${__field.name}`	Same as syntax	`Temp`	Displays the name of the field (without labels).
`${__field.labels}`	Same as syntax	`Loc="PBI", Sensor="3"`	Displays the labels without the name.
`${__field.labels.X}`	`${__field.labels.Loc}`	`PBI`	Displays the value of the specified label key.
`${__field.labels.__values}`	Same as syntax	`PBI, 3`	Displays the values of the labels separated by a comma (without label keys).

