Using prometheus client_golang

This article is a repost. 2020-05-20 15:53

Preface

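Prometheus is an open-source monitoring system with many advanced features. It periodically pulls the state of the monitored systems over HTTP, attaches a timestamp to each sample, and organizes the results into time series data. Each time series is identified by its metric name and labels. Users can display the data with visualization tools and set thresholds that trigger alerts.
This article introduces the Prometheus client for Go. When Prometheus collects data from a monitored system, the Go client answers Prometheus's requests and returns the data in a defined format. In other words, it is essentially an HTTP server. The source code is on GitHub and the documentation is on GoDoc. Readers can develop directly from the documentation; this article is only meant to aid understanding.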

Basics

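To learn the Prometheus Go client you need a test environment. I recommend running Prometheus in Docker: it deploys quickly and does not affect the host system. See Using Docker for installation. Prepare the Prometheus configuration file prometheus.yml locally and mount it into the container as a volume. The configuration file contains: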

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# here it is the Go program listening on localhost:8888.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: "go-test"
  scrape_interval: 60s
  scrape_timeout: 60s
  metrics_path: "/metrics"

  static_configs:
  - targets: ["localhost:8888"]

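The configuration defines one job_name. Each monitored task is treated as a job. scrape_interval is how often Prometheus scrapes the target. scrape_timeout is how long a single scrape may take before it is abandoned, and it must not exceed scrape_interval. metrics_path is the HTTP path the metrics are served on. targets lists the ip:port of each target; here it is port 8888 on the same host. This is only a basic configuration; see the official documentation for more.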
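Once the configuration is ready, start the Prometheus service:
docker run --network=host -v /home/gaorong/project/prometheus_test/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
The container uses host networking, so Prometheus inside Docker can reach the monitored program on the same host through localhost. Prometheus serves its web UI and other operations on port 9090. In host mode that port is opened directly on the host, so no -p 9090:9090 mapping is needed, because Docker discards published ports in host mode. After startup, open http://localhost:9090/graph. The Status drop-down menu shows the configuration and the state of the targets. At this point the target is DOWN because the service we want to monitor has not been started yet. So let's move on to the main topic and implement that service with the Prometheus Go client.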

Four Metric Types

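Prometheus stores all data as time series, distinguished by metric name and labels. Labels split a metric name into finer-grained dimensions. Each sample consists of a float64 value and a timestamp. The timestamp is attached implicitly and is often not displayed, as in the example below. That example is taken from the metrics Prometheus exposes about itself at http://localhost:9090/metrics. There, go_gc_duration_seconds is the metric name, quantile="0.5" is a label (a key-value pair), and the number that follows is the float64 value.
To make client libraries easy to use, Prometheus provides four metric types: Counter, Gauge, Histogram and Summary. Roughly speaking, a Counter only goes up, apart from being reset to zero when the process restarts. A Gauge can go up and down. Histogram and Summary provide additional statistics about the distribution of observed values. In the example below, the comment line # TYPE go_gc_duration_seconds summary marks the metric as a summary.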

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0.5"} 0.000107458
go_gc_duration_seconds{quantile="0.75"} 0.000200112
go_gc_duration_seconds{quantile="1"} 0.000299278
go_gc_duration_seconds_sum 0.002341738
go_gc_duration_seconds_count 18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 107

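A Basic Example from the client_golang documentation shows how to use these types. Note that the example's original port 8080 has been changed to this article's 8888.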

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    cpuTemp = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "cpu_temperature_celsius",
        Help: "Current temperature of the CPU.",
    })
    hdFailures = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "hd_errors_total",
            Help: "Number of hard-disk errors.",
        },
        []string{"device"},
    )
)

func init() {
    // Metrics have to be registered to be exposed:
    prometheus.MustRegister(cpuTemp)
    prometheus.MustRegister(hdFailures)
}

func main() {
    cpuTemp.Set(65.3)
    hdFailures.With(prometheus.Labels{"device":"/dev/sda"}).Inc()

    // The Handler function provides a default handler to expose metrics
    // via an HTTP server. "/metrics" is the usual endpoint for that.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8888", nil))
}

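The program creates a Gauge and a CounterVec, each with a metric name and help text. A CounterVec manages a group of Counters that share one metric name but differ in label values. GaugeVec is the analogous type for gauges. The code declares one label key, "device", so every use must also supply a value for that label: hdFailures.With(prometheus.Labels{"device":"/dev/sda"}).Inc().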
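After the variables are defined, they are registered. Finally, an HTTP server is started on port 8888, which completes the program. Prometheus collects data by periodically requesting this HTTP endpoint.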
After starting the program, open http://localhost:8888/metrics in a browser to see the data exposed by the client. It includes the following fragment:

# HELP cpu_temperature_celsius Current temperature of the CPU.
# TYPE cpu_temperature_celsius gauge
cpu_temperature_celsius 65.3

# HELP hd_errors_total Number of hard-disk errors.
# TYPE hd_errors_total counter
hd_errors_total{device="/dev/sda"} 1

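That is the data exposed by the example program. The CounterVec series carries a label while the plain Gauge has none. This is the difference between a basic type and its Vec counterpart. If you now check http://localhost:9090/graph again, you will find that the target's state has changed to UP.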
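The Basic Example above is only a simple demo. The prometheus.yml file sets the scrape interval to 60s, so Prometheus requests the exposed data once every 60 seconds. The code, however, sets the gauge cpuTemp only once. If the value were set several times within one 60s interval, earlier values would be overwritten, and only the value current at the moment of the scrape would be seen. If the value is never changed again, Prometheus keeps seeing the same constant value until the metric is removed. For a Vec type, remove a series with Delete or DeleteLabelValues. For a plain metric, unregister it with Unregister.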
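Counter, Gauge and similar types are simple to use, but you must remove series you no longer need by hand. For Vec types, the Reset method clears all previously created series at once.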

Custom Collector

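A more advanced approach is to use a Collector. A custom Collector gathers data only when it is responding to a Prometheus scrape, and it must pass the current values explicitly each time. A value that is not passed is not retained, and Prometheus will no longer see it. Collector is an interface that every object collecting metrics must implement, and Counter and Gauge are no exception. The interface has two methods. Collect gathers the data and sends the resulting metrics to the channel passed in as its argument. Describe sends the descriptors (Desc) of the metrics this Collector can produce. When gathering the system's data is expensive, a custom Collector lets you control how and when it is gathered. In some cases mature metrics already exist elsewhere. Then you do not need Counter, Gauge and the like at all, and the Collector can simply act as a proxy that reads those values. More advanced usages can generally be built on a custom Collector.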

package main

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "net/http"
)

type ClusterManager struct {
    Zone         string
    OOMCountDesc *prometheus.Desc
    RAMUsageDesc *prometheus.Desc
    // ... many more fields
}

// Simulate prepare the data
func (c *ClusterManager) ReallyExpensiveAssessmentOfTheSystemState() (
    oomCountByHost map[string]int, ramUsageByHost map[string]float64,
) {
    // Just example fake data.
    oomCountByHost = map[string]int{
        "foo.example.org": 42,
        "bar.example.org": 2001,
    }
    ramUsageByHost = map[string]float64{
        "foo.example.org": 6.023e23,
        "bar.example.org": 3.14,
    }
    return
}

// Describe simply sends the two Descs in the struct to the channel.
func (c *ClusterManager) Describe(ch chan<- *prometheus.Desc) {
    ch <- c.OOMCountDesc
    ch <- c.RAMUsageDesc
}

func (c *ClusterManager) Collect(ch chan<- prometheus.Metric) {
    oomCountByHost, ramUsageByHost := c.ReallyExpensiveAssessmentOfTheSystemState()
    for host, oomCount := range oomCountByHost {
        ch <- prometheus.MustNewConstMetric(
            c.OOMCountDesc,
            prometheus.CounterValue,
            float64(oomCount),
            host,
        )
    }
    for host, ramUsage := range ramUsageByHost {
        ch <- prometheus.MustNewConstMetric(
            c.RAMUsageDesc,
            prometheus.GaugeValue,
            ramUsage,
            host,
        )
    }
}

// NewClusterManager creates the two Descs OOMCountDesc and RAMUsageDesc. Note
// that the zone is set as a ConstLabel. (It's different in each instance of the
// ClusterManager, but constant over the lifetime of an instance.) Then there is
// a variable label "host", since we want to partition the collected metrics by
// host. Since all Descs created in this way are consistent across instances,
// with a guaranteed distinction by the "zone" label, we can register different
// ClusterManager instances with the same registry.
func NewClusterManager(zone string) *ClusterManager {
    return &ClusterManager{
        Zone: zone,
        OOMCountDesc: prometheus.NewDesc(
            "clustermanager_oom_crashes_total",
            "Number of OOM crashes.",
            []string{"host"},
            prometheus.Labels{"zone": zone},
        ),
        RAMUsageDesc: prometheus.NewDesc(
            "clustermanager_ram_usage_bytes",
            "RAM usage as reported to the cluster manager.",
            []string{"host"},
            prometheus.Labels{"zone": zone},
        ),
    }
}

func main() {
    workerDB := NewClusterManager("db")
    workerCA := NewClusterManager("ca")

    // Since we are dealing with custom Collector implementations, it might
    // be a good idea to try it out with a pedantic registry.
    reg := prometheus.NewPedanticRegistry()
    reg.MustRegister(workerDB)
    reg.MustRegister(workerCA)

    http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
    http.ListenAndServe(":8888", nil)
}

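Now http://localhost:8888/metrics shows the data sent through the channel. The example defines two metrics. host is a variable label and zone is a constant label fixed in each Desc. The Prometheus client also ships several Collectors of its own, and their implementations are useful references. See go_collector.go, process_collector.go and expvar_collector.go in the source package.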

--update at 2019.2.26---
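I strongly recommend reading the Best Practices section of the official Prometheus documentation. After learning to use the tool, we still need to understand, as a system, which metrics we should expose and which metric types fit best.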
