2. Prometheus Metric Types

Reposted article; originally published 2021-02-12 13:59. Tag: Prometheus

A Simple Program

This example uses Python 3. Create a virtual environment and install prometheus_client:

mkvirtualenv --python "/usr/local/python36/bin/python3" prom
pip install prometheus_client

Write a simple HTTP server (running on 192.168.88.50):

import http.server
from prometheus_client import start_http_server

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello world")

if __name__ == "__main__":
    start_http_server(8000)  # Start an HTTP server on port 8000 that serves the Prometheus metrics
    server = http.server.HTTPServer(('localhost', 8001), MyHandler)
    server.serve_forever()

Visit http://192.168.88.50:8000/metrics to see the exposed metrics.

To have Prometheus scrape the program, edit prometheus.yml, add the following job under scrape_configs, and restart Prometheus:

  - job_name: "demo"
    static_configs:
    - targets:
      - "192.168.88.50:8000"

Prometheus can now collect metrics from the program.

Counter

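A counter is one of the most frequently used metric types. Its value only ever goes up (unless it is reset, for example when the process restarts). An application can use it to record how many times a certain event has happened; because the samples are stored as a time series, we can easily see how the rate of that event changes over time.

Extend the previous code with a new metric: the number of Hello World requests.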

from prometheus_client import Counter

# The first argument is the metric name, which must be unique; the second is a description of the metric
REQUESTS = Counter('hello_worlds_total', 'Hello worlds requested.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        REQUESTS.inc()  # Increment by 1
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello world")

Each time we access the server from 192.168.88.50 (curl http://127.0.0.1:8001), the value of hello_worlds_total increases by 1.

Use a PromQL expression to view the request rate, for example: rate(hello_worlds_total[1m])
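To make the "only increases unless reset" behaviour concrete, here is a minimal sketch of how a per-second rate can be derived from raw counter samples. It is a simplified model, not Prometheus's actual rate() implementation (which also extrapolates to the edges of the time window), and the sample values are hypothetical.

```python
def increase(samples):
    """Total increase across (timestamp, value) samples, treating any drop as a reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter never decreases, so a lower value means the process restarted
        # and the counter began again from zero.
        total += cur if cur < prev else cur - prev
    return total


def per_second_rate(samples):
    """Average per-second rate between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed


# Hypothetical scrapes of hello_worlds_total, 15 seconds apart.
# The process restarts between the third and fourth scrape.
scrapes = [(0, 10.0), (15, 25.0), (30, 40.0), (45, 5.0), (60, 20.0)]
print(increase(scrapes))  # 50.0
print(per_second_rate(scrapes))
```

Because the drop from 40 to 5 is counted as a restart rather than a negative change, the computed rate stays meaningful across process restarts.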

Counting Exceptions

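Client libraries provide not only the core functionality but also utility methods. To count exceptions in Python, we can use count_exceptions directly, either as a context manager or as a decorator.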

import random
from prometheus_client import Counter

REQUESTS = Counter('hello_worlds_total', 'Hello worlds requested.')
EXCEPTIONS = Counter('hello_word_excpetions_total', 'Exceptions serving Hello World.')


class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        REQUESTS.inc()
        # A random number triggers the exception here; count_exceptions records how many occur without interfering with the program logic
        with EXCEPTIONS.count_exceptions():
            if random.random() < 0.2:
                raise Exception
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello world")

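The code above uses count_exceptions as a context manager.

Exception rate: rate(hello_word_excpetions_total[1m])

Proportion of requests that raised an exception: rate(hello_word_excpetions_total[1m]) / rate(hello_worlds_total[1m])

count_exceptions can also be used as a decorator: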

import random
from prometheus_client import Counter

REQUESTS = Counter('hello_worlds_total', 'Hello worlds requested.')
EXCEPTIONS = Counter('hello_word_excpetions_total', 'Exceptions serving Hello World.')


class MyHandler(http.server.BaseHTTPRequestHandler):
    # Use the decorator
    @EXCEPTIONS.count_exceptions()
    def do_GET(self):
        REQUESTS.inc()
        if random.random() < 0.2:
            raise Exception
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello world")

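The code above uses count_exceptions as a decorator.

Because Prometheus stores values as 64-bit floating-point numbers, a counter is not limited to increments of 1; it can be incremented by any non-negative number. For example: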

REQUESTS = Counter('hello_worlds_total', 'Hello worlds requested.')
REQUESTS.inc(2.5)
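The "non-negative" rule can be illustrated with a small stand-in class. ToyCounter below is hypothetical and exists only for this sketch; it is not the prometheus_client Counter, although that class likewise refuses negative increments.

```python
class ToyCounter:
    """Hypothetical in-memory counter used only for illustration."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1):
        # A counter may only go up, so negative amounts are rejected.
        if amount < 0:
            raise ValueError("counters can only be incremented by non-negative amounts")
        self.value += amount


c = ToyCounter()
c.inc()      # default increment of 1
c.inc(2.5)   # fractional increments are fine because the value is a float
print(c.value)  # 3.5
```

If a value needs to go down as well as up, a gauge is the appropriate type.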

Gauge

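A gauge reflects the current state: it is a value that can go up and down. With a counter, by contrast, we usually care about how fast it is increasing.

Some common examples of gauges:

Number of messages in a queue
Memory usage
Number of active threads
The last time a record was processed
Average requests per second over the last minute

A gauge provides three main methods: inc, dec, and set.

By default inc increases the value by 1 and dec decreases it by 1; set sets the value to whatever is passed as the argument.

The following example shows how to use gauges to track the number of calls in progress and the time the last call completed.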

import time
from prometheus_client import Gauge

INPROGRESS = Gauge('hello_worlds_inprogress', 'Number of Hello Worlds in progress.')
LAST = Gauge('hello_world_last_time_seconds', 'The last time a Hello World was served.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Call starts: value +1
        INPROGRESS.inc()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
        # Record the current timestamp
        LAST.set(time.time())
        # Call finished: value -1
        INPROGRESS.dec()

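Gauge metrics like these can be used directly in the expression browser. For example, hello_world_last_time_seconds shows the time of the last request.

We can also use time() - hello_world_last_time_seconds to calculate how many seconds have passed since the last request.

Both of these use cases are common, so the client library provides utility helpers for them, such as the track_inprogress decorator: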

from prometheus_client import Gauge

INPROGRESS = Gauge('hello_worlds_inprogress', 'Number of Hello Worlds in progress.')
LAST = Gauge('hello_world_last_time_seconds', 'The last time a Hello World was served.')

class MyHandler(http.server.BaseHTTPRequestHandler):
    @INPROGRESS.track_inprogress()
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
        # Record the current timestamp; set_to_current_time does this directly
        LAST.set_to_current_time()
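One reason to prefer a tracking helper over manual inc/dec calls is exception safety: in the manual version above, dec() is skipped if the handler raises partway through. The sketch below shows the general pattern with a hypothetical ToyGauge class; it illustrates the idea and is not the library's implementation.

```python
from contextlib import contextmanager


class ToyGauge:
    """Hypothetical in-memory gauge used only for this illustration."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1):
        self.value += amount

    def dec(self, amount=1):
        self.value -= amount

    @contextmanager
    def track_inprogress(self):
        self.inc()
        try:
            yield
        finally:
            # Runs even if the wrapped code raises, so the gauge cannot leak.
            self.dec()


inprogress = ToyGauge()
seen_inside = None
try:
    with inprogress.track_inprogress():
        seen_inside = inprogress.value
        raise RuntimeError("simulated failure while serving")
except RuntimeError:
    pass

print(seen_inside, inprogress.value)  # 1.0 0.0
```

The gauge returns to 0 even though the block failed, which is the behaviour one would want from an "in progress" metric.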

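Counter metrics conventionally carry the _total suffix, and the suffixes _count, _sum, and _bucket also have specific meanings (they are used by summaries and histograms). Including the unit in the name is strongly recommended as well; for example, a counter for bytes processed might be named myapp_requests_processed_bytes_total. Gauge metrics have no such suffix, and to avoid confusion we should not add any of these suffixes to a gauge name.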

Summary

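When we want to know how long the program takes to respond to a request, or its latency, a Summary metric is the tool to use. The following example tracks request latency.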

import time
from prometheus_client import Summary

LATENCY = Summary('hello_world_latency_seconds', 'Time for a request Hello World')

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the start time
        start = time.time()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
        # Time spent = finish time - start time
        LATENCY.observe(time.time() - start)

Looking at /metrics, we find two time series:

hello_world_latency_seconds_count

  The number of times observe has been called, so rate(hello_world_latency_seconds_count[1m]) returns the per-second request rate.

hello_world_latency_seconds_sum

  The sum of the values passed to observe, so rate(hello_world_latency_seconds_sum[1m]) returns the time spent responding to requests per second.

Average latency over the last minute:

  rate(hello_world_latency_seconds_sum[1m]) / rate(hello_world_latency_seconds_count[1m])
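The division works because both rates cover the same time window, so the window length cancels out and what remains is "increase in total time" divided by "increase in number of requests". A minimal worked example with hypothetical scrape values:

```python
def average_latency(prev, cur):
    """Mean observed value between two scrapes of a summary's sum and count."""
    d_sum = cur["sum"] - prev["sum"]
    d_count = cur["count"] - prev["count"]
    if d_count == 0:
        # No requests in the window, so the average is undefined.
        return float("nan")
    return d_sum / d_count


# Hypothetical values of hello_world_latency_seconds_sum and _count
# at the start and end of a one-minute window.
start = {"sum": 1.5, "count": 100}
end = {"sum": 2.0, "count": 150}
print(average_latency(start, end))  # 0.01
```

Here 50 requests took 0.5 seconds in total, so the average latency over the window is 10 milliseconds.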

For latency monitoring, a decorator simplifies the code:

from prometheus_client import Summary

LATENCY = Summary('hello_world_latency_seconds', 'Time for a request Hello World')

class MyHandler(http.server.BaseHTTPRequestHandler):
    # Use the decorator
    @LATENCY.time()
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")

Histogram

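A Summary gives us the average latency. But what if we want a quantile? A quantile tells us that a given proportion of events are smaller than a certain value. For example, a 0.95 quantile of 300ms means that 95% of requests took less than 300ms.

Quantiles are useful when reasoning about the actual end-user experience. If a user's browser sends 20 concurrent requests to the application, it is the slowest of them that determines the latency the user sees.

Example: capturing latency at the 95th percentile (the 0.95 quantile)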

from prometheus_client import Histogram

LATENCY = Histogram('histogram_hello_world_latency_seconds', 'Time for a request Hello World')

class MyHandler(http.server.BaseHTTPRequestHandler):
    @LATENCY.time()
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")

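This produces a set of time series named histogram_hello_world_latency_seconds_bucket.

These are a set of counters. A histogram has a set of buckets, such as 1ms, 10ms, and 25ms, and tracks the number of events that fall into each bucket.

Now we want to calculate the latency below which 95% of requests fall: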

histogram_quantile(0.95, rate(histogram_hello_world_latency_seconds_bucket[1m]))
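To show roughly what such a quantile estimate involves, the sketch below finds the bucket containing the requested rank and interpolates linearly inside it. It is a simplified illustration that works on plain cumulative counts rather than on rates, assumes the lowest bucket starts at 0, and uses hypothetical bucket values; it is not Prometheus's own code.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Nothing to interpolate over: report the last finite bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative bucket counts (le -> count).
example = [(0.1, 50), (0.25, 90), (0.5, 100), (math.inf, 100)]
print(histogram_quantile(0.95, example))
```

With these numbers the 95th request falls halfway through the 0.25 to 0.5 bucket, so the estimate is about 0.375 seconds. The result is only as precise as the bucket boundaries allow.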

Buckets

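The default buckets range from 0.005 seconds to 10 seconds, which covers the typical latency range of a web application. If we think the defaults do not suit our use case, we can supply our own buckets, as follows:

# Bucket bounds must be listed in ascending order; around 10 buckets is the general recommendation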
LATENCY = Histogram('histogram_hello_world_latency_seconds',
                    'Time for a request Hello World',
                    buckets=[0.0001, 0.0002, 0.0005, 0.001, 0.01, 0.1])

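Note that each bucket does not count only the events that belong to it alone; it also includes the counts of all smaller buckets, all the way up to +Inf (the total number of events). This is why the label is le (less than or equal to).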
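The cumulative behaviour can be reproduced in a few lines. The sketch below counts, for each upper bound, every observation less than or equal to it, using the custom bucket bounds from the example above and a hypothetical list of latencies.

```python
def cumulative_buckets(observations, bounds):
    """For each upper bound (plus +Inf), count the observations <= that bound."""
    edges = sorted(bounds) + [float("inf")]
    return [(le, sum(1 for value in observations if value <= le)) for le in edges]


# Hypothetical request latencies in seconds.
latencies = [0.00015, 0.0003, 0.0004, 0.002, 0.2]
bounds = [0.0001, 0.0002, 0.0005, 0.001, 0.01, 0.1]

result = cumulative_buckets(latencies, bounds)
for le, count in result:
    print(le, count)
```

The counts never decrease from one bucket to the next, and the +Inf bucket always equals the total number of observations, including the 0.2 second request that exceeds every finite bound.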

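Reading the bucket values:

+Inf 285 means there were 285 requests in total

144 requests had a latency <= 0.0002 seconds

284 requests had a latency <= 0.001 seconds

We can also calculate exactly what proportion of requests took longer than 5 milliseconds (0.005 seconds), provided 0.005 is one of the bucket bounds.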
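The calculation is a subtraction on the cumulative counts: requests above a bound equal the total minus the count in that bound's bucket. The text does not quote a count for a 0.005 bucket, so this sketch applies the same arithmetic to the 0.001 bound using the three counts that are quoted.

```python
def fraction_above(buckets, threshold):
    """Share of observations greater than `threshold`, which must be a bucket bound."""
    total = buckets[float("inf")]
    return (total - buckets[threshold]) / total


# Cumulative counts quoted in the text (le -> count).
counts = {0.0002: 144, 0.001: 284, float("inf"): 285}

slow = fraction_above(counts, 0.001)
print(slow)
```

Only 1 of the 285 requests took longer than 1 millisecond, roughly 0.35%. The result is exact only because the threshold coincides with a bucket bound; for any other threshold the histogram can only give an estimate.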
