3. Prometheus Remote Write Parameter Tuning

Reposted from the original article. 2020-04-02 11:04. Tags: WAL / monitoring / prometheus / remoteWrite

1. Overview
2. Remote Write Characteristics
- 2.1 Overall Structure
- 2.2 Retry Mechanism
- 2.3 Memory Usage
3. Parameters
- 3.1 capacity
- 3.2 max_shards
- 3.3 min_shards
- 3.4 max_samples_per_send
- 3.5 batch_send_deadline
- 3.6 min_backoff
- 3.7 max_backoff

1. Overview

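Prometheus can work around the limits of its local storage by writing to remote storage, so it provides a remote storage interface that is configured in the configuration file (prometheus.yml). The default parameters are usually sufficient, but some workloads require tuning. This section introduces the parameters available in the remote write configuration, shown below: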

# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we block reading of more
  # samples from the WAL. It is recommended to have enough capacity in each
  # shard to buffer several requests to keep throughput up while processing
  # occasional slow remote requests.
  [ capacity: <int> | default = 500 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 1000 ]
  # Minimum number of shards, i.e. amount of concurrency.
  [ min_shards: <int> | default = 1 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 100]
  # Maximum time a sample will wait in buffer.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 100ms ]


2. Remote Write Characteristics

This section focuses on the queue_config parameters. The other parameters are self-explanatory and leave little room for tuning.

2.1 Overall Structure

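Each remote write destination starts an in-memory write queue made up of shards. The queue reads data from the WAL (write-ahead log; see the storage documentation at https://github.com/prometheus/prometheus/blob/master/docs/storage.md; the principle is similar to the WAL in HBase) and sends the metric samples to the remote storage service. The data flow is as follows: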

      |-->  queue (shard_1)   --> remote endpoint
WAL --|-->  queue (shard_...) --> remote endpoint
      |-->  queue (shard_n)   --> remote endpoint
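The fan-out in the diagram can be sketched in a few lines of Python. This is a simplified model, not Prometheus's actual code: it assumes each sample carries a numeric series ID and that a series is always routed to shard `series_id % num_shards`, so all samples of one series stay on one shard in WAL order. The function name `route` and the sample data are hypothetical.

```python
from collections import defaultdict


def route(samples, num_shards):
    """Group (series_id, value) samples by shard.

    Assumption for illustration: a sample goes to shard series_id % num_shards,
    so every sample of a given series lands on the same shard, in WAL order.
    """
    shards = defaultdict(list)
    for series_id, value in samples:
        shards[series_id % num_shards].append((series_id, value))
    return dict(shards)


# Hypothetical WAL contents: (series ID, sample value) pairs in WAL order.
samples = [(1, 0.5), (2, 1.0), (5, 2.0), (6, 3.0)]
print(route(samples, 4))
# {1: [(1, 0.5), (5, 2.0)], 2: [(2, 1.0), (6, 3.0)]}
```

Keeping a series pinned to one shard is what lets several shards send in parallel without reordering samples within a series.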

2.2 Retry Mechanism

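Note that when one shard falls behind and fills its queue, Prometheus blocks reading from the WAL into any shard. (This is the main reason to tune the parameters above; see 3.1 capacity below.)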
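The blocking behaviour can be illustrated with a toy model. This sketch is an assumption-laden simplification: `Shard` and `read_wal` are hypothetical names, routing is assumed to be `series_id % number_of_shards`, and where Prometheus would wait at the full shard, this sketch simply stops reading.

```python
from collections import deque


class Shard:
    """A shard with a bounded in-memory queue (simplified model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def full(self):
        return len(self.queue) >= self.capacity


def read_wal(samples, shards):
    """Feed samples to shards in WAL order and return how many were consumed.

    Reading stops at the first sample whose shard is full: a single backed-up
    shard halts WAL reading for every shard. (Prometheus would wait there;
    this sketch simply stops.)
    """
    consumed = 0
    for series_id, value in samples:
        shard = shards[series_id % len(shards)]  # assumed routing rule
        if shard.full():
            break
        shard.queue.append((series_id, value))
        consumed += 1
    return consumed


# Shard 0 is stuck behind a slow request and already holds 2 of 2 samples.
shards = [Shard(2), Shard(2)]
shards[0].queue.extend([(0, 0.0), (2, 0.0)])
consumed = read_wal([(1, 1.0), (0, 2.0), (3, 3.0)], shards)
print(consumed, len(shards[1].queue))  # 1 1
```

Sample `(3, 3.0)` never reaches shard 1 even though shard 1 still has room, which is exactly why one slow shard can stall all throughput.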

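Failed writes to the remote endpoint are retried without data loss, unless the remote endpoint stays down for more than 2 hours: after 2 hours the WAL is compacted, and any data that has not yet been sent is lost. The retry delays are controlled by the min_backoff and max_backoff parameters described below.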

2.3 Memory Usage

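Using remote write increases Prometheus's memory footprint. Most users report roughly a 25% increase in memory usage, though this depends on the shape of the data. For each series in the WAL, the remote write code caches a mapping from series ID to label values, so heavy series churn can increase memory usage significantly.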

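Besides the series cache, each shard and its queue also add to memory usage. Shard memory is proportional to number of shards * (capacity + max_samples_per_send). When tuning, consider reducing max_shards while increasing capacity and max_samples_per_send, to avoid inadvertently running out of memory. The default values of capacity and max_samples_per_send keep memory usage below 100 kB per shard.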
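The proportionality rule lends itself to a quick back-of-the-envelope check. The sketch below is illustrative only: the function name is hypothetical, it counts buffered sample slots rather than bytes (the per-sample byte cost is not given here), and the "tuned" values (200 shards, capacity 2500, max_samples_per_send 500) are example numbers, not recommendations from this article.

```python
def buffered_sample_slots(num_shards, capacity, max_samples_per_send):
    """Worst-case samples buffered across all shards.

    Shard memory is proportional to this number; bytes per sample are
    not modelled, so the sketch compares slot counts only.
    """
    return num_shards * (capacity + max_samples_per_send)


# Defaults from the queue_config listing above.
default_slots = buffered_sample_slots(1000, 500, 100)
# Hypothetical tuning: 5x fewer shards, 5x larger queues and batches.
tuned_slots = buffered_sample_slots(200, 2500, 500)
print(default_slots, tuned_slots)  # 600000 600000
```

Cutting max_shards by 5x while raising capacity and max_samples_per_send by 5x leaves the worst-case buffer unchanged, which is the trade-off the paragraph above suggests.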

3. Parameters

3.1 capacity

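Definition: the capacity of each in-memory queue (shard), i.e. how many samples are buffered per shard before reading from the WAL is blocked.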

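Once the WAL is blocked (see 2.2 for why this happens), samples cannot be appended to any shard and all throughput stops. In most cases, therefore, each queue's capacity should be large enough to avoid blocking the other shards. However, too much capacity can cause excessive memory consumption and longer times to drain the queues during resharding.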

Recommendation: set capacity to 3 to 10 times max_samples_per_send.
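The 3 to 10x rule of thumb can be expressed as a small helper. Both function names are hypothetical; the sketch only encodes the recommendation above.

```python
def recommended_capacity(max_samples_per_send, low=3, high=10):
    """Return the (min, max) capacity suggested by the 3-10x rule of thumb."""
    return low * max_samples_per_send, high * max_samples_per_send


def capacity_in_range(capacity, max_samples_per_send):
    """True if capacity lies within the recommended range."""
    lo, hi = recommended_capacity(max_samples_per_send)
    return lo <= capacity <= hi


print(recommended_capacity(100))    # (300, 1000)
print(capacity_in_range(500, 100))  # True: the defaults give 5x
print(capacity_in_range(100, 100))  # False: one batch fills the whole queue
```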

3.2 max_shards

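As the name suggests, this is the maximum number of shards (queues), which can also be understood as the parallelism of remote write. Prometheus tries not to use more shards than it needs; only when the queue falls behind does the remote write component increase the number of shards, up to max_shards, to raise throughput.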

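PS: During operation, Prometheus continuously calculates the optimal number of shards based on the incoming sample rate, the number of outstanding samples not yet sent, and the time taken to send each sample. (The actual number of shards is adjusted dynamically.)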
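The idea behind this calculation can be sketched with a Little's-law style estimate: required parallelism is roughly the arrival rate times the time each sample takes to send. This is a simplified model, not Prometheus's exact algorithm (which also smooths the measured rates over time); the function name, the per-shard `secs_per_sample` input and the 5% backlog catch-up fraction are assumptions for illustration.

```python
import math


def desired_shards(samples_in_per_sec, secs_per_sample, pending_samples,
                   min_shards, max_shards, catchup_fraction=0.05):
    """Rough estimate of how many shards are needed (simplified model).

    If one shard needs secs_per_sample seconds per sample, keeping up with
    samples_in_per_sec needs about rate * time-per-sample shards working in
    parallel. A fraction of the backlog is added to the demand so the queue
    can catch up. The result is clamped to [min_shards, max_shards].
    """
    demand = samples_in_per_sec + catchup_fraction * pending_samples
    shards = math.ceil(demand * secs_per_sample)
    return max(min_shards, min(max_shards, shards))


print(desired_shards(10_000, 0.001, 0, 1, 1000))        # 10
print(desired_shards(10_000, 0.001, 100_000, 1, 1000))  # 15: backlog adds shards
print(desired_shards(10_000, 0.001, 0, 1, 5))           # 5: capped by max_shards
```

The clamp also shows the role of min_shards as a floor: with no traffic the estimate drops to 0, but the shard count never goes below min_shards.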

3.3 min_shards

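min_shards sets the minimum number of shards Prometheus uses, which is also the number of shards used when remote write starts. If remote write falls behind, Prometheus automatically scales up the number of shards, so most users do not need to adjust this parameter. However, increasing min_shards lets Prometheus avoid falling behind at startup while it is still calculating the required number of shards.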

3.4 max_samples_per_send

Definition: the maximum number of samples sent in each remote write request, i.e. the batch size.

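The right value depends on the remote storage system. Some systems work well with larger batches without a significant increase in latency, while others run into problems when a single request carries a large number of samples. The default value is small enough to work for most systems.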
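One way to see the effect of the batch size is to count requests. The sketch below assumes every batch is full and uses a hypothetical ingest rate of 100,000 samples/s; the function name is also hypothetical.

```python
import math


def requests_per_second(samples_per_sec, max_samples_per_send):
    """Minimum remote write requests per second, assuming every batch is full."""
    return math.ceil(samples_per_sec / max_samples_per_send)


# Hypothetical ingest rate of 100,000 samples/s.
print(requests_per_second(100_000, 100))  # 1000 with the default batch size
print(requests_per_second(100_000, 500))  # 200 with a 5x larger batch
```

Fewer, larger requests help backends that handle big batches well; for backends that struggle with large requests, the default keeps each request small at the cost of more requests.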

3.5 batch_send_deadline

Definition: the maximum time a single shard waits between sends.

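A request is sent once this deadline is reached, even if the shard's queue has not yet accumulated max_samples_per_send samples. For low-volume systems that are not latency sensitive, the deadline can be increased to improve request efficiency.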
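The interaction between max_samples_per_send and batch_send_deadline can be sketched as a small per-shard batcher. This is an illustrative model, not Prometheus's implementation: the class and method names are hypothetical, and time is passed in explicitly instead of using real timers.

```python
class Batcher:
    """Per-shard batching driven by max_samples_per_send and batch_send_deadline.

    Time is passed in explicitly (in seconds) so the flush rules are easy to follow.
    """

    def __init__(self, max_samples_per_send, batch_send_deadline):
        self.max_samples_per_send = max_samples_per_send
        self.batch_send_deadline = batch_send_deadline
        self.pending = []
        self.last_send = 0.0
        self.sent = []  # batches that would have gone to the remote endpoint

    def _send(self, now):
        if self.pending:
            self.sent.append(self.pending)
            self.pending = []
        self.last_send = now

    def add(self, sample, now):
        """Queue a sample; send immediately once the batch is full."""
        self.pending.append(sample)
        if len(self.pending) >= self.max_samples_per_send:
            self._send(now)

    def tick(self, now):
        """Periodic check: send a partial batch once the deadline has passed."""
        if now - self.last_send >= self.batch_send_deadline:
            self._send(now)


b = Batcher(max_samples_per_send=3, batch_send_deadline=5.0)
for t, sample in enumerate(["a", "b", "c", "d"]):
    b.add(sample, now=float(t))  # "a", "b", "c" fill a batch at t=2
b.tick(now=4.0)  # only 2 s since the last send: "d" keeps waiting
b.tick(now=7.0)  # 5 s since the last send: "d" goes out alone
print(b.sent)  # [['a', 'b', 'c'], ['d']]
```

A longer deadline gives a quiet shard more time to fill a batch, which is the efficiency gain described above, at the cost of samples waiting longer.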

3.6 min_backoff

Definition: the minimum wait time before retrying a failed remote write.

min_backoff is the wait before the first retry; each subsequent wait is doubled, up to max_backoff.
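The retry schedule can be sketched as follows. Delays are in integer milliseconds to avoid floating-point noise; the function name is hypothetical, and the sketch models only the wait times, not the retry loop itself.

```python
def backoff_schedule(min_backoff_ms, max_backoff_ms, retries):
    """Wait before each retry: start at min_backoff, double each time, cap at max_backoff."""
    delays = []
    delay = min_backoff_ms
    for _ in range(retries):
        delays.append(delay)
        delay = min(delay * 2, max_backoff_ms)
    return delays


# Defaults from queue_config: min_backoff = 30ms, max_backoff = 100ms.
print(backoff_schedule(30, 100, 5))  # [30, 60, 100, 100, 100]
```

With the defaults, the delay reaches the max_backoff cap by the third retry, so retries stay frequent while the endpoint is unavailable.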

3.7 max_backoff

Definition: the maximum wait time before retrying a failed remote write.

Reference: https://prometheus.io/docs/practices/remote_write/
