Prometheus monitoring: the difference between rate and irate


Interpreting the official documentation

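Both irate and rate compute how fast a metric changes over a given time range, but they compute it differently. irate uses only the two most recent data points in the range. rate uses the whole range: it takes the total increase of the counter between the first and last data points (adjusted for counter resets) and divides it by the elapsed time, which gives the average per-second rate over the range.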

This is why the official documentation says that irate suits fast-moving counters, while rate suits slow-moving counters.

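The algorithms above also explain why: for a fast-moving counter, rate averages over the whole range and so easily flattens peaks. Making the time range small enough weakens this effect.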
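The flattening effect can be illustrated with a small Python sketch. It is a simplified model and not Prometheus's actual implementation: the helper names (increase, simple_rate, simple_irate) and the sample values are made up for illustration, and the extrapolation to the range boundaries that the real rate() performs is left out.

```python
def increase(samples):
    """Total counter increase across (timestamp, value) samples.

    A drop in value is treated as a counter reset to zero, so the new
    value itself counts as the increase for that step.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def simple_rate(samples):
    """Average per-second increase over all samples (no extrapolation)."""
    if len(samples) < 2:
        return None
    return increase(samples) / (samples[-1][0] - samples[0][0])


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    if len(samples) < 2:
        return None
    return increase(samples[-2:]) / (samples[-1][0] - samples[-2][0])


# Hypothetical counter scraped every 15 s: it grows by 1 per scrape,
# then jumps by 60 in the final interval (a short burst).
samples = [(0, 0), (15, 1), (30, 2), (45, 3), (60, 63)]

print(simple_rate(samples))       # whole range: the burst is averaged away
print(simple_irate(samples))      # last two points: the burst is fully visible
print(simple_rate(samples[-2:]))  # a range holding only two points behaves like irate
```

In this model the burst shows up as 4 per second with simple_irate but only about 1.05 per second with simple_rate over the full range. Shrinking the range to the last two samples makes the two results identical.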

Experiment

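I ran an experiment in Grafana: I created a test dashboard and monitored the CPU usage metric with both irate and rate, using time ranges of 10m, 5m, 2m, and 1m. The expressions for the 10-minute range are: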

sum(irate(process_cpu_seconds_total[10m])) * 100

sum(rate(process_cpu_seconds_total[10m])) * 100

The figure below shows the result for the 10-minute range. The irate curve is jagged, while the rate curve is comparatively smooth:

 


The figure below shows the result for the 5-minute range:

 


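The figure below shows the result for the 2-minute range. The two curves coincide, presumably because the range then holds only two data points, so both functions compute from the same pair: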

 


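The figure below shows the result for the 1-minute range. No data is displayed, presumably because the range does not contain the two data points that both functions need to compute a rate: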
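The 2-minute and 1-minute results are consistent with a scrape interval of about one minute. The post does not state the scrape interval, so the 60-second value below is an assumption, as are the helper name samples_in_window and the evaluation time. The sketch only counts how many scrapes fall inside each lookback range and treats the range as left-open, which is a simplification.

```python
def samples_in_window(scrape_times, eval_time, window):
    """Scrape timestamps inside the lookback range (eval_time - window, eval_time]."""
    return [t for t in scrape_times if eval_time - window < t <= eval_time]


SCRAPE_INTERVAL = 60  # seconds; ASSUMED, the original post does not state it
scrape_times = list(range(0, 1200, SCRAPE_INTERVAL))  # scrapes at 0, 60, ..., 1140
eval_time = 1150  # arbitrary evaluation time between two scrapes

counts = {}
for window in (600, 300, 120, 60):
    counts[window] = len(samples_in_window(scrape_times, eval_time, window))
    computable = counts[window] >= 2  # rate and irate both need two samples
    print(f"[{window // 60}m]: {counts[window]} sample(s), computable: {computable}")
```

Under this assumption the 2-minute range holds exactly two samples, so rate and irate see the same pair of points, and the 1-minute range holds a single sample, so neither function can return a value.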

 


 
Appendix: official documentation

irate()

irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range vector. This is based on the last two data points. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.

The following example expression returns the per-second rate of HTTP requests looking up to 5 minutes back for the two most recent data points, per time series in the range vector:

irate(http_requests_total{job="api-server"}[5m])

irate should only be used when graphing volatile, fast-moving counters. Use rate for alerts and slow-moving counters, as brief changes in the rate can reset the FOR clause and graphs consisting entirely of rare spikes are hard to read.

Note that when combining irate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a irate() first, then aggregate. Otherwise irate() cannot detect counter resets when your target restarts.

rate()

rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range's time period.

The following example expression returns the per-second rate of HTTP requests as measured over the last 5 minutes, per time series in the range vector:

rate(http_requests_total{job="api-server"}[5m])

rate should only be used with counters. It is best suited for alerting, and for graphing of slow-moving counters.

Note that when combining rate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a rate() first, then aggregate. Otherwise rate() cannot detect counter resets when your target restarts.
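The advice to take the rate first and aggregate afterwards can be checked with a small Python sketch. It is a simplified model of counter-reset handling and not Prometheus code. The increase helper and the two series are hypothetical, and both targets are assumed to be scraped at the same instants.

```python
def increase(values):
    """Counter increase across consecutive values.

    A drop is treated as a reset to zero, so the new value itself
    counts as the increase for that step.
    """
    total = 0
    for prev, cur in zip(values, values[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


# Two hypothetical targets; target b restarts before the third scrape.
a = [100, 110, 120, 130]
b = [1000, 1010, 5, 15]

# Per-series first, then aggregate: each reset is handled within its own series.
per_series_then_sum = increase(a) + increase(b)

# Aggregate first: the sum drops from 1120 to 125, which looks like a reset of
# the whole sum, so the entire value 125 is counted as an increase.
summed = [x + y for x, y in zip(a, b)]
sum_then_increase = increase(summed)

print(per_series_then_sum)  # 30 + 25
print(sum_then_increase)    # 20 + 125 + 20
```

Here the per-series calculation gives the true increase of 55, while aggregating first gives 165. If another series happened to grow by more than the restarted one dropped, the sum would not decrease at all and the reset would go unnoticed.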

---------------------  
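Author: 東東~  
Source: CSDN  
Original: https://blog.csdn.net/palet/article/details/82763695  
Copyright notice: this is an original article by the blogger. Please include a link to the original post when reposting.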

