InfluxDB Configuration Tuning

Reposted article. 2022-01-17 17:00. Category: Database.

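Background

InfluxDB is the top-ranked time series database on DB-Engines and is used at some scale within the ApsaraDB team. This article shares the tuning we did while running InfluxDB and some of the problems we ran into in operations. It covers two areas: configuration tuning and troubleshooting techniques.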

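Configuration Tuning

Tuning InfluxDB's configuration mainly means adjusting a handful of configuration parameters to improve its performance.

Performance Tuning

InfluxDB's storage engine is the TSM Tree. Its overall design is similar to an LSM Tree, with storage-layout optimizations modeled for time series workloads. The cache-snapshot-memory-size value needs to be increased.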

[data]
  # CacheSnapshotMemorySize is the size at which the engine will
  # snapshot the cache and write it to a TSM file, freeing up memory
  cache-snapshot-memory-size = 562144000

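cache-snapshot-memory-size controls the size of the in-memory cache in this LSM-style design. Once the cache reaches the threshold, it is flushed to disk as a TSM file at level 0. Two TSM files of the same level are compacted into one TSM file at level + 1, so two level-0 TSM files produce one level-1 TSM file. This design is the source of the TSM tree's write amplification.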

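Because InfluxDB always compacts two lower-level files into one TSM file of the next level, a smaller cache means the cache is dumped to TSM files more often, compactions run more often, and write amplification becomes more pronounced. Under a high write rate, InfluxDB's throughput drops very noticeably.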
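The relationship between cache size and write amplification described above can be sketched with a toy model. This is not InfluxDB's code: the function name `write_amplification` and the simplifications (every flush produces a full-size level-0 file, two files of the same level are merged immediately, and there is no upper bound on levels) are assumptions made for illustration.

```python
def write_amplification(total_bytes: int, cache_bytes: int) -> float:
    """Toy model: bytes written to disk divided by bytes ingested."""
    flushes = total_bytes // cache_bytes
    if flushes == 0:
        return 0.0  # nothing was flushed, so nothing was written to disk
    pending = {}  # level -> size of the one file waiting at that level
    written = 0
    for _ in range(flushes):
        size, level = cache_bytes, 0
        written += size  # the cache snapshot becomes a level-0 file
        # Merge with a waiting file of the same level, cascading upward.
        while level in pending:
            size += pending.pop(level)
            level += 1
            written += size  # the merged file is rewritten in full
        pending[level] = size
    return written / (flushes * cache_bytes)


if __name__ == "__main__":
    ingested = 8 * 1024**3  # 8 GiB of incoming data (illustrative figure)
    for cache_mib in (25, 256, 1024):
        wa = write_amplification(ingested, cache_mib * 1024**2)
        print(f"cache={cache_mib} MiB  write amplification={wa:.2f}")
```

In this model, when the number of flushes N is a power of two, every byte is rewritten once per level and the result is 1 + log2(N), so shrinking the cache by a factor of 8 adds three more rounds of rewriting.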

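Frequent compaction is expensive largely because TSM file data is encoded and compressed to reduce disk usage. During compaction the data has to be decoded, which brings a significant performance cost.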

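To mitigate this, InfluxDB distinguishes two kinds of compaction: optimize compaction and full compaction. A full compaction first decodes the blocks in the TSM files, then re-encodes the decoded point values into blocks in time order, according to the number of points each block stores, and writes them to a new TSM file. The performance cost of this is very significant. An optimize compaction does not read the data inside the blocks; it concatenates multiple blocks, which reduces the cost.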

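InfluxDB currently still chooses full compaction in some scenarios. Optimize compaction speeds up compaction and reduces its resource consumption, but it causes query amplification: the data to be returned has to be fetched from multiple blocks.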
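The trade-off between the two compaction types can be illustrated with a small sketch. It models a block as a time-sorted list of (timestamp, value) tuples and ignores encoding entirely; the function names `full_compact`, `optimize_compact`, and `blocks_touched` are hypothetical and do not correspond to InfluxDB's internal API.

```python
def full_compact(blocks, points_per_block):
    """Decode every block, merge the points by time, and re-cut into full blocks."""
    points = sorted(p for block in blocks for p in block)
    return [points[i:i + points_per_block]
            for i in range(0, len(points), points_per_block)]


def optimize_compact(blocks):
    """Concatenate the existing blocks without looking inside them."""
    return list(blocks)


def blocks_touched(blocks, t_min, t_max):
    """Count blocks whose time range overlaps the query range [t_min, t_max]."""
    return sum(1 for b in blocks if b[0][0] <= t_max and b[-1][0] >= t_min)


if __name__ == "__main__":
    # Four small blocks of two points each, as left behind by frequent flushes.
    small_blocks = [[(t, float(t)), (t + 1, float(t + 1))] for t in (1, 3, 5, 7)]
    merged = full_compact(small_blocks, points_per_block=4)
    stitched = optimize_compact(small_blocks)
    print("full:", blocks_touched(merged, 1, 4), "block(s) read")
    print("optimize:", blocks_touched(stitched, 1, 4), "block(s) read")
```

The full compaction pays for sorting and rebuilding every point, while the optimize variant does almost no work but leaves a query over the same time range with more blocks to read.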

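Notes

In theory, the larger cache-snapshot-memory-size is the better, but keep your hardware configuration in mind.

The right value for cache-snapshot-memory-size depends on the number of tags being written concurrently. If the number of tags is large, be sure to increase this value. If there are few tags and only a small number of them are written at high frequency, a somewhat lower value will not affect performance much.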

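Timeout Settings

This problem mainly exists in older versions of InfluxDB (below 1.5) and carries some risk. InfluxDB's design for controlling memory usage is fairly crude. We recommend not running ad hoc SQL on a production instance: once a query uses too much memory, InfluxDB is easily OOM-killed. With the current inverted index design, restarting after an OOM kill is also slow, because startup has to traverse the TSM files again to rebuild the in-memory inverted index, and all of the data is encoded. With a large data volume, startup is very slow. We therefore recommend enabling the following read limits, since requests that large do not occur under normal circumstances. If you do have such special queries, adjust the limits as appropriate.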

[coordinator]
  # The maximum time a query is allowed to execute before being killed by the system.  This limit
  # can help prevent run away queries.  Setting the value to 0 disables the limit.
  query-timeout = "120s"
  
  # The maximum number of points a SELECT can process.  A value of 0 will make
  # the maximum point count unlimited.  This will only be checked every 10 seconds so queries will not
  # be aborted immediately when hitting the limit.
  max-select-point = 1000000
  
  # The maximum number of series a SELECT can run.  A value of 0 will make the maximum series
  # count unlimited.
  max-select-series = 1000000
  
[http]
  # The default chunk size for result sets that should be chunked.
  max-row-limit = 1000000

InfluxDB's memory usage is mainly driven by the following:

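When the result set is large, the data is cached in memory and returned to the client all at once after the computation finishes. InfluxDB's internal operators exchange data in a streaming, pipelined fashion, but results are only streamed back to the client if the client passes the "chunked" parameter in the request. We therefore recommend always enabling max-select-point, query-timeout, and max-row-limit to guard against accidental queries. Versions after 1.6 reportedly have a feature for killing running queries, so that such queries can be stopped promptly. I have not investigated it, so it is not covered here.
If a query has no filter condition on tags, each query copies the full set of series in memory, so the filter conditions of a query need to be clearly specified.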
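A client-side sketch of the "chunked" option mentioned above, using only the Python standard library. It assumes the InfluxDB 1.x HTTP `/query` endpoint with its `chunked` and `chunk_size` parameters, and assumes each chunk arrives as one JSON object per line. The host, database name, and query are placeholders, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, db, query, chunk_size=10000):
    """Build a /query URL that asks the server to stream results in chunks."""
    params = {"db": db, "q": query,
              "chunked": "true", "chunk_size": str(chunk_size)}
    return f"{base_url}/query?{urlencode(params)}"


def iter_chunks(lines):
    """Parse an iterable of raw lines, yielding one decoded JSON object per chunk."""
    for raw in lines:
        line = raw.strip()
        if line:  # skip blank separator lines
            yield json.loads(line)


def stream_query(base_url, db, query, chunk_size=10000):
    """Yield result chunks as they arrive instead of buffering the whole result."""
    with urlopen(build_query_url(base_url, db, query, chunk_size)) as resp:
        yield from iter_chunks(resp)


if __name__ == "__main__":
    # Placeholder endpoint and query; a tag filter keeps the series set small.
    url = build_query_url("http://localhost:8086", "mydb",
                          "SELECT * FROM cpu WHERE host = 'web-01'", 5000)
    print(url)
```

Streaming on the client side complements the server-side limits rather than replacing them: a query with no tag filter still has to load its series set on the server.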

InfluxDB OOM problems do not occur in normal time series usage, but it is easy to fall into this pit by accident.

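Data Level

max-series-per-database can be set to 0. As the comment says, this parameter controls the maximum number of series per database.

max-values-per-tag can be set to 0. As the comment says, this parameter controls the number of tag values allowed per tag.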

[data]
  # The maximum series allowed per database before writes are dropped.  This limit can prevent
  # high cardinality issues at the database level.  This limit can be disabled by setting it to
  # 0.
  max-series-per-database = 0
  
  # The maximum number of tag values per tag that are allowed before writes are dropped.  This limit
  # can prevent high cardinality tag values from being written to a measurement.  This limit can be
  # disabled by setting it to 0.
  max-values-per-tag = 0

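The reason these two parameters need to be limited lies in the implementation of the inverted index in InfluxDB's internal design: if the user's data schema is poorly designed, memory usage becomes excessive. For users, we therefore recommend not setting these two parameters to 0, and instead estimating the number of series your usage will produce.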
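One rough way to estimate a series count before choosing these limits is to multiply the number of distinct values of each tag within a measurement and sum over measurements. This gives an upper bound, since only tag combinations that are actually written create series. The schema and the function name `estimate_series` below are made up for illustration.

```python
from math import prod


def estimate_series(schema: dict) -> int:
    """Upper bound on series count.

    schema maps measurement name -> {tag name: number of distinct tag values}.
    A measurement without tags contributes exactly one series.
    """
    return sum(prod(tag_counts.values()) for tag_counts in schema.values())


if __name__ == "__main__":
    # Hypothetical schema: 200 hosts, 16 cores per host, 4 disks per host.
    schema = {
        "cpu": {"host": 200, "core": 16},
        "disk": {"host": 200, "device": 4},
    }
    total = estimate_series(schema)
    largest_tag = max(n for tags in schema.values() for n in tags.values())
    print(f"estimated series (upper bound): {total}")
    print(f"largest number of values for one tag: {largest_tag}")
```

The first number can be compared against max-series-per-database and the second against max-values-per-tag, leaving headroom for growth instead of disabling the limits.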

Security

For production use, reporting-disabled should be set. This improves the security of the whole system and also helps with risk control.

# Once every 24 hours InfluxDB will report usage data to usage.influxdata.com
# The data includes a random ID, os, arch, version, the number of series and other
# usage data. No data from user databases is ever transmitted.
# Change this option to true to disable reporting.
reporting-disabled = true
