Zabbix housekeeper processes more than 75% busy

Reposted article, 2020-08-27 11:36. Category: Zabbix monitoring.

Root cause analysis

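To keep the database from growing without bound, Zabbix has a built-in mechanism for deleting old history data: the housekeeper. When the housekeeper is purging a lot of history, MySQL performance can degrade, the housekeeper processes stay busy for longer, and this alert fires.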

Zabbix normally monitors the Zabbix server itself, so we can look at the graph "Zabbix server: Utilization of housekeeper internal processes, in %". The graph shows that the housekeeper processes suddenly became busy at 10:32.

[Screenshot: the same graph over a longer time range]

To dig further, check the MySQL slow query log:

 
          mysql> show variables like '%slow_query%'; 
          +---------------------+-------------------------------------+ 
          | Variable_name       | Value                               | 
          +---------------------+-------------------------------------+ 
          | slow_query_log      | ON                                  | 
          | slow_query_log_file | /mysql_data/mysql/xxxx-slow.log | 
          +---------------------+-------------------------------------+ 
          2 rows in set (0.01 sec) 
         

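Note: unless configured otherwise, timestamps in the slow query log are written in UTC by default (the MySQL log_timestamps setting), so they are 8 hours behind UTC+8 local time.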

 
          # Time: 2020-08-26T02:34:56.354162Z 
          # User@Host: zabbix[zabbix] @ localhost []  Id: 345463 
          # Query_time: 13.832335  Lock_time: 0.000088 Rows_sent: 0  Rows_examined: 5000 
          SET timestamp=1598409282; 
          delete from history where itemid=37078 limit 5000; 
          # Time: 2020-08-26T02:35:00.377783Z 
          # User@Host: zabbix[zabbix] @ localhost []  Id: 345463 
          # Query_time: 4.023518  Lock_time: 0.000126 Rows_sent: 0  Rows_examined: 5000 
          SET timestamp=1598409296; 
          delete from history where itemid=37079 limit 5000; 
          # Time: 2020-08-26T02:35:36.848120Z 
          # User@Host: zabbix[zabbix] @ localhost []  Id: 345463 
          # Query_time: 21.513432  Lock_time: 0.000094 Rows_sent: 0  Rows_examined: 5000 
          SET timestamp=1598409315; 
          delete from history where itemid=37099 limit 5000; 
          # Time: 2020-08-26T02:35:46.705206Z 
          # User@Host: zabbix[zabbix] @ localhost []  Id: 345463 
          # Query_time: 9.856468  Lock_time: 0.000124 Rows_sent: 0  Rows_examined: 5000 
          SET timestamp=1598409336; 
          delete from history where itemid=37100 limit 5000; 
          # Time: 2020-08-26T02:36:43.856421Z 
          # User@Host: zabbix[zabbix] @ localhost []  Id: 345463 
          # Query_time: 38.186585  Lock_time: 0.000039 Rows_sent: 0  Rows_examined: 5000 
          SET timestamp=1598409365; 
          delete from history where itemid=38789 limit 5000; 
          # Time: 2020-08-26T02:36:59.432174Z 
          # User@Host: zabbix[zabbix] @ localhost [127.0.0.1]  Id: 345563 
          # Query_time: 8.542213  Lock_time: 0.000084 Rows_sent: 20  Rows_examined: 7298 
          SET timestamp=1598409410; 
          SELECT DISTINCT e.eventid,e.clock,e.ns,e.objectid,e.acknowledged,er1.r_eventid FROM events e LEFT JOIN event_recovery er1 ON er1.eventid=e.eventid WHERE e.sou 
          rce='0' AND e.object='0' AND e.objectid=26811 AND e.eventid<='3437835' AND e.value=1 ORDER BY e.eventid DESC LIMIT 20; 
          # Time: 2020-08-26T02:37:02.317422Z 
          # User@Host: zabbix[zabbix] @ localhost []  Id: 345463 
          # Query_time: 18.460853  Lock_time: 0.000101 Rows_sent: 0  Rows_examined: 5000 
          SET timestamp=1598409403; 
          delete from history where itemid=38790 limit 5000; 
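
To line the slow log up with the graph, it helps to convert the UTC timestamps to local time and list each statement with its Query_time. The following is a minimal sketch, assuming the log format shown above and a local zone of UTC+8; the function name parse_slow_log and the LOCAL_TZ constant are illustrative, not part of MySQL or Zabbix.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC+8, as in this article.
LOCAL_TZ = timezone(timedelta(hours=8))

TIME_RE = re.compile(r"^# Time: (\S+)Z")
QUERY_TIME_RE = re.compile(r"^# Query_time: ([\d.]+)")


def parse_slow_log(text):
    """Return a list of (local_time, query_seconds, statement) tuples.

    Only the first line of each statement is kept; wrapped
    continuation lines are ignored.
    """
    entries = []
    when = seconds = None
    for raw in text.splitlines():
        line = raw.strip()
        m = TIME_RE.match(line)
        if m:
            utc = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            when = utc.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ)
            continue
        m = QUERY_TIME_RE.match(line)
        if m:
            seconds = float(m.group(1))
            continue
        # Skip blank lines, other header comments and the SET timestamp line.
        if not line or line.startswith("#") or line.startswith("SET timestamp"):
            continue
        if when is not None and seconds is not None:
            entries.append((when, seconds, line))
            when = seconds = None
    return entries


sample = """\
# Time: 2020-08-26T02:34:56.354162Z
# User@Host: zabbix[zabbix] @ localhost []  Id: 345463
# Query_time: 13.832335  Lock_time: 0.000088 Rows_sent: 0  Rows_examined: 5000
SET timestamp=1598409282;
delete from history where itemid=37078 limit 5000;
# Time: 2020-08-26T02:35:00.377783Z
# User@Host: zabbix[zabbix] @ localhost []  Id: 345463
# Query_time: 4.023518  Lock_time: 0.000126 Rows_sent: 0  Rows_examined: 5000
SET timestamp=1598409296;
delete from history where itemid=37079 limit 5000;
"""

for when, seconds, statement in parse_slow_log(sample):
    print(when.strftime("%Y-%m-%d %H:%M:%S %z"), f"{seconds:8.3f}s", statement)
```

The first entry, logged at 02:34:56 UTC, becomes 10:34:56 in UTC+8, which matches the spike that starts around 10:32 in the graph.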
         

Zabbix Server also writes slow SQL statements to zabbix_server.log, which you can find like this:

# grep "slow query" zabbix_server.log

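The logs show that MySQL was deleting rows from history and history_uint. In this case the housekeeper suddenly became busy because I had deleted the template "Zabbix template for Microsoft SQL Server" with the Clear option checked, so Zabbix Server had a large amount of data to delete. That was only the trigger, though. The more important cause is that the history table itself had grown very large. The query below reports the size of each table.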

 
          SELECT TABLE_SCHEMA 
              ,  TABLE_NAME 
              , (DATA_LENGTH/1024/1024)     AS DATA_SIZE_MB  
              , (INDEX_LENGTH/1024/1024)  AS INDEX_SIZE_MB 
              , ((DATA_LENGTH+INDEX_LENGTH)/1024/1024) AS TABLE_SIZE_MB 
              , TABLE_ROWS  
          FROM INFORMATION_SCHEMA.TABLES  
          WHERE table_schema = 'zabbix' 
          ORDER BY TABLE_SIZE_MB ASC; 
         

Solution

A brief occurrence of this alert can usually be ignored. If it keeps firing, adjust the HousekeepingFrequency and MaxHousekeeperDelete parameters.

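In Zabbix 5.x, HousekeepingFrequency defaults to 1, meaning housekeeping runs once an hour. MaxHousekeeperDelete defaults to 5000, meaning at most 5000 rows are deleted per task in one housekeeping cycle, as shown below: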

 
          # grep -C 1 HousekeepingFrequency /etc/zabbix/zabbix_server.conf 
            
          ### Option: HousekeepingFrequency 
          #       How often Zabbix will perform housekeeping procedure (in hours). 
          #       Housekeeping is removing outdated information from the database. 
          #       To prevent Housekeeper from being overloaded, no more than 4 times HousekeepingFrequency 
          #       hours of outdated information are deleted in one housekeeping cycle, for each item. 
          #       To lower load on server startup housekeeping is postponed for 30 minutes after server start. 
          #       With HousekeepingFrequency=0 the housekeeper can be only executed using the runtime control option. 
          #       In this case the period of outdated information deleted in one housekeeping cycle is 4 times the 
          -- 
          # Default: 
          # HousekeepingFrequency=1 
            
            
          # grep -C 1 MaxHousekeeperDelete  /etc/zabbix/zabbix_server.conf 
            
          ### Option: MaxHousekeeperDelete 
          #       The table "housekeeper" contains "tasks" for housekeeping procedure in the format: 
          #       [housekeeperid], [tablename], [field], [value]. 
          #       No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value]) 
          #       will be deleted per one task in one housekeeping cycle. 
          -- 
          # Default: 
          # MaxHousekeeperDelete=5000 
         

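About HousekeepingFrequency:

How often Zabbix performs the housekeeping procedure (in hours).

Housekeeping removes outdated information from the database.

Note: to prevent the housekeeper from being overloaded (for example, when the history and trend storage periods are greatly reduced), no more than 4 times HousekeepingFrequency hours of outdated information are deleted in one housekeeping cycle, for each item. So if HousekeepingFrequency is 1 hour, no more than 4 hours of outdated information (starting from the oldest entry) is deleted per cycle.

Remark: to lower the load on server startup, housekeeping is postponed for 30 minutes after the server starts. So if HousekeepingFrequency is 1 hour, the first housekeeping run happens 30 minutes after server start and then repeats every hour. This postponing behavior has been in place since Zabbix 2.4.0.

Since Zabbix 3.0.0, automatic housekeeping can be disabled by setting HousekeepingFrequency to 0. Housekeeping can then only be started with the housekeeper_execute runtime control option, and the period of outdated information deleted in one cycle is 4 times the period since the last housekeeping cycle, but not less than 4 hours and not more than 4 days.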
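
The per-cycle limit described above can be written down as a small function. This is only a sketch of the rule as stated in the configuration comments, not Zabbix's actual implementation; the function name and its parameters are made up for illustration.

```python
def max_deleted_period_hours(housekeeping_frequency, hours_since_last_run=None):
    """Upper bound, per item, on the hours of outdated data removed in one cycle.

    housekeeping_frequency: the HousekeepingFrequency value in hours.
    hours_since_last_run:   only needed when HousekeepingFrequency is 0
                            (housekeeping started manually).
    """
    if housekeeping_frequency > 0:
        # Automatic mode: at most 4 times HousekeepingFrequency hours.
        return 4 * housekeeping_frequency
    if hours_since_last_run is None:
        raise ValueError("hours_since_last_run is required when the frequency is 0")
    # Manual mode: 4 times the period since the last run,
    # clamped to the range from 4 hours to 4 days.
    return min(max(4 * hours_since_last_run, 4), 4 * 24)


print(max_deleted_period_hours(1))       # default hourly housekeeping
print(max_deleted_period_hours(6))       # housekeeping every 6 hours
print(max_deleted_period_hours(0, 10))   # manual run, 10 hours after the last one
```

With the default of 1 hour the bound is 4 hours of outdated data per item per cycle, and with a 6 hour interval it is 24 hours.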

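About MaxHousekeeperDelete:

This parameter is supported since Zabbix 1.8.2. Setting MaxHousekeeperDelete to 0 means there is no limit on the number of rows deleted, which is not recommended. The parameter applies only to deleting the history and trends of items that have already been deleted.

The usual fix is to lengthen the interval between housekeeper runs and raise the number of rows deleted per run. There is no universal value that fits every installation; find a suitable one by testing against your own environment.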

 
          # Run housekeeping every 6 hours
          HousekeepingFrequency=6
          # Delete at most 10000 rows per task in one cycle
          MaxHousekeeperDelete=10000
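
Before changing both values it can be useful to estimate how long a backlog takes to drain, because each task removes at most MaxHousekeeperDelete rows per cycle. The sketch below models only that row cap, not database load, and the backlog of 1,000,000 rows is a hypothetical figure chosen for illustration.

```python
import math


def drain_estimate(rows, max_housekeeper_delete, housekeeping_frequency_hours):
    """Return (cycles, hours) needed to remove `rows` rows for one housekeeper task."""
    if max_housekeeper_delete <= 0 or housekeeping_frequency_hours <= 0:
        raise ValueError("this sketch only covers positive, automatic settings")
    cycles = math.ceil(rows / max_housekeeper_delete)
    return cycles, cycles * housekeeping_frequency_hours


# Hypothetical backlog: one deleted item with 1,000,000 history rows.
backlog_rows = 1_000_000
for max_delete, frequency in [(5000, 1), (10000, 6)]:
    cycles, hours = drain_estimate(backlog_rows, max_delete, frequency)
    print(f"MaxHousekeeperDelete={max_delete} HousekeepingFrequency={frequency}: "
          f"{cycles} cycles, about {hours} hours")
```

Under these assumptions the defaults need 200 cycles (about 200 hours), while 10000 rows every 6 hours needs 100 cycles (about 600 hours). A longer interval lowers the load but also slows down how quickly the backlog is cleared.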

In this case, after raising MaxHousekeeperDelete to 100000, the delete statements actually became much slower, as shown below:

836378:20200826:161213.441 slow query: 773.254950 sec, "delete from history where itemid=45251 limit 100000"

836378:20200826:162435.978 slow query: 742.537260 sec, "delete from history where itemid=46694 limit 100000"

836378:20200826:163329.011 slow query: 532.932137 sec, "delete from history where itemid=51313 limit 100000"

836378:20200826:163842.539 slow query: 313.528311 sec, "delete from history where itemid=52664 limit 100000"

With MaxHousekeeperDelete set to 10000, the deletes were considerably faster, so it is worth testing a few values to verify.

943980:20200826:233157.246 slow query: 5.393617 sec, "delete from history where itemid=37769 limit 10000"

943980:20200826:233202.914 slow query: 5.667551 sec, "delete from history where itemid=38407 limit 10000"

943980:20200826:233208.044 slow query: 5.129767 sec, "delete from history where itemid=41283 limit 10000"

943980:20200826:233217.462 slow query: 7.011403 sec, "delete from history where itemid=37770 limit 10000"

943980:20200826:233222.516 slow query: 5.053935 sec, "delete from history where itemid=38408 limit 10000"

943980:20200826:233227.286 slow query: 4.769753 sec, "delete from history where itemid=41284 limit 10000"
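
The two sets of log lines can be compared on a per-row basis. The following sketch parses the "slow query" lines quoted above and averages them by LIMIT value. It assumes every statement actually deleted the full LIMIT number of rows, which the log does not confirm, and the two runs happened at different times of day, so treat the result as a rough indication only.

```python
import re
from statistics import mean

SLOW_RE = re.compile(r'slow query: ([\d.]+) sec, "(.*) limit (\d+)"')

log_lines = """\
836378:20200826:161213.441 slow query: 773.254950 sec, "delete from history where itemid=45251 limit 100000"
836378:20200826:162435.978 slow query: 742.537260 sec, "delete from history where itemid=46694 limit 100000"
836378:20200826:163329.011 slow query: 532.932137 sec, "delete from history where itemid=51313 limit 100000"
836378:20200826:163842.539 slow query: 313.528311 sec, "delete from history where itemid=52664 limit 100000"
943980:20200826:233157.246 slow query: 5.393617 sec, "delete from history where itemid=37769 limit 10000"
943980:20200826:233202.914 slow query: 5.667551 sec, "delete from history where itemid=38407 limit 10000"
943980:20200826:233208.044 slow query: 5.129767 sec, "delete from history where itemid=41283 limit 10000"
943980:20200826:233217.462 slow query: 7.011403 sec, "delete from history where itemid=37770 limit 10000"
943980:20200826:233222.516 slow query: 5.053935 sec, "delete from history where itemid=38408 limit 10000"
943980:20200826:233227.286 slow query: 4.769753 sec, "delete from history where itemid=41284 limit 10000"
""".splitlines()


def summarize(lines):
    """Map each LIMIT value to (mean seconds per statement, mean ms per row)."""
    durations = {}
    for line in lines:
        m = SLOW_RE.search(line)
        if m:
            durations.setdefault(int(m.group(3)), []).append(float(m.group(1)))
    return {
        limit: (mean(values), mean(values) / limit * 1000)
        for limit, values in durations.items()
    }


stats = summarize(log_lines)
for limit in sorted(stats):
    seconds, ms_per_row = stats[limit]
    print(f"limit {limit}: {seconds:.1f} s per statement, {ms_per_row:.2f} ms per row")
```

Under that assumption, the 100000-row batches cost about 5.9 ms per row against about 0.55 ms per row for the 10000-row batches, so the larger batch was roughly ten times more expensive per row in this environment.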

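Other approaches, such as shortening the retention period for history data or partitioning large tables like history, can also prevent this alert or make it less likely. In my experience, once a table like history has grown very large, tuning the parameters above does not help much. Partitioning the table or cleaning up old history manually is what makes a noticeable difference.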
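
For manual cleanup, one common pattern is to delete old rows in small batches by their clock value (the Unix timestamp column in the Zabbix history tables) and repeat until no rows are affected. The helper below only builds such a statement. It is a sketch under assumptions: the function name, the 30-day retention and the batch size are illustrative, and the statement should be tested on a copy of the database and run during a quiet period.

```python
from datetime import datetime, timedelta, timezone

# Tables this sketch is willing to target (standard Zabbix history/trend tables).
ALLOWED_TABLES = {
    "history", "history_uint", "history_str", "history_text", "history_log",
    "trends", "trends_uint",
}


def cleanup_statement(table, keep_days, batch_rows, now=None):
    """Build one batched DELETE that removes rows older than keep_days."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    if keep_days <= 0 or batch_rows <= 0:
        raise ValueError("keep_days and batch_rows must be positive")
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - timedelta(days=keep_days)).timestamp())
    return f"DELETE FROM {table} WHERE clock < {cutoff} LIMIT {batch_rows};"


# Example: keep 30 days of history, delete 10000 rows per statement.
print(cleanup_statement("history", keep_days=30, batch_rows=10000))
```

Keeping each batch small follows the observation above that 10000-row deletes were much cheaper per row than 100000-row deletes.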
