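An Introduction to Zabbix template for Microsoft SQL Server
This post introduces a very handy template for monitoring Microsoft SQL Server databases with Zabbix. The template is called "Zabbix template for Microsoft SQL Server" and can be downloaded from:
Zabbix share: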
https://share.zabbix.com/databases/microsoft-sql-server/template-for-microsoft-sql-server
GitHub:
https://github.com/MantasTumenas/Zabbix-template-for-Microsoft-SQL-Server
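All experiments and tests below were done on Zabbix 5.x; other Zabbix versions have not been tested. I also recommend using the template in the Microsoft SQL Server directory of the GitHub repository, which in my experience runs into noticeably fewer problems. The template from Zabbix share has enough problems to wear you down unless you are able to fix them yourself.
Extract the template archive from GitHub (Zabbix-template-for-Microsoft-SQL-Server-master.zip) and you will find three directories (the Zabbix share archive has only two), named as follows:
Microsoft SQL Server #branch version; this is the template deployed here.
Without SQL instance discovery #for monitoring a single SQL Server instance
With SQL instance discovery #for monitoring multiple SQL Server instances
Directories in the Zabbix share archive (Zabbix Template for Microsoft SQL Server.zip):
Without SQL instance discovery #for monitoring a single SQL Server instance
With SQL instance discovery #for monitoring multiple SQL Server instances
The Microsoft SQL Server directory contains the following subdirectories:
Documentation #documentation for Zabbix template for Microsoft SQL Server; easily the most detailed I have seen for any Zabbix template
Scripts #PowerShell monitoring scripts
Template #the template files
User parameters #contains the file userparams.conf, which defines some sample user parameters
Zabbix Value Mapping #contains two files, SQL Agent Job status.xml and SQL Database status.xml, which define value mappings.
The template provides the following features: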
Features
• MS SQL performance counters.
• MS SQL instance Low Level Discovery.
• MS SQL database Low Level Discovery.
• MS SQL agent job Low Level Discovery.
• MS SQL database backup monitoring.
• MS SQL database mirroring monitoring.
• MS SQL Always On monitoring.
• MS SQL Log Shipping monitoring.
Supported versions are described below:
Supported versions
Tested on Microsoft SQL Server 2012, 2014 and 2016. It may work with earlier versions, but some items (with missing performance counters) may be unsupported. For the extensive overview on the performance counters difference between MS SQL 2008 and MS SQL 2012 you can read here (https://blog.dbi-services.com/sql-server-2012-new-perfmon-counters/).
Tested on Zabbix 3.4.0. It may work with earlier versions, but some items (for example service.info[service,<param>]) may be unsupported. The template was started on Zabbix 2.4.0 but after each new Zabbix version, objects were modified or new things were added.
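Note: the test environment here is Zabbix 5.x, so the template also works with Zabbix 5.x.
Deployment process
Deploying the Without SQL instance discovery template
Deployment steps from the official documentation: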
1. Import templates via Configuration >> Templates:
• “Template Microsoft SQL Server DE Tier 3.xml”
• “Template Microsoft SQL Server DE Tier 2.xml”
• “Template Microsoft SQL Server DE Tier 1.xml”
• “Template Microsoft SQL Server SA Tier 3.xml”
2. Import value mappings via Administration >> General >> Value mapping:
• “SQL Agent Job status.xml”
• “SQL Database status.xml”
3. Copy catalog MSSQL with PowerShell scripts (*.ps1) to a location a Zabbix Agent can access (by default “C:\...\Zabbix\bin\”).
4. Copy 3 *.conf files from catalog “User parameters” to a location a Zabbix Agent can access (by default “C:\...\Zabbix\”).
5. Update “zabbix_agentd.win.conf”:
• add line “Include= C:\Program Files\Zabbix\mssql.agent.userparams.conf”.
• add line “Include= C:\Program Files\Zabbix\mssql.backup.userparams.conf”.
• add line “Include= C:\Program Files\Zabbix\mssql.basic.userparams.conf”.
6. Grant rights for Zabbix Agent service account. It needs read rights on tables:
• msdb.dbo.sysjobhistory
• msdb.dbo.sysjobs
• master.sys.databases
• msdb.dbo.backupset
• msdb.dbo.log_shipping_monitor_secondary.
7. By default, Zabbix Agent service account is NT AUTHORITY\SYSTEM which is already in SQL Server. If you need to monitor mirrored databases or databases in Always On, you will have to give Zabbix Agent’s service account (NT AUTHORITY\SYSTEM by default) sysadmin rights. More about it here.
8. Restart Zabbix Agent.
9. Depending on your SQL server edition and monitoring requirements select and add templates to a host.
10. Modify macros in templates according to your needs. Default values are below:
Macro | Macro meaning | Value | Meaning | Trigger severity
{$SYSDBFTIME1} | Sys db full backup time value 1 | 25 | 25 hours | Information
{$SYSDBFTIME2} | Sys db full backup time value 2 | 50 | 50 hours | Low
{$SYSDBFTIME3} | Sys db full backup time value 3 | 75 | 75 hours | Medium
{$UDBDTIME1} | User db diff backup time value 1 | 48 | 2 days | Information
{$UDBDTIME2} | User db diff backup time value 2 | 72 | 3 days | Low
{$UDBDTIME3} | User db diff backup time value 3 | 96 | 4 days | Medium
{$UDBFTIME1} | User db full backup time value 1 | 168 | 7 days | Information
{$UDBFTIME2} | User db full backup time value 2 | 192 | 8 days | Low
{$UDBFTIME3} | User db full backup time value 3 | 216 | 9 days | Medium
{$UDBLTIME1} | User db log backup time value 1 | 30 | 30 minutes | Information
{$UDBLTIME2} | User db log backup time value 2 | 60 | 60 minutes | Low
{$UDBLTIME3} | User db log backup time value 3 | 90 | 90 minutes | Medium
{$EVENTLOGTIME} | Event log recovery time value | 28h | 28 hours | Medium
{$DAYS} | Maintenance job time value | 7 | 7 days | None
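To make the three-level thresholds in the macro table more concrete, here is a minimal Python sketch of how the age of a backup could be mapped to a severity. The threshold values are the defaults for the user-database full-backup macros; the function name classify_backup_age and the "age >= threshold" comparison are my own assumptions for illustration, not the template's actual trigger expressions.

```python
# Default thresholds (in hours) for user-database full backups,
# taken from the {$UDBFTIME1..3} macro defaults.
USER_DB_FULL_BACKUP_THRESHOLDS = [
    (168, "Information"),  # {$UDBFTIME1}
    (192, "Low"),          # {$UDBFTIME2}
    (216, "Medium"),       # {$UDBFTIME3}
]


def classify_backup_age(age_hours, thresholds=USER_DB_FULL_BACKUP_THRESHOLDS):
    """Return the highest severity whose threshold has been reached, or None.

    Assumption: a severity applies once the backup age is greater than or
    equal to its threshold; the real triggers may compare differently.
    """
    for limit, severity in sorted(thresholds, reverse=True):
        if age_hours >= limit:
            return severity
    return None


if __name__ == "__main__":
    for age in (100, 170, 200, 300):
        print(age, classify_backup_age(age))
```

Raising a macro value on a host simply shifts the point at which the corresponding severity starts to apply.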
11. “Template Microsoft SQL Server SA Tier 3.xml” lets you discover SQL agent jobs. Discovery rules consist of:
• “SQL Server Agent Discovery” – discover SQL Agent service.
• “SQL Server Agent Jobs P1 Discovery” – discover SQL Agent jobs.
• “SQL Server Agent Jobs P2 Discovery” – discover SQL Agent jobs.
• “SQL Server Agent Jobs P3 Discovery” – discover SQL Agent jobs.
12. Difference between “SQL Server Agent Jobs P1 / P2 / P3 Discovery” are triggers. They can be configured differently. For example:
• “SQL Server Agent Jobs P1 Discovery” – alerts after trigger failed. Good for monitoring jobs, which need immediate attention. Like failed job “CHECKDB”.
• “SQL Server Agent Jobs P2 Discovery” – alerts after trigger failed two times. Good for monitoring jobs, which need attention, but not immediate. For example, job “DB LOG BACKUP” failed 1st time, but it will run again in 30 minutes. If 2nd time it fails again, then alert is raised.
• “SQL Server Agent Jobs P3 Discovery” – alerts after trigger failed but with additional conditions. Good for monitoring jobs, which do not need immediate attention. Like failed job “IndexOptimize”. Alert will be raised only during Monday – Friday, during 08:00 – 16:00. If you want to change day and hour parameters, you can do it directly in triggers.
• In mssql.agent.userparams.conf I placed 2 additional user parameters. In case you need to create your own custom items for monitoring P(riority)4 and P(priority)5 jobs.
13. Every discovery rule “SQL Server Agent Jobs P1 / P2 / P3 Discovery” has its own filter, where you can enter the name of the job you want to associate with that rule.
If you leave a filter empty, all agent jobs will be discovered. To avoid that, I entered a simple place holder for every rule – ENTER_JOB_NAME.
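The difference between the P1 / P2 / P3 discovery rules described in step 12 can be sketched in a few lines of Python. This is only an illustration of the alerting logic as the documentation describes it; the function should_alert, the list-of-booleans job history and the exact handling of the 16:00 boundary are assumptions of mine, and the real behaviour lives in the template's trigger expressions.

```python
from datetime import datetime


def should_alert(priority, recent_results, now):
    """Decide whether a failed SQL Agent job should raise an alert.

    recent_results: job outcomes, oldest first; True means the run failed.
    now: the datetime at which the decision is made.
    """
    if not recent_results or not recent_results[-1]:
        return False  # latest run succeeded (or no history yet)
    if priority == "P1":
        return True  # alert on the first failure
    if priority == "P2":
        # alert only when the last two runs both failed
        return len(recent_results) >= 2 and bool(recent_results[-2])
    if priority == "P3":
        # alert only Monday to Friday, 08:00 to 16:00
        # (assumption: 16:00 itself is treated as outside the window)
        return now.weekday() < 5 and 8 <= now.hour < 16
    raise ValueError(f"unknown priority: {priority}")


if __name__ == "__main__":
    wednesday_morning = datetime(2020, 8, 26, 10, 0)
    print(should_alert("P2", [True], wednesday_morning))
    print(should_alert("P2", [True, True], wednesday_morning))
```

A job such as "DB LOG BACKUP" would therefore stay quiet under P2 after a single failure and only alert if the next run fails as well.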
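Below is a brief description of the same steps, based on my own hands-on experience:
1: Under "Configuration" -> "Templates", import the following four templates: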
• “Template Microsoft SQL Server DE Tier 3.xml”
• “Template Microsoft SQL Server DE Tier 2.xml”
• “Template Microsoft SQL Server DE Tier 1.xml”
• “Template Microsoft SQL Server SA Tier 3.xml”
Note that the package downloaded from Zabbix share contains only the following two templates:
“Template SQL Server Instance 0 DE.xml”
“Template SQL Server Instance 0 SA.xml”
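Also, by default these templates are placed in the Templates group. I prefer to assign them to the Templates/Databases group, which makes them easier to use and manage later. Step 1 only needs to be done once, since it applies to the Zabbix Server.
2: Under "Administration" -> "General" -> "Value mapping", import the value mappings: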
“SQL Agent Job status.xml”
“SQL Database status.xml”
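Note: step 2 also only needs to be done once.
3: Copy the MSSQL directory (which contains the PowerShell scripts) from the Scripts directory to a path the Zabbix Agent can access (by default “C:\...\Zabbix\bin\”). Here I copied it to C:\zabbix\bin\win64. You can of course adjust this to your own environment, or follow the official documentation.
4: Copy the 3 configuration files from the User parameters directory to a path the Zabbix Agent can access (by default “C:\...\Zabbix\”). Here I copied them to C:\zabbix\conf.
Because in step 3 I placed the PowerShell scripts in C:\zabbix\bin\win64\MSSQL, many of the script paths in the three parameter files (mssql.agent.userparams.conf, mssql.backup.userparams.conf, mssql.basic.userparams.conf) have to be changed. Adjust them to your actual layout, as in the following example:
Example (before the change)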
# User parameter to get agent name. Tier 3 template.
UserParameter=tier3.agent.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\zabbix\bin\MSSQL\DiscoveryDatabaseAgent\Discovery.mssql.instanceagentname.ps1"
# User parameter to get job name. Priority 5. Tier 3 template.
UserParameter=tier3.jobsp5.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\zabbix\bin\MSSQL\DiscoveryDatabaseAgent\Discovery.mssql.jobname.ps1"
Example (after the change)
# User parameter to get agent name. Tier 3 template.
UserParameter=tier3.agent.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\zabbix\bin\win64\MSSQL\DiscoveryDatabaseAgent\Discovery.mssql.instanceagentname.ps1"
# User parameter to get job name. Priority 5. Tier 3 template.
UserParameter=tier3.jobsp5.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\zabbix\bin\win64\MSSQL\DiscoveryDatabaseAgent\Discovery.mssql.jobname.ps1"
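Editing every UserParameter line by hand is tedious, so here is a small Python sketch that performs the same path substitution as in the before/after example. The two prefixes are the ones used in this post; the helper names rewrite_script_paths and rewrite_conf_file are hypothetical, and I am assuming the conf files are UTF-8 text and use exactly the default prefix (the match is case-sensitive).

```python
from pathlib import Path

# Default script directory used in the shipped conf files, and my custom one.
OLD_PREFIX = r"C:\Program Files\zabbix\bin\MSSQL"
NEW_PREFIX = r"C:\zabbix\bin\win64\MSSQL"


def rewrite_script_paths(text, old=OLD_PREFIX, new=NEW_PREFIX):
    """Replace the default script directory with the custom one in conf text."""
    return text.replace(old, new)


def rewrite_conf_file(path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Rewrite one *.userparams.conf file in place; return True if it changed."""
    conf = Path(path)
    original = conf.read_text(encoding="utf-8")
    updated = rewrite_script_paths(original, old, new)
    if updated != original:
        conf.write_text(updated, encoding="utf-8")
    return updated != original
```

For example, rewrite_conf_file(r"C:\zabbix\conf\mssql.agent.userparams.conf") would update that file; it is worth keeping a backup copy of each conf file before running it.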
5: Update the configuration in zabbix_agentd.conf
• add line “Include= C:\Program Files\Zabbix\mssql.agent.userparams.conf”.
• add line “Include= C:\Program Files\Zabbix\mssql.backup.userparams.conf”.
• add line “Include= C:\Program Files\Zabbix\mssql.basic.userparams.conf”.
My own settings are shown below; adjust them to your actual environment.
Include=C:\zabbix\conf\mssql.agent.userparams.conf
Include=C:\zabbix\conf\mssql.backup.userparams.conf
Include=C:\zabbix\conf\mssql.basic.userparams.conf
6: Grant permissions to the Zabbix Agent service account. It needs read (SELECT) access to the following tables:
• msdb.dbo.sysjobhistory
• msdb.dbo.sysjobs
• master.sys.databases
• msdb.dbo.backupset
• msdb.dbo.log_shipping_monitor_secondary.
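7: By default, the Zabbix Agent service account is NT AUTHORITY\SYSTEM, which already exists as an account in SQL Server. If you need to monitor mirrored databases or databases in Always On, you have to grant the Zabbix Agent service account the sysadmin role. See the related documentation for details.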
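For the read permissions in step 6, the following Python sketch generates a T-SQL script with one GRANT SELECT statement per table. It is an illustration only: the helper build_grant_script is hypothetical, and I am assuming the principal already exists as a database user with that name in both msdb and master. Review the generated script before running it against a real instance.

```python
# Tables listed in step 6, grouped by the database that holds them.
REQUIRED_TABLES = {
    "msdb": [
        "dbo.sysjobhistory",
        "dbo.sysjobs",
        "dbo.backupset",
        "dbo.log_shipping_monitor_secondary",
    ],
    "master": ["sys.databases"],
}


def build_grant_script(principal, tables=REQUIRED_TABLES):
    """Return a T-SQL script granting SELECT on each table to the principal.

    Assumption: the principal is already a user in every listed database.
    """
    lines = []
    for database, objects in tables.items():
        lines.append(f"USE [{database}];")
        for obj in objects:
            lines.append(f"GRANT SELECT ON {obj} TO [{principal}];")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_grant_script(r"NT AUTHORITY\SYSTEM"))
```

Generating the statements rather than typing them makes it easy to reuse the same list for a dedicated monitoring account instead of NT AUTHORITY\SYSTEM.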
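8: Restart the Zabbix Agent service.
9: On the Zabbix Server, add the corresponding templates to each host you want to monitor.
Select the four templates imported in step 1.
The host configuration then shows a number of Applications related to SQL Server monitoring.
The Zabbix share template is configured slightly differently. It ships with detailed configuration documentation, so feel free to test it yourself if you are interested. Below are the brief steps I put together from my earlier tests.
1: Under "Configuration" -> "Templates", import the following two templates: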
Template SQL Server Instance 0 DE.xml
Template SQL Server Instance 0 SA.xml
2: Under "Administration" -> "General" -> "Value mapping", import the value mappings:
“SQL Agent Job status.xml”
“SQL Database status.xml”
3: Copy the file Discovery.mssql.server.ps1 to a location the Zabbix Agent can access. I put it in C:\zabbix\bin\win64.
4: Edit Discovery.mssql.server.ps1. On line 14 of the file, find the following line and replace “EnterInstanceName” with the server name:
[Parameter(Mandatory = $false, Position = 2)]$SQLInstanceName="EnterInstanceName"
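Following the blog post https://segmentfault.com/a/1190000019203337, you can also modify Discovery.mssql.server.ps1 by adding the code below (the if block after Param). After that the file can be copied to other servers as-is without any further edits, which is much more convenient.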
Param(
[Parameter(Mandatory = $true, Position = 0)] [string]$select,
[Parameter(Mandatory = $false, Position = 1)][string]$2,
[Parameter(Mandatory = $false, Position = 2)]$SQLInstanceName="EnterInstanceName"
)
if ($SQLInstanceName -eq "EnterInstanceName")
{
$SQLInstanceName = $(hostname.exe)
}
5: Set the UserParameter entries in zabbix_agentd.conf. If you put Discovery.mssql.server.ps1 in C:\Program Files\zabbix\bin, you can use the values from userparams.conf unchanged.
UserParameter=databases.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\zabbix\bin\Discovery.mssql.server.ps1" JSONDBNAME
UserParameter=jobs.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\zabbix\bin\Discovery.mssql.server.ps1" JSONJOBNAME
UserParameter=data.mssql.discovery[*],powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\zabbix\bin\Discovery.mssql.server.ps1" $1 "$2"
I made a few changes, because I put Discovery.mssql.server.ps1 in C:\zabbix\bin\win64:
UserParameter=databases.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\zabbix\bin\win64\Discovery.mssql.server.ps1" JSONDBNAME
UserParameter=jobs.mssql.discovery,powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\zabbix\bin\win64\Discovery.mssql.server.ps1" JSONJOBNAME
UserParameter=data.mssql.discovery[*],powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\zabbix\bin\win64\Discovery.mssql.server.ps1" $1 "$2"
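6: Grant the account that runs the Zabbix Agent service the required database permissions. It needs access to msdb.dbo.sysjobhistory and msdb.dbo.sysjobs. By default, the account running the Zabbix Agent service is NT AUTHORITY\SYSTEM, which already exists in the database.
Alternatively, you can create a dedicated account and configure it in Discovery.mssql.server.ps1: uncomment the $uid and $pwd lines and fill in the login and password of the account you created.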
# Developed by Diego Cavalcante - 06/12/2017
# Windows SQL Server monitoring
# Version: 1.1.0
# Created = Version 1.0.0 29/08/2017 (basic script).
# Update = Version 1.1.0 02/01/2018 (Thanks @bernardolankheet, JOBSTATUS returned N = 5 Never Executed).
# Update = by Oleg D. and Mantas T. Translated to EN, added SQL Instance name.
# Parameters. Change Line 14 $SQLInstanceName="InstanceName" to correct instance name
Param(
[Parameter(Mandatory = $true, Position = 0)] [string]$select,
[Parameter(Mandatory = $false, Position = 1)][string]$2,
[Parameter(Mandatory = $false, Position = 2)]$SQLInstanceName="xxxx" # the actual instance name
)
#Login SQLInstanceName
#$uid = "Login" # the actual login name and password
#$pwd = "Password"
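7: Restart the Zabbix Agent service.
8: Add the templates to the relevant servers (hosts).
9: Update the macros if needed.
10: By default you need to add both templates, unless your database is SQL Server Express edition, in which case you only need to add the template “Template SQL Server Instance 0 DE Baseline”.
11: It is best to put these two templates in the Templates/Databases group for easier use and management later.
Setting up the With SQL instance discovery template is also straightforward and differs little from the above. Just follow the steps in the official documentation one by one.
Usage notes
1: For example, suppose the YourSQLDba database uses the simple recovery model and only full backups are taken. Monitoring will then raise alerts telling you that YourSQLDba has no differential backups and no transaction log backups.
If you do not want these alerts, find items such as “SQL Server Databases Discovery: SQL Instance MSSQLSERVER Database YourSQLDba: Diff Backup Status” in the host's items and disable them.
2: If the instance has offline databases, you must disable the items related to those databases. Otherwise the Zabbix Agent log fills up with entries like these: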
...............................................................................
19120:20200826:154534.767 active check "perf_counter["\SQLServer:Databases(xxxx)\Log File(s) Used Size (KB)"]" is not supported: Cannot obtain performance information from collector.
19120:20200826:154534.768 active check "perf_counter["\SQLServer:Databases(xxxx)\Log Flush Wait Time"]" is not supported: Cannot obtain performance information from collector.
19120:20200826:154534.769 active check "perf_counter["\SQLServer:Databases(xxxx)\Log Flush Waits/sec"]" is not supported: Cannot obtain performance information from collector.
19120:20200826:154534.769 active check "perf_counter["\SQLServer:Databases(xxxx)\Log Flushes/sec"]" is not supported: Cannot obtain performance information from collector.
19120:20200826:154534.769 active check "perf_counter["\SQLServer:Databases(xxxx)\Log Growths"]" is not supported: Cannot obtain performance information from collector.
19120:20200826:154534.770 active check "perf_counter["\SQLServer:Databases(xxxx)\Log Shrinks"]" is not supported: Cannot obtain performance information from collector.
19120:20200826:154534.770 active check "perf_counter["\SQLServer:Databases(xxxx)\Log Truncations"]" is not supported: Cannot obtain performance information from collector.
..............................................................................
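In addition, if you do not disable the items for that database, you will see a large number of delayed items in the Zabbix Queue. Once the items of the offline database are disabled, the delayed items disappear from the Queue.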
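To find out quickly which databases are behind these log entries, a short Python sketch like the one below can count the "not supported" perf_counter lines per database name. The regular expression is written against the log format shown in this post and assumes a default instance (counter prefix \SQLServer:Databases); the function name count_unsupported_by_database is my own.

```python
import re
from collections import Counter

# Matches log entries such as:
# active check "perf_counter["\SQLServer:Databases(xxxx)\Log Growths"]" is not supported
UNSUPPORTED_DB_COUNTER = re.compile(
    r'active check "perf_counter\["\\SQLServer:Databases\((?P<db>[^)]+)\)\\[^"]*"\]"'
    r" is not supported"
)


def count_unsupported_by_database(log_lines):
    """Count 'not supported' database perf_counter entries per database name."""
    counts = Counter()
    for line in log_lines:
        match = UNSUPPORTED_DB_COUNTER.search(line)
        if match:
            counts[match.group("db")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py path\to\zabbix_agentd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for db, hits in count_unsupported_by_database(log_file).most_common():
            print(db, hits)
```

Any database that shows up with a high count is a good candidate for checking whether it is offline and whether its items should be disabled.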
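3: You will see a variety of alerts and informational events. From there it is a matter of understanding each alert and fixing the underlying problem.
Every monitored metric has a graph, so you can view its trend over time.
Troubleshooting summary:
I ran into a few small problems while using Zabbix template for Microsoft SQL Server; they are collected below. The vast majority of them occur only with the Zabbix share template. When describing a problem I try to state which branch of the template it applies to. I strongly recommend the branch version on GitHub, which lets you avoid many pitfalls.
Problem 1: the following errors appear in the Zabbix Agent log.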
764:20200715:140830.588 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Active Transactions"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.588 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Data File(s) Size (KB)"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.589 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log Bytes Flushed/sec"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.589 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log File(s) Size (KB)"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.590 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log File(s) Used Size (KB)"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.590 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log Flush Wait Time"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.590 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log Flush Waits/sec"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.591 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log Flushes/sec"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.591 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log Growths"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.592 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log Shrinks"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.592 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Log Truncations"]" is not supported: Cannot obtain performance information from collector.
764:20200715:140830.592 active check "perf_counter["\SQLServer:Databases(DBAInventory)\Percent Log Used"]" is not supported: Cannot obtain performance information from collector.
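Investigation showed that the DBAInventory database had been set offline. The server had the template "Template SQL Server Instance 0 DE Baseline" applied, which generates items and triggers for that database, and these items and triggers end up in the “Not supported” state. In the host configuration, filter the items and triggers by the database name DBAInventory and disable them; after that the error no longer appears in zabbix_agentd.log.
Problem 2: the error “Timeout while executing a shell script.”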
1364:20200709:085346.828 active check "jobs.mssql.discovery" is not supported: Timeout while executing a shell script.
1364:20200709:085842.183 Failed to execute command "powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\zabbix\bin\win64\Discovery.mssql.server.ps1" JSONDBNAME": Timeout while executing a shell script.
1364:20200709:085842.183 active check "databases.mssql.discovery" is not supported: Timeout while executing a shell script.
Change the Timeout parameter in the zabbix_agentd.conf configuration file, for example set Timeout to 30:
### Option: Timeout
# Spend no more than Timeout seconds on processing.
#
# Mandatory: no
# Range: 1-30
# Default:
# Timeout=3
Timeout=30
After that, the error no longer appears in zabbix_agentd.log.
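When several agents need the same change, it helps to check which Timeout value a given conf file actually ends up with. The Python sketch below does that for the excerpt shown above. The default of 3 and the range 1-30 come from the comments in the conf file; the function name effective_timeout and the rule that the last uncommented Timeout line wins are assumptions of mine, not documented agent behaviour.

```python
def effective_timeout(conf_text, default=3):
    """Return the Timeout value found in agent config text.

    Commented lines (starting with '#') are ignored. Assumption: if several
    uncommented Timeout= lines exist, the last one is used.
    """
    timeout = default
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "Timeout":
            timeout = int(value.strip())
    if not 1 <= timeout <= 30:
        raise ValueError(f"Timeout must be within 1-30, got {timeout}")
    return timeout


if __name__ == "__main__":
    sample = "# Range: 1-30\n# Timeout=3\nTimeout=30\n"
    print(effective_timeout(sample))
```

Because 30 is the upper limit of the allowed range, a discovery script that still times out at that setting needs to be made faster rather than given more time.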
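My notes originally contained more than a dozen small problems. Listing them all here felt messy and took up a lot of space, so I have only included a couple; when I have time I plan to cover the rest in separate posts.
References: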
https://share.zabbix.com/databases/microsoft-sql-server/template-for-microsoft-sql-server
https://github.com/MantasTumenas/Zabbix-template-for-Microsoft-SQL-Server