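Basic monitoring:
Processor:
% Processor Time  Current CPU utilization, as a percentage
Memory:
Available MBytes  Currently available physical memory, in megabytes (virtual memory is not monitored separately: it comes into play only when physical memory runs short, and physical memory is already monitored)
LogicalDisk:
% Free Space  Free space on the logical partition, as a percentage (physical disk I/O is not included because RAID levels differ between machines and some machines have no RAID, so no single alert threshold can be defined)
Network Interface:
Bytes Total/sec  NIC traffic: sent plus received, in bytes per second
TCPv4:
Connections Established  Current number of connections (Established + Close-Wait)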
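The basic counter set above can be expanded into full counter paths and sampled with the Windows typeperf tool. The following is only a sketch: the `_Total` and `*` instance names, the helper names, and the CSV layout assumed by the parser are illustrative choices, not part of the original notes.

```python
import csv
import io

# (object, instance, counter) for the basic monitoring list above.
# The instance names ("_Total", "*") are assumptions; adjust them per host.
BASIC_COUNTERS = [
    ("Processor", "_Total", "% Processor Time"),
    ("Memory", None, "Available MBytes"),
    ("LogicalDisk", "_Total", "% Free Space"),
    ("Network Interface", "*", "Bytes Total/sec"),
    ("TCPv4", None, "Connections Established"),
]


def counter_path(obj, instance, counter):
    """Build a counter path such as \\Processor(_Total)\\% Processor Time."""
    inst = f"({instance})" if instance else ""
    return f"\\{obj}{inst}\\{counter}"


def typeperf_command(samples=1):
    """Argument list for typeperf; -sc sets the number of samples to collect."""
    paths = [counter_path(*c) for c in BASIC_COUNTERS]
    return ["typeperf", *paths, "-sc", str(samples)]


def parse_typeperf_csv(text):
    """Parse typeperf-style CSV text into a list of {column name: float} dicts."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    samples = []
    for row in data:
        if len(row) != len(header):
            continue  # skip status lines that are not data rows
        try:
            samples.append({n: float(v) for n, v in zip(header[1:], row[1:])})
        except ValueError:
            continue  # skip rows with non-numeric values
    return samples
```

On a Windows host, the list returned by `typeperf_command()` could be passed to `subprocess.run(..., capture_output=True, text=True)` and the captured stdout handed to `parse_typeperf_csv`.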
==================================================
CPU:
%Processor Time
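% Privileged Time
Percentage of time the CPU spends executing threads in privileged mode. System services, process management, memory management, and other work started by the operating system itself generally fall into this category.
% User Time
The counterpart of % Privileged Time: the percentage of time spent on operations in user mode (that is, non-privileged mode). If this value is high, consider whether it can be reduced through algorithm optimization or similar measures. On a database server, a high value is most likely caused by sorting or function evaluation consuming too much CPU time, in which case the database system is the place to optimize.
% DPC Time
Percentage of time the processor spends servicing deferred procedure calls (DPCs), which on a server come largely from network processing; the lower the better. On a multiprocessor system, if this value exceeds 50% and % Processor Time is very high, adding a network adapter may improve performance.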
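The CPU rules of thumb above can be written down as a small check. This is a sketch under stated assumptions: the 90% cut-off for a "very high" % Processor Time and the "user time exceeds privileged time" test are illustrative choices, since the notes give no numbers for them; only the 50% DPC figure comes from the text.

```python
def cpu_findings(processor_time, privileged_time, user_time, dpc_time,
                 high_cpu=90.0):
    """Return hint codes for one sample of CPU counters (all in percent).

    high_cpu is an assumed threshold for "very high" % Processor Time.
    """
    hints = []
    # Rule from the notes: % DPC Time > 50% together with a very busy CPU.
    if dpc_time > 50.0 and processor_time >= high_cpu:
        hints.append("add_nic")
    # Assumed reading of "% User Time is high": busy CPU, mostly user mode.
    if processor_time >= high_cpu and user_time > privileged_time:
        hints.append("tune_user_code")
    return hints


print(cpu_findings(95, 20, 20, 55))  # ['add_nic']
print(cpu_findings(95, 10, 80, 5))   # ['tune_user_code']
```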
Memory:
Available Bytes
Pages/sec
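This counter shows the number of pages read from or written to disk to resolve hard page faults, that is, references to pages that were not in physical memory. A high Pages/sec value does not necessarily indicate a memory problem; it may come from programs that use memory-mapped files, and the operating system routinely swaps to disk to increase available memory or to use memory more efficiently. (Note the difference from Page Faults/sec, which only indicates that data was not immediately available in the specified working set in memory, and which includes both hard and soft faults.)
Use the Page Faults/sec counter to make sure that disk activity is not being caused by paging. In Windows, paging can be caused by processes configured to use too much memory, or by file system activity.
If several logical partitions share one physical disk, use the Logical Disk counters rather than the Physical Disk counters. Looking at the logical disk counters helps identify which files are accessed frequently. When you see heavy read/write activity on a disk, check the read-specific and write-specific counters to determine which type of disk activity is increasing the load on each logical volume, for example Logical Disk: Disk Write Bytes/sec.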
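Pages Input/sec
Number of pages read from disk to resolve hard page faults. (Reference value: >= Page Reads/sec)
Page Reads/sec
Number of disk read operations performed to resolve hard page faults; a single read can bring in several pages. (Reference value: <= 5)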
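The two reference values above lend themselves to a mechanical check. The sketch below assumes that Page Reads/sec counts read operations while Pages Input/sec counts pages, so their ratio approximates pages fetched per read; the function and key names are made up for illustration.

```python
def paging_findings(pages_input_per_sec, page_reads_per_sec):
    """Compare one sample of the paging counters with the reference values."""
    return {
        # Reference value from the notes: Page Reads/sec <= 5.
        "page_reads_ok": page_reads_per_sec <= 5,
        # Reference value from the notes: Pages Input/sec >= Page Reads/sec.
        "input_consistent": pages_input_per_sec >= page_reads_per_sec,
        # Average pages brought in per read operation (0.0 when idle).
        "pages_per_read": (pages_input_per_sec / page_reads_per_sec
                           if page_reads_per_sec > 0 else 0.0),
    }


print(paging_findings(40, 4))
# {'page_reads_ok': True, 'input_consistent': True, 'pages_per_read': 10.0}
```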
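If you suspect a memory leak, monitor Memory\Available Bytes and Memory\Committed Bytes to observe memory behavior, and monitor Process\Private Bytes, Process\Working Set, and Process\Handle Count for the processes you think may be leaking. If you suspect that a kernel-mode process is causing the leak, also monitor Memory\Pool Nonpaged Bytes, Memory\Pool Nonpaged Allocs, and Process(process_name)\Pool Nonpaged Bytes.
When a memory leak occurs, Process\Private Bytes and Process\Working Set tend to rise while Available Bytes falls.
Private Bytes
The current number of bytes the process has allocated that cannot be shared with other processes. This counter is mainly used to judge whether a process leaks memory during a performance test.
For example, for a web application running on IIS, we can focus on the Private Bytes of the inetinfo process. If that counter keeps increasing during the performance test, or stays at a high level for some time after the test has stopped, the application has a memory leak.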
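The leak pattern described above (Private Bytes rising while Available Bytes falls) can be approximated by fitting a trend line to sampled values. This is a minimal sketch, assuming evenly spaced samples; the function names and the `min_growth` parameter are hypothetical, and a real test would still need the post-test observation the notes describe.

```python
def slope(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(private_bytes, available_bytes, min_growth=0.0):
    """True if Private Bytes trends up while Available Bytes trends down.

    min_growth (bytes per sample) is an assumed noise floor, not a value
    taken from the notes.
    """
    return slope(private_bytes) > min_growth and slope(available_bytes) < 0


print(looks_like_leak([100, 110, 120, 130], [900, 890, 880, 870]))  # True
```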
Disk:
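PhysicalDisk\Avg. Disk sec/Read
Average time, in seconds, needed to read data from this disk.
PhysicalDisk\Disk Reads/sec
Rate of read operations on this disk. (The average number of bytes per read is a separate counter, Avg. Disk Bytes/Read.)
PhysicalDisk\Avg. Disk sec/Write
Average time, in seconds, needed to write data to this disk.
PhysicalDisk\Disk Writes/sec
Rate of write operations on this disk. (The average number of bytes per write is a separate counter, Avg. Disk Bytes/Write.)
PhysicalDisk\Avg. Disk sec/Transfer
Reflects the time the disk takes to complete a request. A high value can indicate that the disk controller is repeatedly retrying the disk because of failures; such faults increase the average disk transfer time.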
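% Disk Time and Avg. Disk Queue Length
On RAID disks, the % Disk Time counter can report values above 100%. When that happens, use PhysicalDisk: Avg. Disk Queue Length to determine the average number of system requests waiting for disk access.
Without RAID, use % Disk Time and Current Disk Queue Length to judge whether the disk is a bottleneck; if both counters stay high, the disk is probably the bottleneck.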
Physical Disk:
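Disk Transfers/sec  Disk IOPS
% Disk Time  Current utilization of the physical disk; on RAID this value can exceed 100%
Current Disk Queue Length  Number of system requests currently waiting for disk access
Avg. Disk Queue Length  Average number of system requests waiting for disk access; used for RAID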
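The choice of counters for RAID versus non-RAID disks, and the idea of values that "stay high", can be sketched as follows. The thresholds are deliberately left to the caller because the notes above do not fix them; `min_fraction=0.8` (80% of samples above the limit) is an assumed definition of "stays high", and all names are illustrative.

```python
def sustained_fraction(samples, threshold):
    """Fraction of samples strictly above threshold."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > threshold) / len(samples)


def disk_bottleneck_suspected(is_raid, disk_time, current_queue, avg_queue,
                              disk_time_limit, queue_limit, min_fraction=0.8):
    """Apply the RAID / non-RAID counter choice to lists of samples."""
    if is_raid:
        # % Disk Time can exceed 100% on RAID, so use Avg. Disk Queue Length.
        return sustained_fraction(avg_queue, queue_limit) >= min_fraction
    # Without RAID, both % Disk Time and Current Disk Queue Length must stay high.
    return (sustained_fraction(disk_time, disk_time_limit) >= min_fraction
            and sustained_fraction(current_queue, queue_limit) >= min_fraction)
```

For example, `disk_bottleneck_suspected(False, [95, 97, 99, 96, 98], [4, 5, 6, 4, 5], [], 90, 2)` flags a non-RAID disk whose utilization and queue both stay above the limits supplied by the caller.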
Disk counters to monitor
Monitor the following counters to ensure the health of disks. Note that the following values represent values measured over time — not values that occur during a sudden spike and not values that are based on a single measurement.
- Physical Disk: % Disk Time: DataDrive  This counter shows the percentage of elapsed time that the selected disk drive is busy servicing read or write requests. Monitor this counter to ensure that it remains less than two times the number of disks.
- Logical Disk: Disk Transfers/sec  This counter shows the rate at which read and write operations are performed on the disk. Use this counter to monitor growth trends and forecast appropriately.
- Logical Disk: Disk Read Bytes/sec and Logical Disk: Disk Write Bytes/sec  These counters show the rate at which bytes are transferred from the disk during read or write operations.
- Logical Disk: Avg. Disk Bytes/Read  This counter shows the average number of bytes transferred from the disk during read operations. This value can reflect disk latency: larger read operations can result in slightly increased latency.
- Logical Disk: Avg. Disk Bytes/Write  This counter shows the average number of bytes transferred to the disk during write operations. This value can reflect disk latency: larger write operations can result in slightly increased latency.
- Logical Disk: Current Disk Queue Length  This counter shows the number of requests outstanding on the disk at the time that the performance data is collected. For this counter, lower values are better. Values above 2 per disk may indicate a bottleneck and should be investigated. This means that a value of up to 8 may be acceptable for a LUN comprised of 4 disks. Bottlenecks can create a backlog that can spread beyond the current server that is accessing the disk, and result in long wait times for users. Possible solutions to a bottleneck are to add more disks to the RAID array, replace existing disks with faster disks, or move some data to other disks.
- Logical Disk: Avg. Disk Queue Length  This counter shows the average number of both read and write requests that were queued for the selected disk during the sample interval. The rule is that there should be two or fewer outstanding read and write requests per spindle, but this can be difficult to measure because of storage virtualization and differences in RAID levels between configurations. Look for larger than average disk queue lengths in combination with larger than average disk latencies. This combination can indicate that the storage array cache is being overused or that spindle sharing with other applications is affecting performance.
- Logical Disk: Avg. Disk sec/Read and Logical Disk: Avg. Disk sec/Write  These counters show the average time, in seconds, of a read or write operation to the disk. Monitor these counters to ensure that they remain below 85 percent of the disk capacity. Disk access time increases exponentially if read or write operations are more than 85 percent of disk capacity. To determine the specific capacity for your hardware, refer to the vendor documentation, or use the SQLIO Disk Subsystem Benchmark Tool to calculate it. For more information, see SQLIO Disk Subsystem Benchmark Tool (http://go.microsoft.com/fwlink/?LinkID=105586).
- Logical Disk: Avg. Disk sec/Read  This counter shows the average time, in seconds, of a read operation from the disk. On a well-tuned system, ideal values are from 1-5 milliseconds (ms) for logs (ideally 1 ms on a cached array), and 4-20 ms for data (ideally less than 10 ms). Higher latencies can occur during peak times, but if high values occur regularly, you should investigate the cause.
- Logical Disk: Avg. Disk sec/Write  This counter shows the average time, in seconds, of a write operation to the disk. On a well-tuned system, ideal values are from 1-5 ms for logs (ideally 1 ms on a cached array), and 4-20 ms for data (ideally less than 10 ms). Higher latencies can occur during peak times, but if high values occur regularly, you should investigate the cause.
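The latency ranges and the "2 per disk" queue rule quoted above can be turned into simple checks. This is a sketch, not part of the TechNet text: the counters report seconds, so the code converts to milliseconds, and it treats the upper end of each quoted range (5 ms for logs, 20 ms for data) as the limit, which is an interpretation.

```python
# Upper bounds in milliseconds, taken from the ranges quoted above.
LATENCY_LIMITS_MS = {"log": 5.0, "data": 20.0}


def latency_ok(avg_disk_sec, kind):
    """Check an Avg. Disk sec/Read or sec/Write sample (in seconds)."""
    return avg_disk_sec * 1000.0 <= LATENCY_LIMITS_MS[kind]


def queue_ok(queue_length, disks):
    """Values above 2 outstanding requests per disk may indicate a bottleneck."""
    return queue_length <= 2 * disks


print(latency_ok(0.004, "log"))   # True: about 4 ms against a 5 ms limit
print(latency_ok(0.025, "data"))  # False: about 25 ms against a 20 ms limit
print(queue_ok(8, 4))             # True: 8 is acceptable for a 4-disk LUN
```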
When you are using RAID configurations with the Avg. Disk sec/Read or Avg. Disk sec/Write, use the formulas listed in the following table to determine the rate of input and output on the disk.
RAID level    Formula
RAID 0        I/Os per disk = (reads + writes) / number of disks
RAID 1        I/Os per disk = [reads + (2 * writes)] / 2
RAID 5        I/Os per disk = [reads + (4 * writes)] / number of disks
RAID 10       I/Os per disk = [reads + (2 * writes)] / number of disks
For example, if you have a RAID 1 system that has two physical disks, and your counters are at the values that are shown in the following table:
Counter                              Value
Avg. Disk sec/Read                   80
Logical Disk: Avg. Disk sec/Write    70
Avg. Disk Queue Length               5
The I/O value per disk can be calculated as follows: (80 + (2 * 70))/2 = 110
The disk queue length can be calculated as follows: 5/2 = 2.5
In this situation, you have a borderline I/O bottleneck.
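The RAID formulas and the worked example above can be reproduced with a short function. This is a sketch; the function names are made up for illustration, and the inputs are simply the read and write values from the example table.

```python
def ios_per_disk(raid_level, reads, writes, disks):
    """I/Os per disk for the RAID levels listed in the table above."""
    if raid_level == 0:
        return (reads + writes) / disks
    if raid_level == 1:
        return (reads + 2 * writes) / 2
    if raid_level == 5:
        return (reads + 4 * writes) / disks
    if raid_level == 10:
        return (reads + 2 * writes) / disks
    raise ValueError(f"no formula listed for RAID {raid_level}")


def queue_per_disk(avg_queue_length, disks):
    """Average queue length divided across the physical disks."""
    return avg_queue_length / disks


# RAID 1 with two physical disks, as in the example above.
print(ios_per_disk(1, 80, 70, 2))  # 110.0
print(queue_per_disk(5, 2))        # 2.5
```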
From: http://technet.microsoft.com/en-us/library/dd723635(v=office.12).aspx