CPU Usage vs. CPU Load

Reposted article, 2019-04-16 14:37. Tags: cpu load, linux

For CPU performance monitoring, the top command typically shows two metrics: CPU usage and CPU load.

The %Cpu(s) line of top contains:

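us: percentage of CPU time spent running (un-niced) user-space processes;
sy: percentage of CPU time spent in the kernel (system mode);
ni: percentage of CPU time spent running niced user processes (processes whose nice value has been changed);
id: idle time;
wa: I/O wait time, i.e. time the CPU was idle while waiting for I/O to complete;
hi: percentage of CPU time spent servicing hardware interrupts (hardware IRQs);
si: percentage of CPU time spent servicing software interrupts;
st: steal time, the percentage of time this virtual machine's virtual CPU was ready to run but the hypervisor was running something else on the physical CPU.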

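(A nonzero %steal means the current OS is running as a guest under a hypervisor's scheduler, and other OSes managed by the same hypervisor are competing with it for the physical CPUs.)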
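To make the field list concrete, here is a minimal sketch that turns a top summary line into a dictionary. It assumes the procps-ng layout ("1.0 us, 0.5 sy, ..."); older top versions print "1.0%us" instead and would need a different pattern. The sample line and the function name parse_top_cpu_line are hypothetical.

```python
import re

def parse_top_cpu_line(line: str) -> dict:
    """Parse a top '%Cpu(s):' summary line into {field_name: percent}."""
    # Drop the "%Cpu(s):" label, then match each "<number> <name>" pair.
    body = line.split(":", 1)[1]
    return {name: float(value)
            for value, name in re.findall(r"(\d+(?:\.\d+)?)\s+([a-z]+)", body)}

# Hypothetical sample line in the procps-ng format.
sample = "%Cpu(s):  1.0 us,  0.5 sy,  0.0 ni, 98.3 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st"
fields = parse_top_cpu_line(sample)
print(fields["id"])                  # 98.3
print(round(100 - fields["id"], 1))  # 1.7 (everything that is not idle)
```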

The load average has three values: the system's average load over the last 1, 5, and 15 minutes.

The uptime command also displays these three values:

root@Ubuntu01:~# uptime
02:55:15 up 43 min, 1 user, load average: 0.09, 0.25, 0.13
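As a small illustration, the three numbers can be pulled out of an uptime line like the one above. This sketch assumes the Linux procps format ("load average: a, b, c"); some other systems print "load averages:" with spaces instead of commas. The function name parse_uptime_load is hypothetical.

```python
def parse_uptime_load(line: str) -> tuple:
    """Return the (1 min, 5 min, 15 min) load averages from one line of uptime output."""
    # Everything after "load average:" is three comma-separated numbers.
    tail = line.rsplit("load average:", 1)[1]
    one, five, fifteen = (float(x) for x in tail.split(","))
    return one, five, fifteen

print(parse_uptime_load("02:55:15 up 43 min, 1 user, load average: 0.09, 0.25, 0.13"))
# (0.09, 0.25, 0.13)
```

Python's standard library also exposes the same three numbers directly through os.getloadavg() on Unix systems, which avoids parsing text at all.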

CPU usage：

CPU usage, or CPU utilization, describes how much of the CPU's time slices are occupied by programs. See https://en.wikipedia.org/wiki/CPU_time.

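CPU usage is derived from the contents of /proc/stat, which holds cumulative time counters (in clock ticks) for the whole system on the "cpu" line and for each CPU on the "cpuN" lines. Usage over an interval is computed from the difference between two readings.

See the reference documentation for details.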
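The two-reading method described above can be sketched as follows. This is one common way to compute it, not the only one: it assumes that idle and iowait both count as "not busy" (top shows them as separate columns) and, because the kernel already includes guest and guest_nice in user and nice, it sums only the first eight fields. The function names are hypothetical.

```python
import time

def parse_cpu_line(line: str) -> list:
    """Return the numeric counters (clock ticks) of a /proc/stat 'cpu' line."""
    return [int(x) for x in line.split()[1:]]

def cpu_usage_percent(before: str, after: str) -> float:
    """CPU usage between two readings of the aggregate 'cpu' line of /proc/stat."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already inside user/nice, so only the first 8 are summed.
    total = sum(a[:8]) - sum(b[:8])
    idle = (a[3] + a[4]) - (b[3] + b[4])  # idle + iowait treated as not busy
    return 100.0 * (total - idle) / total if total else 0.0

def sample_cpu_usage(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat twice, interval_s apart (Linux only)."""
    def first_line() -> str:
        with open(path) as f:
            return f.readline()  # the aggregate "cpu" line comes first
    before = first_line()
    time.sleep(interval_s)
    return cpu_usage_percent(before, first_line())

# Hypothetical readings: 200 ticks elapsed, 120 of them idle or iowait.
print(cpu_usage_percent("cpu  100 0 50 800 50 0 0 0 0 0",
                        "cpu  160 0 70 900 70 0 0 0 0 0"))  # 40.0
```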

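=> Process CPU usage:
Computed from /proc/<pid>/stat.
Total CPU time of a process, in clock ticks (utime and stime already include the time of all the process's threads; cutime and cstime add the time of terminated children that the process has waited for):
processCpuTime = utime + stime + cutime + cstime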
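A minimal sketch of the formula above, reading fields 14 to 17 from one /proc/<pid>/stat line. The only subtle part is that field 2 (the command name in parentheses) may itself contain spaces, so the line is split after the last ")". The sample line and the function name are hypothetical; to convert the result to seconds, divide by os.sysconf("SC_CLK_TCK").

```python
def process_cpu_ticks(stat_line: str, include_children: bool = True) -> int:
    """Process CPU time, in clock ticks, from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so split only the text after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14..17 are rest[11]..rest[14].
    utime, stime, cutime, cstime = (int(x) for x in rest[11:15])
    total = utime + stime               # covers all threads of the process
    if include_children:
        total += cutime + cstime        # terminated children that were waited for
    return total

# Hypothetical stat line: utime=250, stime=50, cutime=10, cstime=5.
line = "1234 (my app) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 50 10 5 20 0 1 0 100 0 0"
print(process_cpu_ticks(line))                          # 315
print(process_cpu_ticks(line, include_children=False))  # 300
```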

=> Thread CPU usage:
Computed from /proc/<pid>/task/<tid>/stat.
Thread CPU time, in clock ticks:
threadCpuTime = utime + stime
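Turning threadCpuTime into a usage percentage needs two readings and the elapsed wall-clock time between them. This sketch assumes a clock-tick rate of 100 per second (a common value; query os.sysconf("SC_CLK_TCK") on a real system), and the sample lines and function names are hypothetical.

```python
def thread_cpu_ticks(stat_line: str) -> int:
    """utime + stime, in clock ticks, from /proc/<pid>/task/<tid>/stat."""
    rest = stat_line[stat_line.rindex(")") + 1:].split()  # fields 3.. after comm
    utime, stime = int(rest[11]), int(rest[12])           # fields 14 and 15
    return utime + stime

def thread_cpu_percent(before: str, after: str, elapsed_s: float,
                       clk_tck: int = 100) -> float:
    """Share of one CPU used by the thread between two readings.

    clk_tck is assumed to be 100; on a real system use os.sysconf("SC_CLK_TCK").
    """
    cpu_seconds = (thread_cpu_ticks(after) - thread_cpu_ticks(before)) / clk_tck
    return 100.0 * cpu_seconds / elapsed_s

# Hypothetical readings one second apart: 300 ticks, then 350 ticks.
t0 = "5678 (worker) R 1 1234 1234 0 -1 4194304 0 0 0 0 250 50 0 0 20 0 4 0 100 0 0"
t1 = "5678 (worker) R 1 1234 1234 0 -1 4194304 0 0 0 0 290 60 0 0 20 0 4 0 100 0 0"
print(thread_cpu_percent(t0, t1, elapsed_s=1.0))  # 50.0
```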

CPU load：

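The load average measures CPU load. It does not describe CPU utilization; instead it is an average, over a period of time, of the number of processes that are either running on a CPU or waiting to run, i.e. a statistic on the length of the CPU run queue. On Linux the count also includes tasks in uninterruptible sleep (typically waiting for disk I/O). Lower is better, and the value is only meaningful relative to the number of CPUs. See the explanation at https://en.wikipedia.org/wiki/Load_%28computing%29: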

"...CPU load information based upon the CPU queue length does much better in load balancing compared to CPU utilization (CPU usage). The reason CPU queue length did better is probably because when a host is heavily loaded, its CPU utilization is likely to be close to 100% and it is unable to reflect the exact load level of the utilization. In contrast, CPU queue lengths can directly reflect the amount of load on a CPU."
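The "average over a period of time" is an exponentially damped moving average: roughly every 5 seconds the kernel blends the current number of active tasks into each of the three values with a different decay factor. This sketch uses floating point instead of the kernel's fixed-point arithmetic and a hypothetical constant workload of 2 active tasks, so it illustrates the idea rather than reproducing the kernel code.

```python
import math

SAMPLE_INTERVAL_S = 5  # the averages are updated roughly every 5 seconds

def update_loadavg(loads: tuple, active_tasks: int) -> tuple:
    """One update step of the 1-, 5- and 15-minute damped averages."""
    new = []
    for load, minutes in zip(loads, (1, 5, 15)):
        decay = math.exp(-SAMPLE_INTERVAL_S / (60 * minutes))
        new.append(load * decay + active_tasks * (1 - decay))
    return tuple(new)

loads = (0.0, 0.0, 0.0)
for _ in range(12):  # one minute with 2 tasks constantly running or waiting
    loads = update_loadavg(loads, 2)
print([round(x, 2) for x in loads])  # [1.26, 0.36, 0.13]
```

The 1-minute value reacts quickly while the 15-minute value lags, which is why a rising first number with a low third number indicates a load spike that has only just started.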

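If the load average stays above the number of CPUs for a long time, the CPUs are very busy and heavily loaded; this can hurt system performance, making the system sluggish and slow to respond.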

A common rule of thumb for an acceptable value is load average <= number of CPU cores * 0.7.
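The rule of thumb is simple arithmetic: with 4 cores the threshold is 4 * 0.7 = 2.8. A minimal sketch, where the function names are hypothetical and current_status relies on os.getloadavg() (Unix only) and os.cpu_count() (logical CPUs):

```python
import os

def load_is_acceptable(load: float, cores: int, factor: float = 0.7) -> bool:
    # Rule of thumb from the text: load average <= number of cores * 0.7
    return load <= cores * factor

def current_status() -> list:
    """Check the live 1/5/15-minute averages against the rule of thumb."""
    cores = os.cpu_count() or 1
    return [load_is_acceptable(x, cores) for x in os.getloadavg()]

print(load_is_acceptable(2.5, 4))  # True  (threshold 2.8)
print(load_is_acceptable(3.2, 4))  # False
```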

CPU load is read from /proc/loadavg:

root@Ubuntu01:~# cat /proc/loadavg
0.00 0.00 0.00 1/272 20911
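The first three fields of /proc/loadavg are the 1-, 5- and 15-minute averages. Per the proc(5) manual, the fourth field is "runnable/total" kernel scheduling entities (processes and threads) and the fifth is the PID of the most recently created process. A small parser for the sample above (the LoadAvg type and its field names are hypothetical):

```python
from typing import NamedTuple

class LoadAvg(NamedTuple):
    avg1: float
    avg5: float
    avg15: float
    runnable: int   # currently runnable scheduling entities
    total: int      # scheduling entities that currently exist
    last_pid: int   # PID of the most recently created process

def parse_loadavg(text: str) -> LoadAvg:
    a1, a5, a15, ratio, last_pid = text.split()
    runnable, total = ratio.split("/")
    return LoadAvg(float(a1), float(a5), float(a15),
                   int(runnable), int(total), int(last_pid))

print(parse_loadavg("0.00 0.00 0.00 1/272 20911"))
# LoadAvg(avg1=0.0, avg5=0.0, avg15=0.0, runnable=1, total=272, last_pid=20911)
```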

Related commands:

Besides top and uptime mentioned above, the following commands are also useful:

1) Display CPU information:

root@Ubuntu01:~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1

root@Ubuntu01:~# cat /proc/cpuinfo

root@Ubuntu01:~# grep 'model name' /proc/cpuinfo | wc -l   # count logical CPUs
1
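The same count can be taken by counting "processor" entries in /proc/cpuinfo, one per logical CPU; counting those rather than "model name" lines is a hedge, since "model name" may not be present on every architecture. The sample text and function name are hypothetical.

```python
import os

def count_logical_cpus(cpuinfo_text: str) -> int:
    """Count the 'processor' entries in /proc/cpuinfo text (one per logical CPU)."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.split(":", 1)[0].strip() == "processor")

sample = "processor\t: 0\nmodel name\t: Example CPU\n\nprocessor\t: 1\nmodel name\t: Example CPU\n"
print(count_logical_cpus(sample))  # 2
print(os.cpu_count())              # logical CPUs as seen by Python (may be None)
```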

2) vmstat:
# vmstat 10 5   (one report every 10 seconds, 5 reports in total; the first report shows averages since boot)

CPU-related columns:

us is the percentage of CPU time spent in user space, sy the time spent in the kernel, id the idle time, wa the I/O wait time, and st the steal time (time stolen from this virtual machine by the hypervisor).

Note: cs in the system section is the number of context switches per second.
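A per-second rate such as cs can be derived from a cumulative counter: /proc/stat has a "ctxt" line with the total number of context switches since boot, so two readings divided by the interval give a rate. This is a sketch of that idea, not vmstat's own implementation; the function names and sample values are hypothetical.

```python
def read_ctxt(stat_text: str) -> int:
    """Total context switches since boot, from the 'ctxt' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")

def context_switches_per_second(before: str, after: str, interval_s: float) -> float:
    return (read_ctxt(after) - read_ctxt(before)) / interval_s

# Hypothetical readings taken 10 seconds apart.
print(context_switches_per_second("cpu 1 2 3\nctxt 1000\n", "cpu 4 5 6\nctxt 6000\n", 10))  # 500.0
```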

3) mpstat:

%guest：Percentage of time spent by the CPU or CPUs to run a virtual processor.

%gnice：Percentage of time spent by the CPU or CPUs to run a niced guest.

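==> totalCpuTime = user + nice + system + idle + iowait + irq + softirq + steal + guest + guest_nice
On a purely physical machine (one that runs no guest OS and is not itself a guest managed by a hypervisor), steal, guest, and guest_nice should all be 0. Otherwise they can be nonzero: steal when this OS runs as a guest, guest and guest_nice when it hosts guests.
Note that the Linux kernel already counts guest time in user and guest_nice time in nice, so summing all ten fields counts that time twice; tools such as htop subtract guest and guest_nice from user and nice before computing percentages.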
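A per-state breakdown in the spirit of mpstat can be computed from two /proc/stat "cpu" lines. This sketch subtracts guest and guest_nice from user and nice so that each tick is counted once and the percentages add up to 100; the sample readings and function name are hypothetical, and the exact rounding and column choices of mpstat itself may differ.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq",
          "softirq", "steal", "guest", "guest_nice")

def cpu_breakdown(before: str, after: str) -> dict:
    """Percentage of the interval spent in each /proc/stat state."""
    b = [int(x) for x in before.split()[1:]]
    a = [int(x) for x in after.split()[1:]]
    delta = dict(zip(FIELDS, (y - x for x, y in zip(b, a))))
    # guest and guest_nice are already included in user and nice,
    # so remove them there to report each state exactly once.
    delta["user"] -= delta.get("guest", 0)
    delta["nice"] -= delta.get("guest_nice", 0)
    total = sum(delta.values())
    return {k: 100.0 * v / total for k, v in delta.items()} if total else {}

# Hypothetical readings: 100 user ticks, 40 of which were spent running a guest.
pct = cpu_breakdown("cpu 100 0 50 800 50 0 0 0 0 0",
                    "cpu 200 0 70 880 50 0 0 0 40 0")
print(pct["user"], pct["guest"], pct["idle"])  # 30.0 20.0 40.0
```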

4) Monitoring CPU with sar:

# sar -u 6 3   (one report every 6 seconds, 3 reports in total)