[Repost] Understanding the nvidia-smi Command

Reposted 2017-11-18 13:37

Reposted from: http://www.cnblogs.com/nowornever-L/p/6934605.html

Reposted from: http://blog.csdn.net/bruce_0712/article/details/63683787

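nvidia-smi is used to check GPU usage. I often use this command to work out which GPUs are idle, but the GPU usage I have been seeing lately confused me, so here I explain what each item in the table printed by nvidia-smi actually means.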

[Figure: screenshot of the nvidia-smi output table]

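This is the information for the Tesla K80s on our server.
In the table above:
Column 1, Fan: the fan speed, between 0 and 100% (N/A is shown here). This is the speed the system is requesting, so if the fan is physically blocked the real speed may not reach the displayed value. Some devices report no fan speed at all because they do not rely on their own fan and are kept cool by other means (our lab's server, for example, sits in an air-conditioned room all year round).
Column 2, Temp: the GPU temperature in degrees Celsius.
Column 3, Perf: the performance state, from P0 to P12, where P0 is maximum performance and P12 is minimum performance.
Column 4: the lower half, Pwr, is the power draw; the upper half, Persistence-M, is the persistence mode state. Persistence mode uses more power, but new GPU applications start faster when it is on. It is Off here.
Column 5, Bus-Id: the GPU's PCI bus address, in the form domain:bus:device.function.
Column 6, Disp.A: Display Active, which indicates whether a display has been initialized on the GPU.
Below columns 5 and 6, Memory-Usage: the amount of GPU memory in use.
Column 7, GPU-Util: the volatile GPU utilization.
Column 8, upper half: ECC-related information.
Column 8, lower half, Compute M.: the compute mode.
The second table, below the first, shows the GPU memory used by each process.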
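The columns described above can also be read programmatically. The following is a minimal sketch, assuming a driver whose `--query-gpu` option accepts the field names listed in `COLUMN_TO_FIELD` (run `nvidia-smi --help-query-gpu` to check). The mapping itself and the helper names `build_query_command`, `parse_row`, and `read_columns` are illustrative and not part of nvidia-smi.

```python
import subprocess

# Column in the default table -> field name for --query-gpu.
# Assumption: these field names are supported by the installed driver;
# confirm with `nvidia-smi --help-query-gpu`.
COLUMN_TO_FIELD = {
    "Fan": "fan.speed",
    "Temp": "temperature.gpu",
    "Perf": "pstate",
    "Pwr:Usage": "power.draw",
    "Persistence-M": "persistence_mode",
    "Bus-Id": "pci.bus_id",
    "Disp.A": "display_active",
    "Memory-Usage (used)": "memory.used",
    "Memory-Usage (total)": "memory.total",
    "GPU-Util": "utilization.gpu",
    "Compute M.": "compute_mode",
}


def build_query_command(columns):
    """Build the argv list that asks nvidia-smi for the given table columns."""
    fields = ",".join(COLUMN_TO_FIELD[c] for c in columns)
    return ["nvidia-smi", "--query-gpu=" + fields, "--format=csv,noheader"]


def parse_row(columns, line):
    """Turn one CSV line of output into a {column: value} dict."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(columns, values))


def read_columns(columns):
    """Run nvidia-smi and return one dict per GPU (needs a working driver)."""
    out = subprocess.check_output(build_query_command(columns),
                                  universal_newlines=True)
    return [parse_row(columns, line) for line in out.splitlines() if line.strip()]
```

On a machine with a working driver, `read_columns(["Temp", "Perf", "GPU-Util"])` would return one dictionary per GPU. The values stay as strings because unsupported fields are reported as text rather than numbers.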

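GPU memory usage and GPU utilization are two different things. A graphics card consists of the GPU, the GPU memory, and other components, and the relationship between GPU memory and the GPU is roughly like that between RAM and the CPU. When I run Caffe code, it uses little GPU memory but a lot of GPU compute; when my junior labmate runs TensorFlow code, it uses a lot of GPU memory but little GPU compute.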
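Because memory usage and utilization are independent, a check for idle GPUs probably needs to look at both. Here is a rough sketch of that idea. The thresholds (5% of memory, 5% utilization) and the function names are assumptions chosen for illustration, and the parser expects every value to be numeric.

```python
import subprocess

# Ask for index, memory and utilization as bare numbers (MiB and percent).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_usage(csv_text):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Assumption: all four fields are numeric. A device that reports
        # a field as unsupported would raise ValueError here.
        index, used, total, util = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
            "util_pct": int(util),
        })
    return gpus


def idle_gpus(gpus, max_mem_fraction=0.05, max_util_pct=5):
    """Return indices of GPUs that are low on BOTH memory use and utilization."""
    return [
        g["index"]
        for g in gpus
        if g["mem_used_mib"] <= max_mem_fraction * g["mem_total_mib"]
        and g["util_pct"] <= max_util_pct
    ]


def find_idle_gpus():
    """Run nvidia-smi and apply the idle check (needs a working driver)."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return idle_gpus(parse_usage(out))
```

A GPU holding a large TensorFlow allocation with near-zero utilization is rejected by the memory test, while a busy Caffe job with a small footprint is rejected by the utilization test.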

Background

qgzang@ustc:~$ nvidia-smi -h

This prints the following information:

NVIDIA System Management Interface -- v352.79

NVSMI provides monitoring information for Tesla and select Quadro devices.
The data is presented in either a plain text or an XML format, via stdout or a file.
NVSMI also provides several management operations for changing the device state.

Note that the functionality of NVSMI is exposed through the NVML C-based
library. See the NVIDIA developer website for more information about NVML.
Python wrappers to NVML are also available. The output of NVSMI is
not guaranteed to be backwards compatible; NVML and the bindings are backwards
compatible.

http://developer.nvidia.com/nvidia-management-library-nvml/
http://pypi.python.org/pypi/nvidia-ml-py/

Supported products:

Full Support

All Tesla products, starting with the Fermi architecture

All Quadro products, starting with the Fermi architecture

All GRID products, starting with the Kepler architecture

GeForce Titan products, starting with the Kepler architecture

Limited Support

All Geforce products, starting with the Fermi architecture

Command

nvidia-smi [OPTION1 [ARG1]] [OPTION2 [ARG2]] ...

Options

Option	Description
-h, --help	Print usage information and exit.

LIST OPTIONS:

Option	Description
-L, --list-gpus	Display a list of GPUs connected to the system.

qgzang@ustc:~$ nvidia-smi -L
GPU 0: GeForce GTX TITAN X (UUID: GPU-xxxxx-xxx-xxxxx-xxx-xxxxxx)
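Output in the shape shown above is easy to pick apart with a regular expression. This is a small sketch, assuming each GPU line follows the `GPU <index>: <name> (UUID: <uuid>)` pattern seen in the example; `parse_gpu_list` is a hypothetical helper name.

```python
import re

# Matches lines such as: GPU 0: GeForce GTX TITAN X (UUID: GPU-...)
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: ([^)]+)\)\s*$")


def parse_gpu_list(text):
    """Extract index, name and UUID from the text printed by `nvidia-smi -L`."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append({
                "index": int(match.group(1)),
                "name": match.group(2),
                "uuid": match.group(3),
            })
    return gpus
```

Lines that do not match the pattern are skipped rather than treated as errors, so surrounding shell noise does not break the parse.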

SUMMARY OPTIONS:

Option	Description
-i, --id=	Target a specific GPU.
-f, --filename=	Log to a specified file, rather than to stdout.
-l, --loop=	Probe until Ctrl+C at specified second interval.

QUERY OPTIONS:

Option	Description
-q, --query	Display GPU or Unit info.
-u, --unit	Show unit, rather than GPU, attributes.
-i, --id=	Target a specific GPU or Unit.
-f, --filename=	Log to a specified file, rather than to stdout.
-x, --xml-format	Produce XML output.
--dtd	When showing xml output, embed DTD.
-d, --display=	Display only selected information: MEMORY,
-l, --loop=	Probe until Ctrl+C at specified second interval.
-lms, --loop-ms=	Probe until Ctrl+C at specified millisecond interval.
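Combining `-q` with `-x` gives XML, which is usually more stable to parse than the text table. The sketch below assumes the element names `gpu`, `product_name`, `fb_memory_usage`, and `utilization` that this output commonly uses. Check them against your own `nvidia-smi -q -x` output, since the layout can differ between driver versions. The function names are illustrative.

```python
import subprocess
import xml.etree.ElementTree as ET


def summarize_xml(xml_text):
    """Pull a few per-GPU fields out of `nvidia-smi -q -x` output.

    Assumption: the element names below exist in the driver's XML layout;
    any that are missing simply come back as None.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for gpu in root.findall("gpu"):
        rows.append({
            "id": gpu.get("id"),
            "name": gpu.findtext("product_name"),
            "mem_used": gpu.findtext("fb_memory_usage/used"),
            "mem_total": gpu.findtext("fb_memory_usage/total"),
            "gpu_util": gpu.findtext("utilization/gpu_util"),
        })
    return rows


def query_xml():
    """Run nvidia-smi and summarize its XML output (needs a working driver)."""
    raw = subprocess.check_output(["nvidia-smi", "-q", "-x"])  # bytes
    return summarize_xml(raw)
```

Values are kept as the strings nvidia-smi prints (for example `22 MiB`), which leaves unit handling to the caller.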

SELECTIVE QUERY OPTIONS:

Option	Description	Notes
--query-gpu=	Information about GPU.	Call --help-query-gpu for more info.
--query-supported-clocks=	List of supported clocks.	Call --help-query-supported-clocks for more info.
--query-compute-apps=	List of currently active compute processes.	Call --help-query-compute-apps for more info.
--query-accounted-apps=	List of accounted compute processes.	Call --help-query-accounted-apps for more info.
--query-retired-pages=	List of device memory pages that have been retired.	Call --help-query-retired-pages for more info.
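[mandatory]

Option	Description
--format=	Comma separated list of format options: csv (mandatory), noheader, nounits.

[plus any of]

Option	Description
-i, --id=	Target a specific GPU or Unit.
-f, --filename=	Log to a specified file, rather than to stdout.
-l, --loop=	Probe until Ctrl+C at specified second interval.
-lms, --loop-ms=	Probe until Ctrl+C at specified millisecond interval.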
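The selective query options are handy for getting per-process GPU memory in a script. The following sketch assumes the `pid`, `process_name`, and `used_memory` fields are available (see `nvidia-smi --help-query-compute-apps`) and that the required `--format=csv` option is passed; `parse_compute_apps` is a hypothetical helper.

```python
import csv
import io
import subprocess

APPS_QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse the CSV text produced by APPS_QUERY into a list of dicts."""
    apps = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (cell.strip() for cell in row)
        try:
            # Tolerate a trailing unit such as "812 MiB".
            used_mib = int(used.split()[0])
        except (ValueError, IndexError):
            used_mib = None  # e.g. the driver reported the value as unsupported
        apps.append({"pid": int(pid), "name": name, "used_memory_mib": used_mib})
    return apps


def list_compute_apps():
    """Run nvidia-smi and list compute processes (needs a working driver)."""
    out = subprocess.check_output(APPS_QUERY, universal_newlines=True)
    return parse_compute_apps(out)
```

Sorting the result by `used_memory_mib` is a quick way to see which process is holding most of the GPU memory.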

DEVICE MODIFICATION OPTIONS:

Option	Description	Notes
-pm, --persistence-mode=	Set persistence mode: 0/DISABLED, 1/ENABLED
-e, --ecc-config=	Toggle ECC support: 0/DISABLED, 1/ENABLED
-p, --reset-ecc-errors=	Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE
-c, --compute-mode=	Set MODE for compute applications:	0/DEFAULT, 1/EXCLUSIVE_THREAD (deprecated), 2/PROHIBITED, 3/EXCLUSIVE_PROCESS
--gom=	Set GPU Operation Mode:	0/ALL_ON, 1/COMPUTE, 2/LOW_DP
-r, --gpu-reset	Trigger reset of the GPU.
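As an illustration of how these options combine with `-i`, the sketch below only builds the argument lists and does not run them, since changing device state normally requires root privileges. The numeric compute-mode values are copied from the table above and may differ on newer drivers; the function names are hypothetical.

```python
# Numeric values taken from the -c/--compute-mode row in the help text above.
# Assumption: the installed driver still uses this numbering.
COMPUTE_MODES = {
    "DEFAULT": 0,
    "EXCLUSIVE_THREAD": 1,  # marked deprecated in the help text
    "PROHIBITED": 2,
    "EXCLUSIVE_PROCESS": 3,
}


def persistence_mode_cmd(gpu_id, enabled):
    """argv for turning persistence mode on or off for one GPU."""
    return ["nvidia-smi", "-i", str(gpu_id), "-pm", "1" if enabled else "0"]


def compute_mode_cmd(gpu_id, mode):
    """argv for setting the compute mode of one GPU by name."""
    return ["nvidia-smi", "-i", str(gpu_id), "-c", str(COMPUTE_MODES[mode])]
```

The returned lists can be passed to `subprocess.check_call` by an administrator once the target GPU and mode have been double-checked.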

UNIT MODIFICATION OPTIONS:

Option	Description
-t, --toggle-led=	Set Unit LED state: 0/GREEN, 1/AMBER
-i, --id=	Target a specific Unit.

SHOW DTD OPTIONS:

Option	Description
--dtd	Print device DTD and exit.
-f, --filename=	Log to a specified file, rather than to stdout.
-u, --unit	Show unit, rather than device, DTD.
--debug=	Log encrypted debug information to a specified file.

Process Monitoring:

Option	Description	Notes
pmon	Displays process stats in scrolling format.	"nvidia-smi pmon -h" for more information.
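The scrolling output of `pmon` can be captured and parsed as well. This sketch assumes the layout of one `#`-prefixed line of column names, a second `#`-prefixed line of units, and then whitespace-separated rows. The set of columns varies between driver versions, so the parser reads the names from the header instead of hard-coding them. The sample in the comment is made up for illustration.

```python
def parse_pmon(text):
    """Parse text captured from `nvidia-smi pmon` into a list of dicts.

    Assumed layout (illustrative, not captured from a real run):
        # gpu        pid  type    sm   mem   enc   dec   command
        # Idx          #   C/G     %     %     %     %   name
            0      14835     C    45    15     0     0   python
    """
    columns = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            if columns is None:
                # The first header line carries the column names.
                columns = line.lstrip("#").split()
            continue  # the second header line only carries units
        if columns is None:
            continue  # data before any header: nothing to map it to
        # The last column (the command) absorbs any remaining text.
        values = line.strip().split(None, len(columns) - 1)
        if len(values) == len(columns):
            rows.append(dict(zip(columns, values)))
    return rows
```

Idle GPUs appear as rows filled with `-`, so callers may want to filter on `row["pid"] != "-"` before converting values to numbers.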

TOPOLOGY: (EXPERIMENTAL)

Option	Description	Notes
topo	Displays device/system topology.	"nvidia-smi topo -h" for more information.

Please see the nvidia-smi(1) manual page for more detailed information.
