After a Linux reboot: No CUDA-supporting devices found!

This article is reposted from the original (2017-05-26 13:51). Tag: CUDA

After our lab's parallel-computing server was rebooted, the CUDA application vasp_gpu printed the following when run:

CUDA Error in cuda_main.cu, line 144: unknown error

No CUDA-supporting devices found!

On the NVIDIA developer forum (https://devtalk.nvidia.com/) I found this answer in a related thread:

When you first boot up the system in console mode, the nvidia driver is not loaded and the GPU device is not available. One benefit of this is that more host memory is free.

In other words, after a reboot the GPU is off by default and has to be enabled manually.

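(Correction: what is actually off by default is persistence mode. With persistence mode on, the NVIDIA driver stays loaded even when no program is using the GPU, so the GPU responds to new jobs faster, at the cost of higher idle power consumption. Jobs can still be launched with persistence mode off, but some programs have their own bugs and fail to start without it.)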
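The forum answer attributes the error to the driver not being loaded yet after a console-mode boot. A quick way to check that is sketched below in POSIX shell. It assumes the driver is the kernel module named nvidia and that it is listed in /proc/modules; nvidia_module_loaded is a hypothetical helper name, and its optional file argument exists only so it can be tried against a sample file.

```shell
#!/bin/sh
# Succeeds if the core "nvidia" kernel module appears in the module list.
# Hypothetical helper; the optional argument defaults to /proc/modules.
nvidia_module_loaded() {
    # Each line of /proc/modules begins with the module name and a space,
    # so "^nvidia " matches the core driver but not nvidia_uvm or nvidia_drm.
    grep -q '^nvidia ' "${1:-/proc/modules}"
}

# Usage:
#   nvidia_module_loaded && echo "driver loaded" || echo "driver not loaded"
```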

Check the current GPU status; the Persistence-M column shows Off:

$ nvidia-smi
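Instead of reading the full nvidia-smi table by eye, the persistence-mode field can be queried on its own with --query-gpu=index,persistence_mode --format=csv,noheader, which prints one line per GPU such as "0, Disabled". The sketch below filters that output; list_non_persistent_gpus is a hypothetical helper, and it assumes the driver's nvidia-smi supports the persistence_mode query field.

```shell
#!/bin/sh
# Print the index of every GPU whose persistence mode is not "Enabled".
# Reads lines such as "0, Disabled" from stdin.
# Hypothetical helper, not an NVIDIA tool.
list_non_persistent_gpus() {
    # An IFS of ", " splits "0, Disabled" into "0" and "Disabled"
    while IFS=', ' read -r index mode; do
        [ -z "$index" ] && continue    # ignore blank lines
        [ "$mode" = "Enabled" ] || echo "$index"
    done
}

# Usage:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
#       | list_non_persistent_gpus
```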

Solution: enable persistence mode.

Run the following as root:

# cd /usr/local/cuda/samples/1_Utilities/deviceQuery

# ./deviceQuery

# nvidia-smi -pm 1
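The nvidia-smi -pm 1 step can be combined with a check that every GPU actually switched over. A minimal sketch, assuming root privileges as above; enable_persistence_mode is a hypothetical name, while -pm, --query-gpu and --format are standard nvidia-smi options. Note that this setting is generally not retained across a reboot (consistent with the symptom in this article), so it may need to be rerun from a boot script, or replaced by NVIDIA's nvidia-persistenced daemon.

```shell
#!/bin/sh
# Enable persistence mode on every GPU, then confirm each one reports
# "Enabled". Hypothetical helper; must be run as root.
enable_persistence_mode() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "enable_persistence_mode: must be run as root" >&2
        return 1
    fi
    nvidia-smi -pm 1 || return 1
    # Collect the indices of GPUs that still report anything other than Enabled
    still_off=$(nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
        | awk -F', *' '$2 != "Enabled" { print $1 }')
    if [ -n "$still_off" ]; then
        echo "persistence mode still off on GPU(s): $still_off" >&2
        return 1
    fi
    echo "persistence mode enabled on all GPUs"
}
```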

Then check the GPU status again:

$ nvidia-smi

Persistence-M has changed from Off to On, so persistence mode is now enabled.

deviceQuery is NVIDIA's bundled device-query program. It is actually a sample, so it has to be compiled before use: run make in the CUDA installation's /.../cuda/samples/1_Utilities/deviceQuery directory.
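The build-then-run step can be scripted so that it fails loudly when no device is found. A minimal sketch: run_device_query is a hypothetical helper, and its default directory is the path used in this article, which is an assumption on your system (newer CUDA toolkits no longer ship the samples directory). It relies on deviceQuery ending its report with "Result = PASS" or "Result = FAIL".

```shell
#!/bin/sh
# Build the deviceQuery sample if its binary is missing, run it, and
# succeed only if it reports "Result = PASS". Hypothetical helper.
run_device_query() {
    dir="${1:-/usr/local/cuda/samples/1_Utilities/deviceQuery}"
    if [ ! -x "$dir/deviceQuery" ]; then
        # The sample ships as source with a Makefile, so compile it first
        make -C "$dir" || return 1
    fi
    output=$("$dir/deviceQuery" 2>&1)
    printf '%s\n' "$output"
    # The last line of the report is "Result = PASS" or "Result = FAIL"
    printf '%s\n' "$output" | grep -q 'Result = PASS'
}

# Usage (as root, since make writes into the CUDA install tree):
#   run_device_query || echo "no usable CUDA device"
```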

Sample output is shown here: http://blog.csdn.net/u012033124/article/details/70740119

nvidia-smi (NVIDIA System Management Interface) is the control program for the GPU, and it can also monitor the GPU's status. Run nvidia-smi -h for the full list of options.
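For the monitoring side, nvidia-smi can print selected fields as CSV and repeat every few seconds with -l. The sketch below wraps that; gpu_status is a hypothetical helper name, and the field list is only an example (nvidia-smi --help-query-gpu lists every field the installed driver supports).

```shell
#!/bin/sh
# Print one CSV line per GPU every N seconds (default 5) until interrupted.
# Hypothetical helper; the chosen fields are an example.
gpu_status() {
    nvidia-smi \
        --query-gpu=index,name,persistence_mode,temperature.gpu,utilization.gpu,memory.used,memory.total \
        --format=csv \
        -l "${1:-5}"
}

# Usage: gpu_status 10    # refresh every 10 seconds; stop with Ctrl-C
```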

A brief introduction is available here: http://www.microway.com/hpc-tech-tips/nvidia-smi_control-your-gpus/
