failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected: a troubleshooting guide

Reposted article. 2019-04-21 10:46, 深度學習可還行

While training Mask R-CNN, I ran into this error:

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

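At first I assumed I had not installed CUDA correctly. After checking the installation and finding nothing wrong, I rebooted the machine and ran: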

# TensorFlow 1.x API: log_device_placement=True logs the device each op is assigned to
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

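This is a test snippet for checking whether the GPU works properly.

On the first run after the reboot the GPU worked normally, which shows that the GPU configuration itself is fine.

But as soon as I ran a GPU program a second time, the error came back: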

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

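This was odd. My first guess was that the program had stopped but was still holding the GPU, so I checked with nvidia-smi, which itself reported an error: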

Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU

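The GPU has been lost and the system needs a reboot. After rebooting, the GPU is usable again, but the problem reappears after the GPU has been used once.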
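The nvidia-smi check above can also be scripted, so that a training launcher refuses to start once the GPU has already dropped off. What follows is only a minimal sketch, assuming nvidia-smi is on the PATH. The names `is_gpu_lost` and `gpu_status` are made up for illustration, and the only text it matches is the "GPU is lost" message quoted above.

```python
import subprocess


def is_gpu_lost(output):
    """Return True if nvidia-smi output contains the 'GPU is lost' message."""
    return "gpu is lost" in output.lower()


def gpu_status(command=("nvidia-smi",)):
    """Run nvidia-smi and return 'lost', 'ok', 'error', or 'not found'."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so the message is seen either way
            universal_newlines=True,
        )
    except FileNotFoundError:
        # nvidia-smi is not installed or not on the PATH
        return "not found"
    if is_gpu_lost(result.stdout):
        return "lost"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    print(gpu_status())
```

A status of "lost" means a reboot is needed before any further GPU work, as the nvidia-smi message itself says.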

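Searching Baidu and Google suggests that the likely cause is excessive GPU memory usage knocking the GPU offline, and that lowering batch_size may fix it. One direction is to reduce GPU memory usage during training by changing some of the model's training parameters; this still needs to be tested.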
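Besides lowering batch_size, one way to try the memory-reduction idea in TensorFlow 1.x is to cap how much GPU memory the session may claim. The sketch below has not been tested against this problem. The helper name `make_gpu_limited_config` and the 0.7 default are assumptions for illustration, while `allow_growth` and `per_process_gpu_memory_fraction` are the standard TensorFlow 1.x GPU options. It only limits TensorFlow's allocation, so a model that needs more memory will fail with an out-of-memory error instead.

```python
def make_gpu_limited_config(memory_fraction=0.7):
    """Build a TensorFlow 1.x ConfigProto that limits GPU memory use.

    memory_fraction is an assumed starting point, not a recommended value.
    """
    if not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")

    # Imported here so the argument check above works without TensorFlow.
    import tensorflow as tf  # TensorFlow 1.x API assumed

    config = tf.ConfigProto(log_device_placement=True)
    # Allocate GPU memory on demand instead of grabbing it all at start-up.
    config.gpu_options.allow_growth = True
    # Upper bound on the share of each visible GPU's memory this process may use.
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


if __name__ == "__main__":
    import tensorflow as tf

    sess = tf.Session(config=make_gpu_limited_config(0.7))
```

If the GPU still drops off with a capped session and a smaller batch_size, memory pressure is probably not the whole story.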

At this point the problem is not solved. I will update this post once it is fixed at the root.
