RuntimeError: CUDA error: out of memory even though idle GPUs are available


Reposted article, 2021-09-17 09:36. Category: Installation / Download / Troubleshooting / Notes

Background:

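While running some code recently, I got a CUDA out of memory error. I opened a Linux terminal and checked GPU usage with nvidia-smi.

It showed that GPU 0, the one my code was using, was occupied by someone else, leaving the 3rd and 4th GPUs (indices 2 and 3) free for me to use.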
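Instead of reading the nvidia-smi table by eye, the free cards can also be picked out programmatically. The sketch below assumes the CSV output of nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits (one "index, used, total" line per GPU, in MiB). The function names free_gpus and query_nvidia_smi, the 1024 MiB threshold, and the sample numbers are hypothetical.

```python
import subprocess

# Command used by query_nvidia_smi(); these are standard nvidia-smi options
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def query_nvidia_smi() -> str:
    """Run nvidia-smi and return its CSV output (needs an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def free_gpus(csv_text: str, max_used_mib: int = 1024) -> list[int]:
    """Return indices of GPUs whose used memory (MiB) is at most max_used_mib."""
    free = []
    for line in csv_text.strip().splitlines():
        index, used, _total = (field.strip() for field in line.split(","))
        if int(used) <= max_used_mib:
            free.append(int(index))
    return free


# Hypothetical sample output: GPUs 0 and 1 busy, GPUs 2 and 3 idle
sample = """0, 10240, 11019
1, 9800, 11019
2, 3, 11019
3, 3, 11019"""

print(free_gpus(sample))                      # [2, 3]
print(",".join(map(str, free_gpus(sample))))  # 2,3
```

On the real machine, free_gpus(query_nvidia_smi()) would replace the sample, and the joined string is in the format CUDA_VISIBLE_DEVICES expects.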

Solution:

Change the GPU index used in the code (modify or add the following lines):

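```python
import os
# Must be set before torch initializes CUDA (safest: before importing torch)
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
import torch

# args is the script's existing argparse namespace; cuda:2 is the 3rd visible GPU
args.device = torch.device('cuda:{}'.format(2) if torch.cuda.is_available() else 'cpu')
```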

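The os.environ line lists the GPU indices visible to the process, and args.device selects index 2 from that list. Because the list is 0,1,2,3, logical index 2 is also physical GPU 2.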
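The rule behind this is that CUDA_VISIBLE_DEVICES renumbers the listed GPUs from 0 in the order given, so cuda:N inside the process refers to the N-th entry of the list. Below is a minimal pure-Python model of that mapping; the helper name physical_gpu is hypothetical, and it only handles plain comma-separated integer lists (not GPU UUIDs).

```python
def physical_gpu(visible_devices: str, logical_index: int) -> int:
    """Map a logical cuda:N index to the physical GPU index under CUDA_VISIBLE_DEVICES."""
    visible = [int(d) for d in visible_devices.split(",") if d.strip()]
    if not 0 <= logical_index < len(visible):
        raise ValueError(
            f"cuda:{logical_index} is not visible (only {len(visible)} device(s))")
    return visible[logical_index]


print(physical_gpu("0,1,2,3", 2))  # 2  (the setup used above)
print(physical_gpu("2,3", 0))      # 2  (same card, exposing only the free GPUs)
```

By the same rule, setting CUDA_VISIBLE_DEVICES to "2,3" and using cuda:0 selects the same physical card. Exposing only the free GPUs this way also means a plain model.cuda() lands on a free card instead of the busy GPU 0.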

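Possible problem: some places in the code do not use args.device but call model.cuda() directly. model.cuda() puts the model on the default device, GPU 0, which is still occupied, so the same CUDA out of memory error appears.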

Fix: change those calls to model.to(args.device).
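To track down the places that still bypass args.device, a quick scan of the source for direct .cuda() calls can help. The sketch below is a simple regex search (the function name find_cuda_calls and the sample code are hypothetical); it also flags matches inside comments or strings, so treat its output as a checklist rather than a precise parse.

```python
import re

# Matches ".cuda()" or ".cuda(<args>)", e.g. model.cuda(), x.cuda(0)
CUDA_CALL = re.compile(r"\.cuda\([^)]*\)")


def find_cuda_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line containing a .cuda(...) call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CUDA_CALL.search(line):
            hits.append((lineno, line.strip()))
    return hits


example = """model = Net()
model.cuda()
x = x.to(args.device)
y = y.cuda(non_blocking=True)
"""

for lineno, line in find_cuda_calls(example):
    print(f"line {lineno}: {line}")
# line 2: model.cuda()
# line 4: y = y.cuda(non_blocking=True)
```

For a real script, pass the file contents (for example, the result of open("train.py").read(), where train.py is a placeholder name) and rewrite each hit to use .to(args.device). Note that torch.cuda.is_available() is not matched, since the regex requires "(" directly after "cuda".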

