A complete fix for RuntimeError: CUDA error: device-side assert triggered

This article is a repost (view original). 2020-07-27 16:23 · 11785 · pytorch

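The solutions found online point in the right direction, but none of them gives a concrete, practical fix:

Problem description:

When a dataset is built with ImageFolder: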

  train_data = torchvision.datasets.ImageFolder(train_path, transform=train_transform)
  train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=6)

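PyTorch scans every subfolder under train_path (the images of each class live in that class's own folder) and maps each class to an integer. With 4 classes, for example, the class labels are [0,1,2,3].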
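To make the folder-to-label mapping concrete, here is a minimal sketch that mimics it with the standard library only: subfolder names are sorted and numbered from 0. The helper `find_classes` and the folder names are illustrative assumptions, not code from the original post. On a real dataset the same information is available as `train_data.class_to_idx`. The second half shows one plausible cause of shifted labels (an assumption, not confirmed by the source): an extra folder that sorts first.

```python
import os
import tempfile


def find_classes(root):
    """Mimic a folder-based mapping: sorted subfolder names -> 0..N-1."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # Hypothetical class folders, created empty just for the demonstration
    for name in ["cat", "dog", "bird", "fish"]:
        os.mkdir(os.path.join(root, name))
    mapping = find_classes(root)
    print(mapping)

    # An extra folder that sorts first pushes every real class up by one
    os.mkdir(os.path.join(root, ".ipynb_checkpoints"))
    shifted = find_classes(root)
    print(shifted)
```

If the printed mapping of a real dataset contains an entry you did not expect, removing that folder may be a cleaner fix than adjusting the labels afterwards.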

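For binary classification the labels were indeed mapped to [0,1], but in the 4-class case they came out as [1,2,3,4]. A model with 4 outputs only accepts class indices 0 to 3, so the label 4 is out of range and the run fails with: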

RuntimeError: CUDA error: device-side assert triggered
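Because the CUDA message does not say which value was wrong, it can help to validate the labels on the host before they reach the model. The following is a minimal sketch; `check_labels` is a hypothetical helper, and it works on any iterable of integers (for a tensor, pass `labels.tolist()`).

```python
def check_labels(labels, num_classes):
    """Raise a readable error if any label falls outside [0, num_classes - 1]."""
    labels = [int(x) for x in labels]
    bad = sorted({x for x in labels if not 0 <= x < num_classes})
    if bad:
        raise ValueError(
            f"labels {bad} are outside the valid range [0, {num_classes - 1}]"
        )
    return labels


# A batch whose labels run from 1 to 4 is rejected for a 4-class model
try:
    check_labels([4, 2, 4, 4, 3, 4, 3, 1], num_classes=4)
except ValueError as err:
    print("caught:", err)

# Labels in 0..3 pass through unchanged
print(check_labels([3, 1, 0, 3, 2, 1, 3, 2], num_classes=4))
```

Setting the environment variable CUDA_LAUNCH_BLOCKING=1, or running a single batch on the CPU, usually also yields a more precise traceback than the asynchronous GPU error.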

We can print the relevant values like this:

from torch.autograd import Variable
# load_fzdataset is a user-defined data-loading function; it returns DataLoader objects
train_data, test_data = load_fzdataset(8)
for epoch in range(2):
    for i, data in enumerate(train_data):
        # Read one batch from the training loader; each batch holds 8 samples
        inputs, labels = data
        # Wrap the data as Variable (a no-op since PyTorch 0.4, where tensors and Variables were merged)
        inputs, labels = Variable(inputs), Variable(labels)
        # The model would normally run here; print stands in for it
        print("epoch:", epoch, "batch", i, "inputs", inputs.data.size(), "labels", labels.data)

The output in the failing case is:

epoch: 0 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 2, 4, 4, 3, 4, 3, 1])
epoch: 0 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 1, 3, 4, 4, 4, 2])
epoch: 0 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 2, 2, 4, 4, 4, 3, 3])
epoch: 0 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 3, 4, 1, 2, 1, 2, 1])
epoch: 0 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 1, 1, 1, 4, 4, 3, 1])
epoch: 0 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 3, 4, 4, 4, 4, 1, 4])
epoch: 0 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 1, 1, 4, 2, 4, 1])
epoch: 0 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 4, 3, 4, 3, 4, 4])
epoch: 0 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([1, 4, 4, 1, 2, 1])
epoch: 1 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 3, 4, 4, 4, 4, 4])
epoch: 1 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([2, 4, 1, 1, 4, 4, 2, 4])
epoch: 1 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 4, 2, 1, 1, 4, 4, 3])
epoch: 1 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 1, 1, 1, 3, 4, 1])
epoch: 1 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 4, 2, 4, 1, 1, 4, 1])
epoch: 1 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 4, 1, 2, 4, 3, 4, 1])
epoch: 1 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([4, 2, 4, 1, 3, 4, 4, 4])
epoch: 1 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 1, 2, 4, 1, 4, 4, 4])
epoch: 1 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([2, 1, 3, 3, 4, 4])

The fix is to subtract 1 from the labels:

from torch.autograd import Variable
# load_fzdataset is a user-defined data-loading function; it returns DataLoader objects
train_data, test_data = load_fzdataset(8)
for epoch in range(2):
    for i, data in enumerate(train_data):
        # Read one batch from the training loader; each batch holds 8 samples
        inputs, labels = data
        # Wrap the data as Variable and shift the labels from [1, 4] down to [0, 3]
        inputs, labels = Variable(inputs), Variable(labels) - 1
        # The model would normally run here; print stands in for it
        print("epoch:", epoch, "batch", i, "inputs", inputs.data.size(), "labels", labels.data)

Output:

epoch: 0 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 0, 3, 2, 1, 3, 2])
epoch: 0 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 3, 3, 3, 3, 3, 2, 2])
epoch: 0 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 0, 0, 3, 2, 1, 3])
epoch: 0 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 0, 0, 3, 2, 1])
epoch: 0 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([2, 0, 1, 0, 3, 0, 0, 2])
epoch: 0 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 0, 0, 0, 3, 3, 3])
epoch: 0 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 3, 0, 3, 3, 3, 0, 2])
epoch: 0 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 2, 3, 3, 0, 0])
epoch: 0 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([3, 3, 3, 1, 2, 1])
epoch: 1 batch 0 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 0, 3, 2, 1, 3, 3])
epoch: 1 batch 1 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 1, 2, 1, 0, 3, 1, 0])
epoch: 1 batch 2 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 0, 0, 1, 2, 2])
epoch: 1 batch 3 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 3, 2, 3, 3, 0, 2])
epoch: 1 batch 4 inputs torch.Size([8, 3, 224, 224]) labels tensor([1, 3, 2, 3, 2, 3, 3, 3])
epoch: 1 batch 5 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 0, 3, 3, 0, 3, 0, 3])
epoch: 1 batch 6 inputs torch.Size([8, 3, 224, 224]) labels tensor([3, 0, 3, 0, 3, 2, 0, 3])
epoch: 1 batch 7 inputs torch.Size([8, 3, 224, 224]) labels tensor([0, 3, 0, 3, 3, 3, 3, 3])
epoch: 1 batch 8 inputs torch.Size([6, 3, 224, 224]) labels tensor([2, 1, 0, 3, 2, 0])
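Subtracting a fixed 1 works here because the labels happen to be exactly 1 to 4. A slightly more general sketch, under the assumption that the set of observed labels is known, remaps whatever labels appear onto 0..N-1. `build_label_remap` is a hypothetical helper, and the batch is the first one from the failing run above.

```python
def build_label_remap(observed_labels):
    """Map each distinct label to a contiguous index 0..N-1, in sorted order."""
    return {old: new for new, old in enumerate(sorted(set(observed_labels)))}


# Distinct labels seen in the failing run
remap = build_label_remap([1, 2, 3, 4])

# First batch from the failing run, remapped into the valid range
batch = [4, 2, 4, 4, 3, 4, 3, 1]
fixed = [remap[x] for x in batch]
print(fixed)
```

With ImageFolder, a function such as `lambda y: remap[y]` could be passed as the `target_transform` argument so that the correction happens at loading time rather than inside the training loop. This is a suggestion, not part of the original fix.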
