PyTorch Error Summary


I am currently learning PyTorch and have written some examples of my own. Here I record some of the errors I ran into, along with notes on how to fix them.

 

1. RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'

Full error message:

Traceback (most recent call last):
  File "dogvscat-resnet.py", line 105, in <module>
    outputs = net(inputs)
  File "/home/lzx/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lzx/anaconda3/envs/pytorch/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/resnet.py", line 139, in forward
  File "/home/lzx/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lzx/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'

Reference: https://github.com/wohlert/semi-supervised-pytorch/issues/7

This error is actually fairly obscure: the first page of Google results was of little help, and only the link above pointed me in the right direction.

When training on the GPU, both the model and the data need to be moved there with .cuda(), e.g.

model.cuda()

For data tensors, however, .cuda() is not an in-place operation, so simply appending .cuda() to the variable name is not enough;

the result must be explicitly assigned back, that is:

data.cuda() on its own does not work, whereas

data = data.cuda() does.

This detail of explicitly assigning the result back is very important.
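
A minimal sketch of the wrong and the right pattern (model and data are just placeholder names here; the equivalent .to(device) form is shown as well):

# Wrong: .cuda() on a tensor returns a new tensor and leaves the original on the CPU
data.cuda()
outputs = model(data)  # still raises the type-mismatch RuntimeError

# Right: assign the returned GPU tensor back to the variable
data = data.cuda()
outputs = model(data)

# Equivalent, device-agnostic form
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)        # for an nn.Module this moves the parameters in place
data = data.to(device)  # for a tensor the result must still be assigned back
outputs = model(data)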

 

Example code: a LeNet-style network I wrote myself for binary cat-vs-dog classification.

Please pay particular attention to how the following lines are written: 46, 47, 57, 58, 96, 97.

  1 import os
  2 from PIL import Image
  3 import numpy as np
  4 import torch
  5 from torchvision import transforms as T
  6 from torchvision.datasets import ImageFolder
  7 from torch.utils.data import DataLoader
  8 import torch.nn as nn
  9 import torch.nn.functional as F
 10 from torch import optim
 11 from torch.utils import data
 12 import torchvision as tv
 13 from torchvision.transforms import ToPILImage
 14 show = ToPILImage()  # converts a Tensor into a PIL Image, handy for visualization
 15 
 16 
 17 transform = T.Compose([
 18     T.Resize(32), # scale the image, keeping its aspect ratio, so the shorter side is 32 pixels
 19     T.CenterCrop(32), # crop a 32x32 patch from the center of the image
 20     T.ToTensor(), # convert the PIL Image to a Tensor, scaled to [0, 1]
 21     T.Normalize(mean=[.5, .5, .5], std=[.5, .5, .5]) # normalize to [-1, 1] with the given mean and std
 22 ])
 23 
 24 
 25 class Net(nn.Module):
 26     def __init__(self):
 27         super(Net, self).__init__()
 28         self.conv1 = nn.Conv2d(3, 6, 5)
 29         self.conv2 = nn.Conv2d(6, 16, 5)
 30         self.fc1 = nn.Linear(16*5*5, 120)
 31         self.fc2 = nn.Linear(120, 84)
 32         self.fc3 = nn.Linear(84, 2)
 33 
 34     def forward(self, x):
 35         x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
 36         x = F.max_pool2d(F.relu(self.conv2(x)), 2)
 37         x = x.view(x.size()[0], -1)
 38         x = F.relu(self.fc1(x))
 39         x = F.relu(self.fc2(x))
 40         x = self.fc3(x)
 41         return x
 42 
 43 net = Net()
 44 if torch.cuda.is_available():
 45     print("Using GPU")
 46 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
 47 net.to(device)
 48 
 49 
 50 def test():
 51     correct = 0 # number of correctly predicted images
 52     total = 0 # total number of images
 53     # no gradients are needed during testing, so temporarily disable autograd to speed things up and save memory
 54     with torch.no_grad():
 55         for data in testloader:
 56             images, labels = data
 57             images = images.to(device)
 58             labels = labels.to(device)
 59             outputs = net(images)
 60             _, predicted = torch.max(outputs, 1)
 61             total += labels.size(0)
 62             correct += (predicted == labels).sum()
 63 
 64         print('Accuracy in the test dataset: %.1f %%' % (100 * correct / total))
 65 
 66 train_dataset = ImageFolder('/home/lzx/datasets/dogcat/sub-train/', transform=transform)
 67 test_dataset = ImageFolder('/home/lzx/datasets/dogcat/sub-test/', transform=transform)
 68 # dataset = DogCat('/home/lzx/datasets/dogcat/sub-train/', transforms=transform)
 69 # train_dataset = ImageFolder('/Users/lizhixuan/PycharmProjects/pytorch_learning/Chapter5/sub-train/', transform=transform)
 70 # test_dataset = ImageFolder('/Users/lizhixuan/PycharmProjects/pytorch_learning/Chapter5/sub-test/', transform=transform)
 71 
 72 trainloader = torch.utils.data.DataLoader(
 73                     train_dataset,
 74                     batch_size=512,
 75                     shuffle=True,
 76                     num_workers=4)
 77 testloader = torch.utils.data.DataLoader(
 78                     test_dataset,
 79                     batch_size=512,
 80                     shuffle=False,
 81                     num_workers=4)
 82 classes = ('cat', 'dog')
 83 
 84 criterion = nn.CrossEntropyLoss()  # cross-entropy loss
 85 optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
 86 
 87 print("Starting to train")
 88 torch.set_num_threads(8)
 89 for epoch in range(1000):
 90 
 91     running_loss = 0.0
 92     for i, data in enumerate(trainloader, 0):
 93 
 94         # input data
 95         inputs, labels = data
 96         inputs = inputs.to(device)
 97         labels = labels.to(device)
 98 
 99         # zero the gradients
100         optimizer.zero_grad()
101 
102         # forward + backward
103         outputs = net(inputs)
104         loss = criterion(outputs, labels)
105 #         print("outputs %s  labels %s" % (outputs, labels))
106         loss.backward()
107 
108         # update the parameters
109         optimizer.step()
110 
111         # print the training log
112         # loss is a scalar; use loss.item() to get its value instead of loss[0]
113         running_loss += loss.item()
114         print_gap = 10
115         if i % print_gap == (print_gap-1): # print the training status every print_gap batches
116             print('[%d, %5d] loss: %.3f' \
117                   % (epoch+1, i+1, running_loss / print_gap))
118             running_loss = 0.0
119     test()
120 print('Finished Training')

 

With this, I now fully understand how to put the code on the GPU, haha.
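
A quick way to debug this kind of error is to print the device of the model's parameters and of a batch right before the forward pass (a minimal sketch; net and inputs refer to the variables in the code above):

# Both must report the same device (e.g. cuda:0) before net(inputs) is called;
# the RuntimeError above appears whenever they do not match.
print(next(net.parameters()).device)
print(inputs.device)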

 

