A Summary of PyTorch Community Gotchas
The code in this post may not run as-is; it mainly records the ideas.
The difference between model.eval() and with torch.no_grad()
- model.eval() notifies all your layers that you are in eval mode; that way, batchnorm and dropout layers will work in eval mode instead of training mode.
- torch.no_grad() impacts the autograd engine and deactivates it. It reduces memory usage and speeds up computation, but you won't be able to backprop (which you don't want in an eval script).
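A minimal sketch of how the two are usually combined in an evaluation loop (model and loader are assumed to already exist):

import torch

model.eval()                        # switch batchnorm/dropout to eval behavior
with torch.no_grad():               # stop autograd from recording operations
    for inputs, targets in loader:
        outputs = model(inputs)     # no graph is built, so memory stays low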
Converting tensor dtypes
- tensor_one.float(): converts tensor_one to torch.float32
- tensor_one.double(): converts tensor_one to torch.float64
- tensor_one.int(): converts tensor_one to torch.int32
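A quick check of the resulting dtypes:

import torch

t = torch.tensor([1, 2, 3])   # dtype is torch.int64 by default
print(t.float().dtype)        # torch.float32
print(t.double().dtype)       # torch.float64
print(t.int().dtype)          # torch.int32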
Concatenating tensors along a given dimension
third_tensor = torch.cat((first_tensor, second_tensor), 0)
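For example, concatenating two (2, 3) tensors along dim 0 gives a (4, 3) tensor; all dimensions other than the chosen one must match:

import torch

first_tensor = torch.zeros(2, 3)
second_tensor = torch.ones(2, 3)
third_tensor = torch.cat((first_tensor, second_tensor), 0)
print(third_tensor.shape)   # torch.Size([4, 3])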
Serializing a model and loading it to resume training
@Bixqu You can check the ImageNet example, line 139:
save_checkpoint({
    'epoch': epoch + 1,
    'arch': args.arch,
    'state_dict': model.state_dict(),
    'best_prec1': best_prec1,
    'optimizer': optimizer.state_dict(),
}, is_best)
With
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, 'model_best.pth.tar')
Loading/resuming from the dictionary looks like this:
if args.resume:
    if os.path.isfile(args.resume):
        print("=> loading checkpoint '{}'".format(args.resume))
        checkpoint = torch.load(args.resume)
        args.start_epoch = checkpoint['epoch']
        best_prec1 = checkpoint['best_prec1']
        model.load_state_dict(checkpoint['state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        print("=> loaded checkpoint '{}' (epoch {})"
              .format(args.resume, checkpoint['epoch']))
    else:
        print("=> no checkpoint found at '{}'".format(args.resume))
Releasing GPU resources
Clear the CUDA cache directly: torch.cuda.empty_cache()
Move the optimizer to the CPU:
https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27
I was about to ask a question, but I found my issue. Maybe it will help others.
I was on Google Colab and found that I could train my model several times, but on the 3rd or 4th run I'd hit a memory error. Using torch.cuda.empty_cache() between runs did not help. All I could do was restart my kernel.
I had a setup of this sort:
class Fitter:
    def __init__(self, model):
        self.model = model
        self.optimizer = ...  # init optimizer here (e.g. Adam over model.parameters())
The point is that I was carrying the model over in between runs but making a new optimizer (in my case I was making new instances of Fitter). And in my case, the (Adam) optimizer state actually took up more memory than my model!
So to fix it I tried some things.
This did not work:
def wipe_memory(self):  # DOES NOT WORK
    self.optimizer = None
    torch.cuda.empty_cache()
Neither did this:
def wipe_memory(self):  # DOES NOT WORK
    del self.optimizer
    self.optimizer = None
    gc.collect()
    torch.cuda.empty_cache()
This did work!
def wipe_memory(self):  # DOES WORK
    self._optimizer_to(torch.device('cpu'))
    del self.optimizer
    gc.collect()
    torch.cuda.empty_cache()

def _optimizer_to(self, device):
    for param in self.optimizer.state.values():
        # Not sure there are any global tensors in the state dict
        if isinstance(param, torch.Tensor):
            param.data = param.data.to(device)
            if param._grad is not None:
                param._grad.data = param._grad.data.to(device)
        elif isinstance(param, dict):
            for subparam in param.values():
                if isinstance(subparam, torch.Tensor):
                    subparam.data = subparam.data.to(device)
                    if subparam._grad is not None:
                        subparam._grad.data = subparam._grad.data.to(device)
I got that optimizer_to function from here
Swapping axes
a = torch.rand(1, 2, 3, 4)
print(a.transpose(0, 3).transpose(1, 2).size())   # chained transposes: torch.Size([4, 3, 2, 1])
print(a.permute(3, 2, 1, 0).size())               # one permute does the same: torch.Size([4, 3, 2, 1])
Converting Variables to numpy
Variables can't be transformed to numpy, because they're wrappers around tensors that save the operation history, and numpy doesn't have such objects. You can retrieve the tensor held by a Variable using the .data attribute. Then this should work: var.data.numpy().
(Variable(x).data).cpu().numpy()
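Note that Variable was merged into Tensor in PyTorch 0.4; the modern, autograd-safe equivalent is x.detach().cpu().numpy().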
Why transforms.Normalize()?
Normalize does the following for each channel:
image = (image - mean) / std
The parameters mean and std are passed as 0.5, 0.5 in your case. This will normalize the image into the range [-1, 1]. For example, the minimum value 0 will be converted to (0 - 0.5) / 0.5 = -1, and the maximum value 1 will be converted to (1 - 0.5) / 0.5 = 1.
If you would like to get your image back into the [0, 1] range, you can use:
image = ((image * std) + mean)
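A quick round trip to confirm the arithmetic (a sketch; any image tensor in [0, 1] works):

import torch
from torchvision import transforms as T

img = torch.rand(3, 8, 8)                          # fake image in [0, 1]
norm = T.Normalize(mean=[0.5] * 3, std=[0.5] * 3)
normed = norm(img)                                 # now in [-1, 1]
restored = normed * 0.5 + 0.5                      # image = (image * std) + mean
print(torch.allclose(restored, img))               # True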
Denormalize
pip install kornia
(a differentiable computer-vision library that works with PyTorch)
pip install DatasetsHelper==0.0.3
(optional; only used here to fetch the normalization values)
Recommended usage:
mean, std = [torch.tensor(i) for i in NormalizeValues('cifar10')()]
# note: denormalize expects a 4-D batch as input
kornia_img = kornia.enhance.denormalize(t(img).unsqueeze_(0), mean, std)
plt.imshow(kornia_img.squeeze_(0).permute(1, 2, 0))
A simple example
from matplotlib import pyplot as plt
from torchvision.utils import make_grid
from torchvision.transforms import transforms as T
import torch
# NormalizeValues / UnNormalize are assumed to come from the DatasetsHelper
# package installed above; the exact import path may differ.
from DatasetsHelperQ import get_dataset_mean_std
from DatasetsHelperQ import tensor_to_rgb_image_without_normalization
import kornia
from torchvision import datasets
from torch.utils.data import Dataset

train_transform = T.Compose([T.ToTensor()])
train_set = datasets.CIFAR10(root="./cifar10", train=True, download=True, transform=train_transform)
train_iter = iter(train_set)
img, _ = next(train_iter)
# torch.manual_seed(1234)
# img = torch.randn(3, 4, 4).abs_()
# print(img)
# print(img.type)

t = T.Compose([
    # T.ToPILImage(),
    # T.ToTensor(),
    T.Normalize(*NormalizeValues('cifar10')()),
])
# batch = torch.cat([torch.unsqueeze(img, 0), torch.unsqueeze(t(img), 0)], 0)
# new_img = make_grid([img, t(img)])
# print(new_img.shape)
pil = T.ToPILImage()

plt.figure(figsize=(16, 16))

plt.subplot(141)
plt.title("ORIG")
plt.xticks([])
plt.yticks([])
# print(img.permute(1, 2, 0))
plt.imshow(img.permute(1, 2, 0).numpy())
# plt.imshow(pil(img))

plt.subplot(143)
plt.title("kornia_denorm")
plt.xticks([])
plt.yticks([])
# temp = T.ToPILImage(img.double().div_(255))()
# print(img.double().div_(255))
mean, std = [torch.tensor(i) for i in NormalizeValues('cifar10')()]
kornia_img = kornia.enhance.denormalize(t(img).unsqueeze_(0), mean, std)
plt.imshow(kornia_img.squeeze_(0).permute(1, 2, 0))
# plt.imshow(pil(kornia_img.squeeze_(0)))

plt.subplot(142)
plt.title("Norm")
plt.xticks([])
plt.yticks([])
unorm = UnNormalize(*NormalizeValues('cifar10')())
# plt.imshow(t(img).permute(1, 2, 0))
plt.imshow(pil(t(img)))

plt.subplot(144)
plt.title("PIL")
plt.xticks([])
plt.yticks([])
plt.imshow(pil(img))
plt.show()
Which dataset are the pretrained models based on?
ImageNet-12 (the ILSVRC 2012 classification dataset).
Loading part of a pretrained model
After model_dict.update(pretrained_dict), the model_dict may still have keys that pretrained_dict doesn't have, which will cause an error.
Assume the following situation:
pretrained_dict: ['A', 'B', 'C', 'D']
model_dict: ['A', 'B', 'C', 'E']
After pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict} and model_dict.update(pretrained_dict), they are:
pretrained_dict: ['A', 'B', 'C']
model_dict: ['A', 'B', 'C', 'E']
So when performing model.load_state_dict(pretrained_dict), model_dict still has the key E that pretrained_dict doesn't have. So how about using model.load_state_dict(model_dict) instead of model.load_state_dict(pretrained_dict)?
The complete snippet is therefore as follows:
pretrained_dict = ...
model_dict = model.state_dict()
# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict)
# 3. load the new state dict
model.load_state_dict(model_dict)
The best way to convert numpy arrays to tensors
When you are on GPU, torch.Tensor() will convert your data type to Float. Actually, torch.Tensor and torch.FloatTensor both do the same thing.
But I think the better way is to use torch.tensor() (note the lowercase 't'). It converts your data to a tensor but retains the data type, which is crucial for some methods. You may know that PyTorch and numpy are interchangeable, so if your array is int, your tensor should be int too, unless you explicitly change the type.
On top of all this, torch.tensor() is the convention, because you can also pass device, dtype, requires_grad, etc.
Note: torch.tensor() allocates new memory and copies the data. If you want to avoid the copy, use torch.as_tensor(numpy_ndarray).
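A small demonstration of the dtype and copy behavior described above:

import numpy as np
import torch

arr = np.array([1, 2, 3])            # int64 array
print(torch.Tensor(arr).dtype)       # torch.float32 -- dtype is changed
print(torch.tensor(arr).dtype)       # torch.int64   -- dtype is kept

shared = torch.as_tensor(arr)        # no copy: shares memory with arr
arr[0] = 99
print(shared[0])                     # tensor(99)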
PIL ↔ Tensor
from PIL import Image
from torchvision import transforms

pil_img = Image.open(img)  # img is a path or file object
print(pil_img.size)
pil_to_tensor = transforms.ToTensor()(pil_img).unsqueeze_(0)
print(pil_to_tensor.shape)
tensor_to_pil = transforms.ToPILImage()(pil_to_tensor.squeeze_(0))
print(tensor_to_pil.size)
Using only specific GPUs
CUDA_VISIBLE_DEVICES=1,2 python myscript.py
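Inside the script the visible devices are renumbered from zero, so cuda:0 then refers to physical GPU 1. A quick check (assuming the two GPUs were exposed as above):

import torch

print(torch.cuda.device_count())   # 2: only the GPUs listed in CUDA_VISIBLE_DEVICES
device = torch.device('cuda:0')    # actually physical GPU 1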
How to extract feature maps
Once you have a trained model, if you want to extract the result of an intermediate layer (say fc7 after the ReLU), you have a couple of possibilities.
You can either reconstruct the classifier once the model has been instantiated, as in the following example:
import torch
import torch.nn as nn
from torchvision import models
model = models.alexnet(pretrained=True)
# remove last fully-connected layer
new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])
model.classifier = new_classifier
Or, if you want to extract other parts of the model instead, you might need to recreate the model structure, reusing parts of the pre-trained model in the new model:
import torch
import torch.nn as nn
from torchvision import models

original_model = models.alexnet(pretrained=True)

class AlexNetConv4(nn.Module):
    def __init__(self):
        super(AlexNetConv4, self).__init__()
        self.features = nn.Sequential(
            # stop at conv4
            *list(original_model.features.children())[:-3]
        )
    def forward(self, x):
        x = self.features(x)
        return x

model = AlexNetConv4()
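A third option, not mentioned in the quoted answer, is a forward hook, which captures an intermediate output without rebuilding the model. A sketch (the layer index 8 is an assumption; inspect model.features to find conv4):

import torch
from torchvision import models

model = models.alexnet(pretrained=True)
features = {}

def hook(module, inputs, output):
    features['conv4'] = output.detach()   # stash the intermediate activation

handle = model.features[8].register_forward_hook(hook)
_ = model(torch.randn(1, 3, 224, 224))
print(features['conv4'].shape)
handle.remove()                           # remove the hook when done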
Training with Half Precision
See the forum thread directly:
https://discuss.pytorch.org/t/training-with-half-precision/11815/2
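The thread predates automatic mixed precision; for reference, a minimal sketch with the newer torch.cuda.amp API (model, criterion, optimizer, and loader are assumed to exist):

import torch

scaler = torch.cuda.amp.GradScaler()        # scales the loss to avoid fp16 underflow
for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # run the forward pass in mixed precision
        preds = model(inputs)
        loss = criterion(preds, targets)
    scaler.scale(loss).backward()           # backward on the scaled loss
    scaler.step(optimizer)                  # unscales gradients, then optimizer.step()
    scaler.update()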
PyTorch data augmentation
imgaug
log_softmax or softmax?
The log version is recommended; it is more numerically stable (it uses the log-sum-exp trick internally instead of exponentiating and then taking the log).
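A quick illustration with extreme logits: the naive log-of-softmax hits -inf, while log_softmax stays finite:

import torch
import torch.nn.functional as F

logits = torch.tensor([1000.0, 0.0])
print(torch.log(F.softmax(logits, dim=0)))   # tensor([0., -inf])
print(F.log_softmax(logits, dim=0))          # tensor([0., -1000.])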
How are optimizer.step() and loss.backward() related?
In short: loss.backward() computes the gradients of the loss w.r.t. the parameters and accumulates them into each parameter's .grad, and optimizer.step() then updates the parameters using those gradients. See the optim.zero_grad() section below.
Convert int into one-hot format
https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507/4
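The linked thread uses scatter_; newer PyTorch also ships torch.nn.functional.one_hot. Both approaches, sketched:

import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])
num_classes = 3

# scatter_-based approach, as in the thread
one_hot = torch.zeros(labels.size(0), num_classes)
one_hot.scatter_(1, labels.unsqueeze(1), 1.0)
print(one_hot)

# built-in helper (returns int64)
print(F.one_hot(labels, num_classes=num_classes))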
nn.ModuleList & nn.Sequential
nn.ModuleList behaves like a Python list for storing nn.Modules; a usage example follows:
class LinearNet(nn.Module):
    def __init__(self, input_size, num_layers, layers_size, output_size):
        super(LinearNet, self).__init__()
        self.num_layers = num_layers
        self.linears = nn.ModuleList([nn.Linear(input_size, layers_size)])
        self.linears.extend([nn.Linear(layers_size, layers_size) for i in range(1, self.num_layers - 1)])
        self.linears.append(nn.Linear(layers_size, output_size))
nn.Sequential builds a neural network by chaining modules in order:
class Flatten(nn.Module):
    def forward(self, x):
        N, C, H, W = x.size()  # read in N, C, H, W
        return x.view(N, -1)

simple_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=2),
    nn.ReLU(inplace=True),
    Flatten(),
    nn.Linear(5408, 10),
)
Not really. Maybe there are some situations where you could use both, but the main idea is the following:
In nn.Sequential, the nn.Module's stored inside are connected in a cascaded way. For instance, in the example that I gave, I define a neural network that receives as input an image with 3 channels and outputs 10 neurons. That network is composed of the following blocks, in the following order: Conv2d -> ReLU -> Linear layer. Moreover, an object of type nn.Sequential has a forward() method, so if I have an input image x I can directly call y = simple_cnn(x) to obtain the scores for x. When you define an nn.Sequential you must be careful to make sure that the output size of a block matches the input size of the following block. Basically, it behaves just like an nn.Module.
On the other hand, nn.ModuleList does not have a forward() method, because it does not define any neural network; that is, there is no connection between the nn.Module's that it stores. You may use it to store nn.Module's, just like you use Python lists to store other types of objects (integers, strings, etc.). The advantage of using nn.ModuleList instead of a conventional Python list to store nn.Module's is that PyTorch is "aware" of the existence of the nn.Module's inside an nn.ModuleList, which is not the case for Python lists. If you want to understand exactly what I mean, just try to redefine my class LinearNet using a Python list instead of an nn.ModuleList and train it. When defining the optimizer for that net, you'll get an error saying that your model has no parameters, because PyTorch does not see the parameters of the layers stored in a Python list. If you use an nn.ModuleList instead, you'll get no error.
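A minimal sketch of that last point, using a hypothetical toy module:

import torch.nn as nn

class BadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10) for _ in range(3)]   # plain list: invisible to PyTorch

class GoodNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

print(len(list(BadNet().parameters())))    # 0 -- an optimizer would see no parameters
print(len(list(GoodNet().parameters())))   # 6 -- 3 weights + 3 biases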
optim.zero_grad()
optimizer.zero_grad() zeroes the gradients, i.e. it resets the derivative of the loss with respect to the weights to 0. While learning PyTorch you will notice that nearly every batch goes through steps like these:
optimizer.zero_grad()               ## zero the gradients
preds = model(inputs)               ## inference
loss = criterion(preds, targets)    ## compute the loss
loss.backward()                     ## backpropagate to compute the gradients
optimizer.step()                    ## update the weight parameters
- Because of PyTorch's dynamic computation graph, the gradients are not automatically zeroed when we call loss.backward() and optimizer.step() to perform a gradient-descent update; the two are also independent operations.
- backward(): backpropagates to compute the gradients.
- step(): updates the weight parameters.
The points above show that every step in PyTorch is an independent operation, which is why the gradients must be cleared explicitly: if you do not call optimizer.zero_grad(), backward() will accumulate the gradients, as demonstrated below.
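A tiny demonstration of the accumulation: calling backward() twice without zeroing doubles the gradient:

import torch

w = torch.ones(1, requires_grad=True)
loss = (w * 2).sum()
loss.backward()
print(w.grad)          # tensor([2.])

loss = (w * 2).sum()   # rebuild the graph (it is freed after backward)
loss.backward()
print(w.grad)          # tensor([4.]) -- accumulated, not replaced

w.grad.zero_()         # what optimizer.zero_grad() does for every parameter
print(w.grad)          # tensor([0.])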
tensor.cuda()
This method does not move the original tensor to the GPU in place; it returns a GPU copy:
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
In [3]: a = torch.zeros(1,2,3,4)
In [4]: a
Out[4]:
tensor([[[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]]])
In [5]: a.cuda()
Out[5]:
tensor([[[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]]], device='cuda:0')
In [6]: a
Out[6]:
tensor([[[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]]])
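To actually move the tensor, bind the result: a = a.cuda() (or, equivalently, a = a.to('cuda')).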