For the complete code, see:
https://www.emperinter.info/2020/08/05/change-leaning-rate-by-reducelronplateau-in-pytorch/
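Background
I previously wrote a post on updating the learning rate in PyTorch. Adjusting the learning rate dynamically according to how many times the loss has failed to improve seemed like a fun idea, but I kept getting the settings wrong for a long time. Today I finally got it working!
Analysis
Description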
- torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
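The scheduler lowers the learning rate once the loss stops decreasing or the accuracy stops increasing. The parameters mean the following:
Parameter | Meaning |
---|---|
mode | In 'min' mode the scheduler checks whether the metric has stopped decreasing; in 'max' mode, whether it has stopped increasing. |
factor | When the condition is triggered, lr *= factor. |
patience | Number of consecutive steps without a significant decrease (or increase) that are tolerated; the lr is reduced on the next such step. |
verbose | Print a message when the condition is triggered. |
threshold | Only changes that exceed this threshold count as significant. |
threshold_mode | Two threshold modes, rel and abs. rel: in 'max' mode a value above best * (1 + threshold) is significant, in 'min' mode a value below best * (1 - threshold) is significant. abs: in 'max' mode a value above best + threshold is significant, in 'min' mode a value below best - threshold is significant. |
cooldown | After a reduction, wait this many epochs before resuming checks, so that the lr does not fall too fast. |
min_lr | Lower bound on the lr. |
eps | If the difference between the new and old lr is smaller than eps (default 1e-8), the update is ignored. |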
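To make the table concrete, here is a minimal pure-Python sketch of how the parameters interact. It is a simplified stand-in written for illustration, not PyTorch's own implementation; the class name `PlateauSketch` and its attribute names are made up for this example.

```python
import math


class PlateauSketch:
    """Simplified stand-in for the plateau rule; not the PyTorch implementation."""

    def __init__(self, lr, mode='min', factor=0.1, patience=10,
                 threshold=1e-4, threshold_mode='rel',
                 cooldown=0, min_lr=0.0, eps=1e-8):
        self.lr = lr
        self.mode = mode
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.threshold_mode = threshold_mode
        self.cooldown = cooldown
        self.min_lr = min_lr
        self.eps = eps
        # Worst possible starting value, so the first metric always counts as better.
        self.best = math.inf if mode == 'min' else -math.inf
        self.num_bad = 0
        self.cooldown_left = 0

    def is_better(self, value):
        """Apply the rel/abs significance rules from the table."""
        if self.threshold_mode == 'rel':
            if self.mode == 'min':
                return value < self.best * (1 - self.threshold)
            return value > self.best * (1 + self.threshold)
        if self.mode == 'min':
            return value < self.best - self.threshold
        return value > self.best + self.threshold

    def step(self, value):
        """Feed one metric value and return the (possibly reduced) lr."""
        if self.is_better(value):
            self.best = value
            self.num_bad = 0
        else:
            self.num_bad += 1
        if self.cooldown_left > 0:
            # During cooldown, steps without improvement are not counted.
            self.cooldown_left -= 1
            self.num_bad = 0
        if self.num_bad > self.patience:
            new_lr = max(self.lr * self.factor, self.min_lr)
            if self.lr - new_lr > self.eps:  # ignore negligible updates
                self.lr = new_lr
            self.cooldown_left = self.cooldown
            self.num_bad = 0
        return self.lr


sched = PlateauSketch(lr=0.001, factor=0.1, patience=2)
history = [sched.step(loss) for loss in [1.0, 0.5, 0.5, 0.5, 0.5, 0.5]]
print(history)
```

With patience=2, the lr stays at 0.001 for the first four steps and drops to about 0.0001 on the fifth, which is the third consecutive step without a significant decrease.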
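The following script prints and plots how the lr shrinks with each reduction (initial lr 0.0009575, factor 0.35):
import math
import matplotlib.pyplot as plt
#%matplotlib inline
x = 0
o = []
p = []
o.append(0)
p.append(0.0009575)
while x < 8:
    x += 1
    y = 0.0009575 * math.pow(0.35, x)  # lr after x reductions by factor 0.35
    o.append(x)
    p.append(y)
    print('%d: %.50f' % (x, y))
plt.plot(o, p, c='red', label='test')  # x-axis and y-axis data; c: color; label: legend entry
plt.legend(loc='best')  # show the label; 'best' lets matplotlib choose the position
plt.show()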
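Difficulties
I find the hardest part is choosing these parameters. The first is the initial learning rate. The MNIST example and the image classification below that I have worked with both seem to use 0.001; while tuning this run I noticed I had set 0.0009575, a value I forgot to change from a previous experiment, but the result turned out well: it was the first time this code reached a loss as small as about 0.001. The multiplicative factor and the number of non-improving steps to wait before changing the learning rate are both hard to estimate. My best approach so far is to train once with the learning rate fixed at the default 0.001 and watch in TensorBoard where problems start to appear, which gives the patience value. For the factor, I would use the script above to pick a value that gives a fairly smooth curve with very small changes. When running such tests, back up the model first so as not to waste too much time!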
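One way to compare candidate factors before training is to tabulate the first few lr values for each and count how many reductions it takes to fall below some floor. The sketch below is only an illustration: the helper names, the candidate factors 0.1 and 0.5, and the floor of 1e-6 are assumptions, not values from my experiments.

```python
def lr_after(initial_lr, factor, reductions):
    """Learning rate after a given number of reductions by `factor`."""
    return initial_lr * factor ** reductions


def reductions_until(initial_lr, factor, floor):
    """Count how many reductions it takes for the lr to fall below `floor`."""
    if not 0 < factor < 1 or floor <= 0:
        raise ValueError("factor must be in (0, 1) and floor must be positive")
    count = 0
    lr = initial_lr
    while lr >= floor:
        lr *= factor
        count += 1
    return count


initial_lr = 0.0009575
for factor in (0.1, 0.35, 0.5):  # candidate factors, chosen only for illustration
    schedule = [lr_after(initial_lr, factor, k) for k in range(4)]
    print(factor, ['%.3e' % lr for lr in schedule],
          reductions_until(initial_lr, factor, 1e-6))
```

A smaller factor reaches the floor in fewer reductions, so it leaves fewer chances to recover from a reduction that came too early.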
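Example
- This example uses an initial learning rate of 0.0009575 and a factor of 0.35. In my run, x increases by 1 after 125 checks without a decrease. After the first lr change I observed (from 0.0009575 to 0.00011729, which equals 0.0009575 × 0.35^2, i.e. x = 2 in the script above), the loss slowly approached 0.001 (as shown in the first figure) and the accuracy reached 69%.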
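Before the full listing, the scheduler settings described above can be tried in isolation. This is a minimal sketch under stated assumptions: the `nn.Linear` model is a placeholder, SGD with momentum 0.9 is an assumed optimizer, and a constant fake loss stands in for a real training loss so that the plateau is guaranteed.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(4, 2)  # placeholder model, only here to give the optimizer parameters
optimizer = optim.SGD(model.parameters(), lr=0.0009575, momentum=0.9)  # assumed optimizer
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.35, patience=125)

lrs = []
for step in range(260):
    # In real training, loss.backward() and optimizer.step() would run here.
    fake_loss = 1.0  # constant value that simulates a loss stuck on a plateau
    scheduler.step(fake_loss)
    lrs.append(optimizer.param_groups[0]['lr'])

# lr after the 126th, 127th and 260th scheduler step
print(lrs[125], lrs[126], lrs[-1])
```

The first call records 1.0 as the best value, so the 126 following calls count as non-improving and the first reduction lands on the 127th call; a second one follows 126 calls later. Whether `scheduler.step` is called once per batch or once per epoch decides what "125" means in practice.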
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from datetime import datetime
from torch.utils.tensorboard import SummaryWriter
from torch.optim import *
PATH = './cifar_net_tensorboard_net_width_200_and_chang_lr_by_decrease_0_35^x.pth'  # path for saving the model
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=0)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=0)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)
print("Fetching some random training data")
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
print("**********************")
# set up a TensorBoard writer
# helper function to show an image
# (used in the `plot_classes_preds` function below)
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5  # unnormalize
    npimg = img.cpu().numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
# set up TensorBoard
# default `log_dir` is "runs" - we'll be more specific here
writer = SummaryWriter('runs/train')
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions
# create grid of images
img_grid = torchvision.utils.make_grid(images)
# show images
# matplotlib_imshow(img_grid, one_channel=True)
imshow(img_grid)
# write to tensorboard
# writer.add_image('imag_classify', img_grid)
# Tracking model training with TensorBoard
# helper functions
def images_to_probs(net, images):
    '''
    Generates predictions and corresponding probabilities from a trained
    network and a list of images
    '''
    output = net(images)
    # convert output probabilities to predicted class
    _, preds_tensor = torch.max(output, 1)
    # preds = np.squeeze(preds_tensor.numpy())
    preds = np.squeeze(preds_tensor.cpu().numpy())
    return preds, [F.softmax(el, dim=0)[i].item() for i, el in zip(preds, output)]
def plot_classes_preds(net, images, labels):
    preds, probs = images_to_probs(net, images)
    # plot the images in the batch, along with predicted and true labels
    fig = plt.figure(figsize=(12, 48))
    for idx in np.arange(4):
        ax = fig.add_subplot(1, 4, idx+1, xticks=[], yticks=[])
        matplotlib_imshow(images[idx], one_channel=True)
        ax.set_title("{0}, {1:.1f}%\n(label: {2})".format(
            classes[preds[idx]],
            probs[idx] * 100.0,
            classes[labels[idx]]),
                    color=("green" if preds[idx]==labels[idx].item() else "red"))
    return fig
#
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 200, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(200, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
# visualize the network structure in TensorBoard
writer.add_graph(net, images)
net.to(device)
·······
·······
·······
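As a side check on the `Net` class above, the `16 * 5 * 5` input size of `fc1` can be derived from the layer settings. The sketch below is plain arithmetic, independent of PyTorch; the helper `conv_out` is a hypothetical name, and it relies on CIFAR-10 images being 32x32.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                      # CIFAR-10 images are 32x32
size = conv_out(size, 5)       # conv1, 5x5 kernel -> 28
size = conv_out(size, 2, 2)    # 2x2 max pool, stride 2 -> 14
size = conv_out(size, 5)       # conv2, 5x5 kernel -> 10
size = conv_out(size, 2, 2)    # 2x2 max pool, stride 2 -> 5
flat_features = 16 * size * size  # 16 output channels from conv2
print(size, flat_features)     # 5 400
```

Widening conv1 to 200 channels does not change this number, because only the channel count of the last convolution enters the flattened size.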