Reference: https://pytorch.org/docs/master/optim.html#how-to-adjust-learning-rate
torch.optim.lr_scheduler provides several methods for adjusting the learning rate based on the number of epochs.
First, define a learning-rate decay function by hand:
def adjust_learning_rate(optimizer, epoch, lr):
    """Sets the learning rate to the initial LR decayed by 10 every 2 epochs"""
    lr *= (0.1 ** (epoch // 2))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
The optimizer manages its parameter groups through param_groups. Each param_group stores a group of parameters together with its learning rate, momentum, and so on.
Usage:
model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=10)
plt.figure()
x = list(range(10))
y = []
lr_init = optimizer.param_groups[0]['lr']
for epoch in range(10):
    adjust_learning_rate(optimizer, epoch, lr_init)
    lr = optimizer.param_groups[0]['lr']
    print(epoch, lr)
    y.append(lr)
plt.plot(x, y)
Output:
0 10.0
1 10.0
2 1.0
3 1.0
4 0.10000000000000002
5 0.10000000000000002
6 0.010000000000000002
7 0.010000000000000002
8 0.0010000000000000002
9 0.0010000000000000002
As shown in the figure:
For the examples below, first import the required libraries:
import torch
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision.models import AlexNet
import matplotlib.pyplot as plt
1.LambdaLR
CLASS torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
Sets the learning rate of each parameter group to the initial lr multiplied by a given function. When last_epoch=-1, the initial lr is set to lr.
Parameters:
- optimizer (Optimizer) – the wrapped optimizer.
- lr_lambda (function or list) – a function which, given an integer argument (usually the epoch index), computes a multiplicative factor for the learning rate; or a list of such functions, one for each group in optimizer.param_groups (otherwise an error is raised, as shown below).
- last_epoch (int) – the index of the last epoch. Default: -1.
For example:
optimizer = optim.SGD(params = model.parameters(), lr=0.05)
lambda1 = lambda epoch: epoch // 10   # multiplicative factor for lr is epoch // 10
lambda2 = lambda epoch: 0.95 ** epoch # multiplicative factor for lr is 0.95 ** epoch
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
This raises an error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-c02d2d9ffc0d> in <module>
      4 lambda1 = lambda epoch:epoch // 10
      5 lambda2 = lambda epoch:0.95 ** epoch
----> 6 scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
      7 plt.figure()
      8 x = list(range(40))

/anaconda3/envs/deeplearning/lib/python3.6/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, lr_lambda, last_epoch)
     83             if len(lr_lambda) != len(optimizer.param_groups):
     84                 raise ValueError("Expected {} lr_lambdas, but got {}".format(
---> 85                     len(optimizer.param_groups), len(lr_lambda)))
     86             self.lr_lambdas = list(lr_lambda)
     87             self.last_epoch = last_epoch

ValueError: Expected 1 lr_lambdas, but got 2
This shows that only one lambda function is expected here, because the optimizer has only one param_group.
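In other words, the number of lr_lambda functions must match the number of param_groups. As a minimal sketch (reusing the imports above; splitting AlexNet into its features and classifier sub-modules is just an illustration), an optimizer built from two parameter groups accepts a list of two lambdas:

model = AlexNet(num_classes=2)
# Two parameter groups -> LambdaLR expects a list of two lambdas
optimizer = optim.SGD([
    {'params': model.features.parameters()},
    {'params': model.classifier.parameters()},
], lr=0.05)
lambda1 = lambda epoch: epoch // 10
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])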
Examples:
1) Using lambda2:
model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=0.05)
# Two candidate lambda functions:
# With lambda1, for epoch 0 to 9, epoch//10 = 0, so lr = 0.05*0 = 0;
# for epoch 10 to 19, epoch//10 = 1, so lr = 0.05*1 = 0.05.
lambda1 = lambda epoch: epoch // 10
# With lambda2, at epoch 0, lr = lr * (0.2**0) = 0.05; at epoch 1, lr = lr * (0.2**1) = 0.01.
lambda2 = lambda epoch: 0.2 ** epoch
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda2)
plt.figure()
x = list(range(40))
y = []
for epoch in range(40):
    scheduler.step()
    lr = scheduler.get_lr()
    print(epoch, scheduler.get_lr()[0])
    y.append(scheduler.get_lr()[0])
plt.plot(x, y)
Output:

0 0.05
1 0.010000000000000002
2 0.0020000000000000005
3 0.00040000000000000013
4 8.000000000000002e-05
5 1.6000000000000006e-05
6 3.2000000000000015e-06
7 6.400000000000002e-07
8 1.2800000000000006e-07
9 2.5600000000000014e-08
10 5.120000000000003e-09
11 1.0240000000000006e-09
12 2.0480000000000014e-10
13 4.096000000000003e-11
14 8.192000000000007e-12
15 1.6384000000000016e-12
16 3.276800000000003e-13
17 6.553600000000007e-14
18 1.3107200000000014e-14
19 2.621440000000003e-15
20 5.242880000000006e-16
21 1.0485760000000013e-16
22 2.0971520000000027e-17
23 4.194304000000006e-18
24 8.388608000000012e-19
25 1.6777216000000025e-19
26 3.355443200000005e-20
27 6.71088640000001e-21
28 1.3421772800000022e-21
29 2.6843545600000045e-22
30 5.368709120000009e-23
31 1.0737418240000018e-23
32 2.147483648000004e-24
33 4.294967296000008e-25
34 8.589934592000016e-26
35 1.7179869184000033e-26
36 3.435973836800007e-27
37 6.871947673600015e-28
38 1.3743895347200028e-28
39 2.748779069440006e-29
As shown in the figure:
It can also be written in the following form:
def lambda_rule(epoch):
    lr_l = 0.2 ** epoch
    return lr_l
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_rule)
2) Using lambda1:
Output:

0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
7 0.0
8 0.0
9 0.0
10 0.05
11 0.05
12 0.05
13 0.05
14 0.05
15 0.05
16 0.05
17 0.05
18 0.05
19 0.05
20 0.1
21 0.1
22 0.1
23 0.1
24 0.1
25 0.1
26 0.1
27 0.1
28 0.1
29 0.1
30 0.15000000000000002
31 0.15000000000000002
32 0.15000000000000002
33 0.15000000000000002
34 0.15000000000000002
35 0.15000000000000002
36 0.15000000000000002
37 0.15000000000000002
38 0.15000000000000002
39 0.15000000000000002
As shown in the figure:
Other methods:
load_state_dict(state_dict)
Loads the scheduler state.
Parameters:
- state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
state_dict()
Returns the state of the scheduler as a dict.
It contains an entry for every variable in self.__dict__ which is not the optimizer. The learning rate lambda functions will only be saved if they are callable objects, and not if they are plain functions or lambdas.
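A minimal sketch of saving and restoring the scheduler state together with the optimizer state, reusing the imports above (the file name 'checkpoint.pth' is just an example):

model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=0.05)
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

# Save optimizer and scheduler state (the lambda itself is not stored, see above)
torch.save({'optimizer': optimizer.state_dict(),
            'scheduler': scheduler.state_dict()}, 'checkpoint.pth')

# Restore them later
checkpoint = torch.load('checkpoint.pth')
optimizer.load_state_dict(checkpoint['optimizer'])
scheduler.load_state_dict(checkpoint['scheduler'])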
2.StepLR
CLASS torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
Decays the learning rate of each parameter group by gamma every step_size epochs. Note that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, the initial lr is set to lr.
Parameters:
- optimizer (Optimizer) – the wrapped optimizer.
- step_size (int) – period of learning rate decay, in epochs.
- gamma (float) – multiplicative factor of learning rate decay. Default: 0.1.
- last_epoch (int) – the index of the last epoch. Default: -1.
Example:
model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=0.05)
# Every 10 epochs, lr = lr * gamma
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
plt.figure()
x = list(range(40))
y = []
for epoch in range(40):
    scheduler.step()
    lr = scheduler.get_lr()
    print(epoch, scheduler.get_lr()[0])
    y.append(scheduler.get_lr()[0])
plt.plot(x, y)
Output:

0 0.05
1 0.05
2 0.05
3 0.05
4 0.05
5 0.05
6 0.05
7 0.05
8 0.05
9 0.05
10 0.005000000000000001
11 0.005000000000000001
12 0.005000000000000001
13 0.005000000000000001
14 0.005000000000000001
15 0.005000000000000001
16 0.005000000000000001
17 0.005000000000000001
18 0.005000000000000001
19 0.005000000000000001
20 0.0005000000000000001
21 0.0005000000000000001
22 0.0005000000000000001
23 0.0005000000000000001
24 0.0005000000000000001
25 0.0005000000000000001
26 0.0005000000000000001
27 0.0005000000000000001
28 0.0005000000000000001
29 0.0005000000000000001
30 5.0000000000000016e-05
31 5.0000000000000016e-05
32 5.0000000000000016e-05
33 5.0000000000000016e-05
34 5.0000000000000016e-05
35 5.0000000000000016e-05
36 5.0000000000000016e-05
37 5.0000000000000016e-05
38 5.0000000000000016e-05
39 5.0000000000000016e-05
As shown in the figure:
3.MultiStepLR
CLASS torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
Decays the learning rate of each parameter group by gamma once the epoch count reaches one of the milestones. Note that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, the initial lr is set to lr.
Parameters:
- optimizer (Optimizer) – the wrapped optimizer.
- milestones (list) – list of epoch indices. Must be increasing.
- gamma (float) – multiplicative factor of learning rate decay. Default: 0.1.
- last_epoch (int) – the index of the last epoch. Default: -1.
Example:
model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=0.05)
# Decay the learning rate at the specified epochs, here [10, 15, 25, 30]: lr = lr * gamma
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[10,15,25,30], gamma=0.1)
plt.figure()
x = list(range(40))
y = []
for epoch in range(40):
    scheduler.step()
    lr = scheduler.get_lr()
    print(epoch, scheduler.get_lr()[0])
    y.append(scheduler.get_lr()[0])
plt.plot(x, y)
Output:

0 0.05
1 0.05
2 0.05
3 0.05
4 0.05
5 0.05
6 0.05
7 0.05
8 0.05
9 0.05
10 0.005000000000000001
11 0.005000000000000001
12 0.005000000000000001
13 0.005000000000000001
14 0.005000000000000001
15 0.0005000000000000001
16 0.0005000000000000001
17 0.0005000000000000001
18 0.0005000000000000001
19 0.0005000000000000001
20 0.0005000000000000001
21 0.0005000000000000001
22 0.0005000000000000001
23 0.0005000000000000001
24 0.0005000000000000001
25 5.0000000000000016e-05
26 5.0000000000000016e-05
27 5.0000000000000016e-05
28 5.0000000000000016e-05
29 5.0000000000000016e-05
30 5.000000000000001e-06
31 5.000000000000001e-06
32 5.000000000000001e-06
33 5.000000000000001e-06
34 5.000000000000001e-06
35 5.000000000000001e-06
36 5.000000000000001e-06
37 5.000000000000001e-06
38 5.000000000000001e-06
39 5.000000000000001e-06
As shown in the figure:
4.ExponentialLR
CLASS torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
Decays the learning rate of each parameter group by gamma every epoch. When last_epoch=-1, the initial lr is set to lr.
Parameters:
- optimizer (Optimizer) – the wrapped optimizer.
- gamma (float) – multiplicative factor of learning rate decay.
- last_epoch (int) – the index of the last epoch. Default: -1.
Example:
model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=0.2)
# Every epoch, lr = lr * gamma, i.e. exponential decay
scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.2)
plt.figure()
x = list(range(10))
y = []
for epoch in range(10):
    scheduler.step()
    lr = scheduler.get_lr()
    print(epoch, scheduler.get_lr()[0])
    y.append(scheduler.get_lr()[0])
plt.plot(x, y)
Output:

0 0.2
1 0.04000000000000001
2 0.008000000000000002
3 0.0016000000000000005
4 0.0003200000000000001
5 6.400000000000002e-05
6 1.2800000000000006e-05
7 2.560000000000001e-06
8 5.120000000000002e-07
9 1.0240000000000006e-07
As shown in the figure:
5.CosineAnnealingLR
CLASS torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
Sets the learning rate of each parameter group using a cosine annealing schedule, where η_max is set to the initial lr and T_cur is the number of epochs since the last restart in SGDR. When last_epoch=-1, the initial lr is set to lr. Note that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators. If the learning rate is set solely by this scheduler, it follows the cosine curve:

\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)

This method was proposed in SGDR: Stochastic Gradient Descent with Warm Restarts. Note that this only implements the cosine annealing part of SGDR, and not the restarts.
Parameters:
- optimizer (Optimizer) – the wrapped optimizer.
- T_max (int) – maximum number of iterations.
- eta_min (float) – minimum learning rate. Default: 0.
- last_epoch (int) – the index of the last epoch. Default: -1.
Example:
model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=10)
# Computed according to the formula above
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=2)
plt.figure()
x = list(range(10))
y = []
for epoch in range(10):
    scheduler.step()
    lr = scheduler.get_lr()
    print(epoch, scheduler.get_lr()[0])
    y.append(scheduler.get_lr()[0])
plt.plot(x, y)
Output:
0 10.0
1 5.0
2 0.0
3 4.999999999999999
4 10.0
5 5.000000000000001
6 0.0
7 4.999999999999998
8 10.0
9 5.000000000000002
As shown in the figure:
How the first three learning rates in this example are computed:
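Substituting η_max = 10 (the initial lr), η_min = 0 and T_max = 2 into the formula above, for T_cur = 0, 1, 2:

\eta_0 = 0 + \frac{1}{2}(10 - 0)\left(1 + \cos 0\right) = 10
\eta_1 = 0 + \frac{1}{2}(10 - 0)\left(1 + \cos\frac{\pi}{2}\right) = 5
\eta_2 = 0 + \frac{1}{2}(10 - 0)\left(1 + \cos\pi\right) = 0

which matches the first three printed values 10.0, 5.0 and 0.0.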
6.ReduceLROnPlateau (dynamically decays the lr)
CLASS torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
torch.optim.lr_scheduler.ReduceLROnPlateau allows the learning rate to be reduced dynamically based on some validation measurement.
Reduces the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.
Parameters:
- optimizer (Optimizer) – the wrapped optimizer.
- mode (str) – one of min, max. In min mode, the lr is reduced when the monitored quantity has stopped decreasing; in max mode, it is reduced when the monitored quantity has stopped increasing. Default: 'min'.
- factor (float) – factor by which the learning rate is reduced: new_lr = lr * factor. Default: 0.1.
- patience (int) – number of epochs with no improvement after which the learning rate is reduced. For example, if patience = 2, the first 2 epochs with no improvement are ignored, and the LR is only reduced after the 3rd epoch if the loss still has not improved. Default: 10.
- verbose (bool) – if True, prints a message to stdout for each update. Default: False.
- threshold (float) – threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.
- threshold_mode (str) – one of rel, abs. In rel mode, dynamic_threshold = best * (1 + threshold) in 'max' mode, or best * (1 - threshold) in 'min' mode. In abs mode, dynamic_threshold = best + threshold in 'max' mode, or best - threshold in 'min' mode. Default: 'rel'. (See the small illustration after this list.)
- cooldown (int) – number of epochs to wait before resuming normal operation after the lr has been reduced. Default: 0.
- min_lr (float or list) – a scalar or a list of scalars. Lower bound on the learning rate of all param groups or of each group respectively. Default: 0.
- eps (float) – minimal decay applied to lr. If the difference between the new and old lr is smaller than eps, the update is ignored. Default: 1e-8.
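To make the threshold and threshold_mode formulas above concrete, here is a small illustration of my own (a sketch mirroring the documented formulas, not the library's internal code) of when a new metric value counts as an improvement:

def is_improvement(metric, best, mode='min', threshold=1e-4, threshold_mode='rel'):
    # dynamic_threshold is computed exactly as described for threshold_mode above
    if threshold_mode == 'rel':
        dynamic_threshold = best * (1 - threshold) if mode == 'min' else best * (1 + threshold)
    else:  # 'abs'
        dynamic_threshold = best - threshold if mode == 'min' else best + threshold
    return metric < dynamic_threshold if mode == 'min' else metric > dynamic_threshold

# With best = 1.0, mode='min', threshold_mode='rel', threshold=1e-4,
# the dynamic threshold is 0.9999:
print(is_improvement(0.99995, 1.0))  # False: not a significant improvement
print(is_improvement(0.9998, 1.0))   # True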
Example:
import torchvision.models as models
import torch.nn as nn

model = models.resnet34(pretrained=True)
fc_features = model.fc.in_features
model.fc = nn.Linear(fc_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params = model.parameters(), lr=10)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min')

inputs = torch.randn(4,3,224,224)
labels = torch.LongTensor([1,1,0,1])

plt.figure()
x = list(range(60))
y = []
for epoch in range(60):
    optimizer.zero_grad()
    outputs = model(inputs)
    #print(outputs)
    loss = criterion(outputs, labels)
    print(loss)
    loss.backward()
    scheduler.step(loss)
    optimizer.step()
    lr = optimizer.param_groups[0]['lr']
    print(epoch, lr)
    y.append(lr)
plt.plot(x, y)
The loss is monitored; if the loss shows no improvement within patience=10 epochs, the lr is decayed.
Output:

tensor(0.7329, grad_fn=<NllLossBackward>)
0 10
tensor(690.2101, grad_fn=<NllLossBackward>)
1 10
tensor(2359.7373, grad_fn=<NllLossBackward>)
2 10
tensor(409365., grad_fn=<NllLossBackward>)
3 10
tensor(240161.7969, grad_fn=<NllLossBackward>)
4 10
tensor(3476952.2500, grad_fn=<NllLossBackward>)
5 10
tensor(5098666.5000, grad_fn=<NllLossBackward>)
6 10
tensor(719.7433, grad_fn=<NllLossBackward>)
7 10
tensor(6.7871, grad_fn=<NllLossBackward>)
8 10
tensor(5.5356, grad_fn=<NllLossBackward>)
9 10
tensor(4.2844, grad_fn=<NllLossBackward>)
10 10
tensor(3.0342, grad_fn=<NllLossBackward>)
11 1.0
tensor(2.9092, grad_fn=<NllLossBackward>)
12 1.0
tensor(2.7843, grad_fn=<NllLossBackward>)
13 1.0
tensor(2.6593, grad_fn=<NllLossBackward>)
14 1.0
tensor(2.5343, grad_fn=<NllLossBackward>)
15 1.0
tensor(2.4093, grad_fn=<NllLossBackward>)
16 1.0
tensor(2.2844, grad_fn=<NllLossBackward>)
17 1.0
tensor(2.1595, grad_fn=<NllLossBackward>)
18 1.0
tensor(2.0346, grad_fn=<NllLossBackward>)
19 1.0
tensor(1.9099, grad_fn=<NllLossBackward>)
20 1.0
tensor(1.7853, grad_fn=<NllLossBackward>)
21 1.0
tensor(1.6610, grad_fn=<NllLossBackward>)
22 0.1
tensor(1.6486, grad_fn=<NllLossBackward>)
23 0.1
tensor(1.6362, grad_fn=<NllLossBackward>)
24 0.1
tensor(1.6238, grad_fn=<NllLossBackward>)
25 0.1
tensor(1.6114, grad_fn=<NllLossBackward>)
26 0.1
tensor(1.5990, grad_fn=<NllLossBackward>)
27 0.1
tensor(1.5866, grad_fn=<NllLossBackward>)
28 0.1
tensor(1.5743, grad_fn=<NllLossBackward>)
29 0.1
tensor(1.5619, grad_fn=<NllLossBackward>)
30 0.1
tensor(1.5496, grad_fn=<NllLossBackward>)
31 0.1
tensor(1.5372, grad_fn=<NllLossBackward>)
32 0.1
tensor(1.5249, grad_fn=<NllLossBackward>)
33 0.010000000000000002
tensor(1.5236, grad_fn=<NllLossBackward>)
34 0.010000000000000002
tensor(1.5224, grad_fn=<NllLossBackward>)
35 0.010000000000000002
tensor(1.5212, grad_fn=<NllLossBackward>)
36 0.010000000000000002
tensor(1.5199, grad_fn=<NllLossBackward>)
37 0.010000000000000002
tensor(1.5187, grad_fn=<NllLossBackward>)
38 0.010000000000000002
tensor(1.5175, grad_fn=<NllLossBackward>)
39 0.010000000000000002
tensor(1.5163, grad_fn=<NllLossBackward>)
40 0.010000000000000002
tensor(1.5150, grad_fn=<NllLossBackward>)
41 0.010000000000000002
tensor(1.5138, grad_fn=<NllLossBackward>)
42 0.010000000000000002
tensor(1.5126, grad_fn=<NllLossBackward>)
43 0.010000000000000002
tensor(1.5113, grad_fn=<NllLossBackward>)
44 0.0010000000000000002
tensor(1.5112, grad_fn=<NllLossBackward>)
45 0.0010000000000000002
tensor(1.5111, grad_fn=<NllLossBackward>)
46 0.0010000000000000002
tensor(1.5110, grad_fn=<NllLossBackward>)
47 0.0010000000000000002
tensor(1.5108, grad_fn=<NllLossBackward>)
48 0.0010000000000000002
tensor(1.5107, grad_fn=<NllLossBackward>)
49 0.0010000000000000002
tensor(1.5106, grad_fn=<NllLossBackward>)
50 0.0010000000000000002
tensor(1.5105, grad_fn=<NllLossBackward>)
51 0.0010000000000000002
tensor(1.5103, grad_fn=<NllLossBackward>)
52 0.0010000000000000002
tensor(1.5102, grad_fn=<NllLossBackward>)
53 0.0010000000000000002
tensor(1.5101, grad_fn=<NllLossBackward>)
54 0.0010000000000000002
tensor(1.5100, grad_fn=<NllLossBackward>)
55 0.00010000000000000003
tensor(1.5100, grad_fn=<NllLossBackward>)
56 0.00010000000000000003
tensor(1.5099, grad_fn=<NllLossBackward>)
57 0.00010000000000000003
tensor(1.5099, grad_fn=<NllLossBackward>)
58 0.00010000000000000003
tensor(1.5099, grad_fn=<NllLossBackward>)
59 0.00010000000000000003
The first loss is 0.7329, and in the following patience=10 epochs no loss improved on it, so with mode='min' the learning rate was reduced to lr = lr * factor = lr * 0.1, i.e. from 10 to 1.0.
As shown in the figure:
⚠️ This scheduler has no get_lr() method, so use optimizer.param_groups[0]['lr'] to obtain the current learning rate.
7.CyclicLR
CLASS torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1)
Sets the learning rate of each parameter group according to the cyclical learning rate policy (CLR). The policy cycles the learning rate between two boundaries at a constant frequency, as described in detail in the paper Cyclical Learning Rates for Training Neural Networks. The distance between the two boundaries can be scaled on a per-iteration or per-cycle basis.
The cyclical learning rate policy changes the learning rate after every batch. The step() method of this class should be called after a batch has been used for training.
This class has three built-in policies:
- "triangular": a basic triangular cycle without amplitude scaling.
- "triangular2": a basic triangular cycle that halves the initial amplitude each cycle.
- "exp_range": a cycle that scales the initial amplitude by gamma**(cycle iterations) at each cycle iteration.
- This implementation was adapted from the github repo: bckenstler/CLR
Parameters:
- optimizer (Optimizer) – the wrapped optimizer.
- base_lr (float or list) – initial learning rate, which is the lower boundary in the cycle for each parameter group.
- max_lr (float or list) – upper learning rate boundary in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_lr - base_lr). The lr at any cycle is the sum of base_lr and some scaling of the amplitude; therefore max_lr may not actually be reached, depending on the scaling function.
- step_size_up (int) – number of training iterations in the increasing half of a cycle. Default: 2000.
- step_size_down (int) – number of training iterations in the decreasing half of a cycle. If step_size_down is None, it is set to step_size_up. Default: None.
- mode (str) – one of {triangular, triangular2, exp_range}. Values correspond to the policies detailed above. If scale_fn is not None, this argument is ignored. Default: 'triangular'.
- gamma (float) – constant in the 'exp_range' scaling function: gamma**(cycle iterations). Default: 1.0.
- scale_fn (function) – custom scaling policy defined by a single-argument lambda function, where 0 <= scale_fn(x) <= 1 for all x >= 0. If specified, 'mode' is ignored. Default: None.
- scale_mode (str) – one of {'cycle', 'iterations'}. Defines whether scale_fn is evaluated on the cycle number or on cycle iterations (training iterations since the start of the cycle). Default: 'cycle'.
- cycle_momentum (bool) – if True, momentum is cycled inversely to the learning rate between 'base_momentum' and 'max_momentum'. Default: True.
- base_momentum (float or list) – lower momentum boundary in the cycle for each parameter group. Default: 0.8.
- max_momentum (float or list) – upper momentum boundary in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). The momentum at any cycle is the difference of max_momentum and some scaling of the amplitude; therefore base_momentum may not actually be reached, depending on the scaling function. Default: 0.9.
- last_epoch (int) – the index of the last batch. This parameter is used when resuming a training job. Since step() should be called after each batch instead of after each epoch, this number represents the total number of batches computed, not the total number of epochs. When last_epoch=-1, the schedule starts from the beginning. Default: -1.
Example:
An error occurred: CyclicLR could not be found, because my torch version 1.0.1 does not have this class:
AttributeError: module 'torch.optim.lr_scheduler' has no attribute 'CyclicLR'
Then I upgraded torch to the latest version 1.1.0:
(deeplearning) userdeMBP:ageAndGender user$ pip install --upgrade torch torchvision
Collecting torch
...
Installing collected packages: torch
  Found existing installation: torch 1.0.1.post2
    Uninstalling torch-1.0.1.post2:
      Successfully uninstalled torch-1.0.1.post2
Successfully installed torch-1.1.0
Then another error occurred:
(deeplearning) userdeMBP:ageAndGender user$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/envs/deeplearning/lib/python3.6/site-packages/torch/__init__.py", line 79, in <module>
    from torch._C import *
ImportError: dlopen(/anaconda3/envs/deeplearning/lib/python3.6/site-packages/torch/_C.cpython-36m-darwin.so, 9): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
  Referenced from: /anaconda3/envs/deeplearning/lib/python3.6/site-packages/torch/lib/libshm.dylib
  Reason: image not found
>>>
I have not been able to solve this problem; if any reader has solved it, please let me know.
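Since I could not run it locally, the following is only an untested sketch (requires torch >= 1.1, reusing the imports above) of how CyclicLR is typically used, with step() called after every batch rather than after every epoch; the inner loop simply stands in for iterating over a DataLoader:

model = AlexNet(num_classes=2)
# cycle_momentum=True (the default) needs an optimizer with momentum, e.g. SGD
optimizer = optim.SGD(params = model.parameters(), lr=0.01, momentum=0.9)
scheduler = lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1,
                                  step_size_up=4, mode='triangular')
lrs = []
for epoch in range(2):
    for batch in range(10):   # stand-in for: for batch in dataloader:
        # ... forward pass, loss.backward() and optimizer.step() would go here ...
        scheduler.step()      # called once per batch
        lrs.append(optimizer.param_groups[0]['lr'])
print(lrs)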
For more details on this class, see https://blog.csdn.net/u013166817/article/details/86503899
The relevant part of the pix2pix code is:
def get_scheduler(optimizer, opt):
    """Return a learning rate scheduler

    Parameters:
        optimizer          -- the optimizer used by the network
        opt (option class) -- stores all the experiment flags; needs to be a subclass of BaseOptions.
                              opt.lr_policy is the name of the learning rate policy: linear | step | plateau | cosine

    For 'linear', we keep the same learning rate for the first <opt.niter> epochs (default 100)
    and linearly decay the rate to zero over the next <opt.niter_decay> epochs (default 100).
    For the other schedulers (step, plateau, and cosine), the default PyTorch settings are used.
    See https://pytorch.org/docs/stable/optim.html for more details.
    """
    if opt.lr_policy == 'linear':
        def lambda_rule(epoch):
            # epoch_count is the epoch from which counting starts, default 1
            lr_l = 1.0 - max(0, epoch + opt.epoch_count - opt.niter) / float(opt.niter_decay + 1)
            return lr_l
        scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_rule)
    elif opt.lr_policy == 'step':
        scheduler = lr_scheduler.StepLR(optimizer, step_size=opt.lr_decay_iters, gamma=0.1)
    elif opt.lr_policy == 'plateau':
        scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.2, threshold=0.01, patience=5)
    elif opt.lr_policy == 'cosine':
        scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=opt.niter, eta_min=0)
    else:
        return NotImplementedError('learning rate policy [%s] is not implemented', opt.lr_policy)
    return scheduler
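As a hypothetical usage sketch of the 'linear' policy (the opt object below is only a stand-in for pix2pix's BaseOptions, with made-up field values): the learning rate stays constant for the first opt.niter epochs and then decays linearly to 0 over the next opt.niter_decay epochs.

from argparse import Namespace

# Hypothetical options object; the real pix2pix code builds this from BaseOptions
opt = Namespace(lr_policy='linear', epoch_count=1, niter=100, niter_decay=100,
                lr_decay_iters=50)

model = AlexNet(num_classes=2)
optimizer = optim.SGD(params = model.parameters(), lr=0.0002)
scheduler = get_scheduler(optimizer, opt)

for epoch in range(opt.niter + opt.niter_decay):
    # ... training for one epoch would go here ...
    scheduler.step()
    if epoch % 50 == 0:
        print(epoch, optimizer.param_groups[0]['lr'])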