Contents
Defining the network
Gradient back-propagation
Gradient update
How this comes up in interviews
References
BPTT (back-propagation through time) is the standard algorithm for training RNNs. In essence it is still ordinary back-propagation; the difference is that an RNN processes time-series data, so the error has to be propagated backward through the time steps as well, hence the name back-propagation through time.
In the lecture notes of Andrew Ng, Hung-yi Lee and other professors, BPTT is only mentioned in passing, without a hands-on derivation. This article works through it concretely to fill that gap, with code attached.
Defining the network
Assume the input sequence is x1, x2, each of which is one-dimensional;
assume the hidden state H is also one-dimensional.
The forward pass works as follows (a numeric walk-through is sketched right after this list):
Input: a sequence containing two time steps
Output: the softmax cross-entropy computed from the hidden state at the second time step and the label [1, 0]
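To make this concrete, here is a minimal numeric sketch of the forward pass. The weights and inputs are copied from the script further below; the variable names w_hh, w_hx, w_o1, w_o2, h1, h2 are our own, and this simplified network uses no bias and no activation function:

import math

# weights and inputs copied from the script below (variable names are ours)
w_hh, w_hx = -0.3435, 0.2170        # i2h weight: [hidden-to-hidden, input-to-hidden]
w_o1, w_o2 = 0.5131, -0.7451        # i2o weight: hidden -> two output logits
x1, x2, h0 = 2.0, 3.0, 0.0          # two-step input sequence and the zero initial hidden state

# hidden states: h_t = w_hh * h_{t-1} + w_hx * x_t (no bias, no non-linearity)
h1 = w_hh * h0 + w_hx * x1          # 0.4340
h2 = w_hh * h1 + w_hx * x2          # ~0.5019

# logits from the second hidden state, then softmax cross-entropy against class 0
o1, o2 = w_o1 * h2, w_o2 * h2       # ~[0.2575, -0.3740]
p0 = math.exp(o1) / (math.exp(o1) + math.exp(o2))
loss = -math.log(p0)                # ~0.4264, matching the loss printed by the script
print(h1, h2, o1, o2, loss)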
Gradient back-propagation
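The following is a sketch of how the gradients flow backward for this two-step network, using the same (our own) variable names as the forward-pass sketch above. For softmax cross-entropy the gradient with respect to the logits is p - y; the output weights receive that gradient scaled by h2; and because h2 depends on h1, which was produced by the same weights w_hh and w_hx, the recurrent weights collect contributions from both time steps. That reuse across time steps is the "through time" part of BPTT. The values printed here match the gradients printed by the script below:

import math

# forward-pass values carried over from the sketch above
w_hh, w_hx = -0.3435, 0.2170
w_o1, w_o2 = 0.5131, -0.7451
x1, x2, h0 = 2.0, 3.0, 0.0
h1 = w_hh * h0 + w_hx * x1                         # 0.4340
h2 = w_hh * h1 + w_hx * x2                         # ~0.5019
p0 = 1.0 / (1.0 + math.exp((w_o2 - w_o1) * h2))    # softmax probability of class 0

# gradient of the loss w.r.t. the logits: p - y, with label y = [1, 0]
do1, do2 = p0 - 1.0, 1.0 - p0                      # ~[-0.3472, 0.3472]

# output layer: dL/dw_o = (p - y) * h2
dw_o1, dw_o2 = do1 * h2, do2 * h2                  # ~[-0.1743, 0.1743]

# back-propagate into the last hidden state
dh2 = w_o1 * do1 + w_o2 * do2                      # ~-0.4368

# BPTT: h2 = w_hh*h1 + w_hx*x2 and h1 = w_hh*h0 + w_hx*x1,
# so w_hh and w_hx are reused at both time steps
dw_hh = dh2 * (h1 + w_hh * h0)                     # ~-0.1896
dw_hx = dh2 * (x2 + w_hh * x1)                     # ~-1.0103
print(dw_hh, dw_hx, dw_o1, dw_o2)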
Gradient update
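Once the gradients are available, every parameter is updated by plain SGD: p is replaced by p - learning_rate * grad, which is exactly what p.data.add_(p.grad.data, alpha=-learning_rate) does in the script below. A small sketch with the numbers from above (the dictionary layout is our own illustration):

# weights and gradients taken from the sketches above
learning_rate = 0.1
weights   = {"w_hh": -0.3435, "w_hx": 0.2170, "w_o1": 0.5131, "w_o2": -0.7451}
gradients = {"w_hh": -0.1896, "w_hx": -1.0103, "w_o1": -0.1743, "w_o2": 0.1743}

# vanilla SGD step: p <- p - learning_rate * grad
updated = {name: w - learning_rate * gradients[name] for name, w in weights.items()}
print(updated)   # w_hh ~ -0.3245, w_hx ~ 0.3180, w_o1 ~ 0.5305, w_o2 ~ -0.7625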
The corresponding code:

import torch
import torch.nn as nn
import torch.nn.functional as F


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size, bias=False)
        self.i2o = nn.Linear(hidden_size, output_size, bias=False)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((hidden, input), 1)
        hidden = self.i2h(combined)
        output = self.i2o(hidden)
        # output = self.softmax(output)  # not needed: CrossEntropyLoss applies softmax itself
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)


def train(category_tensor, input_tensor):
    hidden = rnn.initHidden()
    rnn.zero_grad()
    for i in range(input_tensor.size()[0]):
        output, hidden = rnn(input_tensor[i], hidden)
    loss = criterion(output, category_tensor)
    loss.backward()
    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        print("gradient", p.grad.data)
        p.data.add_(p.grad.data, alpha=-learning_rate)
    return output, loss.item()


if __name__ == '__main__':
    n_hidden = 1
    n_categories = 2
    n_letters = 1  # each time step of the input is one-dimensional
    rnn = RNN(n_letters, n_hidden, n_categories)

    weight_i2h = torch.tensor([[-0.3435, 0.2170]])
    weight_i2o = torch.tensor([[0.5131],
                               [-0.7451]])
    rnn.i2h._parameters["weight"].data = weight_i2h  # custom initial weights
    rnn.i2o._parameters["weight"].data = weight_i2o  # custom initial weights
    for p in rnn.parameters():
        print("initial weights", p.data)

    criterion = nn.CrossEntropyLoss()
    learning_rate = 0.1
    n_iters = 1
    all_losses = []
    for iter in range(1, n_iters + 1):
        category_tensor = torch.tensor([0])  # class 0, one-hot: [1, 0]
        input_tensor = torch.tensor([
            [[2.]],  # encoding of the first character
            [[3.]]   # encoding of the second character
        ])
        output, loss = train(category_tensor, input_tensor)
        print("iteration", iter, output, loss)

"""
initial weights tensor([[-0.3435,  0.2170]])
initial weights tensor([[ 0.5131],
        [-0.7451]])
gradient tensor([[-0.1896, -1.0103]])
gradient tensor([[-0.1743],
        [ 0.1743]])
iteration 1 tensor([[ 0.2575, -0.3740]], grad_fn=<MmBackward>) 0.42643341422080994
"""
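A few details of the script are worth pointing out. The loss is computed only from the output of the last time step: the loop feeds both characters through the network first, and criterion is applied once at the end. The explicit self.softmax stays commented out because nn.CrossEntropyLoss already combines log-softmax and negative log-likelihood internally. Finally, rnn.zero_grad() is called at the start of every training call; without it, the gradients left over from the previous backward() would be accumulated into the new ones.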
How this comes up in interviews
Briefly explain what model.zero_grad() does in PyTorch. When does it need to be called? (A minimal demonstration follows this list.)
Briefly explain how an RNN updates its parameters.
Briefly explain how parameter updates differ between a CNN and an RNN.
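For the first question, here is a minimal sketch (the tiny Linear layer is purely for illustration) showing that gradients accumulate across backward() calls unless they are zeroed:

import torch
import torch.nn as nn

layer = nn.Linear(1, 1, bias=False)
x = torch.tensor([[1.0]])

for step in range(2):
    # layer.zero_grad()            # uncomment to reset the gradient before each backward()
    loss = layer(x).sum()
    loss.backward()
    print(step, layer.weight.grad)  # without zero_grad the gradient doubles at step 1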
Hopefully, after reading this article, you already have the answers in mind.
References
《21個項目玩轉深度學習:基於Tensorflow的實踐詳解》
https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial