Customizing the LSTM structure in PyTorch (with code)
Sometimes we need to modify the internal structure of an LSTM, for example replacing its nonlinear activation functions with piecewise-linear ones. This post shows how to define a custom LSTM structure in PyTorch, then builds a single-layer bidirectional (forward plus reverse) LSTM network on the IMDB dataset to verify that the custom structure works.
I. Overall Program Framework
To process an input of shape [batch_size, length, input_dim], the LSTM structure we need is shown in Figure 1.
layers is the number of LSTM layers, batch_size is the batch size, length is the sequence length, and input_dim is the dimension of each input vector.
Each LSTMCell computes the expressions below.
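Reconstructed from the weight names used in the code in Section II, these are the standard LSTM gate equations:

$$
\begin{aligned}
i_t &= \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \\
f_t &= \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \\
g_t &= \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \\
o_t &= \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.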
II. LSTMCell
The computation performed by an LSTMCell is implemented as follows; nn.Parameter marks a tensor as a trainable parameter of the model:
class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(LSTMCell, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        # initialize the 8 weight matrices
        self.weight_cx = nn.Parameter(torch.Tensor(hidden_size, input_size))
        self.weight_ch = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        self.weight_fx = nn.Parameter(torch.Tensor(hidden_size, input_size))
        self.weight_fh = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        self.weight_ix = nn.Parameter(torch.Tensor(hidden_size, input_size))
        self.weight_ih = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        self.weight_ox = nn.Parameter(torch.Tensor(hidden_size, input_size))
        self.weight_oh = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        # initialize the 4 bias vectors
        self.bias_c = nn.Parameter(torch.Tensor(hidden_size))
        self.bias_f = nn.Parameter(torch.Tensor(hidden_size))
        self.bias_i = nn.Parameter(torch.Tensor(hidden_size))
        self.bias_o = nn.Parameter(torch.Tensor(hidden_size))
        self.reset_parameters()  # initialize parameter values

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            weight.data.uniform_(-stdv, stdv)

    def forward(self, input, hc):
        h, c = hc
        # affine transforms (matrix multiplications) for the four gates
        i = F.linear(input, self.weight_ix, self.bias_i) + F.linear(h, self.weight_ih)
        f = F.linear(input, self.weight_fx, self.bias_f) + F.linear(h, self.weight_fh)
        g = F.linear(input, self.weight_cx, self.bias_c) + F.linear(h, self.weight_ch)
        o = F.linear(input, self.weight_ox, self.bias_o) + F.linear(h, self.weight_oh)
        # gate activations
        i = torch.sigmoid(i)
        f = torch.sigmoid(f)
        g = torch.tanh(g)
        o = torch.sigmoid(o)
        # state updates
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c
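As a quick sanity check, a single cell can be stepped once on random data. This is only a minimal sketch, assuming the imports listed in Section III; the sizes 2, 4 and 8 are arbitrary:

cell = LSTMCell(input_size=4, hidden_size=8)
x = torch.rand(2, 4)             # one time step for a batch of 2
h0 = torch.zeros(2, 8)           # initial hidden state
c0 = torch.zeros(2, 8)           # initial cell state
h1, c1 = cell(x, (h0, c0))
print(h1.shape, c1.shape)        # both torch.Size([2, 8])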
III. The Complete LSTM Program
As Figure 1 shows, a complete LSTM is built from many LSTMCell operations: the number of LSTMCell modules is determined by layers, and the number of times each LSTMCell runs is determined by length.
1. Stacked LSTMCells
Required imports:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import math
If the LSTM has more than one layer (layers > 1), the first layer maps inputs of dimension input_dim to outputs of dimension hidden_dim, while every other layer maps hidden_dim to hidden_dim (the output of a lower layer becomes the input of the layer above it). The layers LSTMCell modules are therefore defined as follows:
self.lay0 = LSTMCell(input_size, hidden_size)
if layers > 1:
    for i in range(1, layers):
        lay = LSTMCell(hidden_size, hidden_size)
        setattr(self, 'lay{}'.format(i), lay)
Here setattr() turns lay into the attribute self.lay<i>; if layers = 3, the code above is equivalent to:
self.lay0 = LSTMCell(input_size,hidden_size)
self.lay1 = LSTMCell(hidden_size,hidden_size)
self.lay2 = LSTMCell(hidden_size,hidden_size)
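As a side note, the same stack of cells could also be stored in an nn.ModuleList, which registers sub-modules just like setattr does. This is only an alternative sketch; the post itself keeps the setattr/getattr style:

# alternative: keep the layers in an nn.ModuleList instead of self.lay0, self.lay1, ...
self.lays = nn.ModuleList(
    [LSTMCell(input_size, hidden_size)] +
    [LSTMCell(hidden_size, hidden_size) for _ in range(1, layers)]
)
# the inner loop in forward() would then read: hc = self.lays[l](x, states[l])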
2. Stepping the multi-layer LSTM over inputs of arbitrary length
Each LSTMCell needs the previous state (h_{t-1}, c_{t-1}) as input; if no initial state is specified, we create an all-zero initial state ourselves:
if initial_states is None:
    zeros = Variable(torch.zeros(input.size(0), self.hidden_size))
    initial_states = [(zeros, zeros), ] * self.layers  # all-zero initial states, one (h, c) pair per layer
states = initial_states
outputs = []
length = input.size(1)
for t in range(length):
    x = input[:, t, :]
    for l in range(self.layers):
        hc = getattr(self, 'lay{}'.format(l))(x, states[l])
        states[l] = hc   # as in Figure 1, the (h, c) output of one step becomes the state input of the next step
        x = hc[0]        # as in Figure 1, the h output of a lower LSTMCell is the input of the cell above it
    outputs.append(hc)   # collect the outputs of the topmost layer
Here getattr() fetches the attribute whose name is given by the string inside the parentheses; if l = 3, the following two lines are equivalent:
hc = getattr(self, 'lay{}'.format(l))(x, states[l])
hc = self.lay3(x, states[3])
3. The complete program
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, layers=1, sequences=True):
        super(LSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.layers = layers
        self.sequences = sequences
        self.lay0 = LSTMCell(input_size, hidden_size)
        if layers > 1:
            for i in range(1, layers):
                lay = LSTMCell(hidden_size, hidden_size)
                setattr(self, 'lay{}'.format(i), lay)

    def forward(self, input, initial_states=None):
        if initial_states is None:
            zeros = Variable(torch.zeros(input.size(0), self.hidden_size))
            initial_states = [(zeros, zeros), ] * self.layers
        states = initial_states
        outputs = []
        length = input.size(1)
        for t in range(length):
            x = input[:, t, :]
            for l in range(self.layers):
                hc = getattr(self, 'lay{}'.format(l))(x, states[l])
                states[l] = hc
                x = hc[0]
            outputs.append(hc)
        if self.sequences:  # return the outputs of all top-layer LSTMCells, left to right in Figure 1
            hs, cs = zip(*outputs)
            h = torch.stack(hs).transpose(0, 1)
            c = torch.stack(cs).transpose(0, 1)
            output = (h, c)
        else:
            output = outputs[-1]  # return only the output of the top-right LSTMCell in Figure 1
        return output
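A short usage sketch of the class above (the sizes are arbitrary and only for illustration):

lstm = LSTM(input_size=4, hidden_size=8, layers=2, sequences=True)
x = torch.rand(2, 3, 4)            # [batch_size, length, input_dim]
h, c = lstm(x)
print(h.shape, c.shape)            # torch.Size([2, 3, 8]) for both

last_only = LSTM(input_size=4, hidden_size=8, layers=2, sequences=False)
h_last, c_last = last_only(x)
print(h_last.shape)                # torch.Size([2, 8]): output of the last time step only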
IV. Reverse LSTM
To run the LSTM in the reverse direction, define two LSTMs and feed the second one the reversed sequence: reverse input1 and use the result as input2.
The code is as follows:
import torch

input1 = torch.rand(2, 3, 4)
inp = input1.unbind(1)[::-1]              # split along the length dimension (dim 1) and reverse the order
input2 = inp[0]
for i in range(1, len(inp)):              # concatenate the reversed slices back together
    input2 = torch.cat((input2, inp[i]), dim=1)
x, y, z = input1.size()                   # restore the original shape so the two inputs match
input2 = input2.reshape(x, y, z)
OK, the reversal is done.
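As an aside, on PyTorch 0.4.1 and later the same reversal can be written in one line with torch.flip, which reverses a tensor along the given dimensions:

input2 = torch.flip(input1, [1])   # reverse along the length dimension (dim 1)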
V. Experiment
On IMDB we build a single-layer, bidirectional LSTM followed by a fully connected (FC) layer:
self.rnn1 = LSTM(embedding_dim, hidden_dim, layers = n_layers, sequences=False)
self.rnn2 = LSTM(embedding_dim, hidden_dim, layers = n_layers, sequences=False)
self.fc = nn.Linear(hidden_dim * 2, output_dim)
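For context, here is a minimal sketch of how these layers might be wired into a complete classifier; the embedding layer, the forward pass, and every name other than rnn1, rnn2 and fc are assumptions for illustration, not code from the original experiment:

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers=1):
        super(BiLSTMClassifier, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn1 = LSTM(embedding_dim, hidden_dim, layers=n_layers, sequences=False)  # forward direction
        self.rnn2 = LSTM(embedding_dim, hidden_dim, layers=n_layers, sequences=False)  # reverse direction
        self.fc = nn.Linear(hidden_dim * 2, output_dim)

    def forward(self, text):                     # text: [batch_size, length] of word indices
        emb = self.embedding(text)               # [batch_size, length, embedding_dim]
        emb_rev = torch.flip(emb, [1])           # reversed copy for the backward LSTM
        h_fwd, _ = self.rnn1(emb)                # [batch_size, hidden_dim] (sequences=False)
        h_bwd, _ = self.rnn2(emb_rev)            # [batch_size, hidden_dim]
        return self.fc(torch.cat((h_fwd, h_bwd), dim=1))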
The results of the run are shown in the figure:
Time was limited, so training ran for only 6 epochs; the experiment shows that the custom RNN program can converge.