1. Word Embeddings

nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, 
             max_norm=None, norm_type=2.0, scale_grad_by_freq=False, 
             sparse=False, _weight=None)

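nn.Embedding is a simple lookup table that stores the embedding vectors of a fixed-size dictionary.
- In other words, given an index, the embedding layer returns the embedding vector for that index (once trained, the embedding vectors reflect the semantic relationships between the symbols the indices stand for).
Input: a list of indices. Output: the list of corresponding embedding vectors.
num_embeddings (int): the size of the dictionary. For example, if 5000 distinct words occur, pass 5000; the valid indices are then 0-4999;
embedding_dim (int): the dimension of each embedding vector, i.e. how many dimensions are used to represent one symbol;
padding_idx (int, optional): the index reserved for padding. Sentences differ in length, so shorter ones are padded to a common length (say 100) with this index; its embedding vector is initialized to zeros and receives no gradient updates;
max_norm (float, optional): the maximum norm; any embedding vector whose norm exceeds this bound is renormalized to have norm max_norm;
norm_type (float, optional): which p-norm to compute when comparing against max_norm; the default is the 2-norm;
scale_grad_by_freq (bool, optional): if True, scale gradients by the inverse of the frequency of the words in the mini-batch; the default is False;
sparse (bool, optional): if True, the gradient with respect to the weight matrix is a sparse tensor;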

import torch
from torch import nn

# Assign an index to each word
word_to_idx = {'hello': 0, 'world': 1}
# Look up the index of the target word
lookup_tensor = torch.tensor([word_to_idx['hello']], dtype=torch.long)

embeds = nn.Embedding(num_embeddings=2, embedding_dim=5)
# Pass in the word's index; get back the corresponding embedding vector
hello_embed = embeds(lookup_tensor)
print(hello_embed)

tensor([[ 0.3951,  0.6753,  0.2209, -2.4807, -0.6213]],
       grad_fn=<EmbeddingBackward>)
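
The padding_idx parameter described above is easiest to see on a small padded batch. The following is a minimal sketch; the three-word vocabulary and the choice of index 0 for padding are assumptions made for illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy vocabulary: index 0 is reserved for padding.
word_to_idx = {'<pad>': 0, 'hello': 1, 'world': 2}

embeds = nn.Embedding(num_embeddings=3, embedding_dim=5, padding_idx=0)

# Two sentences padded to length 3: "hello world <pad>" and "world <pad> <pad>".
batch = torch.tensor([[1, 2, 0],
                      [2, 0, 0]], dtype=torch.long)

vectors = embeds(batch)
print(vectors.shape)   # torch.Size([2, 3, 5])

# The row at padding_idx is initialized to zeros ...
pad_row_is_zero = bool(torch.all(embeds.weight[0] == 0))

# ... and receives a zero gradient, so training does not update it.
vectors.sum().backward()
pad_grad_is_zero = bool(torch.all(embeds.weight.grad[0] == 0))
print(pad_row_is_zero, pad_grad_is_zero)
```

A batch of shape [batch, seq_len] comes back as [batch, seq_len, embedding_dim], so the layer accepts index tensors of any shape, not only a flat list.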

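Tip:

The nn.Embedding table above has not been loaded with pretrained values: its weights are initialized randomly (drawn from a standard normal distribution), so the embedding vectors it returns are random.
In practice the table is usually initialized from an existing set of embeddings: download word2vec or GloVe and copy the vectors into the table.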

The following looks words up directly in GloVe (run pip install pytorch-nlp first).

from torchnlp.word_to_vector import GloVe

# The download takes a long time
vectors = GloVe()
# Vectors are looked up by indexing with the word
print(vectors['hello'])
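
The tip above mentions filling the embedding table with pretrained vectors. One way to do this is nn.Embedding.from_pretrained; the sketch below uses a tiny hand-written matrix as a stand-in for real word2vec or GloVe vectors, so the values and the three-word vocabulary are purely hypothetical.

```python
import torch
from torch import nn

# Hypothetical stand-in for pretrained vectors: 3 words, 4 dimensions each.
pretrained = torch.tensor([[0.1, 0.2, 0.3, 0.4],
                           [0.5, 0.6, 0.7, 0.8],
                           [0.9, 1.0, 1.1, 1.2]])
word_to_idx = {'hello': 0, 'world': 1, 'again': 2}

# freeze=True (the default) keeps the pretrained rows fixed during training.
embeds = nn.Embedding.from_pretrained(pretrained, freeze=True)

idx = torch.tensor([word_to_idx['world']], dtype=torch.long)
world_embed = embeds(idx)
print(world_embed)                   # the second row of `pretrained`
print(embeds.weight.requires_grad)   # False
```

Passing freeze=False instead would let the pretrained vectors be fine-tuned together with the rest of the model.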

2. nn.RNN

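RNN in detail

nn.RNN processes its data as follows.
A batch of samples is fed to the network at once, and each time step processes that step's slice of the whole batch, so \(x_t\) is a Tensor of shape [batch, feature_len].
- For example, with 3 sentences of 10 words each (T_x) and every word represented by a 100-dimensional vector, seq_len=10, batch=3 and feature_len=100.
The hidden state h is a 2-D Tensor of shape [batch, hidden_len], where hidden_len is a hyperparameter you choose.
- For example, hidden_len=20 means each sample's memory is kept in a vector of length 20.

Tip: x is the seq_len individual \(x_t\) stacked together.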

2.1 nn.RNN parameters

nn.RNN(input_size, hidden_size, num_layers=1, 
       nonlinearity='tanh', bias=True, batch_first=False, 
       dropout=0, bidirectional=False)

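Parameters:

input_size: the dimension of the input features
- The input to an RNN is usually a word vector, so input_size equals the dimension of one word vector, i.e. feature_len;
hidden_size: the number of units in the hidden layer
- This is also the output dimension (because the RNN's output is the hidden state at each time step);
num_layers: the number of stacked layers;
nonlinearity: the activation function, 'tanh' or 'relu';
bias: whether to use bias terms;
batch_first: the layout of the input data. The default is False, which means (seq(num_step), batch, input_dim)
- That is, the sequence length comes first and the batch comes second;
dropout: the dropout probability, 0 (off) by default; set it to a number between 0 and 1 to apply dropout to the output of every layer except the last;
bidirectional: whether to use a bidirectional RNN; the default is False;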
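
The effect of batch_first and bidirectional on tensor shapes can be checked directly. This is a small sketch with arbitrarily chosen sizes (the same 10/3/100/20 used elsewhere in this article); it only inspects shapes, not values.

```python
import torch
from torch import nn

seq_len, batch, feature_len, hidden_len = 10, 3, 100, 20

# batch_first=True: input and output are laid out as [batch, seq_len, ...].
rnn_bf = nn.RNN(feature_len, hidden_len, batch_first=True)
out_bf, h_bf = rnn_bf(torch.randn(batch, seq_len, feature_len))
print(out_bf.shape)   # torch.Size([3, 10, 20])
print(h_bf.shape)     # torch.Size([1, 3, 20]); the hidden state is not affected by batch_first

# bidirectional=True: forward and backward outputs are concatenated on the last axis.
rnn_bi = nn.RNN(feature_len, hidden_len, bidirectional=True)
out_bi, h_bi = rnn_bi(torch.randn(seq_len, batch, feature_len))
print(out_bi.shape)   # torch.Size([10, 3, 40])
print(h_bi.shape)     # torch.Size([2, 3, 20]); one final state per direction
```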

from torch import nn

# Word vectors have 100 dimensions; the output has 10 dimensions
rnn = nn.RNN(100, 10)
print(rnn._parameters.keys())
# odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])

# W_hh corresponds to W_aa, W_ih corresponds to W_xa
print(rnn.weight_hh_l0.shape, rnn.weight_ih_l0.shape)
# torch.Size([10, 10]) torch.Size([10, 100])

# The bias length equals hidden_size (10); it does not depend on the batch size
print(rnn.bias_hh_l0.shape, rnn.bias_ih_l0.shape)
# torch.Size([10]) torch.Size([10])

Tip:

Each bias has only hidden_len entries; during the addition it is broadcast across every sample in the batch.
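
To see how these four parameters are used, the recurrence can be written out by hand and compared with nn.RNN. The sketch below assumes a single-layer, unidirectional RNN with the default tanh activation; the sequence length of 4 and batch size of 3 are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(100, 10)            # single layer, tanh nonlinearity
x = torch.randn(4, 3, 100)       # [seq_len=4, batch=3, feature_len=100]

h = torch.zeros(3, 10)           # h0: [batch, hidden_len]
steps = []
with torch.no_grad():
    for xt in x:                 # xt: [batch, feature_len]
        # h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
        # The [10]-long biases are broadcast over the batch dimension.
        h = torch.tanh(xt @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        steps.append(h)
    manual_out = torch.stack(steps)   # [seq_len, batch, hidden_len]
    out, ht = rnn(x)                  # h0 defaults to zeros

matches = torch.allclose(manual_out, out, atol=1e-5)
print(matches)
```

If the hand-written loop and nn.RNN agree up to floating-point tolerance, matches is True.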

2.2 The forward pass

out, ht = forward(x, h0)

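x: [seq_len, batch, feature_len]. The features of all time steps are fed in at once; there is no need to feed the current \(x_t\) step by step;
h0/ht: [num_layers, batch, hidden_len]. h0 is the initial hidden state of every layer, supplied at the first time step (think of it as the hidden output of each sentence in each layer); ht is the hidden state of every layer after the last time step (h0: a0, ht: at);
out: [seq_len, batch, hidden_len]. out is the output of the last (topmost) layer at every time step (corresponding to \(\hat{y}^{<t>}\))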

# A 5-layer RNN
import torch
from torch import nn

# feature_len=100 (word vector dimension), hidden_len=20 (hidden units), 5 layers
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=5)

# seq_len=10 (words per sentence), batch=3 (sentences), feature_len=100 (dimensions per word)
x = torch.randn(10, 3, 100)

# Run the RNN, also passing h_0 of shape [num_layers=5, batch=3, hidden_len=20]
# forward
out, h = rnn(x, torch.zeros(5, 3, 20))

print(out.shape)                        # torch.Size([10, 3, 20])
print(h.shape)                          # torch.Size([5, 3, 20]), i.e. at for every layer
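
The relationship between out and h can be verified directly: the last time step of out is the final hidden state of the top layer. A minimal sketch for a unidirectional RNN, reusing the sizes from the example above:

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=5)
x = torch.randn(10, 3, 100)

# h0 is omitted here; nn.RNN then uses zeros for it.
out, h = rnn(x)

last_step_of_top_layer = out[-1]     # [batch, hidden_len]: top layer at the final time step
final_state_of_top_layer = h[-1]     # [batch, hidden_len]: last layer's final hidden state

same = torch.allclose(last_step_of_top_layer, final_state_of_top_layer)
print(same)   # True
```

The lower layers' final states, h[0] to h[3], have no counterpart in out, because out only records the top layer.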

2.3 Building a multi-layer recurrent network with nn.RNN

Compared with the code in 2.1, only the number of layers needs to change.

from torch import nn

# Word vectors have 100 dimensions, the output has 20 dimensions, and there are 2 layers
rnn = nn.RNN(100, 20, 2)
print(rnn._parameters.keys())
# odict_keys(['weight_ih_l0', 'weight_hh_l0', 
# 'bias_ih_l0', 'bias_hh_l0', 'weight_ih_l1', 
# 'weight_hh_l1', 'bias_ih_l1', 'bias_hh_l1'])

print(rnn.weight_hh_l0.shape, rnn.weight_ih_l0.shape)
# torch.Size([20, 20]) torch.Size([20, 100])

print(rnn.weight_hh_l1.shape, rnn.weight_ih_l1.shape)
# torch.Size([20, 20]) torch.Size([20, 20])  the input size here is 20, not 100

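Tip:

From layer \(l_1\) upward, each layer takes the output of the layer below as its input, so its input feature size is hidden_len rather than feature_len.
That is why the parameter weight_ih_l1 has shape [hidden_len, hidden_len].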
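
The same pattern holds for any depth. As a quick check, the sketch below lists the weight shapes of a 3-layer RNN via named_parameters(); the layer count of 3 is an arbitrary choice for illustration.

```python
from torch import nn

feature_len, hidden_len, num_layers = 100, 20, 3
rnn = nn.RNN(feature_len, hidden_len, num_layers)

# Collect every parameter's shape by name.
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}

for layer in range(num_layers):
    # Layer 0 reads the input features; every higher layer reads hidden states.
    print(layer, shapes[f'weight_ih_l{layer}'], shapes[f'weight_hh_l{layer}'])
# 0 (20, 100) (20, 20)
# 1 (20, 20) (20, 20)
# 2 (20, 20) (20, 20)
```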

out, ht=forward(x, h0)

import torch
from torch import nn

# feature_len=100, hidden_len=20, 4 layers
rnn = nn.RNN(100, 20, 4)

# seq_len=10 (words per sentence), batch=3 (sentences), feature_len=100 (dimensions per word)
x = torch.randn(10, 3, 100)

# Run the RNN, also passing h_0 of shape [num_layers=4, batch=3, hidden_len=20]
out, h = rnn(x, torch.zeros(4, 3, 20))

print(out.shape)                    # torch.Size([10, 3, 20])
print(h.shape)                      # torch.Size([4, 3, 20])

3. nn.RNNCell

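nn.RNN is fed the features of all time steps at once.
nn.RNNCell processes each time step of the sequence separately.
Example: to process 3 sentences of 10 words each, with every word represented by a 100-dimensional embedding vector
- the Tensor passed to nn.RNN has shape [10,3,100]
- the Tensor passed to nn.RNNCell has shape [3,100], and the cell is run 10 times.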

3.1 nn.RNNCell()

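The constructor takes the same input_size, hidden_size, bias and nonlinearity arguments as nn.RNN above. It has no num_layers, batch_first, dropout or bidirectional arguments, because a cell computes a single layer at a single time step.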

3.2 ht = forward(xt, ht-1)

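\(x_t\): [batch, feature_len], the input at the current time step;
\(h_{t-1}\): [batch, hidden_len], the cell's output at the previous time step; the returned \(h_t\) has the same shape and becomes the cell's input at the next time step;
out: nn.RNNCell returns only \(h_t\); for the last cell in a stack, this \(h_t\) plays the role of out (corresponding to \(\hat{y}^{<t>}\))

A multi-layer RNN is built by stacking cells, with each cell's output feeding the next (see the second example in the code below):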

import torch
from torch import nn

# Single-layer RNN, feature_len=100, hidden_len=20
cell1 = nn.RNNCell(100, 20)
h1 = torch.zeros(3, 20)
x = torch.randn(10, 3, 100)
for xt in x:                  # xt.shape=[3, 100]
    h1 = cell1(xt, h1)
print(h1.shape)                # torch.Size([3, 20])


# Multi-layer RNN: the second cell takes the first cell's output as its input
cell1 = nn.RNNCell(100, 30)
cell2 = nn.RNNCell(30, 20)

h1 = torch.zeros(3, 30)
h2 = torch.zeros(3, 20)
x = torch.randn(10, 3, 100)
for xt in x:
    h1 = cell1(xt, h1)
    h2 = cell2(h1, h2)

print(h1.shape)                     # torch.Size([3, 30])
print(h2.shape)                     # torch.Size([3, 20])
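
The loops above keep only the final hidden state. To recover the equivalent of nn.RNN's out, collect \(h_t\) at every step and stack them. The sketch below also copies the weights of a single-layer nn.RNN into the cell to check that the two agree; copying weights this way is done here only for the comparison and is not something the cell requires.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(100, 20)
cell = nn.RNNCell(100, 20)

# Give the cell the same parameters as the single-layer nn.RNN.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(10, 3, 100)       # [seq_len, batch, feature_len]
h = torch.zeros(3, 20)            # [batch, hidden_len]
steps = []
for xt in x:
    h = cell(xt, h)               # one time step
    steps.append(h)
cell_out = torch.stack(steps)     # [seq_len, batch, hidden_len]

out, ht = rnn(x)
same = torch.allclose(cell_out, out, atol=1e-5)
print(cell_out.shape)             # torch.Size([10, 3, 20])
print(same)
```

Working at the cell level costs a Python loop per time step, but it makes it possible to change the input or hidden state between steps, which nn.RNN does not expose.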
