Table of Contents
LSTM Network Architecture
The Core Idea Behind LSTMs
Forget Gate
Input Gate
Output Gate
How Do LSTMs Solve the Long-Range Dependency Problem?
What Is a Peephole?
Multi-Layer LSTM
References
Long Short-Term Memory networks, usually just called LSTMs, are a special kind of RNN capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber in 1997 and were refined and popularized by many people in later work. They work remarkably well on a wide variety of problems and are now widely used. LSTMs are explicitly designed to avoid the long-term dependency problem: remembering information for long periods of time is practically their default behavior, not something they struggle to learn.
LSTM Network Architecture
The Core Idea Behind LSTMs
The key to LSTMs is the cell state, the horizontal line running across the top of the diagram. The cell state is a bit like a conveyor belt: it runs straight down the entire chain with only a few minor linear interactions, so it is very easy for information to flow along it unchanged.
The LSTM does have the ability to remove information from or add information to the cell state, carefully regulated by structures called gates. Gates are a way of letting information through selectively. An LSTM has three gates to protect and control the cell state.
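Written out (following the standard formulation, e.g. the Colah post listed in the references), the cell-state update combines the forget gate $f_t$ and the input gate $i_t$ introduced in the next two sections:

$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$

where $\odot$ is element-wise multiplication and $\tilde{C}_t$ is the candidate content proposed at the current step.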
Forget Gate
The forget gate outputs a vector whose entries lie between 0 and 1, which is multiplied pointwise with the memory cell C; this can be read as the model forgetting part of what it has stored.
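In symbols (standard formulation; $W_f$ and $b_f$ are the forget gate's weights and bias, $x_t$ the current input, $h_{t-1}$ the previous hidden state):

$$ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right), \qquad C_{t-1} \mapsto f_t \odot C_{t-1} $$

An entry of $f_t$ near 0 means "drop this component of the cell state"; an entry near 1 means "keep it".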
Input Gate
Some sources also call this the update gate.
The input gate has two branches: the left branch outputs a vector of values between 0 and 1, indicating what fraction of the current step's information should be written into the memory cell C; the right branch produces the candidate information extracted at the current step.
After passing through the forget gate and the input gate, the memory cell has been updated accordingly.
Note that in an LSTM the memory cell passes only through the forget gate and the input gate; it does not pass directly through the output gate.
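In the standard equations, the left branch is the input gate $i_t$ and the right branch is the candidate content $\tilde{C}_t$; together with the forget gate they produce the new cell state:

$$ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) $$

$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$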
Output Gate
The output gate receives information from three directions and produces output in two directions.
The three inputs are: the information at the current time step, the output of the previous time step, and the current contents of the memory cell.
The two outputs are: the prediction for the current step and the input passed to the next time step.
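In the standard formulation:

$$ o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t) $$

The same $h_t$ serves both as the basis of the current step's prediction and as the recurrent input to the next time step, which matches the two output directions described above.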
How Do LSTMs Solve the Long-Range Dependency Problem?
Compared with a simple RNN, an LSTM does not rely only on the rapidly changing hidden state to make its prediction; it also consults the information stored in the memory cell C.
For example, consider an input with a long-range dependency to be predicted:
I grew up in France… I speak fluent ().
When the LSTM reads "France", it stores the information about France at a particular position in the memory cell. As later time steps go by, this information is diluted by the forget gate's multiplication, but note that the dilution is very weak; if the memory were washed out too aggressively, the model would behave much like a simple RNN. (One might ask: how strong should this washing-out be? Answer: the LSTM learns it on its own.) When the LSTM reads "fluent", it combines this with the France information in the memory cell and predicts "French".
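A rough way to see why this dilution can stay weak: if we ignore the gates' indirect dependence on $C_{t-1}$ through $h_{t-1}$, the cell-state path contributes

$$ \frac{\partial C_t}{\partial C_{t-1}} \approx \operatorname{diag}(f_t) $$

so information (and gradients) flowing along this path are scaled only by the learned forget-gate values, instead of being repeatedly pushed through a weight matrix and a nonlinearity as in a simple RNN. This is only an approximation, offered here for intuition.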
What Is a Peephole?
In 2000, Gers & Schmidhuber proposed some variants of the LSTM. The peephole connection, shown in the figure, lets the three gates also make use of the information in the memory cell, which makes the model more powerful.
The figure below shows the same structure as drawn by Prof. Hung-yi Lee; it is exactly the same.
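Concretely, in the peephole variant the gates also take the cell state as an input (this follows the version described in the Colah post; papers differ on which gates actually get peepholes):

$$ f_t = \sigma\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right) $$
$$ i_t = \sigma\left(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i\right) $$
$$ o_t = \sigma\left(W_o \cdot [C_t, h_{t-1}, x_t] + b_o\right) $$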
PyTorch demo

# https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html?highlight=lstm
# tensorboard --logdir=runs/lstm --host=127.0.0.1
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)


class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores


def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


training_data = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6

model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training
# Note that element i,j of the output is the score for tag j for word i.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('../runs/lstm')
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    writer.add_graph(model, inputs)
    writer.close()
    print(tag_scores)

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

# See what the scores are after training
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

    # The sentence is "the dog ate the apple". i,j corresponds to score for tag j
    # for word i. The predicted tag is the maximum scoring tag.
    # Here, we can see the predicted sequence below is 0 1 2 0 1
    # since 0 is index of the maximum value of row 1,
    # 1 is the index of maximum value of row 2, etc.
    # Which is DET NOUN VERB DET NOUN, the correct sequence!
    print(tag_scores)
Multi-Layer LSTM
Like a simple RNN, LSTMs can be stacked into multiple layers and can also run bidirectionally, as shown in the sketch below.
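A minimal sketch with PyTorch's nn.LSTM (the sizes below are arbitrary and only for illustration): stacking is controlled by num_layers, and bidirectional=True makes the LSTM bidirectional.

import torch
import torch.nn as nn

# 2-layer bidirectional LSTM: 10 input features, hidden size 20 per direction.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40]) -> forward and backward states concatenated
print(h_n.shape)     # torch.Size([4, 3, 20]) -> num_layers * num_directions

The last dimension of output concatenates the forward and backward hidden states at every time step.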
References
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://www.bilibili.com/video/BV1JE411g7XF?p=20