CSC321: Neural Network Language Models, RNN and LSTM

Reposted article, 2015-12-16 20:18, Machine Learning

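Two main topics

Probabilistic modeling
The neural network model tries to predict a probability distribution.
Using cross-entropy as the error function lets us assign high probability to the observed data,
and it also addresses the saturation problem.
The dimensionality-reduction role of the linear hidden layer mentioned earlier (fewer parameters to train)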

This is an early version of a neural network language model.

Which loss function should we choose? Why cross-entropy rather than squared loss?

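First, cross-entropy reflects much more strongly, in numerical terms, the real difference between two predicted probabilities such as 0.01 and 0.0001.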
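The difference can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the target is 1 and the model assigns it probability p; the function names CrossEntropy and SquaredError are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Cross-entropy loss when the target is 1 and the model gives it probability p.
double CrossEntropy(double p) {
  assert(p > 0.0 && p <= 1.0);
  return -std::log(p);
}

// Squared error for the same prediction (target 1).
double SquaredError(double p) {
  return (1.0 - p) * (1.0 - p);
}
```

For p = 0.01 and p = 0.0001 the squared error is about 0.980 and 1.000, which are nearly indistinguishable, while the cross-entropy is about 4.61 and 9.21.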

The other point is saturation.

Consider the output y of a sigmoid activation unit.

Consider how the cost relates to z, the input to the sigmoid.
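The saturation effect shows up in the gradient with respect to z. The sketch below assumes y = sigmoid(z), a squared error of 0.5 * (y - t)^2 and the usual binary cross-entropy; the function names are illustrative and do not come from the original article.

```cpp
#include <cassert>
#include <cmath>

double Sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// d/dz of 0.5 * (y - t)^2 with y = Sigmoid(z): the factor y * (1 - y)
// shrinks towards 0 when the unit saturates.
double SquaredErrorGradZ(double z, double t) {
  double y = Sigmoid(z);
  return (y - t) * y * (1.0 - y);
}

// d/dz of -t * log(y) - (1 - t) * log(1 - y) with y = Sigmoid(z):
// the y * (1 - y) factor cancels, leaving y - t.
double CrossEntropyGradZ(double z, double t) {
  assert(t >= 0.0 && t <= 1.0);
  return Sigmoid(z) - t;
}
```

At z = -10 with target t = 1 the unit is confidently wrong. The squared-error gradient is about -4.5e-5, so learning stalls, while the cross-entropy gradient is about -1.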

Cross-entropy is convex as a function of z, so viewed in terms of z it has no local optima, only a global optimum, which makes it easier to optimize.

Linear hidden units

This is equivalent to a lookup in an embedding matrix; R is the embedding matrix, also called the lookup table.
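The equivalence can be sketched directly: multiplying a one-hot vector by R gives the same result as reading one row of R. The Matrix alias and both function names below are illustrative assumptions, with R stored as one row per vocabulary word.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // R: one row per vocabulary word

// Multiply a one-hot row vector by R the slow way.
std::vector<double> OneHotTimesR(const Matrix& R, std::size_t word) {
  assert(word < R.size());
  std::vector<double> one_hot(R.size(), 0.0);
  one_hot[word] = 1.0;
  std::vector<double> out(R[0].size(), 0.0);
  for (std::size_t i = 0; i < R.size(); ++i)
    for (std::size_t j = 0; j < R[i].size(); ++j)
      out[j] += one_hot[i] * R[i][j];
  return out;
}

// The same result as a table lookup: just return row `word`.
const std::vector<double>& Lookup(const Matrix& R, std::size_t word) {
  assert(word < R.size());
  return R[word];
}
```

Both functions return the same vector for any valid word index, which is why the linear hidden layer is never implemented as an actual matrix product.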
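So how does this reduce dimensionality, and how does it reduce the number of parameters to train?

Dimensionality reduction through the embedding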
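A rough parameter count illustrates the saving. This sketch assumes one embedding table shared by all context positions, ignores biases and the output layer, and uses hypothetical sizes; none of these numbers come from the original article.

```cpp
#include <cassert>

// Weights from a one-hot context of `context` words straight into `hidden` units.
long long DirectParams(long long vocab, long long context, long long hidden) {
  assert(vocab > 0 && context > 0 && hidden > 0);
  return context * vocab * hidden;
}

// Weights when every context word first goes through a shared vocab x dim table R.
long long EmbeddingParams(long long vocab, long long context, long long dim,
                          long long hidden) {
  assert(vocab > 0 && context > 0 && dim > 0 && hidden > 0);
  return vocab * dim + context * dim * hidden;
}
```

With a hypothetical vocabulary of 10000 words, 3 context words, 50 embedding dimensions and 200 hidden units, DirectParams gives 6000000 weights while EmbeddingParams gives 530000.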
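Limitations of the current neural network language model

This language model is essentially the continuous bag-of-words (CBOW) model, the counterpart of word2vec's skip-gram model.
Skip-gram predicts the surrounding words from a single word, whereas CBOW predicts the center word from its surrounding words; a language model specifically predicts the current word from the previous few words.
This means we are limited to something like an n-gram model that follows the Markov assumption, and cannot use any information beyond the previous few words.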

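However, long-range context is sometimes meaningful as well, for example:

RNN model
An RNN can address this problem because it can learn long-range dependencies.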

This is a simple RNN example that sums its inputs.
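A summing RNN of this kind can be sketched with a single linear hidden unit. The function name RunLinearRnn and the weight names w_hh and w_xh are assumptions for illustration; with both weights equal to 1 the hidden state is the running sum of the inputs.

```cpp
#include <cassert>
#include <vector>

// A one-unit linear RNN: h_t = w_hh * h_{t-1} + w_xh * x_t, starting from h_0 = 0.
// Returns the hidden state after each input.
std::vector<double> RunLinearRnn(const std::vector<double>& xs,
                                 double w_hh = 1.0, double w_xh = 1.0) {
  std::vector<double> hs;
  hs.reserve(xs.size());
  double h = 0.0;
  for (double x : xs) {
    h = w_hh * h + w_xh * x;  // the previous state carries the sum so far
    hs.push_back(h);
  }
  assert(hs.size() == xs.size());
  return hs;
}
```

For the inputs 2, -0.5, 1, 1 the hidden states are 2, 1.5, 2.5, 3.5, so the final state holds the total.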
Training an RNN
Training uses the same backprop algorithm as for the ordinary neural networks seen earlier; there are just two new issues here:

Weight constraints
Exploding and vanishing gradients

5.1 Weight constraints

That is, the weights of each unit are constrained to be the same at every time step.

An example with a hidden-to-hidden weight
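One consequence of the constraint is that the gradient of a shared weight is the sum of its contributions at every time step. The sketch below assumes a scalar linear RNN h_t = w * h_{t-1} + x_t with h_0 = 0 and a toy loss L = h_T; this setup and the function names are illustrative, not the article's own example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward pass of h_t = w * h_{t-1} + x_t with h_0 = 0; returns h_0..h_T.
std::vector<double> Forward(double w, const std::vector<double>& xs) {
  std::vector<double> hs(1, 0.0);
  for (double x : xs) hs.push_back(w * hs.back() + x);
  return hs;
}

// Gradient of the loss L = h_T with respect to the single shared weight w.
// Every time step uses the same w, so the per-step contributions are summed.
double GradSharedWeight(double w, const std::vector<double>& xs) {
  assert(!xs.empty());
  std::vector<double> hs = Forward(w, xs);
  double grad_h = 1.0;  // dL/dh_T
  double grad_w = 0.0;
  for (std::size_t t = xs.size(); t >= 1; --t) {
    grad_w += grad_h * hs[t - 1];  // contribution of the copy of w used at step t
    grad_h *= w;                   // dL/dh_{t-1} = w * dL/dh_t
  }
  return grad_w;
}
```

With w = 0.5 and inputs 1, 1, 1 the loss is w^2 + w + 1, whose derivative 2w + 1 equals 2, and the summed backward pass returns the same value.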

For a concrete rnnlm implementation, see http://www.cnblogs.com/rocketfan/p/4953290.html, which has figures and an introduction to rnnlm.

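5.2 Exploding and vanishing gradients

The real problem is not backprop itself but that long-range dependencies are very complex; exploding and vanishing gradients arise easily as factors are passed along and compounded during backprop.

Repeatedly propagating a gradient factor greater than 1 leads to exploding gradients, and repeatedly propagating a factor less than 1 leads to vanishing gradients.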
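The compounding effect is easy to see numerically. This sketch assumes, for simplicity, that the same scalar factor is applied at every time step; the function name BackpropFactor is hypothetical.

```cpp
#include <cassert>

// Multiply the same per-step factor into a gradient `steps` times,
// as happens when backprop passes through `steps` time steps.
double BackpropFactor(double per_step, int steps) {
  assert(steps >= 0);
  double g = 1.0;
  for (int i = 0; i < steps; ++i) g *= per_step;
  return g;
}
```

Over 50 steps a factor of 1.1 grows to about 117, while a factor of 0.9 shrinks to about 0.005.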

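Approaches to exploding and vanishing gradients in RNNs:

LSTM
Reversing the input or output sequence, so that the network sees the short-range dependencies first and then tries to learn the difficult long-range ones
Gradient clipping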

rnnlm and faster-rnnlm use the third approach: they forcibly clip gradients to avoid gradient explosion.
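Gradient clipping can be as simple as clamping each component to a fixed range. This is a minimal sketch of element-wise clipping; the function name, the element-wise form and the limit used in the usage note are assumptions, and it is not taken from the rnnlm or faster-rnnlm source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Clamp every gradient component to [-limit, limit], in place.
void ClipGradients(std::vector<double>& grads, double limit) {
  assert(limit > 0.0);
  for (double& g : grads) g = std::max(-limit, std::min(limit, g));
}
```

With a hypothetical limit of 15, the gradients 0.5, -40 and 1e6 become 0.5, -15 and 15. Small gradients are untouched, so clipping guards against explosion but does nothing for vanishing gradients.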

LSTM

LSTM addresses this problem by replacing the single unit with a somewhat more complex memory cell.

TensorFlow's LSTM example:

https://github.com/jikexueyuanwiki/tensorflow-zh/blob/master/SOURCE/tutorials/recurrent/index.md

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

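The post above notes that over short distances an RNN can learn from the history, but over long distances it is powerless.

Short-range example: predicting "sky"

Long-range example: predicting "French"

The figure below shows a plain RNN very clearly: the values pass through a single simple nonlinear unit such as sigmoid or tanh.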

struct SigmoidActivation : public IActivation {
  // Apply the logistic sigmoid e^x / (1 + e^x) to each element, in place.
  void Forward(Real* hidden, int size) {
    for (int i = 0; i < size; i++) {
      hidden[i] = exp(hidden[i]) / (1 + exp(hidden[i]));
    }
  }
};

Source: <http://www.cnblogs.com/rocketfan/p/4953290.html>

LSTM turns the single neural network layer into four.

The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.

Source: <http://colah.github.io/posts/2015-08-Understanding-LSTMs/>

An LSTM can remove information from the cell state or add information to it, as controlled by its gates.

Step 1: discard information

forget gate layer

It looks at h_{t−1} and x_t, and outputs a number between 0 and 1 for each number in the cell state C_{t−1}. A 1 represents "completely keep this" while a 0 represents "completely get rid of this."

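By combining the previous step's output with the current input, the gate outputs a value between 0 and 1 (via a sigmoid); 1 means keep everything and 0 means discard everything.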
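In the notation of the quoted post, the forget gate can be written as a single equation. Here W_f and b_f stand for the gate's weight matrix and bias, symbols that do not appear in the text above, and [h_{t-1}, x_t] is the concatenation of the previous output and the current input.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```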

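Take a language model as an example: it uses the gender of the current subject to decide whether the current pronoun should be "she" or "he".

When we encounter a new subject, we need to forget the gender of the previous subject.

Step 2: decide what information to store

In the language model example, when we encounter a new subject we add the gender of the current subject.

This takes two transformations: a sigmoid input gate layer plus a tanh layer.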
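Written out in the notation of the quoted post, the two transformations are the input gate i_t and the candidate values \tilde{C}_t. The weights W_i, W_C and biases b_i, b_C are symbols introduced here for the two layers.

```latex
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)
\qquad
\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
```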

Step 3: combine the two previous steps, discarding the old gender information and adding the current gender information.
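As an equation, the cell state update combines the two pieces element-wise. Here f_t is the forget gate output from step 1, i_t is the input gate output, \tilde{C}_t is the tanh layer's candidate values from step 2, and * denotes element-wise multiplication.

```latex
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
```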

Step 4: finally, the output

Finally, we need to decide what we're going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we're going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in case that's what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that's what follows next.

Source: <http://colah.github.io/posts/2015-08-Understanding-LSTMs/>
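The four steps can be put together in one small function. This is a sketch under simplifying assumptions: a single LSTM unit with scalar state and input, and struct and function names (LstmState, GateParams, LstmParams, LstmStep) invented for illustration; real implementations work on vectors and weight matrices.

```cpp
#include <cassert>
#include <cmath>

struct LstmState { double c; double h; };  // cell state C_t and output h_t

// Weights of one layer acting on the scalar pair [h_{t-1}, x_t].
struct GateParams { double w_h; double w_x; double b; };

struct LstmParams { GateParams forget, input, candidate, output; };

double Sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

double PreActivation(const GateParams& p, double h_prev, double x) {
  return p.w_h * h_prev + p.w_x * x + p.b;
}

// One time step of a single-unit LSTM, following the four steps in the text.
LstmState LstmStep(const LstmParams& p, const LstmState& prev, double x) {
  // Step 1: forget gate decides how much of the old cell state to keep.
  double f = Sigmoid(PreActivation(p.forget, prev.h, x));
  // Step 2: input gate and tanh layer decide what new information to store.
  double i = Sigmoid(PreActivation(p.input, prev.h, x));
  double c_tilde = std::tanh(PreActivation(p.candidate, prev.h, x));
  // Step 3: drop the old information and add the new.
  double c = f * prev.c + i * c_tilde;
  // Step 4: output a filtered version of the cell state.
  double o = Sigmoid(PreActivation(p.output, prev.h, x));
  assert(f >= 0.0 && f <= 1.0 && i >= 0.0 && i <= 1.0 && o >= 0.0 && o <= 1.0);
  return LstmState{c, o * std::tanh(c)};
}
```

With a large positive forget bias and a large negative input bias the cell state passes through unchanged, which is how the cell can carry information across many time steps.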
