Image Captioning in Python: LSTM


The previous post covered how RNNs work and how they apply to image captioning. This post covers the LSTM, a variant of the RNN.

To understand why the LSTM exists, first look at what goes wrong with a plain RNN. Because of its activation function and its structure, an RNN suffers from vanishing gradients, which means:

(1) The network cannot be unrolled very deep; otherwise the gradients reaching the deeper layers are essentially negligible, so those layers contribute little and only add training time.

(2) It only forms short-term memory, not long-term memory. Because the gradient shrinks layer by layer, only nearby layers receive gradients of comparable size, so the network remembers recent information well and distant information poorly.

Now let's look at how the LSTM solves this problem.

All RNNs have the form of a chain of repeating neural-network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.

It is precisely because the derivative of tanh is always less than 1 that the gradient keeps getting smaller.
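
Concretely, backpropagating through one RNN step multiplies the gradient by a factor involving the tanh derivative,

$$
\tanh'(x) = 1 - \tanh^2(x) \le 1,
$$

so over $T$ timesteps the gradient picks up $T$ such factors (together with $W_h$), and it tends to decay roughly geometrically with distance.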

 

The LSTM overcomes the gradient problem with a more elaborate computation at each timestep: where an RNN updates its state with a single computation, the LSTM uses four, the so-called "four gates".

I used to call h the state, but in this new structure that name no longer quite fits. A new variable, C, is the state in the true sense; most references call it the cell state. As for h, I'm no longer sure what to call it. It does carry state information, but it is in a sense optional: its value can be computed entirely from C, so nothing would break if it were not passed along explicitly.

 

In an RNN, $x_t\in \mathbb{R}^{D}, h_t\in \mathbb{R}^H, W_x\in\mathbb{R}^{H\times D}, W_h\in\mathbb{R}^{H\times H}, b\in\mathbb{R}^{H}$.

In an LSTM, $x_t\in \mathbb{R}^{D}, h_t\in \mathbb{R}^H, W_x\in\mathbb{R}^{4H\times D}, W_h\in\mathbb{R}^{4H\times H}, b\in\mathbb{R}^{4H}$.

The first step is the same as before: compute $a = W_x x_t + W_h h_{t-1} + b$ with $a\in\mathbb{R}^{4H}$. In an RNN, $a$ would simply be passed through the activation and used as the next state; in an LSTM it produces four outputs:

$$
\begin{align*}
i = \sigma(a_i) \hspace{2pc}
f = \sigma(a_f) \hspace{2pc}
o = \sigma(a_o) \hspace{2pc}
g = \tanh(a_g)
\end{align*}
$$

$i$, $f$, $o$, and $g$ are called the input gate, forget gate, output gate, and block (candidate) gate, respectively, with $i,f,o,g\in\mathbb{R}^H$.

$$
c_{t} = f\odot c_{t-1} + i\odot g \hspace{4pc}
h_t = o\odot\tanh(c_t)
$$
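
To make the shapes concrete, here is a tiny NumPy sketch of a single LSTM step for one example, with made-up sizes D = 3 and H = 2. (Note that the implementation later in this post stores the weights transposed, as Wx of shape (D, 4H) and Wh of shape (H, 4H), and works on minibatches.)

import numpy as np

D, H = 3, 2                           # made-up sizes, just for illustration
x_t = np.random.randn(D)
h_prev = np.random.randn(H)
c_prev = np.random.randn(H)
Wx = np.random.randn(4 * H, D)        # same convention as the formulas above
Wh = np.random.randn(4 * H, H)
b = np.random.randn(4 * H)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

a = Wx @ x_t + Wh @ h_prev + b        # a has shape (4H,)
a_i, a_f, a_o, a_g = np.split(a, 4)   # one H-sized slice per gate
i, f, o, g = sigmoid(a_i), sigmoid(a_f), sigmoid(a_o), np.tanh(a_g)

c_t = f * c_prev + i * g              # new cell state
h_t = o * np.tanh(c_t)                # new hidden state
print(c_t.shape, h_t.shape)           # (2,) (2,)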

Now let's interpret these equations:

"Forgetting" can be read as "how much of the previous content to keep". The essence is the sigmoid, which only outputs values in (0, 1), combined with the elementwise multiplication: through training, the LSTM learns what fraction of the past to retain.

The next step is to decide what new information gets written into the cell state. This has two parts. First, a sigmoid layer, the "input gate", decides which values to update. Then a tanh layer, the "block gate", creates a vector of new candidate values that could be added to the state. These two pieces are then combined to produce the update to the state.

Finally, we need to decide what to output. The output is based on the new cell state and the output gate.

With the forget gate and input gate in place, the derivative with respect to h is no longer always less than 1, which overcomes the vanishing-gradient problem. (The contrast between the RNN and LSTM gradients shows up clearly in the code below; you can also derive the gradients by hand and verify that the gradient of h in an LSTM is not always less than 1.)
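
A quick way to see why, at least along the direct path through the cell state (this simplified view ignores how the gates themselves depend on $h_{t-1}$, so it is an intuition rather than the full derivative): since $c_t = f\odot c_{t-1} + i\odot g$,

$$
\frac{\partial c_t}{\partial c_{t-1}} = f,
$$

and $f$ is a learned sigmoid output that can stay close to 1, so the product of these factors over many timesteps does not have to shrink the way the repeated $\tanh'$ factor does in a vanilla RNN.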

Forward pass and backward gradients
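
The TODO comment in the code below refers to "the numerically stable sigmoid implementation above", which is not part of this excerpt. A minimal version (plus the NumPy import the code relies on), so that the snippets here run on their own:

import numpy as np

def sigmoid(x):
    """Numerically stable sigmoid: never exponentiates a large positive number."""
    pos_mask = (x >= 0)
    neg_mask = (x < 0)
    z = np.zeros_like(x, dtype=np.float64)
    z[pos_mask] = np.exp(-x[pos_mask])
    z[neg_mask] = np.exp(x[neg_mask])
    top = np.ones_like(x, dtype=np.float64)
    top[neg_mask] = z[neg_mask]
    return top / (1 + z)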

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """
    Forward pass for a single timestep of an LSTM.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data, of shape (N, D)
    - prev_h: Previous hidden state, of shape (N, H)
    - prev_c: previous cell state, of shape (N, H)
    - Wx: Input-to-hidden weights, of shape (D, 4H)
    - Wh: Hidden-to-hidden weights, of shape (H, 4H)
    - b: Biases, of shape (4H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - next_c: Next cell state, of shape (N, H)
    - cache: Tuple of values needed for backward pass.
    """
    next_h, next_c, cache = None, None, None
    #############################################################################
    # TODO: Implement the forward pass for a single timestep of an LSTM.        #
    # You may want to use the numerically stable sigmoid implementation above.  #
    #############################################################################
    H = Wh.shape[0]
    a = np.dot(x, Wx) + np.dot(prev_h, Wh) + b   # activation vector, shape (N, 4H)
    i = sigmoid(a[:, 0:H])        # input gate
    f = sigmoid(a[:, H:2*H])      # forget gate
    o = sigmoid(a[:, 2*H:3*H])    # output gate
    g = np.tanh(a[:, 3*H:4*H])    # block / candidate gate
    next_c = f * prev_c + i * g   # new cell state
    next_h = o * np.tanh(next_c)  # new hidden state

    cache = (i, f, o, g, x, Wx, Wh, prev_c, prev_h, next_c)
    
    return next_h, next_c, cache
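
A quick shape check of lstm_step_forward on random data (the sizes here are arbitrary):

np.random.seed(0)
N, D, H = 4, 5, 6
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
prev_c = np.random.randn(N, H)
Wx = np.random.randn(D, 4 * H)
Wh = np.random.randn(H, 4 * H)
b = np.random.randn(4 * H)

next_h, next_c, cache = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
print(next_h.shape, next_c.shape)   # (4, 6) (4, 6)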


def lstm_step_backward(dnext_h, dnext_c, cache):
    """
    Backward pass for a single timestep of an LSTM.

    Inputs:
    - dnext_h: Gradients of next hidden state, of shape (N, H)
    - dnext_c: Gradients of next cell state, of shape (N, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dprev_c: Gradient of previous cell state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for a single timestep of an LSTM.       #
    #                                                                           #
    # HINT: For sigmoid and tanh you can compute local derivatives in terms of  #
    # the output value from the nonlinearity.                                   #
    #############################################################################
    
    
    
    i, f, o, g, x, Wx, Wh, prev_c, prev_h, next_c = cache

    # Backprop through next_h = o * tanh(next_c)
    do = dnext_h * np.tanh(next_c)
    # Add the gradient flowing into next_c through next_h; avoid += so the
    # caller's dnext_c array is not modified in place.
    dnext_c = dnext_c + o * (1 - np.tanh(next_c)**2) * dnext_h

    # Backprop through next_c = f * prev_c + i * g
    di = dnext_c * g
    df = dnext_c * prev_c
    dg = dnext_c * i
    dprev_c = dnext_c * f

    # Backprop through the four nonlinearities and stack into da of shape (N, 4H)
    da = np.hstack([i*(1-i)*di, f*(1-f)*df, o*(1-o)*do, (1-g*g)*dg])

    dx = np.dot(da, Wx.T)
    dWx = np.dot(x.T, da)
    dprev_h = np.dot(da, Wh.T)
    dWh = np.dot(prev_h.T, da)
    db = np.sum(da, axis=0)
    
    return dx, dprev_h, dprev_c, dWx, dWh, db
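
To check the backward pass, you can compare it against a centered-difference numerical gradient. The helper below is a simple hand-rolled check (not a cs231n utility), and the example only verifies the gradient with respect to x; the other gradients can be checked the same way:

def numerical_grad(f, x, df, h=1e-5):
    """Centered-difference gradient of sum(f(x) * df) with respect to x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        pos = f(x).copy()
        x[idx] = old - h
        neg = f(x).copy()
        x[idx] = old
        grad[idx] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

np.random.seed(0)
N, D, H = 4, 5, 6
x = np.random.randn(N, D)
prev_h, prev_c = np.random.randn(N, H), np.random.randn(N, H)
Wx, Wh, b = np.random.randn(D, 4 * H), np.random.randn(H, 4 * H), np.random.randn(4 * H)
dnext_h, dnext_c = np.random.randn(N, H), np.random.randn(N, H)

_, _, cache = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
dx, dprev_h, dprev_c, dWx, dWh, db = lstm_step_backward(dnext_h, dnext_c, cache)

# The numerical gradient w.r.t. x must account for both outputs, next_h and next_c.
fh = lambda x_: lstm_step_forward(x_, prev_h, prev_c, Wx, Wh, b)[0]
fc = lambda x_: lstm_step_forward(x_, prev_h, prev_c, Wx, Wh, b)[1]
dx_num = numerical_grad(fh, x, dnext_h) + numerical_grad(fc, x, dnext_c)
print(np.max(np.abs(dx - dx_num)))   # should be very small, on the order of 1e-9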


def lstm_forward(x, h0, Wx, Wh, b):
    """
    Forward pass for an LSTM over an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The LSTM uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the LSTM forward, we return the hidden states for all timesteps.

    Note that the initial cell state is not passed as input; it is simply
    initialized to zero. Also note that the cell state is not returned; it is
    an internal variable to the LSTM and is not accessed from outside.

    Inputs:
    - x: Input data of shape (N, T, D)
    - h0: Initial hidden state of shape (N, H)
    - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
    - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
    - b: Biases of shape (4H,)

    Returns a tuple of:
    - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
    - cache: Values needed for the backward pass.
    """
    h, cache = None, None
    #############################################################################
    # TODO: Implement the forward pass for an LSTM over an entire timeseries.   #
    # You should use the lstm_step_forward function that you just defined.      #
    #############################################################################
    N,T,D=x.shape
    H=h0.shape[1]
    h=np.zeros((N,T,H))
    cache={}
    prev_h=h0
    prev_c=np.zeros((N,H))
    
    for t in range(T):
        xt=x[:,t,:]
        next_h,next_c,cache[t]=lstm_step_forward(xt,prev_h,prev_c,Wx,Wh,b)
        prev_h=next_h
        prev_c=next_c
        h[:,t,:]=prev_h
    
    return h, cache


def lstm_backward(dh, cache):
    """
    Backward pass for an LSTM over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data of shape (N, T, D)
    - dh0: Gradient of initial hidden state of shape (N, H)
    - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for an LSTM over an entire timeseries.  #
    # You should use the lstm_step_backward function that you just defined.     #
    #############################################################################
    N, T, H = dh.shape
    D = cache[0][4].shape[1]

    dprev_h = np.zeros((N, H))
    dprev_c = np.zeros((N, H))
    dx = np.zeros((N, T, D))
    dh0 = np.zeros((N, H))
    dWx= np.zeros((D, 4*H))
    dWh = np.zeros((H, 4*H))
    db = np.zeros((4*H,))

    for t in reversed(range(T)):
        step_cache = cache[t]
        dnext_h = dh[:,t,:] + dprev_h
        dnext_c = dprev_c
        dx[:,t,:], dprev_h, dprev_c, dWxt, dWht, dbt = lstm_step_backward(dnext_h, dnext_c, step_cache)
        dWx, dWh, db = dWx+dWxt, dWh+dWht, db+dbt

    dh0 = dprev_h 
    
    return dx, dh0, dWx, dWh, db
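
Finally, a full round trip over a short random sequence, pushing an arbitrary upstream gradient dh back through the whole unrolled LSTM and checking the output shapes:

np.random.seed(1)
N, T, D, H = 2, 3, 4, 5
x = np.random.randn(N, T, D)
h0 = np.random.randn(N, H)
Wx, Wh = np.random.randn(D, 4 * H), np.random.randn(H, 4 * H)
b = np.random.randn(4 * H)

h, cache = lstm_forward(x, h0, Wx, Wh, b)
dh = np.random.randn(*h.shape)                 # arbitrary upstream gradient
dx, dh0, dWx, dWh, db = lstm_backward(dh, cache)
print(dx.shape, dh0.shape, dWx.shape, dWh.shape, db.shape)
# (2, 3, 4) (2, 5) (4, 20) (5, 20) (20,)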

The rest of the code is almost identical to yesterday's post, and the question left open in yesterday's code should be easy to answer now, haha.

 

References:

https://blog.csdn.net/zhaojc1995/article/details/80572098

http://cs231n.github.io 

 

