Image Captioning in Python: the Vanilla RNN


Introduction to RNNs

A neural network consists of an input layer, hidden layers, and an output layer; the output is shaped by activation functions, and layers are connected by weights. The activation functions are fixed in advance, so whatever the model "learns" through training is stored in the weights.

The biggest difference between an RNN and an ordinary neural network is the notion of time steps and state: the output at a given time step depends on the previous state and the current input, which is what makes RNNs suitable for sequence data.

Unrolled over time:

The letters on the arrows are the weight matrices, i.e. the connections between the layers. x is the input sequence, h is the state at each time step, and o is the output.

As the figure shows, the whole network shares the three matrices W, U and V. This design has two main advantages:

(1) Faster convergence: the fewer the parameters, the easier it is to converge.

(2) The network can be unrolled to any number of steps. Because the weights are just these three fixed matrices, the sequences used for training or testing do not need to have the same length, which is quite flexible.

The figure above is the many-to-many case; RNNs also come in other flexible variants (one-to-many, many-to-one, and so on) that can be chosen to fit the task.

Forward pass:

At time step t:

$$h^{(t)}=f_{w}(W*h^{(t-1)}+U*x^{(t)}+b)$$

where $f_{w}$ is the activation function, usually taken to be tanh (the reason is discussed in the activation-function section below), and b is the bias.

The output at time step t:

$$o^{(t)}=V*h^{(t)}+c$$ 

The model's final predicted output is:

$$y^{(t)}=\sigma(o^{(t)})$$

Since the per-time-step task is ultimately classification, the softmax function is used for the activation $\sigma$.

Arranging all the y in time order gives the final output.
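
To make the three equations concrete, here is a minimal numpy sketch of the forward pass over a single toy sequence (the sizes, variable names and the softmax helper are mine, chosen only for illustration):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

D, H, V, T = 3, 5, 4, 6            # toy sizes: input dim, hidden dim, vocab size, sequence length
rng = np.random.default_rng(0)
W = rng.standard_normal((H, H))    # hidden-to-hidden
U = rng.standard_normal((H, D))    # input-to-hidden
Vm = rng.standard_normal((V, H))   # hidden-to-output ("V" in the equations)
b, c = np.zeros(H), np.zeros(V)

x = rng.standard_normal((T, D))    # one input sequence x^(1) ... x^(T)
h = np.zeros(H)                    # initial state h^(0)
ys = []
for t in range(T):
    h = np.tanh(W @ h + U @ x[t] + b)   # h^(t) = tanh(W h^(t-1) + U x^(t) + b)
    o = Vm @ h + c                      # o^(t) = V h^(t) + c
    ys.append(softmax(o))               # y^(t) = softmax(o^(t))

y = np.stack(ys)                        # shape (T, V): the output at every time step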

 

Backward pass:

The backward pass of an RNN is, at its core, the same chain-rule differentiation as in an ordinary neural network; there is no fundamental difference.

The thing to watch out for is that the weights are shared, so the gradients from every time step must be accumulated, and the parameters are updated once, after the gradient has propagated all the way back to $h_{0}$.

The derivatives themselves are not hard; they only look intimidating once everything is stacked together. Applying the chain rule one step at a time gets you there.

The concrete backward-pass formulas can be read directly from the code; a summary of the single-step gradients follows.
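
For reference, this is what the chain rule gives for a single time step, written in column-vector notation (my summary; it matches rnn_step_backward in the code below). With $a^{(t)}=W h^{(t-1)}+U x^{(t)}+b$ and $h^{(t)}=\tanh(a^{(t)})$, let $\delta^{(t)}=\partial L/\partial h^{(t)}$ be the total gradient arriving at $h^{(t)}$ (from the output at step t plus the gradient carried back from step t+1). Then:

$$\frac{\partial L}{\partial a^{(t)}}=\delta^{(t)}\odot\bigl(1-(h^{(t)})^{2}\bigr)$$

$$\left.\frac{\partial L}{\partial W}\right|_{t}=\frac{\partial L}{\partial a^{(t)}}\,(h^{(t-1)})^{T},\qquad\left.\frac{\partial L}{\partial U}\right|_{t}=\frac{\partial L}{\partial a^{(t)}}\,(x^{(t)})^{T},\qquad\left.\frac{\partial L}{\partial b}\right|_{t}=\frac{\partial L}{\partial a^{(t)}}$$

$$\frac{\partial L}{\partial h^{(t-1)}}=W^{T}\,\frac{\partial L}{\partial a^{(t)}}$$

Because W, U and b are shared across time, their per-step contributions are summed over all t, and the last line is the gradient handed back to step t-1.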

 

Activation functions:

(figure: the sigmoid function and its derivative)

 

(figure: the tanh function and its derivative)

 

The two functions look very similar: both squash the output into a bounded range, and their derivatives are also very close. The derivative of sigmoid lies in (0, 0.25] and the derivative of tanh lies in (0, 1], so neither derivative ever exceeds 1. This causes a problem: the repeated multiplication in the backward pass is a chain of numbers smaller than 1, and the product keeps shrinking. As the sequence gets longer, the accumulated product drives the gradient toward 0, which is the "vanishing gradient" phenomenon. In a deep (or long) network the gradient vanishes during backpropagation; once it vanishes, the parameters of that layer stop being updated, and the hidden layer degenerates into a plain mapping layer, which is useless. Since tanh appears to vanish more slowly than sigmoid, tanh is the activation chosen here.
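
A tiny numerical illustration of this effect (the numbers are made up, and the weight-matrix factors of the full chain rule are ignored; only the chain of activation derivatives is shown):

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(50)              # pre-activations at 50 consecutive time steps

d_tanh = 1 - np.tanh(a) ** 2             # tanh'(a), each factor lies in (0, 1]
s = 1 / (1 + np.exp(-a))
d_sigm = s * (1 - s)                     # sigmoid'(a), each factor lies in (0, 0.25]

print(np.cumprod(d_tanh)[[9, 29, 49]])   # product of the first 10 / 30 / 50 factors
print(np.cumprod(d_sigm)[[9, 29, 49]])   # shrinks toward 0 much faster for sigmoid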

Another drawback of sigmoid is that its output is not zero-centered: every output is greater than 0, so the outputs do not have zero mean. This is called the offset (bias) phenomenon, and it means the next layer receives inputs whose mean is not zero. With inputs symmetric about the origin and zero-centered outputs, the network converges better.

 

Image captioning:

With the background above, we can get to the main topic.

First the input image is run through a CNN to obtain a lower-dimensional feature vector (here the output of VGG16's fc7 layer is used directly), and then the ground-truth caption and the features are fed into the RNN.

In practice it is inconvenient to operate on words directly, so each word is usually mapped to an integer, which turns the problem into classification. The encoding step is handled by the word_embedding_forward function.
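
For example, with a made-up toy vocabulary (only to illustrate the convention that word_to_idx and the caption arrays follow):

word_to_idx = {'<NULL>': 0, '<START>': 1, '<END>': 2, 'a': 3, 'dog': 4, 'runs': 5}

# "a dog runs" becomes a row of integers, wrapped in <START>/<END> and padded with <NULL>
caption = [word_to_idx[w] for w in ['<START>', 'a', 'dog', 'runs', '<END>', '<NULL>']]
print(caption)   # [1, 3, 4, 5, 2, 0]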

That is the idea; the details are in the code below.

 

RNN.py contains a single class that defines the network structure:

import numpy as np

from cs231n.layers import *
from cs231n.rnn_layers import *

class CaptioningRNN(object):
    """
    This class only builds the RNN network structure and defines the computations inside it.
    The actual training loop that updates the weights is implemented elsewhere.
    """

    def __init__(self,word_to_idx,input_dim=512,wordvec_dim=128,
                 hidden_dim=128,cell_type='rnn',dtype=np.float32):
        '''
        - word_to_idx: dictionary storing the mapping from words to integers.
        - input_dim: dimension of a single input feature vector.
        - wordvec_dim: dimension of the word vectors.
        - hidden_dim: size of the hidden state.
        - cell_type: 'rnn' or 'lstm'.
        '''
        if cell_type not in {'rnn','lstm'}:
            raise ValueError('Invalid cell type "%s"' %cell_type)

        self.cell_type=cell_type
        self.dtype=dtype
        self.word_to_idx=word_to_idx
        self.idx_to_word={i:w for w,i in word_to_idx.items()}
        self.params={}

        vocab_size=len(word_to_idx)
        self._null=word_to_idx['<NULL>']
        self._start=word_to_idx.get('<START>',None)
        self._end=word_to_idx.get('<END>',None)

        # Initialize the vector that represents each word
        self.params['W_embed']=np.random.randn(vocab_size,wordvec_dim)
        self.params['W_embed']/=100

        # Projection from the CNN output to the first hidden state of the RNN.
        # Note: the input data are features already extracted by VGG16, so there is no CNN layer here.
        self.params['W_proj']=np.random.randn(input_dim,hidden_dim)
        self.params['W_proj']/=np.sqrt(input_dim)
        self.params['b_proj']=np.zeros(hidden_dim)

        # Initialize the RNN parameters. Every time step shares the same parameters,
        # so they only need to be set once.

        # What does the 4 mean here? See the upcoming LSTM post.
        dim_mul={'lstm':4,'rnn':1}[cell_type]
        self.params['Wx']=np.random.randn(wordvec_dim,dim_mul*hidden_dim)
        self.params['Wx']/=np.sqrt(wordvec_dim)
        self.params['Wh']=np.random.randn(hidden_dim,dim_mul*hidden_dim)
        self.params['Wh']/=np.sqrt(hidden_dim)
        self.params['b']=np.zeros(dim_mul*hidden_dim)

        # Output at every time step: a score for each word in the vocabulary
        self.params['W_vocab']=np.random.randn(hidden_dim,vocab_size)
        self.params['W_vocab']/=np.sqrt(hidden_dim)
        self.params['b_vocab']=np.zeros(vocab_size)

        # Cast everything so that every weight matrix has the dtype we chose
        for k,v in self.params.items():
            self.params[k]=v.astype(self.dtype)

    def loss(self,features,captions):
        """
        Inputs:
        - features: input features, shape (N, D)
        - captions: ground truth; integer array of shape (N, T), each of the N images
          is described by T words

        Returns a tuple of:
        - loss: scalar loss
        - grads: dictionary of gradients parallel to self.params

        Returns the loss and the gradients (during training).
        """

        # The input and output captions differ slightly from the full caption:
        # the input drops the last word and the output drops the first word '<START>'.
        # The model already has to predict the next word while the first one is being
        # fed in, hence the one-word offset.
        captions_in=captions[:,:-1]
        captions_out=captions[:,1:]

        # Captions have different lengths; short ones are padded with NULL up to length T,
        # so NULL tokens are excluded from the loss.
        mask=(captions_out!=self._null)

        W_proj,b_proj=self.params['W_proj'],self.params['b_proj']
        W_embed=self.params['W_embed']
        Wx,Wh,b=self.params['Wx'],self.params['Wh'],self.params['b']
        W_vocab,b_vocab=self.params['W_vocab'],self.params['b_vocab']

        # Forward pass:
        # (1) use an affine transform to turn the CNN features into the first hidden state (N, H)
        # (2) convert captions_in from word indices to word vectors (N, T, W)
        # (3) run the RNN over all hidden states (N, T, H)
        # (4) compute the vocabulary scores at every time step (N, T, V)
        # (5) compute the per-time-step loss with softmax
        # Here W is the word-vector dimension and V is the vocabulary size.
        loss,grads=0.0,{}

        # begin

        affine_out,affine_cache=affine_forward(features,W_proj,b_proj)

        word_embedding_out,word_embedding_cache=word_embedding_forward(captions_in,W_embed)

        if self.cell_type=='rnn':
            rnn_or_lstm_out,rnn_cache=rnn_forward(word_embedding_out,affine_out,Wx,Wh,b)
        elif self.cell_type=='lstm':
            rnn_or_lstm_out,lstm_cache=lstm_forward(word_embedding_out,affine_out,Wx,Wh,b)
        else:
            raise ValueError('Invalid cell type "%s"' %self.cell_type)

        temporal_affine_out,temporal_affine_cache=temporal_affine_forward(rnn_or_lstm_out,W_vocab,b_vocab)

        loss,dtemporal_affine_out=temporal_softmax_loss(temporal_affine_out,captions_out,mask)

        # Backward pass
        drnn_or_lstm_out,grads['W_vocab'],grads['b_vocab']=temporal_affine_backward(dtemporal_affine_out,temporal_affine_cache)

        if self.cell_type=='rnn':
            dword_embedding_out,daffine_out,grads['Wx'],grads['Wh'],grads['b']=rnn_backward(drnn_or_lstm_out,rnn_cache)
        else:
            dword_embedding_out,daffine_out,grads['Wx'],grads['Wh'],grads['b']=lstm_backward(drnn_or_lstm_out,lstm_cache)

        grads['W_embed']=word_embedding_backward(dword_embedding_out,word_embedding_cache)

        dfeatures,grads['W_proj'],grads['b_proj']=affine_backward(daffine_out,affine_cache)

        # end

        return loss,grads


    def sample(self,features,max_len=30):
        '''
        Inference.
        Unlike a CNN, whose test-time computation is the same as at training time, the RNN
        is different: during training every time step has an input, but at test time there
        is no caption to feed in. Fortunately there is <START>: use it as the input at the
        first time step to get the first output, feed that output in as the input of the
        second time step, and so on.
        '''
        N,D=features.shape
        captions=self._null*np.ones((N,max_len),dtype=np.int32)

        W_proj, b_proj = self.params['W_proj'], self.params['b_proj']
        W_embed = self.params['W_embed']
        Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']
        W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']

        affine_out,affine_cache=affine_forward(features,W_proj,b_proj)
        prev_word_idx=[self._start]*N
        prev_h=affine_out
        prev_c=np.zeros(prev_h.shape)
        captions[:,0]=self._start

        for i in range(1,max_len):
            prev_word_embed=W_embed[prev_word_idx]
            if self.cell_type=='rnn':
                next_h,rnn_step_cache=rnn_step_forward(prev_word_embed,prev_h,Wx,Wh,b)
            elif self.cell_type=='lstm':
                pass  # the LSTM step is covered in the next post
            else:
                raise ValueError('Invalid cell_type "%s"' % self.cell_type)
            vocab_affine_out,vocab_affine_cache=affine_forward(next_h,W_vocab,b_vocab)
            captions[:,i]=list(np.argmax(vocab_affine_out,axis=1))
            prev_word_idx=captions[:,i]
            prev_h=next_h

        return captions
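
A quick smoke test of the class with random data might look roughly like this (the toy vocabulary and sizes are my own, and it assumes the layer functions that RNN.py imports from cs231n are available):

import numpy as np

word_to_idx = {'<NULL>': 0, '<START>': 1, '<END>': 2, 'a': 3, 'dog': 4, 'runs': 5}
N, D, T = 2, 512, 5                      # 2 images, 512-d features, captions of length 5

model = CaptioningRNN(word_to_idx, input_dim=D, wordvec_dim=32,
                      hidden_dim=64, cell_type='rnn')

features = np.random.randn(N, D)
captions = np.random.randint(0, len(word_to_idx), size=(N, T))

loss, grads = model.loss(features, captions)    # training-time forward + backward pass
print(loss, sorted(grads.keys()))

sampled = model.sample(features, max_len=10)    # test-time greedy decoding
print(sampled.shape)                            # (2, 10) array of word indices
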
layer.py implements the computations of the individual layers, step by step:

import numpy as np


"""
This file defines layer types that are commonly used for recurrent neural
networks.
"""


def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a tanh
    activation function.


    Inputs:
    - x: Input data for this timestep, of shape (N, D).
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h, cache = None, None
    ##############################################################################
    # TODO: Implement a single forward step for the vanilla RNN. Store the next  #
    # hidden state and any values you need for the backward pass in the next_h   #
    # and cache variables respectively.                                          #
    ##############################################################################
    a=prev_h.dot(Wh)+x.dot(Wx)+b
    next_h=np.tanh(a)
    cache=(x,prev_h,Wh,Wx,b,next_h)

    return next_h, cache


def rnn_step_backward(dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradients of input data, of shape (N, D)
    - dprev_h: Gradients of previous hidden state, of shape (N, H)
    - dWx: Gradients of input-to-hidden weights, of shape (D, H)
    - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
    - db: Gradients of bias vector, of shape (H,)
    """
    dx, dprev_h, dWx, dWh, db = None, None, None, None, None
    ##############################################################################
    # TODO: Implement the backward pass for a single step of a vanilla RNN.      #
    #                                                                            #
    # HINT: For the tanh function, you can compute the local derivative in terms #
    # of the output value from tanh.                                             #
    ##############################################################################
    x,prev_h,Wh,Wx,b,next_h=cache
    da=dnext_h*(1-next_h*next_h)
    dx=da.dot(Wx.T)
    dprev_h=da.dot(Wh.T)
    dWx=x.T.dot(da)
    dWh=prev_h.T.dot(da)
    db=np.sum(da,axis=0)

    return dx, dprev_h, dWx, dWh, db



def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The RNN uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the RNN forward, we return the hidden states for all timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D).
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H).
    - cache: Values needed in the backward pass
    """
    h, cache = None, None
    ##############################################################################
    # TODO: Implement forward pass for a vanilla RNN running on a sequence of    #
    # input data. You should use the rnn_step_forward function that you defined  #
    # above. You can use a for loop to help compute the forward pass.            #
    ##############################################################################
    N,T,D=x.shape
    H=b.shape[0]
    h=np.zeros((N,T,H))
    prev_h=h0
    cache=[]

    for t in range(T):
        xt=x[:,t,:]
        next_h,step_cache=rnn_step_forward(xt,prev_h,Wx,Wh,b)
        cache.append(step_cache)
        h[:,t,:]=next_h
        prev_h=next_h

    return h, cache


def rnn_backward(dh, cache):
    """
    Compute the backward pass for a vanilla RNN over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of all hidden states, of shape (N, T, H)

    Returns a tuple of:
    - dx: Gradient of inputs, of shape (N, T, D)
    - dh0: Gradient of initial hidden state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
    - db: Gradient of biases, of shape (H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    ##############################################################################
    # TODO: Implement the backward pass for a vanilla RNN running an entire      #
    # sequence of data. You should use the rnn_step_backward function that you   #
    # defined above. You can use a for loop to help compute the backward pass.   #
    ##############################################################################
    N,T,H=dh.shape
    D=cache[0][0].shape[1]

    dprev_h=np.zeros((N,H))
    dx=np.zeros((N,T,D))
    dWx=np.zeros((D,H))
    dWh=np.zeros((H,H))
    db=np.zeros((H,))

    for t in reversed(range(T)):
        dx[:,t,:], dprev_h, dWxt, dWht, dbt=rnn_step_backward(dh[:,t,:]+dprev_h,cache[t])
        dWx, dWh, db = dWx+dWxt, dWh+dWht, db+dbt
    dh0=dprev_h

    return dx, dh0, dWx, dWh, db


def word_embedding_forward(x, W):
    """
    Forward pass for word embeddings. We operate on minibatches of size N where
    each sequence has length T. We assume a vocabulary of V words, assigning each
    to a vector of dimension D.

    Inputs:
    - x: Integer array of shape (N, T) giving indices of words. Each element idx
      of x must be in the range 0 <= idx < V.
    - W: Weight matrix of shape (V, D) giving word vectors for all words.

    Returns a tuple of:
    - out: Array of shape (N, T, D) giving word vectors for all input words.
    - cache: Values needed for the backward pass
    """
    out, cache = None, None
    ##############################################################################
    # TODO: Implement the forward pass for word embeddings.                      #
    ##############################################################################
    N,T=x.shape
    V,D=W.shape
    out=np.zeros((N,T,D))

    for i in range(N):
        for j in range(T):
            out[i,j]=W[x[i,j]]   
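    # Note: the double loop above is equivalent to the vectorized lookup out = W[x].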

    cache=(x,W.shape)

    return out, cache


def word_embedding_backward(dout, cache):
    """
    Backward pass for word embeddings. We cannot back-propagate into the words
    since they are integers, so we only return gradient for the word embedding
    matrix.

    HINT: Look up the function np.add.at

    Inputs:
    - dout: Upstream gradients of shape (N, T, D)
    - cache: Values from the forward pass

    Returns:
    - dW: Gradient of word embedding matrix, of shape (V, D).
    """
    dW = None
    ##############################################################################
    # TODO: Implement the backward pass for word embeddings.                     #
    #                                                                            #
    # Note that Words can appear more than once in a sequence.                   #
    # HINT: Look up the function np.add.at                                       #
    ##############################################################################

    x,W_shape=cache
    dW=np.zeros(W_shape)
    np.add.at(dW,x,dout)
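    # np.add.at performs unbuffered, accumulating addition: when the same word index
    # appears several times in x, all of its gradient rows are summed into dW.
    # A plain dW[x] += dout would let repeated indices overwrite each other instead.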
    
    return dW


def sigmoid(x):
    """
    A numerically stable version of the logistic sigmoid function.
    """
    pos_mask = (x >= 0)
    neg_mask = (x < 0)
    z = np.zeros_like(x)
    z[pos_mask] = np.exp(-x[pos_mask])
    z[neg_mask] = np.exp(x[neg_mask])
    top = np.ones_like(x)
    top[neg_mask] = z[neg_mask]
    return top / (1 + z)




def temporal_affine_forward(x, w, b):
    """
    Forward pass for a temporal affine layer. The input is a set of D-dimensional
    vectors arranged into a minibatch of N timeseries, each of length T. We use
    an affine function to transform each of those vectors into a new vector of
    dimension M.

    Inputs:
    - x: Input data of shape (N, T, D)
    - w: Weights of shape (D, M)
    - b: Biases of shape (M,)

    Returns a tuple of:
    - out: Output data of shape (N, T, M)
    - cache: Values needed for the backward pass
    """
    N, T, D = x.shape
    M = b.shape[0]
    out = x.reshape(N * T, D).dot(w).reshape(N, T, M) + b
    cache = x, w, b, out
    return out, cache


def temporal_affine_backward(dout, cache):
    """
    Backward pass for temporal affine layer.

    Input:
    - dout: Upstream gradients of shape (N, T, M)
    - cache: Values from forward pass

    Returns a tuple of:
    - dx: Gradient of input, of shape (N, T, D)
    - dw: Gradient of weights, of shape (D, M)
    - db: Gradient of biases, of shape (M,)
    """
    x, w, b, out = cache
    N, T, D = x.shape
    M = b.shape[0]

    dx = dout.reshape(N * T, M).dot(w.T).reshape(N, T, D)
    dw = dout.reshape(N * T, M).T.dot(x.reshape(N * T, D)).T
    db = dout.sum(axis=(0, 1))

    return dx, dw, db


def temporal_softmax_loss(x, y, mask, verbose=False):
    """
    A temporal version of softmax loss for use in RNNs. We assume that we are
    making predictions over a vocabulary of size V for each timestep of a
    timeseries of length T, over a minibatch of size N. The input x gives scores
    for all vocabulary elements at all timesteps, and y gives the indices of the
    ground-truth element at each timestep. We use a cross-entropy loss at each
    timestep, summing the loss over all timesteps and averaging across the
    minibatch.

    As an additional complication, we may want to ignore the model output at some
    timesteps, since sequences of different length may have been combined into a
    minibatch and padded with NULL tokens. The optional mask argument tells us
    which elements should contribute to the loss.

    Inputs:
    - x: Input scores, of shape (N, T, V)
    - y: Ground-truth indices, of shape (N, T) where each element is in the range
         0 <= y[i, t] < V
    - mask: Boolean array of shape (N, T) where mask[i, t] tells whether or not
      the scores at x[i, t] should contribute to the loss.

    Returns a tuple of:
    - loss: Scalar giving loss
    - dx: Gradient of loss with respect to scores x.
    """

    N, T, V = x.shape

    x_flat = x.reshape(N * T, V)
    y_flat = y.reshape(N * T)
    mask_flat = mask.reshape(N * T)

    probs = np.exp(x_flat - np.max(x_flat, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    loss = -np.sum(mask_flat * np.log(probs[np.arange(N * T), y_flat])) / N
    dx_flat = probs.copy()
    dx_flat[np.arange(N * T), y_flat] -= 1
    dx_flat /= N
    dx_flat *= mask_flat[:, None]

    if verbose: print('dx_flat: ', dx_flat.shape)

    dx = dx_flat.reshape(N, T, V)

    return loss, dx
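
Because these gradients are easy to get subtly wrong, it is worth checking them numerically. A minimal sketch (the num_grad helper and the tolerance are my own, not part of the assignment code):

import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference numerical gradient of the scalar function f with respect to x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h; fp = f()
        x[i] = old - h; fm = f()
        x[i] = old
        grad[i] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

N, D, H = 3, 4, 5
x = np.random.randn(N, D); prev_h = np.random.randn(N, H)
Wx = np.random.randn(D, H); Wh = np.random.randn(H, H); b = np.random.randn(H)
dnext_h = np.random.randn(N, H)

next_h, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
dx, dprev_h, dWx, dWh, db = rnn_step_backward(dnext_h, cache)

# Surrogate scalar loss sum(next_h * dnext_h): its gradient w.r.t. each input is exactly
# what rnn_step_backward should return for the upstream gradient dnext_h.
f = lambda: np.sum(rnn_step_forward(x, prev_h, Wx, Wh, b)[0] * dnext_h)
print(np.max(np.abs(dWx - num_grad(f, Wx))))   # should be tiny, e.g. below 1e-6
print(np.max(np.abs(dx - num_grad(f, x))))

The next listing is the training driver, a CaptioningSolver class modeled on the cs231n Solver:
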
import numpy as np

from cs231n import optim
from cs231n.coco_utils import sample_coco_minibatch

class CaptioningSolver(object):
    """基於RNN結構的訓練過程"""
    def __init__(self,model,data,**kwargs):
        """
        必選參數:
        - model: 符合一定要求的model
        - data: 特定格式的訓練和驗證數據 

        可選參數:
        - update_rule: 可選梯度下降方法,默認sgd
        - optim_config: 梯度下降的參數
        - lr_decay: 學習率衰減參數,各個epoch更新lr_decay=lr*lr_decay.默認不衰減,即1.0
        - batch_size: 批大小
        - num_epochs: 訓練的epoch數量
        - print_every: 幾個epochs打印一次信息,默認一個.
        - verbose: 訓練過程中是否打印信息
        """

        self.model=model
        self.data=data

        self.update_rule=kwargs.pop('update_rule','sgd')
        self.optim_config=kwargs.pop('optim_config',{})
        self.lr_decay=kwargs.pop('lr_decay',1.0)
        self.batch_size=kwargs.pop('batch_size',128)
        self.num_epochs=kwargs.pop('num_epochs',10)
        self.print_every=kwargs.pop('print_every',1)
        self.verbose=kwargs.pop('verbose',True)

        # Raise an error if there are unknown or extra keyword arguments
        if len(kwargs)>0:
            extra=','.join('"%s"' % k for k in kwargs.keys())
            raise ValueError('Unrecognized arguments %s'%extra)

        if not hasattr(optim,self.update_rule):
            raise ValueError('Invalid update rule %s' %self.update_rule)

        self.update_rule=getattr(optim,self.update_rule)

        self._reset()


    def _reset(self):
        # Book-keeping variables tracked during training

        self.epoch=0
        self.best_val_acc=0
        self.best_params={}
        self.loss_history=[]
        self.train_acc_history=[]
        self.val_acc_history=[]

        # Keep a separate copy of the optimizer config for every weight matrix: some
        # update rules such as adam keep running history, and that history differs
        # per parameter matrix, so each one needs its own copy
        self.optim_configs={}
        for p in self.model.params:
            d={k:v for k,v in self.optim_config.items()}
            self.optim_configs[p]=d

    def _step(self):
        '''
        Make a single parameter update (one iteration). Only called by train().
        '''

        # Sample a minibatch of data
        minibatch=sample_coco_minibatch(self.data,batch_size=self.batch_size,split='train')
        captions,features,urls=minibatch

        # Compute the loss and gradients for this batch under the current parameters
        loss,grads=self.model.loss(features,captions)
        self.loss_history.append(loss)

        # Update the parameters
        for p,w in self.model.params.items():
            dw=grads[p]
            config=self.optim_configs[p]
            next_w,next_config=self.update_rule(w,dw,config)
            self.model.params[p]=next_w
            self.optim_configs[p]=next_config


    # Compute prediction accuracy
    def check_accuracy(self,X,y,num_samples=None,batch_size=128):
        N=X.shape[0]
        print(X.shape,y.shape)
        # If num_samples is given, evaluate on a random subset of num_samples examples
        if num_samples is not None and N>num_samples:
            mask=np.random.choice(N,num_samples)
            N=num_samples
            X=X[mask]
            y=y[mask]

        num_batches=N//batch_size
        if N%batch_size!=0:
            num_batches+=1
        y_pred=[]
        for i in range(num_batches):
            start=i*batch_size
            end=(i+1)*batch_size
            y_pred.append(self.model.sample(X[start:end]))
        y_pred=np.concatenate(y_pred,axis=0)
        acc=np.mean(y_pred==y)

        return acc

    def train(self):
        '''
        Run the full training loop.
        '''

        num_train=self.data['train_captions'].shape[0]
        iterations_per_epoch=max(num_train//self.batch_size,1)
        num_iterations=self.num_epochs*iterations_per_epoch

        for t in range(num_iterations):
            self._step()

            epoch_end=(t+1)%iterations_per_epoch==0
            if epoch_end:
                self.epoch+=1
                for k in self.optim_configs:
                    self.optim_configs[k]['learning_rate']*=self.lr_decay

                if self.epoch%self.print_every==0:
                    print ('(epoch %d / %d) loss: %f' % (self.epoch, self.num_epochs, self.loss_history[-1]))

            '''
            first_it=(t==0)
            last_it=(t==num_iterations-1)
            if first_it or last_it or epoch_end:
                train_acc=self.check_accuracy(self.data['train_features'][self.data['train_image_idxs'] ],self.data['train_captions'],num_samples=1280)
                val_acc=  self.check_accuracy(self.data['val_features'][self.data['val_image_idxs'] ],self.data['val_captions'],64)
                self.train_acc_history.append(train_acc)
                self.val_acc_history.append(val_acc)

                # Print progress
                if self.verbose:
                    print ('(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                     self.epoch, self.num_epochs, train_acc, val_acc))

                # Check for and save the best model so far
                if val_acc>self.best_val_acc:
                    self.best_val_acc=val_acc
                    self.best_params={}
                    for k,v in self.model.params.items():
                        self.best_params[k]=v.copy()
                '''
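
Putting everything together, training might be launched roughly like this (a sketch that assumes the cs231n COCO utilities, in particular load_coco_data and sample_coco_minibatch from cs231n.coco_utils, are available; the hyperparameters are just examples):

import numpy as np
from cs231n.coco_utils import load_coco_data, sample_coco_minibatch

data = load_coco_data(pca_features=True)      # dict with captions, features, word_to_idx, ...

model = CaptioningRNN(
    word_to_idx=data['word_to_idx'],
    input_dim=data['train_features'].shape[1],
    wordvec_dim=256,
    hidden_dim=512,
    cell_type='rnn',
)

solver = CaptioningSolver(
    model, data,
    update_rule='adam',
    optim_config={'learning_rate': 5e-3},
    lr_decay=0.95,
    batch_size=128,
    num_epochs=10,
    verbose=True, print_every=1,
)
solver.train()

# Generate captions (as arrays of word indices) for a couple of validation images
captions, features, urls = sample_coco_minibatch(data, batch_size=2, split='val')
print(model.sample(features))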
            

 

 

 

References:
https://blog.csdn.net/zhaojc1995/article/details/80572098
http://cs231n.github.io

 

