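We have finally reached the last big boss: convolutional neural networks~
Here I want to focus mainly on the code. I plan to write up the theory behind CNNs properly some other time. The key to the CNN code is adding the convolution layer and the pooling layer.
1. Convolution layer
The forward pass of a convolution layer is fairly easy; what we mainly care about is the backward pass. The figure below shows the idea: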
import numpy as np

def conv_forward_naive(x, w, b, conv_param):
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    # zero-pad the two spatial dimensions only
    x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    # integer division, so the sizes can be used as array shapes
    H_new = 1 + (H + 2 * pad - HH) // stride
    W_new = 1 + (W + 2 * pad - WW) // stride
    s = stride
    out = np.zeros((N, F, H_new, W_new))

    for i in range(N):                  # ith image
        for f in range(F):              # fth filter
            for j in range(H_new):
                for k in range(W_new):
                    # elementwise product of the window and the filter, then sum
                    out[i, f, j, k] = np.sum(x_padded[i, :, j*s:HH+j*s, k*s:WW+k*s] * w[f]) + b[f]

    cache = (x, w, b, conv_param)
    return out, cache

def conv_backward_naive(dout, cache):
    x, w, b, conv_param = cache
    pad = conv_param['pad']
    stride = conv_param['stride']
    F, C, HH, WW = w.shape
    N, C, H, W = x.shape
    H_new = 1 + (H + 2 * pad - HH) // stride
    W_new = 1 + (W + 2 * pad - WW) // stride

    dx = np.zeros_like(x)
    dw = np.zeros_like(w)
    db = np.zeros_like(b)

    s = stride
    x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')
    dx_padded = np.pad(dx, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')

    for i in range(N):                  # ith image
        for f in range(F):              # fth filter
            for j in range(H_new):
                for k in range(W_new):
                    window = x_padded[i, :, j*s:HH+j*s, k*s:WW+k*s]
                    db[f] += dout[i, f, j, k]
                    dw[f] += window * dout[i, f, j, k]
                    # the key is the "+=": windows overlap, so their contributions accumulate
                    dx_padded[i, :, j*s:HH+j*s, k*s:WW+k*s] += w[f] * dout[i, f, j, k]

    # Unpad
    dx = dx_padded[:, :, pad:pad+H, pad:pad+W]
    return dx, dw, db
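As described in http://www.cnblogs.com/tornadomeet/p/3468450.html, this is how backpropagation through a convolution layer is computed: the backward pass is itself a proper convolution operation.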
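A quick way to convince yourself that the accumulation rule for dx is right is a numerical gradient check. The sketch below is my own compact re-implementation for illustration, not the assignment's code: `conv_fwd`, `conv_bwd_dx` and `numeric_grad` are hypothetical helper names, and the shapes, stride and padding are arbitrary choices.

```python
import numpy as np

def conv_fwd(x, w, b, stride, pad):
    """Naive convolution forward pass (hypothetical helper, loops over output pixels)."""
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    H_new = 1 + (H + 2 * pad - HH) // stride
    W_new = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, H_new, W_new))
    for j in range(H_new):
        for k in range(W_new):
            win = xp[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW]  # (N, C, HH, WW)
            # contract over C, HH, WW for all images and filters at once -> (N, F)
            out[:, :, j, k] = np.tensordot(win, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out

def conv_bwd_dx(dout, x, w, stride, pad):
    """Gradient w.r.t. x: add w * dout into every window that produced an output."""
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    _, _, H_new, W_new = dout.shape
    dxp = np.zeros((N, C, H + 2 * pad, W + 2 * pad))
    for j in range(H_new):
        for k in range(W_new):
            # (N, F) contracted with (F, C, HH, WW) -> (N, C, HH, WW)
            contrib = np.tensordot(dout[:, :, j, k], w, axes=([1], [0]))
            dxp[:, :, j*stride:j*stride+HH, k*stride:k*stride+WW] += contrib
    return dxp[:, :, pad:pad+H, pad:pad+W]  # drop the padding again

def numeric_grad(f, x, dout, h=1e-5):
    """Central-difference gradient of sum(f(x) * dout) with respect to x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        pos = f(x)
        x[idx] = old - h
        neg = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = np.sum((pos - neg) * dout) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5, 5))
w = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
stride, pad = 2, 1

out = conv_fwd(x, w, b, stride, pad)
dout = rng.standard_normal(out.shape)
dx = conv_bwd_dx(dout, x, w, stride, pad)
dx_num = numeric_grad(lambda v: conv_fwd(v, w, b, stride, pad), x, dout)
max_err = np.max(np.abs(dx - dx_num))
print("max abs difference:", max_err)
```

Because the convolution is linear in x, the analytic and numerical gradients should agree up to floating-point round-off.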
2. Pooling layer
def max_pool_forward_naive(x, pool_param):
    HH, WW = pool_param['pool_height'], pool_param['pool_width']
    s = pool_param['stride']
    N, C, H, W = x.shape
    # integer division, so the sizes can be used as array shapes
    H_new = 1 + (H - HH) // s
    W_new = 1 + (W - WW) // s
    out = np.zeros((N, C, H_new, W_new))
    for i in range(N):
        for j in range(C):
            for k in range(H_new):
                for l in range(W_new):
                    window = x[i, j, k*s:HH+k*s, l*s:WW+l*s]
                    out[i, j, k, l] = np.max(window)

    cache = (x, pool_param)
    return out, cache

def max_pool_backward_naive(dout, cache):
    x, pool_param = cache
    HH, WW = pool_param['pool_height'], pool_param['pool_width']
    s = pool_param['stride']
    N, C, H, W = x.shape
    H_new = 1 + (H - HH) // s
    W_new = 1 + (W - WW) // s
    dx = np.zeros_like(x)
    for i in range(N):
        for j in range(C):
            for k in range(H_new):
                for l in range(W_new):
                    window = x[i, j, k*s:HH+k*s, l*s:WW+l*s]
                    # recover the forward max; (window == m) then marks its position
                    m = np.max(window)
                    # "+=" rather than "=" so overlapping windows (stride < pool size) accumulate
                    dx[i, j, k*s:HH+k*s, l*s:WW+l*s] += (window == m) * dout[i, j, k, l]

    return dx
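To see how the max-pooling backward pass routes gradients, here is a tiny hand-checkable sketch. `max_pool_backward_sketch` is a hypothetical helper of my own, and the 3x3 input with a 2x2 window and stride 1 is chosen so that the windows overlap and every window has its maximum at the same cell.

```python
import numpy as np

def max_pool_backward_sketch(x, dout, size, stride):
    """Route each upstream gradient to the max position(s) of its window."""
    _, _, H_new, W_new = dout.shape
    dx = np.zeros_like(x)
    for k in range(H_new):
        for l in range(W_new):
            rows = slice(k * stride, k * stride + size)
            cols = slice(l * stride, l * stride + size)
            window = x[:, :, rows, cols]                   # (N, C, size, size)
            m = window.max(axis=(2, 3), keepdims=True)     # (N, C, 1, 1)
            mask = (window == m)                           # True where the max sits
            # accumulate, because overlapping windows can share a max position
            dx[:, :, rows, cols] += mask * dout[:, :, k:k+1, l:l+1]
    return dx

x = np.array([[[[1., 2., 3.],
                [4., 9., 5.],
                [6., 7., 8.]]]])       # shape (1, 1, 3, 3)
dout = np.ones((1, 1, 2, 2))           # 2x2 window, stride 1 -> 2x2 output

dx = max_pool_backward_sketch(x, dout, size=2, stride=1)
print(dx[0, 0])  # the centre cell (value 9) collects the gradient of all four windows
```

The value 9 is the maximum of all four windows, so the centre of dx ends up as 4 and everything else stays 0. With a plain assignment instead of accumulation, later windows would overwrite earlier ones and the centre would only receive 1.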
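3. Differences from before
The batch normalization (BN) algorithm here differs slightly from the earlier version, because the input to the layer is now spatial: it has shape (N, C, H, W) instead of (N, D).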
# batchnorm_forward / batchnorm_backward are the earlier (non-spatial) versions

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    N, C, H, W = x.shape
    # one column per channel, so the earlier batchnorm code can be reused directly
    x_new = x.transpose(0, 2, 3, 1).reshape(N*H*W, C)
    out, cache = batchnorm_forward(x_new, gamma, beta, bn_param)
    # restore the original (N, C, H, W) layout
    out = out.reshape(N, H, W, C).transpose(0, 3, 1, 2)

    return out, cache

def spatial_batchnorm_backward(dout, cache):
    N, C, H, W = dout.shape
    dout_new = dout.transpose(0, 2, 3, 1).reshape(N*H*W, C)
    dx, dgamma, dbeta = batchnorm_backward(dout_new, cache)
    dx = dx.reshape(N, H, W, C).transpose(0, 3, 1, 2)

    return dx, dgamma, dbeta
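The transpose-and-reshape trick can be checked in isolation. In the sketch below, `normalize_rows` is a simplified stand-in of my own for the training-mode batchnorm forward pass (gamma = 1, beta = 0, no running averages); it is an assumption for illustration, not the assignment's `batchnorm_forward`.

```python
import numpy as np

def normalize_rows(x2d, eps=1e-5):
    """Simplified stand-in for batchnorm: normalize each column of an (M, C) array."""
    mean = x2d.mean(axis=0)
    var = x2d.var(axis=0)
    return (x2d - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
N, C, H, W = 4, 3, 5, 5
x = 10.0 + 4.0 * rng.standard_normal((N, C, H, W))

# move channels last, then flatten N, H, W into one axis: each column is one channel
x2d = x.transpose(0, 2, 3, 1).reshape(N * H * W, C)
out = normalize_rows(x2d).reshape(N, H, W, C).transpose(0, 3, 1, 2)

# statistics taken over N, H, W for each channel
channel_mean = out.mean(axis=(0, 2, 3))
channel_std = out.std(axis=(0, 2, 3))

# the same normalization written directly on the 4-D array, for comparison
direct = (x - x.mean(axis=(0, 2, 3), keepdims=True)) / np.sqrt(
    x.var(axis=(0, 2, 3), keepdims=True) + 1e-5)

print(channel_mean, channel_std)
```

Each channel should come out with mean close to 0 and standard deviation close to 1, and `out` should match `direct`, which shows that the reshape really does compute the statistics per channel over all images and spatial positions.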
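4. Summary
assignment2 is finally done. Overall, I still need to get more familiar with numpy and with the specific operations it offers. The forward pass of the convolution layer is easy to understand, and the backward pass is not very different from the earlier layers, except that it requires a convolution operation.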