I. The Math of the Forward Computation and Backpropagation
This section covers the average pooling layer; the max pooling layer is covered in Section III of this article.
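As a minimal numerical sketch of the average-pooling rule (a toy example; the 2×2 window, stride 2, and 4×4 input are assumptions chosen for illustration): each forward output is the mean of its window, and in the backward pass each input position receives 1/4 of its window's upstream gradient.

import numpy as np

# Toy check of average pooling: 2x2 window, stride 2, 4x4 input
x = np.arange(16, dtype=np.float32).reshape(4, 4)

# Forward: each output element is the mean of its 2x2 block
out = x.reshape(2, 2, 2, 2).mean(axis=(1, 3))

# Backward: every input position gets 1/4 of its window's upstream gradient
dout = np.ones((2, 2), dtype=np.float32)
dx = np.kron(dout, np.ones((2, 2), dtype=np.float32)) / 4.0
print(out)
print(dx)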
II. Test Code
The data is identical to the example above; print the results yourself to verify them against the hand computation.
1. Forward Propagation
import tensorflow as tf
import numpy as np

# The input tensor is a 3x3 matrix with a single channel
M = np.array([
    [[1], [-1], [0]],
    [[-1], [2], [1]],
    [[0], [2], [-2]]
])

# Define the kernel weights and bias. The weight shape shows that we define
# a single 2x2x1 kernel.
filter_weight = tf.get_variable('weights', [2, 2, 1, 1],
                                initializer=tf.constant_initializer([[1, -1], [0, 2]]))
biases = tf.get_variable('biases', [1], initializer=tf.constant_initializer(1))

# Reshape the input into the NHWC format TensorFlow expects
M = np.asarray(M, dtype='float32')
M = M.reshape(1, 3, 3, 1)

# Run the input tensor through the kernel and the pooling filter
x = tf.placeholder('float32', [1, None, None, 1])
# Convolution with SAME padding and stride 2; the last dimension of
# filter_weight determines the number of kernels
conv = tf.nn.conv2d(x, filter_weight, strides=[1, 2, 2, 1], padding='SAME')
bias = tf.nn.bias_add(conv, biases)
# Average pooling with stride 2 and VALID padding (i.e. no padding)
pool = tf.nn.avg_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

# Run the graph
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    convoluted_M = sess.run(bias, feed_dict={x: M})
    pooled_M = sess.run(pool, feed_dict={x: M})
    print("convoluted_M: \n", convoluted_M)
    print("pooled_M: \n", pooled_M)
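To see where the printed numbers come from, here is a hand check in plain NumPy (a sketch; it relies on SAME padding placing the extra zero row/column on the bottom/right, which is what TensorFlow does when the total padding is odd):

import numpy as np

# Hand check of the values the script above should print
M = np.array([[1, -1, 0], [-1, 2, 1], [0, 2, -2]], dtype=np.float32)
W = np.array([[1, -1], [0, 2]], dtype=np.float32)

# SAME padding with stride 2 adds one zero row/column on the bottom/right
M_pad = np.pad(M, ((0, 1), (0, 1)), 'constant')
conv = np.array([[np.sum(M_pad[i:i+2, j:j+2] * W) + 1  # +1 is the bias
                  for j in (0, 2)] for i in (0, 2)])
print(conv)              # [[ 7.  1.] [-1. -1.]]

# VALID average pooling keeps only the single complete 2x2 window
print(M[:2, :2].mean())  # 0.25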
2. Backpropagation
import tensorflow as tf
import numpy as np

# The input tensor is a 3x3 matrix with a single channel
M = np.array([
    [[1], [-1], [0]],
    [[-1], [2], [1]],
    [[0], [2], [-2]]
])

# Define the kernel weights and bias. The weight shape shows that we define
# a single 2x2x1 kernel.
filter_weight = tf.get_variable('weights', [2, 2, 1, 1],
                                initializer=tf.constant_initializer([[1, -1], [0, 2]]))
biases = tf.get_variable('biases', [1], initializer=tf.constant_initializer(1))

# Reshape the input into the NHWC format TensorFlow expects
M = np.asarray(M, dtype='float32')
M = M.reshape(1, 3, 3, 1)

x = tf.placeholder('float32', [1, None, None, 1])
# Convolution with SAME padding and stride 2
conv = tf.nn.conv2d(x, filter_weight, strides=[1, 2, 2, 1], padding='SAME')
bias = tf.nn.bias_add(conv, biases)

# tf.gradients(y, xs) differentiates the scalar sum(y) with respect to xs
d_filter = tf.gradients(bias, filter_weight)
d_biases = tf.gradients(bias, biases)
d_conv = tf.gradients(bias, conv)
d_conv_x = tf.gradients(conv, x)
d_conv_w = tf.gradients(conv, filter_weight)
# d_bias_x = tf.gradients(bias, x)

# Average pooling with SAME padding and stride 2
pool = tf.nn.avg_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
d_pool = tf.gradients(pool, x)

# Run the graph (the forward results were already printed in the previous script)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print("d_filter:\n", sess.run(d_filter, feed_dict={x: M}))
    print("d_biases:\n", sess.run(d_biases, feed_dict={x: M}))
    print("d_conv:\n", sess.run(d_conv, feed_dict={x: M}))
    print("d_conv_x:\n", sess.run(d_conv_x, feed_dict={x: M}))
    print("d_conv_w:\n", sess.run(d_conv_w, feed_dict={x: M}))
    print("d_pool:\n", sess.run(d_pool, feed_dict={x: M}))
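Because tf.gradients differentiates the scalar sum of its first argument, the printed gradients can be checked by hand. For instance, d_biases should be 4.0 (the 2×2 output map has four elements, each contributing 1 to the sum), and d_filter is simply the sum of the four input windows. A quick NumPy sketch of that check:

import numpy as np

# The four 2x2 windows that the stride-2 SAME convolution reads
M_pad = np.pad(np.array([[1, -1, 0], [-1, 2, 1], [0, 2, -2]], dtype=np.float32),
               ((0, 1), (0, 1)), 'constant')
windows = [M_pad[i:i+2, j:j+2] for i in (0, 2) for j in (0, 2)]

print(sum(windows))  # expected d_filter: [[-1. 1.] [0. 2.]] (up to the (2,2,1,1) shape)
print(len(windows))  # expected d_biases: 4.0 (one per output element)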
IV. The Convolution and Pooling Layer APIs Implemented in CS231n
1. Convolution Layer
Diagram of the convolution layer forward pass:
import numpy as np

def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each
    filter spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in
        the horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    pad = conv_param['pad']
    stride = conv_param['stride']
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    H0 = 1 + (H + 2 * pad - HH) // stride
    W0 = 1 + (W + 2 * pad - WW) // stride
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')  # zero-padded input
    out = np.zeros((N, F, H0, W0))  # initialized output

    # Write out the forward expression one output pixel at a time
    for n in range(N):
        for f in range(F):
            for h0 in range(H0):
                for w0 in range(W0):
                    out[n, f, h0, w0] = np.sum(
                        x_pad[n, :, h0*stride:HH+h0*stride, w0*stride:WW+w0*stride] * w[f]) + b[f]

    cache = (x, w, b, conv_param)
    return out, cache
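A quick shape check (the shapes below are made up, just to exercise the function):

x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)
out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})
print(out.shape)  # (2, 4, 8, 8): stride 1 with pad 1 preserves the spatial size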
Diagram of the convolution layer backward pass:
def conv_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    x, w, b, conv_param = cache
    pad = conv_param['pad']
    stride = conv_param['stride']
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    _, _, H0, W0 = dout.shape  # the output size comes from dout
    x_pad = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    dx, dw = np.zeros_like(x), np.zeros_like(w)
    dx_pad = np.pad(dx, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')

    # Gradient of b, shape (F,); dout has shape (N, F, H0, W0)
    db = np.sum(dout, axis=(0, 2, 3))

    # For each dout element, compute the gradients of its two inputs: the
    # window x_pad(:, :, win, win) and w[f]. Both participate in many output
    # elements, so their gradients accumulate.
    for n in range(N):
        for f in range(F):
            for h0 in range(H0):
                for w0 in range(W0):
                    x_win = x_pad[n, :, h0*stride:h0*stride+HH, w0*stride:w0*stride+WW]
                    dw[f] += x_win * dout[n, f, h0, w0]
                    dx_pad[n, :, h0*stride:h0*stride+HH, w0*stride:w0*stride+WW] += w[f] * dout[n, f, h0, w0]
    dx = dx_pad[:, :, pad:pad+H, pad:pad+W]  # strip the padding back off

    return dx, dw, db
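A minimal finite-difference check of db (a sketch; dx and dw can be checked the same way):

x = np.random.randn(1, 2, 5, 5)
w = np.random.randn(3, 2, 3, 3)
b = np.random.randn(3)
conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_forward_naive(x, w, b, conv_param)
dout = np.random.randn(*out.shape)
_, _, db = conv_backward_naive(dout, cache)

eps = 1e-6
b_plus, b_minus = b.copy(), b.copy()
b_plus[0] += eps
b_minus[0] -= eps
num = (np.sum(conv_forward_naive(x, w, b_plus, conv_param)[0] * dout) -
       np.sum(conv_forward_naive(x, w, b_minus, conv_param)[0] * dout)) / (2 * eps)
print(db[0], num)  # the two numbers should agree to several decimal places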
2. Max Pooling Layer
Pooling layer forward pass:
This is similar to the convolution layer but simpler: take a window at the corresponding position of the input feature map and pool it.
def max_pool_forward_naive(x, pool_param):
    HH, WW = pool_param['pool_height'], pool_param['pool_width']
    s = pool_param['stride']
    N, C, H, W = x.shape
    H_new = 1 + (H - HH) // s
    W_new = 1 + (W - WW) // s
    out = np.zeros((N, C, H_new, W_new))
    for i in range(N):
        for j in range(C):
            for k in range(H_new):
                for l in range(W_new):
                    window = x[i, j, k*s:HH+k*s, l*s:WW+l*s]
                    out[i, j, k, l] = np.max(window)
    cache = (x, pool_param)
    return out, cache
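A quick shape check (made-up shapes, just to exercise the function):

x = np.random.randn(2, 3, 4, 4)
out, _ = max_pool_forward_naive(x, {'pool_height': 2, 'pool_width': 2, 'stride': 2})
print(out.shape)  # (2, 3, 2, 2): each 2x2 window collapses to its max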
Pooling layer backward pass:
The backward pass also reconstructs each window: the position that held the maximum inherits the upstream gradient, and every other position gets zero (its local gradient is zero). The pooling layer has no filter, so only the dx gradient is needed; and because the windows here do not overlap (this assumes the stride is at least the window size), unlike in the convolution layer, no accumulation is required:
def max_pool_backward_naive(dout, cache):
    x, pool_param = cache
    HH, WW = pool_param['pool_height'], pool_param['pool_width']
    s = pool_param['stride']
    N, C, H, W = x.shape
    H_new = 1 + (H - HH) // s
    W_new = 1 + (W - WW) // s
    dx = np.zeros_like(x)
    for i in range(N):
        for j in range(C):
            for k in range(H_new):
                for l in range(W_new):
                    window = x[i, j, k*s:HH+k*s, l*s:WW+l*s]
                    m = np.max(window)
                    # Only the max position inherits the upstream gradient
                    dx[i, j, k*s:HH+k*s, l*s:WW+l*s] = (window == m) * dout[i, j, k, l]
    return dx
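A finite-difference spot check of one dx entry (random input, so ties within a window are vanishingly unlikely):

x = np.random.randn(1, 1, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

out, cache = max_pool_forward_naive(x, pool_param)
dout = np.random.randn(*out.shape)
dx = max_pool_backward_naive(dout, cache)

eps = 1e-6
x_plus = x.copy()
x_plus[0, 0, 0, 0] += eps
num = np.sum((max_pool_forward_naive(x_plus, pool_param)[0] - out) * dout) / eps
print(dx[0, 0, 0, 0], num)  # should agree closely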
V. How Real Frameworks Implement It
Real frameworks, of course, do not implement convolution with big nested loops like these; vectorized matrix operations are the right approach.
1. Theano
A common decomposition is to flatten the 2-D input into a 1-D vector (of length in_h * in_w, mapping to an output vector of length out_h * out_w) and to expand the convolution kernel into a matrix of shape [in_h * in_w, out_h * out_w].
The discussion above only covers 2-D input, but since the input's channels match the kernel's channels, the principle carries over unchanged. The final computation is:
y = x · C (equivalently yᵀ = Cᵀ · xᵀ), where x is the flattened input vector and C the expanded kernel matrix.
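A minimal sketch of this expansion (my own illustration, assuming stride 1, no padding, and a single channel; the kernel values are made up):

import numpy as np

# Expand a 2x2 kernel into the matrix C for a 3x3 input, so the whole
# convolution becomes one vector-matrix product.
k = np.array([[1, -1], [0, 2]], dtype=np.float32)
in_h = in_w = 3
out_h = out_w = 2
C = np.zeros((in_h * in_w, out_h * out_w), dtype=np.float32)
for i in range(out_h):
    for j in range(out_w):
        for a in range(2):
            for b in range(2):
                # output pixel (i, j) reads input pixel (i+a, j+b) with weight k[a, b]
                C[(i + a) * in_w + (j + b), i * out_w + j] = k[a, b]

x = np.arange(9, dtype=np.float32)  # flattened 3x3 input
y = x @ C                           # flattened 2x2 output
print(y.reshape(out_h, out_w))      # same result as sliding the kernel directly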
2. Caffe
Caffe's matrix formulation of convolution (the im2col approach) works as follows: it lays out the values of all input channels in one matrix and the values of all kernels' channels in another, so a single matrix multiplication finishes the whole computation. This feels a bit cleverer than the approach above (though both are solid algorithms).
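A minimal im2col sketch (an illustration under simplifying assumptions: a single image, no padding, stride 1; not Caffe's actual code), where one matrix multiplication then handles all kernels at once:

import numpy as np

def im2col(x, HH, WW, stride):
    # Lay every receptive-field window of x (shape (C, H, W)) out as one column
    C, H, W = x.shape
    out_h = (H - HH) // stride + 1
    out_w = (W - WW) // stride + 1
    cols = np.zeros((C * HH * WW, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+HH, j*stride:j*stride+WW]
            cols[:, i * out_w + j] = patch.reshape(-1)
    return cols

x = np.random.randn(3, 5, 5)          # C=3 input
w = np.random.randn(4, 3, 3, 3)       # F=4 kernels, each spanning all 3 channels
cols = im2col(x, 3, 3, stride=1)      # (27, 9)
out = (w.reshape(4, -1) @ cols).reshape(4, 3, 3)  # one GEMM for all kernels
print(out.shape)                      # (4, 3, 3)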