1. Introduction
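Convolutional Neural Networks (CNNs) are inspired by the observation that cells in the retina are sensitive only to a small sub-region of the visual field, called the receptive field. CNNs adopt the same mechanism: each neuron is connected to only part of the input.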
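2. Sparse Connectivity
CNNs exploit spatially local correlation through local connectivity. The inputs of a hidden unit in layer $m$ are a weighted sum of a subset of the units in layer $m-1$, namely the units in a spatially contiguous receptive field, as shown in the figure below:
Think of layer $m-1$ as the input retina. Units in layer $m$ have receptive fields of width 3, so each is connected to only 3 adjacent neurons in the retina layer. Units in layer $m+1$ are connected to the layer below in the same way. Each unit does not respond to variations outside its receptive field. This architecture therefore ensures that the learned "filters" respond most strongly to spatially local input patterns.
However, as the figure shows, stacking many such layers makes the filters increasingly global, i.e., responsive to a larger region of the input. Each unit in layer $m$ sees only part of the input. Each unit in layer $m+1$ combines the responses of 3 units in layer $m$ and therefore covers an input region of width 5, which in the figure is the entire input layer. A hidden unit in layer $m+1$ can thus be seen as a nonlinear encoding of a feature of width 5.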
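The width-5 claim can be checked with simple arithmetic. With stride 1 and no pooling, each additional layer whose units see $k$ adjacent units widens the input region a top unit depends on by $k-1$. The sketch below is a hypothetical helper (the name `receptive_field_width` is ours, not from any library):

```python
def receptive_field_width(kernel_widths):
    """Width of the input region seen by one unit after stacking
    stride-1 layers with the given kernel widths (no pooling)."""
    width = 1
    for k in kernel_widths:
        # each extra stride-1 layer of width k widens the field by k - 1
        width += k - 1
    return width

print(receptive_field_width([3]))     # a unit in layer m: 3
print(receptive_field_width([3, 3]))  # a unit in layer m+1: 5
```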
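3. Shared Weights
In CNNs, each filter $h_{i}$ is replicated across the entire input layer. The replicated units share the same parameters (weight vector and bias) and together form a feature map.
In the figure above, the 3 hidden units belong to the same feature map. Weights of the same color are shared, i.e., constrained to be equal.
Replicating units in this way allows a feature to be detected wherever it appears in the visual field. Weight sharing also greatly reduces the number of free parameters to learn.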
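To put a number on the savings, consider a layer shaped like LeNet's first one. It has 20 feature maps with 5x5 filters over a single 28x28 input map. These sizes are chosen for illustration. The unshared variant below is an assumption: it is a locally connected layer where every output position has its own filter and bias. The helper name is hypothetical.

```python
def conv_params(n_maps, filter_h, filter_w, in_h, in_w, shared=True):
    """Count weights + biases of one layer applied to a single input map.

    shared=True : each feature map reuses one filter and one bias everywhere.
    shared=False: every output position gets its own filter and bias
                  (locally connected, no weight sharing).
    """
    per_filter = filter_h * filter_w + 1  # weights + bias
    if shared:
        return n_maps * per_filter
    # number of positions in a 'valid' output map
    out_positions = (in_h - filter_h + 1) * (in_w - filter_w + 1)
    return n_maps * out_positions * per_filter

shared = conv_params(20, 5, 5, 28, 28, shared=True)
unshared = conv_params(20, 5, 5, 28, 28, shared=False)
print(shared, unshared)  # 520 299520
```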
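4. Details and Notation
A feature map is obtained by repeatedly applying a function across sub-regions of the entire image. In other words, the input image is convolved with a linear filter, a bias term is added, and a nonlinear function is applied. Denote the $k$-th feature map as $h^{k}$, with its filter determined by the weights $W^{k}$ and bias $b_{k}$. Then $h^{k}$ is computed as follows (using tanh as the nonlinearity):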
$h_{ij}^{k}=\tanh\left((W^{k}*x)_{ij}+b_{k}\right)$
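The $*$ in the formula is a true convolution. Theano's conv2d, like the mathematical definition, flips the filter before sliding it over the image. Below is a plain-Python sketch of the formula for one input map and one feature map. It assumes the default "valid" mode (no padding) used by the code later in this article. The function names are hypothetical.

```python
import math

def conv2d_valid(x, w):
    """True 2D convolution (kernel flipped) of one map x with one filter w,
    'valid' mode: the output is (H - fh + 1) x (W - fw + 1)."""
    fh, fw = len(w), len(w[0])
    out_h, out_w = len(x) - fh + 1, len(x[0]) - fw + 1
    return [[sum(w[u][v] * x[i + fh - 1 - u][j + fw - 1 - v]  # flipped indices
                 for u in range(fh) for v in range(fw))
             for j in range(out_w)]
            for i in range(out_h)]

def feature_map(x, w, b):
    """h_ij = tanh((W * x)_ij + b) for a single feature map."""
    return [[math.tanh(s + b) for s in row] for row in conv2d_valid(x, w)]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, 0]]
print(conv2d_valid(x, w))     # [[5, 6], [8, 9]]: the flipped kernel picks the bottom-right pixel
print(feature_map(x, w, -5))  # tanh of [[0, 1], [3, 4]]
```

A cross-correlation (no flip) with the same `w` would instead return `[[1, 2], [4, 5]]`. The difference is invisible during training, since the learned filter simply ends up flipped.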
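To form a richer representation of the data, each hidden layer is usually composed of multiple feature maps, $\{h^{(k)}, k=0,\dots,K\}$. The weights $W$ are represented as a 4D tensor whose four dimensions are: destination feature map, source feature map, source vertical position, and source horizontal position. The biases $b$ are represented as a vector with one element per destination feature map. This is illustrated below:
In the figure above, $W_{ij}^{kl}$ denotes the weight connecting each pixel of the $k$-th feature map in layer $m$ with the pixel at coordinates $(i,j)$ of the $l$-th feature map in layer $m-1$.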
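Written out element by element, the multi-map version of the formula sums over every source map. This assumes "valid" convolution with filters of size $F_h \times F_w$ and 0-based indices. Here $k$ indexes feature maps of layer $m$, $l$ indexes feature maps of layer $m-1$, $x^{l}$ is the $l$-th source map, and the weights are stored as `W[k, l, u, v]` (destination, source, row, column):

$$
h^{k}_{ij}=\tanh\left(\sum_{l}\sum_{u=0}^{F_h-1}\sum_{v=0}^{F_w-1} W^{kl}_{uv}\,x^{l}_{\,i+F_h-1-u,\;j+F_w-1-v}+b_{k}\right)
$$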
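5. The Convolution Operator
In Theano, the convolution operation (ConvOp) used in the code below is provided by theano.tensor.nnet.conv.conv2d. It takes two inputs:
- A 4D tensor holding a mini-batch of input images, with dimensions: mini-batch size, number of input feature maps, image height, image width
- A 4D tensor holding the weight matrix $W$, with dimensions: number of feature maps in layer $m$, number of feature maps in layer $m-1$, filter height, filter width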
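The two tensors must agree on the number of input feature maps. With the default "valid" border mode, which the example below relies on, each spatial side shrinks by the filter size minus one. The small helper below is hypothetical. It uses the sizes from the example below: a 639x516 RGB image and 2 filters of 9x9.

```python
def conv2d_output_shape(image_shape, filter_shape):
    """Shape of a 'valid' convolution output.

    image_shape  = (batch size, input maps, height, width)
    filter_shape = (output maps, input maps, filter height, filter width)
    """
    batch, in_maps, h, w = image_shape
    out_maps, f_in_maps, fh, fw = filter_shape
    # the number of input maps must agree between the two tensors
    assert in_maps == f_in_maps, "input feature map counts differ"
    return (batch, out_maps, h - fh + 1, w - fw + 1)

print(conv2d_output_shape((1, 3, 639, 516), (2, 3, 9, 9)))  # (1, 2, 631, 508)
```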
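The code below also uses dimshuffle(*pattern), which deserves a short introduction:
For example, dimshuffle('x', 2, 'x', 0, 1) expands a 3D tensor into a 5D tensor. Dimensions 0 and 2 of the new tensor are new broadcastable dimensions of length 1 (marked 'x'). Dimensions 1, 3 and 4 come from dimensions 2, 0 and 1 of the original tensor, respectively.
If the original tensor has shape (20, 30, 40), then after dimshuffle('x', 2, 'x', 0, 1) its shape becomes (1, 40, 1, 20, 30).
- dimshuffle(0, 1) -> identity (same as the original)
- dimshuffle(1, 0) -> swaps dimensions 0 and 1
For more details, see the Theano documentation for dimshuffle.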
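Because dimshuffle only rearranges axes and inserts length-1 axes, its effect on shapes is easy to emulate. The hypothetical helper below (plain Python, no Theano) computes only the resulting shape. It does not model Theano's rule that a broadcastable axis may also be dropped. The second call corresponds to `b.dimshuffle('x', 0, 'x', 'x')` in the code below: a bias vector of length 2 becomes a (1, 2, 1, 1) tensor that broadcasts over the batch, height and width dimensions.

```python
def dimshuffle_shape(shape, *pattern):
    """Shape produced by a dimshuffle-style pattern (shape emulation only).

    Integers pick an existing axis; 'x' inserts a new axis of length 1.
    Every original axis must appear exactly once (dropping broadcastable
    axes, which Theano allows, is not modelled here).
    """
    used = [p for p in pattern if p != 'x']
    assert sorted(used) == list(range(len(shape))), "each axis must appear once"
    return tuple(1 if p == 'x' else shape[p] for p in pattern)

print(dimshuffle_shape((20, 30, 40), 'x', 2, 'x', 0, 1))  # (1, 40, 1, 20, 30)
print(dimshuffle_shape((2,), 'x', 0, 'x', 'x'))           # (1, 2, 1, 1)
```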
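The image used below is 3wolfmoon.jpg.
The following example convolves an RGB image (3 input feature maps) with 2 randomly initialized 9x9 filters and plots the image before and after convolution: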
```python
# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 10:22:14 2015

@author: ZengJiulin
"""

import theano
from theano import tensor as T
from theano.tensor.nnet import conv
import pylab
from PIL import Image
import numpy

rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input
input = T.tensor4(name='input', dtype='float64')

# initialize shared variable for weights.
# 2 output feature maps
# 3 input feature maps
# filter size 9x9
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(numpy.asarray(
            rng.uniform(
                low=-1.0 / w_bound,
                high=1.0 / w_bound,
                size=w_shp),
            dtype=input.dtype), name='W')

# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
# There are 2 output feature maps, so the bias vector also has 2 elements
b_shp = (2,)
b = theano.shared(numpy.asarray(
            rng.uniform(low=-.5, high=.5, size=b_shp),
            dtype=input.dtype), name='b')

# build symbolic expression that computes the convolution of input with filters in w
conv_out = conv.conv2d(input, W)

# build symbolic expression to add bias and apply activation function, i.e. produce neural net layer output
# A few words on ``dimshuffle`` :
#   ``dimshuffle`` is a powerful tool in reshaping a tensor;
#   what it allows you to do is to shuffle dimension around
#   but also to insert new ones along which the tensor will be
#   broadcastable;
#   dimshuffle('x', 2, 'x', 0, 1)
#   This will work on 3d tensors with no broadcastable
#   dimensions. The first dimension will be broadcastable,
#   then we will have the third dimension of the input tensor as
#   the second of the resulting tensor, etc. If the tensor has
#   shape (20, 30, 40), the resulting tensor will have dimensions
#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
#   More examples:
#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#    dimshuffle(0, 1) -> identity
#    dimshuffle(1, 0) -> inverts the first and second dimensions
#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA

# add the bias to the convolution result, then apply a nonlinearity (sigmoid here)
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images
f = theano.function([input], output)

# open random image of dimensions 639x516
img_file = open('E:\\Python\\3wolfmoon.jpg', 'rb')
img = Image.open(img_file)
# dimensions are (height, width, channel)
img = numpy.asarray(img, dtype='float64') / 256.

# put image in 4D tensor of shape (1, 3, height, width)
img_ = img.transpose(2, 0, 1).reshape(1, 3, img.shape[0], img.shape[1])
filtered_img = f(img_)

# plot original image and first and second components of output
pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
pylab.gray()
# recall that the convOp output (filtered image) is actually a "minibatch",
# of size 1 here, so we take index 0 in the first dimension:
pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
pylab.show()
```
Notice that the randomly initialized filters behave very much like edge detectors.
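6. Max Pooling
Max-pooling is a form of non-linear down-sampling. It partitions the input image into non-overlapping rectangular regions and outputs the maximum value of each sub-region.
Max-pooling is useful for two reasons:
- By eliminating non-maximal values, it reduces the computation in the layers above.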
- (I have not fully understood this point yet; the following is the original tutorial's wording.) It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8. Since it provides additional robustness to position, max-pooling is a "smart" way of reducing the dimensionality of intermediate representations.
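The 3-out-of-8 figure can be checked on the simplest possible input: a map that is zero except for a single bright pixel. For each of the 8 one-pixel shifts, the sketch below compares the pooled output with the unshifted one. This toy setting is an assumption, since real feature maps contain many competing values. With 2x2 windows the answer is 3 wherever the peak sits inside its window. With 3x3 windows, it depends on the peak's position: 3 at a corner of the window, 5 on an edge, and 8 at the center. All names are hypothetical.

```python
def max_pool(img, size):
    """Non-overlapping size x size max-pooling, dropping incomplete border windows."""
    rows, cols = len(img) // size, len(img[0]) // size
    return [[max(img[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]

def single_peak(n, r, c):
    """n x n map that is 0 everywhere except for a 1 at (r, c)."""
    return [[1.0 if (i, j) == (r, c) else 0.0 for j in range(n)] for i in range(n)]

# the 8 one-pixel translations
SHIFTS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def unchanged_shifts(n, r, c, size):
    """How many of the 8 one-pixel shifts leave the pooled output unchanged."""
    ref = max_pool(single_peak(n, r, c), size)
    return sum(max_pool(single_peak(n, r + dr, c + dc), size) == ref
               for dr, dc in SHIFTS)

print(unchanged_shifts(6, 2, 2, 2))  # 2x2 pooling: 3
print(unchanged_shifts(9, 3, 4, 3))  # 3x3 pooling, peak on a window edge: 5
```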
In Theano, max-pooling is implemented by theano.tensor.signal.downsample.max_pool_2d, for example:
```python
# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 15:17:23 2015

@author: ZengJiulin
"""
import theano
from theano import tensor as T
import numpy
from theano.tensor.signal import downsample

input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input], pool_out)

invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print 'With ignore_border set to True:'
print 'invals[0, 0, :, :] =\n', invals[0, 0, :, :]
print 'output[0, 0, :, :] =\n', f(invals)[0, 0, :, :]

pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input], pool_out)
print 'With ignore_border set to False:'
print 'invals[1, 0, :, :] =\n ', invals[1, 0, :, :]
print 'output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :]
```
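Note the difference between ignoring and not ignoring the border. With ignore_border=True, the leftover last row and column of the 5x5 input are dropped, giving a 2x2 output. With ignore_border=False, they form partial windows, giving a 3x3 output: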
```
>>> runfile('E:/Python/downsample.py', wdir=r'E:/Python')
Using gpu device 0: GeForce GT 720M
With ignore_border set to True:
invals[0, 0, :, :] =
[[  4.17022005e-01   7.20324493e-01   1.14374817e-04   3.02332573e-01   1.46755891e-01]
 [  9.23385948e-02   1.86260211e-01   3.45560727e-01   3.96767474e-01   5.38816734e-01]
 [  4.19194514e-01   6.85219500e-01   2.04452250e-01   8.78117436e-01   2.73875932e-02]
 [  6.70467510e-01   4.17304802e-01   5.58689828e-01   1.40386939e-01   1.98101489e-01]
 [  8.00744569e-01   9.68261576e-01   3.13424178e-01   6.92322616e-01   8.76389152e-01]]
output[0, 0, :, :] =
[[ 0.72032449  0.39676747]
 [ 0.6852195   0.87811744]]
With ignore_border set to False:
invals[1, 0, :, :] =
 [[ 0.01936696  0.67883553  0.21162812  0.26554666  0.49157316]
 [ 0.05336255  0.57411761  0.14672857  0.58930554  0.69975836]
 [ 0.10233443  0.41405599  0.69440016  0.41417927  0.04995346]
 [ 0.53589641  0.66379465  0.51488911  0.94459476  0.58655504]
 [ 0.90340192  0.1374747   0.13927635  0.80739129  0.39767684]]
output[1, 0, :, :] =
 [[ 0.67883553  0.58930554  0.69975836]
 [ 0.66379465  0.94459476  0.58655504]
 [ 0.90340192  0.80739129  0.39767684]]
>>>
```
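To see exactly what ignore_border changes, here is a plain-Python re-implementation applied to the two 5x5 slices printed above. It is a sketch of the semantics, not Theano's code. The values are copied from the printout at the precision shown. It yields the same 2x2 and 3x3 results.

```python
def max_pool_2d(img, ds, ignore_border=True):
    """Non-overlapping max-pooling of one 2D map (list of lists).

    ignore_border=True drops windows that would run past the edge;
    ignore_border=False keeps them as smaller, partial windows.
    """
    dh, dw = ds
    h, w = len(img), len(img[0])
    if ignore_border:
        out_h, out_w = h // dh, w // dw
    else:
        out_h, out_w = -(-h // dh), -(-w // dw)  # ceiling division
    return [[max(img[i][j]
                 for i in range(r * dh, min((r + 1) * dh, h))
                 for j in range(c * dw, min((c + 1) * dw, w)))
             for c in range(out_w)]
            for r in range(out_h)]

# invals[0, 0, :, :] from the output above
a = [[0.417022005, 0.720324493, 0.000114374817, 0.302332573, 0.146755891],
     [0.0923385948, 0.186260211, 0.345560727, 0.396767474, 0.538816734],
     [0.419194514, 0.685219500, 0.204452250, 0.878117436, 0.0273875932],
     [0.670467510, 0.417304802, 0.558689828, 0.140386939, 0.198101489],
     [0.800744569, 0.968261576, 0.313424178, 0.692322616, 0.876389152]]
# invals[1, 0, :, :] from the output above
b = [[0.01936696, 0.67883553, 0.21162812, 0.26554666, 0.49157316],
     [0.05336255, 0.57411761, 0.14672857, 0.58930554, 0.69975836],
     [0.10233443, 0.41405599, 0.69440016, 0.41417927, 0.04995346],
     [0.53589641, 0.66379465, 0.51488911, 0.94459476, 0.58655504],
     [0.90340192, 0.1374747, 0.13927635, 0.80739129, 0.39767684]]

print(max_pool_2d(a, (2, 2), ignore_border=True))   # 2x2 result
print(max_pool_2d(b, (2, 2), ignore_border=False))  # 3x3 result
```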
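7. The Full LeNet Model
Sparse connectivity, convolutional layers and max-pooling are at the heart of the LeNet family of models, while the remaining details can vary a lot. The figure below depicts a LeNet model:
The lower layers alternate between convolution and max-pooling (down-sampling) layers. The upper layers are fully connected, like a traditional MLP.
Over the whole pipeline, the 4D tensor produced by the last convolution/pooling layer is flattened into a 2D matrix of shape (batch size, feature maps × height × width), which the MLP can process.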
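The code in the next section uses batch_size=500, nkerns=[20, 50], 5x5 filters and 2x2 pooling. With those settings, the tensor shapes can be traced by hand. The helper below is hypothetical and just repeats the arithmetic from the code's comments ("valid" convolution, then pooling with ignore_border=True).

```python
def conv_pool_shape(shape, n_filters, filter_size, pool):
    """(batch, maps, h, w) after a 'valid' convolution and
    non-overlapping pool x pool max-pooling with ignore_border=True."""
    batch, _, h, w = shape
    h, w = h - filter_size + 1, w - filter_size + 1  # valid convolution
    return (batch, n_filters, h // pool, w // pool)  # max-pooling

batch_size, nkerns = 500, (20, 50)
s0 = (batch_size, 1, 28, 28)               # MNIST images as a 4D tensor
s1 = conv_pool_shape(s0, nkerns[0], 5, 2)  # (500, 20, 12, 12)
s2 = conv_pool_shape(s1, nkerns[1], 5, 2)  # (500, 50, 4, 4)
flat = (s2[0], s2[1] * s2[2] * s2[3])      # (500, 800): input to the MLP
print(s1, s2, flat)
```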
8. Full Code

```python
# -*- coding: utf-8 -*-
"""
Created on Sat Apr 25 14:20:02 2015

@author: ZengJiulin
"""

"""This tutorial introduces the LeNet5 neural network architecture
using Theano.  LeNet5 is a convolutional neural network, good for
classifying images. This tutorial shows how to build the architecture,
and comes with all the hyper-parameters you need to reproduce the
paper's MNIST results.


This implementation simplifies the model in the following ways:

 - LeNetConvPool doesn't implement location-specific gain and bias parameters
 - LeNetConvPool doesn't implement pooling by average, it implements pooling
   by max.
 - Digit classification is implemented with a logistic regression rather than
   an RBF network
 - LeNet5 was not fully-connected convolutions at second layer

References:
 - Y. LeCun, L. Bottou, Y. Bengio and P. Haffner:
   Gradient-Based Learning Applied to Document
   Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998.
   http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

"""
import os
import sys
import time

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer


class LeNetConvPoolLayer(object):
    """Pool Layer of a convolutional network """

    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        Allocate a LeNetConvPoolLayer with shared variable internal parameters.

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dtensor4
        :param input: symbolic image tensor, of shape image_shape

        :type filter_shape: tuple or list of length 4
        :param filter_shape: (number of filters, num input feature maps,
                              filter height, filter width)

        :type image_shape: tuple or list of length 4
        :param image_shape: (batch size, num input feature maps,
                             image height, image width)

        :type poolsize: tuple or list of length 2
        :param poolsize: the downsampling (pooling) factor (#rows, #cols)
        """

        assert image_shape[1] == filter_shape[1]
        self.input = input

        # there are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit
        fan_in = numpy.prod(filter_shape[1:])
        # each unit in the lower layer receives a gradient from:
        # "num output feature maps * filter height * filter width" /
        #   pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
                   numpy.prod(poolsize))
        # initialize weights with random weights
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        # the convolution kernels are exactly this weight tensor
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # convolve input feature maps with filters
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )

        # downsample each feature map individually, using maxpooling
        pooled_out = downsample.max_pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # add the bias term. Since the bias is a vector (1D array), we first
        # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
        # thus be broadcasted across mini-batches and feature map
        # width & height
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]


def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
                    dataset='mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=500):
    """ Demonstrates lenet on MNIST dataset

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
                          gradient)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: path to the dataset used for training /testing (MNIST here)

    :type nkerns: list of ints
    :param nkerns: number of kernels on each layer (two layers: 20 kernels
                   in the first, 50 in the second)
    """

    rng = numpy.random.RandomState(23455)

    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches /= batch_size
    n_valid_batches /= batch_size
    n_test_batches /= batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch

    # start-snippet-1
    x = T.matrix('x')   # the data is presented as rasterized images
    y = T.ivector('y')  # the labels are presented as 1D vector of
                        # [int] labels

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    # (28, 28) is the size of MNIST images.
    # each row of x becomes one single-channel 28x28 input image
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # Construct the first convolutional pooling layer:
    # filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
    # maxpooling reduces this further to (24/2, 24/2) = (12, 12)
    # 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
    layer0 = LeNetConvPoolLayer(
        rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 28, 28),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    # Construct the second convolutional pooling layer
    # filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
    # maxpooling reduces this further to (8/2, 8/2) = (4, 4)
    # 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
    # layer 0 has nkerns[0] kernels, so it outputs nkerns[0] feature maps;
    # the input of layer 1 is the output of layer 0
    layer1 = LeNetConvPoolLayer(
        rng,
        input=layer0.output,
        image_shape=(batch_size, nkerns[0], 12, 12),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    # the HiddenLayer being fully-connected, it operates on 2D matrices of
    # shape (batch_size, num_pixels) (i.e matrix of rasterized images).
    # This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
    # or (500, 50 * 4 * 4) = (500, 800) with the default values.
    layer2_input = layer1.output.flatten(2)

    # construct a fully-connected sigmoidal layer
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 4 * 4,
        n_out=500,
        activation=T.tanh
    )

    # classify the values of the fully-connected sigmoidal layer
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # the cost we minimize during training is the NLL of the model
    cost = layer3.negative_log_likelihood(y)

    # create a function to compute the mistakes that are made by the model
    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # create a list of all model parameters to be fit by gradient descent
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # create a list of gradients for all model parameters
    grads = T.grad(cost, params)

    # train_model is a function that updates the model parameters by
    # SGD Since this model has many parameters, it would be tedious to
    # manually create an update rule for each model parameter. We thus
    # create the updates list by automatically looping over all
    # (params[i], grads[i]) pairs.
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]

    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # end-snippet-1

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'
    # early-stopping parameters
    patience = 10000  # look at this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                           # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                  # go through this many
                                  # minibatches before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print 'training @ iter = ', iter
            cost_ij = train_model(minibatch_index)

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    #improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss *  \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [
                        test_model(i)
                        for i in xrange(n_test_batches)
                    ]
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))

if __name__ == '__main__':
    evaluate_lenet5()


def experiment(state, channel):
    evaluate_lenet5(state.learning_rate, dataset=state.dataset)
```
On a GeForce GT 720M GPU, the run took more than 170 minutes.
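The LeNetConvPoolLayer constructor above draws its initial weights uniformly from [-W_bound, W_bound], with W_bound = sqrt(6 / (fan_in + fan_out)). This is the uniform initialization proposed by Glorot and Bengio. The hypothetical helper below repeats that arithmetic for the two convolutional layers. It uses float division. Under Python 2's integer division, the original expression truncates layer 1's fan_out from 312.5 to 312, which barely changes the bound.

```python
import math

def conv_pool_init_bound(filter_shape, poolsize):
    """Half-width of the uniform range used to initialise W in
    LeNetConvPoolLayer: sqrt(6 / (fan_in + fan_out))."""
    n_out, n_in, fh, fw = filter_shape
    fan_in = n_in * fh * fw
    # each lower-layer unit receives gradient from n_out*fh*fw units,
    # divided by the pooling area
    fan_out = n_out * fh * fw / float(poolsize[0] * poolsize[1])
    return math.sqrt(6.0 / (fan_in + fan_out))

print(conv_pool_init_bound((20, 1, 5, 5), (2, 2)))   # layer0: fan_in=25, fan_out=125
print(conv_pool_init_bound((50, 20, 5, 5), (2, 2)))  # layer1: fan_in=500, fan_out=312.5
```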
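9. Tips and Tricks
- Number of filters: computing the activations of a single convolutional filter is much more expensive than in a traditional MLP. Since feature maps shrink with depth, layers near the input tend to have fewer filters, while higher layers can have many more. To preserve information about the input, the number of activations (feature maps × pixel positions) should not decrease from one layer to the next.
- Filter size: the right filter size usually depends on the dataset. On MNIST, 5x5 filters are common in the first layer. Natural-image datasets (often hundreds of pixels per side) tend to use larger first-layer filters such as 12x12 or 15x15.
- Pooling size: the typical value is 2x2. Very large inputs may warrant 4x4 pooling in the lower layers. Keep in mind, however, that this reduces the signal to 1/16 of its original size and may discard too much information.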
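A quick illustration of the pooling-size point, assuming a 96x96 feature map (the size is arbitrary). The hypothetical helper reports what fraction of the activations survives non-overlapping max-pooling with ignore_border=True.

```python
def pooled_fraction(h, w, pool):
    """Fraction of activations that survive non-overlapping
    pool x pool max-pooling (incomplete border windows dropped)."""
    return (h // pool) * (w // pool) / float(h * w)

for pool in (2, 4):
    print(pool, pooled_fraction(96, 96, pool))  # 2 -> 0.25, 4 -> 0.0625 (1/16)
```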