Deep Convolutional Neural Networks with Theano


1. Introduction

Convolutional neural networks (Convolutional Neural Networks, CNN) are inspired by the fact that cells in the retina are sensitive only to a limited sub-region of the visual field, called the receptive field. CNNs adopt the same mechanism: each neuron is connected to only a part of the input.

2. Sparse Connectivity

CNNs exploit the spatially local correlation in images by enforcing a local connectivity pattern. The input of a hidden unit in layer $m$ is a weighted sum over a subset of the units in layer $m-1$, and that subset forms a spatially contiguous receptive field, as shown in the figure below:

Think of layer $m-1$ as the retinal input. The units in layer $m$ all have a receptive field of width 3, so each of them is connected to only 3 adjacent neurons of the retinal layer. The units in layer $m+1$ connect to the layer below them in the same way. Since each neuron does not respond to variations outside its receptive field, this architecture guarantees that the learned "filters" respond most strongly to spatially local input patterns.

However, as the figure also shows, stacking many such filters makes the perception increasingly global: each unit of layer $m$ sees only part of the input, while a unit of layer $m+1$ combines the responses of several layer-$m$ units and therefore covers a larger portion of the input. A hidden unit in layer $m+1$ can thus be viewed as a non-linear encoding of a feature of width 5.
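
The width-5 claim is easy to check numerically. Below is a small NumPy-only sketch (mine, not part of the tutorial) that stacks two width-3 local filters on a 1D input and records which input positions a single top-level unit actually depends on:

import numpy

w1 = numpy.ones(3)     # width-3 filter of layer m
w2 = numpy.ones(3)     # width-3 filter of layer m+1

def top_unit(x):
    # layer m: each unit sums 3 adjacent inputs ('valid' mode, no padding)
    h_m = numpy.correlate(x, w1, mode='valid')
    # layer m+1: each unit sums 3 adjacent layer-m units; return its first unit
    return numpy.correlate(h_m, w2, mode='valid')[0]

x = numpy.zeros(9)
base = top_unit(x)
# flip each input pixel in turn and see whether the top-level unit reacts
affected = [i for i in range(len(x)) if top_unit(x + numpy.eye(len(x))[i]) != base]
print affected         # -> [0, 1, 2, 3, 4], i.e. a receptive field of width 5

Only the first five inputs influence the first unit of layer $m+1$, which is exactly the width-5 receptive field described above.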

3. Shared Weights

In CNNs, each filter $h_{i}$ is replicated step by step across the entire input layer. The replicated units share the same parameters (weight vector and bias) and together form a feature map.

In the figure above, the 3 hidden units belong to the same feature map; weights drawn in the same color are shared, i.e. constrained to be equal.

Replicating a filter in this way allows a feature to be detected at any position in the visible layer of the image, and weight sharing greatly reduces the number of parameters that have to be learned.
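
To get a feel for how large the saving is, compare the parameter count of a convolutional layer with that of a fully connected layer producing feature maps of the same size; the shapes below are borrowed from the convolution example of section 5, and the arithmetic is mine:

# 3 input maps, 2 output maps, 9x9 filters, 639x516 image (as in section 5)
n_in_maps, n_out_maps = 3, 2
height, width = 639, 516
filt_h, filt_w = 9, 9

# shared weights: one 9x9 kernel per (input map, output map) pair, plus one
# bias per output feature map
conv_params = n_out_maps * n_in_maps * filt_h * filt_w + n_out_maps
print 'convolutional layer :', conv_params     # 488 parameters

# a fully connected layer producing output maps of the same ('valid') size
out_h, out_w = height - filt_h + 1, width - filt_w + 1
n_out_units = n_out_maps * out_h * out_w
n_in_units = n_in_maps * height * width
print 'fully connected layer:', n_out_units * n_in_units + n_out_units
# roughly 6.3e11 parameters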

4. Details and Notation

A feature map is obtained by repeatedly applying the same function to sub-regions of the whole image: the image is convolved with a linear filter, a bias term is added, and a non-linear function is then applied. Denoting the $k$-th feature map by $h^{k}$, with its filter determined by the weights $W^{k}$ and bias $b_{k}$, the feature map is computed as follows (using tanh as the non-linearity):

$h_{ij}^{k} = \tanh\left((W^{k} * x)_{ij} + b_{k}\right)$

To obtain a richer representation of the data, each hidden layer usually consists of several feature maps: $\{h^{(k)}, k = 0, \dots, K\}$. The weights $W$ then form a 4D tensor whose four dimensions index the destination feature map, the source feature map, the vertical position in the source map and the horizontal position in the source map. The biases $b$ form a vector with one element per destination feature map. This is illustrated below:

In the figure above, $W_{ij}^{kl}$ denotes the weight connecting each pixel of the $k$-th feature map of layer $m-1$ to the pixel at coordinates $(i,j)$ of the $l$-th feature map of layer $m$.
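
The formula and the 4D weight layout can be written out directly in NumPy. The following is only a naive reference sketch (the helper name conv_layer is mine); the Theano code later in this post does the same thing efficiently:

import numpy

def conv_layer(x, W, b):
    """Naive version of h^k_ij = tanh((W^k * x)_ij + b_k).

    x : array of shape (n_in_maps, height, width)
    W : array of shape (n_out_maps, n_in_maps, filt_h, filt_w)
    b : array of shape (n_out_maps,), one bias per output feature map
    """
    n_out, n_in, fh, fw = W.shape
    out_h = x.shape[1] - fh + 1
    out_w = x.shape[2] - fw + 1
    h = numpy.zeros((n_out, out_h, out_w))
    for k in range(n_out):                      # destination feature map
        for i in range(out_h):
            for j in range(out_w):
                patch = x[:, i:i + fh, j:j + fw]
                # a true convolution flips the filter; drop the ::-1 for correlation
                h[k, i, j] = numpy.sum(W[k, :, ::-1, ::-1] * patch) + b[k]
    return numpy.tanh(h)

rng = numpy.random.RandomState(0)
x = rng.rand(3, 8, 8)              # 3 input feature maps of size 8x8
W = rng.rand(2, 3, 3, 3) - 0.5     # 2 output maps, 3 input maps, 3x3 filters
b = numpy.zeros(2)
print conv_layer(x, W, b).shape    # -> (2, 6, 6)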

5. The Convolution Operation

The convolution operation (Convolution operation, ConvOp) is provided in Theano by theano.tensor.signal.conv2d, which takes two inputs:

  • a 4D tensor corresponding to a mini-batch (subset) of input images; its dimensions are: size of the mini-batch, number of input feature maps, image height, image width
  • a 4D tensor representing the weight matrix $W$; its dimensions are: number of feature maps at layer $m$, number of feature maps at layer $m-1$, filter height, filter width

One more function that appears in the code below deserves a short introduction, dimshuffle(*pattern):

  For example, dimshuffle('x', 2, 'x', 0, 1) expands a 3D tensor into a 5D tensor: dimensions 0 and 2 of the new tensor are new broadcastable dimensions (of size 1), while dimensions 1, 3 and 4 come from dimensions 2, 0 and 1 of the original tensor, respectively.

  If the original tensor has shape (20, 30, 40), then after dimshuffle('x', 2, 'x', 0, 1) its shape becomes (1, 40, 1, 20, 30).

  dimshuffle(0, 1) -> identity, the tensor is unchanged

  dimshuffle(1, 0) -> swaps the first and second dimensions

  For more details see: dimshuffle
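
A quick way to see what a given pattern does is to apply dimshuffle to a small symbolic tensor and inspect the resulting shape (a minimal sketch against the old Theano API used throughout this post):

import numpy
import theano
import theano.tensor as T

x = T.tensor3('x')
# insert two broadcastable dimensions and permute the original three
y = x.dimshuffle('x', 2, 'x', 0, 1)
f = theano.function([x], y)

a = numpy.zeros((20, 30, 40), dtype=theano.config.floatX)
print f(a).shape       # -> (1, 40, 1, 20, 30)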

The image used below is 3wolfmoon.

The following code convolves an input consisting of 3 RGB feature maps and plots the image before and after the convolution:

# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 10:22:14 2015

@author: ZengJiulin
"""

import theano
from theano import tensor as T
from theano.tensor.nnet import conv
import pylab
from PIL import Image
import numpy

rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input
input = T.tensor4(name='input', dtype='float64')

# initialize shared variable for weights.
# 2 output feature maps
# 3 input feature maps
# filter size 9*9
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(numpy.asarray(
            rng.uniform(
                low=-1.0 / w_bound,
                high=1.0 / w_bound,
                size=w_shp),
            dtype=input.dtype), name='W')

# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
# there are 2 output feature maps, so the bias vector also has 2 elements
b_shp = (2,)
b = theano.shared(numpy.asarray(
            rng.uniform(low=-.5, high=.5, size=b_shp),
            dtype=input.dtype), name='b')

# build symbolic expression that computes the convolution of input with filters in W
conv_out = conv.conv2d(input, W)

# build symbolic expression to add bias and apply activation function, i.e. produce neural net layer output
# A few words on ``dimshuffle`` :
#   ``dimshuffle`` is a powerful tool in reshaping a tensor;
#   what it allows you to do is to shuffle dimension around
#   but also to insert new ones along which the tensor will be
#   broadcastable;
#   dimshuffle('x', 2, 'x', 0, 1)
#   This will work on 3d tensors with no broadcastable
#   dimensions. The first dimension will be broadcastable,
#   then we will have the third dimension of the input tensor as
#   the second of the resulting tensor, etc. If the tensor has
#   shape (20, 30, 40), the resulting tensor will have dimensions
#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
#   More examples:
#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#    dimshuffle(0, 1) -> identity
#    dimshuffle(1, 0) -> inverts the first and second dimensions
#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA

# add the bias to the convolution result, then apply a non-linearity
# (the sigmoid function here)
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images
f = theano.function([input], output)


# open random image of dimensions 639x516
img_file = open('E:\\Python\\3wolfmoon.jpg', 'rb')
img = Image.open(img_file)
# dimensions are (height, width, channel)
img = numpy.asarray(img, dtype='float64') / 256.

# put image in 4D tensor of shape (1, 3, height, width)
img_ = img.transpose(2, 0, 1).reshape(1, 3, 639, 516)
filtered_img = f(img_)

# plot original image and first and second components of output
pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
pylab.gray()
# recall that the convOp output (filtered image) is actually a "minibatch",
# of size 1 here, so we take index 0 in the first dimension:
pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
pylab.show()

Notice that the randomly initialized filters already behave very much like edge detectors!

6. Max Pooling

Max pooling is a form of down-sampling: the image is partitioned into non-overlapping rectangular regions and, for each sub-region, only the maximum value is kept.

Max pooling serves two purposes:

  • It removes the non-maximal values, which reduces the computation for the following layer.
  • (I have not fully digested this point; the following is the original tutorial's wording.) It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8. Since it provides additional robustness to position, max-pooling is a "smart" way of reducing the dimensionality of intermediate representations.
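
Before the Theano version, here is what 2x2 max pooling does to a single feature map, written in plain NumPy (the helper max_pool_2x2 is mine; cropping the odd row/column mimics ignore_border=True):

import numpy

def max_pool_2x2(fmap):
    # drop the incomplete last row/column (like ignore_border=True), then cut
    # the map into non-overlapping 2x2 blocks and keep each block's maximum
    h, w = fmap.shape
    blocks = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

a = numpy.arange(16).reshape(4, 4)
print max_pool_2x2(a)
# [[ 5  7]
#  [13 15]]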

Max pooling is implemented in Theano by theano.tensor.signal.downsample.max_pool_2d, for example:

# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 15:17:23 2015

@author: ZengJiulin
"""
import theano
from theano import tensor as T
import numpy
from theano.tensor.signal import downsample

input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input], pool_out)

invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print 'With ignore_border set to True:'
print 'invals[0, 0, :, :] =\n', invals[0, 0, :, :]
print 'output[0, 0, :, :] =\n', f(invals)[0, 0, :, :]

pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input], pool_out)
print 'With ignore_border set to False:'
print 'invals[1, 0, :, :] =\n ', invals[1, 0, :, :]
print 'output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :]

 

Note the difference between ignoring the border and keeping it: with a 5x5 input and a (2, 2) pooling shape, ignore_border=True drops the incomplete last row and column and yields a 2x2 output, while ignore_border=False keeps them and yields a 3x3 output:

>>> runfile('E:/Python/downsample.py', wdir=r'E:/Python')
Using gpu device 0: GeForce GT 720M
With ignore_border set to True:
invals[0, 0, :, :] =
[[  4.17022005e-01   7.20324493e-01   1.14374817e-04   3.02332573e-01
    1.46755891e-01]
 [  9.23385948e-02   1.86260211e-01   3.45560727e-01   3.96767474e-01
    5.38816734e-01]
 [  4.19194514e-01   6.85219500e-01   2.04452250e-01   8.78117436e-01
    2.73875932e-02]
 [  6.70467510e-01   4.17304802e-01   5.58689828e-01   1.40386939e-01
    1.98101489e-01]
 [  8.00744569e-01   9.68261576e-01   3.13424178e-01   6.92322616e-01
    8.76389152e-01]]
output[0, 0, :, :] =
[[ 0.72032449  0.39676747]
 [ 0.6852195   0.87811744]]
With ignore_border set to False:
invals[1, 0, :, :] =
  [[ 0.01936696  0.67883553  0.21162812  0.26554666  0.49157316]
 [ 0.05336255  0.57411761  0.14672857  0.58930554  0.69975836]
 [ 0.10233443  0.41405599  0.69440016  0.41417927  0.04995346]
 [ 0.53589641  0.66379465  0.51488911  0.94459476  0.58655504]
 [ 0.90340192  0.1374747   0.13927635  0.80739129  0.39767684]]
output[1, 0, :, :] =
  [[ 0.67883553  0.58930554  0.69975836]
 [ 0.66379465  0.94459476  0.58655504]
 [ 0.90340192  0.80739129  0.39767684]]
>>> 

 

7. The Full LeNet Model

Sparse connectivity, convolutional layers and max pooling are the core of LeNet-style models, but the remaining details can vary considerably. The figure below gives a sketch of a LeNet model:

The lower layers alternate between convolutional and down-sampling (pooling) layers, while the top layers are fully connected, as in a traditional MLP.

Viewed as a whole, the pipeline gradually turns a 4D tensor of feature maps into the 2D matrix of flattened features that the MLP can process.
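
The shape bookkeeping for the MNIST configuration used in section 8 (5x5 filters, 2x2 pooling, nkerns = [20, 50]) takes only a few lines to trace:

def conv_pool_size(size, filter_size=5, pool_size=2):
    # 'valid' convolution followed by non-overlapping max pooling
    return (size - filter_size + 1) // pool_size

s0 = 28                      # MNIST image side
s1 = conv_pool_size(s0)      # (28 - 5 + 1) / 2 = 12
s2 = conv_pool_size(s1)      # (12 - 5 + 1) / 2 = 4
print s1, s2                 # -> 12 4
print 'flattened input to the MLP:', 50 * s2 * s2   # nkerns[1] * 4 * 4 = 800

The final 50 * 4 * 4 = 800 values per image are exactly the n_in of the fully connected HiddenLayer in the code below.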

8. Full Code

  1 # -*- coding: utf-8 -*-
  2 """
  3 Created on Sat Apr 25 14:20:02 2015
  4 
  5 @author: ZengJiulin
  6 """
  7 
  8 """This tutorial introduces the LeNet5 neural network architecture
  9 using Theano.  LeNet5 is a convolutional neural network, good for
 10 classifying images. This tutorial shows how to build the architecture,
 11 and comes with all the hyper-parameters you need to reproduce the
 12 paper's MNIST results.
 13 
 14 
 15 This implementation simplifies the model in the following ways:
 16 
 17  - LeNetConvPool doesn't implement location-specific gain and bias parameters
 18  - LeNetConvPool doesn't implement pooling by average, it implements pooling
 19    by max.
 20  - Digit classification is implemented with a logistic regression rather than
 21    an RBF network
 22  - LeNet5 was not fully-connected convolutions at second layer
 23 
 24 References:
 25  - Y. LeCun, L. Bottou, Y. Bengio and P. Haffner:
 26    Gradient-Based Learning Applied to Document
 27    Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998.
 28    http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
 29 
 30 """
 31 import os
 32 import sys
 33 import time
 34 
 35 import numpy
 36 
 37 import theano
 38 import theano.tensor as T
 39 from theano.tensor.signal import downsample
 40 from theano.tensor.nnet import conv
 41 
 42 from logistic_sgd import LogisticRegression, load_data
 43 from mlp import HiddenLayer
 44 
 45 
 46 class LeNetConvPoolLayer(object):
 47     """Pool Layer of a convolutional network """
 48 
 49     def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
 50         """
 51         Allocate a LeNetConvPoolLayer with shared variable internal parameters.
 52 
 53         :type rng: numpy.random.RandomState
 54         :param rng: a random number generator used to initialize weights
 55 
 56         :type input: theano.tensor.dtensor4
 57         :param input: symbolic image tensor, of shape image_shape
 58 
 59         :type filter_shape: tuple or list of length 4
 60         :param filter_shape: (number of filters, num input feature maps,
 61                               filter height, filter width)
 62 
 63         :type image_shape: tuple or list of length 4
 64         :param image_shape: (batch size, num input feature maps,
 65                              image height, image width)
 66 
 67         :type poolsize: tuple or list of length 2
 68         :param poolsize: the downsampling (pooling) factor (#rows, #cols)
 69         """
 70 
 71         assert image_shape[1] == filter_shape[1]
 72         self.input = input
 73 
 74         # there are "num input feature maps * filter height * filter width"
 75         # inputs to each hidden unit
 76         
 77         fan_in = numpy.prod(filter_shape[1:])
 78         # each unit in the lower layer receives a gradient from:
 79         # "num output feature maps * filter height * filter width" /
 80         #   pooling size
 81         fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
 82                    numpy.prod(poolsize))
 83         # initialize weights with random weights
 84         W_bound = numpy.sqrt(6. / (fan_in + fan_out))
 85         # the convolution kernels are, in essence, this weight matrix
 86         self.W = theano.shared(
 87             numpy.asarray(
 88                 rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
 89                 dtype=theano.config.floatX
 90             ),
 91             borrow=True
 92         )
 93 
 94         # the bias is a 1D tensor -- one bias per output feature map
 95         b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
 96         self.b = theano.shared(value=b_values, borrow=True)
 97 
 98         # convolve input feature maps with filters
 99         conv_out = conv.conv2d(
100             input=input,
101             filters=self.W,
102             filter_shape=filter_shape,
103             image_shape=image_shape
104         )
105 
106         # downsample each feature map individually, using maxpooling
107         pooled_out = downsample.max_pool_2d(
108             input=conv_out,
109             ds=poolsize,
110             ignore_border=True
111         )
112 
113         # add the bias term. Since the bias is a vector (1D array), we first
114         # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
115         # thus be broadcasted across mini-batches and feature map
116         # width & height
117         self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
118 
119         # store parameters of this layer
120         self.params = [self.W, self.b]
121 
122 
123 def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
124                     dataset='mnist.pkl.gz',
125                     nkerns=[20, 50], batch_size=500):
126     """ Demonstrates lenet on MNIST dataset
127 
128     :type learning_rate: float
129     :param learning_rate: learning rate used (factor for the stochastic
130                           gradient)
131 
132     :type n_epochs: int
133     :param n_epochs: maximal number of epochs to run the optimizer
134 
135     :type dataset: string
136     :param dataset: path to the dataset used for training /testing (MNIST here)
137 
138     :type nkerns: list of ints
139     :param nkerns: number of kernels on each layer (two conv layers here:
140         20 kernels in the first layer, 50 in the second)
141     """
142 
143     rng = numpy.random.RandomState(23455)
144 
145     datasets = load_data(dataset)
146 
147     train_set_x, train_set_y = datasets[0]
148     valid_set_x, valid_set_y = datasets[1]
149     test_set_x, test_set_y = datasets[2]
150 
151     # compute number of minibatches for training, validation and testing
152     n_train_batches = train_set_x.get_value(borrow=True).shape[0]
153     n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
154     n_test_batches = test_set_x.get_value(borrow=True).shape[0]
155     n_train_batches /= batch_size
156     n_valid_batches /= batch_size
157     n_test_batches /= batch_size
158 
159     # allocate symbolic variables for the data
160     index = T.lscalar()  # index to a [mini]batch
161 
162     # start-snippet-1
163     x = T.matrix('x')   # the data is presented as rasterized images
164     y = T.ivector('y')  # the labels are presented as 1D vector of
165                         # [int] labels
166 
167     ######################
168     # BUILD ACTUAL MODEL #
169     ######################
170     print '... building the model'
171 
172     # Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
173     # to a 4D tensor, compatible with our LeNetConvPoolLayer
174     # (28, 28) is the size of MNIST images.
175     # each input image consists of a single (grayscale) feature map
176     layer0_input = x.reshape((batch_size, 1, 28, 28))
177 
178     # Construct the first convolutional pooling layer:
179     # filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
180     # maxpooling reduces this further to (24/2, 24/2) = (12, 12)
181     # 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
182     layer0 = LeNetConvPoolLayer(
183         rng,
184         input=layer0_input,
185         image_shape=(batch_size, 1, 28, 28),
186         filter_shape=(nkerns[0], 1, 5, 5),
187         poolsize=(2, 2)
188     )
189 
190     # Construct the second convolutional pooling layer
191     # filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
192     # maxpooling reduces this further to (8/2, 8/2) = (4, 4)
193     # 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
194     # layer 0 has nkerns[0] kernels, so it outputs nkerns[0] feature maps
195     # the input of layer 1 is the output of layer 0
196     layer1 = LeNetConvPoolLayer(
197         rng,
198         input=layer0.output,
199         image_shape=(batch_size, nkerns[0], 12, 12),
200         filter_shape=(nkerns[1], nkerns[0], 5, 5),
201         poolsize=(2, 2)
202     )
203 
204     # the HiddenLayer being fully-connected, it operates on 2D matrices of
205     # shape (batch_size, num_pixels) (i.e matrix of rasterized images).
206     # This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
207     # or (500, 50 * 4 * 4) = (500, 800) with the default values.
208     layer2_input = layer1.output.flatten(2)
209 
210     # construct a fully-connected sigmoidal layer
211     layer2 = HiddenLayer(
212         rng,
213         input=layer2_input,
214         n_in=nkerns[1] * 4 * 4,
215         n_out=500,
216         activation=T.tanh
217     )
218 
219     # classify the values of the fully-connected sigmoidal layer
220     layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)
221 
222     # the cost we minimize during training is the NLL of the model
223     cost = layer3.negative_log_likelihood(y)
224 
225     # create a function to compute the mistakes that are made by the model
226     test_model = theano.function(
227         [index],
228         layer3.errors(y),
229         givens={
230             x: test_set_x[index * batch_size: (index + 1) * batch_size],
231             y: test_set_y[index * batch_size: (index + 1) * batch_size]
232         }
233     )
234 
235     validate_model = theano.function(
236         [index],
237         layer3.errors(y),
238         givens={
239             x: valid_set_x[index * batch_size: (index + 1) * batch_size],
240             y: valid_set_y[index * batch_size: (index + 1) * batch_size]
241         }
242     )
243 
244     # create a list of all model parameters to be fit by gradient descent
245     params = layer3.params + layer2.params + layer1.params + layer0.params
246 
247     # create a list of gradients for all model parameters
248     grads = T.grad(cost, params)
249 
250     # train_model is a function that updates the model parameters by
251     # SGD Since this model has many parameters, it would be tedious to
252     # manually create an update rule for each model parameter. We thus
253     # create the updates list by automatically looping over all
254     # (params[i], grads[i]) pairs.
255     updates = [
256         (param_i, param_i - learning_rate * grad_i)
257         for param_i, grad_i in zip(params, grads)
258     ]
259 
260     train_model = theano.function(
261         [index],
262         cost,
263         updates=updates,
264         givens={
265             x: train_set_x[index * batch_size: (index + 1) * batch_size],
266             y: train_set_y[index * batch_size: (index + 1) * batch_size]
267         }
268     )
269     # end-snippet-1
270 
271     ###############
272     # TRAIN MODEL #
273     ###############
274     print '... training'
275     # early-stopping parameters
276     patience = 10000  # look as this many examples regardless
277     patience_increase = 2  # wait this much longer when a new best is
278                            # found
279     improvement_threshold = 0.995  # a relative improvement of this much is
280                                    # considered significant
281     validation_frequency = min(n_train_batches, patience / 2)
282                                   # go through this many
283                                   # minibatche before checking the network
284                                   # on the validation set; in this case we
285                                   # check every epoch
286 
287     best_validation_loss = numpy.inf
288     best_iter = 0
289     test_score = 0.
290     start_time = time.clock()
291 
292     epoch = 0
293     done_looping = False
294 
295     while (epoch < n_epochs) and (not done_looping):
296         epoch = epoch + 1
297         for minibatch_index in xrange(n_train_batches):
298 
299             iter = (epoch - 1) * n_train_batches + minibatch_index
300 
301             if iter % 100 == 0:
302                 print 'training @ iter = ', iter
303             cost_ij = train_model(minibatch_index)
304 
305             if (iter + 1) % validation_frequency == 0:
306 
307                 # compute zero-one loss on validation set
308                 validation_losses = [validate_model(i) for i
309                                      in xrange(n_valid_batches)]
310                 this_validation_loss = numpy.mean(validation_losses)
311                 print('epoch %i, minibatch %i/%i, validation error %f %%' %
312                       (epoch, minibatch_index + 1, n_train_batches,
313                        this_validation_loss * 100.))
314 
315                 # if we got the best validation score until now
316                 if this_validation_loss < best_validation_loss:
317 
318                     #improve patience if loss improvement is good enough
319                     if this_validation_loss < best_validation_loss *  \
320                        improvement_threshold:
321                         patience = max(patience, iter * patience_increase)
322 
323                     # save best validation score and iteration number
324                     best_validation_loss = this_validation_loss
325                     best_iter = iter
326 
327                     # test it on the test set
328                     test_losses = [
329                         test_model(i)
330                         for i in xrange(n_test_batches)
331                     ]
332                     test_score = numpy.mean(test_losses)
333                     print(('     epoch %i, minibatch %i/%i, test error of '
334                            'best model %f %%') %
335                           (epoch, minibatch_index + 1, n_train_batches,
336                            test_score * 100.))
337 
338             if patience <= iter:
339                 done_looping = True
340                 break
341 
342     end_time = time.clock()
343     print('Optimization complete.')
344     print('Best validation score of %f %% obtained at iteration %i, '
345           'with test performance %f %%' %
346           (best_validation_loss * 100., best_iter + 1, test_score * 100.))
347     print >> sys.stderr, ('The code for file ' +
348                           os.path.split(__file__)[1] +
349                           ' ran for %.2fm' % ((end_time - start_time) / 60.))
350 
351 if __name__ == '__main__':
352     evaluate_lenet5()
353 
354 
355 def experiment(state, channel):
356     evaluate_lenet5(state.learning_rate, dataset=state.dataset)

The run took a little over 170 minutes on a GeForce GT 720M GPU.

9. Tips and Tricks

  • Number of filters: computing the activations of a single convolutional filter is much more expensive than in a traditional MLP! Because the feature maps shrink as depth increases, layers near the input tend to use fewer filters, while layers higher up can use many more. To preserve the information of the input, the total number of activations (number of feature maps times number of pixel positions) should not decrease from one layer to the next; the sketch after this list works through these counts for the network of section 8.
  • Filter shapes: the best filter size usually depends on the dataset. On MNIST the best size is 5*5 (on the first layer), while natural images usually work better with larger first-layer filters of 12*12 or 15*15.
  • Pooling size: the typical value is 2*2. Very large inputs may warrant 4*4 pooling in the lower layers, but keep in mind that this reduces the dimension of the signal by a factor of 16 and may throw away too much information.
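
To see how these rules of thumb play out in the network of section 8, here is a small back-of-the-envelope script (the counts are mine; nkerns = [20, 50], 5x5 filters, 2x2 pooling, 28x28 input):

def conv_layer_stats(n_in, n_out, in_size, filt=5, pool=2):
    conv_size = in_size - filt + 1
    out_size = conv_size // pool
    params = n_out * n_in * filt * filt + n_out
    mult_adds = n_out * n_in * filt * filt * conv_size * conv_size   # per image
    activations = n_out * out_size * out_size
    return params, mult_adds, activations, out_size

p0, m0, a0, s0 = conv_layer_stats(1, 20, 28)      # first conv-pool layer
p1, m1, a1, s1 = conv_layer_stats(20, 50, s0)     # second conv-pool layer
print 'layer0: %6d params, %9d mult-adds, %5d activations' % (p0, m0, a0)
print 'layer1: %6d params, %9d mult-adds, %5d activations' % (p1, m1, a1)
# the fully connected hidden layer (800 -> 500) for comparison
print 'hidden: %6d params, %9d mult-adds' % (800 * 500 + 500, 800 * 500)

The second convolutional layer holds far fewer parameters than the fully connected hidden layer but needs roughly four times as many multiply-adds per image, which is why the number of filters has to be chosen with the computational cost in mind.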

Source: http://deeplearning.net/tutorial/lenet.html#lenet

