Denoising Autoencoders (DA) in Theano


1. Autoencoders

An autoencoder first maps the input $x\in[0,1]^{d}$ to a hidden representation $y\in[0,1]^{d^{'}}$ through the following deterministic mapping (the encoder):

$y=s(Wx+b)$

where $s$ is a non-linearity such as the sigmoid. The latent representation $y$, i.e. the code, is then mapped back (by the decoder) to a reconstruction $z$ of the same shape as the input $x$:

$z=s(W^{'}y+b^{'})$

Here the prime in $W^{'}$ does not by itself denote the transpose of $W$; $z$ should be seen as a prediction of $x$ given the code $y$. Optionally, the weight matrix $W^{'}$ of the reverse mapping may be constrained to be the transpose of the forward mapping:

$W^{'}=W^{T}$, which is referred to as tied weights. The parameters of this model ($W$, $b$, $b^{'}$) are then optimized so that the average reconstruction error is minimized.
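As a quick illustration of the two mappings and the tied-weight constraint, here is a minimal plain-numpy sketch (toy sizes, made up for illustration; the tutorial's actual code below works with Theano symbolic variables):

import numpy

def sigmoid(a):
    return 1.0 / (1.0 + numpy.exp(-a))

# toy sizes chosen only for illustration
d, d_hidden = 8, 3
rng = numpy.random.RandomState(0)
W = rng.uniform(low=-0.1, high=0.1, size=(d, d_hidden))
b = numpy.zeros(d_hidden)
b_prime = numpy.zeros(d)

x = rng.uniform(size=d)                    # an input in [0, 1]^d
y = sigmoid(numpy.dot(x, W) + b)           # the code produced by the encoder
z = sigmoid(numpy.dot(y, W.T) + b_prime)   # reconstruction with tied weights W' = W^T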

The reconstruction error can be measured in many ways, depending on the distributional assumptions made about the input given the code. The traditional squared error $L(x,z)=||x-z||^{2}$ can be used. If the input can be interpreted as bit vectors or vectors of bit probabilities, the cross-entropy of the reconstruction can be used instead:

$L_{H}(x,z)=-\sum_{k=1}^{d}[x_{k}\log z_{k}+(1-x_{k})\log(1-z_{k})]$

The hope is that the code $y$ is a distributed representation that captures the coordinates along the main factors of variation in the data, somewhat like PCA.

Indeed, if there is a single linear hidden layer and the network is trained with the mean squared error criterion, then the $k$ hidden units learn to project the input onto the span of the first $k$ principal components of the data. If the hidden layer is non-linear, the autoencoder behaves differently from PCA, because it can capture other aspects of the input distribution.
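Continuing the numpy sketch above, the cross-entropy reconstruction error of a single example is a one-liner; the `get_cost_updates` method shown later computes the same quantity symbolically and averages it over a minibatch:

# Cross-entropy reconstruction error for the x and z of the sketch above
# (meaningful when the entries of x are bits or bit probabilities).
L = -numpy.sum(x * numpy.log(z) + (1 - x) * numpy.log(1 - z))
print(L)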

Because $y$ is a lossy compression of $x$, it cannot be a good (small-loss) compression for every possible input. Optimization makes it a good compression for the training examples, and hopefully for other inputs as well, but not for arbitrary inputs: in general, only inputs drawn from the same distribution as the training set are expected to compress well. That is the sense in which an autoencoder generalizes: it gives low reconstruction error on test examples from the same distribution as the training examples, but generally a high reconstruction error on inputs sampled at random from the input space.

We first implement the autoencoder with Theano:

class dA(object):
    """Auto-Encoder class

    A denoising autoencoder tries to reconstruct the clean input by mapping
    a corrupted version of it into the hidden space and then mapping it back
    to the input space:
    (1) corrupt the clean input
    (2) map the corrupted input to the hidden space
    (3) reconstruct the clean input
    (4) compute the reconstruction error
    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)

        y = s(W \tilde{x} + b)                                           (2)

        x = s(W' y  + b')                                                (3)

        L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)]      (4)

    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        bhid=None,
        bvis=None
    ):
        """
        Initialize the dA class: number of visible units, number of hidden
        units and the corruption level. The constructor also receives
        symbolic variables for the input, the weights and the biases.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: number random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                     generated based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for
                      standalone dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden:  number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of biases values (for
                     hidden units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of biases values (for
                     visible units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None


        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W` which is uniformly sampled
            # from -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype
            # theano.config.floatX so that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if not bvis:
            bvis = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                borrow=True
            )

        if not bhid:
            bhid = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='b',
                borrow=True
            )

        self.W = W
        # b corresponds to the bias of the hidden
        self.b = bhid
        # b_prime corresponds to the bias of the visible
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several
            # examples, each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]

 

In a stacked autoencoder, the output (the code) of one layer becomes the input of the next layer; a short sketch of this chaining follows the next code block.

Now we compute the latent representation and the reconstructed signal:

    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """Computes the reconstructed input given the values of the
        hidden layer

        """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
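For example, a second dA can be fed the symbolic code of the first one. This is only a hypothetical sketch: it assumes `rng`, `theano_rng` and the symbolic input `x` are set up as in the training script further down, and the 250 hidden units of the second layer are an arbitrary choice.

# Hypothetical stacking sketch: the second autoencoder takes as input the
# symbolic hidden representation computed by the first one.
da1 = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
         n_visible=28 * 28, n_hidden=500)
da2 = dA(numpy_rng=rng, theano_rng=theano_rng,
         input=da1.get_hidden_values(x),
         n_visible=500, n_hidden=250)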

 

Using these functions we can compute the cost and the parameter updates for one training step:

    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one training
        step of the dA """

        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using
        #        minibatches, L will be a vector, with one entry per
        #        example in minibatch
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the
        #        cross-entropy cost of the reconstruction of the
        #        corresponding example of the minibatch. We need to
        #        compute the average of all these to get the cost of
        #        the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = [
            (param, param - learning_rate * gparam)
            for param, gparam in zip(self.params, gparams)
        ]

        return (cost, updates)

 

We can now build a dA instance and compile a Theano function that repeatedly updates the parameters so as to minimize the reconstruction error:

    da = dA(
        numpy_rng=rng,
        theano_rng=theano_rng,
        input=x,
        n_visible=28 * 28,
        n_hidden=500
    )

    cost, updates = da.get_cost_updates(
        corruption_level=0.,
        learning_rate=learning_rate
    )

    train_da = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        }
    )
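With `train_da` compiled, a plain minibatch loop drives the optimization. This is the same loop used in the full script further down, where `training_epochs` and `n_train_batches` are defined:

    # go through training epochs; each call to train_da performs one
    # gradient step on one minibatch and returns its reconstruction cost
    for epoch in xrange(training_epochs):
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))

        print 'Training epoch %d, cost ' % epoch, numpy.mean(c)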

 

If there were no constraint other than minimizing the reconstruction error, an autoencoder whose code has as many dimensions as the input (or more) could in principle just learn the identity mapping and copy its input.

However, the experiments reported in [Bengio07] suggest that, in practice, non-linear autoencoders with more hidden units than inputs (called over-complete) still yield useful representations ("useful" here meaning that they lead to lower classification error).

One intuition is that, to reconstruct continuous inputs well, a one-hidden-layer autoencoder with non-linear hidden units needs small weights in the first (encoding) layer, so that the hidden units stay in the near-linear regime of their activation function, and large weights in the second (decoding) layer.

With binary inputs, very large weights are likewise needed to drive the reconstruction error all the way down. Since explicit or implicit regularization makes it hard to reach such large-weight solutions, the optimization ends up with encodings that only work well for examples similar to those in the training set, which is exactly what we want.

2. Denoising Autoencoders (DA)

The idea behind the denoising autoencoder is simple: to force the hidden layer to learn a more robust representation and prevent it from simply learning an identity mapping, we train the autoencoder to reconstruct the clean input from a corrupted version of it.

A DA therefore does two things: encode the input (preserve the information about it) and undo the effect of the corruption process.

In [Vincent08], the stochastic corruption consists in randomly setting some of the input values to zero, so the autoencoder tries to predict the corrupted (i.e. zeroed-out) values from the values that were left untouched.

To turn the autoencoder above into a DA, all we need to add is an operation that stochastically corrupts the input. The input can be corrupted in many ways; here we simply set a random subset of its entries to zero.

The code below generates binomial random numbers with the same shape as the input and multiplies them with the input:

from theano.tensor.shared_randomstreams import RandomStreams

def get_corrupted_input(self, input, corruption_level):
    """ This function keeps ``1-corruption_level`` entries of the inputs the same
    and zeroes out a randomly selected subset of size ``corruption_level``
    Note : first argument of theano.rng.binomial is the shape(size) of
           random numbers that it should produce
           second argument is the number of trials
           third argument is the probability of success of any trial

           this will produce an array of 0s and 1s where 1 has a probability of
           1 - ``corruption_level`` and 0 with ``corruption_level``
    """
    return self.theano_rng.binomial(size=input.shape, n=1, p=1 - corruption_level) * input
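To see what the corruption does, the step can be compiled on its own (an illustrative check, not part of the tutorial code; the dtype argument keeps the mask in floatX):

import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

x = T.matrix('x')
theano_rng = RandomStreams(123)
corrupted = theano_rng.binomial(size=x.shape, n=1, p=1 - 0.3,
                                dtype=theano.config.floatX) * x
corrupt = theano.function([x], corrupted)

batch = numpy.ones((2, 10), dtype=theano.config.floatX)
print corrupt(batch)   # roughly 30% of the entries come out as 0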

 

The complete dA class then becomes:

class dA(object):
    """Denoising Auto-Encoder class (dA)

    A denoising autoencoder tries to reconstruct the input from a corrupted
    version of it by projecting it first in a latent space and reprojecting
    it afterwards back in the input space. Please refer to Vincent et al., 2008
    for more details. If x is the input then equation (1) computes a partially
    destroyed version of x by means of a stochastic mapping q_D. Equation (2)
    computes the projection of the input into the latent space. Equation (3)
    computes the reconstruction of the input, while equation (4) computes the
    reconstruction error.

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)

        y = s(W \tilde{x} + b)                                           (2)

        x = s(W' y  + b')                                                (3)

        L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)]      (4)

    """

    def __init__(self, numpy_rng, theano_rng=None, input=None, n_visible=784, n_hidden=500,
                 W=None, bhid=None, bvis=None):
        """
        Initialize the dA class by specifying the number of visible units (the
        dimension d of the input), the number of hidden units (the dimension
        d' of the latent or hidden space) and the corruption level. The
        constructor also receives symbolic variables for the input, weights and
        bias. Such symbolic variables are useful when, for example, the input is
        the result of some computations, or when weights are shared between the
        dA and an MLP layer. When dealing with SdAs this always happens,
        the dA on layer 2 gets as input the output of the dA on layer 1,
        and the weights of the dA are used in the second stage of training
        to construct an MLP.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: number random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is generated
                     based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for standalone
                      dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden:  number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of biases values (for
                     hidden units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of biases values (for
                     visible units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None


        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W` which is uniformly sampled
            # from -4.*sqrt(6./(n_visible+n_hidden)) and 4.*sqrt(6./(n_hidden+n_visible));
            # the output of uniform is converted using asarray to dtype
            # theano.config.floatX so that the code is runnable on GPU
            initial_W = numpy.asarray(numpy_rng.uniform(
                low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                size=(n_visible, n_hidden)), dtype=theano.config.floatX)
            W = theano.shared(value=initial_W, name='W')

        if not bvis:
            bvis = theano.shared(value=numpy.zeros(n_visible,
                                                   dtype=theano.config.floatX), name='bvis')

        if not bhid:
            bhid = theano.shared(value=numpy.zeros(n_hidden,
                                                   dtype=theano.config.floatX), name='bhid')

        self.W = W
        # b corresponds to the bias of the hidden
        self.b = bhid
        # b_prime corresponds to the bias of the visible
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several examples,
            # each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]

    def get_corrupted_input(self, input, corruption_level):
        """ This function keeps ``1-corruption_level`` entries of the inputs the
        same and zeroes out a randomly selected subset of size ``corruption_level``
        Note : first argument of theano.rng.binomial is the shape(size) of
               random numbers that it should produce
               second argument is the number of trials
               third argument is the probability of success of any trial

               this will produce an array of 0s and 1s where 1 has a probability of
               1 - ``corruption_level`` and 0 with ``corruption_level``
        """
        return self.theano_rng.binomial(size=input.shape, n=1, p=1 - corruption_level) * input

    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """ Computes the reconstructed input given the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)

    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one training
        step of the dA """

        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using minibatches,
        #        L will be a vector, with one entry per example in minibatch
        L = -T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the cross-entropy cost
        #        of the reconstruction of the corresponding example of the
        #        minibatch. We need to compute the average of all these to get
        #        the cost of the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = []
        for param, gparam in zip(self.params, gparams):
            updates.append((param, param - learning_rate * gparam))

        return (cost, updates)

3. Training

To visualize what the learned weights actually look like, we will use the following helper code:

import numpy


def scale_to_unit_interval(ndar, eps=1e-8):
    """ Scales all values in the ndarray ndar to be between 0 and 1 """
    ndar = ndar.copy()
    ndar -= ndar.min()
    ndar *= 1.0 / (ndar.max() + eps)
    return ndar


def tile_raster_images(X, img_shape, tile_shape, tile_spacing=(0, 0),
                       scale_rows_to_unit_interval=True,
                       output_pixel_vals=True):
    """
    Transform an array with one flattened image per row, into an array in
    which images are reshaped and laid out like tiles on a floor.

    This function is useful for visualizing datasets whose rows are images,
    and also columns of matrices for transforming those rows
    (such as the first layer of a neural net).

    :type X: a 2-D ndarray or a tuple of 4 channels, elements of which can
    be 2-D ndarrays or None;
    :param X: a 2-D array in which every row is a flattened image.

    :type img_shape: tuple; (height, width)
    :param img_shape: the original shape of each image

    :type tile_shape: tuple; (rows, cols)
    :param tile_shape: the number of images to tile (rows, cols)

    :param output_pixel_vals: if output should be pixel values (i.e. int8
    values) or floats

    :param scale_rows_to_unit_interval: if the values need to be scaled before
    being plotted to [0,1] or not


    :returns: array suitable for viewing as an image.
    (See:`Image.fromarray`.)
    :rtype: a 2-d array with same dtype as X.

    """

    assert len(img_shape) == 2
    assert len(tile_shape) == 2
    assert len(tile_spacing) == 2

    # The expression below can be re-written in a more C style as
    # follows :
    #
    # out_shape    = [0,0]
    # out_shape[0] = (img_shape[0]+tile_spacing[0])*tile_shape[0] -
    #                tile_spacing[0]
    # out_shape[1] = (img_shape[1]+tile_spacing[1])*tile_shape[1] -
    #                tile_spacing[1]
    out_shape = [
        (ishp + tsp) * tshp - tsp
        for ishp, tshp, tsp in zip(img_shape, tile_shape, tile_spacing)
    ]

    if isinstance(X, tuple):
        assert len(X) == 4
        # Create an output numpy ndarray to store the image
        if output_pixel_vals:
            out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
                                    dtype='uint8')
        else:
            out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
                                    dtype=X.dtype)

        # colors default to 0, alpha defaults to 1 (opaque)
        if output_pixel_vals:
            channel_defaults = [0, 0, 0, 255]
        else:
            channel_defaults = [0., 0., 0., 1.]

        for i in xrange(4):
            if X[i] is None:
                # if channel is None, fill it with zeros of the correct
                # dtype
                dt = out_array.dtype
                if output_pixel_vals:
                    dt = 'uint8'
                out_array[:, :, i] = numpy.zeros(
                    out_shape,
                    dtype=dt
                ) + channel_defaults[i]
            else:
                # use a recurrent call to compute the channel and store it
                # in the output
                out_array[:, :, i] = tile_raster_images(
                    X[i], img_shape, tile_shape, tile_spacing,
                    scale_rows_to_unit_interval, output_pixel_vals)
        return out_array

    else:
        # if we are dealing with only one channel
        H, W = img_shape
        Hs, Ws = tile_spacing

        # generate a matrix to store the output
        dt = X.dtype
        if output_pixel_vals:
            dt = 'uint8'
        out_array = numpy.zeros(out_shape, dtype=dt)

        for tile_row in xrange(tile_shape[0]):
            for tile_col in xrange(tile_shape[1]):
                if tile_row * tile_shape[1] + tile_col < X.shape[0]:
                    this_x = X[tile_row * tile_shape[1] + tile_col]
                    if scale_rows_to_unit_interval:
                        # if we should scale values to be between 0 and 1
                        # do this by calling the `scale_to_unit_interval`
                        # function
                        this_img = scale_to_unit_interval(
                            this_x.reshape(img_shape))
                    else:
                        this_img = this_x.reshape(img_shape)
                    # add the slice to the corresponding position in the
                    # output array
                    c = 1
                    if output_pixel_vals:
                        c = 255
                    out_array[
                        tile_row * (H + Hs): tile_row * (H + Hs) + H,
                        tile_col * (W + Ws): tile_col * (W + Ws) + W
                    ] = this_img * c
        return out_array
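The training script below uses this helper together with PIL; the relevant call, repeated here, turns each hidden unit's weight vector (a row of `W.T`) into a 28x28 filter image and tiles 100 of them into a grid:

# Repeated from the training script below: visualize the learned filters.
image = Image.fromarray(
    tile_raster_images(X=da.W.get_value(borrow=True).T,
                       img_shape=(28, 28), tile_shape=(10, 10),
                       tile_spacing=(1, 1)))
image.save('filters_corruption_0.png')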

The complete training script is then:

"""
 This tutorial introduces denoising auto-encoders (dA) using Theano.

 Denoising autoencoders are the building blocks for SdA.
 They are based on auto-encoders as the ones used in Bengio et al. 2007.
 An autoencoder takes an input x and first maps it to a hidden representation
 y = f_{\theta}(x) = s(Wx+b), parameterized by \theta={W,b}. The resulting
 latent representation y is then mapped back to a "reconstructed" vector
 z \in [0,1]^d in input space z = g_{\theta'}(y) = s(W'y + b').  The weight
 matrix W' can optionally be constrained such that W' = W^T, in which case
 the autoencoder is said to have tied weights. The network is trained to
 minimize the reconstruction error (the error between x and z).

 For the denoising autoencoder, during training, first x is corrupted into
 \tilde{x}, where \tilde{x} is a partially destroyed version of x by means
 of a stochastic mapping. Afterwards y is computed as before (using
 \tilde{x}), y = s(W\tilde{x} + b) and z as s(W'y + b'). The reconstruction
 error is now measured between z and the uncorrupted input x, which is
 computed as the cross-entropy :
      - \sum_{k=1}^d[ x_k \log z_k + (1-x_k) \log( 1-z_k)]


 References :
   - P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol: Extracting and
   Composing Robust Features with Denoising Autoencoders, ICML'08, 1096-1103,
   2008
   - Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle: Greedy Layer-Wise
   Training of Deep Networks, Advances in Neural Information Processing
   Systems 19, 2007

"""

import os
import sys
import time

import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

from logistic_sgd import load_data
from utils import tile_raster_images

try:
    import PIL.Image as Image
except ImportError:
    import Image


# start-snippet-1
class dA(object):
    """Denoising Auto-Encoder class (dA)

    A denoising autoencoders tries to reconstruct the input from a corrupted
    version of it by projecting it first in a latent space and reprojecting
    it afterwards back in the input space. Please refer to Vincent et al.,2008
    for more details. If x is the input then equation (1) computes a partially
    destroyed version of x by means of a stochastic mapping q_D. Equation (2)
    computes the projection of the input into the latent space. Equation (3)
    computes the reconstruction of the input, while equation (4) computes the
    reconstruction error.

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)

        y = s(W \tilde{x} + b)                                           (2)

        x = s(W' y  + b')                                                (3)

        L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)]      (4)

    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        bhid=None,
        bvis=None
    ):
        """
        Initialize the dA class by specifying the number of visible units (the
        dimension d of the input ), the number of hidden units ( the dimension
        d' of the latent or hidden space ) and the corruption level. The
        constructor also receives symbolic variables for the input, weights and
        bias. Such symbolic variables are useful when, for example, the input
        is the result of some computations, or when weights are shared between
        the dA and an MLP layer. When dealing with SdAs this always happens,
        the dA on layer 2 gets as input the output of the dA on layer 1,
        and the weights of the dA are used in the second stage of training
        to construct an MLP.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: number random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                     generated based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for
                      standalone dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden:  number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of biases values (for
                     hidden units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of biases values (for
                     visible units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None


        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W` which is uniformly sampled
            # from -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype
            # theano.config.floatX so that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if not bvis:
            bvis = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                borrow=True
            )

        if not bhid:
            bhid = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='b',
                borrow=True
            )

        self.W = W
        # b corresponds to the bias of the hidden
        self.b = bhid
        # b_prime corresponds to the bias of the visible
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several
            # examples, each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]
    # end-snippet-1

    def get_corrupted_input(self, input, corruption_level):
        """This function keeps ``1-corruption_level`` entries of the inputs the
        same and zero-out randomly selected subset of size ``coruption_level``
        Note : first argument of theano.rng.binomial is the shape(size) of
               random numbers that it should produce
               second argument is the number of trials
               third argument is the probability of success of any trial

                this will produce an array of 0s and 1s where 1 has a
                probability of 1 - ``corruption_level`` and 0 with
                ``corruption_level``

                The binomial function returns an int64 data type by
                default.  int64 multiplied by the input
                type (floatX) always returns float64.  To keep all data
                in floatX when floatX is float32, we set the dtype of
                the binomial to floatX. As in our case the value of
                the binomial is always 0 or 1, this doesn't change the
                result. This is needed to allow the GPU to work
                correctly, as it only supports float32 for now.

        """
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level,
                                        dtype=theano.config.floatX) * input

    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """Computes the reconstructed input given the values of the
        hidden layer

        """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)

    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one trainng
        step of the dA """

        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using
        #        minibatches, L will be a vector, with one entry per
        #        example in minibatch
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the
        #        cross-entropy cost of the reconstruction of the
        #        corresponding example of the minibatch. We need to
        #        compute the average of all these to get the cost of
        #        the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = [
            (param, param - learning_rate * gparam)
            for param, gparam in zip(self.params, gparams)
        ]

        return (cost, updates)


def test_dA(learning_rate=0.1, training_epochs=15,
            dataset='mnist.pkl.gz',
            batch_size=20, output_folder='dA_plots'):

    """
    This demo is tested on MNIST

    :type learning_rate: float
    :param learning_rate: learning rate used for training the Denoising
                          AutoEncoder

    :type training_epochs: int
    :param training_epochs: number of epochs used for training

    :type dataset: string
    :param dataset: path to the pickled dataset

    """
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')  # the data is presented as rasterized images

    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)
    os.chdir(output_folder)
    ####################################
    # BUILDING THE MODEL NO CORRUPTION #
    ####################################

    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))

    da = dA(
        numpy_rng=rng,
        theano_rng=theano_rng,
        input=x,
        n_visible=28 * 28,
        n_hidden=500
    )

    cost, updates = da.get_cost_updates(
        corruption_level=0.,
        learning_rate=learning_rate
    )

    train_da = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        }
    )

    start_time = time.clock()

    ############
    # TRAINING #
    ############

    # go through training epochs
    for epoch in xrange(training_epochs):
        # go through trainng set
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))

        print 'Training epoch %d, cost ' % epoch, numpy.mean(c)

    end_time = time.clock()

    training_time = (end_time - start_time)

    print >> sys.stderr, ('The no corruption code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((training_time) / 60.))
    image = Image.fromarray(
        tile_raster_images(X=da.W.get_value(borrow=True).T,
                           img_shape=(28, 28), tile_shape=(10, 10),
                           tile_spacing=(1, 1)))
    image.save('filters_corruption_0.png')

    #####################################
    # BUILDING THE MODEL CORRUPTION 30% #
    #####################################

    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))

    da = dA(
        numpy_rng=rng,
        theano_rng=theano_rng,
        input=x,
        n_visible=28 * 28,
        n_hidden=500
    )

    cost, updates = da.get_cost_updates(
        corruption_level=0.3,
        learning_rate=learning_rate
    )

    train_da = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        }
    )

    start_time = time.clock()

    ############
    # TRAINING #
    ############

    # go through training epochs
    for epoch in xrange(training_epochs):
        # go through trainng set
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))

        print 'Training epoch %d, cost ' % epoch, numpy.mean(c)

    end_time = time.clock()

    training_time = (end_time - start_time)

    print >> sys.stderr, ('The 30% corruption code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % (training_time / 60.))

    image = Image.fromarray(tile_raster_images(
        X=da.W.get_value(borrow=True).T,
        img_shape=(28, 28), tile_shape=(10, 10),
        tile_spacing=(1, 1)))
    image.save('filters_corruption_30.png')

    os.chdir('../')


if __name__ == '__main__':
    test_dA()
Running the script produces the following output:

... loading data
Training epoch 0, cost  63.2891694201
Training epoch 1, cost  55.7866565443
Training epoch 2, cost  54.7631168984
Training epoch 3, cost  54.2420533514
Training epoch 4, cost  53.888670659
Training epoch 5, cost  53.6203505434
Training epoch 6, cost  53.4037459012
Training epoch 7, cost  53.2219976788
Training epoch 8, cost  53.0658010178
Training epoch 9, cost  52.9295596873
Training epoch 10, cost  52.8094163525
Training epoch 11, cost  52.7024367362
Training epoch 12, cost  52.606310148
Training epoch 13, cost  52.5191693641
Training epoch 14, cost  52.4395240004
The no corruption code for file dA.py ran for 10.21m
Training epoch 0, cost  81.7714190632
Training epoch 1, cost  73.4285756365
Training epoch 2, cost  70.8632686268
Training epoch 3, cost  69.3396642015
Training epoch 4, cost  68.4134660704
Training epoch 5, cost  67.723705304
Training epoch 6, cost  67.2401360252
Training epoch 7, cost  66.849303071
Training epoch 8, cost  66.5663948395
Training epoch 9, cost  66.3591257941
Training epoch 10, cost  66.1336658308
Training epoch 11, cost  65.9893924612
Training epoch 12, cost  65.8344131768
Training epoch 13, cost  65.7185348901
Training epoch 14, cost  65.6010749532
The 30% corruption code for file dA.py ran for 10.37m

 

Visualization of the weights learned with no corruption:

Visualization of the weights learned with 30% corruption:

Source: http://deeplearning.net/tutorial/dA.html#daa

