Deep learning: 42 (A Simple Understanding of the Denoise Autoencoder)

Reposted article, originally published 2013-08-16 08:02. Category: Machine Learning / Deep Learning

　　Preface:

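　　When the weights of a deep network are pretrained layer by layer with an unsupervised method, one way to learn more robust features is to inject random noise into the network's visible layer (the input layer). This method is called the Denoising Autoencoder (dAE). It was proposed in 2008 by Vincent, Bengio et al. in the paper "Extracting and composing robust features with denoising autoencoders". A dAE is trained to reconstruct the original (uncorrupted) data from a corrupted version of the input, so the features it learns are more robust. This post briefly introduces the dAE based on that paper, and then uses two simple experiments to show how a dAE is applied in practice. Both experiments are small modifications of existing tools available online. One is the MATLAB Deep Learning toolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox); the other uses the Python library Theano and follows http://deeplearning.net/tutorial/dA.html.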

　　Background:

　　First, here is the schematic of the dAE from the paper:

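　　As the figure shows, a sample x is corrupted with random noise according to the distribution q_D, which gives x̃. The paper does not use Gaussian noise here. Instead, a randomly chosen proportion of the input-layer units is forced to 0. This is very similar to dropout, introduced in the previous post (Deep learning: 41 (A Simple Understanding of Dropout)), except that dropout is applied to the hidden layers. The data fed into the visible layer is therefore x̃. The hidden layer outputs y = s(Wx̃ + b), and y is decoded into the reconstruction z = s(W′y + b′). Note that z is compared with the original x, not with x̃: the network learns to reconstruct the clean input.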
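　　A minimal sketch of the corruption process q_D follows. It is written in plain Python (standard library only), and the function names are hypothetical. The paper's q_D forces a fixed number νd of randomly chosen components to 0. The DeepLearnToolbox and Theano code used later in this post instead zero each component independently with probability ν, which gives the same expected fraction of zeros. Both variants are shown:

```python
import random


def corrupt_fixed_count(x, nu, rng):
    """Paper-style masking: force round(nu * d) randomly chosen components to 0."""
    d = len(x)
    zeroed = set(rng.sample(range(d), round(nu * d)))
    return [0.0 if i in zeroed else xi for i, xi in enumerate(x)]


def corrupt_independent(x, nu, rng):
    """Toolbox/Theano-style masking: zero each component independently with probability nu."""
    return [0.0 if rng.random() < nu else xi for xi in x]


rng = random.Random(0)
x = [1.0] * 784                  # a dummy 28x28 "image" with every pixel on
x_fixed = corrupt_fixed_count(x, 0.3, rng)
x_indep = corrupt_independent(x, 0.3, rng)
print(x_fixed.count(0.0))        # exactly round(0.3 * 784) = 235 zeros
print(x_indep.count(0.0))        # close to 235 on average; varies with the seed
```

　　In both variants, the corrupted vector is used only as the network input. The reconstruction loss is still computed against the uncorrupted x.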

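　　Bengio's intuitive explanations of the dAE are as follows. 1. A dAE resembles human perception: when part of an object is occluded, a person can still recognize it. 2. When multimodal information (sound, images, etc.) reaches us, losing some of the modalities often matters little. 3. An ordinary autoencoder essentially learns an identity function, so that the reconstructed output equals the input. Such an identity mapping works poorly when test samples do not follow the same distribution as the training samples, i.e. when they differ substantially from them. The dAE handles this case better.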

　　The authors also give several more formal justifications:

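　　1. The manifold-learning view. High-dimensional data generally lies near a lower-dimensional manifold. Corrupted examples x̃ tend to lie farther from this manifold, and the dAE learns to map them back onto it. As a result, the features it learns essentially lie on this low-dimensional surface; the paper illustrates this with a figure. The features extracted by an ordinary autoencoder do not all lie on this surface, even with a sparsity constraint, although they still capture the main information in the original data.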

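　　2. The top-down, generative-model view. 3. The information-theoretic view. 4. The stochastic-operator view. These explanations involve more mathematics; see the paper for details.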

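　　When a deep network is trained with unsupervised pretraining of its weights, Dropout and the Denoising Autoencoder are typically used at different stages. Dropout is not used during layer-wise pretraining and is introduced only in the later fine-tuning stage. The dAE corruption, by contrast, is applied to each layer's input during layer-wise pretraining and is not used during fine-tuning. The reconstruction error is usually measured as the squared error. If the elements of the input and output vectors are bits (binary values, or bit probabilities in [0, 1]), the cross-entropy is generally used instead.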
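　　For concreteness, here are the two reconstruction errors computed for a single example. This is a plain-Python sketch, and the function names are illustrative. The cross-entropy form is the same expression as L in get_cost_updates in the Theano code of Experiment 2. It assumes x lies in [0, 1] and z lies in (0, 1), as sigmoid outputs do.

```python
import math


def squared_error(x, z):
    """Sum of squared differences between input x and reconstruction z."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))


def cross_entropy(x, z, eps=1e-12):
    """Reconstruction cross-entropy; x in [0, 1], z in (0, 1). eps guards against log(0)."""
    return -sum(xi * math.log(max(zi, eps)) + (1 - xi) * math.log(max(1 - zi, eps))
                for xi, zi in zip(x, z))


x = [1.0, 0.0, 1.0]              # a binary input vector
z = [0.9, 0.2, 0.6]              # a reconstruction, e.g. sigmoid outputs
print(squared_error(x, z))       # 0.01 + 0.04 + 0.16, about 0.21
print(cross_entropy(x, z))       # -(ln 0.9 + ln 0.8 + ln 0.6), about 0.839
```

　　Consider a badly wrong output such as z = 0.01 for a target of 1. The squared error charges about 0.98, while the cross-entropy charges -ln 0.01 ≈ 4.6. Confidently wrong reconstructions of bit-like inputs are therefore penalized much more strongly by the cross-entropy.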

　　Experiments:

　　Experiment 1:

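　　This experiment again uses the MNIST handwritten digit database, with 60,000 training samples and 10,000 test samples, and the MATLAB Deep Learning toolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox). The network has 2 hidden layers of 100 units each, so the overall structure is 784-100-100-10. The experiment compares the recognition error rate with and without denoising, as well as the features learned in each case. The results are as follows: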

　　Features learned by the autoencoder without denoising:

　　Test error rate: 9.33%

　　Features learned by the denoising autoencoder:

　　Test error rate: 8.26%

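　　The results show that the autoencoder trained with added noise learns slightly better features. The parameters were not tuned, so careful tuning would likely improve the results further.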
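　　The quick arithmetic below puts the two error rates in perspective. It converts them into misclassified images out of the 10,000 test samples and computes the relative reduction. It only restates the numbers reported above, which come from a single run with one training epoch per stage.

```python
test_size = 10000
err_plain, err_denoise = 0.0933, 0.0826   # test error rates reported above

wrong_plain = round(err_plain * test_size)      # 933 misclassified images
wrong_denoise = round(err_denoise * test_size)  # 826 misclassified images
relative = (err_plain - err_denoise) / err_plain

print(wrong_plain, wrong_denoise)   # 933 826
print(f"{relative:.1%}")            # 11.5%
```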

　　Main code for Experiment 1, with comments:

　　Test.m:

%% Load data
load mnist_uint8;
train_x = double(train_x)/255;   % scale pixel values to [0, 1]
test_x  = double(test_x)/255;
train_y = double(train_y);
test_y  = double(test_y);

%% Run 1: pretraining with a plain (non-denoising) stacked autoencoder
rng(0);
sae = saesetup([784 100 100]); % the weights W of each autoencoder are already randomly initialized here
sae.ae{1}.activation_function       = 'sigm';
sae.ae{1}.learningRate              = 1;
sae.ae{1}.inputZeroMaskedFraction   = 0.;  % 0: no masking noise on the input
sae.ae{2}.activation_function       = 'sigm';
sae.ae{2}.learningRate              = 1;
sae.ae{2}.inputZeroMaskedFraction   = 0.;  % no masking noise on the second layer's input either
opts.numepochs =   1;
opts.batchsize = 100;
sae = saetrain(sae, train_x, opts); % unsupervised: no labels are needed; the learned weights are stored in sae.
                                    % Pretraining is layer-wise: each autoencoder trains one hidden layer,
                                    % and inside saetrain the hidden activations of one layer become the
                                    % input of the next (the caller's train_x is not modified).
visualize(sae.ae{1}.W{1}(:,2:end)') % first-layer filters; column 1 holds the bias and is dropped
% Use the SAE to initialize a feedforward network (FFNN)
nn = nnsetup([784 100 100 10]);
nn.activation_function              = 'sigm';
nn.learningRate                     = 1;
% add pretrained weights: they overwrite the random initialization done by nnsetup
nn.W{1} = sae.ae{1}.W{1};
nn.W{2} = sae.ae{2}.W{1};
% Train the FFNN (supervised fine-tuning)
opts.numepochs =   1;
opts.batchsize = 100;
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
str = sprintf('testing error rate is: %f',er);
disp(str)


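%% Run 2: pretraining with a stacked denoising autoencoder (50% masking noise)
rng(0);
sae = saesetup([784 100 100]); % the weights W of each autoencoder are already randomly initialized here
sae.ae{1}.activation_function       = 'sigm';
sae.ae{1}.learningRate              = 1;
sae.ae{1}.inputZeroMaskedFraction   = 0.5; % each input entry is set to 0 with probability 0.5
sae.ae{2}.activation_function       = 'sigm';
sae.ae{2}.learningRate              = 1;
sae.ae{2}.inputZeroMaskedFraction   = 0.5; % masking noise on the second layer's input too: similar to dropout
                                           % on that layer's input, but applied during layer-wise pretraining
opts.numepochs =   1;
opts.batchsize = 100;
sae = saetrain(sae, train_x, opts); % unsupervised: no labels are needed; the learned weights are stored in sae.
                                    % Pretraining is layer-wise: each autoencoder trains one hidden layer,
                                    % its input is corrupted by the masking noise during training, and the
                                    % hidden activations of one layer become the input of the next.
figure,visualize(sae.ae{1}.W{1}(:,2:end)') % first-layer filters; column 1 holds the bias and is dropped
% Use the SDAE to initialize a feedforward network (FFNN)
nn = nnsetup([784 100 100 10]);
nn.activation_function              = 'sigm';
nn.learningRate                     = 1;
% add pretrained weights: they overwrite the random initialization done by nnsetup
nn.W{1} = sae.ae{1}.W{1};
nn.W{2} = sae.ae{2}.W{1};
% Train the FFNN (supervised fine-tuning, no masking noise)
opts.numepochs =   1;
opts.batchsize = 100;
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
str = sprintf('testing error rate is: %f',er);
disp(str)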
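　　The comments above describe greedy layer-wise pretraining. Below is a minimal plain-Python sketch of that data flow, modelled on what saetrain does. It assumes a hypothetical per-layer trainer passed in as train_layer; here that trainer is a placeholder that returns the weights unchanged. The sketch also follows the toolbox's weight layout, in which each W is an n_out × (n_in + 1) matrix whose first column holds the bias. That layout is why the visualization drops column 1 with W{1}(:,2:end).

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def init_layer(n_in, n_out, rng):
    """n_out x (n_in + 1) weights, bias in column 0, uniform in +/- 4*sqrt(6/(n_in+n_out))."""
    r = 4 * math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-r, r) for _ in range(n_in + 1)] for _ in range(n_out)]


def hidden_code(W, x):
    """Sigmoid hidden activations for one example (a bias input of 1 is prepended)."""
    xb = [1.0] + x
    return [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W]


def pretrain_stack(data, sizes, train_layer, rng):
    """Greedy layer-wise pretraining: layer k is trained on the hidden codes of layer k-1."""
    weights, x = [], data
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = train_layer(init_layer(n_in, n_out, rng), x)  # any denoising happens inside the trainer
        weights.append(W)
        x = [hidden_code(W, xi) for xi in x]  # clean (uncorrupted) codes feed the next layer
    return weights


def no_training(W, x):
    """Hypothetical placeholder for a real dAE trainer: returns W unchanged."""
    return W


rng = random.Random(0)
data = [[rng.random() for _ in range(784)] for _ in range(5)]   # 5 dummy "images"
weights = pretrain_stack(data, [784, 100, 100], no_training, rng)
print([(len(W), len(W[0])) for W in weights])   # [(100, 785), (100, 101)]
```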

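　　As with the Dropout code in the previous post, we can trace the dAE code here. The statement that applies 50% masking noise to the input layer of the sae is: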

　　sae.ae{1}.inputZeroMaskedFraction = 0.5;

　　Tracing further into the sae training process shows that it also uses the nntrain() function, which contains the following code:

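if(nn.inputZeroMaskedFraction ~= 0)
    % add masking noise to the input: rand() is uniform on [0,1], so each entry is
    % kept with probability 1 - inputZeroMaskedFraction and set to 0 otherwise
    batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
end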

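　　The code is self-explanatory. Note also that saetrain trains each layer with nntrain(sae.ae{i}, x, x, opts). The target batch_y is therefore taken from the uncorrupted x: only the network input is masked, and the network learns to reconstruct the clean data.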
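　　The same masking step and its clean targets can be mirrored in a few lines of plain Python. This sketch assumes, as in saetrain, that an autoencoder layer is trained with the data passed as both input and target. The name make_dae_batch is hypothetical, not a toolbox function.

```python
import random


def make_dae_batch(x, idx, mask_fraction, rng):
    """Return (network input, target) for one minibatch of an autoencoder trained on (x, x)."""
    batch_x = [list(x[i]) for i in idx]   # input rows, copied so x itself is not modified
    batch_y = [list(x[i]) for i in idx]   # targets come from the clean data
    if mask_fraction != 0:
        # same rule as the MATLAB line: keep an entry when a uniform draw exceeds mask_fraction
        batch_x = [[v if rng.random() > mask_fraction else 0.0 for v in row] for row in batch_x]
    return batch_x, batch_y


rng = random.Random(42)
x = [[0.5] * 10 for _ in range(4)]          # 4 dummy samples with 10 features each
inp, target = make_dae_batch(x, [0, 2], 0.5, rng)
print(inp[0])      # roughly half of the entries are 0.0
print(target[0])   # unchanged: all entries are 0.5
```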

　　Experiment 2:

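　　This experiment essentially follows the online tutorial at http://deeplearning.net/tutorial/dA.html, which explains the details thoroughly. Because that dAE implementation uses the Theano library, you first need to install Theano and a set of related libraries. On Ubuntu, for example, you can follow the pages Installing Theano and Easy Installation of an optimized Theano on Ubuntu. The installation succeeds easily; a few minor failures during the tests are unimportant and can be ignored. These are the versions I used when installing Theano: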

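　　Ubuntu 13.04: the Linux operating system.

　　python 2.7.4: the programming language.

　　python-numpy 1.7.1: Python numerical package, including matrix operations.

　　python-scipy 0.11: scientific computing package, used here for sparse matrix operations.

　　python-pip 1.1: Python package manager.

　　python-nose 1.1.2: test framework, used to run Theano's tests.

　　libopenblas-dev 0.2.6: OpenBLAS, an optimized BLAS library (development package with headers).

　　git 1.8.1: version control system, used to download the source code.

　　gcc 4.7.3: C compiler.

　　theano 0.6.0rc3: Python library for defining, optimizing, and evaluating multidimensional array expressions, with GPU support.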

　　This experiment also uses MNIST, but with only one hidden layer of 500 units. Its only purpose is to compare the features learned by the autoencoder with and without denoising.

　　Features learned without denoising:

　　Features learned with denoising (30% corruption):

　　As the figures show, the features learned with denoising appear more structured and representative.

　　Main code for Experiment 2, with comments:

　　dA.py:

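# -*- coding: utf-8 -*-
import cPickle
import gzip
import os
import sys
import time
import numpy
import theano
import theano.tensor as T  # Theano's common symbolic operations live in the tensor module
from theano.tensor.shared_randomstreams import RandomStreams
from logistic_sgd import load_data      # helper module from the deeplearning.net tutorial code
from utils import tile_raster_images    # helper module from the deeplearning.net tutorial code
import PIL.Image  # used to save the filter images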

class dA(object):
    def __init__(self, numpy_rng, theano_rng=None, input=None,
                 n_visible=784, n_hidden=500,
                 W=None, bhid=None, bvis=None):
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        if W is None:
            # uniform initialization in +/- 4*sqrt(6/(n_hidden+n_visible)), the range used for sigmoid units
            initial_W = numpy.asarray(numpy_rng.uniform(
                      low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                      high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                      size=(n_visible, n_hidden)), dtype=theano.config.floatX)
            W = theano.shared(value=initial_W, name='W', borrow=True)  # W, bvis and bhid are all shared variables
        if bvis is None:
            bvis = theano.shared(value=numpy.zeros(n_visible, dtype=theano.config.floatX), borrow=True)
        if bhid is None:
            bhid = theano.shared(value=numpy.zeros(n_hidden, dtype=theano.config.floatX), name='b', borrow=True)
        self.W = W
        self.b = bhid
        self.b_prime = bvis
        self.W_prime = self.W.T  # tied weights: the decoder uses the transpose of W
        self.theano_rng = theano_rng
        if input is None:
            self.x = T.dmatrix(name='input')
        else:
            self.x = input  # store the (symbolic) input
        self.params = [self.W, self.b, self.b_prime]

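    def get_corrupted_input(self, input, corruption_level):
        # binomial() draws 0/1 values; p is the probability of drawing 1 (keeping the entry),
        # so each input entry is set to 0 with probability corruption_level
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level,
                                        dtype=theano.config.floatX) * input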

    def get_hidden_values(self, input):
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        return  T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)

    def get_cost_updates(self, corruption_level, learning_rate):
        # builds symbolic expressions for the cost and the SGD parameter updates;
        # nothing is computed until they are compiled with theano.function
        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # cross-entropy between the clean input x (not tilde_x) and the reconstruction z
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        cost = T.mean(L)
        gparams = T.grad(cost, self.params)
        updates = []
        for param, gparam in zip(self.params, gparams):
            updates.append((param, param - learning_rate * gparam))  # each update is a (parameter, new value) pair
        return (cost, updates)

# test function
def test_dA(learning_rate=0.1, training_epochs=15,
            dataset='data/mnist.pkl.gz',
            batch_size=20, output_folder='dA_plots'):
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]  # each row of train_set_x is one sample
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size  # number of minibatches
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')  # the data is presented as rasterized images
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)
    os.chdir(output_folder)

    # without denoising (corruption level 0)
    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))
    da = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
            n_visible=28 * 28, n_hidden=500)  # no data is needed here: x is symbolic; this only sets up the network
    cost, updates = da.get_cost_updates(corruption_level=0.,
                                        learning_rate=learning_rate)
    train_da = theano.function([index], cost, updates=updates,  # theano.function() compiles a symbolic function of index
         givens={x: train_set_x[index * batch_size: (index + 1) * batch_size]})  # whose output is cost
    start_time = time.clock()
    for epoch in xrange(training_epochs):
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))
        print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
    end_time = time.clock()
    training_time = (end_time - start_time)
    print >> sys.stderr, ('The no corruption code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((training_time) / 60.))
    image = PIL.Image.fromarray(
        tile_raster_images(X=da.W.get_value(borrow=True).T,
                           img_shape=(28, 28), tile_shape=(10, 10),
                           tile_spacing=(1, 1)))
    image.save('filters_corruption_0.png')

    # with denoising (30% corruption)
    rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))
    da = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
            n_visible=28 * 28, n_hidden=500)
    cost, updates = da.get_cost_updates(corruption_level=0.3,
                                        learning_rate=learning_rate)  # each input pixel is set to 0 with probability 30%
    train_da = theano.function([index], cost, updates=updates,
         givens={x: train_set_x[index * batch_size:
                                  (index + 1) * batch_size]})
    start_time = time.clock()
    for epoch in xrange(training_epochs):
        c = []
        for batch_index in xrange(n_train_batches):
            c.append(train_da(batch_index))
        print 'Training epoch %d, cost ' % epoch, numpy.mean(c)
    end_time = time.clock()
    training_time = (end_time - start_time)
    print >> sys.stderr, ('The 30% corruption code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % (training_time / 60.))
    image = PIL.Image.fromarray(tile_raster_images(
        X=da.W.get_value(borrow=True).T,
        img_shape=(28, 28), tile_shape=(10, 10),
        tile_spacing=(1, 1)))
    image.save('filters_corruption_30.png')
    os.chdir('../')

if __name__ == '__main__':
    test_dA()

　　The dAE-specific part of the code is:

def get_corrupted_input(self, input, corruption_level):
    # binomial() draws 0/1 values; each entry is 1 (kept) with probability p = 1 - corruption_level
    return self.theano_rng.binomial(size=input.shape, n=1, p=1 - corruption_level,
                                    dtype=theano.config.floatX) * input
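　　To make explicit what T.grad computes in get_cost_updates, here is a standard-library-only sketch of one SGD step for the same tied-weight dA: sigmoid units, masking corruption, and cross-entropy against the clean input. The gradients are derived by hand for a single example rather than a minibatch. The class and method names are illustrative and not part of the tutorial. With e = z - x, the gradient for W[i][j] is e_i·y_j (the decoder path, through W′ = Wᵀ) plus x̃_i·δ_j (the encoder path), where δ_j = (Σ_i e_i·W[i][j])·y_j(1 - y_j).

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class TinyDA:
    """Tied-weight denoising autoencoder on plain lists; W is n_visible x n_hidden as in dA."""

    def __init__(self, n_visible, n_hidden, rng):
        r = 4 * math.sqrt(6.0 / (n_hidden + n_visible))
        self.W = [[rng.uniform(-r, r) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_hidden          # hidden bias (bhid)
        self.b_prime = [0.0] * n_visible   # visible bias (bvis)
        self.rng = rng

    def corrupt(self, x, level):
        # keep each entry with probability 1 - level, like binomial(p=1 - level) * x
        return [xi if self.rng.random() >= level else 0.0 for xi in x]

    def hidden(self, x):
        return [sigmoid(sum(x[i] * self.W[i][j] for i in range(len(x))) + self.b[j])
                for j in range(len(self.b))]

    def reconstruct(self, y):
        # decoder uses W' = W^T: z_i = s(sum_j y_j * W[i][j] + b'_i)
        return [sigmoid(sum(yj * wij for yj, wij in zip(y, row)) + bp)
                for row, bp in zip(self.W, self.b_prime)]

    @staticmethod
    def cost(x, z):
        return -sum(xi * math.log(zi) + (1 - xi) * math.log(1 - zi) for xi, zi in zip(x, z))

    def sgd_step(self, x, level, lr):
        x_tilde = self.corrupt(x, level)
        y = self.hidden(x_tilde)
        z = self.reconstruct(y)
        e = [zi - xi for zi, xi in zip(z, x)]   # the target is the clean x, not x_tilde
        delta = [sum(e[i] * self.W[i][j] for i in range(len(e))) * y[j] * (1 - y[j])
                 for j in range(len(y))]       # computed before W is updated
        for i in range(len(x)):
            for j in range(len(y)):
                self.W[i][j] -= lr * (e[i] * y[j] + x_tilde[i] * delta[j])
            self.b_prime[i] -= lr * e[i]
        for j in range(len(y)):
            self.b[j] -= lr * delta[j]
        return self.cost(x, z)


rng = random.Random(1)
da = TinyDA(n_visible=6, n_hidden=3, rng=rng)
x = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
before = da.cost(x, da.reconstruct(da.hidden(x)))
for _ in range(200):
    da.sgd_step(x, level=0.3, lr=0.5)
after = da.cost(x, da.reconstruct(da.hidden(x)))
print(before, after)   # the clean reconstruction cost should drop
```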

　　References:

　　Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning (ICML), ACM.

https://github.com/rasmusbergpalm/DeepLearnToolbox

http://deeplearning.net/tutorial/dA.html

Deep learning: 41 (A Simple Understanding of Dropout)

Installing Theano

Easy Installation of an optimized Theano on Ubuntu
