Deep Learning 8_Deep Learning UFLDL Tutorial: Stacked Autoencoders and Implement deep networks for digit classification_Exercise (Stanford Deep Learning Tutorial)

This article is a repost. 2015-11-07 17:31. Tags: deep learning / Andrew Ng / Stanford University / ufldl

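Preface

1. Theory: the UFLDL Tutorial; Deep learning: 16 (deep networks)

2. Environment: Windows 7, MATLAB 2015b, 16 GB RAM, 2 TB hard disk

3. Experiment: Exercise: Implement deep networks for digit classification. A deep network is used to recognize handwritten digits from the MNIST database. The 60,000 labeled examples (60,000 images of 28*28 pixels) serve as the training set and are fed into a stacked autoencoder. The first autoencoder extracts first-order features from the training data; these are fed into the second autoencoder, which extracts second-order features; the second-order features are then fed into a softmax classifier, which is trained on them together with the labels of the original data. Finally, backpropagation (BP) fine-tunes the weights of the whole network so that it fits the data better. The 10,000 labeled examples (10,000 images of 28*28 pixels) serve as the test set: the trained network, ending in the softmax classifier, classifies them and the classification accuracy is computed. The overall network structure for this section is as follows (figure in the original post):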

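Note: this experiment differs from Deep Learning 6: Softmax Regression_Exercise (Stanford Deep Learning Tutorial UFLDL) and Deep Learning 7: Self-Taught Learning_Exercise (Stanford Deep Learning Tutorial UFLDL) as follows:

Deep Learning 6: Softmax Regression_Exercise feeds the raw data directly into the softmax classifier as the training set. Deep Learning 7: Self-Taught Learning_Exercise extracts features from the raw data and feeds those features into the softmax classifier. This experiment also extracts features from the raw data and feeds them into the softmax classifier, but then fine-tunes the weights of the whole network with a large amount of labeled data, which yields better classification results. Accordingly, the accuracy obtained here, 97.590%, is higher than the 92.640% obtained in Deep Learning 6: Softmax Regression_Exercise. The accuracy of Deep Learning 7: Self-Taught Learning_Exercise cannot be compared directly because its training samples are different. This experiment does, however, compare the accuracy before and after fine-tuning; see the results of this section.

4. When this method applies

When should fine-tuning be applied? Usually only when a large amount of labeled training data is available; in that case fine-tuning can significantly improve classifier performance. If, however, there is a large unlabeled data set (for unsupervised feature learning/pre-training) but only a relatively small labeled training set, fine-tuning helps very little, and the method introduced in Deep Learning 7: Self-Taught Learning_Exercise (Stanford Deep Learning Tutorial UFLDL) can be used instead.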

5. Some MATLAB functions

[params, netconfig] = stack2params(stack)

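　　Converts the network parameters held per layer in stack (possibly many parameters) into a single vector params, which makes it convenient to apply various optimization algorithms. netconfig stores information about the network: netconfig.inputsize is the number of nodes in the input layer, and the elements of netconfig.layersizes give the number of nodes in each hidden layer.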

　　[ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, numClasses, netconfig,lambda, data, labels)

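　　This function computes the loss of the whole network and the partial derivative of the loss with respect to every parameter. The loss is a single real number; it is computed by the softmax classifier and needs only the labels and the output of the softmax layer. The partial derivatives, by contrast, are many: there is one for each parameter, covering not only the hidden layers but also the softmax layer. The derivatives of the softmax part follow directly from its formula, while those of the deep network layers are obtained by backpropagation (that is, first compute the error term of each layer, then use it to compute the gradients with respect to w and b).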
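A common way to validate hand-derived gradients like these is a numerical gradient check, which is what the checkStackedAECost helper mentioned in the listing further down is for. The following is only a minimal sketch of the idea on a toy cost; costFn, gradFn and the test point theta are assumptions made for illustration and are not part of the exercise code.

```matlab
% Toy cost J(theta) = 0.5*sum(theta.^2) + sum(sin(theta)) and its analytic
% gradient (both hypothetical, chosen only to illustrate the check).
costFn = @(t) 0.5 * sum(t.^2) + sum(sin(t));
gradFn = @(t) t + cos(t);

theta = [0.3; -1.2; 2.0];
epsilon = 1e-4;

% Central-difference estimate of each partial derivative.
numGrad = zeros(size(theta));
for k = 1:numel(theta)
    e = zeros(size(theta));
    e(k) = epsilon;
    numGrad(k) = (costFn(theta + e) - costFn(theta - e)) / (2 * epsilon);
end

% A small relative difference suggests the analytic gradient is right.
analyticGrad = gradFn(theta);
relDiff = norm(numGrad - analyticGrad) / norm(numGrad + analyticGrad);
fprintf('relative difference: %g\n', relDiff);
```

For the real network the same loop would be run on a very small model and a handful of examples, since it needs two cost evaluations per parameter.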

　　stack = params2stack(params, netconfig)

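　　The inverse of stack2params: it unrolls a parameter vector back into per-layer weights and biases according to the structure of the deep network.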
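The round trip between a stack and a flat vector can be sketched as follows. This is an illustration under assumptions, not the UFLDL implementation: the layer sizes and values are made up, and layersizes is stored here as a plain numeric vector, which may differ from how the real netconfig stores it.

```matlab
% Hypothetical two-layer stack with tiny sizes.
stack = cell(2, 1);
stack{1}.w = reshape(1:6, 3, 2);     % 3 hidden units, 2 inputs
stack{1}.b = [7; 8; 9];
stack{2}.w = reshape(10:15, 2, 3);   % 2 hidden units, 3 inputs
stack{2}.b = [16; 17];

% Flatten: for each layer append w(:) and then b(:).
params = [];
netconfig = struct;
netconfig.inputsize = size(stack{1}.w, 2);
netconfig.layersizes = zeros(1, numel(stack));
for d = 1:numel(stack)
    params = [params; stack{d}.w(:); stack{d}.b(:)];
    netconfig.layersizes(d) = size(stack{d}.w, 1);
end

% Unflatten: walk through params using the sizes stored in netconfig.
restored = cell(numel(netconfig.layersizes), 1);
prevSize = netconfig.inputsize;
pos = 1;
for d = 1:numel(restored)
    curSize = netconfig.layersizes(d);
    wlen = curSize * prevSize;
    restored{d}.w = reshape(params(pos:pos+wlen-1), curSize, prevSize);
    pos = pos + wlen;
    restored{d}.b = params(pos:pos+curSize-1);
    pos = pos + curSize;
    prevSize = curSize;
end
```

Keeping the sizes in netconfig is what lets the optimizer work on a single vector while the cost function still sees matrices.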

　　[pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

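　　This function predicts the output class for each example in data. theta holds the parameters of the whole network (including the classifier part), numClasses is the number of classes, and netconfig describes the network structure.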

　　[h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)

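　　This function displays the matrix A. Each column of A must be one set of weights (one filter), and the length of each column must be a perfect square. When run, the function shows each column of A as a small image patch; how many patches there are and how they are arranged is decided automatically inside the function.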
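The automatic arrangement depends only on the number of columns M. The sketch below lifts the layout logic from the display_network.m listing later in this post (the case where cols is not passed in) and runs it on its own; M = 200 is chosen here because it matches the hidden layer size used in this exercise.

```matlab
M = 200;                          % number of filters (columns of A)
if floor(sqrt(M))^2 ~= M
    % Not a perfect square: start just above sqrt(M) and search a little
    % for a column count n that divides M evenly.
    n = ceil(sqrt(M));
    while mod(M, n) ~= 0 && n < 1.2 * sqrt(M), n = n + 1; end
    m = ceil(M / n);              % rows needed to hold all M patches
else
    n = sqrt(M);                  % perfect square: square grid
    m = n;
end
fprintf('%d filters -> %d rows x %d columns\n', M, m, n);
```

For M = 200 this gives 12 rows of 17 columns, with the last cells left empty; for M = 196 it gives a 14 by 14 grid.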

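　　struct:

　　s = struct; creates a structure array s.

　　nargout:

　　The number of output arguments requested in the call to the function.

　　save:

　　For example, save('saves/step2.mat', 'sae1OptTheta'); requires a directory named saves under the current directory; otherwise the call fails.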
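One way to avoid that failure is to create the directory before saving. A minimal sketch follows; the directory saves_demo under tempdir and the random sae1OptTheta are placeholders for illustration, not the exercise's real path or data.

```matlab
% Hypothetical output directory; the exercise itself uses 'saves' under the
% current directory.
outDir = fullfile(tempdir, 'saves_demo');
if ~exist(outDir, 'dir')
    mkdir(outDir);                        % create it only when missing
end

sae1OptTheta = rand(5, 1);                % stand-in for trained parameters
save(fullfile(outDir, 'step2.mat'), 'sae1OptTheta');

% Load it back to confirm the file was written.
loaded = load(fullfile(outDir, 'step2.mat'));
```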

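A stacked autoencoder is a neural network made up of several layers of sparse autoencoders, in which the output of each autoencoder is the input of the next.

6. Questions resolved

1. This resolves a question raised in Deep Learning 6: Softmax Regression_Exercise (Stanford Deep Learning Tutorial UFLDL):

softmaxExercise.m contains the following code: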

images = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');
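labels(labels==0) = 10; % map label 0 to label 10, so labels lie in [1, 10] instead of [0, 9]. Why is this necessary?

Why must the original label 0 be changed to label 10? I could not figure it out!

The same question comes up in stackedAEExercise.m in this experiment:

trainLabels(trainLabels == 0) = 10; % never understood why label 0 has to become 10

Reason: MATLAB indices start at 1, so labels in the range 1 to 10 can be used directly as row indices. When predicting, the row index of the largest value returned by max is then itself the predicted class, as in this statement from stackedAEPredict.m in this experiment: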

[prob pred] = max(softmaxTheta*a{depth+1});

Here pred holds the predicted classes.
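The effect of the remapping can be seen on a few made-up labels. In this sketch the label values are assumptions for illustration, and the indicator matrix doubles as a stand-in for classifier scores; the sparse call mirrors the groundTruth line in stackedAECost.m, with the matrix size given explicitly. Leaving a 0 in labels would make sparse fail, because row indices must be positive.

```matlab
labels = [0; 3; 9; 0];               % raw MNIST-style labels in 0..9
labels(labels == 0) = 10;            % remap so every label is a valid row index

M = numel(labels);
numClasses = 10;

% One column per example, with a 1 in the row given by the label.
groundTruth = full(sparse(labels, 1:M, 1, numClasses, M));

% Pretend the classifier scores equal the indicator matrix: the row index
% of each column's maximum is then the predicted label, with no offset.
[~, pred] = max(groundTruth, [], 1);
disp(pred);
```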

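7. Open questions

1. Suppose the classifier at the end is not a softmax classifier but something else, such as an SVM. The feature-extraction layers have already been pre-trained, and those parameters can initialize the front of the network, but how should fine-tuning be done in that case?

2. The code shows that the cost function of the whole network is really just the cost function of the softmax classifier. How is that derived?

3. How can the features of the second hidden layer be displayed? This question kept me busy for several days, and I was also tied up with getting a paper out, so I never had the quiet time to think it through.

To answer it, one has to fully understand display_network.m, the function that displays the features of each layer, and work out why the second-layer features cannot be displayed the same way as the first-layer ones. I have therefore annotated the code of display_network.m in detail; see below.

First, we need to be clear about why the second hidden layer's features cannot be displayed. Many people (for example: Deep learning: 24 (stacked autoencoder exercise)) believe the reason is that display_network.m requires the square root of the number of hidden units to be an integer, and in this experiment the number of hidden units is set to 200, which is not a perfect square, so the features cannot be displayed. But that is only an implementation issue and is easy to solve: set the number of hidden units to 196, and display_network.m can display the second layer the same way as the first. That is not the real reason, though, as the following results show:

With the number of hidden units set to 196, the first-layer features are visualized as (figure in the original post):

With the number of hidden units set to 196, the second-layer features are visualized as (figure in the original post):

The second-layer visualization shows that this way of visualizing the second layer must be wrong, because it does not reveal any features such as points or corners.

So why can it not be displayed this way, and how should it be displayed? This is in fact a research direction within deep learning; see: Deep Learning Paper Notes (7): Visualizing Higher-Layer Features of a Deep Network

8. Cost function

Ng does not give the cost function directly, but it can be read off his code. The code that computes the cost function is as follows: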

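depth = size(stack, 1);  % number of hidden layers
a = cell(depth+1, 1);    % outputs of the input layer and activations of the hidden layers
a{1} = data; % output of the input layer
Jweight = 0; % weight penalty term
m = size(data, 2); % number of examples

% compute the activations of the hidden layers
for i=2:numel(a)
    a{i} = sigmoid(stack{i-1}.w*a{i-1}+repmat(stack{i-1}.b, [1 size(a{i-1}, 2)]));
    %Jweight = Jweight + sum(sum(stack{i-1}.w).^2);
end

M = softmaxTheta*a{depth+1};  % a{depth+1} is the output of the last hidden layer; M is the input to the softmax layer, before the softmax function is applied
M = bsxfun(@minus, M, max(M, [], 1));  % prevents overflow when computing the exponential in the next step
M = exp(M);
p = bsxfun(@rdivide, M, sum(M));  % p is the output of the softmax layer, i.e. the probability of each class

Jweight = Jweight + sum(softmaxTheta(:).^2); % softmaxTheta holds the weights of the softmax layer

% cost function of the softmax classifier; why is it also the cost function of the whole model?
cost = -1/m .* groundTruth(:)'*log(p(:)) + lambda/2*Jweight;% cost = cross-entropy (negative log-likelihood) term + weight decay term (also called the regularization term)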
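As an aside, the effect of the max-subtraction step in the code above can be checked on a small example. The score matrix z below is made up for illustration; its first column is deliberately large so that the naive computation overflows.

```matlab
z = [1000 1; 1001 2; 1002 3];        % hypothetical scores, one column per example

% Naive softmax: exp(1000) overflows to Inf, so the first column is Inf/Inf = NaN.
naive = exp(z) ./ repmat(sum(exp(z), 1), size(z, 1), 1);

% Stable softmax: subtract each column's maximum before exponentiating.
zs = bsxfun(@minus, z, max(z, [], 1));
p = bsxfun(@rdivide, exp(zs), sum(exp(zs), 1));

disp(naive);
disp(p);
```

Subtracting a per-column constant cancels in the ratio, so the probabilities are unchanged wherever the naive version is finite, while the largest exponent becomes exp(0) = 1.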

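As the code shows, the cost function is in fact the cost function of the softmax classifier (see the Softmax Regression page of the UFLDL tutorial): the average negative log-likelihood of the correct class plus a weight decay term on softmaxTheta.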
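The formula in the original post was an image and is not reproduced here. Read back from the code above, it can be written as follows; this is a reconstruction from the code, with the symbols chosen here as assumptions: \theta stands for softmaxTheta, a^{(i)} for the last hidden layer's activation on example i, K for numClasses, and p_j^{(i)} for the entries of p.

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{K}
            1\{y^{(i)} = j\} \, \log p_j^{(i)}
          + \frac{\lambda}{2} \sum_{j,k} \theta_{jk}^{2},
\qquad
p_j^{(i)} = \frac{\exp\left(\theta_j^{\top} a^{(i)}\right)}
                 {\sum_{l=1}^{K} \exp\left(\theta_l^{\top} a^{(i)}\right)}
```

Note that, as in the code, only the softmax weights are penalized; the line that would add the hidden-layer weights to Jweight is commented out.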

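Experiment steps

1. Initialize the parameters and load the MNIST handwritten digit database.

2. Train the first autoencoder on the raw data to obtain its weights sae1OptTheta; with sae1OptTheta, compute the first-order features sae1Features of the raw data.

3. Train the second autoencoder on the first-order features sae1Features to obtain its weights sae2OptTheta; with sae2OptTheta, compute the second-order features sae2Features of the raw data.

4. Train the softmax classifier on the second-order features sae2Features and the labels of the raw data to obtain the classifier weights saeSoftmaxOptTheta.

5. Combine all the weights obtained so far (sae1OptTheta, sae2OptTheta, saeSoftmaxOptTheta) into the pre-fine-tuning parameter vector of the whole network, stackedAETheta. Then fine-tune stackedAETheta with backpropagation on the raw data and its labels to obtain the fine-tuned parameters of the whole network, stackedAEOptTheta.

6. Classify the test data with the pre-fine-tuning parameters stackedAETheta and with the fine-tuned parameters stackedAEOptTheta, and compute the classification accuracy of each.

Results: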

Before Finetuning Test Accuracy: 92.140%

After Finetuning Test Accuracy: 97.590%

The first-layer features are displayed as follows (figure in the original post):

Code

stackedAEExercise.m:

%% CS294A/CS294W Stacked Autoencoder Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  stacked autoencoder exercise. You will need to complete code in
%  stackedAECost.m
%  You will also need to have implemented sparseAutoencoderCost.m and 
%  softmaxCost.m from previous exercises. You will need the initializeParameters.m
%  loadMNISTImages.m, and loadMNISTLabels.m files from previous exercises.
%  
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file. 
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to 
%  change the parameters below.
DISPLAY = true;
inputSize = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200;    % Layer 1 Hidden Size
hiddenSizeL2 = 200;    % Layer 2 Hidden Size
sparsityParam = 0.1;   % desired average activation of the hidden units.
                       % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                       %  in the lecture notes). 
lambda = 3e-3;         % weight decay parameter       
beta = 3;              % weight of sparsity penalty term       

%%======================================================================
%% STEP 1: Load data from the MNIST database
%
%  This loads our training data from the MNIST database files.

% Load MNIST database files
trainData = loadMNISTImages('train-images.idx3-ubyte');
trainLabels = loadMNISTLabels('train-labels.idx1-ubyte');

trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1

%%======================================================================
%% STEP 2: Train the first sparse autoencoder
%  This trains the first sparse autoencoder on the MNIST training
%  images (the labels are not used in this step).
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.


%  Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the first layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL1"
%                You should store the optimal parameters in sae1OptTheta

addpath minFunc/;
options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
[sae1OptTheta, cost] =  minFunc(@(p)sparseAutoencoderCost(p,...
    inputSize,hiddenSizeL1,lambda,sparsityParam,beta,trainData),sae1Theta,options);% train the parameters of the first layer
save('saves/step2.mat', 'sae1OptTheta');

if DISPLAY
  W1 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
  display_network(W1');
end




% -------------------------------------------------------------------------



%%======================================================================
%% STEP 3: Train the second sparse autoencoder
%  This trains the second sparse autoencoder on the first autoencoder
%  features.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.
%  Use the weights sae1OptTheta of the first sparse autoencoder to compute the first-order features of the input data

[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

%  Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the second layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL2" and an inputsize of
%                "hiddenSizeL1"
%
%                You should store the optimal parameters in sae2OptTheta

[sae2OptTheta, cost] =  minFunc(@(p)sparseAutoencoderCost(p,...
    hiddenSizeL1,hiddenSizeL2,lambda,sparsityParam,beta,sae1Features),sae2Theta,options);% train the parameters of the second layer
save('saves/step3.mat', 'sae2OptTheta');

figure;
if DISPLAY
  W11 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
  W12 = reshape(sae2OptTheta(1:hiddenSizeL2 * hiddenSizeL1), hiddenSizeL2, hiddenSizeL1);
  % TODO(zellyn): figure out how to display a 2-level network
%  display_network(log(W11' ./ (1-W11')) * W12');
%   W12_temp = W12(1:196,1:196);
%   display_network(W12_temp');
%   figure;
%   display_network(W12_temp');
end

% -------------------------------------------------------------------------


%%======================================================================
%% STEP 4: Train the softmax classifier on the second-order features
%  This trains the softmax classifier on the second autoencoder features.
%  If you've correctly implemented softmaxCost.m, you don't need
%  to change anything here.

%  Use the weights sae2OptTheta of the second sparse autoencoder to compute the second-order features of the input data
[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

%  Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);% What is this for? Evaluating softmaxCost? It is not used below and can be dropped,
                                                              % because softmaxCost was already implemented in softmaxExercise and its gradient was verified there.


%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the softmax classifier, the classifier takes in
%                input of dimension "hiddenSizeL2" corresponding to the
%                hidden layer size of the 2nd layer.
%
%                You should store the optimal parameters in saeSoftmaxOptTheta 
%
%  NOTE: If you used softmaxTrain to complete this part of the exercise,
%        set saeSoftmaxOptTheta = softmaxModel.optTheta(:);

softmaxLambda = 1e-4;
numClasses = 10;
softoptions = struct;
softoptions.maxIter = 400;
softmaxModel = softmaxTrain(hiddenSizeL2,numClasses,softmaxLambda,...
                            sae2Features,trainLabels,softoptions);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);% weights of the trained softmax classifier

save('saves/step4.mat', 'saeSoftmaxOptTheta');

% -------------------------------------------------------------------------



%%======================================================================
%% STEP 5: Finetune softmax model

% Implement the stackedAECost to give the combined cost of the whole model
% then run this cell.

% Initialize the stack using the parameters learned
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack);% flatten the weights of the stack (the two hidden layers) into one vector, stackparams
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];% parameter vector of the whole network before fine-tuning; the softmax classifier part, saeSoftmaxOptTheta, comes first

%% ---------------------- YOUR CODE HERE  ---------------------------------
%  Instructions: Train the deep network, hidden size here refers to the
%                dimension of the input to the classifier, which corresponds 
%                to "hiddenSizeL2".
%
%  Fine-tune with backpropagation to obtain the fine-tuned parameters of the whole network, stackedAEOptTheta

[stackedAEOptTheta, cost] =  minFunc(@(p)stackedAECost(p,inputSize,hiddenSizeL2,...
                         numClasses, netconfig,lambda, trainData, trainLabels),...
                        stackedAETheta,options);% fine-tune the parameters of the whole network
save('saves/step5.mat', 'stackedAEOptTheta');

figure;
if DISPLAY
  optStack = params2stack(stackedAEOptTheta(hiddenSizeL2*numClasses+1:end), netconfig);
  W11 = optStack{1}.w;
  W12 = optStack{2}.w;
  % TODO(zellyn): figure out how to display a 2-level network
  % display_network(log(1 ./ (1-W11')) * W12');
end



% -------------------------------------------------------------------------



%%======================================================================
%% STEP 6: Test 
%  Instructions: You will need to complete the code in stackedAEPredict.m
%                before running this part of the code
%

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('t10k-images.idx3-ubyte');
testLabels = loadMNISTLabels('t10k-labels.idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy:  97.6%
%
% If your values are too low (accuracy less than 95%), you should check 
% your code for errors, and make sure you are training on the 
% entire data set of 60000 28x28 training images 
% (unless you modified the loading code, this should be the case)

stackedAECost.m

function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                              numClasses, netconfig, ...
                                              lambda, data, labels)
% Computes the cost function of the whole model and its gradient.
% Note: after completing this function, it is best to verify the gradient
% with the checkStackedAECost function.

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.
                                         
% theta: trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% netconfig:   the network configuration of the stack
% lambda:      the weight regularization penalty
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example. 
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example


%% Unroll softmaxTheta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);% extract the softmax classifier parameters from the full parameter vector, as a matrix

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);% extract the hidden-layer parameters from the full parameter vector, as a cell array of structs

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));% gradient of softmaxTheta
stackgrad = cell(size(stack));               % gradient of the stack
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));


%% --------------------------- YOUR CODE HERE -----------------------------
%  Instructions: Compute the cost function and gradient vector for 
%                the stacked autoencoder.
%
%                You are given a stack variable which is a cell-array of
%                the weights and biases for every layer. In particular, you
%                can refer to the weights of Layer d, using stack{d}.w and
%                the biases using stack{d}.b . To get the total number of
%                layers, you can use numel(stack).
%
%                The last layer of the network is connected to the softmax
%                classification layer, softmaxTheta.
%
%                You should compute the gradients for the softmaxTheta,
%                storing that in softmaxThetaGrad. Similarly, you should
%                compute the gradients for each layer in the stack, storing
%                the gradients in stackgrad{d}.w and stackgrad{d}.b
%                Note that the size of the matrices in stackgrad should
%                match exactly that of the size of the matrices in stack.
%

depth = size(stack, 1);  % number of hidden layers
a = cell(depth+1, 1);    % outputs of the input layer and activations of the hidden layers
a{1} = data; % output of the input layer
Jweight = 0; % weight penalty term
m = size(data, 2); % number of examples

% compute the activations of the hidden layers
for i=2:numel(a)
    a{i} = sigmoid(stack{i-1}.w*a{i-1}+repmat(stack{i-1}.b, [1 size(a{i-1}, 2)]));
    %Jweight = Jweight + sum(sum(stack{i-1}.w).^2);
end

M = softmaxTheta*a{depth+1};
M = bsxfun(@minus, M, max(M, [], 1));  % prevents overflow when computing the exponential in the next step
M = exp(M);
p = bsxfun(@rdivide, M, sum(M));

Jweight = Jweight + sum(softmaxTheta(:).^2);

% cost function of the softmax classifier; why is it also the cost function of the whole model?
cost = -1/m .* groundTruth(:)'*log(p(:)) + lambda/2*Jweight;% cost = cross-entropy (negative log-likelihood) term + weight decay term (also called the regularization term)

% gradient of the softmax classifier cost, i.e. the gradient of the output layer
softmaxThetaGrad = -1/m .* (groundTruth - p)*a{depth+1}' + lambda*softmaxTheta;

delta = cell(depth+1, 1);  % error terms; delta{i} belongs to layer i (delta{1} is not used)

% error term of the last hidden layer, propagated back from the softmax layer
delta{depth+1} = -softmaxTheta' * (groundTruth - p) .* a{depth+1} .* (1-a{depth+1});

% error terms of the remaining hidden layers
for i=depth:-1:2
    delta{i} = stack{i}.w'*delta{i+1}.*a{i}.*(1-a{i});
end

% use the error terms computed above to get the gradients of the hidden-layer parameters
for i=depth:-1:1
    stackgrad{i}.w = 1/m .* delta{i+1}*a{i}';
    stackgrad{i}.b = 1/m .* sum(delta{i+1}, 2);
end

% -------------------------------------------------------------------------

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end


% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

stackedAEPredict.m

function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)
                                         
% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.
                                         
% theta: trained weights from the autoencoder
% visibleSize: the number of input units
% hiddenSize:  the number of hidden units *at the 2nd layer*
% numClasses:  the number of categories
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example. 

% Your code should produce the prediction matrix 
% pred, where pred(i) is argmax_c P(y(c) | x(i)).
 
%% Unroll theta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start 
%                from 1.

depth = numel(stack);      % number of hidden layers
a = cell(depth+1, 1);      % a{1} is the input, a{i} the activation of layer i
a{1} = data;
m = size(data, 2);         % number of examples

% forward pass through the hidden layers
for i=2:depth+1
    a{i} = sigmoid(stack{i-1}.w*a{i-1}+ repmat(stack{i-1}.b, [1 m]));
end

% the row index of the largest score in each column is the predicted class;
% exp and the normalization are monotonic, so they can be skipped here
[prob pred] = max(softmaxTheta*a{depth+1});

% -----------------------------------------------------------

end


% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

display_network.m

function [h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
% This function visualizes filters in matrix A. Each column of A is a
% filter. We will reshape each column into a square image and visualizes
% on each cell of the visualization panel. 
% All other parameters are optional, usually you do not need to worry
% about it.
% opt_normalize:whether we need to normalize the filter so that all of
% them can have similar contrast. Default value is true.
% opt_graycolor: whether we use gray as the heat map. Default is true.
% cols: how many columns are there in the display. Default value is the
% squareroot of the number of columns in A.
% opt_colmajor: you can switch convention to row major for A. In that
% case, each row of A is a filter. Default value is false.

% Annotator's notes, based on what the code below does:
% opt_normalize: true: each patch is normalized on its own (divided by the
%                largest absolute value in that patch);
%                false: the patches are normalized together (divided by the
%                largest absolute value in the whole matrix). Default is true.
% opt_graycolor: true: use the gray colormap; false: do not. Default is true.
% cols:          number of patches per row of the big image. Default is the
%                square root of the number of columns of A.
% opt_colmajor:  true: patches are placed column by column, top to bottom;
%                false: patches are placed row by row, left to right. Default is false.

warning off all  % turn warnings off

% default values of the parameters
if ~exist('opt_normalize', 'var') || isempty(opt_normalize)
    opt_normalize= true;
end

if ~exist('opt_graycolor', 'var') || isempty(opt_graycolor)
    opt_graycolor= true;
end

if ~exist('opt_colmajor', 'var') || isempty(opt_colmajor)
    opt_colmajor = false;
end

% rescale: subtract the mean of the whole matrix
A = A - mean(A(:));

if opt_graycolor, colormap(gray); end  % for a grayscale display, set the figure's colormap to gray

% compute rows, cols: n patches per row and m patches per column of the big image
[L M]=size(A); % M is the total number of patches
sz=sqrt(L);  % number of pixel rows and columns in each patch
buf=1;         % width of the separator between patches; separator pixels have value -1
if ~exist('cols', 'var') % cols was not passed in
    if floor(sqrt(M))^2 ~= M        % sqrt(M) is not an integer: start with n = ceil(sqrt(M))
        n=ceil(sqrt(M));
        while mod(M, n)~=0 && n<1.2*sqrt(M), n=n+1; end % while n does not divide M and n < 1.2*sqrt(M), increase n by 1
        m=ceil(M/n);                                    % m is M divided by n, rounded up
    else
        n=sqrt(M);                  % sqrt(M) is an integer: m and n both equal sqrt(M)
        m=n;
    end
else
    n = cols;           % cols was passed in: n = cols, and m is M/n rounded up
    m = ceil(M/n);
end

array=-ones(buf+m*(sz+buf),buf+n*(sz+buf));% target matrix, pre-filled with -1 so every patch is surrounded by a one-pixel border of -1

if ~opt_graycolor  % when not using the gray colormap, scale the border pixels to -0.1
    array = 0.1.* array;
end


if ~opt_colmajor   % opt_colmajor is false: place patches row by row, left to right
    k=1;            % index of the current patch
    for i=1:m       % row index
        for j=1:n   % column index
            if k>M, 
                continue; 
            end
            clim=max(abs(A(:,k)));
            if opt_normalize
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/clim; % this shows that n counts columns and m counts rows
            else
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/max(abs(A(:)));
            end
            k=k+1;
        end
    end
else        % opt_colmajor is true: place patches column by column, top to bottom
    k=1;
    for j=1:n          % column index
        for i=1:m      % row index
            if k>M, 
                continue; 
            end
            clim=max(abs(A(:,k)));
            if opt_normalize
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/clim;
            else
                array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz);
            end
            k=k+1;
        end
    end
end

if opt_graycolor   % gray colormap: the border pixels around each patch have value -1
    h=imagesc(array,'EraseMode','none',[-1 1]); % EraseMode 'none': draw directly over the existing image without erasing
else              % no gray colormap: the border pixels around each patch have value -0.1
    h=imagesc(array,'EraseMode','none',[-1 1]);
end
axis image off  % hide the axes

drawnow;  % refresh the screen so the image appears as it is drawn

warning on all  % turn warnings back on

References:

Deep learning: 24 (stacked autoencoder exercise)

http://blog.csdn.net/freeliao/article/details/19618855

http://www.tuicool.com/articles/MnANFn

UFLDL Tutorial

……
