Preface
This post works through the Sparse Coding part of Stanford's UFLDL deep learning tutorial; see the exercise page Exercise:Sparse Coding. The task is to learn from 20,000 natural-image patches, using both ordinary sparse coding and topographic sparse coding, and to inspect the basis images (features) that are learned. The training data is the natural-image set IMAGES, available from the tutorial site.
Experiment basics
The main idea of sparse coding is to learn a set of "basis vectors" from the input data set. Once these bases are obtained, every data point in the input set can be represented as a linear combination of them, and sparsity means that the combination coefficients are sparse, i.e. most of them are zero. Each basis vector clearly has the same dimensionality as an original input sample, and the number of bases is usually larger than the dimensionality of each sample. The simplest way to see this is the formula mentioned in an earlier post:
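(The formula image from the original post is missing here; reconstructed in LaTeX from the description in the next paragraph, it is the standard decomposition:)

\[
\mathbf{x} \;\approx\; \sum_{i=1}^{k} a_i\,\boldsymbol{\phi}_i
\]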
where the input data x is decomposed as a linear combination of the bases Ф with combination coefficients ai. That formula is for a single data point; in machine learning we usually deal with many samples, so below the samples are treated as a matrix. In the earlier post Deep learning:二十六(Sparse coding簡單理解) we already gave the cost function of the non-topographic sparse coding system:
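(Formula reconstructed in LaTeX to match the code below, which divides the reconstruction term by the number of samples m; X is the data matrix, A the basis/weight matrix and s the feature/coefficient matrix:)

\[
J(A, s) \;=\; \frac{1}{m}\,\lVert A s - X \rVert_F^{2}
\;+\; \lambda \sum_{k,j} \sqrt{s_{k,j}^{2} + \epsilon}
\;+\; \gamma\,\lVert A \rVert_F^{2}
\]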
The cost function for the topographic case is as follows:
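(Again reconstructed to match the code; V is the grouping matrix, groupMatrix in the code, and the only change is that the sparsity penalty is taken over pooled groups of squared coefficients:)

\[
J(A, s) \;=\; \frac{1}{m}\,\lVert A s - X \rVert_F^{2}
\;+\; \lambda \sum_{r,j} \sqrt{\textstyle\sum_{k} V_{rk}\, s_{k,j}^{2} + \epsilon}
\;+\; \gamma\,\lVert A \rVert_F^{2}
\]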
In the training phase our goal is to find the optimal parameter A, since A is the set of basis vectors we are after. However, both cost functions contain two unknown parameter matrices, A and s, so a straightforward optimization does not work. The usual strategy is alternating optimization: fix A and optimize s, then fix that s and optimize A, and so on, stopping when a preset number of iterations is reached. The first thing to work out in this process is the derivative of the cost function with respect to the parameter matrices A and s.
These derivatives involve differentiating matrix norms, which can be done in two ways. The first converts the problem into derivatives of matrix traces; see the earlier post Deep learning:二十七(Sparse coding中關於矩陣的范數求導). The second uses the back-propagation idea; see Deep learning:二十八(使用BP算法思想求解Sparse coding中矩陣范數導數).
The derivative of the cost function with respect to the weight matrix A is as follows (it is the same in the topographic and non-topographic cases, because the terms that depend on A are identical in the two cost functions):
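(Reconstructed; it matches the grad expression in sparseCodingWeightCost.m below:)

\[
\frac{\partial J}{\partial A} \;=\; \frac{2}{m}\left(A s - X\right)s^{\top} \;+\; 2\gamma A
\]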
In the non-topographic case the derivative of the cost function with respect to s is:
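(Reconstructed; the square, square root and division in the sparsity term are element-wise, matching gradResidue + gradSparsity in sparseCodingFeatureCost.m for an identity grouping matrix:)

\[
\frac{\partial J}{\partial s} \;=\; \frac{2}{m}\,A^{\top}\!\left(A s - X\right)
\;+\; \lambda\,\frac{s}{\sqrt{s^{2} + \epsilon}}
\]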
In the topographic case the derivative of the cost function with respect to s is:
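(Reconstructed; here the square, the -1/2 power and the product ∘ are all element-wise, which corresponds exactly to the Matlab expression lambda*groupMatrix'*(groupMatrix*(featureMatrix.^2)+epsilon).^(-0.5).*featureMatrix in the code below:)

\[
\frac{\partial J}{\partial s} \;=\; \frac{2}{m}\,A^{\top}\!\left(A s - X\right)
\;+\; \lambda\,\Big(V^{\top}\big(V s^{2} + \epsilon\big)^{-\frac{1}{2}}\Big)\circ s
\]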
A few notes about the program:
- Under the formulas above and our interpretation, A consists of the learned basis vectors and S holds each sample's coefficients in that basis. With this convention the dimensions are: A is n*k, each column a learned basis vector; S is k*m, each column the sparse coding coefficients of the corresponding input sample; and X is then n*m, each column one data sample. If we flip the roles (not the interpretation used here, just a different way of arranging the matrices), so that A holds the decomposition coefficients (the codes) of the input data X and S holds the bases learned from the data set, then the dimensions work out as follows: suppose there are m samples, each an n-dimensional vector, so X is m*n; we want sparse coding to learn k features that minimise the cost; then A is m*k, where row i gives the coefficients of the i-th sample, and S is k*n, where row i is the i-th learned basis. In this experiment, and in similar situations later on, we stick with the first (correct) convention.
- In Matlab, right division by a matrix (B/A) and right-multiplication by inv(A) are mathematically equivalent, but their results can differ numerically; right division is more accurate (see the short sketch after this list).
- Note that the last factor in the topographic derivative with respect to s is an element-wise product (corresponding entries multiplied together); if it is written as an ordinary matrix product, gradient checking will not pass.
- The training samples come from IMAGES: ten 512*512 images, from which 20,000 small 8*8 patches are extracted (the original post shows a montage of some of these patches at this point).
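A tiny sketch of the point made in the second note above; the 3*3 system here is made up purely for illustration:

A = [3 1 0; 1 4 1; 0 1 5];
B = [1 2 3; 4 5 6];
X1 = B / A;          % right division: solves X*A = B directly
X2 = B * inv(A);     % forms the inverse explicitly
disp(norm(X1 - X2)); % mathematically zero, but in general B/A is more accurate (and faster)

This is also why the analytic update for weightMatrix in the script below uses the / operator rather than inv.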
Some Matlab functions:
circshift:
This function circularly shifts a matrix. For example, B = circshift(A, shiftsize) shifts A according to shiftsize, which is normally a vector with one entry per dimension. The first element shifts along the first dimension (up/down for the 2-D case considered here; a positive value shifts downward), and the second element shifts along the second dimension (left/right; a positive value shifts to the right).
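A minimal sanity check of circshift (the 3*3 matrix is just an illustrative example):

A = [1 2 3; 4 5 6; 7 8 9];
B1 = circshift(A, [1 0])    % shift down by one row:    [7 8 9; 1 2 3; 4 5 6]
B2 = circshift(A, [0 -1])   % shift left by one column: [2 3 1; 5 6 4; 8 9 7]

The exercise script uses the same idea on a 3-D array to slide the poolDim*poolDim block of ones around the grouping grid, wrapping at the border.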
randperm:
This function produces a random row vector. randperm(n) returns an n-element row vector containing a random permutation of the integers 1 to n (no repeats), and randperm(n,k) returns a k-element row vector whose entries are drawn from 1 to n without repetition.
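A short example of how the exercise script uses randperm to draw a mini-batch (numPatches and batchNumPatches as in the script):

numPatches = 20000;
batchNumPatches = 2000;
indices = randperm(numPatches);          % random permutation of 1..20000
indices = indices(1:batchNumPatches);    % keep 2000 of them: one batch, no repeats
% in newer Matlab versions, randperm(numPatches, batchNumPatches) does this in one call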
questdlg:
button = questdlg('qstring','title','str1','str2','str3',default) pops up a question dialog: qstring is the body text, title is the dialog title, the next three arguments are the labels of the three buttons (playing the roles of yes/no/cancel), and the final argument default names the button that is selected by default.
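The dialog in the exercise script is the two-button variant of this call; a minimal sketch (numFeatures set to the script's value):

numFeatures = 256;
button = questdlg('Topographic or non-topographic sparse coding?', ...  % body text
                  'Sparse coding', ...                                   % dialog title
                  'Non-topographic', 'Topographic', ...                  % button labels
                  'Non-topographic');                                    % default button
if isequal(button, 'Non-topographic')
    groupMatrix = eye(numFeatures);   % identity grouping = ordinary sparse coding
end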
Experiment results:
In the alternating optimization, the step that optimizes A for a fixed s has a closed-form solution, so there is no need to run lbfgs or another iterative optimizer; setting the derivative of the cost function with respect to A to zero gives the analytic solution:
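(Reconstructed; this matches the weightMatrix update line in sparseCodingExercise.m below, where m is the number of samples in the batch:)

\[
A \;=\; X s^{\top}\left(s\, s^{\top} + m\,\gamma\, I\right)^{-1}
\]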
Note that the identity matrix must carry a coefficient (the number of samples m); if it is left out, the A computed this way will not pass gradient checking in the program.
The non-topographic bases learned this way look as follows:
The result above is a bit rough; it was obtained with 16*16 patches rather than 8*8 ones.
Using cg as the optimizer, with 256 features learned from 16*16 patches, the result is as follows:
Changing to 8*8 patches and 121 features gives the following result (this one looks much more reasonable):
Using lbfgs with 256 features on 16*16 patches gives the following (much worse, which shows that the choice of optimization method affects the result):
Code for the experiment, with comments:
sparseCodingExercise.m:
%% CS294A/CS294W Sparse Coding Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  sparse coding exercise. In this exercise, you will need to modify
%  sparseCodingFeatureCost.m and sparseCodingWeightCost.m. You will also
%  need to modify this file, sparseCodingExercise.m slightly.

% Add the paths to your earlier exercises if necessary
% addpath /path/to/solution

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

addpath minFunc;
numPatches = 20000;     % number of patches
numFeatures = 256;      % number of features to learn
patchDim = 16;          % patch dimension
visibleSize = patchDim * patchDim;  % single-channel grayscale patches, patchDim^2 pixels each

% dimension of the grouping region (poolDim x poolDim) for topographic sparse coding
poolDim = 3;

% number of patches per batch
batchNumPatches = 2000; % i.e. the 20000 patches are split into 10 batches

lambda = 5e-5;  % L1-regularisation parameter (on features)
epsilon = 1e-5; % L1-regularisation epsilon |x| ~ sqrt(x^2 + epsilon)
gamma = 1e-2;   % L2-regularisation parameter (on basis)

%%======================================================================
%% STEP 1: Sample patches

images = load('IMAGES.mat');
images = images.IMAGES;

patches = sampleIMAGES(images, patchDim, numPatches);
display_network(patches(:, 1:64));

%%======================================================================
%% STEP 3: Iterative optimization
%  Once you have implemented the cost functions, you can now optimize for
%  the objective iteratively. The code to do the iterative optimization
%  using mini-batching and good initialization of the features has already
%  been included for you.
%
%  However, you will still need to derive and fill in the analytic solution
%  for optimizing the weight matrix given the features.
%  Derive the solution and implement it in the code below, verify the
%  gradient as described in the instructions below, and then run the
%  iterative optimization.

% Initialize options for minFunc
options.Method = 'cg';
options.display = 'off';
options.verbose = 0;

% Initialize matrices
weightMatrix = rand(visibleSize, numFeatures);       % visibleSize x numFeatures
featureMatrix = rand(numFeatures, batchNumPatches);  % numFeatures x batchNumPatches

% Initialize grouping matrix
assert(floor(sqrt(numFeatures)) ^2 == numFeatures, 'numFeatures should be a perfect square');
donutDim = floor(sqrt(numFeatures));
assert(donutDim * donutDim == numFeatures,'donutDim^2 must be equal to numFeatures');

groupMatrix = zeros(numFeatures, donutDim, donutDim);  % numFeatures x donutDim x donutDim
groupNum = 1;
for row = 1:donutDim
    for col = 1:donutDim
        groupMatrix(groupNum, 1:poolDim, 1:poolDim) = 1;  % poolDim = 3
        groupNum = groupNum + 1;
        groupMatrix = circshift(groupMatrix, [0 0 -1]);
    end
    groupMatrix = circshift(groupMatrix, [0 -1, 0]);
end
groupMatrix = reshape(groupMatrix, numFeatures, numFeatures);  % numFeatures x numFeatures

if isequal(questdlg('Initialize grouping matrix for topographic or non-topographic sparse coding?', 'Topographic/non-topographic?', 'Non-topographic', 'Topographic', 'Non-topographic'), 'Non-topographic')
    groupMatrix = eye(numFeatures);  % in the non-topographic case groupMatrix is the identity
end

% Initial batch
indices = randperm(numPatches);        % 1 x numPatches
indices = indices(1:batchNumPatches);  % 1 x batchNumPatches
batchPatches = patches(:, indices);

fprintf('%6s%12s%12s%12s%12s\n','Iter', 'fObj','fResidue','fSparsity','fWeight');
warning off;

for iteration = 1:200
    error = weightMatrix * featureMatrix - batchPatches;
    error = sum(error(:) .^ 2) / batchNumPatches;  % the reconstruction error is divided by the number of samples
    fResidue = error;
    num_batches = size(batchPatches,2);

    R = groupMatrix * (featureMatrix .^ 2);
    R = sqrt(R + epsilon);
    fSparsity = lambda * sum(R(:));  % the sparsity and weight-decay terms are not divided by the number of samples
    fWeight = gamma * sum(weightMatrix(:) .^ 2);
    % (at this point in the first iteration the weights are still their random initial values)
    fprintf('  %4d  %10.4f  %10.4f  %10.4f  %10.4f\n', iteration, fResidue+fSparsity+fWeight, fResidue, fSparsity, fWeight)

    % Select a new batch
    indices = randperm(numPatches);
    indices = indices(1:batchNumPatches);
    batchPatches = patches(:, indices);

    % Reinitialize featureMatrix with respect to the new batch,
    % following the method described on the tutorial page
    featureMatrix = weightMatrix' * batchPatches;
    normWM = sum(weightMatrix .^ 2)';
    featureMatrix = bsxfun(@rdivide, featureMatrix, normWM);

    % Optimize for the feature matrix with the weight matrix held fixed
    options.maxIter = 20;
    [featureMatrix, cost] = minFunc( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix), ...
                                     featureMatrix(:), options);
    featureMatrix = reshape(featureMatrix, numFeatures, batchNumPatches);

    % Optimize for the weight matrix: analytic solution given the features
    weightMatrix = (batchPatches*featureMatrix')/(gamma*num_batches*eye(size(featureMatrix,1))+featureMatrix*featureMatrix');
    [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix);
end
figure;
display_network(weightMatrix);
sparseCodingWeightCost.m:
function [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingWeightCost - given the features in featureMatrix,
%                         computes the cost and gradient with respect to
%                         the weights, given in weightMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
    else
        groupMatrix = eye(numFeatures);  % for non-topographic sparse coding groupMatrix is the identity
    end

    numExamples = size(patches, 2);  % 5 examples when gradient-checking the code

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);  % the inputs already have these shapes
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);

    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.
    % -------------------- YOUR CODE HERE --------------------

    %% cost function
    delta = weightMatrix*featureMatrix-patches;
    fResidue = sum(sum(delta.^2))./numExamples;  % reconstruction error
    fWeight = gamma*sum(sum(weightMatrix.^2));   % keeps the basis entries from growing too large
%     sparsityMatrix = sqrt(groupMatrix*(featureMatrix.^2)+epsilon);
%     fSparsity = lambda*sum(sparsityMatrix(:));  % sparsity penalty on the feature coefficients
%     cost = fResidue+fWeight+fSparsity;          % full cost
    cost = fResidue+fWeight;  % the sparsity term does not depend on A, so it can be dropped here

    %% gradient of the cost function
    grad = (2*weightMatrix*featureMatrix*featureMatrix'-2*patches*featureMatrix')./numExamples+2*gamma*weightMatrix;
    grad = grad(:);
end
sparseCodingFeatureCost.m:
function [cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingFeatureCost - given the weights in weightMatrix,
%                          computes the cost and gradient with respect to
%                          the features, given in featureMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    isTopo = 1;  % assume topographic unless groupMatrix turns out to be the identity
%     L = size(groupMatrix,1);
%     [K M] = size(featureMatrix);
    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
        if isequal(groupMatrix, eye(numFeatures))
            isTopo = 0;
        end
    else
        groupMatrix = eye(numFeatures);
        isTopo = 0;
    end

    numExamples = size(patches, 2);

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);

    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   features given in featureMatrix.
    %   You may wish to write the non-topographic version, ignoring
    %   the grouping matrix groupMatrix first, and extend the
    %   non-topographic version to the topographic version later.
    % -------------------- YOUR CODE HERE --------------------

    %% cost function
    delta = weightMatrix*featureMatrix-patches;
    fResidue = sum(sum(delta.^2))./numExamples;  % reconstruction error
%     fWeight = gamma*sum(sum(weightMatrix.^2));  % weight decay on the basis
    sparsityMatrix = sqrt(groupMatrix*(featureMatrix.^2)+epsilon);
    fSparsity = lambda*sum(sparsityMatrix(:));   % sparsity penalty on the feature coefficients
%     cost = fResidue+fSparsity+fWeight;  % A is constant here, so its weight-decay term can be dropped
    cost = fResidue+fSparsity;

    %% gradient of the cost function
    gradResidue = (-2*weightMatrix'*patches+2*weightMatrix'*weightMatrix*featureMatrix)./numExamples;

    % non-topographic case:
    if ~isTopo
        gradSparsity = lambda*(featureMatrix./sparsityMatrix);
    end

    % topographic case:
    if isTopo
        % be careful: the last factor is an element-wise product, not a matrix product
        gradSparsity = lambda*groupMatrix'*(groupMatrix*(featureMatrix.^2)+epsilon).^(-0.5).*featureMatrix;
    end
    grad = gradResidue+gradSparsity;
    grad = grad(:);
end
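The exercise asks you to verify these gradients numerically before running the full optimization. A minimal, self-contained check for sparseCodingFeatureCost might look like the sketch below; the tiny dimensions are made up purely for testing, central differences are written inline instead of relying on the computeNumericalGradient helper from the earlier exercises, and the same pattern works for sparseCodingWeightCost:

% Gradient check on a tiny random problem; all sizes here are made up for testing.
visibleSize = 9; numFeatures = 4; numExamples = 5;
gamma = 1e-2; lambda = 5e-5; epsilon = 1e-5;
weightMatrix  = randn(visibleSize, numFeatures);
featureMatrix = randn(numFeatures, numExamples);
patches       = randn(visibleSize, numExamples);
groupMatrix   = eye(numFeatures);  % identity = non-topographic; swap in a grouping matrix to test the topographic path

J = @(s) sparseCodingFeatureCost(weightMatrix, s, visibleSize, numFeatures, ...
                                 patches, gamma, lambda, epsilon, groupMatrix);
theta = featureMatrix(:);
[~, grad] = J(theta);            % analytic gradient

numgrad = zeros(size(theta));    % numerical gradient by central differences
h = 1e-4;
for i = 1:numel(theta)
    e = zeros(size(theta)); e(i) = h;
    numgrad(i) = (J(theta + e) - J(theta - e)) / (2*h);
end
fprintf('max |grad - numgrad| = %g\n', max(abs(grad - numgrad)));  % should be very small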
sampleIMAGES.m:
function patches = sampleIMAGES(images, patchsize, numpatches)
% sampleIMAGES
% Returns numpatches patches for training

% load IMAGES;        % load images from disk
% patchsize = 8;      % we'll use 8x8 patches
% numpatches = 10000;

% Initialize patches with zeros. Your code will fill in this matrix--one
% column per patch, numpatches columns.
patches = zeros(patchsize*patchsize, numpatches);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Fill in the variable called "patches" using data
%  from IMAGES.
%
%  IMAGES is a 3D array containing 10 images
%  For instance, IMAGES(:,:,6) is a 512x512 array containing the 6th image,
%  and you can type "imagesc(IMAGES(:,:,6)), colormap gray;" to visualize
%  it. (The contrast on these images look a bit off because they have
%  been preprocessed using "whitening."  See the lecture notes for
%  more details.) As a second example, IMAGES(21:30,21:30,1) is an image
%  patch corresponding to the pixels in the block (21,21) to (30,30) of
%  Image 1

for imageNum = 1:10  % sample 2000 patches from each of the 10 images, 20000 in total
    [rowNum colNum] = size(images(:,:,imageNum));
    for patchNum = 1:2000  % 2000 random patches per image
        xPos = randi([1, rowNum-patchsize+1]);
        yPos = randi([1, colNum-patchsize+1]);
        patches(:,(imageNum-1)*2000+patchNum) = reshape(images(xPos:xPos+patchsize-1, yPos:yPos+patchsize-1, ...
                                                        imageNum), patchsize*patchsize, 1);
    end
end

%% ---------------------------------------------------------------
% For the autoencoder to work well we need to normalize the data
% Specifically, since the output of the network is bounded between [0,1]
% (due to the sigmoid activation function), we have to make sure
% the range of pixel values is also bounded between [0,1]
% patches = normalizeData(patches);  % deliberately left commented out here; see the summary at the end of the post

end

%% ---------------------------------------------------------------
function patches = normalizeData(patches)
% Squash data to [0.1, 0.9] since we use sigmoid as the activation
% function in the output layer

% Remove DC (mean of images).
patches = bsxfun(@minus, patches, mean(patches));

% Truncate to +/-3 standard deviations and scale to -1 to 1
pstd = 3 * std(patches(:));
patches = max(min(patches, pstd), -pstd) / pstd;  % by the 3-sigma rule the vast majority of the data falls in this range;
                                                  % the data is now in [-1, 1]

% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;

end
Summary:
Topographic sparse coding is not finished; the runs produce nothing meaningful. Pointers from readers would be appreciated.
2013.5.6:
The non-topographic sparse coding is now solved. The problem back then was that the reconstruction-error term in the cost function was not divided by the number of samples (as a commenter pointed out below), so the cost function, the derivatives, and the analytic solution for A were all correspondingly wrong.
Topographic sparse coding still would not train, though: the cost value did not decrease during training and showed essentially no pattern.
2013.5.14:
Topographic sparse coding is now basically working. The main reason no features were learned before was that the final patch normalization in sampleIMAGES.m had not been commented out (my guess: the large source images had already been whitened before sampling, so applying that additional lossy normalization afterwards probably introduces extra error and makes the samples unsuitable for sparse coding). Also, following a hint from @地皮菜 in the discussion group, switching the optimizer from lbfgs to cg produces reasonable results. So the choice of optimization algorithm really does affect the final result.
References:
Deep learning:二十六(Sparse coding簡單理解)
Deep learning:二十七(Sparse coding中關於矩陣的范數求導)
Deep learning:二十八(使用BP算法思想求解Sparse coding中矩陣范數導數)