Deep learning：三十九(ICA模型練習)

本文轉載自查看原文 2013-05-07 22:56 8533 機器學習/ Deep Learning

　　前言：

　　本次主要是練習下ICA模型，關於ICA模型的理論知識可以參考前面的博文：Deep learning：三十三(ICA模型)。本次實驗的內容和步驟可以是參考UFLDL上的教程：Exercise:Independent Component Analysis。本次實驗完成的內容和前面的很多練習類似，即學習STL-10數據庫的ICA特征。當然了，這些數據已經是以patches的形式給出，共2w個patch，8*8大小的。

　　實驗基礎：

　　步驟分為下面幾步：

設置網絡的參數，其中的輸入樣本的維數為8*8*3=192。
對輸入的樣本集進行白化，比如說ZCA白化，但是一定要將其中的參數eplison設置為0。
完成ICA的代價函數和其導數公式。雖然在教程Exercise:Independent Component Analysis中給出的代價函數為：

　（當然了，它還必須滿足權值W是正交矩陣）。

　　但是在UFLDL前面的一個教程Deriving gradients using the backpropagation idea中給出的代價函數卻為：

　　不過我感覺第2個代價函數要有道理些，並且在其教程中還給出了代價函數的偏導公式（這樣實現時，可以偷懶不用推導了），只不過它給出的公式有一個小小的錯誤，我把正確的公式整理如下：

　　錯誤就是公式右邊第一項最左邊的那個應該是W，而不是它的轉置W’，否則程序運行時是有矩陣維數不匹配的情況。

　　4. 最后就是對參數W進行迭代優化了，由於要使W滿足正交性這一要求，所以不能直接像以前那樣采用lbfgs算法，而是每次直接使用梯度下降法進行迭代，迭代完成后采用正交化步驟讓W變成正交矩陣。只是此時文章中所說的學習率alpha是個動態變化的，是按照線性搜索來找到的。W正交性公式為：

　　5. 如果采用上面的代價函數和偏導公式時，用Ng給的code是跑不起來的，程序在線搜索的過程中會陷入死循環。（線搜索沒有研究過，所以完全不懂）。最后在Deep Learning高質量交流群內網友”蜘蛛小俠”的提議下，將代價函數的W加一個特征稀疏性的約束，（注意此時的特征為Wx），然后把Ng的code中的迭代次數改大，比如5000，

其它程序不用更改，即可跑出結果來。

　　此時的代價函數為：

　　偏導為：

　　其中一定要考慮樣本的個數m，否則即使通過了代價函數和其導數的驗證，也不一定能通過W正交投影的驗證。

　　實驗結果：

　　用於訓練的樣本顯示如下：

　　迭代20000次后的結果如下（因為電腦CUP不給力，跑了一天，當然了跑50000次結果會更完美，我就沒時間驗證了）：

　　實驗主要部分代碼及注釋：

ICAExercise.m:

%% CS294A/CS294W Independent Component Analysis (ICA) Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  ICA exercise. In this exercise, you will need to modify
%  orthonormalICACost.m and a small part of this file, ICAExercise.m.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

numPatches = 20000;
numFeatures = 121;
imageChannels = 3;
patchDim = 8;
visibleSize = patchDim * patchDim * imageChannels;

outputDir = '.';
% 一般情況下都將L1規則項轉換成平方加一個小系數然后開根號的形式，因為L1范數在0處不可微
epsilon = 1e-6; % L1-regularisation epsilon |Wx| ~ sqrt((Wx).^2 + epsilon)

%%======================================================================
%% STEP 1: Sample patches

patches = load('stlSampledPatches.mat');
patches = patches.patches(:, 1:numPatches);
displayColorNetwork(patches(:, 1:100));


%%======================================================================
%% STEP 2: ZCA whiten patches
%  In this step, we ZCA whiten the sampled patches. This is necessary for
%  orthonormal ICA to work.

patches = patches / 255;
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

sigma = patches * patches';
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s))) * u';
patches = ZCAWhite * patches;

%%======================================================================
%% STEP 3: ICA cost functions
%  Implement the cost function for orthornomal ICA (you don't have to 
%  enforce the orthonormality constraint in the cost function) 
%  in the function orthonormalICACost in orthonormalICACost.m.
%  Once you have implemented the function, check the gradient.

% Use less features and smaller patches for speed
% numFeatures = 5;
% patches = patches(1:3, 1:5);
% visibleSize = 3;
% numPatches = 5;
% 
% weightMatrix = rand(numFeatures, visibleSize);
% 
% [cost, grad] = orthonormalICACost(weightMatrix, visibleSize, numFeatures, patches, epsilon);
% 
% numGrad = computeNumericalGradient( @(x) orthonormalICACost(x, visibleSize, numFeatures, patches, epsilon), weightMatrix(:) );
% % Uncomment to display the numeric and analytic gradients side-by-side
% % disp([numGrad grad]); 
% diff = norm(numGrad-grad)/norm(numGrad+grad);
% fprintf('Orthonormal ICA difference: %g\n', diff);
% assert(diff < 1e-7, 'Difference too large. Check your analytic gradients.');
% 
% fprintf('Congratulations! Your gradients seem okay.\n');


%%======================================================================
%% STEP 4: Optimization for orthonormal ICA
%  Optimize for the orthonormal ICA objective, enforcing the orthonormality
%  constraint. Code has been provided to do the gradient descent with a
%  backtracking line search using the orthonormalICACost function 
%  (for more information about backtracking line search, you can read the 
%  appendix of the exercise).
%
%  However, you will need to write code to enforce the orthonormality 
%  constraint by projecting weightMatrix back into the space of matrices 
%  satisfying WW^T  = I.
%
%  Once you are done, you can run the code. 10000 iterations of gradient
%  descent will take around 2 hours, and only a few bases will be
%  completely learned within 10000 iterations. This highlights one of the
%  weaknesses of orthonormal ICA - it is difficult to optimize for the
%  objective function while enforcing the orthonormality constraint - 
%  convergence using gradient descent and projection is very slow.

weightMatrix = rand(numFeatures, visibleSize);%121*192
[cost, grad] = orthonormalICACost(weightMatrix(:), visibleSize, numFeatures, patches, epsilon);
fprintf('%11s%16s%10s\n','Iteration','Cost','t');
startTime = tic();

% Initialize some parameters for the backtracking line search
alpha = 0.5;
t = 0.02;
lastCost = 1e40;

% Do 10000 iterations of gradient descent
for iteration = 1:20000
                       
    grad = reshape(grad, size(weightMatrix));
    newCost = Inf;        
    linearDelta = sum(sum(grad .* grad));
    
    % Perform the backtracking line search
    while 1
        considerWeightMatrix = weightMatrix - alpha * grad;
        % -------------------- YOUR CODE HERE --------------------
        % Instructions:
        %   Write code to project considerWeightMatrix back into the space
        %   of matrices satisfying WW^T = I.
        %   
        %   Once that is done, verify that your projection is correct by 
        %   using the checking code below. After you have verified your
        %   code, comment out the checking code before running the
        %   optimization.
        
        % Project considerWeightMatrix such that it satisfies WW^T = I
%         error('Fill in the code for the projection here');        
        considerWeightMatrix = (considerWeightMatrix*considerWeightMatrix')^(-0.5)*considerWeightMatrix;
        % Verify that the projection is correct
        temp = considerWeightMatrix * considerWeightMatrix';
        temp = temp - eye(numFeatures);
        assert(sum(temp(:).^2) < 1e-23, 'considerWeightMatrix does not satisfy WW^T = I. Check your projection again');
%         error('Projection seems okay. Comment out verification code before running optimization.');
        
        % -------------------- YOUR CODE HERE --------------------                                        

        [newCost, newGrad] = orthonormalICACost(considerWeightMatrix(:), visibleSize, numFeatures, patches, epsilon);
        if newCost >= lastCost - alpha * t * linearDelta
            t = 0.9 * t;
        else
            break;
        end
    end
   
    lastCost = newCost;
    weightMatrix = considerWeightMatrix;
    
    fprintf('  %9d  %14.6f  %8.7g\n', iteration, newCost, t);
    
    t = 1.1 * t;
    
    cost = newCost;
    grad = newGrad;
           
    % Visualize the learned bases as we go along    
    if mod(iteration, 10000) == 0
        duration = toc(startTime);
        % Visualize the learned bases over time in different figures so 
        % we can get a feel for the slow rate of convergence
        figure(floor(iteration /  10000));
        displayColorNetwork(weightMatrix'); 
    end
                   
end

% Visualize the learned bases
displayColorNetwork(weightMatrix');

orthonormalICACost.m:

function [cost, grad] = orthonormalICACost(theta, visibleSize, numFeatures, patches, epsilon)
%orthonormalICACost - compute the cost and gradients for orthonormal ICA
%                     (i.e. compute the cost ||Wx||_1 and its gradient)

    weightMatrix = reshape(theta, numFeatures, visibleSize);
    
    cost = 0;
    grad = zeros(numFeatures, visibleSize);
    
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.     
    % -------------------- YOUR CODE HERE --------------------     
    %% 法一：
    num_samples = size(patches,2);
%     cost = sum(sum((weightMatrix'*weightMatrix*patches-patches).^2))./num_samples+...
%             sum(sum(sqrt((weightMatrix*patches).^2+epsilon)))./num_samples;
%     grad = (2*weightMatrix*(weightMatrix'*weightMatrix*patches-patches)*patches'+...
%         2*weightMatrix*patches*(weightMatrix'*weightMatrix*patches-patches)')./num_samples+...
%         ((weightMatrix*patches./sqrt((weightMatrix*patches).^2+epsilon))*patches')./num_samples;
    cost = sum(sum((weightMatrix'*weightMatrix*patches-patches).^2))./num_samples+...
            sum(sum(sqrt((weightMatrix*patches).^2+epsilon)));
    grad = (2*weightMatrix*(weightMatrix'*weightMatrix*patches-patches)*patches'+...
        2*weightMatrix*patches*(weightMatrix'*weightMatrix*patches-patches)')./num_samples+...
        (weightMatrix*patches./sqrt((weightMatrix*patches).^2+epsilon))*patches';
    grad = grad(:);
end

　　參考資料：

Deep learning：三十三(ICA模型)

Exercise:Independent Component Analysis

Deriving gradients using the backpropagation idea

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 Deep learning：三十三(ICA模型) Deep learning：二十九(Sparse coding練習) deep learning 以及deep learning 常用模型和方法 Deep learning：四(logistic regression練習) Deep learning：二(linear regression練習) Deep learning：九(Sparse Autoencoder練習) Deep learning：十九(RBM簡單理解) Deep learning：六(regularized logistic回歸練習) Deep learning：三(Multivariance Linear Regression練習) Deep learning：五(regularized線性回歸練習)