Deep learning: 39 (ICA model exercise)


 

  Preface:

  This post is mainly a hands-on exercise with the ICA model. For the theory behind ICA, see the earlier post Deep learning:三十三(ICA模型). The content and steps of the experiment follow the UFLDL tutorial Exercise:Independent Component Analysis. The task is similar to many of the previous exercises: learning ICA features from the STL-10 database. The data are already provided in the form of patches, 20,000 color patches of size 8*8.

 

  Experiment basics:

  The procedure consists of the following steps:

  1. Set up the network parameters; the dimensionality of each input sample is 8*8*3 = 192.
  2. Whiten the input samples, e.g. with ZCA whitening, but be sure to set the whitening parameter epsilon to 0.
  3. Implement the ICA cost function and its derivative. The cost function given in the tutorial Exercise:Independent Component Analysis is:

$$J(W) = \sum_{i=1}^{m} \lVert W x^{(i)} \rVert_1$$

  (subject, of course, to the additional constraint that the weight matrix W is orthonormal).

  However, an earlier UFLDL tutorial, Deriving gradients using the backpropagation idea, gives the cost function as:

$$J(W) = \lVert W^{T}Wx - x \rVert_2^2$$

  I find the second cost function more reasonable, and that tutorial also gives the partial derivative of the cost (so when implementing, one can be lazy and skip the derivation). The formula it gives, however, contains a small error; the corrected formula is:

$$\nabla_W \lVert W^{T}Wx - x \rVert_2^2 = 2W\left(W^{T}Wx - x\right)x^{T} + 2Wx\left(W^{T}Wx - x\right)^{T}$$

  The error is that the leftmost factor of the first term on the right-hand side should be W rather than its transpose W'; otherwise the matrix dimensions do not match when the program runs. A quick sanity check of the corrected formula is sketched right below.
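  The check simply compares the formula with a finite-difference approximation on a tiny random example; this is only an illustrative sketch (the sizes and variable names are made up and are not part of the exercise code):

% Finite-difference check of the corrected gradient (illustrative sketch)
k = 4; n = 6;                          % toy sizes: k features, n input dimensions
W = randn(k, n); x = randn(n, 1);
f = @(W) sum((W'*W*x - x).^2);         % ||W'*W*x - x||^2 for a single sample x

analyticGrad = 2*W*(W'*W*x - x)*x' + 2*W*x*(W'*W*x - x)';

numGrad = zeros(size(W));
h = 1e-6;
for i = 1:numel(W)
    Wp = W; Wp(i) = Wp(i) + h;
    Wm = W; Wm(i) = Wm(i) - h;
    numGrad(i) = (f(Wp) - f(Wm)) / (2*h);
end
fprintf('relative difference: %g\n', ...
    norm(numGrad(:) - analyticGrad(:)) / norm(numGrad(:) + analyticGrad(:)));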

  4. Finally, optimize the parameter W iteratively. Because W has to satisfy the orthonormality constraint, we cannot simply use the L-BFGS algorithm as in earlier exercises. Instead, each iteration takes a plain gradient descent step and then applies an orthonormalization step that turns W back into an orthonormal matrix. The learning rate alpha mentioned in the exercise is not fixed; it changes dynamically and is found by a backtracking line search. The orthonormalization formula for W is (a small sketch of this projection follows the formula):

$$W \leftarrow \left(WW^{T}\right)^{-\frac{1}{2}} W$$
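  The projection is a single line in MATLAB. The snippet below is only an illustrative sketch with made-up sizes, confirming that the projected matrix indeed satisfies WW^T = I:

% Symmetric orthonormalization of a random W (illustrative sketch)
k = 5; n = 8;                 % toy sizes; k <= n so that W*W' = I is achievable
W = randn(k, n);
W = (W*W')^(-0.5) * W;        % project W back onto the set {W : W*W' = I}
fprintf('||W*W'' - I||_F = %g\n', norm(W*W' - eye(k), 'fro'));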

  5. With the cost function and gradient above, the code Ng provides cannot actually run: the program gets stuck in an infinite loop inside the line search. (I have not studied line search, so I do not really understand it.) Following a suggestion from the user "蜘蛛小俠" in the Deep Learning discussion group, I added a sparsity constraint on the features to the cost function (note that the features here are Wx) and increased the number of iterations in Ng's code, e.g. to 5000. Nothing else needs to change, and the program then produces results.

  The cost function then becomes:

$$J(W) = \frac{1}{m}\sum_{i=1}^{m}\left\lVert W^{T}Wx^{(i)} - x^{(i)} \right\rVert_2^2 + \sum_{i=1}^{m}\sum_{j}\sqrt{\left(W_{j}x^{(i)}\right)^2 + \epsilon}$$

  and its partial derivative becomes:

$$\nabla_W J = \frac{1}{m}\left(2W\left(W^{T}WX - X\right)X^{T} + 2WX\left(W^{T}WX - X\right)^{T}\right) + \frac{WX}{\sqrt{(WX)^{2} + \epsilon}}\,X^{T}$$

  where X is the matrix whose columns are the m patches, W_j in the previous formula denotes the j-th row of W, and the square, square root, and division in the last term are applied elementwise.

  Be sure to include the number of samples m in these formulas; otherwise, even if the cost function and its gradient pass the numerical gradient check, the orthonormal projection check on W may still fail. (A short derivation of the sparsity term's gradient is sketched right below.)
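  This derivation is my own sketch, consistent with the code further down, and is not part of the original tutorial. Differentiating one smoothed-L1 element with respect to the row W_j gives

$$\frac{\partial}{\partial W_{j}}\sqrt{\left(W_{j}x^{(i)}\right)^2 + \epsilon} = \frac{W_{j}x^{(i)}}{\sqrt{\left(W_{j}x^{(i)}\right)^2 + \epsilon}}\,\left(x^{(i)}\right)^{T},$$

  and stacking these rows and summing over all samples yields exactly the elementwise expression (WX ./ sqrt((WX).^2 + epsilon)) * X' used as the last term of the gradient above.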

 

  Experiment results:

  The samples used for training are shown below:

  [Figure: the first 100 sampled color patches used for training]

  The result after 20,000 iterations is shown below (my computer's CPU is not very powerful, so this took a whole day; running 50,000 iterations would give an even cleaner result, but I did not have time to verify that):

  [Figure: the ICA bases learned after 20,000 iterations]

 

  Main code of the experiment, with comments:

ICAExercise.m:

%% CS294A/CS294W Independent Component Analysis (ICA) Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  ICA exercise. In this exercise, you will need to modify
%  orthonormalICACost.m and a small part of this file, ICAExercise.m.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

numPatches = 20000;
numFeatures = 121;
imageChannels = 3;
patchDim = 8;
visibleSize = patchDim * patchDim * imageChannels;

outputDir = '.';
% The L1 regularization term is usually rewritten as the square plus a small
% constant under a square root, because the L1 norm is not differentiable at 0
epsilon = 1e-6; % L1-regularisation epsilon |Wx| ~ sqrt((Wx).^2 + epsilon)

%%======================================================================
%% STEP 1: Sample patches

patches = load('stlSampledPatches.mat');
patches = patches.patches(:, 1:numPatches);
displayColorNetwork(patches(:, 1:100));


%%======================================================================
%% STEP 2: ZCA whiten patches
%  In this step, we ZCA whiten the sampled patches. This is necessary for
%  orthonormal ICA to work.

patches = patches / 255;
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

sigma = patches * patches';
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s))) * u';
patches = ZCAWhite * patches;

%%======================================================================
%% STEP 3: ICA cost functions
%  Implement the cost function for orthonormal ICA (you don't have to 
%  enforce the orthonormality constraint in the cost function) 
%  in the function orthonormalICACost in orthonormalICACost.m.
%  Once you have implemented the function, check the gradient.

% Use less features and smaller patches for speed
% numFeatures = 5;
% patches = patches(1:3, 1:5);
% visibleSize = 3;
% numPatches = 5;
% 
% weightMatrix = rand(numFeatures, visibleSize);
% 
% [cost, grad] = orthonormalICACost(weightMatrix, visibleSize, numFeatures, patches, epsilon);
% 
% numGrad = computeNumericalGradient( @(x) orthonormalICACost(x, visibleSize, numFeatures, patches, epsilon), weightMatrix(:) );
% % Uncomment to display the numeric and analytic gradients side-by-side
% % disp([numGrad grad]); 
% diff = norm(numGrad-grad)/norm(numGrad+grad);
% fprintf('Orthonormal ICA difference: %g\n', diff);
% assert(diff < 1e-7, 'Difference too large. Check your analytic gradients.');
% 
% fprintf('Congratulations! Your gradients seem okay.\n');


%%======================================================================
%% STEP 4: Optimization for orthonormal ICA
%  Optimize for the orthonormal ICA objective, enforcing the orthonormality
%  constraint. Code has been provided to do the gradient descent with a
%  backtracking line search using the orthonormalICACost function 
%  (for more information about backtracking line search, you can read the 
%  appendix of the exercise).
%
%  However, you will need to write code to enforce the orthonormality 
%  constraint by projecting weightMatrix back into the space of matrices 
%  satisfying WW^T  = I.
%
%  Once you are done, you can run the code. 10000 iterations of gradient
%  descent will take around 2 hours, and only a few bases will be
%  completely learned within 10000 iterations. This highlights one of the
%  weaknesses of orthonormal ICA - it is difficult to optimize for the
%  objective function while enforcing the orthonormality constraint - 
%  convergence using gradient descent and projection is very slow.

weightMatrix = rand(numFeatures, visibleSize);%121*192
[cost, grad] = orthonormalICACost(weightMatrix(:), visibleSize, numFeatures, patches, epsilon);
fprintf('%11s%16s%10s\n','Iteration','Cost','t');
startTime = tic();

% Initialize some parameters for the backtracking line search
alpha = 0.5;
t = 0.02;
lastCost = 1e40;

% Do 20000 iterations of gradient descent
for iteration = 1:20000
                       
    grad = reshape(grad, size(weightMatrix));
    newCost = Inf;        
    linearDelta = sum(sum(grad .* grad));
    
    % Perform the backtracking line search
    while 1
        considerWeightMatrix = weightMatrix - alpha * grad;
        % -------------------- YOUR CODE HERE --------------------
        % Instructions:
        %   Write code to project considerWeightMatrix back into the space
        %   of matrices satisfying WW^T = I.
        %   
        %   Once that is done, verify that your projection is correct by 
        %   using the checking code below. After you have verified your
        %   code, comment out the checking code before running the
        %   optimization.
        
        % Project considerWeightMatrix such that it satisfies WW^T = I
%         error('Fill in the code for the projection here');        
        considerWeightMatrix = (considerWeightMatrix*considerWeightMatrix')^(-0.5)*considerWeightMatrix;
        % Verify that the projection is correct
        temp = considerWeightMatrix * considerWeightMatrix';
        temp = temp - eye(numFeatures);
        assert(sum(temp(:).^2) < 1e-23, 'considerWeightMatrix does not satisfy WW^T = I. Check your projection again');
%         error('Projection seems okay. Comment out verification code before running optimization.');
        
        % -------------------- YOUR CODE HERE --------------------                                        

        [newCost, newGrad] = orthonormalICACost(considerWeightMatrix(:), visibleSize, numFeatures, patches, epsilon);
        if newCost >= lastCost - alpha * t * linearDelta
            t = 0.9 * t;
        else
            break;
        end
    end
   
    lastCost = newCost;
    weightMatrix = considerWeightMatrix;
    
    fprintf('  %9d  %14.6f  %8.7g\n', iteration, newCost, t);
    
    t = 1.1 * t;
    
    cost = newCost;
    grad = newGrad;
           
    % Visualize the learned bases as we go along    
    if mod(iteration, 10000) == 0
        duration = toc(startTime);
        % Visualize the learned bases over time in different figures so 
        % we can get a feel for the slow rate of convergence
        figure(floor(iteration /  10000));
        displayColorNetwork(weightMatrix'); 
    end
                   
end

% Visualize the learned bases
displayColorNetwork(weightMatrix');

 

orthonormalICACost.m:

function [cost, grad] = orthonormalICACost(theta, visibleSize, numFeatures, patches, epsilon)
%orthonormalICACost - compute the cost and gradients for orthonormal ICA
%                     (i.e. compute the cost ||Wx||_1 and its gradient)

    weightMatrix = reshape(theta, numFeatures, visibleSize);
    
    cost = 0;
    grad = zeros(numFeatures, visibleSize);
    
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.     
    % -------------------- YOUR CODE HERE --------------------     
    %% Method 1 (commented out below) normalizes both cost terms by the number
    %% of samples; the uncommented version normalizes only the reconstruction term.
    num_samples = size(patches,2);
%     cost = sum(sum((weightMatrix'*weightMatrix*patches-patches).^2))./num_samples+...
%             sum(sum(sqrt((weightMatrix*patches).^2+epsilon)))./num_samples;
%     grad = (2*weightMatrix*(weightMatrix'*weightMatrix*patches-patches)*patches'+...
%         2*weightMatrix*patches*(weightMatrix'*weightMatrix*patches-patches)')./num_samples+...
%         ((weightMatrix*patches./sqrt((weightMatrix*patches).^2+epsilon))*patches')./num_samples;
    cost = sum(sum((weightMatrix'*weightMatrix*patches-patches).^2))./num_samples+...
            sum(sum(sqrt((weightMatrix*patches).^2+epsilon)));
    grad = (2*weightMatrix*(weightMatrix'*weightMatrix*patches-patches)*patches'+...
        2*weightMatrix*patches*(weightMatrix'*weightMatrix*patches-patches)')./num_samples+...
        (weightMatrix*patches./sqrt((weightMatrix*patches).^2+epsilon))*patches';
    grad = grad(:);
end

 

 

 

  References:

     Deep learning:三十三(ICA模型)

     Exercise:Independent Component Analysis

     Deriving gradients using the backpropagation idea

 

 

 

 

