Deep Learning 12 - UFLDL Tutorial: Sparse Coding Exercise (Stanford Deep Learning Tutorial)


Preface

Background reading: the UFLDL Tutorial; Deep learning: 26 (A simple understanding of sparse coding); Deep learning: 27 (Matrix norm derivatives in sparse coding); Deep learning: 29 (Sparse coding exercise).

Experiment environment: Windows 7, MATLAB 2015b, 16 GB RAM, 2 TB hard drive.

This exercise is harder to understand and harder to get right than the previous ones; many people never obtain a good result, so it is worth taking the time to understand it carefully.

Experiment: Exercise: Sparse Coding. From 10 whitened 512x512 grayscale images (the IMAGES.mat in sparseae_exercise.zip, the same data used in Deep Learning 1 - UFLDL Tutorial: Sparse Autoencoder Exercise (Stanford Deep Learning Tutorial)), randomly sample 20000 small image patches (8x8 or 16x16), learn their features with sparse coding and with topographic sparse coding, and display the learned features for each method.

Notes on the basics

1. The difference between a sparse autoencoder and sparse coding

     ① Ng's explanation: sparse coding can be viewed as a variant of the sparse autoencoder; sparse coding tries to learn a feature set for the data directly, whereas the sparse autoencoder works directly on the raw data.

The sparse coding algorithm is an unsupervised learning method that looks for an "overcomplete" set of basis vectors to represent the sample data more efficiently; the dimension (number) of this "overcomplete" basis is larger than the dimension of the original data. A sparse autoencoder with a single hidden layer, by contrast, is roughly analogous to PCA: it only lets us conveniently find a "complete" set of basis vectors whose number is smaller than the dimension of the original data.

     ② The goal of the sparse coding algorithm is to find a set of basis vectors \mathbf{\phi}_i such that an input vector \mathbf{x} can be written as a linear combination of them:

\begin{align}
\mathbf{x} = \sum_{i=1}^k a_i \mathbf{\phi}_{i}
\end{align}

where x = [x_1, x_2, x_3, \dots, x_n].

     In general the number of basis vectors k is required to be very large, at least larger than the number of elements n in x, so the equation above usually has infinitely many solutions. Why look for such an "overcomplete" basis? Because an overcomplete basis is better at exposing the structure and patterns hidden inside the input data. In ordinary PCA we can also find a set of basis vectors to decompose x, but that basis is small, so the decomposition coefficients a are uniquely determined; in sparse coding, k is much larger than n, so the coefficients a are no longer unique. To make a unique, the usual approach is to impose a sparsity constraint on the coefficients, i.e. to require the a_i to be sparse: for a given input vector we want as few coefficients as possible to be far from zero, with the rest equal to zero or close to zero. Making a sparse in turn gives x a sparse representation.

  If sparse coding is viewed as a variant of the sparse autoencoder, then the a_i above correspond to the feature set featureMatrix (Ng's notation: s), and the \mathbf{\phi}_i correspond to the weight matrix weightMatrix (Ng's notation: A). What this exercise actually does is find A, under the constraint that as few entries of s as possible are far from zero, and then display it. A is the "overcomplete" basis we are after.

      In sparse coding the weight matrix A maps the features s back to the original data x, whereas in all earlier exercises (including the sparse autoencoder) the weight matrix W works in the opposite direction, mapping the original data x to the features.
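A minimal sketch of the two directions (all sizes and variable names below are made up for illustration):

visibleSize = 64; numFeatures = 100; numExamples = 5;
A  = randn(visibleSize, numFeatures);    % sparse coding basis (weightMatrix)
s  = randn(numFeatures, numExamples);    % feature matrix (featureMatrix)
xhat = A * s;                            % sparse coding: features -> data

W1 = randn(numFeatures, visibleSize);    % autoencoder encoder weights
b1 = zeros(numFeatures, 1);
x  = randn(visibleSize, numExamples);
sigmoid = @(z) 1 ./ (1 + exp(-z));
a2 = sigmoid(bsxfun(@plus, W1 * x, b1)); % sparse autoencoder: data -> features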

2. What is sparse coding for? In other words, why sparse coding?

Ng has already answered this: sparse coding is an unsupervised learning method that looks for an "overcomplete" set of basis vectors to represent the sample data more efficiently, and the advantage of an overcomplete basis is that it is better at exposing the structure and patterns hidden inside the input data.

 

3. The cost function

The cost function for non-topographic sparse coding (as implemented in the code below):

\begin{align}
J(A, s) = \frac{1}{m}\lVert A s - x \rVert_2^2 + \lambda \sum_{i,j} \sqrt{s_{ij}^2 + \epsilon} + \gamma \lVert A \rVert_2^2
\end{align}

where A is the weight (basis) matrix, s is the feature matrix, and m is the number of examples.

The cost function for topographic sparse coding (as implemented in the code below):

\begin{align}
J(A, s) = \frac{1}{m}\lVert A s - x \rVert_2^2 + \lambda \sum \sqrt{V (s \odot s) + \epsilon} + \gamma \lVert A \rVert_2^2
\end{align}

where V is the grouping matrix (groupMatrix in the code), s \odot s squares s elementwise, and the square root and the sum are taken over all entries.

Note: \lVert x \rVert_k is the Lk norm of x, equal to \left( \sum_i \left| x_i \right|^k \right)^{\frac{1}{k}}. The L2 norm is the familiar Euclidean norm, the square root of the sum of squares; the L1 norm is the sum of the absolute values of the elements.
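For example, for x = (3, -4):

\begin{align}
\lVert x \rVert_1 = |3| + |-4| = 7, \qquad \lVert x \rVert_2 = \sqrt{3^2 + (-4)^2} = 5
\end{align}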

In Ng's notes this cost function is, strictly speaking, written imprecisely: the reconstruction term is missing a division by the number of examples, i.e. it should really be a mean squared error rather than a sum of squared errors.

 

4. Derivatives of the cost function

The derivative of the cost function with respect to the weight matrix A (identical in the topographic and non-topographic cases, since the two cost functions do not differ in A):

\begin{align}
\nabla_A J = \frac{2}{m}(A s - x)s^T + 2\gamma A
\end{align}

The derivative of the non-topographic sparse coding cost with respect to s:

\begin{align}
\nabla_s J = \frac{2}{m}A^T(A s - x) + \lambda \, \frac{s}{\sqrt{s \odot s + \epsilon}}
\end{align}

(the division and square root are elementwise).

The derivative of the topographic sparse coding cost with respect to s:

\begin{align}
\nabla_s J = \frac{2}{m}A^T(A s - x) + \lambda \left[ V^T \big(V (s \odot s) + \epsilon\big)^{-\frac{1}{2}} \right] \odot s
\end{align}

Here m is the number of examples.

For the derivation of the matrix-derivative formulas above, see Deep learning: 27 (Matrix norm derivatives in sparse coding), http://www.mathchina.net/dvbbs/dispbbs.asp?boardid=4&Id=3673, and The Matrix Cookbook (strongly recommended; Writing Fast MATLAB Code by Pascal Getreuer is also well worth reading).

 

5. Alternating optimization of the parameters

When alternating the optimization, A has a closed-form solution once s is fixed, so it does not need to be found with an iterative solver such as lbfgs. Setting the derivative of the cost function with respect to A to zero gives:

\begin{align}
A = x s^T \left( s s^T + m\gamma I \right)^{-1}
\end{align}

where m is the number of examples.
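For completeness, a short derivation of this closed form, using the gradient with respect to A from item 4:

\begin{align}
\frac{2}{m}(As - x)s^T + 2\gamma A &= 0 \\
Ass^T - xs^T + m\gamma A &= 0 \\
A\left(ss^T + m\gamma I\right) &= xs^T \\
A &= xs^T\left(ss^T + m\gamma I\right)^{-1}
\end{align}

This matches the update weightMatrix = batchPatches * featureMatrix' / (featureMatrix * featureMatrix' + gamma * batchNumPatches * eye(numFeatures)); used in sparseCodingExercise.m below (MATLAB's / performs the right division by the bracketed matrix).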

 

6. Make sure you understand how the cost function of this exercise is derived. Only with that understanding will you later be able to derive cost functions for your own models and needs.

      Ng actually explains this very clearly in his notes (the Sparse Coding and Sparse Coding: Autoencoder Interpretation pages). The explanation below simply puts it more plainly:

      ① Reconstruction term: the purpose of this exercise is to find an "overcomplete" basis for the data x and to express x in that basis, i.e.

\begin{align}
\mathbf{x} = \sum_{i=1}^k a_i \mathbf{\phi}_{i}
\end{align}

where the \mathbf{\phi}_i are the "overcomplete" basis vectors (matrix form: A) and the a_i are the feature coefficients of x (matrix form: s).

To keep the error between x and its reconstruction as small as possible, the cost function must contain their mean squared error, which we minimize:

\begin{align}
\text{minimize} \;\; \frac{1}{m}\sum_{j=1}^{m} \left\lVert \mathbf{x}^{(j)} - \sum_{i=1}^{k} a_i^{(j)} \mathbf{\phi}_i \right\rVert^2
\end{align}

In matrix form:

\begin{align}
\frac{1}{m}\lVert A s - x \rVert_2^2
\end{align}

Ng calls this the "reconstruction term"; it is really a mean squared error term. So Ng's notation is, strictly speaking, slightly imprecise, missing the division by the number of examples m; this is only a matter of notation, though, and his code does divide by m.

     ② Sparsity penalty: the reconstruction term guarantees that sparse coding gives a good linear fit to the input vector \mathbf{x}, i.e. it keeps the error between x and its reconstruction small, but we also require the coefficients to have only a few nonzero (or far-from-zero) entries, which the reconstruction term does not guarantee. To enforce this we must also make the following expression small: \sum_{i=1}^{k} S(a_i), where in this exercise S(\cdot) is the L1 penalty S(a_i)=\left|a_i\right|; the log penalty S(a_i)=\log(1+a_i^2) could also be used.

Minimizing this expression ensures that only a few entries of a are far from zero, i.e. it enforces the sparsity of the a_i. Since the matrix form of the a_i is s, the sparsity penalty in matrix form is:

\begin{align}
\lambda \sum_{i,j} \left| s_{ij} \right|
\end{align}

Moreover, since we will later differentiate the cost function in order to minimize it, and this term is not differentiable at s = 0 (think of y = |x| at x = 0), we replace |s| with \sqrt{s^2 + \epsilon} to make the minimization easier, where \epsilon is a "smoothing parameter" (also called a "sparsity parameter").

Putting this together, the sparsity penalty is:

\begin{align}
\lambda \sum_{i,j} \sqrt{s_{ij}^2 + \epsilon}
\end{align}

So the cost function at this point is (in Ng's notation, without the 1/m factor):

\begin{align}
\text{minimize}_{a^{(j)}_i,\,\mathbf{\phi}_{i}} \; \sum_{j=1}^{m} \left\lVert \mathbf{x}^{(j)} - \sum_{i=1}^k a^{(j)}_i \mathbf{\phi}_{i} \right\rVert^{2} + \lambda \sum_{i=1}^{k} S(a^{(j)}_i)
\end{align}

In matrix form (as implemented in the code below, including the 1/m factor):

\begin{align}
\frac{1}{m}\lVert A s - x \rVert_2^2 + \lambda \sum_{i,j} \sqrt{s_{ij}^2 + \epsilon}
\end{align}

 

     ③ Third term: with only the first two terms there is a problem. If the coefficients a (matrix form: s) are scaled down to very small values while each basis vector (matrix form: A) is scaled up to very large values, the first term stays essentially unchanged while the sparsity penalty in the second term shrinks. In other words, shrinking all coefficients while enlarging the basis minimizes the cost above, and the result is that all coefficients become tiny: all greater than 0 but close to 0, which does not satisfy our sparsity requirement for a (only a few of the decomposition coefficients should be much larger than 0, rather than most of them being slightly larger than 0). To prevent this, it is enough to keep the entries of A (i.e. the basis vectors \mathbf{\phi}_i) from growing too large, so it suffices to require the following:

\begin{align}
\lVert \mathbf{\phi}_i \rVert^2 \le 1 \quad \forall i
\end{align}

or, in matrix form: A_j^T A_j \le 1 \;\; \forall j.

In other words, we only need to additionally minimize \sum_r \sum_c A_{rc}^2, the sum of squares of all entries of A, written in matrix form as: minimize \lVert A \rVert_2^2. Adding it to the cost function guarantees the degenerate scaling described above cannot happen.

So the third term of the cost function is:

\begin{align}
\gamma \lVert A \rVert_2^2
\end{align}

Through these three steps we arrive at the cost function that meets our requirements (non-topographic case, as implemented in the code below):

\begin{align}
J(A, s) = \frac{1}{m}\lVert A s - x \rVert_2^2 + \lambda \sum_{i,j} \sqrt{s_{ij}^2 + \epsilon} + \gamma \lVert A \rVert_2^2
\end{align}
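As a quick numeric check of the scaling problem described in ③ (the sizes below are made up for illustration):

% Scaling the basis up by c and the coefficients down by c leaves the
% reconstruction unchanged but shrinks the (smoothed) L1 penalty, which is
% why the gamma term on A is needed.
A = randn(64, 100); s = randn(100, 5); c = 10; epsilon = 1e-5;
norm(A*s - (c*A)*(s/c), 'fro')        % essentially 0
sum(sqrt(s(:).^2 + epsilon))          % original sparsity penalty
sum(sqrt((s(:)/c).^2 + epsilon))      % roughly c times smaller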

For the topographic sparse coding cost function, see Ng's explanation; it is already clear and easy to follow.

 

7. Note: because the 10 512x512 grayscale images have already been whitened, their values are not in [0,1]. The sampleIMAGES function used in earlier exercises normalized the samples to [0,1] because they were fed to a sparse autoencoder, whose inputs must lie in [0,1]; in this exercise the data are fed to sparse coding, which does not require inputs in [0,1], and the normalization itself introduces some error (it is based on the 3-sigma rule). So in the modified sampleIMAGES used here, the 20000 sampled patches are no longer normalized to [0,1]; that is, the line patches = normalizeData(patches); in the original sampleIMAGES must be commented out. This step is actually very important: it directly determines whether the expected result can be obtained at the end. The modified sampling function sampleIMAGES is given in the code below.

 

8. A useful programming technique

assert(diff < 1e-8, 'Weight difference too large. Check your weight cost function. ');

This checks whether the condition diff < 1e-8 is true; if it is false, the message "Weight difference too large. Check your weight cost function." is displayed and execution stops.
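For instance (illustration only):

% assert(cond, msg) does nothing when cond is true and raises an error
% showing msg when cond is false:
assert(abs(pi - 3.14159) < 1e-3, 'pi approximation is off');   % passes silently
% assert(1 > 2, 'shown on failure');                           % would raise an error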

 

9. Some MATLAB functions

  circshift:

  This function circularly shifts a matrix. B = circshift(A, shiftsize) shifts matrix A according to shiftsize, which is typically a vector: its first element is the shift along the first dimension (up/down for a 2-D matrix; positive shifts down) and its second element is the shift along the second dimension (positive shifts right).
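A small illustration (the matrix is made up; the results are shown in the comments):

A = [1 2 3; 4 5 6; 7 8 9];
circshift(A, [1 0])     % shift rows down by 1:    [7 8 9; 1 2 3; 4 5 6]
circshift(A, [0 -1])    % shift columns left by 1: [2 3 1; 5 6 4; 8 9 7]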

  randperm:

  This function generates a random row vector: randperm(n) produces a row vector of length n containing the integers 1 to n in random order, without repetition. randperm(n, k) produces a row vector of length k whose elements are drawn from 1 to n, also without repetition.
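This is how the exercise script below draws a random mini-batch, for example:

indices = randperm(20000);      % a random permutation of 1..20000
indices = indices(1:2000);      % keep the first 2000 -> one random batch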

  questdlg:

  button = questdlg('qstring','title','str1','str2','str3',default) opens a dialog box whose message is qstring and whose title is title; the next three arguments are the labels of the three buttons (playing the roles of yes, no, and cancel), and the final argument default is the label of the default button.
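A usage example mirroring the call in sparseCodingExercise.m below:

choice = questdlg('Topographic or non-topographic sparse coding?', ...
                  'Choose variant', 'Non-topographic', 'Topographic', 'Non-topographic');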

 

Questions

1. How exactly should the grouping matrix V be generated, and why this way?

Ng generates the grouping matrix V as follows:

donutDim = floor(sqrt(numFeatures));
assert(donutDim * donutDim == numFeatures,'donutDim^2 must be equal to numFeatures');

groupMatrix = zeros(numFeatures, donutDim, donutDim);

groupNum = 1;
for row = 1:donutDim
    for col = 1:donutDim
        groupMatrix(groupNum, 1:poolDim, 1:poolDim) = 1;
        groupNum = groupNum + 1;
        groupMatrix = circshift(groupMatrix, [0 0 -1]);
    end
    groupMatrix = circshift(groupMatrix, [0 -1, 0]);
end
%poolDim = 3;

Why is it generated this way? I have not figured it out.
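One way to see what the loop produces (a sketch with small made-up sizes; the loop body is copied from the code above):

numFeatures = 16; poolDim = 3;              % small values for illustration
donutDim = floor(sqrt(numFeatures));
groupMatrix = zeros(numFeatures, donutDim, donutDim);
groupNum = 1;
for row = 1:donutDim
    for col = 1:donutDim
        groupMatrix(groupNum, 1:poolDim, 1:poolDim) = 1;
        groupNum = groupNum + 1;
        groupMatrix = circshift(groupMatrix, [0 0 -1]);
    end
    groupMatrix = circshift(groupMatrix, [0 -1, 0]);
end
disp(squeeze(groupMatrix(1, :, :)))         % group 1 as a donutDim x donutDim mask

Each row of groupMatrix is the indicator of a poolDim x poolDim window on a donutDim x donutDim grid of features, and the windows wrap around the edges (hence, presumably, the name donutDim). Neighbouring groups overlap, which ties nearby features together and is what gives the learned basis its topographic layout.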

 

Experiment steps

1. Initialize the parameters, load the data (the 10 whitened 512x512 grayscale images), and randomly extract 20000 small patches from them. The sampling function modified for this exercise is sampleIMAGES.m below.

2. Implement the sparse coding cost with respect to the weight matrix A and its gradient, then check the gradient numerically. The non-topographic and topographic cost functions have the same derivative with respect to A, so there is no need to treat them separately.

3. Implement the non-topographic sparse coding cost with respect to the feature matrix s and its gradient, then check the gradient numerically.

4. Implement the topographic sparse coding cost with respect to the feature matrix s and its gradient, then check the gradient numerically.

5. Alternately optimize A and s (using lbfgs or cg) to obtain the best weight matrix A, and display it.

 

Results

① A sample of the original data (patches) (figure omitted).

② Non-topographic sparse coding, optimizer lbfgs, 256 features, 16x16 patches: the learned "overcomplete" basis (figure omitted).

③ Topographic sparse coding, optimizer lbfgs, 256 features, 16x16 patches: the learned "overcomplete" basis (figure omitted).

④ Topographic sparse coding, optimizer cg, 256 features, 16x16 patches: the learned "overcomplete" basis (figure omitted).

⑤ Topographic sparse coding, optimizer cg, 121 features, 8x8 patches: the learned "overcomplete" basis (figure omitted).

Summary

1. The choice of optimization algorithm affects the result. For example, comparing ③ and ④ above (topographic sparse coding, 256 features, 16x16 patches), the cg algorithm gives a better result than lbfgs.

2. The patch size affects the result. For example, for topographic sparse coding with 121 features and 8x8 patches, lbfgs fails to produce a usable result, while cg gives the result shown above.

3. A rule of thumb: the cost should keep decreasing as the iterations proceed; if it does not, a correct result will never be reached. A non-decreasing cost may mean a poor choice of optimizer or of patch size, among other causes, to be judged case by case; the deeper reason is still unclear to me.

 

Code

sparseCodingExercise.m

%% CS294A/CS294W Sparse Coding Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  sparse coding exercise. In this exercise, you will need to modify
%  sparseCodingFeatureCost.m and sparseCodingWeightCost.m. You will also
%  need to modify this file, sparseCodingExercise.m slightly.

% Add the paths to your earlier exercises if necessary
% addpath /path/to/solution
addpath minFunc/;
%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

numPatches = 20000;   % number of patches
numFeatures = 256;    % number of features to learn
patchDim = 16;         % patch dimension
visibleSize = patchDim * patchDim; 

% dimension of the grouping region (poolDim x poolDim) for topographic sparse coding
poolDim = 3;

% number of patches per batch
batchNumPatches = 2000; 

lambda = 5e-5;  % L1-regularisation parameter (on features)
epsilon = 1e-5; % L1-regularisation epsilon |x| ~ sqrt(x^2 + epsilon)
gamma = 1e-2;   % L2-regularisation parameter (on basis)

%%======================================================================
%% STEP 1: Sample patches

images = load('IMAGES.mat');
images = images.IMAGES;

patches = sampleIMAGES(images, patchDim, numPatches);
display_network(patches(:, 1:64));

%%======================================================================
%% STEP 2: Implement and check sparse coding cost functions
%  Implement the two sparse coding cost functions and check your gradients.
%  The two cost functions are
%  1) sparseCodingFeatureCost (in sparseCodingFeatureCost.m) for the features 
%     (used when optimizing for s, which is called featureMatrix in this exercise) 
%  2) sparseCodingWeightCost (in sparseCodingWeightCost.m) for the weights
%     (used when optimizing for A, which is called weightMatrix in this exericse)

% We reduce the number of features and number of patches for debugging
debug = false;
if debug
numFeatures = 5;
patches = patches(:, 1:5);
numPatches = 5;

weightMatrix = randn(visibleSize, numFeatures) * 0.005;
featureMatrix = randn(numFeatures, numPatches) * 0.005;

%% STEP 2a: Implement and test weight cost
%  Implement sparseCodingWeightCost in sparseCodingWeightCost.m and check
%  the gradient

[cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon);

numgrad = computeNumericalGradient( @(x) sparseCodingWeightCost(x, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon), weightMatrix(:) );
% Uncomment the below line to display the numerical and analytic gradients side by side
% disp([numgrad grad]);     
diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf('Weight difference: %g\n', diff);
assert(diff < 1e-8, 'Weight difference too large. Check your weight cost function. ');

%% STEP 2b: Implement and test feature cost (non-topographic)
%  Implement sparseCodingFeatureCost in sparseCodingFeatureCost.m and check
%  the gradient. You may wish to implement the non-topographic version of
%  the cost function first, and extend it to the topographic version later.

% Set epsilon to a larger value so checking the gradient numerically makes sense
epsilon = 1e-2;

% We pass in the identity matrix as the grouping matrix, putting each
% feature in a group on its own.
groupMatrix = eye(numFeatures);

[cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix);

numgrad = computeNumericalGradient( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix), featureMatrix(:) );
% Uncomment the below line to display the numerical and analytic gradients side by side
% disp([numgrad grad]); 
diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf('Feature difference (non-topographic): %g\n', diff);
assert(diff < 1e-8, 'Feature difference too large. Check your feature cost function. ');

%% STEP 2c: Implement and test feature cost (topographic)
%  Implement sparseCodingFeatureCost in sparseCodingFeatureCost.m and check
%  the gradient. This time, we will pass a random grouping matrix in to
%  check if your costs and gradients are correct for the topographic
%  version.

% Set epsilon to a larger value so checking the gradient numerically makes sense
epsilon = 1e-2;

% This time we pass in a random grouping matrix to check if the grouping is
% correct.
groupMatrix = rand(100, numFeatures);

[cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix);

numgrad = computeNumericalGradient( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix), featureMatrix(:) );
% Uncomment the below line to display the numerical and analytic gradients side by side
% disp([numgrad grad]); 
diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf('Feature difference (topographic): %g\n', diff);
assert(diff < 1e-8, 'Feature difference too large. Check your feature cost function. ');
end

%%======================================================================
%% STEP 3: Iterative optimization
%  Once you have implemented the cost functions, you can now optimize for
%  the objective iteratively. The code to do the iterative optimization 
%  using mini-batching and good initialization of the features has already
%  been included for you. 
% 
%  However, you will still need to derive and fill in the analytic solution 
%  for optimizing the weight matrix given the features. 
%  Derive the solution and implement it in the code below, verify the
%  gradient as described in the instructions below, and then run the
%  iterative optimization.


% Initialize options for minFunc
options.Method = 'lbfgs';
options.display = 'off';
options.verbose = 0;

% Initialize matrices
weightMatrix = rand(visibleSize, numFeatures);
featureMatrix = rand(numFeatures, batchNumPatches);

% Initialize grouping matrix
assert(floor(sqrt(numFeatures)) ^2 == numFeatures, 'numFeatures should be a perfect square');
donutDim = floor(sqrt(numFeatures));
assert(donutDim * donutDim == numFeatures,'donutDim^2 must be equal to numFeatures');

groupMatrix = zeros(numFeatures, donutDim, donutDim);

groupNum = 1;
for row = 1:donutDim
    for col = 1:donutDim
        groupMatrix(groupNum, 1:poolDim, 1:poolDim) = 1;
        groupNum = groupNum + 1;
        groupMatrix = circshift(groupMatrix, [0 0 -1]);
    end
    groupMatrix = circshift(groupMatrix, [0 -1, 0]);
end

groupMatrix = reshape(groupMatrix, numFeatures, numFeatures);
if isequal(questdlg('Initialize grouping matrix for topographic or non-topographic sparse coding?', 'Topographic/non-topographic?', 'Non-topographic', 'Topographic', 'Non-topographic'), 'Non-topographic')
    groupMatrix = eye(numFeatures);
end

% Initial batch
indices = randperm(numPatches);
indices = indices(1:batchNumPatches);
batchPatches = patches(:, indices);                           

fprintf('%6s%12s%12s%12s%12s\n','Iter', 'fObj','fResidue','fSparsity','fWeight');

for iteration = 1:200                      
    error = weightMatrix * featureMatrix - batchPatches;
    error = sum(error(:) .^ 2) / batchNumPatches;
    
    fResidue = error;
    
    R = groupMatrix * (featureMatrix .^ 2);
    R = sqrt(R + epsilon);    
    fSparsity = lambda * sum(R(:));    
    
    fWeight = gamma * sum(weightMatrix(:) .^ 2);
    
    fprintf('  %4d  %10.4f  %10.4f  %10.4f  %10.4f\n', iteration, fResidue+fSparsity+fWeight, fResidue, fSparsity, fWeight)
               
    % Select a new batch
    indices = randperm(numPatches);
    indices = indices(1:batchNumPatches);
    batchPatches = patches(:, indices);                    
    
    % Reinitialize featureMatrix with respect to the new batch
    featureMatrix = weightMatrix' * batchPatches;
    normWM = sum(weightMatrix .^ 2)';
    featureMatrix = bsxfun(@rdivide, featureMatrix, normWM); 
    
    % Optimize for feature matrix    
    options.maxIter = 20;
    [featureMatrix, cost] = minFunc( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix), ...
                                           featureMatrix(:), options);
    featureMatrix = reshape(featureMatrix, numFeatures, batchNumPatches);                                      
       
    % Optimize for weight matrix  
%     weightMatrix = zeros(visibleSize, numFeatures);     
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Fill in the analytic solution for weightMatrix that minimizes 
    %   the weight cost here.     
    %   Once that is done, use the code provided below to check that your
    %   closed form solution is correct.
    %   Once you have verified that your closed form solution is correct,
    %   you should comment out the checking code before running the
    %   optimization.
    
    weightMatrix = batchPatches * featureMatrix' / (featureMatrix * featureMatrix' + gamma * batchNumPatches * eye(numFeatures));
    
%     [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix);
%     assert(norm(grad) < 1e-12, 'Weight gradient should be close to 0. Check your closed form solution for weightMatrix again.')
%     error('Weight gradient is okay. Comment out checking code before running optimization.');
    % -------------------- YOUR CODE HERE --------------------   
    
    % Visualize learned basis
    figure(1);
    display_network(weightMatrix);           
end

 

sampleIMAGES.m

function patches = sampleIMAGES(IMAGES, patchsize, numpatches)
% sampleIMAGES
% Returns numpatches patches for training

% load IMAGES;    % load images from disk 
% 
% patchsize = 8;  % we'll use 8x8 patches 
% numpatches = 10000;

% Initialize patches with zeros.  Your code will fill in this matrix--one
% column per patch, numpatches columns.
patches = zeros(patchsize*patchsize, numpatches);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Fill in the variable called "patches" using data 
%  from IMAGES.  
%  
%  IMAGES is a 3D array containing 10 images
%  For instance, IMAGES(:,:,6) is a 512x512 array containing the 6th image,
%  and you can type "imagesc(IMAGES(:,:,6)), colormap gray;" to visualize
%  it. (The contrast on these images looks a bit off because they have
%  been preprocessed using "whitening."  See the lecture notes for
%  more details.) As a second example, IMAGES(21:30,21:30,1) is an image
%  patch corresponding to the pixels in the block (21,21) to (30,30) of
%  Image 1
tic
image_size=size(IMAGES);
i=randi(image_size(1)-patchsize+1,1,numpatches);% 1-by-numpatches vector of random integers between 1 and image_size(1)-patchsize+1
j=randi(image_size(2)-patchsize+1,1,numpatches);
k=randi(image_size(3),1,numpatches);
for num=1:numpatches
        patches(:,num)=reshape(IMAGES(i(num):i(num)+patchsize-1,j(num):j(num)+patchsize-1,k(num)),1,patchsize*patchsize);
end
toc

%% ---------------------------------------------------------------
% For the autoencoder to work well we need to normalize the data
% Specifically, since the output of the network is bounded between [0,1]
% (due to the sigmoid activation function), we have to make sure 
% the range of pixel values is also bounded between [0,1]
% patches = normalizeData(patches);

end


%% ---------------------------------------------------------------
function patches = normalizeData(patches)

% Squash data to [0.1, 0.9] since we use sigmoid as the activation
% function in the output layer

% Remove DC (mean of images): subtract each patch's mean from its elements
patches = bsxfun(@minus, patches, mean(patches));

% Truncate to +/-3 standard deviations and scale to -1 to 1
pstd = 3 * std(patches(:));%  three standard deviations of all patch values
patches = max(min(patches, pstd), -pstd) / pstd;%  by the 3-sigma rule, almost all of the data falls within +/-3 standard deviations
                                                %  after this step the data lies in [-1, 1]
% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;

end

 

sparseCodingWeightCost.m

function [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures,  patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingWeightCost - given the features in featureMatrix, 
%                         computes the cost and gradient with respect to
%                         the weights, given in weightMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
    else
        groupMatrix = eye(numFeatures);
    end

    numExamples = size(patches, 2);

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);
    
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.     
    % -------------------- YOUR CODE HERE --------------------    
 %% Compute the cost
    delta = weightMatrix*featureMatrix-patches;
    fResidue = sum(sum(delta.^2))./numExamples;  % reconstruction error (mean over examples)
    fWeight = gamma*sum(weightMatrix(:).^2);     % keeps the basis entries from growing too large
    sparsityMatrix = sqrt(groupMatrix*(featureMatrix.^2)+epsilon);
    fSparsity = lambda*sum(sparsityMatrix(:));   % sparsity penalty on the feature coefficients
    cost = fResidue+fWeight+fSparsity;           % total cost
%     cost = fResidue+fWeight;
    
    %% Gradient of the cost with respect to the weight matrix
    grad = (2*weightMatrix*featureMatrix*(featureMatrix')-2*patches*featureMatrix')./numExamples+2*gamma*weightMatrix;
    grad = grad(:);
    
    
    
    
end

 

sparseCodingFeatureCost.m

function [cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingFeatureCost - given the weights in weightMatrix,
%                          computes the cost and gradient with respect to
%                          the features, given in featureMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
    else
        groupMatrix = eye(numFeatures);
    end

    numExamples = size(patches, 2);

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);

    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   features given in featureMatrix.     
    %   You may wish to write the non-topographic version, ignoring
    %   the grouping matrix groupMatrix first, and extend the 
    %   non-topographic version to the topographic version later.
    % -------------------- YOUR CODE HERE --------------------
    
      %% Compute the cost
    delta = weightMatrix*featureMatrix-patches;
    fResidue = sum(sum(delta.^2))./numExamples;  % reconstruction error (mean over examples)
    fWeight = gamma*sum(weightMatrix(:).^2);     % keeps the basis entries from growing too large
    sparsityMatrix = sqrt(groupMatrix*(featureMatrix.^2)+epsilon);
    fSparsity = lambda*sum(sparsityMatrix(:));   % sparsity penalty on the feature coefficients
    cost = fResidue + fSparsity + fWeight;       % A is constant here, so fWeight could be dropped
%     cost = fResidue + fSparsity;

    %% Gradient of the cost with respect to the feature matrix
    gradResidue = (-2*weightMatrix'*patches+2*(weightMatrix')*weightMatrix*featureMatrix)./numExamples;
  
    % Non-topographic case:
    isTopo = 1;  % isTopo = 1 selects the topographic case
    if ~isTopo
        gradSparsity = lambda*(featureMatrix./sparsityMatrix);
        
    % Topographic case:
    else
        gradSparsity = lambda*groupMatrix'*(groupMatrix*(featureMatrix.^2)+epsilon).^(-0.5).*featureMatrix;% note: the final multiplication by featureMatrix is elementwise
    end
    end
    grad = gradResidue+gradSparsity;
    grad = grad(:);
    
    
    
    
    
end


References

http://blog.csdn.net/zouxy09/article/details/8777094

http://blog.csdn.net/lu597203933/article/details/46489647

UFLDL Tutorial

Deep learning: 26 (A simple understanding of sparse coding)

Deep learning: 27 (Matrix norm derivatives in sparse coding)

Deep learning: 29 (Sparse coding exercise)

