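Preface
This post is based on the UFLDL exercise Exercise:PCA and Whitening.
For the theory, see the UFLDL Tutorial.
Experiment: randomly sample 10,000 image patches of size 12×12 from 10 natural images of size 512×512. Run PCA on these patches, retaining 99% of the variance. Finally, apply PCA whitening and ZCA whitening to the patches and compare the results.
Experiment steps and results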
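1. Load the image data to get 10,000 image patches as the raw data x. This is a 144×10000 matrix in which each column is one 12×12 patch flattened into 144 pixels. 200 randomly chosen patches are displayed below: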
2. Zero-mean each image patch by subtracting the patch's own mean from its pixels.
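The formula below shows what step 2 computes. The notation is introduced here and is not from the original post: n = 144 pixels per patch, and x_j^{(i)} is pixel j of patch i. Each patch has its own mean removed, which is what `mean(x, 1)` followed by `repmat` does in the code:

$$
x_j^{(i)} \leftarrow x_j^{(i)} - \frac{1}{n}\sum_{l=1}^{n} x_l^{(i)}, \qquad n = 144
$$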
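3. First step of PCA: compute the covariance matrix sigma of the zero-mean data x. Then apply svd to sigma to obtain U, whose columns are the eigenvectors of sigma and serve as the new basis. Finally, project (rotate) x onto this basis to obtain the new data xRot.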
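Written out, step 3 amounts to the following. The notation is introduced here: m = 10000 patches, and λ_1 ≥ … ≥ λ_n are the eigenvalues. Because sigma is symmetric positive semidefinite, its SVD coincides with its eigendecomposition. The columns of U are therefore eigenvectors, and diag(S) holds the eigenvalues in descending order:

$$
\Sigma = \frac{1}{m}\, x x^{\top}, \qquad
\Sigma = U S U^{\top},\quad S = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \qquad
x_{\mathrm{rot}} = U^{\top} x
$$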
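4. Check the first step of the PCA implementation by displaying the covariance matrix of xRot. If the implementation is correct, the image shows a straight line along the diagonal against a blue background. The result is shown below: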
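The check in step 4 works because the covariance of the rotated data is exactly the diagonal eigenvalue matrix. The notation is introduced here: Σ = U S U^T is the covariance from step 3, and m = 10000. All off-diagonal entries are zero, which is why only the diagonal shows up:

$$
\frac{1}{m}\, x_{\mathrm{rot}} x_{\mathrm{rot}}^{\top}
= U^{\top}\Big(\frac{1}{m}\, x x^{\top}\Big) U
= U^{\top} \Sigma\, U
= \operatorname{diag}(\lambda_1, \dots, \lambda_n)
$$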
5. Compute k, the number of principal components needed to retain 99% of the variance.
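The eigenvalues λ_j on the diagonal of S are the variances along the principal axes (the symbols are introduced here). Step 5 therefore picks the smallest k whose cumulative share of the total variance reaches 99%. The denominator is trace(S):

$$
k = \min\Big\{\, j \;:\; \frac{\sum_{l=1}^{j} \lambda_l}{\sum_{l=1}^{n} \lambda_l} \ge 0.99 \,\Big\}
$$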
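6. Second step of PCA: keep the first k components of xRot (its first k rows) as xTilde and discard the rest. Then pad xTilde back to 144 rows with zeros and multiply by the basis U. This is equivalent to multiplying the first k columns of U by xTilde, and it yields xHat: the dimension-reduced data expressed in the original pixel basis. xHat is displayed below: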
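In matrix form, xTilde is the first k rows of xRot, and xHat maps it back to the 144-dimensional pixel space. The notation is introduced here: U_{:,1:k} is the first k columns of U, n = 144, and m = 10000. The zero padding makes the two expressions for xHat equal:

$$
\tilde{x} = U_{:,1:k}^{\top}\, x \in \mathbb{R}^{k \times m}, \qquad
\hat{x} = U \begin{bmatrix} \tilde{x} \\ 0_{(n-k)\times m} \end{bmatrix}
= U_{:,1:k}\, \tilde{x} \in \mathbb{R}^{n \times m}
$$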
For comparison, the zero-mean data before dimension reduction is displayed below:
7. Apply PCA whitening to the zero-mean data x to obtain the PCA-whitened data xPCAWhite, displayed below:
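PCA whitening divides each rotated component by the square root of its variance. The regularization constant ε (epsilon = 0.1 in the code) keeps components with near-zero variance from blowing up. The notation is introduced here: x_rot,j is row j of xRot, and λ_j is the j-th diagonal entry of S:

$$
x_{\mathrm{PCAwhite},\,j} = \frac{x_{\mathrm{rot},\,j}}{\sqrt{\lambda_j + \varepsilon}},
\quad\text{i.e.}\quad
x_{\mathrm{PCAwhite}} = \operatorname{diag}\!\Big(\tfrac{1}{\sqrt{\lambda_1+\varepsilon}}, \dots, \tfrac{1}{\sqrt{\lambda_n+\varepsilon}}\Big)\, U^{\top} x
$$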
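8. Check whether PCA whitening is regularized by displaying the covariance matrix of xPCAWhite. Without regularization, the covariance matrix of xPCAWhite is the identity matrix. With regularization, its diagonal entries start close to 1 and gradually become smaller. So without regularization (epsilon set to 0 or close to 0), you get a red line along the diagonal against a blue background. With regularization, you get a diagonal line that fades from red to blue against a blue background. The result is shown below: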
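Both behaviours in step 8 follow from one short calculation. The notation is introduced here: D is the diagonal scaling matrix from step 7, and λ_j are the eigenvalues in diag(S). With ε = 0, every diagonal entry is λ_j/λ_j = 1, assuming λ_j > 0. With ε > 0, each entry λ_j/(λ_j + ε) is below 1 and shrinks as λ_j shrinks, which produces the red-to-blue fade along the diagonal:

$$
\frac{1}{m}\, x_{\mathrm{PCAwhite}}\, x_{\mathrm{PCAwhite}}^{\top}
= D\, U^{\top} \Sigma\, U D
= D \operatorname{diag}(\lambda_1, \dots, \lambda_n)\, D
= \operatorname{diag}\!\Big(\frac{\lambda_1}{\lambda_1+\varepsilon}, \dots, \frac{\lambda_n}{\lambda_n+\varepsilon}\Big),
\qquad D = \operatorname{diag}\!\Big(\tfrac{1}{\sqrt{\lambda_j+\varepsilon}}\Big)
$$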
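9. Build ZCA whitening on top of PCA whitening: xZCAWhite = U * xPCAWhite. PCA whitening has already been checked, and ZCA whitening is derived from it, so this step is not checked separately. The ZCA-whitened result is shown below: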
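Because U is orthogonal, multiplying by U only rotates the whitened components back into pixel coordinates. The covariance from step 8 is therefore preserved up to that rotation, which is why step 9 needs no separate check. The notation is introduced here: Σ = U S U^T with S = diag(λ_j), and D = diag(1/sqrt(λ_j + ε)). The ZCA transform is then the symmetric matrix (Σ + εI)^{-1/2}. This helps explain why ZCA-whitened patches still look like image patches while PCA-whitened ones do not:

$$
x_{\mathrm{ZCAwhite}} = U\, x_{\mathrm{PCAwhite}} = U D\, U^{\top} x = (\Sigma + \varepsilon I)^{-1/2}\, x
$$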
Compared with the PCA-whitened result, the ZCA-whitened data is visibly closer to the original data.
The corresponding zero-mean original data is shown below:
Code
pca_gen.m
close all;
% clear all;

%%================================================================
%% Step 0a: Load data
% Here we provide the code to load natural image data into x.
% x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
% the raw image data from the kth 12x12 image patch sampled.
% You do not need to change the code below.

x = sampleIMAGESRAW();
figure('name','Raw images');
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
% You can make use of the mean and repmat/bsxfun functions.

% -------------------- YOUR CODE HERE --------------------
avg = mean(x, 1);                     % mean of each column of x, i.e. of each patch
x = x - repmat(avg, size(x, 1), 1);   % subtract each patch's own mean

%%================================================================
%% Step 1a: Implement PCA to obtain xRot
% Implement PCA to obtain xRot, the matrix in which the data is expressed
% with respect to the eigenbasis of sigma, which is the matrix U.

% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
sigma = x * x' / size(x, 2);   % 144 x 144 covariance matrix
[U, S, V] = svd(sigma);        % columns of U: eigenvectors; diag(S): eigenvalues, descending
xRot = U' * x;                 % data expressed in the eigenbasis U

%%================================================================
%% Step 1b: Check your implementation of PCA
% The covariance matrix for the data expressed with respect to the basis U
% should be a diagonal matrix with non-zero entries only along the main
% diagonal. We will verify this here.
% Write code to compute the covariance matrix, covar.
% When visualised as an image, you should see a straight line across the
% diagonal (non-zero entries) against a blue background (zero entries).

% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(x, 1)); % You need to compute this
covar = xRot * xRot' / size(xRot, 2);

% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
% Write code to determine k, the number of components to retain in order
% to retain at least 99% of the variance.

% -------------------- YOUR CODE HERE --------------------
lambda = diag(S);                          % eigenvalues = variance along each principal axis
retained = cumsum(lambda) / sum(lambda);   % fraction of variance kept by the first j components
k = find(retained >= 0.99, 1);             % smallest such k (use 0.9 to retain 90% instead)

%%================================================================
%% Step 3: Implement PCA with dimension reduction
% Now that you have found k, you can reduce the dimension of the data by
% discarding the remaining dimensions. In this way, you can represent the
% data in k dimensions instead of the original 144, which will save you
% computational time when running learning algorithms on the reduced
% representation.
%
% Following the dimension reduction, invert the PCA transformation to produce
% the matrix xHat, the dimension-reduced data with respect to the original basis.
% Visualise the data and compare it to the raw data. You will observe that
% there is little loss due to throwing away the principal components that
% correspond to dimensions with low variation.

% -------------------- YOUR CODE HERE --------------------
xHat = zeros(size(x)); % You need to compute this
xTilde = U(:, 1:k)' * x;   % k x 10000: the first k rows of xRot
xHat(1:k, :) = xTilde;     % discarded components stay zero
xHat = U * xHat;           % back to the pixel basis (same as U(:, 1:k) * xTilde)

% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.
figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1)),'']);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
% Implement PCA with whitening and regularisation to produce the matrix
% xPCAWhite.

epsilon = 0.1;   % regularisation constant added to each eigenvalue
xPCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
% Rotate into the eigenbasis, then scale component i by 1/sqrt(lambda_i + epsilon)
xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * U' * x;

figure('name','PCA whitened images');
display_network(xPCAWhite(:,randsel));

%%================================================================
%% Step 4b: Check your implementation of PCA whitening
% Check your implementation of PCA whitening with and without regularisation.
% PCA whitening without regularisation results in a covariance matrix
% that is equal to the identity matrix. PCA whitening with regularisation
% results in a covariance matrix with diagonal entries starting close to
% 1 and gradually becoming smaller. We will verify these properties here.
% Write code to compute the covariance matrix, covar.
%
% Without regularisation (set epsilon to 0 or close to 0),
% when visualised as an image, you should see a red line across the
% diagonal (one entries) against a blue background (zero entries).
% With regularisation, you should see a red line that slowly turns
% blue across the diagonal, corresponding to the one entries slowly
% becoming smaller.

% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(xPCAWhite, 1));
covar = xPCAWhite * xPCAWhite' / size(xPCAWhite, 2);

% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 5: Implement ZCA whitening
% Now implement ZCA whitening to produce the matrix xZCAWhite.
% Visualise the data and compare it to the raw data. You should observe
% that whitening results in, among other things, enhanced edges.

xZCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
% Rotate the PCA-whitened data back to the pixel basis: U * xPCAWhite
xZCAWhite = U * diag(1 ./ sqrt(diag(S) + epsilon)) * U' * x;

% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));
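References:
http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
Deep Learning 3: Preprocessing with Principal Component Analysis and Whitening, Summary (Stanford UFLDL Deep Learning Tutorial)
Deep learning: 12 (PCA and whitening exercise on natural images)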