Principal Component Analysis (PCA) is a dimensionality-reduction algorithm used to speed up unsupervised feature learning. As the rest of this post shows, PCA essentially diagonalizes the covariance matrix, with two goals: minimize the correlation between dimensions (noise reduction) and keep the dimensions that carry the most energy (redundancy removal). PCA is especially practical for image data, because adjacent elements of an image are highly correlated.
For ease of explanation, we use a two-dimensional example reduced to one dimension (a real application might need to reduce the data from, say, 256 dimensions to 50):
Note that the two features have been preprocessed so that their means are zero and their variances are equal; the reason is explained below. For image data, however, the variance-normalization step is unnecessary.
As the figure above shows, the data varies mainly along two directions, u1 and u2. PCA relies on the covariance computation. (Variance measures the spread of a single dimension of data, while covariance measures the correlation between two different dimensions: a covariance of 0 means the two are uncorrelated, negative means negatively correlated, positive means positively correlated.) The covariance of two variables X and Y is:

\mathrm{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]
The formula PCA uses is:

\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \big(x^{(i)}\big)^{T}

This is why the data must be preprocessed to have zero mean. Note that the symbol on the left-hand side is not a summation sign; it is read "sigma" and is the standard symbol for the covariance matrix.
It can be shown that u1, the principal direction of variation of the data, is the top eigenvector of the covariance matrix, and u2 is the second eigenvector (readers interested in the derivation can follow the link here). Computing the covariance matrix and its eigenvectors is available in every linear-algebra library, so there is no need to implement it by hand; in Matlab you can use [U,S,D] = svd(sigma). (Briefly, sigma is the argument because PCA looks for the directions along which x varies most, i.e., the projections with maximum variance, and those directions are obtained from the covariance matrix.) The svd function computes a singular value decomposition; it can be used to find the eigenvectors here because sigma is a real, symmetric, positive semi-definite matrix, so its singular values equal its eigenvalues and its singular vectors are its eigenvectors.
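As a minimal sketch (assuming, as in the exercise at the end of this post, that x is an n-by-m Matlab matrix whose columns are zero-mean examples), the covariance matrix and its eigenvectors can be obtained like this:

sigma = (1 / size(x, 2)) * (x * x');  % n-by-n covariance matrix of the zero-mean data
[U, S, V] = svd(sigma);               % columns of U are the eigenvectors u1, u2, ...
lambda = diag(S);                     % eigenvalues, sorted in decreasing order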
Since we want to reduce to one dimension here, we only need u1. In general, to reduce to k dimensions we use the first k eigenvectors.
The matrix above contains the n eigenvectors, U = [u_1, u_2, \dots, u_n]. The standard symbol for the corresponding eigenvalues is \lambda; they are used when choosing the number of principal components and are ordered as

\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n.
Rotation
For this example, rotating the data means re-expressing it in the coordinate system whose axes are u1 and u2; the expression is:

x_{\rm rot} = U^T x = \begin{bmatrix} u_1^T x \\ u_2^T x \end{bmatrix}
The rotated data looks like this:
The matrix U is orthogonal, i.e. U^T U = U U^T = I, so to undo the rotation we only need to left-multiply by U:

x = U\, x_{\rm rot}

Check: U x_{\rm rot} = U U^T x = x.
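In Matlab the rotation and its inverse are one line each; a sketch reusing the U computed above:

xRot  = U' * x;     % express each column of x in the basis {u1, u2, ...}
xBack = U * xRot;   % undo the rotation; equals x because U*U' = I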
Dimensionality Reduction
With the groundwork above, the reduction step is easy to understand. For this example the formula is:

\tilde{x}^{(i)} = x_{{\rm rot},1}^{(i)} = u_1^T x^{(i)}
Generalizing, taking the first K eigenvectors of U reduces the data to K dimensions. One way to see it: the later a PCA eigenvector comes, the less it affects the data, so dimensionality reduction keeps the leading, most important components, producing the effect shown below:
The coordinate system after dimensionality reduction:
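A short sketch of the reduction to K dimensions and of the approximate reconstruction used to draw the figure above (K is whatever value you choose):

K = 1;                        % number of components to keep
xTilde = U(:, 1:K)' * x;      % K-by-m reduced representation
xHat   = U(:, 1:K) * xTilde;  % back in the original coordinates; an approximation of x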
Choosing the Number of Principal Components
The choice of K, the number of principal components, is clearly the most important decision. The approach: fix a percentage, say 99% or 95% (your choice), then compute the fraction of the total eigenvalue sum accounted for by the first K eigenvalues; the smallest K whose fraction reaches the chosen percentage is selected as the number of principal components:

\frac{\sum_{j=1}^{K} \lambda_j}{\sum_{j=1}^{n} \lambda_j} \ge 0.99
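Since svd returns the eigenvalues in decreasing order, picking K reduces to a cumulative sum; a sketch:

lambda   = diag(S);                       % eigenvalues, largest first
retained = cumsum(lambda) / sum(lambda);  % variance fraction kept by the first k components
K        = find(retained >= 0.99, 1);     % smallest K retaining at least 99%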
Applying PCA to Image Data
As mentioned earlier, using PCA normally requires preprocessing the data so that each feature has zero mean and similar variance. For most types of images, though, the variance-normalization step is unnecessary, because in theory the statistics of any part of an image are the same as those of any other part; this property of images is called stationarity.
In short, even without variance normalization, the variances of the different features are already close to one another (for audio data such as spectrograms, and for text data such as bag-of-words vectors, variance normalization is usually skipped as well). In fact, PCA is invariant to the scaling of its input: however the input values are scaled up or down, the returned eigenvectors are the same.
Zero-mean normalization, on the other hand, depends on the application. In object detection, for example, we usually do not care about the overall brightness of an image, since brightness does not affect the detection result, so the normalization here subtracts the mean intensity of each sample (image patch) from every one of its pixel values:

\mu^{(i)} := \frac{1}{n} \sum_{j=1}^{n} x_j^{(i)}

x_j^{(i)} := x_j^{(i)} - \mu^{(i)}, \quad \text{for all } j

Note: 1) the two steps above are performed separately for each image patch; 2) \mu^{(i)} here is the mean intensity of image patch x^{(i)}. In particular, this is completely different from estimating a separate mean for each pixel x_j.
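For an n-by-m matrix of image patches (one patch per column, as in the second exercise below), this per-patch mean removal is a single bsxfun call; a sketch:

patchMean = mean(x, 1);            % 1-by-m: average intensity of each patch
x = bsxfun(@minus, x, patchMean);  % subtract each patch's own mean from all of its pixels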
If the images you are working with are not natural images (for example handwritten characters, or single objects centered on a white background), other normalizations are worth considering, and which one works best depends on the application. For natural images, however, the per-image zero-mean normalization above is the sensible default.
Whitening
Whitening is a preprocessing step that reduces the redundancy of the input. Its effect can be summarized as:
- reducing the correlation between features
- making the features have similar variances
We stay with the two-dimensional example. In fact, the rotation x_{\rm rot} = U^T x described earlier has already removed the correlation between the input features; look again at the rotated data from before:
Its covariance matrix is:

\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}

(Note: throughout this post, covariance is computed assuming the input features have zero mean; even if the mean is not zero, the theory below still holds.)

It is no accident that the diagonal entries of this matrix are \lambda_1 and \lambda_2 and that the off-diagonal entries are 0: the zero off-diagonal entries tell us that x_{{\rm rot},1} and x_{{\rm rot},2} are uncorrelated, which satisfies the first goal, reducing the correlation between dimensions.
To give the input features unit variance, we apply the following whitening step (here \lambda_i is exactly the variance of the i-th coordinate of the rotated data):

x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i}}
The data after whitening:
After this operation the covariance matrix becomes the identity matrix I, and the resulting x_{\rm PCAwhite} is called the PCA-whitened version of the data.
Note that no dimensionality reduction has happened here, only rotation and rescaling; to combine whitening with dimensionality reduction, simply keep the first K components of xPCAWhite.
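A sketch of PCA whitening, assuming the U, S, and xRot computed earlier (no regularization yet; see the Regularization section below):

xPCAWhite = diag(1 ./ sqrt(diag(S))) * xRot;  % rescale each rotated coordinate to unit variance
% to also reduce dimension, keep only the first K rows: xPCAWhite(1:K, :)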
ZCA Whitening
First, note that if R is any orthogonal matrix, then R x_{\rm PCAwhite} still has unit covariance. ZCA whitening chooses R = U (recall from above that U itself is orthogonal) and is defined as:

x_{\rm ZCAwhite} = U\, x_{\rm PCAwhite}
That is, we left-multiply the (non-reduced) PCA-whitened result by the eigenvector matrix. The ZCA-whitened data looks like this:
It can be shown that among all possible choices of R, this particular rotation makes x_{\rm ZCAwhite} as close as possible to the original data x.
When using ZCA whitening, we usually keep all n dimensions of the data and do not try to reduce its dimensionality.
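In code, ZCA whitening is one more multiplication by U on top of the PCA-whitened data; a sketch:

xZCAWhite = U * xPCAWhite;   % rotate the whitened data back toward the original coordinates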
Regularization
In practice, when implementing PCA whitening or ZCA whitening, some eigenvalues \lambda_i may be numerically close to zero, and the scaling step would then divide by \sqrt{\lambda_i} \approx 0; this can make the data blow up (take on very large values) or be otherwise numerically unstable. We therefore add a small amount of regularization to the scaling step: a small constant \epsilon is added to the eigenvalues before taking the square root and inverting them:

x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i + \epsilon}}

When x takes values in [-1, 1], a typical choice is \epsilon \approx 10^{-5}.
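In code, the regularization is just the constant added inside the square root; a sketch using the epsilon suggested above for data in [-1, 1]:

epsilon = 1e-5;                                             % for data scaled to [-1, 1]
xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * (U' * x);  % regularized PCA whitening
xZCAWhite = U * xPCAWhite;                                  % regularized ZCA whitening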
For image data, adding \epsilon here also slightly smooths (low-pass filters) the input image. This helps remove noise introduced when the pixel values were captured and improves the learned features (the details are beyond the scope of this post).
ZCA whitening is a data-preprocessing method that maps the data from x to x_{\rm ZCAwhite}. It turns out to be a rough model of how the biological eye (the retina) processes images. Specifically, as your eye perceives an image, neighboring parts of the image are highly correlated in intensity, so most nearby "pixels" are perceived as similar values. It would therefore be very wasteful for the eye to transmit every pixel value separately (through the optic nerve) to the brain. Instead, the retina performs a decorrelation operation similar to ZCA (carried out by the ON-type and OFF-type photoreceptor cells in the retina, which convert light into neural signals), producing a lower-redundancy representation of the input image that is then transmitted to the brain.
Exercises
pca_2d
close all

%%================================================================
%% Step 0: Load data
%  We have provided the code to load data from pcaData.txt into x.
%  x is a 2 * 45 matrix, where the kth column x(:,k) corresponds to
%  the kth data point. You do not need to change the code below.

x = load('pcaData.txt','-ascii');
figure(1);
scatter(x(1, :), x(2, :));
title('Raw data');

%%================================================================
%% Step 1a: Implement PCA to obtain U
%  Implement PCA to obtain the rotation matrix U, which is the eigenbasis
%  of sigma.

% -------------------- YOUR CODE HERE --------------------
u = zeros(size(x, 1)); % You need to compute this
sigma = (1/size(x,2)) * (x * x');   % covariance matrix; x already has zero mean
[u,s,d] = svd(sigma);
% --------------------------------------------------------
hold on
plot([0 u(1,1)], [0 u(2,1)]);
plot([0 u(1,2)], [0 u(2,2)]);
scatter(x(1, :), x(2, :));
hold off

%%================================================================
%% Step 1b: Compute xRot, the projection on to the eigenbasis
%  Now, compute xRot by projecting the data on to the basis defined
%  by U. Visualize the points by performing a scatter plot.

% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
xRot = u' * x;
% --------------------------------------------------------

% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure(2);
scatter(xRot(1, :), xRot(2, :));
title('xRot');

%%================================================================
%% Step 2: Reduce the number of dimensions from 2 to 1
%  Compute xRot again (this time projecting to 1 dimension).
%  Then, compute xHat by projecting xRot back onto the original axes
%  to see the effect of dimension reduction.

% -------------------- YOUR CODE HERE --------------------
k = 1; % Use k = 1 and project the data onto the first eigenbasis
xHat = zeros(size(x)); % You need to compute this
xRot = u(:,1)' * x;
xHat = u * [xRot; zeros(1, size(x,2))];
% --------------------------------------------------------
figure(3);
scatter(xHat(1, :), xHat(2, :));
title('xHat');

%%================================================================
%% Step 3: PCA Whitening
%  Compute xPCAWhite and plot the results.

epsilon = 1e-5;
% -------------------- YOUR CODE HERE --------------------
xPCAWhite = zeros(size(x)); % You need to compute this
xRot = u' * x;
xPCAWhite = diag(1./sqrt(diag(s) + epsilon)) * xRot;
% --------------------------------------------------------
figure(4);
scatter(xPCAWhite(1, :), xPCAWhite(2, :));
title('xPCAWhite');

%%================================================================
%% Step 4: ZCA Whitening
%  Compute xZCAWhite and plot the results.

% -------------------- YOUR CODE HERE --------------------
xZCAWhite = zeros(size(x)); % You need to compute this
xZCAWhite = u * xPCAWhite;
% --------------------------------------------------------
figure(5);
scatter(xZCAWhite(1, :), xZCAWhite(2, :));
title('xZCAWhite');

%% Congratulations! When you have reached this point, you are done!
%  You can now move onto the next PCA exercise. :)
PCA and Whitening
%%================================================================
%% Step 0a: Load data
%  Here we provide the code to load natural image data into x.
%  x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
%  the raw image data from the kth 12x12 image patch sampled.
%  You do not need to change the code below.

x = sampleIMAGESRAW();
figure('name','Raw images');
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
%  You can make use of the mean and repmat/bsxfun functions.

% -------------------- YOUR CODE HERE --------------------
x = bsxfun(@minus, x, mean(x,1));          % use bsxfun
%x = x - repmat(mean(x,1), size(x,1), 1);  % use repmat

%%================================================================
%% Step 1a: Implement PCA to obtain xRot
%  Implement PCA to obtain xRot, the matrix in which the data is expressed
%  with respect to the eigenbasis of sigma, which is the matrix U.

% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
[U,S,D] = svd(x*x'*(1/size(x,2)));
xRot = U'*x;

%%================================================================
%% Step 1b: Check your implementation of PCA
%  The covariance matrix for the data expressed with respect to the basis U
%  should be a diagonal matrix with non-zero entries only along the main
%  diagonal. We will verify this here.
%  Write code to compute the covariance matrix, covar.
%  When visualised as an image, you should see a straight line across the
%  diagonal (non-zero entries) against a blue background (zero entries).

% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(x, 1)); % You need to compute this
covar = xRot*xRot'*(1/size(xRot,2)); % equal to S

% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
%  Write code to determine k, the number of components to retain in order
%  to retain at least 99% of the variance.

% -------------------- YOUR CODE HERE --------------------
k = 1; % Set k accordingly
diag_vec = diag(S);
k = find(cumsum(diag_vec)./sum(diag_vec) >= 0.99, 1); % smallest k retaining at least 99% of the variance

%%================================================================
%% Step 3: Implement PCA with dimension reduction
%  Now that you have found k, you can reduce the dimension of the data by
%  discarding the remaining dimensions. In this way, you can represent the
%  data in k dimensions instead of the original 144, which will save you
%  computational time when running learning algorithms on the reduced
%  representation.
%
%  Following the dimension reduction, invert the PCA transformation to produce
%  the matrix xHat, the dimension-reduced data with respect to the original basis.
%  Visualise the data and compare it to the raw data. You will observe that
%  there is little loss due to throwing away the principal components that
%  correspond to dimensions with low variation.

% -------------------- YOUR CODE HERE --------------------
xHat = zeros(size(x)); % You need to compute this
xReduction = U(:,1:k)' * x;
xHat(1:k,:) = xReduction;
xHat = U * xHat;

% Visualise the data, and compare it to the raw data.
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.
figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1)),'']);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
%  Implement PCA with whitening and regularisation to produce the matrix
%  xPCAWhite.

epsilon = 0.1;
xPCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
xPCAWhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x;

%%================================================================
%% Step 4b: Check your implementation of PCA whitening
%  Check your implementation of PCA whitening with and without regularisation.
%  PCA whitening without regularisation results in a covariance matrix
%  that is equal to the identity matrix. PCA whitening with regularisation
%  results in a covariance matrix with diagonal entries starting close to
%  1 and gradually becoming smaller. We will verify these properties here.
%  Write code to compute the covariance matrix, covar.
%
%  Without regularisation (set epsilon to 0 or close to 0),
%  when visualised as an image, you should see a red line across the
%  diagonal (one entries) against a blue background (zero entries).
%  With regularisation, you should see a red line that slowly turns
%  blue across the diagonal, corresponding to the one entries slowly
%  becoming smaller.

% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(xPCAWhite, 1));
covar = xPCAWhite*xPCAWhite'*(1./size(xPCAWhite,2));

% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 5: Implement ZCA whitening
%  Now implement ZCA whitening to produce the matrix xZCAWhite.
%  Visualise the data and compare it to the raw data. You should observe
%  that whitening results in, among other things, enhanced edges.

xZCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
xZCAWhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;

% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));