MATLAB Example: PCA (Principal Component Analysis) Explained



Author: 凱魯嘎吉 - 博客園 (cnblogs) http://www.cnblogs.com/kailugaji/

1. Principal Component Analysis

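The original post presents the theory of this section as figures. As a stand-in, here is a minimal sketch of the standard formulation (the notation below is mine, not the post's): given a data matrix $X \in \mathbb{R}^{n \times p}$ whose rows are observations, PCA centers the columns to obtain $\tilde{X}$ and diagonalizes the sample covariance matrix

$$C = \frac{1}{n-1}\tilde{X}^{\top}\tilde{X}, \qquad C\,w_i = \lambda_i w_i, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0.$$

The scores are the projections $\tilde{X} W_k$ onto the leading $k$ eigenvectors $W_k = [w_1,\dots,w_k]$, and the fraction of variance retained is $\sum_{i=1}^{k}\lambda_i \big/ \sum_{i=1}^{p}\lambda_i$.
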
2. MATLAB Explanation

For details, see: Principal component analysis of raw data - MathWorks

[coeff,score,latent,tsquared,explained,mu] = pca(X)

coeff = pca(X) returns the principal component coefficients, also known as loadings, for the n-by-p data matrix X. Rows of X correspond to observations and columns correspond to variables.

The coefficient matrix is p-by-p.

Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance.

By default, pca centers the data and uses the singular value decomposition (SVD) algorithm.
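
As a quick illustration of that default behavior (a sketch with made-up data, not from the original post), the SVD of the centered data reproduces pca's loadings and variances:

rng(0); X = rand(10,4);                 % hypothetical toy data
Xc = bsxfun(@minus, X, mean(X,1));      % pca centers the data like this by default
[~,S,V] = svd(Xc,'econ');               % columns of V match coeff up to sign
lat = diag(S).^2/(size(X,1)-1);         % equals latent returned by pca(X)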

coeff = pca(X,Name,Value) returns any of the output arguments in the previous syntaxes using additional options for computation and handling of special data types, specified by one or more Name,Value pair arguments.

For example, you can specify the number of principal components pca returns or an algorithm other than SVD to use.
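
For instance (illustrative calls; 'NumComponents' and 'Algorithm' are documented Name-Value pairs of pca):

coeff = pca(X,'NumComponents',2);       % return only the first two components
coeff = pca(X,'Algorithm','eig');       % eigendecomposition of the covariance matrix instead of SVD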

[coeff,score,latent] = pca(___) also returns the principal component scores in score and the principal component variances in latent.

You can use any of the input arguments in the previous syntaxes.

Principal component scores are the representations of X in the principal component space. Rows of score correspond to observations, and columns correspond to components.

The principal component variances are the eigenvalues of the covariance matrix of X.

[coeff,score,latent,tsquared] = pca(___) also returns the Hotelling's T-squared statistic for each observation in X.

[coeff,score,latent,tsquared,explained,mu] = pca(___) also returns explained, the percentage of the total variance explained by each principal component, and mu, the estimated mean of each variable in X.

coeff: the matrix formed by all the eigenvectors of the covariance matrix of X, i.e. the transformation (projection) matrix. Each column of coeff is the eigenvector associated with one eigenvalue, and the columns are arranged so that the corresponding eigenvalues are in descending order.

score: the projection of the original data onto the principal component directions. Note that it is the centered data, not the raw data, that is projected.

latent: a column vector of the principal component variances, i.e. the eigenvalues associated with the eigenvectors, sorted in descending order.

tsquared: Hotelling's T-squared statistic for each observation in X. It is the sum of squares of each observation's standardized scores, returned as a column vector.

explained: the percentage of the total variance explained by each principal component, i.e. each component's contribution rate; explained = 100*latent/sum(latent).

mu: the estimated mean of each variable in X.
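
These relationships can be verified directly on a built-in dataset (a small check added here for illustration; it is not part of the original post):

load hald                                          % 13-by-4 ingredients matrix shipped with MATLAB
[coeff,score,latent,tsquared,explained,mu] = pca(ingredients);
Xc   = bsxfun(@minus, ingredients, mu);            % center with the returned mean
err1 = norm(Xc*coeff - score);                     % ~0: scores = centered data projected onto coeff
err2 = norm(sort(eig(cov(ingredients)),'descend') - latent);  % ~0: latent = eigenvalues of the covariance matrix
err3 = norm(100*latent/sum(latent) - explained);   % ~0: explained = latent as percentages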

3. MATLAB Code

3.1 Method 1: specify the dimension k of the reduced space

function [data_PCA, COEFF, sum_explained]=pca_demo_1(data,k)
% k: number of leading principal components to keep
data=zscore(data);  % standardize the data (zero mean, unit variance per column)
[COEFF,SCORE,latent,tsquared,explained,mu]=pca(data);
latent1=100*latent/sum(latent); % rescale the eigenvalues so they sum to 100, which makes the contribution rates easy to read
data= bsxfun(@minus,data,mean(data,1)); % center the data (already zero-mean after zscore)
data_PCA=data*COEFF(:,1:k);     % project onto the first k principal components
pareto(latent1); % MATLAB Pareto chart; pareto only plots the first 95% of the cumulative distribution, so some elements of the input are not shown
xlabel('Principal Component');
ylabel('Variance Explained (%)');
% the line in the figure shows the cumulative variance explained
print(gcf,'-dpng','PCA.png');
sum_explained=sum(explained(1:k));
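
Note that zscore already makes each column zero-mean, so the bsxfun re-centering is a no-op and data_PCA coincides with the scores pca returns; a check that could be appended at the end of pca_demo_1 (illustrative only):

assert(norm(data_PCA - SCORE(:,1:k)) < 1e-10)   % manual projection equals pca's own scores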

3.2 Method 2: specify the contribution-rate threshold percent_threshold

function [data_PCA, COEFF, sum_explained, n]=pca_demo_2(data)
% keep enough principal components to reach a cumulative contribution of percent_threshold percent
percent_threshold=95;   % percentage threshold that decides how many principal components to keep
data=zscore(data);      % standardize the data (zero mean, unit variance per column)
[COEFF,SCORE,latent,tsquared,explained,mu]=pca(data);
latent1=100*latent/sum(latent); % rescale the eigenvalues so they sum to 100, which makes the contribution rates easy to read
A=length(latent1);
percents=0;                     % cumulative percentage
for n=1:A
    percents=percents+latent1(n);
    if percents>percent_threshold
        break;
    end
end
data= bsxfun(@minus,data,mean(data,1)); % center the data (already zero-mean after zscore)
data_PCA=data*COEFF(:,1:n);     % project onto the first n principal components
pareto(latent1); % MATLAB Pareto chart; pareto only plots the first 95% of the cumulative distribution, so some elements of the input are not shown
xlabel('Principal Component');
ylabel('Variance Explained (%)');
% the line in the figure shows the cumulative variance explained
print(gcf,'-dpng','PCA.png');
sum_explained=sum(explained(1:n));
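
The search loop above can also be written in one line with cumsum and find, a common MATLAB idiom (an equivalent sketch, not from the original post):

n = find(cumsum(latent1) > percent_threshold, 1, 'first');   % smallest n whose cumulative contribution exceeds the threshold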

4. Results

The data come from the hald dataset that ships with MATLAB (ingredients is a 13-by-4 matrix of cement ingredient measurements).

>> load hald
>> [data_PCA, COEFF, sum_explained]=pca_demo_1(ingredients,2)

data_PCA =

  -1.467237802258083  -1.903035708425560
  -2.135828746398875  -0.238353702721984
   1.129870473833422  -0.183877154192583
  -0.659895489750766  -1.576774209965747
   0.358764556470351  -0.483537878558994
   0.966639639692207  -0.169944028103651
   0.930705117077330   2.134816511997477
  -2.232137996884836   0.691670682875924
  -0.351515595975561   1.432245069443404
   1.662543014130206  -1.828096643220118
  -1.640179952926685   1.295112751426928
   1.692594091826333   0.392248821530480
   1.745678691164958   0.437525487914425


COEFF =

   0.475955172748970  -0.508979384806410   0.675500187964285   0.241052184051094
   0.563870242191994   0.413931487136985  -0.314420442819292   0.641756074427213
  -0.394066533909303   0.604969078471439   0.637691091806566   0.268466110294533
  -0.547931191260863  -0.451235109330016  -0.195420962611708   0.676734019481284


sum_explained =

  95.294252628439153

>> [data_PCA, COEFF, sum_explained, n]=pca_demo_2(ingredients)

data_PCA =

  -1.467237802258083  -1.903035708425560
  -2.135828746398875  -0.238353702721984
   1.129870473833422  -0.183877154192583
  -0.659895489750766  -1.576774209965747
   0.358764556470351  -0.483537878558994
   0.966639639692207  -0.169944028103651
   0.930705117077330   2.134816511997477
  -2.232137996884836   0.691670682875924
  -0.351515595975561   1.432245069443404
   1.662543014130206  -1.828096643220118
  -1.640179952926685   1.295112751426928
   1.692594091826333   0.392248821530480
   1.745678691164958   0.437525487914425


COEFF =

   0.475955172748970  -0.508979384806410   0.675500187964285   0.241052184051094
   0.563870242191994   0.413931487136985  -0.314420442819292   0.641756074427213
  -0.394066533909303   0.604969078471439   0.637691091806566   0.268466110294533
  -0.547931191260863  -0.451235109330016  -0.195420962611708   0.676734019481284


sum_explained =

  95.294252628439153


n =

     2


