R語言PCA分析


學生身體4 項指標的主成份分析 

excel數據

 

學生序號 x1身高 x2體重 x3胸圍 x4坐高
1 148 41 72 78
2 139 34 71 76
3 160 49 77 86
4 149 36 67 79
5 159 45 80 86
6 142 31 66 76
7 153 43 76 83
8 150 43 77 79
9 151 42 77 80
10 139 31 68 74
11 140 29 64 74
12 161 47 78 84
13 158 49 78 83
14 140 33 67 77
15 137 31 66 73
16 152 35 73 79
17 149 47 82 79
18 145 35 70 77
19 160 47 74 87
20 156 44 78 85
21 151 42 73 82
22 147 38 73 78
23 157 39 68 80
24 147 30 65 75
25 157 48 80 88
26 151 36 74 80
27 144 36 68 76
28 141 30 67 76
29 139 32 68 73
30 148 38 70 78

復制數據到剪切板

數據讀入R軟件

 

> d=read.table("clipboard",header=T)
> d
x1身高 x2體重 x3胸圍 x4坐高
1 148 41 72 78
2 139 34 71 76
3 160 49 77 86
4 149 36 67 79
5 159 45 80 86
6 142 31 66 76
7 153 43 76 83
8 150 43 77 79
9 151 42 77 80
10 139 31 68 74
11 140 29 64 74
12 161 47 78 84
13 158 49 78 83
14 140 33 67 77
15 137 31 66 73
16 152 35 73 79
17 149 47 82 79
18 145 35 70 77
19 160 47 74 87
20 156 44 78 85
21 151 42 73 82
22 147 38 73 78
23 157 39 68 80
24 147 30 65 75
25 157 48 80 88
26 151 36 74 80
27 144 36 68 76
28 141 30 67 76
29 139 32 68 73
30 148 38 70 78

原始數據標准化
> sd=scale(d)

標准化數據展示 復制到剪切板
> sd
x1身高 x2體重 x3胸圍 x4坐高
[1,] -0.1366952 0.35602486 -0.04530114 -0.31999814
[2,] -1.3669516 -0.72752905 -0.23944887 -0.78828809
[3,] 1.5036468 1.59437218 0.92543751 1.55316168
[4,] 0.0000000 -0.41794222 -1.01603978 -0.08585316
[5,] 1.3669516 0.97519852 1.50788070 1.55316168
[6,] -0.9568661 -1.19190930 -1.21018751 -0.78828809
[7,] 0.5467806 0.66561169 0.73128978 0.85072675
[8,] 0.1366952 0.66561169 0.92543751 -0.08585316
[9,] 0.2733903 0.51081827 0.92543751 0.14829182
[10,] -1.3669516 -1.19190930 -0.82189205 -1.25657805
[11,] -1.2302564 -1.50149613 -1.59848297 -1.25657805
[12,] 1.6403419 1.28478535 1.11958524 1.08487173
[13,] 1.2302564 1.59437218 1.11958524 0.85072675
[14,] -1.2302564 -0.88232247 -1.01603978 -0.55414311
[15,] -1.6403419 -1.19190930 -1.21018751 -1.49072302
[16,] 0.4100855 -0.57273564 0.14884659 -0.08585316
[17,] 0.0000000 1.28478535 1.89617616 -0.08585316
[18,] -0.5467806 -0.57273564 -0.43359660 -0.55414311
[19,] 1.5036468 1.28478535 0.34299432 1.78730666
[20,] 0.9568661 0.82040510 1.11958524 1.31901671
[21,] 0.2733903 0.51081827 0.14884659 0.61658177
[22,] -0.2733903 -0.10835539 0.14884659 -0.31999814
[23,] 1.0935613 0.04643802 -0.82189205 0.14829182
[24,] -0.2733903 -1.34670271 -1.40433524 -1.02243307
[25,] 1.0935613 1.43957876 1.50788070 2.02145164
[26,] 0.2733903 -0.41794222 0.34299432 0.14829182
[27,] -0.6834758 -0.41794222 -0.82189205 -0.78828809
[28,] -1.0935613 -1.34670271 -1.01603978 -0.78828809
[29,] -1.3669516 -1.03711588 -0.82189205 -1.49072302
[30,] -0.1366952 -0.10835539 -0.43359660 -0.31999814
attr(,"scaled:center")
x1身高 x2體重 x3胸圍 x4坐高
149.00000 38.70000 72.23333 79.36667
attr(,"scaled:scale")
x1身高 x2體重 x3胸圍 x4坐高
7.315548 6.460223 5.150717 4.270858

讀取標准化數據
> d=read.table("clipboard",header=T)

主成分分析
> pca=princomp(d,cor=T)

碎石圖
> screeplot(pca,type="line",main="碎石圖",lwd=2)
>

主成分1貢獻率較高

求相關矩陣

> dcor=cor(d)

輸出

> dcor
               x1身高       x2體重       x3胸圍       x4坐高
x1身高 1.0000000 0.8631621 0.7321119 0.9204624
x2體重 0.8631621 1.0000000 0.8965058 0.8827313
x3胸圍 0.7321119 0.8965058 1.0000000 0.7828827
x4坐高 0.9204624 0.8827313 0.7828827 1.0000000

相關矩陣的特征向量 特征值
> deig=eigen(dcor)

輸出

>deig
$values
[1] 3.54109800 0.31338316 0.07940895 0.06610989

 

$vectors
[,1] [,2] [,3] [,4]
[1,] -0.4969661 0.5432128 -0.4496271 0.5057471
[2,] -0.5145705 -0.2102455 -0.4623300 -0.6908436
[3,] -0.4809007 -0.7246214 0.1751765 0.4614884
[4,] -0.5069285 0.3682941 0.7439083 -0.2323433

 

輸出特征值
> deig$values
[1] 3.54109800 0.31338316 0.07940895 0.06610989


> sumeigv=sum(deig$values)
> sumeigv
[1] 4

求前2個主成分的累積方差貢獻率
> sum(deig$value[1:2])/4
[1] 0.9636203
> sum(deig$value[1:1])/4
[1] 0.8852745

第一主成份有88.53%的方差貢獻率,前兩個主成份累計貢獻率更高達96.36%,故只需前兩個主成份就能很好地概括這組數據.

輸出前兩個主成分的載荷系數(特征向量)
> pca$loadings[,1:2]
              Comp.1     Comp.2
x1身高 -0.4969661 0.5432128
x2體重 -0.5145705 -0.2102455
x3胸圍 -0.4809007 -0.7246214
x4坐高 -0.5069285 0.3682941

-----------------------------------------

z1=-0.4969661 x1+-0.5145705 x2 +-0.4809007x3+-0.5069285x4

z2=0.5432128 x1+-0.2102455 x2 +-0.7246214x3+0.3682941x4

z= 3.54109800/4 z1 + 0.31338316/4 z2=0.8852745 z1 +0.07834579 Z2

=0.8852745(-0.4969661 x1+-0.5145705 x2 +-0.4809007x3+-0.5069285x4)

+0.07834579 (0.5432128 x1+-0.2102455 x2 +-0.7246214x3+0.3682941x4)

 

-----------------------------------------

計算主成分C1和C2的系數b1 和b2:
> deig$values[1]/4;deig$values[2]/4
[1] 0.8852745
[1] 0.07834579

綜合得分函數C 為:
C=(b1*C1+b2*C2)/(b1+b2)=0.9187*C1+0.0813*C2

輸出前2 個主成分的得分
> s=pca$scores[,1:2]

計算綜合得分
> c=s[1:30,1]*0.918696+s[1:30,2]*0.0813

> s[1:30,1]
[1] 0.06990950 1.59526340 -2.84793151 0.75996988 -2.73966777 2.10583168
[7] -1.42105591 -0.82583977 -0.93464402 2.36463820 2.83741916 -2.60851224
[13] -2.44253342 1.86630669 2.81347421 0.06392983 -1.55561022 1.07392251
[19] -2.52174212 -2.14072377 -0.79624422 0.28708321 -0.25151075 2.05706032
[25] -3.08596855 -0.16367555 1.37265053 2.16097778 2.40434827 0.50287468

輸出綜合得分信息
> cbind(s,c)
          Comp.1       Comp.2           c
[1,] 0.06990950 -0.23813701 0.04486504
[2,] 1.59526340 -0.71847399 1.40715017
[3,] -2.84793151 0.38956679 -2.58471151
[4,] 0.75996988 0.80604335 0.76371262
[5,] -2.73966777 0.01718087 -2.51552502
[6,] 2.10583168 0.32284393 1.96086635
[7,] -1.42105591 -0.06053165 -1.31043961
[8,] -0.82583977 -0.78102576 -0.82219309
[9,] -0.93464402 -0.58469242 -0.90618922
[10,] 2.36463820 -0.36532199 2.14268298
[11,] 2.83741916 0.34875841 2.63507969
[12,] -2.60851224 0.21278728 -2.37913015
[13,] -2.44253342 -0.16769496 -2.25757928
[14,] 1.86630669 0.05021384 1.71865087
[15,] 2.81347421 -0.31790107 2.55888214
[16,] 0.06392983 0.20718448 0.07557617
[17,] -1.55561022 -1.70439674 -1.56770034
[18,] 1.07392251 -0.06763418 0.98110965
[19,] -2.52174212 0.97274301 -2.23763039
[20,] -2.14072377 0.02217881 -1.96487123
[21,] -0.79624422 0.16307887 -0.71824807
[22,] 0.28708321 -0.35744666 0.23468178
[23,] -0.25151075 1.25555188 -0.12898555
[24,] 2.05706032 0.78894494 1.95395431
[25,] -3.08596855 -0.05775318 -2.83976229
[26,] -0.16367555 0.04317932 -0.14685759
[27,] 1.37265053 0.02220972 1.26285420
[28,] 2.16097778 0.13733233 1.99644676
[29,] 2.40434827 -0.48613137 2.16934265
[30,] 0.50287468 0.14734317 0.47396795
>

 排序

[11,] 2.83741916 0.34875841 2.63507969
[15,] 2.81347421 -0.31790107 2.55888214
[29,] 2.40434827 -0.48613137 2.16934265
[10,] 2.3646382 -0.36532199 2.14268298
[28,] 2.16097778 0.13733233 1.99644676
[6,] 2.10583168 0.32284393 1.96086635
[24,] 2.05706032 0.78894494 1.95395431
[14,] 1.86630669 0.05021384 1.71865087
[2,] 1.5952634 -0.71847399 1.40715017
[27,] 1.37265053 0.02220972 1.2628542
[18,] 1.07392251 -0.06763418 0.98110965
[4,] 0.75996988 0.80604335 0.76371262
[30,] 0.50287468 0.14734317 0.47396795
[22,] 0.28708321 -0.35744666 0.23468178
[16,] 0.06392983 0.20718448 0.07557617
[1,] 0.0699095 -0.23813701 0.04486504
[23,] -0.25151075 1.25555188 -0.12898555
[26,] -0.16367555 0.04317932 -0.14685759
[21,] -0.79624422 0.16307887 -0.71824807
[8,] -0.82583977 -0.78102576 -0.82219309
[9,] -0.93464402 -0.58469242 -0.90618922
[7,] -1.42105591 -0.06053165 -1.31043961
[17,] -1.55561022 -1.70439674 -1.56770034
[20,] -2.14072377 0.02217881 -1.96487123
[19,] -2.52174212 0.97274301 -2.23763039
[13,] -2.44253342 -0.16769496 -2.25757928
[12,] -2.60851224 0.21278728 -2.37913015
[5,] -2.73966777 0.01718087 -2.51552502
[3,] -2.84793151 0.38956679 -2.58471151
[25,] -3.08596855 -0.05775318 -2.83976229


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM