學生身體4 項指標的主成份分析
excel數據
學生序號 | x1身高 | x2體重 | x3胸圍 | x4坐高 |
1 | 148 | 41 | 72 | 78 |
2 | 139 | 34 | 71 | 76 |
3 | 160 | 49 | 77 | 86 |
4 | 149 | 36 | 67 | 79 |
5 | 159 | 45 | 80 | 86 |
6 | 142 | 31 | 66 | 76 |
7 | 153 | 43 | 76 | 83 |
8 | 150 | 43 | 77 | 79 |
9 | 151 | 42 | 77 | 80 |
10 | 139 | 31 | 68 | 74 |
11 | 140 | 29 | 64 | 74 |
12 | 161 | 47 | 78 | 84 |
13 | 158 | 49 | 78 | 83 |
14 | 140 | 33 | 67 | 77 |
15 | 137 | 31 | 66 | 73 |
16 | 152 | 35 | 73 | 79 |
17 | 149 | 47 | 82 | 79 |
18 | 145 | 35 | 70 | 77 |
19 | 160 | 47 | 74 | 87 |
20 | 156 | 44 | 78 | 85 |
21 | 151 | 42 | 73 | 82 |
22 | 147 | 38 | 73 | 78 |
23 | 157 | 39 | 68 | 80 |
24 | 147 | 30 | 65 | 75 |
25 | 157 | 48 | 80 | 88 |
26 | 151 | 36 | 74 | 80 |
27 | 144 | 36 | 68 | 76 |
28 | 141 | 30 | 67 | 76 |
29 | 139 | 32 | 68 | 73 |
30 | 148 | 38 | 70 | 78 |
復制數據到剪切板
數據讀入R軟件
> d=read.table("clipboard",header=T)
> d
x1身高 x2體重 x3胸圍 x4坐高
1 148 41 72 78
2 139 34 71 76
3 160 49 77 86
4 149 36 67 79
5 159 45 80 86
6 142 31 66 76
7 153 43 76 83
8 150 43 77 79
9 151 42 77 80
10 139 31 68 74
11 140 29 64 74
12 161 47 78 84
13 158 49 78 83
14 140 33 67 77
15 137 31 66 73
16 152 35 73 79
17 149 47 82 79
18 145 35 70 77
19 160 47 74 87
20 156 44 78 85
21 151 42 73 82
22 147 38 73 78
23 157 39 68 80
24 147 30 65 75
25 157 48 80 88
26 151 36 74 80
27 144 36 68 76
28 141 30 67 76
29 139 32 68 73
30 148 38 70 78
原始數據標准化
> sd=scale(d)
標准化數據展示 復制到剪切板
> sd
x1身高 x2體重 x3胸圍 x4坐高
[1,] -0.1366952 0.35602486 -0.04530114 -0.31999814
[2,] -1.3669516 -0.72752905 -0.23944887 -0.78828809
[3,] 1.5036468 1.59437218 0.92543751 1.55316168
[4,] 0.0000000 -0.41794222 -1.01603978 -0.08585316
[5,] 1.3669516 0.97519852 1.50788070 1.55316168
[6,] -0.9568661 -1.19190930 -1.21018751 -0.78828809
[7,] 0.5467806 0.66561169 0.73128978 0.85072675
[8,] 0.1366952 0.66561169 0.92543751 -0.08585316
[9,] 0.2733903 0.51081827 0.92543751 0.14829182
[10,] -1.3669516 -1.19190930 -0.82189205 -1.25657805
[11,] -1.2302564 -1.50149613 -1.59848297 -1.25657805
[12,] 1.6403419 1.28478535 1.11958524 1.08487173
[13,] 1.2302564 1.59437218 1.11958524 0.85072675
[14,] -1.2302564 -0.88232247 -1.01603978 -0.55414311
[15,] -1.6403419 -1.19190930 -1.21018751 -1.49072302
[16,] 0.4100855 -0.57273564 0.14884659 -0.08585316
[17,] 0.0000000 1.28478535 1.89617616 -0.08585316
[18,] -0.5467806 -0.57273564 -0.43359660 -0.55414311
[19,] 1.5036468 1.28478535 0.34299432 1.78730666
[20,] 0.9568661 0.82040510 1.11958524 1.31901671
[21,] 0.2733903 0.51081827 0.14884659 0.61658177
[22,] -0.2733903 -0.10835539 0.14884659 -0.31999814
[23,] 1.0935613 0.04643802 -0.82189205 0.14829182
[24,] -0.2733903 -1.34670271 -1.40433524 -1.02243307
[25,] 1.0935613 1.43957876 1.50788070 2.02145164
[26,] 0.2733903 -0.41794222 0.34299432 0.14829182
[27,] -0.6834758 -0.41794222 -0.82189205 -0.78828809
[28,] -1.0935613 -1.34670271 -1.01603978 -0.78828809
[29,] -1.3669516 -1.03711588 -0.82189205 -1.49072302
[30,] -0.1366952 -0.10835539 -0.43359660 -0.31999814
attr(,"scaled:center")
x1身高 x2體重 x3胸圍 x4坐高
149.00000 38.70000 72.23333 79.36667
attr(,"scaled:scale")
x1身高 x2體重 x3胸圍 x4坐高
7.315548 6.460223 5.150717 4.270858
讀取標准化數據
> d=read.table("clipboard",header=T)
主成分分析
> pca=princomp(d,cor=T)
碎石圖
> screeplot(pca,type="line",main="碎石圖",lwd=2)
>
主成分1貢獻率較高
求相關矩陣
> dcor=cor(d)
輸出
> dcor
x1身高 x2體重 x3胸圍 x4坐高
x1身高 1.0000000 0.8631621 0.7321119 0.9204624
x2體重 0.8631621 1.0000000 0.8965058 0.8827313
x3胸圍 0.7321119 0.8965058 1.0000000 0.7828827
x4坐高 0.9204624 0.8827313 0.7828827 1.0000000
求相關矩陣的特征向量 特征值
> deig=eigen(dcor)
輸出
>deig
$values
[1] 3.54109800 0.31338316 0.07940895 0.06610989
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.4969661 0.5432128 -0.4496271 0.5057471
[2,] -0.5145705 -0.2102455 -0.4623300 -0.6908436
[3,] -0.4809007 -0.7246214 0.1751765 0.4614884
[4,] -0.5069285 0.3682941 0.7439083 -0.2323433
輸出特征值
> deig$values
[1] 3.54109800 0.31338316 0.07940895 0.06610989
> sumeigv=sum(deig$values)
> sumeigv
[1] 4
求前2個主成分的累積方差貢獻率
> sum(deig$value[1:2])/4
[1] 0.9636203
> sum(deig$value[1:1])/4
[1] 0.8852745
第一主成份有88.53%的方差貢獻率,前兩個主成份累計貢獻率更高達96.36%,故只需前兩個主成份就能很好地概括這組數據.
輸出前兩個主成分的載荷系數(特征向量)
> pca$loadings[,1:2]
Comp.1 Comp.2
x1身高 -0.4969661 0.5432128
x2體重 -0.5145705 -0.2102455
x3胸圍 -0.4809007 -0.7246214
x4坐高 -0.5069285 0.3682941
-----------------------------------------
z1=-0.4969661 x1+-0.5145705 x2 +-0.4809007x3+-0.5069285x4
z2=0.5432128 x1+-0.2102455 x2 +-0.7246214x3+0.3682941x4
z= 3.54109800/4 z1 + 0.31338316/4 z2=0.8852745 z1 +0.07834579 Z2
=0.8852745(-0.4969661 x1+-0.5145705 x2 +-0.4809007x3+-0.5069285x4)
+0.07834579 (0.5432128 x1+-0.2102455 x2 +-0.7246214x3+0.3682941x4)
-----------------------------------------
計算主成分C1和C2的系數b1 和b2:
> deig$values[1]/4;deig$values[2]/4
[1] 0.8852745
[1] 0.07834579
綜合得分函數C 為:
C=(b1*C1+b2*C2)/(b1+b2)=0.9187*C1+0.0813*C2
輸出前2 個主成分的得分
> s=pca$scores[,1:2]
計算綜合得分
> c=s[1:30,1]*0.918696+s[1:30,2]*0.0813
> s[1:30,1]
[1] 0.06990950 1.59526340 -2.84793151 0.75996988 -2.73966777 2.10583168
[7] -1.42105591 -0.82583977 -0.93464402 2.36463820 2.83741916 -2.60851224
[13] -2.44253342 1.86630669 2.81347421 0.06392983 -1.55561022 1.07392251
[19] -2.52174212 -2.14072377 -0.79624422 0.28708321 -0.25151075 2.05706032
[25] -3.08596855 -0.16367555 1.37265053 2.16097778 2.40434827 0.50287468
輸出綜合得分信息
> cbind(s,c)
Comp.1 Comp.2 c
[1,] 0.06990950 -0.23813701 0.04486504
[2,] 1.59526340 -0.71847399 1.40715017
[3,] -2.84793151 0.38956679 -2.58471151
[4,] 0.75996988 0.80604335 0.76371262
[5,] -2.73966777 0.01718087 -2.51552502
[6,] 2.10583168 0.32284393 1.96086635
[7,] -1.42105591 -0.06053165 -1.31043961
[8,] -0.82583977 -0.78102576 -0.82219309
[9,] -0.93464402 -0.58469242 -0.90618922
[10,] 2.36463820 -0.36532199 2.14268298
[11,] 2.83741916 0.34875841 2.63507969
[12,] -2.60851224 0.21278728 -2.37913015
[13,] -2.44253342 -0.16769496 -2.25757928
[14,] 1.86630669 0.05021384 1.71865087
[15,] 2.81347421 -0.31790107 2.55888214
[16,] 0.06392983 0.20718448 0.07557617
[17,] -1.55561022 -1.70439674 -1.56770034
[18,] 1.07392251 -0.06763418 0.98110965
[19,] -2.52174212 0.97274301 -2.23763039
[20,] -2.14072377 0.02217881 -1.96487123
[21,] -0.79624422 0.16307887 -0.71824807
[22,] 0.28708321 -0.35744666 0.23468178
[23,] -0.25151075 1.25555188 -0.12898555
[24,] 2.05706032 0.78894494 1.95395431
[25,] -3.08596855 -0.05775318 -2.83976229
[26,] -0.16367555 0.04317932 -0.14685759
[27,] 1.37265053 0.02220972 1.26285420
[28,] 2.16097778 0.13733233 1.99644676
[29,] 2.40434827 -0.48613137 2.16934265
[30,] 0.50287468 0.14734317 0.47396795
>
排序
[11,] | 2.83741916 | 0.34875841 | 2.63507969 |
[15,] | 2.81347421 | -0.31790107 | 2.55888214 |
[29,] | 2.40434827 | -0.48613137 | 2.16934265 |
[10,] | 2.3646382 | -0.36532199 | 2.14268298 |
[28,] | 2.16097778 | 0.13733233 | 1.99644676 |
[6,] | 2.10583168 | 0.32284393 | 1.96086635 |
[24,] | 2.05706032 | 0.78894494 | 1.95395431 |
[14,] | 1.86630669 | 0.05021384 | 1.71865087 |
[2,] | 1.5952634 | -0.71847399 | 1.40715017 |
[27,] | 1.37265053 | 0.02220972 | 1.2628542 |
[18,] | 1.07392251 | -0.06763418 | 0.98110965 |
[4,] | 0.75996988 | 0.80604335 | 0.76371262 |
[30,] | 0.50287468 | 0.14734317 | 0.47396795 |
[22,] | 0.28708321 | -0.35744666 | 0.23468178 |
[16,] | 0.06392983 | 0.20718448 | 0.07557617 |
[1,] | 0.0699095 | -0.23813701 | 0.04486504 |
[23,] | -0.25151075 | 1.25555188 | -0.12898555 |
[26,] | -0.16367555 | 0.04317932 | -0.14685759 |
[21,] | -0.79624422 | 0.16307887 | -0.71824807 |
[8,] | -0.82583977 | -0.78102576 | -0.82219309 |
[9,] | -0.93464402 | -0.58469242 | -0.90618922 |
[7,] | -1.42105591 | -0.06053165 | -1.31043961 |
[17,] | -1.55561022 | -1.70439674 | -1.56770034 |
[20,] | -2.14072377 | 0.02217881 | -1.96487123 |
[19,] | -2.52174212 | 0.97274301 | -2.23763039 |
[13,] | -2.44253342 | -0.16769496 | -2.25757928 |
[12,] | -2.60851224 | 0.21278728 | -2.37913015 |
[5,] | -2.73966777 | 0.01718087 | -2.51552502 |
[3,] | -2.84793151 | 0.38956679 | -2.58471151 |
[25,] | -3.08596855 | -0.05775318 | -2.83976229 |