1. Check whether the data are suitable for principal component analysis; standardize the variables
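The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is an important indicator of how strongly the variables are correlated. It is obtained by comparing the simple correlation coefficients between pairs of variables with their partial correlation coefficients.
KMO lies between 0 and 1. The higher the KMO, the more the variables have in common. If the partial correlations are high relative to the simple correlations, KMO is low and principal component analysis will not reduce the data well.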
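For reference, the overall KMO statistic is usually written as below. The symbols are labels chosen here and do not appear in the original note: r_ij is the simple correlation between variables i and j, and p_ij is their partial correlation with all other variables held constant.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}}
```

When the partial correlations are small, the second sum in the denominator is small and KMO approaches 1. When they are as large as the simple correlations, KMO falls toward 0.5 or lower.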
According to Kaiser (1974), the usual criteria are:
0.00-0.49: unacceptable;
0.50-0.59: miserable;
0.60-0.69: mediocre;
0.70-0.79: middling;
0.80-0.89: meritorious;
0.90-1.00: marvelous.
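The bands above can be turned into a small lookup. This is only a sketch: the function name `kaiser_label` is made up for illustration and is not part of Stata or any library.

```python
def kaiser_label(kmo):
    """Return Kaiser's (1974) verbal label for a KMO value between 0 and 1."""
    if not 0.0 <= kmo <= 1.0:
        raise ValueError("KMO must lie between 0 and 1")
    # Lower bound of each band, checked from the highest band down.
    bands = [
        (0.90, "marvelous"),
        (0.80, "meritorious"),
        (0.70, "middling"),
        (0.60, "mediocre"),
        (0.50, "miserable"),
    ]
    for lower, label in bands:
        if kmo >= lower:
            return label
    return "unacceptable"


for value in (0.45, 0.65, 0.93):
    print(value, kaiser_label(value))
```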
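SMC is the squared multiple correlation between one variable and all the other variables, that is, the coefficient of determination (R-squared) from regressing that variable on all the others.
A higher SMC indicates a stronger linear relationship among the variables and more in common, so principal component analysis is more appropriate.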
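As a rough illustration of the definition, the SMC of variable i can be computed from the inverse of the correlation matrix as 1 - 1/(R^-1)_ii. The sketch below uses only the standard library. The helper names and the 3-variable matrix `R` are hypothetical and are not taken from the data in this note.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def smc(corr):
    """Squared multiple correlation of each variable with all the others."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(corr))]


# Hypothetical correlation matrix, for illustration only.
R = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.7],
    [0.6, 0.7, 1.0],
]
print([round(v, 4) for v in smc(R)])
```

With only two variables the SMC reduces to the squared correlation between them, which is a quick sanity check on the formula.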
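These are postestimation commands, so run them after pca (step 2) or factor:
. estat smc
. estat kmo
. estat anti
estat anti displays the anti-image correlation and covariance matrices, that is, the negatives of the partial correlations and partial covariances.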
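The results below show that the variables are strongly correlated (every SMC exceeds 0.89), so principal component analysis is appropriate.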
Squared multiple correlations of variables with all other variables

    -----------------------
        Variable |     smc
    -------------+---------
              x1 |  0.8923
              x2 |  0.9862
              y1 |  0.9657
              y2 |  0.9897
              y3 |  0.9910
              y4 |  0.9898
              y5 |  0.9769
              y6 |  0.9859
              y7 |  0.9735
    -----------------------
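Standardize the variables (std() rescales a variable to mean 0 and standard deviation 1). pca analyzes the correlation matrix by default, so its results do not depend on this step.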
. egen z1=std(x1)
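A minimal sketch of what this standardization does, assuming the sample standard deviation (n - 1 divisor) is used. The data values are hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 divisor
    return [(v - mean) / sd for v in values]


x1 = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations
z1 = standardize(x1)
print([round(z, 4) for z in z1])
```

The standardized variable has mean 0 and standard deviation 1, which puts variables measured in different units on the same scale.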
2. Run the principal component analysis on the variables
. pca x1 x2 y1 y2 y3 y4 y5 y6 y7
. pca x1 x2 y1 y2 y3 y4 y5 y6 y7, comp(1)
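This produces the two tables below. The columns of the first table are the eigenvalue, the difference (the gap between that eigenvalue and the next one), the proportion of variance explained, and the cumulative proportion of variance explained.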
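*The second table is the factor loading matrix (Stata labels it the eigenvector matrix; each column has unit length). It relates to the component matrix and the component score coefficient matrix in SPSS as follows:
component matrix / sqrt(corresponding eigenvalue) = factor loading matrix = sqrt(corresponding eigenvalue) * component score coefficient matrix
*The larger a coefficient is in absolute value, the better the principal component represents that variable.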
Principal components/correlation                 Number of obs    =         19
                                                 Number of comp.  =          9
                                                 Trace            =          9
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      7.57604      6.59246             0.8418       0.8418
           Comp2 |      .983579      .731224             0.1093       0.9511
           Comp3 |      .252355      .162221             0.0280       0.9791
           Comp4 |     .0901337     .0323568             0.0100       0.9891
           Comp5 |     .0577769     .0387149             0.0064       0.9955
           Comp6 |      .019062    .00931458             0.0021       0.9977
           Comp7 |    .00974741    .00259494             0.0011       0.9987
           Comp8 |    .00715247    .00299772             0.0008       0.9995
           Comp9 |    .00415475            .             0.0005       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    -------------------------------------------------------------------------------------------------------------
        Variable |    Comp1    Comp2    Comp3    Comp4    Comp5    Comp6    Comp7    Comp8    Comp9 | Unexplained
    -------------+----------------------------------------------------------------------------------+------------
              x1 |   0.1292   0.9388   0.1499   0.0240   0.0387   0.1398   0.2098   0.0776   0.0884 |           0
              x2 |   0.3485   0.2337  -0.2455   0.1139   0.1515  -0.4559  -0.6523  -0.2378  -0.1946 |           0
              y1 |   0.3482  -0.0578   0.4193   0.1836  -0.7127   0.1420  -0.2687   0.2227  -0.1264 |           0
              y2 |   0.3476  -0.1604   0.4115   0.3539   0.1732  -0.1441   0.2073  -0.4811   0.4834 |           0
              y3 |   0.3528  -0.1002   0.3289  -0.3145   0.3512   0.2787   0.1233  -0.2021  -0.6335 |           0
              y4 |   0.3566  -0.1297   0.1355  -0.1226   0.3995  -0.2039  -0.0372   0.7516   0.2350 |           0
              y5 |   0.3505  -0.0056  -0.2152  -0.7536  -0.3081  -0.0449   0.0658  -0.2047   0.3460 |           0
              y6 |   0.3523  -0.0477  -0.4099   0.2705  -0.2076  -0.3276   0.6130   0.0922  -0.3127 |           0
              y7 |   0.3482  -0.0761  -0.4809   0.2693   0.1291   0.7093  -0.1366   0.0146   0.1750 |           0
    -------------------------------------------------------------------------------------------------------------
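The columns of the eigenvalue table can be recomputed from the eigenvalues alone. The sketch below copies the eigenvalues from the output above and assumes the total variance equals the number of variables (9), which holds because the analysis is based on the correlation matrix.

```python
# Eigenvalues copied from the pca output above.
eigenvalues = [7.57604, 0.983579, 0.252355, 0.0901337, 0.0577769,
               0.019062, 0.00974741, 0.00715247, 0.00415475]
trace = len(eigenvalues)  # correlation matrix of 9 variables, so the trace is 9

# Difference: each eigenvalue minus the next one (undefined for the last).
differences = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
# Proportion: share of the total variance carried by each component.
proportions = [ev / trace for ev in eigenvalues]
# Cumulative: running sum of the proportions.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for i, ev in enumerate(eigenvalues):
    diff = f"{differences[i]:.6g}" if i < len(differences) else "."
    print(f"Comp{i + 1}  {ev:<12g} {diff:<12} {proportions[i]:.4f}  {cumulative[i]:.4f}")
```

The first component alone carries about 84% of the total variance, and the first two together about 95%.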
. estat loadings, cnorm(eigen)
This command produces the component matrix reported by SPSS.
Principal component loadings (unrotated)
    component normalization: sum of squares(column) = eigenvalue

    --------------------------------------------------------------------------------------------------------
                 |     Comp1     Comp2     Comp3     Comp4     Comp5     Comp6     Comp7     Comp8     Comp9
    -------------+------------------------------------------------------------------------------------------
              x1 |     .3556     .9311    .07533   .007206   .009293     .0193    .02071   .006566   .005701
              x2 |     .9591     .2318    -.1233    .03421    .03642   -.06295    -.0644   -.02011   -.01254
              y1 |     .9584   -.05736     .2106    .05512    -.1713     .0196   -.02653    .01884  -.008146
              y2 |     .9568     -.159     .2067     .1062    .04163    -.0199    .02047   -.04069    .03116
              y3 |     .9712   -.09934     .1652   -.09441    .08441    .03848    .01218   -.01709   -.04083
              y4 |     .9814    -.1286    .06808   -.03679    .09602   -.02815   -.00367    .06357    .01515
              y5 |     .9647  -.005542    -.1081    -.2262   -.07406  -.006196   .006492   -.01731     .0223
              y6 |     .9696   -.04732    -.2059    .08121   -.04991   -.04523    .06052   .007799   -.02015
              y7 |     .9584   -.07548    -.2416    .08084    .03102    .09793   -.01348   .001237    .01128
    --------------------------------------------------------------------------------------------------------
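As a quick numerical check of the relationship stated in step 2, multiplying the Comp1 eigenvector by the square root of its eigenvalue should reproduce the Comp1 column of this table. The numbers are copied from the outputs above, so small discrepancies come from rounding to four decimals.

```python
import math

eigenvalue_1 = 7.57604
variables = ["x1", "x2", "y1", "y2", "y3", "y4", "y5", "y6", "y7"]
# Comp1 column of the eigenvector table.
eigenvector_1 = [0.1292, 0.3485, 0.3482, 0.3476, 0.3528,
                 0.3566, 0.3505, 0.3523, 0.3482]
# Comp1 column of the loadings table from estat loadings, cnorm(eigen).
reported = [0.3556, 0.9591, 0.9584, 0.9568, 0.9712,
            0.9814, 0.9647, 0.9696, 0.9584]

# Rescale the unit-length eigenvector so its sum of squares equals the eigenvalue.
loadings = [v * math.sqrt(eigenvalue_1) for v in eigenvector_1]

for name, computed, shown in zip(variables, loadings, reported):
    print(f"{name}: computed {computed:.4f}, reported {shown:.4f}")
```

Under this normalization each loading is the correlation between the variable and the component, which is why x1 (0.36) stands apart from the other variables (about 0.96 or higher).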
3. Draw the scree plot
. screeplot
4. Draw the loading plot
. loadingplot
5. Factor analysis
. factor x1 x2 y1 y2 y3 y4 y5 y6 y7, pcf
(obs=19)

Factor analysis/correlation                      Number of obs    =         19
    Method: principal-component factors          Retained factors =          1
    Rotation: (unrotated)                        Number of params =          9

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      7.57604      6.59246            0.8418       0.8418
        Factor2  |      0.98358      0.73122            0.1093       0.9511
        Factor3  |      0.25235      0.16222            0.0280       0.9791
        Factor4  |      0.09013      0.03236            0.0100       0.9891
        Factor5  |      0.05778      0.03871            0.0064       0.9955
        Factor6  |      0.01906      0.00931            0.0021       0.9977
        Factor7  |      0.00975      0.00259            0.0011       0.9987
        Factor8  |      0.00715      0.00300            0.0008       0.9995
        Factor9  |      0.00415            .            0.0005       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(36) =  358.55 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    ---------------------------------------
        Variable |  Factor1 |   Uniqueness
    -------------+----------+--------------
              x1 |   0.3556 |      0.8736
              x2 |   0.9591 |      0.0801
              y1 |   0.9584 |      0.0816
              y2 |   0.9568 |      0.0845
              y3 |   0.9712 |      0.0568
              y4 |   0.9814 |      0.0368
              y5 |   0.9647 |      0.0693
              y6 |   0.9696 |      0.0599
              y7 |   0.9584 |      0.0815
    ---------------------------------------
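With a single retained factor, the Uniqueness column is one minus the squared loading. The sketch below checks this against the numbers in the output above. Differences in the fourth decimal come from the loadings being rounded.

```python
variables = ["x1", "x2", "y1", "y2", "y3", "y4", "y5", "y6", "y7"]
# Factor1 loadings and uniqueness values copied from the factor output above.
factor1 = [0.3556, 0.9591, 0.9584, 0.9568, 0.9712,
           0.9814, 0.9647, 0.9696, 0.9584]
reported_uniqueness = [0.8736, 0.0801, 0.0816, 0.0845, 0.0568,
                       0.0368, 0.0693, 0.0599, 0.0815]

# Communality is the variance explained by the retained factor;
# uniqueness is the share of variance left unexplained.
communality = [l * l for l in factor1]
uniqueness = [1.0 - c for c in communality]

for name, u, shown in zip(variables, uniqueness, reported_uniqueness):
    print(f"{name}: computed {u:.4f}, reported {shown:.4f}")
```

x1 has a uniqueness of about 0.87, so the single retained factor explains little of its variance, in line with its high loading on the second component in the pca output.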
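After factor, the predict command prints the scoring coefficients, which correspond to the component score coefficient matrix in SPSS. The factor scores are obtained by applying these coefficients to the standardized variables.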
. predict f1
(regression scoring assumed)

Scoring coefficients (method = regression)

    ------------------------
        Variable |  Factor1
    -------------+----------
              x1 |  0.04693
              x2 |  0.12660
              y1 |  0.12650
              y2 |  0.12630
              y3 |  0.12819
              y4 |  0.12954
              y5 |  0.12734
              y6 |  0.12798
              y7 |  0.12651
    ------------------------
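For principal-component factors, each scoring coefficient works out to the loading divided by the eigenvalue, which is the same as the eigenvector entry divided by the square root of the eigenvalue. The sketch below checks this against the output above. The `factor_score` helper and the z-scores passed to it are hypothetical and only show how a score would be assembled.

```python
eigenvalue_1 = 7.57604
# Factor1 loadings and scoring coefficients copied from the outputs above.
factor1 = [0.3556, 0.9591, 0.9584, 0.9568, 0.9712,
           0.9814, 0.9647, 0.9696, 0.9584]
reported_coefficients = [0.04693, 0.12660, 0.12650, 0.12630, 0.12819,
                         0.12954, 0.12734, 0.12798, 0.12651]

# Scoring coefficient = loading / eigenvalue.
coefficients = [l / eigenvalue_1 for l in factor1]


def factor_score(z_values, weights):
    """Weighted sum of one observation's standardized values."""
    return sum(w * z for w, z in zip(weights, z_values))


# Hypothetical standardized values for a single observation.
z_row = [0.5, 1.2, 1.0, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1]
print([round(c, 5) for c in coefficients])
print(round(factor_score(z_row, coefficients), 4))
```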