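For panel data there are several estimation methods, including pooled OLS, fixed effects (FE), random effects (RE), and least squares dummy variables (LSDV). The one used most often is fixed effects (the within estimator). Stata's official command for the fixed-effects model is xtreg, but it is not always convenient (it requires a specific data format and can be slow), so reg, areg, and reghdfe are also commonly used to estimate fixed effects.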
xtreg
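xtreg, fe is the official command for the fixed-effects model, and it computes the fixed-effects (within) estimator directly. xtreg requires the data to be declared as a panel: before using it, run xtset to declare the cross-sectional (individual) dimension and the time dimension. Adding the fe option tells xtreg to use the within estimator, with the individual fixed effects defined on the panel variable set by xtset. Time fixed effects are not added automatically; include the dummies i.year to control for them.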
Below, the command and its output are demonstrated with lin_1992.dta, the data used in Justin Yifu Lin's 1992 AER paper "Rural Reforms and Agricultural Growth in China".
. xtset province year
       panel variable:  province (strongly balanced)
        time variable:  year, 70 to 87
                delta:  1 unit

. xtreg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, fe vce(cluster province)

Fixed-effects (within) regression               Number of obs     =        476
Group variable: province                        Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.8932                                         min =         17
     between = 0.6596                                         avg =       17.0
     overall = 0.7156                                         max =         17

                                                F(23,27)          =     949.82
corr(u_i, Xb)  = -0.3425                        Prob > F          =     0.0000

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1745834     3.34   0.002     .2251439    .9415749
      ltwlab |   .1514909   .0585107     2.59   0.015     .0314368     .271545
       ltpow |   .0971114    .090911     1.07   0.295    -.0894225    .2836453
       ltfer |   .1693346   .0438098     3.87   0.001     .0794444    .2592248
         hrs |   .1503752   .0587581     2.56   0.016     .0298136    .2709368
         mci |   .1978373   .0810587     2.44   0.022     .0315186     .364156
        ngca |   .7784081   .4016301     1.94   0.063    -.0456688    1.602485
             |
        year |
          71 |  -.0240404    .023366    -1.03   0.313    -.0719836    .0239027
          72 |  -.1323624   .0404832    -3.27   0.003    -.2154272   -.0492977
          73 |  -.0377336   .0357883    -1.05   0.301     -.111165    .0356979
          74 |   .0058554   .0500774     0.12   0.908     -.096895    .1086058
          75 |   .0096731   .0566898     0.17   0.866    -.1066448    .1259911
          76 |  -.0476465    .061423    -0.78   0.445    -.1736761    .0783832
          77 |  -.0869336   .0680579    -1.28   0.212    -.2265767    .0527096
          78 |  -.0325205   .0766428    -0.42   0.675    -.1897785    .1247376
          79 |  -.0076332   .0833462    -0.09   0.928    -.1786454     .163379
          81 |   -.093479   .1093614    -0.85   0.400    -.3178701    .1309121
          82 |  -.0447862   .1207405    -0.37   0.714    -.2925251    .2029528
          83 |  -.0309435   .1377207    -0.22   0.824     -.313523    .2516361
          84 |   .0442535   .1428764     0.31   0.759    -.2489048    .3374117
          85 |  -.0033372   .1561209    -0.02   0.983    -.3236709    .3169965
          86 |     .00484    .157992     0.03   0.976    -.3193329    .3290129
          87 |   .0386475   .1639608     0.24   0.815    -.2977723    .3750674
             |
       _cons |   2.651286   .7738994     3.43   0.002     1.063376    4.239196
-------------+----------------------------------------------------------------
     sigma_u |  .29344594
     sigma_e |  .09930555
         rho |  .89724523   (fraction of variance due to u_i)
------------------------------------------------------------------------------
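To make the "within" idea concrete, the sketch below reproduces the xtreg, fe slopes by demeaning the data by hand. It is only an illustration under stated assumptions: it uses Stata's bundled Grunfeld panel (webuse grunfeld) rather than lin_1992.dta, and the variable names mean_* and dm_* are made up for this example.

```stata
* Sketch only: Grunfeld data stand in for lin_1992.dta
webuse grunfeld, clear
xtset company year

* Official within estimator
xtreg invest mvalue kstock, fe

* Manual within transformation: subtract each firm's own mean
foreach v of varlist invest mvalue kstock {
    bysort company: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned variables, without a constant
regress dm_invest dm_mvalue dm_kstock, noconstant
```

The slope coefficients from the last regression should agree with those from xtreg, fe. The standard errors from the manual regression are not comparable, because regress does not know that the firm means were estimated and so uses too many residual degrees of freedom.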
reg
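Adding dummy variables for the individuals to the regression gives the same coefficient estimates as the within estimator (FE); this equivalence is a proven result. The approach is called least squares dummy variables (LSDV), and some textbooks and papers also refer to it as fixed-effects estimation. Its advantage is that it yields estimates of the individual heterogeneity, which FE removes through the within transformation. Its drawback is that when the number of individuals is large, many dummies must be added, which uses up many degrees of freedom and may exceed the number of explanatory variables Stata allows.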
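The LSDV command in Stata is reg y x i.id i.year, where y and x stand for the dependent and explanatory variables, id is the individual identifier, and year is the time variable. reg places no requirement on the data format, so it is more flexible to use, but it prints a long list of dummy-variable estimates.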
. reg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.province i.year, vce(cluster province)

Linear regression                               Number of obs     =        476
                                                F(22, 27)         =          .
                                                Prob > F          =          .
                                                R-squared         =     0.9695
                                                Root MSE          =     .09931

                               (Std. Err. adjusted for 28 clusters in province)
-------------------------------------------------------------------------------
              |               Robust
        ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        ltlan |   .5833594   .1800436     3.24   0.003     .2139404    .9527783
       ltwlab |   .1514909   .0603407     2.51   0.018      .027682    .2752998
        ltpow |   .0971114   .0937543     1.04   0.309    -.0952565    .2894792
        ltfer |   .1693346   .0451799     3.75   0.001     .0766331    .2620362
          hrs |   .1503752   .0605958     2.48   0.020      .026043    .2747075
          mci |   .1978373   .0835939     2.37   0.025     .0263169    .3693578
         ngca |   .7784081   .4141914     1.88   0.071    -.0714423    1.628259
              |
     province |
      beijing |  -.1865095   .1172887    -1.59   0.123     -.427166     .054147
       fujian |   .0434646   .0473107     0.92   0.366    -.0536089    .1405381
        gansu |  -.7945197   .1228202    -6.47   0.000    -1.046526   -.5425134
    guangdong |  -.0278664   .0609608    -0.46   0.651    -.1529476    .0972149
      guangxi |  -.2539549   .0614801    -4.13   0.000    -.3801015   -.1278082
      guizhou |  -.2526439   .0598147    -4.22   0.000    -.3753736   -.1299142
        hebei |   -.270106   .0948694    -2.85   0.008    -.4647619     -.07545
 heilongjiang |  -.0926732     .26542    -0.35   0.730      -.63727    .4519237
        henan |  -.0920743   .0396983    -2.32   0.028    -.1735284   -.0106201
        hubei |   .1024438   .0368811     2.78   0.010     .0267701    .1781176
        hunan |  -.0434275   .0581142    -0.75   0.461    -.1626679    .0758129
      jiangsu |   .1153335   .0352061     3.28   0.003     .0430965    .1875705
      jiangxi |  -.1401737   .0596644    -2.35   0.026    -.2625949   -.0177525
        jilin |  -.1783839   .2109985    -0.85   0.405    -.6113171    .2545493
     liaoning |  -.2517315   .1563399    -1.61   0.119    -.5725145    .0690515
      neimong |  -.8860432   .2325209    -3.81   0.001    -1.363137   -.4089498
      ningxia |  -.8489859   .1732579    -4.90   0.000    -1.204482     -.49349
      qinghai |  -.6982553   .1268849    -5.50   0.000    -.9586017   -.4379089
      shaanxi |   -.320607   .0887091    -3.61   0.001     -.502623   -.1385911
    shangdong |   .0040812   .0547494     0.07   0.941    -.1082554    .1164177
     shanghai |   .0864336   .0982642     0.88   0.387    -.1151878     .288055
       shanxi |  -.5005347   .1388718    -3.60   0.001     -.785476   -.2155934
      sichuan |   .0335563   .0392453     0.86   0.400    -.0469685    .1140811
      tianjin |     -.3011   .1049208    -2.87   0.008    -.5163796   -.0858203
     xinjiang |  -.3740561   .2053926    -1.82   0.080    -.7954869    .0473746
       yunnan |  -.2854833   .0590488    -4.83   0.000    -.4066415   -.1643251
     zhejiang |   .1615248   .0760427     2.12   0.043     .0054981    .3175515
              |
         year |
           71 |  -.0240404   .0240968    -1.00   0.327     -.073483    .0254022
           72 |  -.1323624   .0417494    -3.17   0.004    -.2180251   -.0466998
           73 |  -.0377336   .0369076    -1.02   0.316    -.1134616    .0379945
           74 |   .0058554   .0516436     0.11   0.911    -.1001086    .1118193
           75 |   .0096731   .0584628     0.17   0.870    -.1102827     .129629
           76 |  -.0476465   .0633441    -0.75   0.458    -.1776178    .0823249
           77 |  -.0869336   .0701864    -1.24   0.226    -.2309442     .057077
           78 |  -.0325205   .0790398    -0.41   0.684    -.1946968    .1296559
           79 |  -.0076332   .0859529    -0.09   0.930    -.1839939    .1687275
           81 |   -.093479   .1127818    -0.83   0.414     -.324888    .1379301
           82 |  -.0447862   .1245167    -0.36   0.722    -.3002733     .210701
           83 |  -.0309435    .142028    -0.22   0.829    -.3223608    .2604739
           84 |   .0442535    .147345     0.30   0.766    -.2580735    .3465804
           85 |  -.0033372   .1610037    -0.02   0.984    -.3336895    .3270151
           86 |     .00484   .1629333     0.03   0.977    -.3294716    .3391516
           87 |   .0386475   .1690888     0.23   0.821    -.3082941    .3855891
              |
        _cons |   2.874582   .7510459     3.83   0.001     1.333563    4.415601
-------------------------------------------------------------------------------
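The FE/LSDV equivalence can be checked side by side without scrolling through the dummy coefficients. The following is a minimal sketch, again assuming the bundled Grunfeld panel in place of lin_1992.dta; the stored-estimate names m_fe and m_lsdv are arbitrary.

```stata
* Sketch only: Grunfeld data stand in for lin_1992.dta
webuse grunfeld, clear
xtset company year

* Within estimator with year dummies
quietly xtreg invest mvalue kstock i.year, fe
estimates store m_fe

* LSDV: firm dummies and year dummies in plain OLS
quietly regress invest mvalue kstock i.company i.year
estimates store m_lsdv

* Show only the slopes of interest, with standard errors
estimates table m_fe m_lsdv, keep(mvalue kstock) b(%9.5f) se(%9.5f)
```

Using keep() hides the firm and year dummies, so the table shows just the two slopes from each model for comparison.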
areg
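areg is a refinement of reg and likewise places no requirement on the data structure. When you want to control for a large set of dummies (such as i.id) without creating them or reporting their coefficients, use areg and put the categorical variable in the absorb() option. Because the within estimator and LSDV are equivalent, areg can also be used to estimate fixed effects.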
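Note, however, that absorb() accepts only one variable in the Stata version used here. To estimate two-way or higher-dimensional fixed effects, the remaining dimensions still have to be added as dummies with i.var.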
. areg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, absorb(province) vce(cluster province)

Linear regression, absorbing indicators         Number of obs     =        476
Absorbed variable: province                     No. of categories =         28
                                                F( 23, 27)        =     893.08
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9695
                                                Adj R-squared     =     0.9659
                                                Root MSE          =     0.0993

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1800436     3.24   0.003     .2139404    .9527783
      ltwlab |   .1514909   .0603407     2.51   0.018      .027682    .2752998
       ltpow |   .0971114   .0937543     1.04   0.309    -.0952565    .2894792
       ltfer |   .1693346   .0451799     3.75   0.001     .0766331    .2620362
         hrs |   .1503752   .0605958     2.48   0.020      .026043    .2747075
         mci |   .1978373   .0835939     2.37   0.025     .0263169    .3693578
        ngca |   .7784081   .4141914     1.88   0.071    -.0714423    1.628259
             |
        year |
          71 |  -.0240404   .0240968    -1.00   0.327     -.073483    .0254022
          72 |  -.1323624   .0417494    -3.17   0.004    -.2180251   -.0466998
          73 |  -.0377336   .0369076    -1.02   0.316    -.1134616    .0379945
          74 |   .0058554   .0516436     0.11   0.911    -.1001086    .1118193
          75 |   .0096731   .0584628     0.17   0.870    -.1102827     .129629
          76 |  -.0476465   .0633441    -0.75   0.458    -.1776178    .0823249
          77 |  -.0869336   .0701864    -1.24   0.226    -.2309442     .057077
          78 |  -.0325205   .0790398    -0.41   0.684    -.1946968    .1296559
          79 |  -.0076332   .0859529    -0.09   0.930    -.1839939    .1687275
          81 |   -.093479   .1127818    -0.83   0.414     -.324888    .1379301
          82 |  -.0447862   .1245167    -0.36   0.722    -.3002733     .210701
          83 |  -.0309435    .142028    -0.22   0.829    -.3223608    .2604739
          84 |   .0442535    .147345     0.30   0.766    -.2580735    .3465804
          85 |  -.0033372   .1610037    -0.02   0.984    -.3336895    .3270151
          86 |     .00484   .1629333     0.03   0.977    -.3294716    .3391516
          87 |   .0386475   .1690888     0.23   0.821    -.3082941    .3855891
             |
       _cons |   2.651286   .7981036     3.32   0.003     1.013713    4.288859
------------------------------------------------------------------------------
Note: if the error "matsize too small" appears, raise the limit with set matsize 5000.
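As a quick check of how areg relates to LSDV, the sketch below runs both on the same model. It assumes the bundled Grunfeld panel rather than lin_1992.dta, so the numbers will differ from the output shown in this post.

```stata
* Sketch only: Grunfeld data stand in for lin_1992.dta
webuse grunfeld, clear

* Firm effects are absorbed; year effects enter as ordinary dummies
areg invest mvalue kstock i.year, absorb(company)

* Same model via LSDV: the firm dummies are now listed in the output
regress invest mvalue kstock i.company i.year
```

The slopes on mvalue and kstock and the year dummies should coincide across the two commands; areg simply leaves the firm dummies out of the coefficient table. Note that areg does not need xtset to have been run.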
reghdfe
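reghdfe is designed for linear regression with multi-way (high-dimensional) fixed effects. When several dimensions of fixed effects must be controlled for (for example city, industry, and year), commands such as xtreg still work but run slowly. reghdfe addresses this pain point and is much faster than xtreg and similar commands. It is a user-written command by Sergio Correia; see the author's GitHub page (https://github.com/sergiocorreia/reghdfe) for more details. It must be installed before use (ssc install reghdfe).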
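reghdfe accepts several fixed-effect dimensions at once through absorb(var1 var2 var3 ...), with the variables separated by spaces. There is no need to add dummies with i.var, which makes it more convenient than xtreg and similar commands, and the long list of dummy coefficients is not reported.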
. reghdfe ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca, absorb(year province) vce(cluster province)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =        476
Absorbing 2 HDFE groups                           F(   7,     27) =     229.56
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9695
                                                  Adj R-squared   =     0.9658
                                                  Within R-sq.    =     0.6751
Number of clusters (province) =         28        Root MSE        =     0.0994

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1745834     3.34   0.002     .2251439    .9415749
      ltwlab |   .1514909   .0585107     2.59   0.015     .0314368     .271545
       ltpow |   .0971114    .090911     1.07   0.295    -.0894225    .2836453
       ltfer |   .1693346   .0438098     3.87   0.001     .0794444    .2592248
         hrs |   .1503752   .0587581     2.56   0.016     .0298136    .2709368
         mci |   .1978373   .0810587     2.44   0.022     .0315186     .364156
        ngca |   .7784081   .4016301     1.94   0.063    -.0456688    1.602485
       _cons |   2.625513   .7307092     3.59   0.001     1.126221    4.124804
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |        17           0          17     |
    province |        28          28           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
reghdfe y x, absorb(id year industry) controls for multi-way fixed effects.
reghdfe y x, absorb(year#industry) controls for interacted fixed effects.
reghdfe can also cluster the standard errors at the same time.
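The usage patterns above can be tried on a public dataset. The sketch below is an illustration only: it assumes Stata's bundled nlswork panel (worker identifier idcode, year, industry code ind_code), which is unrelated to lin_1992.dta, and the choice of regressors is arbitrary. It also installs ftools, a package that reghdfe depends on.

```stata
* One-time installation from SSC (reghdfe depends on ftools)
ssc install ftools, replace
ssc install reghdfe, replace

* Sketch only: nlswork is a stand-in dataset
webuse nlswork, clear

* Worker and year fixed effects, standard errors clustered by worker
reghdfe ln_wage tenure age, absorb(idcode year) vce(cluster idcode)

* Worker effects plus interacted year-by-industry effects
reghdfe ln_wage tenure age, absorb(idcode year#ind_code) vce(cluster idcode)
```

In both calls the variables inside absorb() are separated by spaces, and the # operator builds the interacted fixed effect without creating any dummy variables in the dataset.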
Summary
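The regression output above shows that xtreg, reg, areg, and reghdfe give identical coefficient estimates; only the standard errors differ slightly. xtreg and reghdfe report the same standard errors, and reg and areg report the same standard errors. The gap comes from the small-sample degrees-of-freedom correction applied to the cluster-robust variance. reg and areg treat the model as a special pooled OLS with dummies (LSDV) and count the province effects as estimated parameters, whereas xtreg and reghdfe, as fixed-effects estimators, do not count them because the province effects are nested within the clusters. Here the residual degrees of freedom are 476 - 51 = 425 for reg and areg versus 476 - 24 = 452 for xtreg and reghdfe, and sqrt(452/425) ≈ 1.031 matches the ratio of the reported standard errors (.1800436/.1745834 ≈ 1.031).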
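The four-way comparison can be collected into a single table. This is a minimal sketch under assumptions: it uses the bundled Grunfeld panel instead of lin_1992.dta, it requires reghdfe to be installed, and the stored-estimate names are arbitrary. Whether the standard errors group in the same way as in this post may depend on the Stata and reghdfe versions in use.

```stata
* Sketch only: Grunfeld data stand in for lin_1992.dta
webuse grunfeld, clear
xtset company year

quietly xtreg invest mvalue kstock, fe vce(cluster company)
estimates store m_xtreg

quietly regress invest mvalue kstock i.company, vce(cluster company)
estimates store m_lsdv

quietly areg invest mvalue kstock, absorb(company) vce(cluster company)
estimates store m_areg

quietly reghdfe invest mvalue kstock, absorb(company) vce(cluster company)
estimates store m_reghdfe

* Coefficients and clustered standard errors from all four commands
estimates table m_xtreg m_lsdv m_areg m_reghdfe, ///
    keep(mvalue kstock) b(%9.5f) se(%9.5f)
```

Reading across a row shows at a glance which commands agree on the coefficient and which agree on the standard error.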