Stata implementation
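The hard part of the instrumental variables (IV) method is finding a suitable instrument and arguing that it is valid. The Stata side is actually quite simple and takes a single line of code. The Stata commands we usually use for IV estimation are ivregress and ivreg2.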
ivregress command
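The ivregress command is built into Stata and supports three IV estimators: two-stage least squares (2SLS), the generalized method of moments (GMM), and limited-information maximum likelihood (LIML). 2SLS is the one we use most often, because it best captures the essence of the IV approach, and with spherical disturbances (homoskedastic and serially uncorrelated) it is the most efficient IV estimator.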
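As the name suggests, two-stage least squares (2SLS) runs two regressions:
(1) First-stage regression: regress the endogenous explanatory variable on the instrument(s) and the control variables, and obtain the fitted values.
(2) Second-stage regression: regress the dependent variable on the first-stage fitted values and the control variables.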
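To make the two stages concrete, here is a minimal Python/NumPy sketch on simulated data (the data-generating process and all variable names are hypothetical, not the author's data). It runs the two regressions by hand and checks that they give the same point estimates as the one-step IV formula b = (X'P_Z X)^(-1) X'P_Z y. One caveat: standard errors from a hand-run second stage are wrong, because its residuals are computed with the fitted values instead of the actual endogenous variable, which is one reason to let ivregress do both stages.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical simulated data: z is a valid instrument, w an exogenous control,
# u an unobserved confounder that makes x endogenous.
z = rng.normal(size=n)
w = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * w + u + rng.normal(size=n)        # endogenous regressor
y = 1.0 + 2.0 * x + 0.3 * w + u + rng.normal(size=n)  # true effect of x is 2.0

const = np.ones(n)

def ols(dep, regressors):
    """Least-squares coefficients of dep on the columns of regressors."""
    coef, *_ = np.linalg.lstsq(regressors, dep, rcond=None)
    return coef

# Naive OLS is biased upward here because x is positively correlated with u.
b_ols = ols(y, np.column_stack([const, x, w]))

# Stage 1: regress x on the instrument and the control, keep the fitted values.
Z = np.column_stack([const, z, w])
x_hat = Z @ ols(x, Z)

# Stage 2: regress y on the fitted values and the same control.
b_2sls = ols(y, np.column_stack([const, x_hat, w]))

# One-step IV formula: b = (X'P_Z X)^(-1) X'P_Z y, with P_Z X computed directly.
X = np.column_stack([const, x, w])
PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_iv = np.linalg.solve(PzX.T @ X, PzX.T @ y)

print("OLS  slope on x:", round(b_ols[1], 3))
print("2SLS slope on x:", round(b_2sls[1], 3))
```

The two-step and one-step estimates coincide because the controls are included among the instruments, so projecting them onto Z leaves them unchanged.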
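To use 2SLS, simply add 2sls after ivregress, then put the endogenous explanatory variable lnjinshipop and the instrument bprvdist inside parentheses, joined by an = sign. The exogenous control variables (lnnightlight, lncoastdist, tri, suitability, lnpopdensity, urbanrates and the province dummies i.provid) are listed outside the parentheses. The option first reports the first-stage regression results, and the option cluster() requests cluster-robust standard errors, here clustered on provid.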
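For intuition on what cluster() changes, the sketch below computes 2SLS cluster-robust (sandwich) standard errors on simulated clustered data in Python/NumPy. The cluster structure and all names are hypothetical, and the sketch omits the finite-sample adjustment that Stata applies, so its numbers would not match Stata output exactly. Clustering only changes the standard errors; the point estimates are identical to the unclustered ones.

```python
import numpy as np

rng = np.random.default_rng(1)
G, m = 40, 25                        # 40 hypothetical clusters of 25 observations
n = G * m
cluster = np.repeat(np.arange(G), m)

# Cluster-level shocks make both the instrument and the errors correlated within clusters.
z = rng.normal(size=n) + rng.normal(size=G)[cluster]
u = rng.normal(size=n) + rng.normal(size=G)[cluster]
x = 0.7 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])   # regressors (x is endogenous)
Z = np.column_stack([np.ones(n), z])   # instruments (the constant instruments itself)

# 2SLS point estimate: b = (Xh'X)^(-1) Xh'y with Xh = P_Z X.
Xh = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
bread = np.linalg.inv(Xh.T @ X)
b = bread @ (Xh.T @ y)

# Residuals use the ORIGINAL x, not the first-stage fitted values.
e = y - X @ b

# "Meat" of the sandwich: sum over clusters of (Xh_g' e_g)(Xh_g' e_g)'.
meat = np.zeros((2, 2))
for g in range(G):
    s = Xh[cluster == g].T @ e[cluster == g]
    meat += np.outer(s, s)

V_cluster = bread @ meat @ bread.T
se_cluster = np.sqrt(np.diag(V_cluster))

# Conventional (i.i.d.) 2SLS standard errors for comparison.
sigma2 = e @ e / (n - X.shape[1])
se_iid = np.sqrt(np.diag(sigma2 * bread))

print("slope:", round(b[1], 3), "iid SE:", round(se_iid[1], 4), "cluster SE:", round(se_cluster[1], 4))
```

With positive within-cluster correlation in both the instrument and the errors, the cluster-robust standard error is typically noticeably larger than the i.i.d. one.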
ivregress 2sls lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)
First-stage regression results
First-stage regressions
-----------------------
                                                Number of obs     =        274
                                                No. of clusters   =         28
                                                F(   7,    239)   =      85.27
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6487
                                                Adj R-squared     =     0.5988
                                                Root MSE          =     0.4442
------------------------------------------------------------------------------
             |               Robust
 lnjinshipop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnnightlight |    .183385   .0682506     2.69   0.008     .0489354    .3178346
 lncoastdist |   .0350333    .077158     0.45   0.650    -.1169634    .1870299
         tri |    1.06676   .5637082     1.89   0.060    -.0437105    2.177231
 suitability |  -.0769726   .0549697    -1.40   0.163    -.1852596    .0313144
lnpopdensity |    .196144   .0843727     2.32   0.021     .0299349    .3623532
  urbanrates |   3.352916   1.687109     1.99   0.048      .029414    6.676419
             |
      provid |
         12  |   .2051006   .0551604     3.72   0.000      .096438    .3137632
         13  |  -1.890425   .0951146   -19.88   0.000    -2.077795   -1.703055
      ......
         64  |  -1.301895   .1581021    -8.23   0.000    -1.613346   -.9904433
             |
    bprvdist |  -.0846917   .0107859    -7.85   0.000    -.1059393   -.0634441
       _cons |   2.126233   .9791046     2.17   0.031     .1974567     4.05501
------------------------------------------------------------------------------
As the table shows, the coefficient on the instrument bprvdist is -0.085 with a standard error of 0.011, significant at the 1% level.
Second-stage regression results
Instrumental variables (2SLS) regression          Number of obs   =        274
                                                  Wald chi2(34)   =      13.62
                                                  Prob > chi2     =     0.9993
                                                  R-squared       =     0.7874
                                                  Root MSE        =     .05434

                                 (Std. Err. adjusted for 28 clusters in provid)
------------------------------------------------------------------------------
             |               Robust
   lneduyear |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lnjinshipop |   .0834221   .0116773     7.14   0.000      .060535    .1063092
lnnightlight |   .0592777   .0102138     5.80   0.000     .0392589    .0792964
 lncoastdist |   .0093399   .0118791     0.79   0.432    -.0139426    .0326224
         tri |  -.1044346    .063917    -1.63   0.102    -.2297097    .0208404
 suitability |  -.0008944   .0125824    -0.07   0.943    -.0255555    .0237666
lnpopdensity |  -.0472648   .0115102    -4.11   0.000    -.0698245   -.0247051
  urbanrates |  -.0353689   .1784687    -0.20   0.843    -.3851612    .3144233
             |
      provid |
         12  |    -.10732   .0088755   -12.09   0.000    -.1247157   -.0899243
         13  |   .0244675   .0276005     0.89   0.375    -.0296284    .0785634
      ......
         64  |  -.0740796   .0350354    -2.11   0.034    -.1427477   -.0054116
             |
       _cons |   2.014146   .1490819    13.51   0.000     1.721951    2.306341
------------------------------------------------------------------------------
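The second-stage results are the estimates we obtain after "treating" the endogeneity problem with the IV method, and many papers report only the second stage. As the table shows, the coefficient on jinshi density lnjinshipop is 0.083 with a standard error of 0.012, significant at the 1% level.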
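As a quick arithmetic check on how the second-stage table is built, the Python snippet below recomputes the z statistic, two-sided p-value and 95% confidence interval for lnjinshipop from its coefficient and standard error alone, using the standard normal distribution as the "z" column header indicates. The only inputs are the two numbers copied from the table.

```python
from statistics import NormalDist

# lnjinshipop row of the second-stage table.
coef, se = 0.0834221, 0.0116773

z = coef / se                          # Wald z statistic
crit = NormalDist().inv_cdf(0.975)     # two-sided 5% critical value, about 1.96
ci = (coef - crit * se, coef + crit * se)
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, 95% CI = [{ci[0]:.6f}, {ci[1]:.6f}], p = {p:.3g}")
```

The interval reproduces the table's [.060535, .1063092]. Assuming ln denotes natural logs, both variables are in logs, so the coefficient reads as an elasticity: a 1% higher jinshi density is associated with roughly 0.083% more years of schooling.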
Weak-instrument test
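To check the validity of an instrument, we first test its relevance. When the instrument is weak, 2SLS not only struggles to correct the bias of OLS but can also be less efficient because of its larger standard errors, so that "the cure is worse than the disease". We can use the estat firststage command to examine the first-stage results and judge whether there is a weak-instrument problem.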
. estat firststage, forcenonrobust

  First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial       Robust
      Variable |   R-sq.       R-sq.        R-sq.     F(1,27)   Prob > F
  -------------+------------------------------------------------------------
   lnjinshipop |  0.6487      0.5988       0.3276      61.914    0.0000
  --------------------------------------------------------------------------
  (F statistic adjusted for 28 clusters in provid)

  Minimum eigenvalue statistic = 116.419

  Critical Values                      # of endogenous regressors:    1
  Ho: Instruments are weak             # of excluded instruments:     1
  ---------------------------------------------------------------------
                                     |    5%     10%     20%     30%
  2SLS relative bias                 |         (not available)
  -----------------------------------+---------------------------------
                                     |   10%     15%     20%     25%
  2SLS Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
  LIML Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
  ---------------------------------------------------------------------
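The results show a partial R² of 0.3276 (the partial R² measures the explanatory power of the instrument after the other exogenous variables have been partialled out), indicating that the instrument bprvdist has strong explanatory power for the endogenous variable lnjinshipop. The cluster-robust F statistic is 61.914 > 10, so by the usual rule of thumb our instrument is not weak. In addition, estat firststage reports a minimum eigenvalue statistic (Minimum eigenvalue statistic); if it exceeds the critical value in the 10% column of the "2SLS Size of nominal 5% Wald test" row, we can generally conclude there is no weak-instrument problem. Here 116.419 > 16.38.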
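The partial R² and the homoskedastic first-stage F are two views of the same quantity. The Python/NumPy sketch below, on hypothetical simulated data, computes the partial R² via the Frisch-Waugh-Lovell theorem (partial the controls out of both x and z) and checks the identity F = partial R² / (1 - partial R²) × (N - k) / L, where k is the number of first-stage parameters and L the number of excluded instruments. Plugging in the rounded values above, 0.3276 / (1 - 0.3276) × 239 ≈ 116.4, close to the reported minimum eigenvalue statistic of 116.419, assuming 239 residual degrees of freedom as the F(7, 239) header of the first-stage output suggests; with a single endogenous regressor the minimum eigenvalue statistic reduces to this non-robust F. The 61.914 in the table, by contrast, is the cluster-robust F.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Hypothetical first stage: x endogenous, z the excluded instrument, w a control.
w = rng.normal(size=n)
z = 0.6 * w + rng.normal(size=n)            # instrument correlated with the control
x = 0.5 * z + 0.8 * w + rng.normal(size=n)

const = np.ones(n)

def resid(dep, regressors):
    """Residuals from a least-squares regression of dep on regressors."""
    coef, *_ = np.linalg.lstsq(regressors, dep, rcond=None)
    return dep - regressors @ coef

W = np.column_stack([const, w])             # included exogenous variables
XU = np.column_stack([const, w, z])         # unrestricted first stage

ssr_r = np.sum(resid(x, W) ** 2)            # first stage without the instrument
ssr_u = np.sum(resid(x, XU) ** 2)           # first stage with the instrument

# Partial R^2 via Frisch-Waugh-Lovell: R^2 of x on z after partialling out W.
x_t, z_t = resid(x, W), resid(z, W)
partial_r2 = 1 - np.sum(resid(x_t, z_t[:, None]) ** 2) / np.sum(x_t ** 2)

L = 1                                       # number of excluded instruments
k = XU.shape[1]                             # parameters in the unrestricted first stage
F = ((ssr_r - ssr_u) / L) / (ssr_u / (n - k))
F_from_r2 = partial_r2 / (1 - partial_r2) * (n - k) / L

print(f"partial R2 = {partial_r2:.4f}, F = {F:.2f}, F from R2 = {F_from_r2:.2f}")
```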
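Note: the Hausman test (i.e., the endogeneity test) and the overidentification test (i.e., the test of instrument exogeneity) are not covered here, because in my view both are of limited practical value, and with only one instrument the overidentification test cannot be performed at all. I will write a separate post explaining these two tests in detail.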
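For reference, with two or more instruments the overidentification test would be run as follows (Y, X1 to X4, Z1 and Z2 are placeholder names; the endogenous X1 appears only inside the parentheses):
ivregress 2sls Y X2 X3 X4 (X1 = Z1 Z2), vce(robust)
estat overid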
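For readers who do have several instruments, here is a minimal Python/NumPy sketch of the classical Sargan overidentification test on hypothetical simulated data: compute the 2SLS residuals, regress them on all instruments, and compare N·R² with a chi-squared distribution whose degrees of freedom equal the number of instruments minus the number of endogenous regressors (one here). This is the homoskedastic version; after vce(robust), estat overid reports a robust score test instead, so its numbers would differ. The test also takes it as given that at least one instrument is valid.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

def sargan(y, X, Z):
    """Homoskedastic Sargan statistic N*R^2 and its chi-squared p-value (df = 1 only)."""
    # 2SLS estimate; residuals use the actual X, not the fitted values.
    Xh = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b = np.linalg.solve(Xh.T @ X, Xh.T @ y)
    e = y - X @ b
    # Auxiliary regression of the residuals on all instruments.
    g, *_ = np.linalg.lstsq(Z, e, rcond=None)
    r2 = 1 - np.sum((e - Z @ g) ** 2) / np.sum((e - e.mean()) ** 2)
    stat = len(y) * r2
    df = Z.shape[1] - X.shape[1]          # number of overidentifying restrictions
    assert df == 1, "the erfc shortcut below is the chi2(1) tail only"
    p = math.erfc(math.sqrt(stat / 2))    # P(chi2(1) > stat)
    return stat, p

# Hypothetical data: two instruments z1, z2 for one endogenous regressor x.
z1, z2, u = rng.normal(size=(3, n))
x = z1 + z2 + u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

y_valid = 1 + 2 * x + u + rng.normal(size=n)   # z1 and z2 both excluded from y
y_invalid = y_valid + z2                       # z2 now affects y directly

stat_v, p_v = sargan(y_valid, X, Z)
stat_i, p_i = sargan(y_invalid, X, Z)
print(f"valid:   Sargan = {stat_v:.2f}, p = {p_v:.3f}")
print(f"invalid: Sargan = {stat_i:.2f}, p = {p_i:.3g}")
```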
ivreg2 command
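The ivreg2 command improves on and extends ivregress: it is more powerful, supports more estimation methods (2SLS by default), and directly reports several diagnostic tests for the instruments.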
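ivreg2 is an external (community-contributed) command, so it must be installed before use (ssc install ivreg2; recent versions also require the ranktest package, ssc install ranktest). Its syntax is essentially the same as that of ivregress: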
ivreg2 lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)
First-stage regression results
First-stage regression of lnjinshipop:

Statistics robust to heteroskedasticity and clustering on provid
Number of obs =                    274
Number of clusters (provid) =       28
------------------------------------------------------------------------------
             |               Robust
 lnjinshipop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    bprvdist |  -.0846917   .0107633    -7.87   0.000    -.1058948   -.0634886
lnnightlight |    .183385   .0681077     2.69   0.008     .0492169    .3175531
 lncoastdist |   .0350333   .0769964     0.45   0.650     -.116645    .1867116
         tri |    1.06676   .5625276     1.90   0.059    -.0413849    2.174906
 suitability |  -.0769726   .0548546    -1.40   0.162    -.1850328    .0310876
lnpopdensity |    .196144    .084196     2.33   0.021      .030283    .3620051
  urbanrates |   3.352916   1.683576     1.99   0.048     .0363742    6.669459
             |
      provid |
         12  |   .2051006   .0550449     3.73   0.000     .0966656    .3135357
         13  |  -1.890425   .0949154   -19.92   0.000    -2.077402   -1.703447
      ......
         64  |  -1.301895    .157771    -8.25   0.000    -1.612694   -.9910956
             |
       _cons |   2.126233   .9770541     2.18   0.031      .201496    4.050971
------------------------------------------------------------------------------
Second-stage regression results
IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on provid

Number of clusters (provid) =       28                Number of obs =      274
                                                      F( 34,    27) =     0.34
                                                      Prob > F      =   0.9984
Total (centered) SS     =  3.805186838                Centered R2   =   0.7874
Total (uncentered) SS   =  1282.736705                Uncentered R2 =   0.9994
Residual SS             =  .8089841945                Root MSE      =   .05434

------------------------------------------------------------------------------
             |               Robust
   lneduyear |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lnjinshipop |   .0834221   .0116773     7.14   0.000      .060535    .1063092
lnnightlight |   .0592777   .0102138     5.80   0.000     .0392589    .0792964
 lncoastdist |   .0093399   .0118791     0.79   0.432    -.0139426    .0326224
         tri |  -.1044346    .063917    -1.63   0.102    -.2297097    .0208404
 suitability |  -.0008944   .0125824    -0.07   0.943    -.0255555    .0237666
lnpopdensity |  -.0472648   .0115102    -4.11   0.000    -.0698245   -.0247051
  urbanrates |  -.0353689   .1784687    -0.20   0.843    -.3851612    .3144233
             |
      provid |
         12  |    -.10732   .0088755   -12.09   0.000    -.1247157   -.0899243
         13  |   .0244675   .0276005     0.89   0.375    -.0296284    .0785634
      ......
         64  |  -.0740796   .0350354    -2.11   0.034    -.1427477   -.0054116
             |
       _cons |   2.014146   .1490819    13.51   0.000     1.721951    2.306341
------------------------------------------------------------------------------
Weak-instrument test results
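ivreg2 directly reports two statistics for the weak-instrument test: the Cragg-Donald Wald F statistic and the Kleibergen-Paap Wald rk F statistic. The Cragg-Donald statistic assumes the disturbances are i.i.d. (independent and identically distributed), whereas the Kleibergen-Paap statistic does not. Note that the null hypothesis here is that the instruments are weak: when the F statistics exceed the critical value for 10% maximal IV size among the Stock-Yogo weak ID test critical values (16.38 here), we can reject the null and conclude there is no weak-instrument problem. In our case both statistics (116.42 and 61.91) exceed it.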
Weak identification test
Ho: equation is weakly identified
Cragg-Donald Wald F statistic                                     116.42
Kleibergen-Paap Wald rk F statistic                                61.91

Stock-Yogo weak ID test critical values for K1=1 and L1=1:
                         10% maximal IV size                       16.38
                         15% maximal IV size                        8.96
                         20% maximal IV size                        6.66
                         25% maximal IV size                        5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
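With one endogenous regressor and one excluded instrument, the Kleibergen-Paap Wald rk F is a Wald test of a single restriction, so it equals the squared t statistic of bprvdist in the cluster-robust first stage. The Python snippet below checks this against the ivreg2 output above and applies the decision rule to both statistics. Keep the NB line of the output in mind: the Stock-Yogo critical values were derived for the Cragg-Donald F under i.i.d. errors, so comparing the Kleibergen-Paap F with them is a common rule of thumb rather than an exact test.

```python
# bprvdist row of the ivreg2 first stage (cluster-robust standard error).
coef, se_robust = -0.0846917, 0.0107633

kp_f = (coef / se_robust) ** 2    # single restriction: Wald F equals t squared
cd_f = 116.42                      # Cragg-Donald Wald F, copied from the output
sy_10pct = 16.38                   # Stock-Yogo 10% maximal IV size, K1=1, L1=1

for name, stat in [("Cragg-Donald", cd_f), ("Kleibergen-Paap", kp_f)]:
    verdict = "reject H0 (not weak)" if stat > sy_10pct else "cannot reject H0"
    print(f"{name:16s} F = {stat:7.2f} vs {sy_10pct}: {verdict}")
```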
Supplement: the xtivreg2 command
The difference between ivreg2 and xtivreg2 is broadly similar to the difference between reg and xtreg.
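As a rough illustration of what the fe option does, the Python/NumPy sketch below (hypothetical simulated panel) demeans every variable within each unit, which removes the unit fixed effect, and then runs just-identified IV on the demeaned data. In the simulation the instrument is correlated with the fixed effect, so pooled IV is biased while the within-transformed IV is not. Standard errors are omitted; a proper FE-IV routine must also account for the absorbed effects when computing them.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 200, 6                               # hypothetical panel: 200 units, 6 periods
unit = np.repeat(np.arange(N), T)

alpha = rng.normal(size=N)[unit]            # unit fixed effect
z = 0.5 * alpha + rng.normal(size=N * T)    # instrument correlated with the fixed effect
u = rng.normal(size=N * T)
x = 0.7 * z + alpha + u + rng.normal(size=N * T)
y = 2.0 * x + 3.0 * alpha + u + rng.normal(size=N * T)   # true effect of x is 2.0

def within(v):
    """Subtract each unit's mean (the within transformation used by fe)."""
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

# Just-identified IV on within-transformed data: b = (z'y) / (z'x).
xd, yd, zd = within(x), within(y), within(z)
b_fe_iv = (zd @ yd) / (zd @ xd)

# Pooled IV (demeaned only by the grand mean) ignores alpha and is biased here.
zc, xc, yc = z - z.mean(), x - x.mean(), y - y.mean()
b_pooled = (zc @ yc) / (zc @ xc)

print(f"pooled IV: {b_pooled:.3f}, FE-IV: {b_fe_iv:.3f}")
```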
xtset id year
xtivreg2 ys k (n = l2.n l3.n), fe first savefp(first)
outreg2 [first second] using xxx.doc, tstat bdec(3) tdec(2) replace
xtivreg2 reports the overidentification test results, just as ivreg2 does.
To add time fixed effects, xtivreg is recommended:
xtset stkcd year
xtreg Ln_geodistance_ew rdls ewdistance_mean $control i.year, fe
est store first
outreg2 [first] using first.doc, tstat bdec(3) tdec(3) addstat(Adjusted R2, `e(r2_a)') replace
xtivreg ln_Cash_ratio1 (Ln_geodistance_ew = rdls ewdistance_mean) $control i.year, fe   // xtivreg2 cannot include i.year
outreg2 using xxx1.doc, cttop(second) tstat bdec(3) tdec(2)
To run the overidentification test after xtivreg, use xtoverid (a community-contributed command: ssc install xtoverid).
To add fixed effects for a string-type variable, use the xi: prefix:
xtset stkcd year
xi: xtreg Ln_geodistance_ew rdls ewdistance_mean $control i.industry2, fe
est store first
outreg2 [first] using first.doc, tstat bdec(3) tdec(3) addstat(Adjusted R2, `e(r2_a)') replace
xi: xtivreg ln_Cash_ratio1 (Ln_geodistance_ew = rdls ewdistance_mean) $control i.industry2, fe   // xtivreg2 cannot include i.year
outreg2 using xxx1.doc, cttop(second) tstat bdec(3) tdec(2)