1. The joinby command: many-to-many matching
* Input data
clear
input group str3 x1
1 "A"
1 "B"
1 "C"
1 "D"
end
save file1.dta, replace

clear
input group str3 x2
1 "M"
1 "N"
end
save file2.dta, replace
* Many-to-many match with merge
use file1.dta, clear
merge m:m group using file2.dta
list, clean noobs
* Many-to-many match with joinby
use file1.dta, clear
joinby group using file2.dta
list, clean noobs
Result of the many-to-many match with merge:
group   x1   x2        _merge
    1    A    M   matched (3)
    1    B    N   matched (3)
    1    C    N   matched (3)
    1    D    N   matched (3)
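As the output shows, a many-to-many merge is problematic. Within each group, merge pairs observations in order, and once the shorter file runs out of rows it reuses that file's last row (here the last row of file2.dta, group = 1 and x2 = N) for every remaining observation.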
Result of the many-to-many match with joinby:
group   x1   x2
    1    A    N
    1    A    M
    1    B    M
    1    B    N
    1    C    N
    1    C    M
    1    D    N
    1    D    M
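As shown, joinby clearly does what we want: it forms every pairwise combination of observations within group. For more details on joinby, see the help file (help joinby).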
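The pairwise behavior described above can be checked directly. The following is only a sketch: it reuses file1.dta and file2.dta from the example, and the marker variable name _join is an arbitrary choice made here for illustration. Note that joinby drops unmatched observations by default; the unmatched(both) option keeps them and creates a marker variable.

```stata
* Sketch: verify that joinby forms all pairwise combinations within group
use file1.dta, clear

* unmatched(both) keeps observations found in only one file;
* _merge(_join) names the marker variable (_join is a name chosen here)
joinby group using file2.dta, unmatched(both) _merge(_join)

* file1 has 4 rows and file2 has 2 rows, all with group == 1,
* so the joined data should contain 4 x 2 = 8 rows
assert _N == 8

* Every row should be flagged as present in both files
tabulate _join
```

If the assert line raises an error, the two files no longer contain the rows entered above.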
2. The nearmrg command: matching on nearby values
* Create a dataset
sysuse auto.dta, clear
keep make price mpg
keep if make == "Toyota Celica"    | ///
        make == "BMW 320i"         | ///
        make == "Cad. Seville"     | ///
        make == "Pont. Grand Prix" | ///
        make == "Datsun 210"
rename make make2
save "using.dta", replace
list, clean noobs
The data are listed below:
make2               price   mpg
Cad. Seville       15,906    21
Pont. Grand Prix    5,222    19
BMW 320i            9,735    25
Datsun 210          4,589    35
Toyota Celica       5,899    18
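Next, we merge these data with auto.dta. With the upper option and limit(50), each car in auto.dta is paired with the using.dta record whose price is the nearest one at or above its own, provided the difference is no more than $50.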
sysuse auto.dta, clear
nearmrg using "using.dta", upper nearvar(price) genmatch(usingmatch) limit(50)
keep make price mpg make2 _merge usingmatch
list, clean noobs
make                price   mpg   make2              _merge        usingm~h
Datsun 210          4,589    35   Datsun 210         matched (3)      4,589
Buick Regal         5,189    20   Pont. Grand Prix   matched (3)      5,222
Pont. Grand Prix    5,222    19   Pont. Grand Prix   matched (3)      5,222
Olds Cutl Supr      5,172    19   Pont. Grand Prix   matched (3)      5,222
Dodge Magnum        5,886    16   Toyota Celica      matched (3)      5,899
Toyota Celica       5,899    18   Toyota Celica      matched (3)      5,899
BMW 320i            9,735    25   BMW 320i           matched (3)      9,735
Audi 5000           9,690    17   BMW 320i           matched (3)      9,735
Cad. Seville       15,906    21   Cad. Seville       matched (3)     15,906
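As shown, using.dta originally had 5 rows, and the merged result has 9. The extra rows appear because every auto.dta car priced within $50 of a using.dta price is kept, so one using.dta row can match several cars. For example, Buick Regal, Pont. Grand Prix, and Olds Cutl Supr all match the Pont. Grand Prix price of 5,222.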
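To make the matching rule concrete, the same pairs can be approximated with built-in commands only. This is a rough sketch, not a description of how nearmrg works internally: it forms every combination of auto.dta and using.dta rows with cross, then keeps the pairs in which the using.dta price is at or above the auto.dta price by no more than $50. The variable name price2 and the temporary file u are names introduced here for illustration.

```stata
* Sketch: approximate the upper / limit(50) rule with cross and a filter
use "using.dta", clear
rename price price2          // avoid a name clash with price in auto.dta
keep make2 price2
tempfile u                   // u is a temporary file name chosen here
save "`u'"

sysuse auto.dta, clear
keep make price

* Form all pairwise combinations of auto.dta rows and using.dta rows
cross using "`u'"

* Keep pairs where the using price is at most $50 above the car's price
keep if price2 >= price & price2 - price <= 50

sort price2 price
list, clean noobs
```

Because the five prices in using.dta are far apart, each car can satisfy the condition for at most one using.dta row, so the listing should be comparable with the nearmrg output shown above.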