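1. What subcommands does duplicates have?
2. How do I drop duplicate values?
3. How do I treat observations as duplicates based on only some of the variables?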
- Report how many copies of each observation exist (based on varlist, or on all variables if none is given)
duplicates report [varlist] [if] [in]
- List the duplicated observations
duplicates list [varlist] [if] [in] [,options]
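- Flag duplicates: generate a new variable that holds, for each observation, the number of other observations that duplicate it. It is 0 for a unique observation, 1 when there is one other copy, 2 when there are two, and so on
duplicates tag [varlist] [if] [in], generate(newvar)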
- Drop duplicates, keeping the first observation in each group of duplicates
duplicates drop [if] [in]
duplicates drop varlist [if] [in], force
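As a minimal sketch, the subcommands above can be tried on a small made-up dataset. The variables id, year and age and their values are assumptions for illustration only, as is the name dup for the generated variable.

```stata
* Hypothetical data: id 1 appears twice in 2020, with different ages
clear
input id year age
1 2020 30
1 2020 31
2 2020 25
2 2021 26
end

* Count how many observations share the same id-year combination
duplicates report id year

* Show the observations that are duplicated on id and year
duplicates list id year

* dup = number of other observations with the same id and year (0 = unique)
duplicates tag id year, generate(dup)
list, sepby(id)
```

With this data, dup should be 1 for the two id 1 / 2020 rows and 0 for the others.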
duplicates drop id year
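// errors out: the rows match on id and year but are not complete duplicates, so force is required
This is because Stata assumes that dropping these rows would lose the information in age, so it refuses. If age happens to be a variable you do not need, add the force option (duplicates drop id year, force). Stata then keeps the first of the two rows that share the same id and year and drops the other.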
force specifies that observations duplicated with respect to a named varlist be dropped. The force option is required when such a varlist is given as a reminder that information may be lost by dropping observations, given that those observations may differ on any variable not included in varlist.
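The difference between dropping with and without a varlist can be sketched as follows. The dataset is hypothetical (id, year and age are assumed names), chosen so that two rows agree on id and year but differ on age.

```stata
* Hypothetical data: the first two rows share id and year but differ on age
clear
input id year age
1 2020 30
1 2020 31
2 2020 25
end

* No varlist: only rows identical on every variable count as duplicates,
* so nothing is dropped here because the ages differ
duplicates drop

* With a varlist, force is required; the first row of each
* id-year group is kept and the later ones are dropped
duplicates drop id year, force
list
```

Because force discards whatever differed in the dropped rows (here, one of the two age values), it may be worth running duplicates list id year first to see what would be lost.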