1. What subcommands does duplicates have?
2. How do I drop duplicate observations?
3. How do I define duplicates by only a few of the variables?
- Report how many observations occur once, twice, and so on (the number of copies)
duplicates report [varlist] [if] [in]
- List the duplicated observations
duplicates list [varlist] [if] [in] [,options]
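- Tag duplicates: generate a new variable that counts how many duplicates each observation has. It is 0 for a unique observation, 1 when there is one other copy, 2 when there are two other copies, and so on
duplicates tag [varlist] [if] [in], generate(newvar)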
- Drop duplicates, keeping the first observation in each group of duplicates
duplicates drop [if] [in]
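To try the report, list, and tag subcommands, here is a minimal sketch for a do-file. The dataset is made up: the variables id, year, and age and all of their values are assumptions for illustration, and dup is a hypothetical name for the tag variable.

```stata
clear
input id year age
1 2000 30
1 2000 31
2 2000 45
2 2001 46
3 2000 50
3 2000 50
end

// No varlist: observations must match on every variable to count as duplicates
duplicates report

// With a varlist: observations only need to match on id and year
duplicates report id year
duplicates list id year

// dup counts the other copies of each observation, judged by id and year
duplicates tag id year, generate(dup)
list, sepby(id)

// id 1 and id 3 each appear twice in 2000, so dup is 1 for them;
// id 2 has a different year in each row, so dup is 0
assert dup == 1 if id == 1 | id == 3
assert dup == 0 if id == 2
```

Note that the two rows for id 1 differ in age, so they are duplicates on id and year but not on all variables.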
duplicates drop id year
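// Error: force is required when a varlist is given, because observations that match on id and year may not be complete duplicates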
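This is because Stata assumes that dropping on id and year alone could lose information about age, so it refuses. If age happens to be a variable you do not need, add the force option: for each group of observations with the same id and year, Stata then keeps the first observation and drops the rest.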
force specifies that observations duplicated with respect to a named varlist be dropped. The force option is required when such a varlist is given as a reminder that information may be lost by dropping observations, given that those observations may differ on any variable not included in varlist.
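A minimal sketch of what force changes, written for a do-file. The dataset is made up (the variable names id, year, and age and all values are illustrative assumptions), and capture noisily is used only so that the expected error does not stop the do-file.

```stata
clear
input id year age
1 2000 30
1 2000 31
2 2000 45
2 2001 46
3 2000 50
3 2000 50
end

// No varlist: only the fully identical pair (id 3) loses one observation
duplicates drop
assert _N == 5

// varlist without force: Stata reports an error and drops nothing
capture noisily duplicates drop id year
assert _rc != 0
assert _N == 5

// With force: the second id 1 row (age 31) is dropped, and its age value is lost
duplicates drop id year, force
assert _N == 4
list
```

Because the first observation in each group is the one kept, it may be worth sorting the data before dropping so that the surviving row is the one you want.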