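Removing duplicate data is one of the first jobs in data cleaning. The sections below cover several commonly used deduplication methods.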
Distinct
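Return employee numbers with duplicates removed. DISTINCT is a keyword that applies to the whole select list, not a function, so the parentheses are unnecessary. The query only filters the result set; it does not delete anything from the table.
select distinct empno from emp;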
Rowid
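Use rowid together with the max or min function to select one row per group of duplicates. The subquery must be correlated on the columns that define a duplicate; empno is assumed here.
select e.* from emp e where e.rowid = (select max(x.rowid) from emp x where x.empno = e.empno);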
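Use rowid together with the max or min function to delete duplicate rows quickly. Every row whose rowid is lower than the highest rowid in its group is removed, again assuming empno defines a duplicate.
delete from emp e where e.rowid < (select max(x.rowid) from emp x where x.empno = e.empno);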
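The rowid delete above can be tried end to end with a small sketch. It runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for Oracle; SQLite also exposes a rowid, but the two are not the same thing. The function name, the ename column, and the sample rows are illustrative assumptions, and empno is again assumed to define a duplicate.

```python
import sqlite3


def dedupe_keep_latest(rows):
    """Delete duplicate empno rows, keeping the row with the largest rowid."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-column version of the emp table.
    conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
    conn.executemany("INSERT INTO emp (empno, ename) VALUES (?, ?)", rows)
    # Correlated subquery: compare each row with the max rowid of its own empno.
    conn.execute(
        """
        DELETE FROM emp
        WHERE rowid < (SELECT MAX(x.rowid) FROM emp x WHERE x.empno = emp.empno)
        """
    )
    remaining = conn.execute(
        "SELECT empno, ename FROM emp ORDER BY empno"
    ).fetchall()
    conn.close()
    return remaining


sample = [
    (7369, "SMITH"),
    (7499, "ALLEN"),
    (7369, "SMITH"),
    (7499, "ALLEN"),
    (7521, "WARD"),
]
print(dedupe_keep_latest(sample))
# [(7369, 'SMITH'), (7499, 'ALLEN'), (7521, 'WARD')]
```

Replacing max with min and < with > would keep the earliest row of each group instead of the latest.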
Group by
select deptno from emp group by deptno;
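Group by returns one row per distinct value, and adding a having clause turns the same query into a duplicate finder. The sketch below shows this under assumptions: it uses SQLite via Python's sqlite3 rather than Oracle, and the function name and sample department numbers are hypothetical.

```python
import sqlite3


def find_duplicate_keys(deptnos):
    """Return (deptno, count) for every deptno that appears more than once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal emp table; empno is filled in automatically.
    conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER)")
    conn.executemany("INSERT INTO emp (deptno) VALUES (?)", [(d,) for d in deptnos])
    rows = conn.execute(
        """
        SELECT deptno, COUNT(*) AS cnt
        FROM emp
        GROUP BY deptno
        HAVING COUNT(*) > 1
        ORDER BY deptno
        """
    ).fetchall()
    conn.close()
    return rows


print(find_duplicate_keys([10, 20, 20, 30, 30, 30]))
# [(20, 2), (30, 3)]
```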
Row_number()
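row_number removes duplicates by numbering the rows within each group of duplicates. When a group holds two or more rows, filter on that number and delete every row after the first. Because the queries below order by the same column they partition by, which row in a group receives number 1 is arbitrary.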
1. Inspect the duplicate data
select d.id,d.outer_code from dict_depts_source d order by outer_code
2. Flag the duplicate rows
select d.id,d.outer_code,row_number() over(partition by outer_code order by outer_code) row_flag from dict_depts_source d
3. Delete the duplicate rows
delete from dict_depts_source where id in(
select id from(select d.id,d.outer_code,row_number() over(partition by outer_code order by outer_code) row_flag from dict_depts_source d)t
where t.row_flag > 1)
4. Verify the result (every row_flag should now be 1)
select d.id,d.outer_code,row_number() over(partition by outer_code order by outer_code) row_flag from dict_depts_source d
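Steps 2 to 4 can be sketched as one runnable check. This is only an illustration under assumptions: it uses SQLite (3.25 or later, for window functions) through Python's sqlite3 instead of Oracle, the function name and sample rows are hypothetical, and the window orders by id rather than outer_code so that the lowest id in each group is the row kept.

```python
import sqlite3


def dedupe_with_row_number(rows):
    """Keep the lowest id per outer_code; return (remaining rows, max row_flag)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dict_depts_source (id INTEGER PRIMARY KEY, outer_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO dict_depts_source (id, outer_code) VALUES (?, ?)", rows
    )
    # Steps 2 and 3: number rows inside each outer_code group, delete numbers > 1.
    conn.execute(
        """
        DELETE FROM dict_depts_source
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY outer_code ORDER BY id) AS row_flag
                FROM dict_depts_source
            ) AS t
            WHERE t.row_flag > 1
        )
        """
    )
    remaining = conn.execute(
        "SELECT id, outer_code FROM dict_depts_source ORDER BY id"
    ).fetchall()
    # Step 4: after the delete, the highest row_flag should be 1.
    max_flag = conn.execute(
        """
        SELECT MAX(row_flag) FROM (
            SELECT ROW_NUMBER() OVER (PARTITION BY outer_code ORDER BY id) AS row_flag
            FROM dict_depts_source
        )
        """
    ).fetchone()[0]
    conn.close()
    return remaining, max_flag


sample_depts = [(1, "A01"), (2, "A01"), (3, "B02"), (4, "A01"), (5, "B02"), (6, "C03")]
print(dedupe_with_row_number(sample_depts))
# ([(1, 'A01'), (3, 'B02'), (6, 'C03')], 1)
```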