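Structure of the shoes table
In this table, shoes_name may contain duplicate values. This post records how to remove the duplicate rows.
1. First, find out which values are duplicated. GROUP BY with an aggregate function does this: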
SELECT shoes_name, COUNT(*) FROM shoes GROUP BY shoes_name HAVING COUNT(*) > 1;
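Note: HAVING is normally used together with GROUP BY to filter the grouped result. Here the shoes table is grouped by shoes_name, COUNT(*) counts the rows in each group, and HAVING keeps only the groups with more than one row, that is, the duplicated values.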
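The grouping step can be tried out on a throwaway table. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample rows and the two-column layout (id, shoes_name) are assumptions, not the real table.

```python
import sqlite3

# Hypothetical sample data; only the id and shoes_name columns are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id INTEGER PRIMARY KEY, shoes_name TEXT)")
conn.executemany(
    "INSERT INTO shoes (id, shoes_name) VALUES (?, ?)",
    [(1, "runner"), (2, "runner"), (3, "boot"),
     (4, "sandal"), (5, "sandal"), (6, "sandal")],
)

# Group by name and keep only the groups that contain more than one row.
duplicates = conn.execute(
    "SELECT shoes_name, COUNT(*) FROM shoes "
    "GROUP BY shoes_name HAVING COUNT(*) > 1 "
    "ORDER BY shoes_name"
).fetchall()
print(duplicates)  # [('runner', 2), ('sandal', 3)]
```

The name "boot" appears only once, so HAVING filters its group out.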
2. Use the shoes_name values from step 1 to fetch every row that has a duplicate:
SELECT * FROM shoes
WHERE shoes_name IN (
    SELECT * FROM (
        SELECT shoes_name FROM shoes GROUP BY shoes_name HAVING COUNT(*) > 1
    ) t1
);
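To see which rows step 2 returns, the same kind of throwaway table helps. This is a sketch under the same assumptions as before: SQLite via Python's sqlite3 instead of MySQL, made-up rows, and a hypothetical helper name duplicate_rows.

```python
import sqlite3


def duplicate_rows(conn):
    """Return every row whose shoes_name occurs more than once."""
    return conn.execute(
        "SELECT id, shoes_name FROM shoes "
        "WHERE shoes_name IN ("
        "  SELECT * FROM ("
        "    SELECT shoes_name FROM shoes "
        "    GROUP BY shoes_name HAVING COUNT(*) > 1"
        "  ) t1"
        ") ORDER BY id"
    ).fetchall()


# Hypothetical sample data; only the id and shoes_name columns are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id INTEGER PRIMARY KEY, shoes_name TEXT)")
conn.executemany(
    "INSERT INTO shoes (id, shoes_name) VALUES (?, ?)",
    [(1, "runner"), (2, "runner"), (3, "boot"),
     (4, "sandal"), (5, "sandal"), (6, "sandal")],
)

rows = duplicate_rows(conn)
print(rows)
# [(1, 'runner'), (2, 'runner'), (4, 'sandal'), (5, 'sandal'), (6, 'sandal')]
```

Every copy is listed, including the one that will later be kept.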
3. When deleting, we want to keep the row with the smallest id in each group of duplicates, so look up those smallest ids:
SELECT id FROM shoes
WHERE id IN (
    SELECT * FROM (
        SELECT MIN(id) FROM shoes GROUP BY shoes_name HAVING COUNT(*) > 1
    ) t2
);
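The ids that will survive the cleanup can be checked the same way. Again this is a sketch, assuming SQLite via Python's sqlite3 in place of MySQL and made-up sample rows.

```python
import sqlite3

# Hypothetical sample data; only the id and shoes_name columns are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id INTEGER PRIMARY KEY, shoes_name TEXT)")
conn.executemany(
    "INSERT INTO shoes (id, shoes_name) VALUES (?, ?)",
    [(1, "runner"), (2, "runner"), (3, "boot"),
     (4, "sandal"), (5, "sandal"), (6, "sandal")],
)

# For each duplicated name, MIN(id) picks the one row to keep.
keep_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM shoes WHERE id IN ("
        "  SELECT * FROM ("
        "    SELECT MIN(id) FROM shoes "
        "    GROUP BY shoes_name HAVING COUNT(*) > 1"
        "  ) t2"
        ") ORDER BY id"
    )
]
print(keep_ids)  # [1, 4]
```

Id 3 is absent because "boot" is not duplicated. It is never a deletion candidate in the first place.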
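4. Delete the duplicate rows, keeping only the row with the smallest id for each shoes_name. The inner queries are wrapped in derived tables (t1, t2) because MySQL does not allow a DELETE to select directly from its own target table in a subquery.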
DELETE FROM shoes
WHERE shoes_name IN (
    SELECT * FROM (
        SELECT shoes_name FROM shoes GROUP BY shoes_name HAVING COUNT(*) > 1
    ) t1
)
AND id NOT IN (
    SELECT * FROM (
        SELECT MIN(id) FROM shoes GROUP BY shoes_name HAVING COUNT(*) > 1
    ) t2
);
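Before running the DELETE against real data, it may be worth rehearsing it on a copy. The sketch below replays the statement on a throwaway table. It assumes SQLite via Python's sqlite3 rather than MySQL, and the sample rows are made up.

```python
import sqlite3

# Hypothetical sample data; only the id and shoes_name columns are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id INTEGER PRIMARY KEY, shoes_name TEXT)")
conn.executemany(
    "INSERT INTO shoes (id, shoes_name) VALUES (?, ?)",
    [(1, "runner"), (2, "runner"), (3, "boot"),
     (4, "sandal"), (5, "sandal"), (6, "sandal")],
)

# Delete rows whose name is duplicated, except the smallest id of each name.
conn.execute(
    "DELETE FROM shoes "
    "WHERE shoes_name IN ("
    "  SELECT * FROM ("
    "    SELECT shoes_name FROM shoes "
    "    GROUP BY shoes_name HAVING COUNT(*) > 1"
    "  ) t1"
    ") AND id NOT IN ("
    "  SELECT * FROM ("
    "    SELECT MIN(id) FROM shoes "
    "    GROUP BY shoes_name HAVING COUNT(*) > 1"
    "  ) t2"
    ")"
)
conn.commit()

remaining = conn.execute(
    "SELECT id, shoes_name FROM shoes ORDER BY id"
).fetchall()
print(remaining)  # [(1, 'runner'), (3, 'boot'), (4, 'sandal')]
```

Running the query from step 1 afterwards should return no rows, which is a quick way to confirm the cleanup worked.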