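Finding, Filtering, and Deleting Duplicate Rows in MySQL: SQL Statements and Analysis
Which SQL statements are useful when a table contains duplicate rows? First, create the table: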
CREATE TABLE `user` (
`id` bigint(255) NOT NULL AUTO_INCREMENT,
`name` varchar(20) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'name',
`age` int(2) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
The table holds a handful of rows, some of which are duplicates.
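To follow along, the table needs some rows with repeated (name, age) pairs. The values below are made up for illustration and are not the original data from this post; any rows with repeated pairs would do.

```sql
-- Hypothetical sample rows (not from the original post).
-- ('Tom', 20) appears twice and ('Jerry', 18) appears three times;
-- ('Tom', 21) is unique because only the age differs.
INSERT INTO `user` (`name`, `age`) VALUES
  ('Tom', 20),
  ('Tom', 20),
  ('Jerry', 18),
  ('Tom', 21),
  ('Jerry', 18),
  ('Jerry', 18);
```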
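1. View the data with duplicates filtered out
Idea: GROUP BY can group on several columns at once, and grouping collapses duplicate rows into a single row per group. With the default settings of MySQL 5.7 and later, the statement below raises an error, because the ONLY_FULL_GROUP_BY SQL mode (enabled by default since 5.7.5) rejects selected columns that are neither listed in GROUP BY nor wrapped in an aggregate function. SQL statement: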
SELECT id,`name`,age,count(1)
FROM user GROUP BY `name`,age
Either drop id from the select list, or use a temporary workaround that sets sql_mode without ONLY_FULL_GROUP_BY:
set @@global.sql_mode ='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';
set @@SESSION.sql_mode ='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';
SELECT id,`name`,age,count(1)
FROM user GROUP BY `name`,age;
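The error comes from the ONLY_FULL_GROUP_BY SQL mode, which the sql_mode values above leave out. If changing sql_mode is not desirable, the query itself can be rewritten to run under that mode. A minimal sketch, assuming MySQL 5.7.5 or later (ANY_VALUE() does not exist in earlier versions); the column alias cnt is only illustrative:

```sql
-- Option 1: return a deterministic representative id for each group.
SELECT MIN(id) AS id, `name`, age, COUNT(*) AS cnt
FROM `user`
GROUP BY `name`, age;

-- Option 2: ANY_VALUE() tells MySQL that any id from the group is acceptable,
-- so the column is accepted even though it is not in GROUP BY.
SELECT ANY_VALUE(id) AS id, `name`, age, COUNT(*) AS cnt
FROM `user`
GROUP BY `name`, age;
```

Option 1 is usually the safer choice when the id is used later, since it always returns the same row for a given group.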
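2. View the duplicated data
The previous statement already returns the row count of each group, so the groups with a count greater than 1 are the duplicated ones: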
SELECT id,`name`,age,count(1) as c
FROM user GROUP BY `name`,age having c > 1
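The grouped query returns one row per duplicated group. To list every individual row that belongs to a duplicated group, with its own id, the grouped result can be joined back to the table. This is a sketch of one possible approach, not part of the original post; the alias names u and dup are arbitrary.

```sql
-- List every row whose (name, age) pair occurs more than once.
SELECT u.id, u.`name`, u.age
FROM `user` AS u
JOIN (
    -- Only name and age are selected here, so this subquery
    -- also runs with ONLY_FULL_GROUP_BY enabled.
    SELECT `name`, age
    FROM `user`
    GROUP BY `name`, age
    HAVING COUNT(*) > 1
) AS dup
  ON u.`name` = dup.`name`
 AND u.age = dup.age
ORDER BY u.`name`, u.age, u.id;
```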
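3. Delete the duplicates and keep one row
Idea: the duplicated rows, including their ids, can already be queried. If we select exactly one id to keep from each (name, age) group, and call that set of ids x, we can then run delete … id not in (x)
As noted above, MySQL 5.7 and later do not by default allow selecting columns outside GROUP BY, such as id, but aggregate functions over them are still allowed. Subquery 1: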
SELECT MIN(id) FROM user
GROUP BY name,age
The ids it returns belong to the rows we want to keep, one per group.
In principle this should be all that is needed: delete from user where id not in (subquery 1)
DELETE FROM user
WHERE id NOT IN (
SELECT MIN(id) FROM user
GROUP BY name,age
)
But this statement fails with an error:
> 1093 - You can't specify target table 'user' for update in FROM clause
> Time: 0.007s
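The reason is that MySQL does not allow a single statement to modify a table while also selecting from that same table in a subquery.
Fix: wrap the SELECT result in a derived table, here named temp, and select from it once more. Because the inner query uses GROUP BY and an aggregate, MySQL materializes it as a temporary result first, so the DELETE no longer reads directly from the table it is modifying and the error goes away.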
DELETE FROM user
WHERE id NOT IN (
SELECT temp.min_id FROM (
SELECT MIN(id) min_id FROM user
GROUP BY name,age
)AS temp
);
select * from user;
The delete succeeds, leaving one row for each (name, age) combination.
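Another common way to avoid error 1093 is MySQL's multi-table DELETE with a self-join, which needs no subquery at all. The sketch below is an alternative to the approach above, not the method used in this post; the aliases u1 and u2 are arbitrary.

```sql
-- Delete every row for which another row with the same (name, age)
-- and a smaller id exists. The row with the smallest id in each group
-- has no such partner, so it is kept, matching the MIN(id) approach.
DELETE u1
FROM `user` AS u1
JOIN `user` AS u2
  ON u1.`name` = u2.`name`
 AND u1.age = u2.age
 AND u1.id > u2.id;
```

On a large table, either variant is best tried inside a transaction or on a copy of the table first, since a mistake in the join or grouping condition removes rows that cannot be recovered afterwards.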