A classmate asked me today how to remove duplicate rows in MySQL, so I ran a quick test and wrote down the results.
The table structure:
mysql> desc testdelete;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| one   | varchar(40) | YES  |     | NULL    |                |
| two   | varchar(40) | YES  |     | NULL    |                |
| three | varchar(40) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
4 rows in set (0.10 sec)
The table data:
mysql> select * from testdelete;
+----+------+------+-------+
| id | one  | two  | three |
+----+------+------+-------+
|  1 | A    | A    | A     |
|  2 | B    | B    | B     |
|  3 | C    | C    | C     |
|  4 | D    | D    | D     |
|  5 | E    | E    | E     |
|  6 | A    | A    | B     |
| 12 | A    | A    | A     |
| 13 | A    | A    | A     |
| 14 | A    | A    | A     |
| 15 | A    | A    | A     |
+----+------+------+-------+
10 rows in set (0.00 sec)
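The tests below delete rows and then restore the data, so it helps to have a script that rebuilds the table. The post does not show the original CREATE TABLE statement; what follows is a sketch reconstructed from the desc output and the rows listed above. The storage engine and character set are not shown in the post, so they are left at the server defaults here.

```sql
-- Rebuild the test table from scratch (destroys any existing testdelete table).
DROP TABLE IF EXISTS testdelete;

-- Column definitions follow the desc output; INT is used instead of INT(11)
-- because the display width has no effect on the stored values.
CREATE TABLE testdelete (
  id    INT NOT NULL AUTO_INCREMENT,
  one   VARCHAR(40) DEFAULT NULL,
  two   VARCHAR(40) DEFAULT NULL,
  three VARCHAR(40) DEFAULT NULL,
  PRIMARY KEY (id)
);

-- Explicit ids reproduce the gap between 6 and 12 seen in the post.
INSERT INTO testdelete (id, one, two, three) VALUES
  (1,  'A', 'A', 'A'),
  (2,  'B', 'B', 'B'),
  (3,  'C', 'C', 'C'),
  (4,  'D', 'D', 'D'),
  (5,  'E', 'E', 'E'),
  (6,  'A', 'A', 'B'),
  (12, 'A', 'A', 'A'),
  (13, 'A', 'A', 'A'),
  (14, 'A', 'A', 'A'),
  (15, 'A', 'A', 'A');
```

Re-running the whole script is one way to carry out each "restore the data" step.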
Now for the tests:
1. Find duplicate rows based on the one column (duplicates judged by a single column)
SELECT * FROM testdelete WHERE ONE IN (SELECT ONE FROM testdelete GROUP BY ONE HAVING COUNT(ONE) > 1)
Result: the six rows whose one value is A (ids 1, 6, 12, 13, 14 and 15).
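Before deleting anything it can be useful to see a summary of the duplicate groups rather than every duplicate row. A minimal sketch, using the same table; the aliases occurrences and first_id are illustrative names, not part of the original test:

```sql
-- One line per duplicated value of `one`: how many rows share it,
-- and the smallest id in the group (the row the later DELETE keeps).
SELECT one,
       COUNT(*) AS occurrences,
       MIN(id)  AS first_id
FROM testdelete
GROUP BY one
HAVING COUNT(*) > 1;
```

With the sample data this should report a single group: one = A, with 6 occurrences and a first_id of 1.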
2. Delete the duplicate rows (by a single column, keeping the row with the smallest id):
DELETE
FROM testdelete
WHERE ONE IN (SELECT ONE
              FROM testdelete
              GROUP BY ONE
              HAVING COUNT(ONE) > 1)
  AND id NOT IN (SELECT MIN(id)
                 FROM testdelete
                 GROUP BY ONE
                 HAVING COUNT(ONE) > 1)
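This fails with MySQL error 1093: You can't specify target table 'testdelete' for update in FROM clause
Cause: MySQL does not allow a statement to modify a table while a subquery in the same statement selects from that same table.
Fix: wrap each subquery in one more layer, so that its result is first materialized as a derived table: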
DELETE
FROM testdelete
WHERE ONE IN (SELECT ONE FROM (SELECT ONE FROM testdelete GROUP BY ONE HAVING COUNT(ONE) > 1) a)
  AND id NOT IN (SELECT * FROM (SELECT MIN(id) FROM testdelete GROUP BY ONE HAVING COUNT(ONE) > 1) b)
Result:
(5 row(s) affected)
Execution Time : 00:00:00:094
Transfer Time : 00:00:00:000
Total Time : 00:00:00:094
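Checking the data again: five rows remain (ids 1 to 5), one per value of one.
Then restore the data to its original state.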
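Another way to do the single-column delete from step 2 is a self-join with MySQL's multi-table DELETE syntax, which avoids the subqueries entirely. This is a sketch of an alternative technique, not the method tested above; the aliases dup and keeper are illustrative names.

```sql
-- Delete every row for which another row has the same `one` value
-- and a smaller id. The row with the smallest id in each group has
-- no such partner, so it is kept.
DELETE dup
FROM testdelete AS dup
JOIN testdelete AS keeper
  ON dup.one = keeper.one
 AND dup.id  > keeper.id;
```

On the sample data this should remove the same five rows (ids 6, 12, 13, 14 and 15). Rows where one is NULL never match the join condition, so they are left untouched, just as with the IN-based version.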
3. Find duplicates based on one, two and three (duplicates judged by multiple columns)
SELECT * FROM testdelete a
WHERE (a.one,a.two,a.three) IN (SELECT ONE,two,three FROM testdelete GROUP BY ONE,two,three HAVING COUNT(*) > 1)
Result: the five rows where (one, two, three) is (A, A, A), that is ids 1, 12, 13, 14 and 15.
4. Delete the duplicate rows (by multiple columns, keeping the row with the smallest id in each group)
DELETE FROM testdelete
WHERE (ONE,two,three) IN (SELECT ONE, two, three FROM (SELECT ONE, two, three FROM testdelete GROUP BY ONE,two,three HAVING COUNT(*) > 1) a)
  AND id NOT IN (SELECT id FROM (SELECT MIN(id) AS id FROM testdelete GROUP BY ONE,two,three HAVING COUNT(*) > 1) b)
Result:
(4 row(s) affected)
Execution Time : 00:00:00:125
Transfer Time : 00:00:00:000
Total Time : 00:00:00:125
Checking the data: six rows remain (ids 1 to 6).
Restore the data again.
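The multi-column delete from step 4 can also be written as a join against a derived table that holds the id to keep for each duplicate group. This is a sketch of an alternative, not the statement tested above; the aliases t and d and the column name keep_id are illustrative.

```sql
-- d lists each duplicated (one, two, three) combination together with
-- the smallest id in that group. Because of the GROUP BY, MySQL
-- materializes d before the delete runs.
DELETE t
FROM testdelete AS t
JOIN (SELECT one, two, three, MIN(id) AS keep_id
      FROM testdelete
      GROUP BY one, two, three
      HAVING COUNT(*) > 1) AS d
  ON t.one   = d.one
 AND t.two   = d.two
 AND t.three = d.three
WHERE t.id <> d.keep_id;
```

Each group carries its own keep_id, so the smallest id of every duplicate group survives even when the table contains several different duplicated combinations. On the sample data this should delete ids 12, 13, 14 and 15.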
5. Find the redundant duplicate rows (by multiple columns), excluding the row with the smallest id in each group
SELECT * FROM testdelete a WHERE (a.one,a.two,a.three)IN(SELECT ONE, two, three FROM testdelete GROUP BY ONE,two,three HAVING COUNT( * ) > 1) AND id NOT IN(SELECT MIN(id) AS id FROM testdelete GROUP BY ONE,two,three HAVING COUNT( * ) > 1)
Result: the four rows with ids 12, 13, 14 and 15.
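On MySQL 8.0 or later, which is an assumption since the post does not state the server version, the same "duplicates except the first" lookup can be sketched with a window function. The alias ranked and the column name rn are illustrative.

```sql
-- Number the rows inside each (one, two, three) group by ascending id.
-- rn = 1 is the row with the smallest id; anything with rn > 1 is a
-- redundant duplicate.
SELECT id, one, two, three
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY one, two, three
                                ORDER BY id) AS rn
      FROM testdelete AS t) AS ranked
WHERE rn > 1;
```

Unlike the IN-based query, this version also treats rows whose compared columns are NULL as belonging to the same group, because PARTITION BY groups NULL values together.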