Adding a unique index to a table with lots of duplicate data

Reposted article, originally published 2013-09-13 15:07. Tags: MySQL, alter ignore, unique index, alter ignore table

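I ran into the scenario in the title: I needed to add a unique index (idx_col1_u) to an InnoDB table (tableA) in MySQL, but the table already contained a large amount of duplicate data. For a given key (col1), some values were repeated in 2 rows and others in N rows.

Cleaning the data up by hand, or with hand-written SQL, would clearly be very time-consuming.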
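Before choosing an approach, it can help to see how widespread the duplication actually is. The query below is a minimal sketch using the tableA and col1 names from the scenario above; the dup_count alias is only an illustrative name.

```sql
-- List every col1 value that appears more than once,
-- with the most heavily duplicated values first.
SELECT col1, COUNT(*) AS dup_count
FROM tableA
GROUP BY col1
HAVING COUNT(*) > 1
ORDER BY dup_count DESC;
```

On a large table without an index on col1 this scans the whole table, so it may be worth running against a replica or a copy.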

1. ALTER IGNORE TABLE to the rescue

I remembered that MySQL has its own ALTER IGNORE ... ADD UNIQUE INDEX syntax.

The syntax is:

ALTER [ONLINE | OFFLINE] [IGNORE] TABLE tbl_name

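It behaves much like INSERT IGNORE: rows that conflict on a unique key are silently discarded instead of raising an error. When adding a unique index, this amounts to creating an empty table, adding the unique index to it, and copying the old data into the new table with INSERT IGNORE semantics, dropping any row that conflicts.

The documentation's note on ALTER IGNORE (see http://dev.mysql.com/doc/refman/5.1/en/alter-table.html):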
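The copy-and-ignore behavior described above can also be reproduced by hand. The following is a rough sketch, not a description of what the server does internally: tableA_new and tableA_old are hypothetical scratch names, and it assumes that keeping whichever row is read first for each col1 value is acceptable.

```sql
-- 1. Create an empty copy with the same columns and indexes.
CREATE TABLE tableA_new LIKE tableA;

-- 2. Add the unique index while the copy is still empty.
ALTER TABLE tableA_new ADD UNIQUE INDEX idx_col1_u (col1);

-- 3. Copy the rows; rows that would violate the unique index are skipped.
--    Without an ORDER BY, which duplicate survives depends on scan order.
INSERT IGNORE INTO tableA_new SELECT * FROM tableA;

-- 4. Swap the tables in a single statement, keeping the original as a backup.
RENAME TABLE tableA TO tableA_old, tableA_new TO tableA;
```

Rows written to tableA between step 3 and step 4 would be lost, so this sketch assumes writes are paused for the duration.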

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

2. #1062 - Duplicate entry

However, after running alter ignore table tableA add unique index idx_col1_u (col1), I still got the following error:

#1062 - Duplicate entry '111' for key 'col1'.

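Wasn't it supposed to discard the duplicates automatically? My worldview was shaken. After some digging, it turned out that InnoDB does not honor the IGNORE keyword when adding a unique index.

How ALTER IGNORE is carried out depends entirely on the storage engine's internal implementation; it is not enforced by the server. The description reads: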

For ALTER TABLE with the IGNORE keyword, IGNORE is now part of the
information provided to the storage engine. It is up to the storage
engine whether to use this when choosing between the in-place or copy
algorithm for altering the table. For InnoDB index operations, IGNORE 
is not used if the index is unique, so the copy algorithm is used

See: http://bugs.mysql.com/bug.php?id=40344

3. Solution

There is, of course, a tricky way around this problem, and it is fairly blunt:

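-- Temporarily switch to MyISAM, which honors IGNORE for unique indexes.
ALTER TABLE tableA ENGINE MyISAM;
-- Add the unique index; conflicting duplicate rows are dropped.
ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);
-- Switch the table back to InnoDB.
ALTER TABLE tableA ENGINE InnoDB;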

Updated on 2013-09-26:

@jyzhou pointed out that there is no need to switch to MyISAM; you can use set old_alter_table = 1; directly instead. The steps are:

set old_alter_table = 1;

ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);

How it works: http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_old_alter_table
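A slightly more defensive variant of the old_alter_table approach might look like the sketch below. It assumes that changing the variable for the current session only is enough, and that its previous value was the default (OFF). Note also that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this applies only to older servers.

```sql
-- Force the table-copy algorithm for this session only.
SET SESSION old_alter_table = 1;

ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);

-- Restore the assumed default so later ALTERs in this session are unaffected.
SET SESSION old_alter_table = 0;

-- Sanity checks: the index should exist with Non_unique = 0 ...
SHOW INDEX FROM tableA WHERE Key_name = 'idx_col1_u';

-- ... and this query should now return no rows.
SELECT col1, COUNT(*) AS dup_count
FROM tableA
GROUP BY col1
HAVING COUNT(*) > 1;
```

Because IGNORE deletes rows silently, taking a backup of tableA before running either variant is a sensible precaution.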
