mysql中數據去重和優化

本文轉載自查看原文 2013-05-15 15:00 16587 隨手記下

更改表user_info的主鍵uid為自增的id后，忘了設置原來主鍵uid屬性為unique，結果導致產生uid重復的記錄。為此需要清理后來插入的重復記錄。

基本方法可以參考后面的附上的資料，但是由於mysql不支持同時對一個表進行操作，即子查詢和要進行的操作不能是同一個表，因此需要通過零時表中轉一下。

寫在前面：數據量大時，一定要多涉及的關鍵字段創建索引！！！否則很慢很慢很慢，慢到想死的心都有了

1 單字段重復

生成零時表，其中uid是需要去重的字段

create table tmp_uid as (select uid from user_info group by uid having count(uid))

create table tmp_id as (select min(id) from user_info group by uid having count(uid))

數據量大時一定要為uid創建索引

create index index_uid on tmp_uid

create index index_id on tmp_id

刪除多余的重復記錄，保留重復項中id最小的

delete from user_info
where id not in (select id from tmp_id)
and uid in (select uid from tmp_uid)

2.多字段重復

由uid的重復間接的導致了relationship中的記錄重復，故繼續去重。先介紹正常處理流程，在介紹本人根據自身數據特點實踐的更加有效的方法！

2.1一般方法

基本的同上面：

生成零時表

create table tmp_relation as (select source,target from relationship group by source,target having count(*)>1)

create table tmp_relationship_id as (select min(id) as id from relationship group by source,target having count(*)>1)

創建索引

create index index_id on tmp_relationship_id

刪除

delete from relationship
where id not in (select id from tmp_relationship_id)
and (source,target) in (select source,target from relationship)

2.2 實踐出真知

實踐中發現上面的刪除字段重復的方法，由於沒有辦法為多字段重建索引，導致數據量大時效率極低，低到無法忍受。最后，受不了等了半天沒反應的狀況，本人決定，另辟蹊徑。

考慮到，估計同一記錄的重復次數比較低。一般為2，或3，重復次數比較集中。所以可以嘗試直接刪除重復項中最大的，直到刪除到不重復，這時其id自然也是當時重復的里邊最小的。

大致流程如下：

1）選擇每個重復項中id最大的一個記錄

create table tmp_relation_id2 as (select max(id) from relationship group by source,target having count(*)>1)

2）創建索引（僅需在第一次時執行）

create index index_id on tmp_relation_id2

3）刪除重復項中id最大的記錄

delete from relationship where id in (select id from tmp_relation_id2)

4）刪除臨時表

drop table tmp_relation_id2

重復上述步驟1），2），3），4），直到創建的臨時表中不存在記錄就結束（對於重復次數的數據，比較高效）

參考：

查詢及刪除重復記錄的方法 http://wenku.baidu.com/view/d4b3f134b90d6c85ec3ac6ad.html

查詢及刪除重復記錄的方法

(一) 1、查找表中多余的重復記錄，重復記錄是根據單個字段（peopleId）來判斷 select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

2、刪除表中多余的重復記錄，重復記錄是根據單個字段（peopleId）來判斷，只留有rowid最小的記錄 delete from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1) and rowid not in (select min(rowid) from people group by peopleId having count(peopleId )>1)

3、查找表中多余的重復記錄（多個字段） select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)

4、刪除表中多余的重復記錄（多個字段），只留有rowid最小的記錄 delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 去重查詢表mysql 中數據 mysql去重數據 MySQL中在原表中做數據去重（按日期去重，保留id最小的記錄） mysql對表中數據根據某一字段去重 Mysql數據表去重 mysql數據庫之去重 mysql之數據去重並記錄總數 MySql數據庫去重 list對象中的數據如何去重呢？ mySql數據重復數據去重