Deleting Duplicate Rows in MySQL


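I have recently been working on a multithreaded crawler. The work queue contained duplicate entries, and although the program checks that a record does not exist before inserting it, several threads running concurrently still left some duplicate rows in the database.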

 

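The bug in the program has been fixed, but crawling everything again would cost a lot of time and effort, so I chose to delete the duplicates and keep just one valid row for each record.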

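The idea is to group the rows by the combination of columns that uniquely identifies a record (here code, year and report_type), and then keep only one valid row per group.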

1. Find the duplicate rows

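-- List every row whose (code, year, report_type) combination occurs more than once.
SELECT *
FROM ZYZBBData
WHERE (code, year, report_type) IN (
    SELECT code, year, report_type
    FROM (
        -- Keys that appear on more than one row
        SELECT code, year, report_type
        FROM ZYZBBData
        GROUP BY code, year, report_type
        HAVING COUNT(*) > 1
    ) a
);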
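Before deleting anything, it can help to see how large each duplicate group is. The following is a minimal sketch using the same grouping columns; the aliases cnt and keep_id are names picked here for illustration only.

```sql
-- One line per duplicated key: how many copies exist, and which id is the smallest.
SELECT code, year, report_type,
       COUNT(*) AS cnt,      -- number of copies of this key
       MIN(id)  AS keep_id   -- smallest id in the group
FROM ZYZBBData
GROUP BY code, year, report_type
HAVING COUNT(*) > 1
ORDER BY cnt DESC;
```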

 

2. Keep only the row with the smallest id in each group, and select the rows that are to be deleted

-- Same filter as step 1, minus the row with the smallest id in each duplicate group.
SELECT *
FROM ZYZBBData
WHERE (code, year, report_type) IN (
    SELECT code, year, report_type
    FROM (
        SELECT code, year, report_type
        FROM ZYZBBData
        GROUP BY code, year, report_type
        HAVING COUNT(*) > 1
    ) a
)
AND id NOT IN (
    SELECT id
    FROM (
        -- The row to keep for each duplicated key
        SELECT MIN(id) AS id
        FROM ZYZBBData
        GROUP BY code, year, report_type
        HAVING COUNT(*) > 1
    ) b
);
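As a sanity check, the number of rows returned by the query above should equal the total row count minus the number of distinct keys. This is only a sketch, and it assumes that none of the three key columns contains NULL, because COUNT(DISTINCT ...) skips such rows; the alias surplus_rows is illustrative.

```sql
-- Rows that exceed one-per-key, i.e. how many rows the delete should remove.
SELECT COUNT(*) - COUNT(DISTINCT code, year, report_type) AS surplus_rows
FROM ZYZBBData;
```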

 

3. Delete the duplicate rows

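-- Same conditions as step 2, with SELECT * replaced by DELETE.
-- The derived tables a and b are needed here: MySQL rejects a DELETE whose
-- subquery reads directly from the table being deleted from (error 1093).
DELETE
FROM ZYZBBData
WHERE (code, year, report_type) IN (
    SELECT code, year, report_type
    FROM (
        SELECT code, year, report_type
        FROM ZYZBBData
        GROUP BY code, year, report_type
        HAVING COUNT(*) > 1
    ) a
)
AND id NOT IN (
    SELECT id
    FROM (
        SELECT MIN(id) AS id
        FROM ZYZBBData
        GROUP BY code, year, report_type
        HAVING COUNT(*) > 1
    ) b
);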
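An alternative that is often used for this kind of cleanup is a multi-table DELETE with a self-join, which avoids the nested derived tables. This is a sketch and not the method used above. It assumes the table uses a transactional engine such as InnoDB, so the result can be checked before committing; the aliases dup and keeper are illustrative.

```sql
START TRANSACTION;

-- Delete every row for which another row with the same key and a smaller id exists.
DELETE dup
FROM ZYZBBData AS dup
JOIN ZYZBBData AS keeper
  ON  dup.code        = keeper.code
  AND dup.year        = keeper.year
  AND dup.report_type = keeper.report_type
  AND dup.id          > keeper.id;

-- Should return no rows if every key is now unique.
SELECT code, year, report_type, COUNT(*) AS cnt
FROM ZYZBBData
GROUP BY code, year, report_type
HAVING COUNT(*) > 1;

COMMIT;  -- or ROLLBACK; if the check above does not look right
```

Like the tuple IN comparison, the equality join never matches rows whose key columns are NULL, so such rows are left untouched by either form.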

 

After the delete, the data is as expected.
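Since the duplicates came from concurrent check-then-insert logic, one common safeguard is to let the database enforce uniqueness once the table is clean. The following is a sketch under assumptions: the index name uk_code_year_report_type and the sample values are made up for illustration, and the real table presumably has more columns than the three shown.

```sql
-- Fails if duplicates still exist, so run it only after the cleanup.
ALTER TABLE ZYZBBData
  ADD UNIQUE KEY uk_code_year_report_type (code, year, report_type);

-- With the unique key in place, inserting an existing key is skipped
-- (with a warning) instead of creating a duplicate row.
-- The values below are hypothetical placeholders.
INSERT IGNORE INTO ZYZBBData (code, year, report_type)
VALUES ('600000', 2017, 'annual');
```

With this in place the application-side existence check becomes an optimization, not the only line of defense against concurrent inserts.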

 

