一:重復數據根據單個字段進行判斷
1、首先,查詢表中多余的數據,由關鍵字段(name)來查詢。
select * from OA_ADDRESS_BOOK where name in (select name from OA_ADDRESS_BOOK group by name having count(name)>1)
2、刪除表中重復數據,重復數據是根據單個字段(Name)來判斷,只留有rowid最小的記錄
delete from OA_ADDRESS_BOOK where (Name) in
(select Name from OA_ADDRESS_BOOK group by Name having count(Name) >1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name having count(Name)>1)
二:重復數據根據多個字段進行判斷
1、首先,查詢表中重復數據,由關鍵字段(Name,UNIT_ID)來查詢。
select * from OA_ADDRESS_BOOK book1 where (book1.name,book1.unit_id) in
(select book2.name,book2.unit_id from OA_ADDRESS_BOOK book2 group by book2.name,book2.unit_id having count(*)>1)
2、刪除表中重復數據,重復數據是根據多個字段(Name,UNIT_ID)來判斷,只留有rowid最小的記錄
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)
3、查詢表中重復數據,重復數據是根據多個字段(Name,UNIT_ID)來判斷,不包含rowid最小的記錄
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)
1. 問題描述
BBSCOMMENT表為BBSDETAIL的從表,記錄商戶評價信息。因為數據倒騰來倒騰去的,有很多重復數據。表結構如下:
COMMENT_ID NOT NULL NUMBER --主鍵
DETAIL_ID NOT NULL NUMBER --外鍵,引用BBSDETAIL表
COMMENT_BODY NOT NULL VARCHAR2(500) --評價內容
--其它字段忽略
其中主鍵是沒有重復的,重復的是DETAIL_ID+COMMENT_BODY+......等信息,就是某些商家的評價信息有重復。
2. 解決步驟
2.1 查找表中多余的重復記錄
--查詢出所有有重復的數據 select DETAIL_ID,COMMENT_BODY,count(*) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*)>1 order by DETAIL_ID, COMMENT_BODY; --1955條
2.2 顯示了所有的非冗余的數據
--這一條命令顯示了所有的非冗余的數據 select min(COMMENT_ID) as COMMENT_ID,DETAIL_ID,COMMENT_BODY from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY; --21453條,之所以此值不等於表總記錄數-1955,是因為1955條記錄中,有的重復了不止一次。
2.3 如果記錄數量少(千級別),可以把上面的語句做成子查詢然后直接刪除
--如果表數據量不是很大(1千條以內),可以把上面的語句做成子查詢然后直接刪除 delete from BBSCOMMENT where COMMENT_ID not in( select min(COMMENT_ID) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY ); --782秒,在我這里,2萬條記錄,重復記錄2千多(太慢了!!)
2.4 另一種刪除方法
--這條語句也能夠實現上述功能,但不好測試了,數據已經被我刪除了 --刪除條件一:有重復數據的記錄;條件二:保留最小rowid的記錄。 delete from BBSCOMMENT a where (a.DETAIL_ID,a.COMMENT_BODY) in(select DETAIL_ID,COMMENT_BODY from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*) > 1) and rowid not in (select min(rowid) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*)>1);
2.5 大數據量還是用PL/SQL方便快捷
declare --定義存儲結構 type bbscomment_type is record ( comment_id BBSCOMMENT.COMMENT_ID%type, detail_id BBSCOMMENT.DETAIL_ID%type, comment_body BBSCOMMENT.COMMENT_BODY%type ); bbscomment_record bbscomment_type; --可供比較的變量 v_comment_id BBSCOMMENT.COMMENT_ID%type; v_detail_id BBSCOMMENT.DETAIL_ID%type; v_comment_body BBSCOMMENT.COMMENT_BODY%type; --其它變量 v_batch_size integer := 5000; v_counter integer := 0; cursor cur_dupl is --取出所有有重復的記錄 select COMMENT_ID, DETAIL_ID, COMMENT_BODY from BBSCOMMENT where(DETAIL_ID, COMMENT_BODY) in ( --這些記錄有重復 select DETAIL_ID, COMMENT_BODY from BBSCOMMENT group by DETAIL_ID, COMMENT_BODY having count(*) > 1) order by DETAIL_ID, COMMENT_BODY; begin for bbscomment_record in cur_dupl loop if v_detail_id is null or (bbscomment_record.detail_id != v_detail_id or nvl(bbscomment_record.comment_body, ' ') != nvl(v_comment_body, ' ')) then --首次進入、換記錄了,都重新賦值 v_detail_id := bbscomment_record.detail_id; v_comment_body := bbscomment_record.comment_body; else --其它記錄刪除 delete from BBSCOMMENT where COMMENT_ID = bbscomment_record.comment_id; v_counter := v_counter + 1; if mod(v_counter, v_batch_size) = 0 then --每多少條提交一次 commit; end if; end if; end loop; if v_counter > 0 then --最后一次提交 commit; end if; dbms_output.put_line(to_char(v_counter)||'條記錄被刪除!'); exception when others then dbms_output.put_line('sqlerrm-->' ||sqlerrm); rollback; end;