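Note the preconditions that apply to all of the data comparison methods below:
1. The tables being compared have identical structures.
2. During the comparison, the data in both the source and target tables is static, with no DML changes.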
Method 1:
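If at least one of the databases involved in the comparison is 11g and the table has a primary key index or a unique index on NOT NULL columns (unique key NOT NULL), then congratulations! You can use dbms_comparison, the PL/SQL package introduced in 11g specifically for data comparison, to validate the data, as demonstrated below: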
Source: the source database is 11gR2:
conn maclean/maclean
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.oracledatabase12g.com & www.askmaclean.com
drop table test1;
create table test1 tablespace users as select object_id t1,object_name t2 from dba_objects where object_id is not null;
alter table test1 add primary key(t1);
exec dbms_stats.gather_table_stats('MACLEAN','TEST1',cascade=>TRUE);
create database link maclean connect to maclean identified by maclean using 'G10R21';
Database link created.
Above, the source database is version 11.2.0.3 and the source table is test1(t1 number primary key, t2 varchar2(128)); it connects through a database link to the target database, version 10.2.0.1.
conn maclean/maclean
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
create table test2 tablespace users as select object_id t1,object_name t2
from dba_objects where object_id is not null;
alter table test2 add primary key(t1);
exec dbms_stats.gather_table_stats('MACLEAN','TEST2',cascade=>TRUE);
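The target database is version 10.2.0.1 and the table is test2(t1 number primary key, t2 varchar2(128)).
Note that both tables must have the same primary key index or pseudo-primary key index (a pseudo-primary key must be a unique key whose member columns are all NOT NULL).
Create the comparison object and run the check: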
begin
dbms_comparison.create_comparison(comparison_name => 'MACLEAN_TEST_COM',
schema_name => 'MACLEAN',
object_name => 'TEST1',
dblink_name => 'MACLEAN',
remote_schema_name => 'MACLEAN',
remote_object_name => 'TEST2',
scan_mode => dbms_comparison.CMP_SCAN_MODE_FULL);
end;
/
PL/SQL procedure successfully completed.
SQL> set linesize 80 pagesize 1400
SQL> select * from user_comparison where comparison_name='MACLEAN_TEST_COM';
COMPARISON_NAME COMPA SCHEMA_NAME
------------------------------ ----- ------------------------------
OBJECT_NAME OBJECT_TYPE REMOTE_SCHEMA_NAME
------------------------------ ----------------- ------------------------------
REMOTE_OBJECT_NAME REMOTE_OBJECT_TYP
------------------------------ -----------------
DBLINK_NAME
--------------------------------------------------------------------------------
SCAN_MODE SCAN_PERCENT
--------- ------------
CYCLIC_INDEX_VALUE
--------------------------------------------------------------------------------
NULL_VALUE
--------------------------------------------------------------------------------
LOCAL_CONVERGE_TAG
--------------------------------------------------------------------------------
REMOTE_CONVERGE_TAG
--------------------------------------------------------------------------------
MAX_NUM_BUCKETS MIN_ROWS_IN_BUCKET
--------------- ------------------
LAST_UPDATE_TIME
---------------------------------------------------------------------------
MACLEAN_TEST_COM TABLE MACLEAN
TEST1 TABLE MACLEAN
TEST2 TABLE
MACLEAN
FULL
ORA$STREAMS$NV
1000 10000
20-DEC-11 01.08.44.562092 PM
After a comparison is created with dbms_comparison.create_comparison, it appears in the user_comparison view.
At this point the comparison exists, but no actual check has run yet. We use event 10046 to trace the comparison process:
conn maclean/maclean
set timing on;
alter system flush shared_pool;
alter session set events '10046 trace name context forever,level 8';
set serveroutput on
DECLARE
retval dbms_comparison.comparison_type;
BEGIN
IF dbms_comparison.compare('MACLEAN_TEST_COM', retval, perform_row_dif => TRUE) THEN
dbms_output.put_line('No Differences');
ELSE
dbms_output.put_line('Differences Found');
END IF;
END;
/
Differences Found =====> the result is Differences Found, which means the data in the two tables differs
PL/SQL procedure successfully completed.
Elapsed: 00:00:10.87
===========================10046 tkprof result =========================
SELECT MIN("T1"), MAX("T1")
FROM
"MACLEAN"."TEST1"
SELECT MIN("T1"), MAX("T1")
FROM
"MACLEAN"."TEST2"@MACLEAN
SELECT COUNT(1)
FROM
"MACLEAN"."TEST1" s WHERE ("T1" >= :scan_min AND "T1" <= :scan_max )
SELECT COUNT(1)
FROM
"MACLEAN"."TEST2"@MACLEAN s WHERE ("T1" >= :scan_min AND "T1" <= :scan_max )
SELECT q.wb1, min(q."T1") min_range1, max(q."T1") max_range1, count(*)
num_rows, sum(q.s_hash) sum_range_hash
FROM
(SELECT /*+ FULL(s) */ width_bucket(s."T1", :scan_min1, :scan_max_inc1,
:num_buckets) wb1, s."T1", ora_hash(NVL(to_char(s."T1"), 'ORA$STREAMS$NV'),
4294967295, ora_hash(NVL((s."T2"), 'ORA$STREAMS$NV'), 4294967295, 0))
s_hash FROM "MACLEAN"."TEST1" s WHERE (s."T1">=:scan_min1 AND s."T1"<=
:scan_max1) ) q GROUP BY q.wb1 ORDER BY q.wb1
SELECT /*+ REMOTE_MAPPED */ q.wb1, min(q."T1") min_range1, max(q."T1")
max_range1, count(*) num_rows, sum(q.s_hash) sum_range_hash
FROM
(SELECT /*+ FULL(s) REMOTE_MAPPED */ width_bucket(s."T1", :scan_min1,
:scan_max_inc1, :num_buckets) wb1, s."T1", ora_hash(NVL(to_char(s."T1"),
'ORA$STREAMS$NV'), 4294967295, ora_hash(NVL((s."T2"), 'ORA$STREAMS$NV'),
4294967295, 0)) s_hash FROM "MACLEAN"."TEST2"@MACLEAN s WHERE (s."T1">=
:scan_min1 AND s."T1"<=:scan_max1) ) q GROUP BY q.wb1 ORDER BY q.wb1
SELECT /*+ FULL(P) +*/ * FROM "MACLEAN"."TEST2" P
SELECT /*+ FULL ("A1") */
WIDTH_BUCKET("A1"."T1", :SCAN_MIN1, :SCAN_MAX_INC1, :NUM_BUCKETS),
MIN("A1"."T1"),
MAX("A1"."T1"),
COUNT(*),
SUM(ORA_HASH(NVL(TO_CHAR("A1"."T1"), 'ORA$STREAMS$NV'),
4294967295,
ORA_HASH(NVL("A1"."T2", 'ORA$STREAMS$NV'), 4294967295, 0)))
FROM "MACLEAN"."TEST2" "A1"
WHERE "A1"."T1" >= :SCAN_MIN1
AND "A1"."T1" <= :SCAN_MAX1
GROUP BY WIDTH_BUCKET("A1"."T1", :SCAN_MIN1, :SCAN_MAX_INC1, :NUM_BUCKETS)
ORDER BY WIDTH_BUCKET("A1"."T1", :SCAN_MIN1, :SCAN_MAX_INC1, :NUM_BUCKETS)
SELECT ROWID, "T1", "T2"
FROM "MACLEAN"."TEST2" "R"
WHERE "T1" >= :1
AND "T1" <= :2
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 126 | 3528 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST2 | 126 | 3528 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C006255 | 227 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER(:1)<=TO_NUMBER(:2))
3 - access("T1">=TO_NUMBER(:1) AND "T1"<=TO_NUMBER(:2))
SELECT ll.l_rowid, rr.r_rowid, NVL(ll."T1", rr."T1") idx_val
FROM
(SELECT l.rowid l_rowid, l."T1", ora_hash(NVL(to_char(l."T1"),
'ORA$STREAMS$NV'), 4294967295, ora_hash(NVL((l."T2"), 'ORA$STREAMS$NV'),
4294967295, 0)) l_hash FROM "MACLEAN"."TEST1" l WHERE l."T1">=:scan_min1
AND l."T1"<=:scan_max1 ) ll FULL OUTER JOIN (SELECT /*+ NO_MERGE
REMOTE_MAPPED */ r.rowid r_rowid, r."T1", ora_hash(NVL(to_char(r."T1"),
'ORA$STREAMS$NV'), 4294967295, ora_hash(NVL((r."T2"), 'ORA$STREAMS$NV'),
4294967295, 0)) r_hash FROM "MACLEAN"."TEST2"@MACLEAN r WHERE r."T1">=
:scan_min1 AND r."T1"<=:scan_max1 ) rr ON ll."T1"=rr."T1" WHERE ll.l_hash
IS NULL OR rr.r_hash IS NULL OR ll.l_hash <> rr.r_hash
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 190 | 754K| 9 (12)| 00:00:01 | | |
|* 1 | VIEW | VW_FOJ_0 | 190 | 754K| 9 (12)| 00:00:01 | | |
|* 2 | HASH JOIN FULL OUTER | | 190 | 754K| 9 (12)| 00:00:01 | | |
| 3 | VIEW | | 190 | 7220 | 4 (0)| 00:00:01 | | |
|* 4 | FILTER | | | | | | | |
| 5 | TABLE ACCESS BY INDEX ROWID| TEST1 | 190 | 5510 | 4 (0)| 00:00:01 | | |
|* 6 | INDEX RANGE SCAN | SYS_C0013098 | 341 | | 2 (0)| 00:00:01 | | |
| 7 | VIEW | | 126 | 495K| 4 (0)| 00:00:01 | | |
| 8 | REMOTE | TEST2 | 126 | 3528 | 4 (0)| 00:00:01 | MACLE~ | R->S |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LL"."L_HASH" IS NULL OR "RR"."R_HASH" IS NULL OR "LL"."L_HASH"<>"RR"."R_HASH")
2 - access("LL"."T1"="RR"."T1")
4 - filter(TO_NUMBER(:SCAN_MIN1)<=TO_NUMBER(:SCAN_MAX1))
6 - access("L"."T1">=TO_NUMBER(:SCAN_MIN1) AND "L"."T1"<=TO_NUMBER(:SCAN_MAX1))
Remote SQL Information (identified by operation id):
----------------------------------------------------
8 - SELECT ROWID,"T1","T2" FROM "MACLEAN"."TEST2" "R" WHERE "T1">=:1 AND "T1"<=:2 (accessing
'MACLEAN' )
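As the trace shows, the process does not avoid full table scans (FULL TABLE SCAN) of TEST1 and TEST2: the per-bucket count and hash-sum queries read both tables in full. Fortunately, only the rows from buckets whose hash sums differ, fetched through index range scans, take part in the HASH JOIN FULL OUTER, so the overall efficiency is still quite good.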
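The traced SQL suggests the overall flow: split the key range into buckets with width_bucket, compare each bucket's row count and hash sum on both sides, and only do row-level matching inside buckets that disagree. The following Python sketch imitates that flow on two in-memory dicts keyed by the primary key. It is an illustration under assumptions, not Oracle's implementation: zlib.crc32 stands in for ora_hash, width_bucket is simplified, and the function names are hypothetical.

```python
import zlib

NULL_MARK = "ORA$STREAMS$NV"  # NULL placeholder used in the traced SQL


def row_hash(key, value):
    """Stand-in for the nested ora_hash() in the trace (crc32, NOT ora_hash)."""
    inner = zlib.crc32((NULL_MARK if value is None else str(value)).encode())
    return zlib.crc32(str(key).encode(), inner)


def width_bucket(x, lo, hi_inc, n):
    """Simplified WIDTH_BUCKET: n equal-width buckets numbered 1..n over [lo, hi_inc)."""
    return int((x - lo) * n / (hi_inc - lo)) + 1


def bucket_summary(table, lo, hi, n):
    """Per bucket: [min key, max key, row count, sum of row hashes]."""
    summary = {}
    for key, value in table.items():
        s = summary.setdefault(width_bucket(key, lo, hi + 1, n), [key, key, 0, 0])
        s[0], s[1] = min(s[0], key), max(s[1], key)
        s[2] += 1
        s[3] += row_hash(key, value)
    return summary


def compare(local, remote, n=10):
    """Return sorted keys whose rows differ, doing row-level work only in mismatched buckets."""
    keys = set(local) | set(remote)
    lo, hi = min(keys), max(keys)
    ls = bucket_summary(local, lo, hi, n)
    rs = bucket_summary(remote, lo, hi, n)
    diffs = set()
    for b in set(ls) | set(rs):
        l, r = ls.get(b), rs.get(b)
        if l is not None and r is not None and l[2:] == r[2:]:
            continue  # same row count and hash sum: treat the bucket as equal
        # Row-level pass over this bucket's key range (the FULL OUTER JOIN step)
        b_lo = min(s[0] for s in (l, r) if s is not None)
        b_hi = max(s[1] for s in (l, r) if s is not None)
        for k in (k for k in keys if b_lo <= k <= b_hi):
            if k not in local or k not in remote or row_hash(k, local[k]) != row_hash(k, remote[k]):
                diffs.add(k)
    return sorted(diffs)


local = {i: f"OBJ{i}" for i in range(1, 101)}
remote = dict(local)
remote[46] = "obj46"   # same key, value changed
del remote[90]         # missing on the remote side
remote[150] = "EXTRA"  # present only on the remote side
print(compare(local, remote))  # [46, 90, 150]
```

Equal count and hash sum are taken as "bucket matches", which is the same trade-off the real package makes: a hash collision inside a bucket would go unnoticed.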
In addition, user_comparison_row_dif shows which rows actually differ, for example:
SQL> set linesize 80 pagesize 1400
SQL> select *
  2    from user_comparison_row_dif
  3   where comparison_name = 'MACLEAN_TEST_COM'
  4     and rownum < 2;
COMPARISON_NAME                   SCAN_ID LOCAL_ROWID        REMOTE_ROWID
------------------------------ ---------- ------------------ ------------------
INDEX_VALUE
--------------------------------------------------------------------------------
STA LAST_UPDATE_TIME
--- ---------------------------------------------------------------------------
MACLEAN_TEST_COM                       42 AAATWGAAEAAANBrAAB AAANJrAAEAAB8AMAAd
46
DIF 20-DEC-11 01.18.08.917257 PM
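The above used the dbms_comparison package for a simple data comparison. This method requires 11g or later on the database that runs the comparison (the remote side in this demo is 10.2) and a primary key index or a unique index on NOT NULL columns; it also does not support comparing columns of the following data types: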
- LONG
- LONG RAW
- ROWID
- UROWID
- CLOB
- NCLOB
- BLOB
- BFILE
- User-defined types (including object types, REFs, varrays, and nested tables)
- Oracle-supplied types (including any types, XML types, spatial types, and media types)
To compare tables that contain columns of these types, specify the column_list parameter in create_comparison to exclude those columns.
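Building the column_list value by hand is error-prone for wide tables. The sketch below is a hypothetical helper that takes (column name, data type) pairs, for example as read from user_tab_columns, and drops the built-in types listed above. Object, XML, spatial and media types are not detected by name here and would need a separate check; the key column(s) should stay in the list, since the comparison is driven by that index.

```python
# Built-in types from the list above that dbms_comparison cannot compare
UNSUPPORTED_TYPES = {"LONG", "LONG RAW", "ROWID", "UROWID", "CLOB", "NCLOB", "BLOB", "BFILE"}


def comparison_column_list(columns):
    """Build a comma-separated column_list, dropping unsupported built-in types.

    columns: (column_name, data_type) pairs, e.g. as read from user_tab_columns.
    Object, XML, spatial and media types are NOT detected here.
    """
    return ",".join(name for name, data_type in columns
                    if data_type.upper() not in UNSUPPORTED_TYPES)


cols = [("T1", "NUMBER"), ("T2", "VARCHAR2"), ("DOC", "CLOB"), ("RAW_DATA", "LONG RAW")]
print(comparison_column_list(cols))  # T1,T2
```

The resulting string would then be passed as column_list => 'T1,T2' to create_comparison.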
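The advantage of Method 1, dbms_comparison, is that it provides detailed comparison information and is fairly efficient when suitable indexes exist.
Its drawbacks are the database version requirement (at least 11gR1) and the lack of support for comparing LONG, CLOB, and similar columns.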
Method 2:
Comparing data with a MINUS query
This is arguably the simplest method to use, for example:
select * from test1 minus select * from test2@maclean;
-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       | 75816 |  3527K|       |  1163  (40)| 00:00:14 |        |      |
|   1 |  MINUS              |       |       |       |       |            |          |        |      |
|   2 |   SORT UNIQUE       |       | 75816 |  2147K|  2984K|   710   (1)| 00:00:09 |        |      |
|   3 |    TABLE ACCESS FULL| TEST1 | 75816 |  2147K|       |   104   (1)| 00:00:02 |        |      |
|   4 |   SORT UNIQUE       |       | 50467 |  1379K|  1800K|   453   (1)| 00:00:06 |        |      |
|   5 |    REMOTE           | TEST2 | 50467 |  1379K|       |    56   (0)| 00:00:01 | MACLE~ | R->S |
-----------------------------------------------------------------------------------------------------
Remote SQL Information (identified by operation id):
----------------------------------------------------
   5 - SELECT "T1","T2" FROM "TEST2" "TEST2" (accessing 'MACLEAN' )
To list the differences in both directions in one query, combine two MINUS queries with UNION ALL:
Select *
  from (select 'MACLEAN.TEST1' "Row Source", a.*
          from (select /*+ FULL(Tbl1) */ T1, T2 from MACLEAN.TEST1 Tbl1
                minus
                select /*+ FULL(Tbl2) */ T1, T2 from MACLEAN.TEST2@"MACLEAN" Tbl2) A
        union all
        select 'MACLEAN.TEST2@"MACLEAN"', b.*
          from (select /*+ FULL(Tbl2) */ T1, T2 from MACLEAN.TEST2@"MACLEAN" Tbl2
                minus
                select /*+ FULL(Tbl1) */ T1, T2 from MACLEAN.TEST1 Tbl1) B)
 Order by 1;
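The MINUS clause causes both tables to be read with full table scans (TABLE ACCESS FULL) and their rows to be sorted locally (SORT UNIQUE). If the tables hold a large amount of data, the sort becomes very expensive, so this method is not efficient.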
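The symmetric MINUS query can be tried out without an Oracle instance. This sketch uses Python's built-in sqlite3 module, where the set operator is spelled EXCEPT instead of MINUS; the table contents are made up for illustration. Like MINUS, EXCEPT returns distinct rows, so a row duplicated within one table is not reported.

```python
import sqlite3

# SQLite spells Oracle's MINUS as EXCEPT; both return distinct rows.
SYMMETRIC_DIFF = """
    SELECT 'TEST1' AS row_source, a.t1, a.t2
      FROM (SELECT t1, t2 FROM test1 EXCEPT SELECT t1, t2 FROM test2) a
    UNION ALL
    SELECT 'TEST2', b.t1, b.t2
      FROM (SELECT t1, t2 FROM test2 EXCEPT SELECT t1, t2 FROM test1) b
    ORDER BY 1, 2
"""


def build_demo():
    """Two small tables that differ in one row (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test1 (t1 INTEGER PRIMARY KEY, t2 TEXT)")
    conn.execute("CREATE TABLE test2 (t1 INTEGER PRIMARY KEY, t2 TEXT)")
    rows = [(i, f"OBJ{i}") for i in range(1, 6)]
    conn.executemany("INSERT INTO test1 VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO test2 VALUES (?, ?)", rows[:4] + [(5, "CHANGED")])
    return conn


conn = build_demo()
for row in conn.execute(SYMMETRIC_DIFF):
    print(row)
# ('TEST1', 5, 'OBJ5')
# ('TEST2', 5, 'CHANGED')
```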
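The advantage of Method 2, MINUS, is its simplicity; it is especially well suited to checking data between small tables.
Its drawback is that the sort can make it very slow on large data volumes, and it likewise does not support large objects such as LOB and LONG.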
Method 3:
Use a NOT EXISTS clause, for example:
select *
from test1 a
where not exists (select 1
from test2 b
where a.t1 = b.t1
and a.t2 = b.t2);
no rows selected
Elapsed: 00:00:00.06
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 75816 | 7996K| | 691 (1)| 00:00:09 |
|* 1 | HASH JOIN ANTI | | 75816 | 7996K| 3040K| 691 (1)| 00:00:09 |
| 2 | TABLE ACCESS FULL| TEST1 | 75816 | 2147K| | 104 (1)| 00:00:02 |
| 3 | TABLE ACCESS FULL| TEST2 | 77512 | 5979K| | 104 (1)| 00:00:02 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."T1"="B"."T1" AND "A"."T2"="B"."T2")
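In principle, with large data volumes the HASH JOIN ANTI used by NOT EXISTS performs better than MINUS. However, when the tables being compared are in two different databases (a distributed query), the optimizer does not choose HASH JOIN ANTI here and falls back to the FILTER operation, which is extremely inefficient: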
select *
from test1 a
where not exists (select 1
from test2@maclean b
where a.t1 = b.t1
and a.t2 = b.t2);
no rows selected
Elapsed: 00:01:05.76
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 75816 | 2147K| 147K (1)| 00:29:31 | | |
|* 1 | FILTER | | | | | | | |
| 2 | TABLE ACCESS FULL| TEST1 | 75816 | 2147K| 104 (1)| 00:00:02 | | |
| 3 | REMOTE | TEST2 | 1 | 29 | 2 (0)| 00:00:01 | MACLE~ | R->S |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "B" WHERE "B"."T1"=:B1 AND "B"."T2"=:B2))
Remote SQL Information (identified by operation id):
----------------------------------------------------
3 - SELECT "T1","T2" FROM "TEST2" "B" WHERE "T1"=:1 AND "T2"=:2 (accessing
'MACLEAN' )
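The plans show how expensive FILTER is: the same NOT EXISTS check took 00:00:00.06 against the local table but 00:01:05.76 over the database link (estimated cost 147K versus 691), because the remote subquery is executed repeatedly, with bind values taken from each TEST1 row.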
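Why FILTER hurts so much over a database link: the remote subquery runs once per local row, so round trips grow with the size of TEST1, whereas a hash anti-join pulls the remote rows across once and probes an in-memory hash table. The Python sketch below only counts "remote calls" for the two strategies; RemoteTable is a hypothetical stand-in for test2@maclean, not a model of Oracle's network protocol, and it ignores FILTER's subquery result caching (which would not help here, since every (t1, t2) pair is distinct).

```python
class RemoteTable:
    """Hypothetical stand-in for test2@maclean that counts remote round trips."""

    def __init__(self, rows):
        self._rows = set(rows)
        self.calls = 0

    def exists(self, t1, t2):
        # FILTER plan: one remote lookup per local row (WHERE "T1"=:1 AND "T2"=:2)
        self.calls += 1
        return (t1, t2) in self._rows

    def fetch_all(self):
        # Hash anti-join plan: one remote query (SELECT "T1","T2" FROM "TEST2")
        self.calls += 1
        return list(self._rows)


def not_exists_filter(local_rows, remote):
    """NOT EXISTS evaluated row by row, as the FILTER operation does."""
    return [row for row in local_rows if not remote.exists(*row)]


def not_exists_hash_anti(local_rows, remote):
    """NOT EXISTS as a hash anti-join: fetch remote rows once, probe in memory."""
    remote_set = set(remote.fetch_all())
    return [row for row in local_rows if row not in remote_set]


test1 = [(i, f"OBJ{i}") for i in range(1000)]
r_filter, r_hash = RemoteTable(test1[:-1]), RemoteTable(test1[:-1])
print(not_exists_filter(test1, r_filter), r_filter.calls)  # [(999, 'OBJ999')] 1000
print(not_exists_hash_anti(test1, r_hash), r_hash.calls)   # [(999, 'OBJ999')] 1
```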
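Addendum:
A reader pointed out that adding an unnest hint to the subquery lets the CBO unnest the remote subquery and optimize it together with the rest of the query block, so the plan can use HASH JOIN RIGHT ANTI. I had not considered this at first.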
select *
from test1 a
where not exists (select /*+ unnest */
1
from test2@maclean b
where a.t1 = b.t1
and a.t2 = b.t2);
PLAN_TABLE_OUTPUT
------------------------------------------
Plan hash value: 1776635653
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Inst |IN-OUT|
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 79815 | 12M| | 594 (1)| 00:00:08 | | |
|* 1 | HASH JOIN RIGHT ANTI| | 79815 | 12M| 1816K| 594 (1)| 00:00:08 | | |
| 2 | REMOTE | TEST2 | 20420 | 1575K| | 56 (0)| 00:00:01 | MACLE~ | R->S |
| 3 | TABLE ACCESS FULL | TEST1 | 79815 | 6157K| | 104 (1)| 00:00:02 | | |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."T1"="B"."T1" AND "A"."T2"="B"."T2")
Remote SQL Information (identified by operation id):
----------------------------------------------------
2 - SELECT "T1","T2" FROM "TEST2" "B" (accessing 'MACLEAN' )
Adding an ordered hint on top of this makes the plan use HASH JOIN ANTI instead:
select /*+ ordered */ *
from test1 a
where not exists (select /*+ unnest */
1
from test2@maclean b
where a.t1 = b.t1
and a.t2 = b.t2);
PLAN_TABLE_OUTPUT
--------------------------------------------------
Plan hash value: 3089912131
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Inst |IN-OUT|
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 79815 | 12M| | 594 (1)| 00:00:08 | | |
|* 1 | HASH JOIN ANTI | | 79815 | 12M| 7096K| 594 (1)| 00:00:08 | | |
| 2 | TABLE ACCESS FULL| TEST1 | 79815 | 6157K| | 104 (1)| 00:00:02 | | |
| 3 | REMOTE | TEST2 | 20420 | 1575K| | 56 (0)| 00:00:01 | MACLE~ | R->S |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."T1"="B"."T1" AND "A"."T2"="B"."T2")
Remote SQL Information (identified by operation id):
----------------------------------------------------
3 - SELECT "T1","T2" FROM "TEST2" "B" (accessing 'MACLEAN' )
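The two plans above differ mainly in which input becomes the hash table: with HASH JOIN ANTI the build input is TEST1 (note the larger TempSpc, 7096K versus 1816K) and TEST2 rows probe it, while HASH JOIN RIGHT ANTI builds on the remote TEST2 and probes with TEST1. Both return the same rows. This Python sketch shows the two build orders in simplified form; it ignores SQL NULL semantics and spilling to temp, and the function names are illustrative.

```python
def hash_join_anti(left, right, key):
    """HASH JOIN ANTI: build on the preserved input (TEST1), probe with TEST2,
    and return the build rows that were never matched."""
    table = {}
    for row in left:
        table.setdefault(key(row), []).append(row)
    for row in right:
        table.pop(key(row), None)  # a match removes the key, so those rows are not returned
    return [row for rows in table.values() for row in rows]


def hash_join_right_anti(left, right, key):
    """HASH JOIN RIGHT ANTI: build on the subquery input (TEST2), probe with TEST1,
    and return the probe rows that found no match."""
    built = {key(row) for row in right}
    return [row for row in left if key(row) not in built]


def join_key(row):
    # a.t1 = b.t1 AND a.t2 = b.t2
    return (row[0], row[1])


test1 = [(i, f"OBJ{i}") for i in range(10)]
test2 = [row for row in test1 if row[0] % 3]  # rows 0, 3, 6, 9 are missing
print(sorted(hash_join_anti(test1, test2, join_key)))
print(hash_join_right_anti(test1, test2, join_key))
```

Building on the smaller input keeps the hash table small, which is why the RIGHT ANTI variant needs less temp space here.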
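Method 3 is simple to use, and when the tables being compared are in the same database it is more efficient than MINUS. For a distributed query, however, performance can drop sharply because of the FILTER operation; in that case you need to add a hint such as unnest manually so that the plan uses a hash anti-join (HASH JOIN ANTI or HASH JOIN RIGHT ANTI), which keeps NOT EXISTS fast. NOT EXISTS likewise does not support large objects such as CLOB.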
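Method 4:
GUI tools such as Toad and PL/SQL Developer provide a compare table data feature. Here we use Toad as an example to show how to validate data with such a tool:
Open Toad and connect to the database -> click Database -> Compare -> Data
Enter the source table and target table details in the Source 1 and Source 2 dialogs respectively.
Because Toad actually uses MINUS under the hood, increasing SORT_AREA_SIZE helps compare performance; if automatic PGA management (AUTO PGA) is in use, there is no need to set it.
Select the columns to compare.
First, compare the row counts of the two tables: click Execute to compute the counts.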

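Use MINUS to find rows that exist in one table but not in the other.
Use MINUS to find all the differences.
Toad's compare data feature is built on MINUS, so it has no efficiency advantage, but the graphical interface saves you from writing the SQL by hand. This method likewise does not support LOB, LONG, and similar objects.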
Method 5:
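This is an unconventional approach. Concatenate all the columns of each row, use dbms_utility.get_hash_value to compute a hash value of the concatenated string, and then sum the hash values obtained from all rows. If the two tables' hash sums are equal, the data in the two tables is judged to be the same.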
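The idea reduces to an order-independent checksum: hash each row, add the hashes up, compare the totals. A minimal Python sketch, assuming zlib.crc32 as a stand-in for dbms_utility.get_hash_value (so the numbers will not match Oracle's) and using '|' between columns, as the generated SQL later in this method does. Equal totals strongly suggest, but do not strictly prove, equal data.

```python
import zlib


def row_hash(row, sep="|"):
    """Concatenate a row's columns with a separator and hash the result.
    None becomes '', matching how Oracle's || treats NULL.
    zlib.crc32 is only a stand-in for dbms_utility.get_hash_value."""
    text = sep.join("" if v is None else str(v) for v in row)
    return zlib.crc32(text.encode("utf-8"))


def table_checksum(rows):
    """Sum of per-row hashes: independent of row order, as in the SQL version."""
    return sum(row_hash(row) for row in rows)


source = [(1, "A"), (2, "B"), (3, None)]
target = list(reversed(source))            # same rows, different order
changed = [(1, "A"), (2, "X"), (3, None)]  # one value differs
print(table_checksum(source) == table_checksum(target))   # True
print(table_checksum(source) == table_checksum(changed))  # False
```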
Put simply, it works like this:
create table hash_one as select object_id t1,object_name t2 from dba_objects;
select dbms_utility.get_hash_value(t1||t2,0,power(2,30)) from hash_one where rownum <3;
DBMS_UTILITY.GET_HASH_VALUE(T1||T2,0,POWER(2,30))
-------------------------------------------------
89209477
757190129
select sum(dbms_utility.get_hash_value(t1||t2,0,power(2,30))) from hash_one;
SUM(DBMS_UTILITY.GET_HASH_VALU
------------------------------
40683165992756
select sum(dbms_utility.get_hash_value(object_id||object_name,0,power(2,30))) from dba_objects;
SUM(DBMS_UTILITY.GET_HASH_VALU
------------------------------
40683165992756
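For tables with many columns, building the concatenation of every column by hand can be tedious; the following SQL quickly generates the statement we need.
Run it in a tool such as PL/SQL Developer; in SQL*Plus it may fail with ORA-00923: FROM keyword not found where expected.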
select 'select sum(dbms_utility.get_hash_value(' || column_name_path ||
       ',0,power(2,30)) ) from ' || owner || '.' || table_name || ';'
  from (select owner,
               table_name,
               column_name_path,
               row_number() over(partition by table_name order by table_name, curr_level desc) column_name_path_rank
          from (select owner,
                       table_name,
                       column_name,
                       rank,
                       level as curr_level,
                       ltrim(sys_connect_by_path(column_name, '||''|''||'), '||''|''||') column_name_path
                  from (select owner,
                               table_name,
                               column_name,
                               row_number() over(partition by table_name order by table_name, column_name) rank
                          from dba_tab_columns
                         where owner = UPPER('&OWNER')
                           and table_name = UPPER('&TABNAME')
                         order by table_name, column_name)
                connect by table_name = prior table_name
                       and rank - 1 = prior rank))
 where column_name_path_rank = 1;
Example usage:
SQL> @get_hash_col
Enter value for owner: SYS
Enter value for tabname: TAB$
'SELECTSUM(DBMS_UTILITY.GET_HASH_VALUE('||COLUMN_NAME_PATH||',0,POWER(2,30)))FROM
--------------------------------------------------------------------------------
select sum(dbms_utility.get_hash_value(ANALYZETIME||'|'||AUDIT$||'|'||AVGRLN||'|
'||AVGSPC||'|'||AVGSPC_FLB||'|'||BLKCNT||'|'||BLOCK#||'|'||BOBJ#||'|'||CHNCNT||'
|'||CLUCOLS||'|'||COLS||'|'||DATAOBJ#||'|'||DEGREE||'|'||EMPCNT||'|'||FILE#||'|'
||FLAGS||'|'||FLBCNT||'|'||INITRANS||'|'||INSTANCES||'|'||INTCOLS||'|'||KERNELCO
LS||'|'||MAXTRANS||'|'||OBJ#||'|'||PCTFREE$||'|'||PCTUSED$||'|'||PROPERTY||'|'||
ROWCNT||'|'||SAMPLESIZE||'|'||SPARE1||'|'||SPARE2||'|'||SPARE3||'|'||SPARE4||'|'
||SPARE5||'|'||SPARE6||'|'||TAB#||'|'||TRIGFLAG||'|'||TS#,0,1073741824) ) from S
YS.TAB$;
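The generator query orders the column names and joins them with ||'|'||. The same text can be produced outside the database; this Python sketch (function name hypothetical) takes a column list, for example fetched from dba_tab_columns, and returns a statement of the same shape as the generator's template. It assumes a binary sort order for column names, which matches Python's sorted() for plain uppercase ASCII identifiers.

```python
def hash_sum_sql(owner, table_name, columns):
    """Build the same statement text as the generator query:
    columns ordered by name (binary sort assumed) and joined with ||'|'||."""
    path = "||'|'||".join(sorted(columns))
    return ("select sum(dbms_utility.get_hash_value(" + path
            + ",0,power(2,30)) ) from " + owner.upper() + "." + table_name.upper() + ";")


print(hash_sum_sql("maclean", "test1", ["T2", "T1"]))
# select sum(dbms_utility.get_hash_value(T1||'|'||T2,0,power(2,30)) ) from MACLEAN.TEST1;
```

Generating the text once and running the identical statement on both databases avoids any difference in column order between the two sides.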
Use the SQL generated above to compute the table's sum(hash) value:
select sum(dbms_utility.get_hash_value(ANALYZETIME || '|' || AUDIT$ || '|' ||
AVGRLN || '|' || AVGSPC || '|' ||
AVGSPC_FLB || '|' || BLKCNT || '|' ||
BLOCK# || '|' || BOBJ# || '|' ||
CHNCNT || '|' || CLUCOLS || '|' || COLS || '|' ||
DATAOBJ# || '|' || DEGREE || '|' ||
EMPCNT || '|' || FILE# || '|' ||
FLAGS || '|' || FLBCNT || '|' ||
INITRANS || '|' || INSTANCES || '|' ||
INTCOLS || '|' || KERNELCOLS || '|' ||
MAXTRANS || '|' || OBJ# || '|' ||
PCTFREE$ || '|' || PCTUSED$ || '|' ||
PROPERTY || '|' || ROWCNT || '|' ||
SAMPLESIZE || '|' || SPARE1 || '|' ||
SPARE2 || '|' || SPARE3 || '|' ||
SPARE4 || '|' || SPARE5 || '|' ||
SPARE6 || '|' || TAB# || '|' ||
TRIGFLAG || '|' || TS#,
0,
1073741824))
from SYS.TAB$;
SUM(DBMS_UTILITY.GET_HASH_VALU
------------------------------
1646389632463
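Method 5 decides whether the data matches by summing the hashes of whole rows. It needs only one full table scan of each table, making it the most efficient of these methods, and its accuracy is fairly high (though not absolute: a hash collision can make different data produce equal sums).
However, this hash approach has the following shortcomings: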
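1. The concatenation of all columns in a row may exceed 4000 bytes, in which case the query fails with ORA-01489 (result of string concatenation is too long). In other words, this method requires that no row in the table be longer than 4000 bytes. Rows over 4000 bytes are rare in practice, and you can also judge this from the average row length statistic dba_tables.avg_row_len: if avg_row_len is far below 4000, overflow is generally not a problem.
2. This hash approach can only tell whether the data is consistent; it cannot provide further useful detail, such as which rows differ.
3. Likewise, this hash approach cannot handle LOB or LONG columns.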
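For limitation 1, a rough pre-check is to measure the byte length of each concatenated row rather than relying on the average. The sketch below assumes rows have already been fetched as Python tuples and that the database character set is AL32UTF8, so UTF-8 byte counts apply; Python's str() of numbers and dates only approximates Oracle's implicit TO_CHAR. The helper name is hypothetical.

```python
LIMIT_BYTES = 4000  # VARCHAR2 limit for the concatenated string in SQL, as discussed above


def oversized_rows(rows, sep="|", encoding="utf-8"):
    """Return the positions of rows whose concatenated text would exceed LIMIT_BYTES."""
    too_long = []
    for pos, row in enumerate(rows):
        text = sep.join("" if v is None else str(v) for v in row)
        if len(text.encode(encoding)) > LIMIT_BYTES:
            too_long.append(pos)
    return too_long


rows = [(1, "a" * 100), (2, "b" * 3999), (3, "\u00e9" * 2000)]
print(oversized_rows(rows))  # [1, 2]: 4001 and 4002 bytes
```

Unlike avg_row_len, this checks every row, so a single oversized row is not hidden by a small average.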
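Generating the sum(hash) statement with SQL is essentially a rows-to-columns (string aggregation) task, and the wmsys.wm_concat function greatly simplifies the SQL (note that wm_concat is undocumented and is no longer available in 12c; LISTAGG, available from 11gR2, is the supported alternative):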
select 'select sum(ora_hash('||enames||',power(2,30),0)) from '||owner||'.'||table_name||';'
from
(select owner, table_name, replace(wmsys.wm_concat(column_name),',','||' ) as enames
from (select owner, table_name, column_name
from dba_tab_columns
where owner = UPPER('&tabowner')
AND TABLE_NAME = UPPER('&tabname'))
group by owner, table_name);
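One caveat with this wm_concat version: replacing ',' with '||' concatenates the columns with no delimiter, so different rows can produce the same string and therefore the same hash, e.g. (T1=1, T2='23') and (T1=12, T2='3'). Also, wm_concat does not guarantee column order, so generate the statement once and run the same text on both sides. A short Python check of the delimiter effect (plain string joins, no Oracle functions involved):

```python
def concat_no_sep(row):
    """Columns joined with no delimiter, like replace(wm_concat(...), ',', '||')."""
    return "".join(str(v) for v in row)


def concat_with_sep(row, sep="|"):
    """Columns joined with a delimiter, like the ||'|'|| generator earlier."""
    return sep.join(str(v) for v in row)


a, b = (1, "23"), (12, "3")
print(concat_no_sep(a) == concat_no_sep(b))      # True: both are "123"
print(concat_with_sep(a) == concat_with_sep(b))  # False: "1|23" vs "12|3"
```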
In addition, my tests show that computing the hash with ora_hash is considerably faster than dbms_utility.get_hash_value.
Reposted from http://www.oracledatabase12g.com/archives/oracle-compare-data-between-tables-method.html