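Requirement:
The small table holds about 200,000 rows and the large table about 40 million. Roughly 1.5 million rows must be filtered out of the large table and joined against the small table to update about 5,000 of its rows.
Performance problem:
Even after adding indexes on the columns used in the filter conditions, the conventional UPDATE statement below still takes about half an hour.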
UPDATE WMOCDCREPORT.DM_WM_TRADINGALL A
SET (A.RELATIONSHIPNO, A.PACKAGE) = (
        -- Correlated subquery: evaluated once for every row being updated
        SELECT B.RELATIONSHIPNO,
               CASE
                   WHEN B.SEGMENTCODE IN ('52', '55', '56', '59') THEN 'BC'
                   WHEN B.SEGMENTCODE = '66' THEN 'PW'
                   WHEN B.SEGMENTCODE = '60' THEN 'MM'
                   WHEN B.SEGMENTCODE = '65' THEN 'EB'
                   WHEN B.SEGMENTCODE = '61' THEN 'PB'
                   ELSE B.SEGMENTCODE
               END
        FROM DATACORE.DF_CUST_HISTORY B
        WHERE B.ACCOUNT_NO = A.SETTLEMENTACCOUNT
          AND B.DATA_DATE = '2018-11-30'
          AND ROWNUM = 1
    )
WHERE A.MONTH = 'SEP'
  AND A.DATA_DATE = '2018-09-30'
  -- Only touch rows that have a matching account in the large table
  AND EXISTS (
        SELECT 1
        FROM DATACORE.DF_CUST_HISTORY C
        WHERE C.ACCOUNT_NO = A.SETTLEMENTACCOUNT
          AND C.DATA_DATE = '2018-11-30'
    );
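After several searches, I found that most technical blog posts on correlated updates deal with updating the large table, for example here (using batch updates).
I also analyzed the execution plan. As expected, the bottleneck comes mainly from the following two factors:
(1) DATACORE.DF_CUST_HISTORY is too large. I considered selecting one day's data and inserting it into a separate table in advance, but estimated that the gain would be small, because inserting 1.5 million rows is itself slow.
(2) About 5,000 rows need to be updated, and each of them requires a correlated lookup against the 1.5 million rows (this is where most of the time goes).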
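For readers who want to repeat the execution plan check mentioned above, the following is a minimal sketch using Oracle's EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. The UPDATE is deliberately shortened to a single column for illustration (an assumption of this sketch, not the full statement from this post), and it assumes a plan table is available to the session.

```sql
-- Record the plan for a shortened version of the slow UPDATE
-- (single column only; simplified for illustration).
EXPLAIN PLAN FOR
UPDATE wmocdcreport.dm_wm_tradingall a
SET a.relationshipno = (
        SELECT b.relationshipno
        FROM datacore.df_cust_history b
        WHERE b.account_no = a.settlementaccount
          AND b.data_date = '2018-11-30'
          AND ROWNUM = 1
    )
WHERE a.month = 'SEP'
  AND a.data_date = '2018-09-30';

-- Show the plan that was just recorded.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the output, the access path chosen for DF_CUST_HISTORY inside the subquery is the part to look at, since that subquery is evaluated for each updated row.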
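Performance optimization:
With about 5,000 rows on the small side and 1.5 million on the large side, the natural approach is a join that keeps the small table's rows. The next question is how to write the joined result back into the small table without a row-by-row UPDATE. The answer: MERGE INTO.
One small issue remains: the ON condition of MERGE INTO must match each target row to at most one source row, but the large table may well contain duplicate ACCOUNT_NO values, so they have to be de-duplicated. How? With ROW_NUMBER() OVER (PARTITION BY ...), keeping only the first row per account.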
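The de-duplication step can be tried on its own before it is embedded in the MERGE. The sketch below first checks whether duplicates exist for the snapshot date and then keeps one row per ACCOUNT_NO. It reuses the table and column names from this post; the choice of ORDER BY c.relationshipno as the tie-breaker is the one used in the optimized statement, and any other deterministic ordering would serve equally well.

```sql
-- 1) Find accounts that appear more than once for the snapshot date.
SELECT c.account_no, COUNT(*) AS cnt
FROM datacore.df_cust_history c
WHERE c.data_date = '2018-11-30'
GROUP BY c.account_no
HAVING COUNT(*) > 1;

-- 2) Keep exactly one row per account: number the rows within each
--    account_no group and retain only the first one.
SELECT account_no, relationshipno, segmentcode
FROM (
        SELECT c.account_no,
               c.relationshipno,
               c.segmentcode,
               ROW_NUMBER() OVER (
                   PARTITION BY c.account_no
                   ORDER BY c.relationshipno
               ) AS seq
        FROM datacore.df_cust_history c
        WHERE c.data_date = '2018-11-30'
     )
WHERE seq = 1;
```

If the source of a MERGE still yields several rows for the same target row, Oracle rejects the statement with ORA-30926 (unable to get a stable set of rows in the source tables). Note also that ROWNUM = 1 in the original UPDATE picks an arbitrary matching row, whereas ordering by relationshipno makes the choice deterministic.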
Optimized SQL (runs in 8-10 seconds):
merge into wmocdcreport.dm_wm_tradingall a
using (
        select t.rid,
               t.settlementaccount,
               tx.relationshipno,
               case
                   when tx.segmentcode in ('52', '55', '56', '59') then 'BC'
                   when tx.segmentcode = '66' then 'PW'
                   when tx.segmentcode = '60' then 'MM'
                   when tx.segmentcode = '65' then 'EB'
                   when tx.segmentcode = '61' then 'PB'
                   else tx.segmentcode
               end as package
        from (
                -- Small side: the rows to update, identified by rowid
                select rowid rid, dwt.settlementaccount
                from wmocdcreport.dm_wm_tradingall dwt
                where dwt.month = 'SEP'
                  and dwt.data_date = '2018-09-30'
             ) t
        inner join (
                -- Large side: number the rows per account so that one can be kept
                select row_number() over (partition by c.account_no order by c.relationshipno) seq,
                       c.account_no,
                       c.relationshipno,
                       c.segmentcode
                from datacore.df_cust_history c
                where c.data_date = '2018-11-30'
             ) tx
          on tx.account_no = t.settlementaccount
         and tx.seq = 1
      ) b
on (a.rowid = b.rid)
when matched then
    update set a.relationshipno = b.relationshipno,
               a.package = b.package;