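Background: Performance should be an important consideration for any feature, especially when data volumes are large. If SQL is written with only the business logic in mind, and without regard to how efficiently the statements run, the result can be severe performance problems that make a feature unusable or consume excessive resources. One such case is a program that processes daily incremental data but actually performs a full table scan at execution time, so it runs no faster than a program that processes the full data set.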
Case study:
mio_log: 134,092,418 rows
freph_a01_fromtask3: 176,581,388 rows
The core of the SQL written for the production system according to the business logic is as follows:
```sql
SELECT (CASE
          WHEN c.in_force_date IS NOT NULL
          THEN (CASE
                  WHEN a.mio_date >= c.in_force_date THEN a.mio_date
                  ELSE c.in_force_date
                END)
          WHEN c.in_force_date IS NULL
          THEN (CASE
                  WHEN a.mio_date >= a.plnmio_date THEN a.mio_date
                  ELSE a.plnmio_date
                END)
          ELSE a.mio_date
        END) mio_date
FROM dbo.mio_log a
INNER JOIN dbo.freph_a01_fromtask3 c
        ON a.cntr_no = c.cntr_no
       AND a.pol_code = c.pol_code
WHERE ((c.in_force_date IS NOT NULL
        AND ((CASE
                WHEN a.mio_date >= c.in_force_date THEN a.mio_date
                ELSE c.in_force_date
              END) BETWEEN @stat_begindate AND @stat_enddate))
    OR (c.in_force_date IS NULL
        AND ((CASE
                WHEN a.mio_date >= a.plnmio_date THEN a.mio_date
                ELSE a.plnmio_date
              END) BETWEEN @stat_begindate AND @stat_enddate)))
```
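mio_log has indexes on mio_date and plnmio_date, and freph_a01_fromtask3 has an index on in_force_date. Even so, the WHERE clause compares columns from the two tables inside CASE WHEN expressions, so none of these indexes can be used for a seek, and the execution plan is a clustered index scan.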
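Comparing query plans shows why the CASE WHEN filter rules out an index seek. Below is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server, and the one-table mio_log schema is a simplified, hypothetical one. The plan wording is SQLite's, but the principle is the same in both engines: an index can only seek on a predicate over the indexed column itself, not on an expression that mixes it with another column.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns needed for the comparison.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE mio_log(
        mio_log_id  INTEGER PRIMARY KEY,
        mio_date    TEXT,
        plnmio_date TEXT);
    CREATE INDEX idx_mio_log_mio_date    ON mio_log(mio_date);
    CREATE INDEX idx_mio_log_plnmio_date ON mio_log(plnmio_date);
""")
WINDOW = ("2013-07-15", "2013-07-25")


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, WINDOW)]


# Filter on an expression that combines two columns: no single index matches.
case_plan = plan("""
    SELECT mio_log_id, mio_date, plnmio_date FROM mio_log
    WHERE (CASE WHEN mio_date >= plnmio_date THEN mio_date ELSE plnmio_date END)
          BETWEEN ? AND ?""")

# Plain range on one indexed column: the optimizer can seek that index.
range_plan = plan("""
    SELECT mio_log_id, mio_date, plnmio_date FROM mio_log
    WHERE mio_date BETWEEN ? AND ?""")

print(case_plan)   # a full SCAN of mio_log, no index
print(range_plan)  # a SEARCH using idx_mio_log_mio_date
```

Only the second query lets the optimizer narrow the read to the date window.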
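Optimization idea:
mio_date, plnmio_date (mio_log) and in_force_date (freph_a01_fromtask3) are all indexed. We can therefore first use each single-column index on mio_date, in_force_date and plnmio_date to fetch the rows whose date falls in the incremental window. The cross-table, cross-column comparison is then performed only on that incremental data. This is safe because the computed date is always one of the two dates being compared. Whenever it lies in the window, at least one of mio_date, in_force_date or plnmio_date lies in the window too.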
```sql
SELECT (CASE
          WHEN in_force_date IS NOT NULL
          THEN (CASE
                  WHEN mio_date >= in_force_date THEN mio_date
                  ELSE in_force_date
                END)
          WHEN in_force_date IS NULL
          THEN (CASE
                  WHEN mio_date >= plnmio_date THEN mio_date
                  ELSE plnmio_date
                END)
          ELSE mio_date
        END) mio_date
FROM (
      -- Candidates whose mio_date is in the window (can seek the mio_date index)
      SELECT a.mio_date,
             c.in_force_date,
             a.plnmio_date,
             a.MIO_LOG_ID
      FROM dbo.mio_log a
      INNER JOIN dbo.freph_a01_fromtask3 c
              ON a.cntr_no = c.cntr_no
             AND a.pol_code = c.pol_code
      WHERE a.mio_date BETWEEN @stat_begindate AND @stat_enddate
      UNION
      -- Candidates whose in_force_date is in the window
      SELECT a.mio_date,
             c.in_force_date,
             a.plnmio_date,
             a.MIO_LOG_ID
      FROM dbo.mio_log a
      INNER JOIN dbo.freph_a01_fromtask3 c
              ON a.cntr_no = c.cntr_no
             AND a.pol_code = c.pol_code
      WHERE c.in_force_date BETWEEN @stat_begindate AND @stat_enddate
      UNION
      -- Candidates whose plnmio_date is in the window
      SELECT a.mio_date,
             c.in_force_date,
             a.plnmio_date,
             a.MIO_LOG_ID
      FROM dbo.mio_log a
      INNER JOIN dbo.freph_a01_fromtask3 c
              ON a.cntr_no = c.cntr_no
             AND a.pol_code = c.pol_code
      WHERE a.plnmio_date BETWEEN @stat_begindate AND @stat_enddate
     ) T
-- Apply the exact business filter to the much smaller candidate set
WHERE ((in_force_date IS NOT NULL
        AND ((CASE
                WHEN mio_date >= in_force_date THEN mio_date
                ELSE in_force_date
              END) BETWEEN @stat_begindate AND @stat_enddate))
    OR (in_force_date IS NULL
        AND ((CASE
                WHEN mio_date >= plnmio_date THEN mio_date
                ELSE plnmio_date
              END) BETWEEN @stat_begindate AND @stat_enddate)))
```
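One way to gain confidence in this rewrite is to run both versions on the same data and compare the results. The sketch below does that with Python's sqlite3 on random data. The schema and data are hypothetical. So is the task_id surrogate key on freph_a01_fromtask3, which is carried through every UNION branch so that distinct join rows are never merged.

```python
import random
import sqlite3
from collections import Counter


def effective_date(log, task):
    """The article's CASE expression, parameterised by table alias."""
    return (f"CASE WHEN {task}.in_force_date IS NOT NULL "
            f"THEN CASE WHEN {log}.mio_date >= {task}.in_force_date "
            f"THEN {log}.mio_date ELSE {task}.in_force_date END "
            f"ELSE CASE WHEN {log}.mio_date >= {log}.plnmio_date "
            f"THEN {log}.mio_date ELSE {log}.plnmio_date END END")


JOIN = ("FROM mio_log a JOIN freph_a01_fromtask3 c "
        "ON a.cntr_no = c.cntr_no AND a.pol_code = c.pol_code")
# task_id is a hypothetical surrogate key so UNION cannot merge distinct join rows.
COLS = ("a.mio_log_id AS mio_log_id, c.task_id AS task_id, a.mio_date AS mio_date, "
        "a.plnmio_date AS plnmio_date, c.in_force_date AS in_force_date")

ORIGINAL = f"""SELECT a.mio_log_id, c.task_id, {effective_date('a', 'c')}
{JOIN} WHERE ({effective_date('a', 'c')}) BETWEEN :b AND :e"""

REWRITTEN = f"""SELECT t.mio_log_id, t.task_id, {effective_date('t', 't')} FROM (
    SELECT {COLS} {JOIN} WHERE a.mio_date BETWEEN :b AND :e
    UNION SELECT {COLS} {JOIN} WHERE c.in_force_date BETWEEN :b AND :e
    UNION SELECT {COLS} {JOIN} WHERE a.plnmio_date BETWEEN :b AND :e
) t WHERE ({effective_date('t', 't')}) BETWEEN :b AND :e"""


def build_sample(seed=42):
    """Random hypothetical data; about 30% of in_force_date values are NULL."""
    rnd = random.Random(seed)

    def day():
        return f"2013-07-{rnd.randint(1, 31):02d}"

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE mio_log(mio_log_id INTEGER PRIMARY KEY, cntr_no INT,
                             pol_code INT, mio_date TEXT, plnmio_date TEXT);
        CREATE TABLE freph_a01_fromtask3(task_id INTEGER PRIMARY KEY, cntr_no INT,
                                         pol_code INT, in_force_date TEXT);
    """)
    con.executemany(
        "INSERT INTO mio_log(cntr_no, pol_code, mio_date, plnmio_date) VALUES (?, ?, ?, ?)",
        [(rnd.randint(1, 200), rnd.randint(1, 3), day(), day()) for _ in range(1500)])
    con.executemany(
        "INSERT INTO freph_a01_fromtask3(cntr_no, pol_code, in_force_date) VALUES (?, ?, ?)",
        [(rnd.randint(1, 200), rnd.randint(1, 3),
          None if rnd.random() < 0.3 else day()) for _ in range(600)])
    return con


con = build_sample()
window = {"b": "2013-07-15", "e": "2013-07-25"}
original = Counter(con.execute(ORIGINAL, window).fetchall())
rewritten = Counter(con.execute(REWRITTEN, window).fetchall())
print(sum(original.values()), original == rewritten)
```

Because UNION removes duplicates, some per-row key such as task_id has to travel through every branch. Without one, distinct join rows with identical values would be merged.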
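This statement has two problems:
1. If mio_log or freph_a01_fromtask3 has no primary key, the subqueries need a ROWID to tell individual rows apart. UNION removes duplicates, so without one, distinct rows with identical column values would be merged. In other words, a ROWID can stand in for a missing primary key.
ROWID is a very important and widely used concept in Oracle. Oracle's documentation defines it as follows: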
ROWID Pseudocolumn
For each row in the database, the ROWID pseudocolumn returns the address of the row. Oracle Database rowid values contain information necessary to locate a row:
· The data object number of the object
· The data block in the datafile in which the row resides
· The position of the row in the data block (first row is 0)
· The datafile in which the row resides (first file is 1). The file number is relative to the tablespace.
SQL Server has no ROWID concept. In SQL Server 2008 and later, the %%physloc%% virtual column is the closest equivalent to ROWID, as described below:
The closest equivalent to this in SQL Server is the rid, which has three components: File:Page:Slot.
In SQL Server 2008 it is possible to use the undocumented and unsupported %%physloc%% virtual column to see this. This returns a binary(8) value with the Page ID in the first four bytes, then 2 bytes for File ID, followed by 2 bytes for the slot location on the page.
The scalar function sys.fn_PhysLocFormatter or the sys.fn_PhysLocCracker TVF can be used to convert this into a more readable form.
```sql
CREATE TABLE T (X INT);
INSERT INTO T VALUES (1), (2);

SELECT %%physloc%% AS [%%physloc%%],
       sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot]
FROM T;
```
| %%physloc%% | File:Page:Slot |
| --- | --- |
| 0x7600000001000000 | (1:118:0) |
| 0x7600000001000100 | (1:118:1) |
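The byte layout described above can be checked against this output. Below is a small Python sketch that decodes a %%physloc%% value into the File:Page:Slot form. Reading each field as a little-endian integer is inferred from the two sample rows, not taken from Microsoft documentation. format_physloc is a hypothetical helper, not part of SQL Server.

```python
import struct


def format_physloc(physloc: bytes) -> str:
    """Render an 8-byte %%physloc%% value as '(file:page:slot)'.

    Assumed layout: 4-byte page ID, 2-byte file ID, 2-byte slot, each
    little-endian (this reproduces the sample output in the table).
    """
    if len(physloc) != 8:
        raise ValueError("expected a binary(8) value")
    page_id, file_id, slot = struct.unpack("<IHH", physloc)
    return f"({file_id}:{page_id}:{slot})"


print(format_physloc(bytes.fromhex("7600000001000000")))  # (1:118:0)
print(format_physloc(bytes.fromhex("7600000001000100")))  # (1:118:1)
```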
Note that %%physloc%% is not leveraged by the query processor. Whilst it is possible to use it in a WHERE clause:
```sql
SELECT * FROM T
WHERE %%physloc%% = 0x7600000001000000;
```
SQL Server will not directly seek to the specified row. Instead it will do a full table scan, evaluate %%physloc%% for each row and return the one that matches (if any do).
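A few lines are enough to show why problem 1 matters. This sketch uses Python's sqlite3 with a hypothetical keyless table t holding two identical rows, and SQLite's built-in rowid plays the part of ROWID. A single OR query returns both rows. A UNION of two index-friendly branches merges them into one. Adding the row identifier restores the correct count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical keyless table holding two rows that are identical in every column.
con.execute("CREATE TABLE t(mio_date TEXT, in_force_date TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("2013-07-20", "2013-07-20")] * 2)
window = ("2013-07-15", "2013-07-25") * 2   # bound to the four ? placeholders

# Reference: one pass with OR keeps both rows.
with_or = con.execute(
    "SELECT mio_date, in_force_date FROM t "
    "WHERE mio_date BETWEEN ? AND ? OR in_force_date BETWEEN ? AND ?", window).fetchall()

# UNION de-duplicates whole rows, so the two identical rows collapse into one.
union_without_id = con.execute(
    "SELECT mio_date, in_force_date FROM t WHERE mio_date BETWEEN ? AND ? "
    "UNION SELECT mio_date, in_force_date FROM t WHERE in_force_date BETWEEN ? AND ?",
    window).fetchall()

# Carrying a row identifier (SQLite's rowid) keeps the rows apart.
union_with_id = con.execute(
    "SELECT rowid, mio_date, in_force_date FROM t WHERE mio_date BETWEEN ? AND ? "
    "UNION SELECT rowid, mio_date, in_force_date FROM t WHERE in_force_date BETWEEN ? AND ?",
    window).fetchall()

print(len(with_or), len(union_without_id), len(union_with_id))  # 2 1 2
```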
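2. The statement suffers from parameter sniffing:
Stored procedures almost always use variables, and there are two kinds. The first kind is defined outside the procedure as a parameter. A value must be passed in when the procedure is called, so SQL Server knows the value when it compiles the procedure. The second kind is a local variable defined inside the procedure. Its value is only obtained while the procedure's statements run, so SQL Server does not know it at compile time.
To save compilation time, SQL Server compiles a stored procedure once and reuses the plan for many executions. Plan reuse therefore raises two potential problems:
(1) For the first kind of variable, does a plan generated for the values passed in on the first execution suit every possible value?
(2) For the second kind (local variables), SQL Server does not know the value at compile time, so how does it choose an "appropriate" plan?
Definition of the "parameter sniffing" problem: a statement's execution plan is sensitive to variable values, so reusing a plan can lead to performance problems. The plan produced for a local variable is a middle-of-the-road compromise. It is usually not as bad as parameter sniffing, and in many cases it is one of the candidate fixes for parameter sniffing.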
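A toy model makes the mechanism concrete. The plain-Python sketch below compiles a plan from the first parameter value it sees and reuses it for a value with a very different row count. The skewed data, the cost numbers and the plan cache are all assumptions for illustration. This is not how SQL Server actually costs plans.

```python
from collections import Counter

# Hypothetical skewed column: value 1 occurs 9,000 times, values 2..1001 once each.
VALUES = [1] * 9000 + list(range(2, 1002))
TABLE_ROWS = len(VALUES)              # 10,000
ROWS_PER_VALUE = Counter(VALUES)


def compile_plan(estimated_rows):
    """Toy optimizer (assumed costs): seek+lookup = 3 units/row, scan = 1 unit/table row."""
    return "seek" if 3 * estimated_rows < TABLE_ROWS else "scan"


def execution_cost(plan, actual_rows):
    return 3 * actual_rows if plan == "seek" else TABLE_ROWS


plan_cache = {}


def execute(statement, value):
    """Compile with the first value seen (the 'sniffed' value), then reuse the plan."""
    if statement not in plan_cache:
        plan_cache[statement] = compile_plan(ROWS_PER_VALUE[value])
    plan = plan_cache[statement]
    return plan, execution_cost(plan, ROWS_PER_VALUE[value])


print(execute("proc", 5))  # ('seek', 3)
print(execute("proc", 1))  # ('seek', 27000), although a scan would cost 10000
```

The second call pays 27,000 units for a plan that was right for value 5. A fresh compile for value 1 would have chosen a scan costing 10,000.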
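Ways to solve parameter sniffing:
(1) Run the statement as dynamic SQL with exec(). Instead of running the statement directly in the stored procedure, build a string containing the statement and the variable values, and run that dynamic statement with exec(). SQL Server then compiles the dynamic statement only when execution reaches it. By then it knows the variable values and generates a plan optimized for them, which sidesteps parameter sniffing.
(2) Use local variables. If the parameter value is assigned to a local variable, SQL Server has no way to know that variable's value at compile time. It therefore "guesses" how many rows will be returned from the general data distribution in the table. Whatever value the caller passes in, the resulting plan is the same. Such a plan is usually middle-of-the-road: not the optimal plan, but not a very bad one for most values either. The advantage of this method is that it keeps the benefits of stored procedures. The drawbacks are that the stored procedure must be modified and the plan is not optimal.
(3) Use a query hint in the statement to steer the plan:
At the end of a SELECT, INSERT, UPDATE or DELETE statement you can add an OPTION (<query_hint>) clause to guide the plan SQL Server is about to generate. Query hints are quite powerful, with more than a dozen kinds. The definition at the time of writing is: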
```
<query_hint> ::=
{ { HASH | ORDER } GROUP
  | { CONCAT | HASH | MERGE } UNION
  | { LOOP | MERGE | HASH } JOIN
  | FAST number_rows
  | FORCE ORDER
  | MAXDOP number_of_processors
  | OPTIMIZE FOR ( @variable_name = literal_constant [ , ...n ] )
  | PARAMETERIZATION { SIMPLE | FORCED }
  | RECOMPILE
  | ROBUST PLAN
  | KEEP PLAN
  | KEEPFIXED PLAN
  | EXPAND VIEWS
  | MAXRECURSION number
  | USE PLAN N'xml_plan'
}
```
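These hints serve different purposes:
- Some direct which operators the plan uses, such as {HASH | ORDER} GROUP, {CONCAT | HASH | MERGE} UNION and {LOOP | MERGE | HASH} JOIN.
- Some favour plan reuse over recompilation, such as PARAMETERIZATION {SIMPLE | FORCED}, KEEP PLAN and KEEPFIXED PLAN.
- Some force recompilation, such as RECOMPILE.
- Some influence which plan is chosen, such as FAST number_rows, FORCE ORDER, MAXDOP number_of_processors and OPTIMIZE FOR (@variable_name = literal_constant [ , ...n ]), each suited to different situations.

See SQL Server Books Online for the detailed definitions.
The most common query hints for avoiding parameter sniffing are the following.
(1) RECOMPILE
The RECOMPILE query hint tells SQL Server to recompile the statement every time the stored procedure runs. SQL Server can then choose the best plan for the current variable values. For example, a procedure that counts the order detail rows and sums the product weights for one SalesOrderID can be written like this: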
```sql
CREATE PROC Nosniff_queryhint_recompile (@i INT)
AS
SELECT COUNT(b.SalesOrderID),
       SUM(p.Weight)
FROM dbo.SalesOrderHeader_test a
INNER JOIN dbo.SalesOrderDetail_test b
        ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product p
        ON b.ProductID = p.ProductID
WHERE a.SalesOrderID = @i
OPTION (RECOMPILE)
GO
```
A similar approach is to specify WITH RECOMPILE directly in the stored procedure definition, which also avoids parameter sniffing:
```sql
CREATE PROC Nosniff_spcreate_recompile (@i INT)
WITH RECOMPILE
AS
SELECT COUNT(b.SalesOrderID),
       SUM(p.Weight)
FROM dbo.SalesOrderHeader_test a
INNER JOIN dbo.SalesOrderDetail_test b
        ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product p
        ON b.ProductID = p.ProductID
WHERE a.SalesOrderID = @i
GO
```
(2) Specify the JOIN algorithm with a join hint
```sql
CREATE PROC Nosniff_queryhint_joinhint (@i INT)
AS
SELECT COUNT(b.SalesOrderID),
       SUM(p.Weight)
FROM dbo.SalesOrderHeader_test a
INNER JOIN dbo.SalesOrderDetail_test b
        ON a.SalesOrderID = b.SalesOrderID
INNER HASH JOIN Production.Product p
        ON b.ProductID = p.ProductID
WHERE a.SalesOrderID = @i
GO
```
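(3) OPTIMIZE FOR (@variable_name = literal_constant [ , ...n ])
The OPTIMIZE FOR query hint makes SQL Server compile the plan for the specified literal value, regardless of the value actually passed in. It was introduced in SQL Server 2005.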
```sql
CREATE PROC NoSniff_QueryHint_OptimizeFor (@i INT)
AS
SELECT COUNT(b.SalesOrderID), SUM(p.Weight)
FROM dbo.SalesOrderHeader_test a
INNER JOIN dbo.SalesOrderDetail_test b
        ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product p
        ON b.ProductID = p.ProductID
WHERE a.SalesOrderID = @i
OPTION (OPTIMIZE FOR (@i = 75124))
GO
```
(4) Plan Guide
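All of the methods above share an obvious limitation: the stored procedure definition has to be modified. Sometimes modifying a stored procedure is not allowed without the application development team's permission. The problem is worse for statements called through sp_executesql, because they may be written in the application rather than in SQL Server, and the database administrator has no way to change the application. Since SQL Server 2005, a feature called plan guides has been introduced and refined. With it, the database administrator can tell SQL Server to use a specified plan whenever a given statement runs, so neither the stored procedure nor the application needs to change. For example, the following fixes parameter sniffing in a stored procedure named "Sniff" that suffers from the problem: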
```sql
-- @stmt must match the statement text inside the Sniff procedure exactly.
EXEC sp_create_plan_guide
    @name = N'Guide1',
    @stmt = N'select count(b.SalesOrderID),sum(p.Weight)
from dbo.SalesOrderHeader_test a
inner join dbo.SalesOrderDetail_test b
on a.SalesOrderID = b.SalesOrderID
inner join Production.Product p
on b.ProductID = p.ProductID
where a.SalesOrderID =@i',
    @type = N'OBJECT',
    @module_or_batch = N'Sniff',
    @params = NULL,
    @hints = N'OPTION (optimize for (@i = 75124))';
GO
```
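The remedies above can be compared in a toy model. In this plain-Python sketch, the skewed data, the cost numbers and the OPTIMIZE_FOR_VALUE literal are assumptions. Each strategy only changes the row estimate the plan is compiled with. The calls run value 5 first, then value 1.

```python
from collections import Counter

# Hypothetical skewed data and toy cost model (assumptions, not SQL Server's costing).
VALUES = [1] * 9000 + list(range(2, 1002))
TABLE_ROWS = len(VALUES)
ROWS_PER_VALUE = Counter(VALUES)
OPTIMIZE_FOR_VALUE = 1   # hypothetical literal for OPTIMIZE FOR (@i = ...)


def compile_plan(estimated_rows):
    return "seek" if 3 * estimated_rows < TABLE_ROWS else "scan"


def execution_cost(plan, actual_rows):
    return 3 * actual_rows if plan == "seek" else TABLE_ROWS


def estimated_rows(strategy, value, first_value):
    """Row estimate the optimizer compiles with under each strategy."""
    if strategy == "reuse":            # default: plan sniffed on the first call
        return ROWS_PER_VALUE[first_value]
    if strategy == "recompile":        # OPTION (RECOMPILE), WITH RECOMPILE, roughly exec()
        return ROWS_PER_VALUE[value]
    if strategy == "optimize_for":     # OPTION (OPTIMIZE FOR (@i = <literal>))
        return ROWS_PER_VALUE[OPTIMIZE_FOR_VALUE]
    if strategy == "local_variable":   # value unknown: average rows per distinct value
        return TABLE_ROWS / len(ROWS_PER_VALUE)
    raise ValueError(f"unknown strategy: {strategy}")


def total_cost(strategy, calls):
    """Execution cost of a sequence of calls; compile time is ignored."""
    total = 0
    for value in calls:
        plan = compile_plan(estimated_rows(strategy, value, calls[0]))
        total += execution_cost(plan, ROWS_PER_VALUE[value])
    return total


# reuse 27003, recompile 10003, optimize_for 20000, local_variable 27003
for strategy in ("reuse", "recompile", "optimize_for", "local_variable"):
    print(strategy, total_cost(strategy, [5, 1]))
```

RECOMPILE wins here only because the model ignores compile time, which is exactly what plan reuse is meant to save. OPTIMIZE FOR and local variables trade per-call optimality for one stable plan. With this particular skew, the local-variable average happens to pick the same plan as the sniffed one.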
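Because of these two problems, the approach above is not very practical.
Best solution:
The overall idea is similar, except that the incremental range is obtained as a range of mio_log_id values through the indexes on mio_date, in_force_date and plnmio_date. Let @mio_log_id_max be the largest of the maximum mio_log_id values returned through these three indexes, and @mio_log_id_min the smallest of the minimum values. The incremental data can then be fetched with mio_log_id BETWEEN @mio_log_id_min AND @mio_log_id_max. This works because the three lookups complete almost instantly, and fetching the increment by mio_log_id guarantees that the clustered index is used.
The concrete solution is as follows: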
```sql
-- @date_min / @date_max (DATETIME) hold the incremental window and are declared by the caller.
DECLARE @mio_log_id_max1 INT, @mio_log_id_min1 INT,
        @mio_log_id_max2 INT, @mio_log_id_min2 INT,
        @mio_log_id_max3 INT, @mio_log_id_min3 INT,
        @mio_log_id_max  INT, @mio_log_id_min  INT;

-- mio_log_id range reached through freph_a01_fromtask3.in_force_date
SELECT @mio_log_id_max3 = MAX(a.mio_log_id),
       @mio_log_id_min3 = MIN(a.mio_log_id)
FROM dbo.freph_a01_fromtask3 c WITH (INDEX(i2_freph_a01_fromtask3))
INNER LOOP JOIN dbo.mio_log a WITH (NOLOCK)
        ON a.cntr_no = c.cntr_no
       AND a.pol_code = c.pol_code
WHERE c.in_force_date BETWEEN @date_min AND @date_max;

-- mio_log_id range reached through mio_log.plnmio_date
SELECT @mio_log_id_max2 = MAX(mio_log_id),
       @mio_log_id_min2 = MIN(mio_log_id)
FROM dbo.mio_log WITH (INDEX(idx_mio_log_plnmio_date))
WHERE plnmio_date BETWEEN @date_min AND @date_max;

-- mio_log_id range reached through mio_log.mio_date
SELECT @mio_log_id_max1 = MAX(mio_log_id),
       @mio_log_id_min1 = MIN(mio_log_id)
FROM dbo.mio_log WITH (INDEX(idx_mio_log_mio_date))
WHERE mio_date BETWEEN @date_min AND @date_max;

-- Combine the three ranges with the author's helper functions (definitions not
-- shown). For a correct result they must ignore NULL arguments, because MAX/MIN
-- return NULL when a lookup matches no rows.
SELECT @mio_log_id_max = dbo.F_find_max(@mio_log_id_max1, @mio_log_id_max2, @mio_log_id_max3);
SELECT @mio_log_id_min = dbo.F_find_min(@mio_log_id_min1, @mio_log_id_min2, @mio_log_id_min3);

-- Clustered index range on mio_log_id, then the exact business filter
SELECT (CASE
          WHEN in_force_date IS NOT NULL
          THEN (CASE
                  WHEN mio_date >= in_force_date THEN mio_date
                  ELSE in_force_date
                END)
          WHEN in_force_date IS NULL
          THEN (CASE
                  WHEN mio_date >= plnmio_date THEN mio_date
                  ELSE plnmio_date
                END)
          ELSE mio_date
        END) mio_date
FROM (SELECT a.mio_date,
             a.plnmio_date,
             c.in_force_date
      FROM dbo.mio_log a WITH (NOLOCK)
      INNER JOIN dbo.freph_a01_fromtask3 c WITH (NOLOCK)
              ON a.cntr_no = c.cntr_no
             AND a.pol_code = c.pol_code
      WHERE a.mio_log_id BETWEEN @mio_log_id_min AND @mio_log_id_max) t
WHERE ((t.in_force_date IS NOT NULL
        AND ((CASE
                WHEN t.mio_date >= t.in_force_date THEN t.mio_date
                ELSE t.in_force_date
              END) BETWEEN @date_min AND @date_max))
    OR (t.in_force_date IS NULL
        AND ((CASE
                WHEN t.mio_date >= t.plnmio_date THEN t.mio_date
                ELSE t.plnmio_date
              END) BETWEEN @date_min AND @date_max)));
```
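The id-range step rests on a simple argument. Every qualifying row has at least one of its three dates inside the window. Its mio_log_id therefore lies between the smallest minimum and the largest maximum returned by the three lookups, and the final CASE filter removes any extra rows. The Python sqlite3 sketch below checks this on random data. The schema, the data and the task_id key are hypothetical. find_max/find_min are hypothetical stand-ins for dbo.F_find_max/dbo.F_find_min, whose definitions are not given; they are written to skip NULLs because a lookup that matches nothing returns NULL.

```python
import random
import sqlite3


def effective_date(log, task):
    """The article's CASE expression, parameterised by table alias."""
    return (f"CASE WHEN {task}.in_force_date IS NOT NULL "
            f"THEN CASE WHEN {log}.mio_date >= {task}.in_force_date "
            f"THEN {log}.mio_date ELSE {task}.in_force_date END "
            f"ELSE CASE WHEN {log}.mio_date >= {log}.plnmio_date "
            f"THEN {log}.mio_date ELSE {log}.plnmio_date END END")


def find_max(*values):
    """Hypothetical stand-in for dbo.F_find_max: largest non-NULL value, or None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def find_min(*values):
    """Hypothetical stand-in for dbo.F_find_min: smallest non-NULL value, or None."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


JOIN = ("FROM mio_log a JOIN freph_a01_fromtask3 c "
        "ON a.cntr_no = c.cntr_no AND a.pol_code = c.pol_code")
BUSINESS_FILTER = f"({effective_date('a', 'c')}) BETWEEN :b AND :e"

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE mio_log(mio_log_id INTEGER PRIMARY KEY, cntr_no INT,
                         pol_code INT, mio_date TEXT, plnmio_date TEXT);
    CREATE TABLE freph_a01_fromtask3(task_id INTEGER PRIMARY KEY, cntr_no INT,
                                     pol_code INT, in_force_date TEXT);
""")
rnd = random.Random(7)


def day():
    return f"2013-07-{rnd.randint(1, 31):02d}"


con.executemany(
    "INSERT INTO mio_log(cntr_no, pol_code, mio_date, plnmio_date) VALUES (?, ?, ?, ?)",
    [(rnd.randint(1, 200), rnd.randint(1, 3), day(), day()) for _ in range(1500)])
con.executemany(
    "INSERT INTO freph_a01_fromtask3(cntr_no, pol_code, in_force_date) VALUES (?, ?, ?)",
    [(rnd.randint(1, 200), rnd.randint(1, 3), None if rnd.random() < 0.3 else day())
     for _ in range(600)])
window = {"b": "2013-07-15", "e": "2013-07-25"}

# Step 1: MAX/MIN(mio_log_id) reachable through each of the three date columns.
lookups = [
    "SELECT MAX(mio_log_id), MIN(mio_log_id) FROM mio_log WHERE mio_date BETWEEN :b AND :e",
    "SELECT MAX(mio_log_id), MIN(mio_log_id) FROM mio_log WHERE plnmio_date BETWEEN :b AND :e",
    f"SELECT MAX(a.mio_log_id), MIN(a.mio_log_id) {JOIN} WHERE c.in_force_date BETWEEN :b AND :e",
]
pairs = [con.execute(sql, window).fetchone() for sql in lookups]

# Step 2: combine the ranges, skipping NULLs from lookups that matched nothing.
id_max = find_max(*(hi for hi, _ in pairs))
id_min = find_min(*(lo for _, lo in pairs))

# Step 3: restrict by the id range first, then apply the exact business filter.
by_id_range = con.execute(
    f"SELECT a.mio_log_id, c.task_id {JOIN} "
    f"WHERE (a.mio_log_id BETWEEN :lo AND :hi) AND {BUSINESS_FILTER}",
    {**window, "lo": id_min, "hi": id_max}).fetchall()
reference = con.execute(
    f"SELECT a.mio_log_id, c.task_id {JOIN} WHERE {BUSINESS_FILTER}", window).fetchall()
print(sorted(by_id_range) == sorted(reference), id_min, id_max)
```

The range is only narrow when mio_log_id grows roughly in step with the three dates. With the random dates used here it covers most of the table, so the sketch demonstrates correctness, not the speed-up.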
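Two points need attention when implementing this solution:
1. When the maximum and minimum clustered index key values are obtained through a nonclustered index, the plan SQL Server generates on its own is inefficient. A hint is needed to guide the optimizer to the right plan: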
```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @date_min DATETIME;
DECLARE @date_max DATETIME;
SET @date_min = '2013-07-15';
SET @date_max = '2013-07-25';

DECLARE @mio_log_id_max1 INT;
DECLARE @mio_log_id_min1 INT;

SELECT @mio_log_id_max1 = MAX(mio_log_id), @mio_log_id_min1 = MIN(mio_log_id)
FROM mio_log
WHERE mio_date BETWEEN @date_min AND @date_max;
```
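The resulting execution plan consists of two parallel clustered index scans.
These clustered index scans for the maximum and minimum mio_log_id are not meant to read the whole clustered index. The SQL Server optimizer assumes the best algorithm is to scan from each end and stop at the first row that satisfies the WHERE clause. In this experiment, however, all rows matching the parameters sit at the high end of mio_log. Finding the minimum mio_log_id therefore scans almost the entire table. We never obtained the total logical read count: the query still had not returned when we stopped waiting.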
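A back-of-the-envelope model shows why scanning from both ends goes wrong here. The plain-Python sketch below counts rows rather than pages and ignores parallelism. The table is hypothetical: 100,000 rows, ids assigned in load order, 1,000 rows per day. This places the window at the high end of the id range, as in the experiment.

```python
def rows_read_from_both_ends(dates, lo, hi):
    """Rows touched when MIN/MAX(id) are found by walking the clustered index in
    id order from both ends, stopping at the first row inside [lo, hi].
    Assumes at least one row qualifies."""
    n = len(dates)
    first = next(i for i in range(n) if lo <= dates[i] <= hi)           # MIN side
    last = next(i for i in reversed(range(n)) if lo <= dates[i] <= hi)  # MAX side
    return (first + 1) + (n - last)


def rows_read_via_date_index(dates, lo, hi):
    """Rows touched when seeking a date index: only the qualifying entries."""
    return sum(1 for d in dates if lo <= d <= hi)


# Hypothetical table: dates[i] is the load day of the row with id i.
dates = [i // 1000 for i in range(100_000)]
print(rows_read_from_both_ends(dates, 90, 99))  # 90002
print(rows_read_via_date_index(dates, 90, 99))  # 10000
```

The MAX side stops after one row, but the MIN side walks 90,001 rows before it reaches the window. The date index touches only the 10,000 qualifying entries.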
The problem can be solved by guiding the SQL Server optimizer to the right plan with an index hint:
```sql
SELECT @mio_log_id_max1 = MAX(mio_log_id), @mio_log_id_min1 = MIN(mio_log_id)
FROM mio_log WITH (INDEX(idx_mio_log_mio_date))
WHERE mio_date BETWEEN @date_min AND @date_max;
```
With the hint, the query needs 673 logical reads and takes 215 ms.
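One likely reason the hinted plan is so cheap: in SQL Server every nonclustered index row carries the clustered index key. Assuming mio_log_id is the clustered key of mio_log, idx_mio_log_mio_date alone can answer MAX/MIN(mio_log_id) for the window without touching the base table. SQLite behaves the same way: each index stores the rowid, and an INTEGER PRIMARY KEY is the rowid. SQLite also has an INDEXED BY clause similar to the table hint. A Python sqlite3 sketch with a hypothetical schema can therefore show the covering-index plan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE mio_log(
        mio_log_id  INTEGER PRIMARY KEY,   -- plays the role of the clustered key
        mio_date    TEXT,
        plnmio_date TEXT);
    CREATE INDEX idx_mio_log_mio_date ON mio_log(mio_date);
""")

# INDEXED BY pins the index, much like WITH (INDEX(...)) in T-SQL.
sql = """SELECT MAX(mio_log_id), MIN(mio_log_id)
         FROM mio_log INDEXED BY idx_mio_log_mio_date
         WHERE mio_date BETWEEN ? AND ?"""
window = ("2013-07-15", "2013-07-25")

details = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, window)]
print(details)  # a SEARCH ... USING COVERING INDEX idx_mio_log_mio_date
print(con.execute(sql, window).fetchone())  # (None, None) on the empty table
```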
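2. Likewise, when mio_log_id values of mio_log are obtained through the in_force_date column of freph_a01_fromtask3, the plan SQL Server generates on its own is inefficient. Hints are needed here too (an index hint on freph_a01_fromtask3 and a LOOP join hint):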
```sql
SELECT @mio_log_id_max3 = MAX(a.mio_log_id),
       @mio_log_id_min3 = MIN(a.mio_log_id)
FROM dbo.freph_a01_fromtask3 c WITH (INDEX(i2_freph_a01_fromtask3))
INNER LOOP JOIN dbo.mio_log a WITH (NOLOCK)
        ON a.cntr_no = c.cntr_no
       AND a.pol_code = c.pol_code
WHERE c.in_force_date BETWEEN @date_min AND @date_max;
```
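In addition, the optimization also relied on covering indexes, indexes on the join columns, and dirty reads (the NOLOCK hints).
References:
1. SQL Server ROWID (Stack Overflow): http://stackoverflow.com/questions/909155/equivalent-of-oracles-rowid-in-sql-server
2. Xu Haiwei (徐海蔚). Microsoft SQL Server 企業級平台管理實踐 [Microsoft SQL Server Enterprise Platform Management in Practice].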