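The Eighteen Arts of GaussDB(DWS) Performance Tuning: Using Plan Hints

Reposted 2021-01-14 10:34. Tags: Huawei Cloud technology sharing / performance tuning / operators / database / GaussDB / SQL

Summary: This article introduces plan hints, another GaussDB(DWS) feature for manually influencing plan generation.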

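Preface

When writing SQL, database users draw on what they know to make each statement perform as well as possible. But when a large number of statements must be written, and some of them have extremely complex logic, it becomes hard to write SQL that performs well.

Every database has a query optimizer module that acts much like a human brain. It receives the query tree passed on by the parser, applies logical equivalence transformations to that tree, selects among the physical execution paths, and passes the best path it finds to the executor. The query optimizer is therefore one of the most important means of improving query efficiency.

For a classification of database query optimizers, see the blog post "GaussDB(DWS) Performance Tuning Series, Basics Part 1: analyze Statistics, Where Everything Begins".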

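Introducing plan hints

Because the optimizer generates plans from statistics and an estimation model, a plan can go wrong when the estimates are off, and the statement may then run extremely slowly.

Normally the optimizer's work is transparent to database users. The previous post, "GaussDB(DWS) Performance Tuning Series, Hands-on Part 5: The Eighteen Arts of Path Intervention", showed how GaussDB(DWS) lets you influence plan path generation globally by setting GUC parameters. This post introduces another feature for manually influencing plan generation: the plan hint. A hint is a directive passed to the optimizer through a comment in the SQL statement, and the optimizer uses it when choosing the statement's execution plan. Hints are very useful in test or development environments for measuring the performance of a specific access path. For example, you may know that joining certain tables first effectively reduces the size of the intermediate result set; in that case you can use a hint to direct the optimizer toward the better plan.

Plan hints are a statement-level control: a hint affects only the current query level of the current statement. During tuning they let us intervene manually on a specific statement and select a more efficient execution plan.

GaussDB(DWS) provides the following kinds of plan hints:

Join order hints: adjust the join order
Scan/join method hints: specify or avoid a scan or join method
Stream method hints: specify or avoid redistribute/broadcast
Row count hints: specify the row count of a given result set, or adjust the original estimate arithmetically
Skew value hints: specify the special values that need skew handling during skew optimization

The sections below describe each kind of plan hint and how it is used in practice. Except for the skew value hint, all of them use Q6 from TPC-DS as the example. To make the effect of hints on query optimization clearly visible, we deleted the statistics of the store_sales table. The original statement and the initial plan it generates are as follows.

Example statement: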

 
    explain performance
    select a.ca_state state, count(*) cnt
      from customer_address a
          ,customer c
          ,store_sales s
          ,date_dim d
          ,item i
     where a.ca_address_sk = c.c_current_addr_sk
       and c.c_customer_sk = s.ss_customer_sk
       and s.ss_sold_date_sk = d.d_date_sk
       and s.ss_item_sk = i.i_item_sk
       and d.d_month_seq =
           (select distinct (d_month_seq)
              from date_dim
             where d_year = 2000
               and d_moy = 2)
       and i.i_current_price > 1.2 *
           (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
     group by a.ca_state
    having count(*) >= 10
     order by cnt
     limit 100;

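Applying plan hints

Join order hints

Syntax:

Format 1:

    leading(table_list)

Specifies only the join order, not which table is inner and which is outer.

Format 2:

    leading((table_list))

Specifies both the join order and the inner/outer order. The inner/outer order takes effect only at the outermost level.

Description:

table_list is the list of table names whose join order is to be adjusted, separated by spaces. It may contain any number of tables (or aliases) from the current query level and, where a subquery is pulled up, the subquery's hint alias as well. Any tables can be grouped in parentheses to set their priority.

Notes:

A table must be written as a single string, without a schema qualifier.
If a table has an alias, use the alias to refer to it.
Each table in the list must be unique within the current query level or the pulled-up subquery. If it is not, use different aliases to tell the occurrences apart.
A table may appear in the list only once.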
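
The rules for table_list can be checked mechanically before a hint is pasted into a statement. The following Python sketch is only an illustration and is not part of GaussDB(DWS): parse_table_list and check_table_list are hypothetical helper names. It parses a table_list, including parenthesized groups, and reports schema-qualified names and repeated tables.

```python
import re


def parse_table_list(table_list):
    """Parse a leading() table list into nested Python lists.

    Example: 'a (b c) d' becomes ['a', ['b', 'c'], 'd'].
    """
    tokens = re.findall(r"[()]|[^\s()]+", table_list)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new priority group
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)   # close the group into its parent
        else:
            stack[-1].append(tok)     # plain table name or alias
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def _flatten(tree):
    for node in tree:
        if isinstance(node, list):
            yield from _flatten(node)
        else:
            yield node


def check_table_list(table_list):
    """Return a list of rule violations; empty means the list looks valid."""
    names = list(_flatten(parse_table_list(table_list)))
    problems = []
    for name in names:
        if "." in name:
            problems.append(f"{name}: schema-qualified names are not allowed")
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"{name}: appears more than once")
        seen.add(name)
    return problems


print(parse_table_list("(s d)"))
print(check_table_list("tpcds.s d d"))
```

For format 2, the argument passed to the helper is the inner, parenthesized list, for example "(s d)" for leading((s d)). The sketch checks only the textual rules listed above; it does not know which tables or aliases actually exist in the query.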

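Example 1:

In the example plan, operators 17 to 22 join store_sales with item and build a hash table from the result. store_sales is very large, and the join with item filters out no rows, so both the join and the hash table build take a long time. From what we know of the data distribution in the TPC-DS tables, joining store_sales with date_dim filters out a large share of the rows. We can therefore use a hint to tell the optimizer to join store_sales and date_dim first, with store_sales as the outer table and date_dim as the inner table, which reduces the size of the intermediate result set. The statement is rewritten as follows: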

 
    explain performance
    select /*+ leading((s d)) */ a.ca_state state, count(*) cnt
      from customer_address a
          ,customer c
          ,store_sales s
          ,date_dim d
          ,item i
     where a.ca_address_sk = c.c_current_addr_sk
       and c.c_customer_sk = s.ss_customer_sk
       and s.ss_sold_date_sk = d.d_date_sk
       and s.ss_item_sk = i.i_item_sk
       and d.d_month_seq =
           (select distinct (d_month_seq)
              from date_dim
             where d_year = 2000
               and d_moy = 2)
       and i.i_current_price > 1.2 *
           (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
     group by a.ca_state
    having count(*) >= 10
     order by cnt
     limit 100;

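The query plan with the join order hint added is as follows:

Adjusting the join order greatly reduces the intermediate result set of every subsequent join, and the execution time drops from 34268.322 ms to 11095.046 ms.

Scan/join method hints

These tell the optimizer which scan method or join method to use.

Syntax:

Format of join method hints:

    [no] nestloop|hashjoin|mergejoin(table_list)

Format of scan method hints:

    [no] tablescan|indexscan|indexonlyscan(table [index])

Description:

no tells the optimizer not to use the method.
table is the table the hint applies to. Only one table can be specified, and if it has an alias the hint should use the alias.
index is the name of the index to use with an indexscan or indexonlyscan hint. Currently only one index can be specified.

Example 2-1:

In the plan obtained in Example 1, the row count estimate for store_sales is inaccurate, so store_sales and date_dim are joined with an inefficient nestloop. We now use the hint from this section to tell the optimizer not to use nestloop for that join.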

 
    explain performance
    select /*+ leading((s d)) no nestloop(s d) */ a.ca_state state, count(*) cnt
      from customer_address a
          ,customer c
          ,store_sales s
          ,date_dim d
          ,item i
     where a.ca_address_sk = c.c_current_addr_sk
       and c.c_customer_sk = s.ss_customer_sk
       and s.ss_sold_date_sk = d.d_date_sk
       and s.ss_item_sk = i.i_item_sk
       and d.d_month_seq =
           (select distinct (d_month_seq)
              from date_dim
             where d_year = 2000
               and d_moy = 2)
       and i.i_current_price > 1.2 *
           (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
     group by a.ca_state
    having count(*) >= 10
     order by cnt
     limit 100;

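The plan after adding the join method hint is as follows:

The plan above shows that the optimizer has changed the join between store_sales and date_dim from nestloop to hashjoin, and the statement's execution time drops from 11095.046 ms to 4644.409 ms.

Example 2-2:

To demonstrate scan method hints, create an index named i_item on the i_item_sk column of the item table:

    create index i_item on item(i_item_sk);

The following statement tells the optimizer to use an index scan on index i_item when accessing the item table, whose alias is i.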

 
    explain performance
    select /*+ leading((s d)) no nestloop(s d) indexscan(i i_item) */ a.ca_state state, count(*) cnt
      from customer_address a
          ,customer c
          ,store_sales s
          ,date_dim d
          ,item i
     where a.ca_address_sk = c.c_current_addr_sk
       and c.c_customer_sk = s.ss_customer_sk
       and s.ss_sold_date_sk = d.d_date_sk
       and s.ss_item_sk = i.i_item_sk
       and d.d_month_seq =
           (select distinct (d_month_seq)
              from date_dim
             where d_year = 2000
               and d_moy = 2)
       and i.i_current_price > 1.2 *
           (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
     group by a.ca_state
    having count(*) >= 10
     order by cnt
     limit 100;

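The query plan after the scan hint directs the optimizer to use indexscan on item is as follows:

The execution result shows that with the index scan, performance actually drops slightly (after s and d are joined, the join with item uses mergejoin). The later examples therefore do not use an index scan on item.

Stream method hints

These tell the optimizer which stream method to use: broadcast or redistribute.

Syntax:

    [no] broadcast|redistribute(table_list)

Description:

no means the stream method named in the hint is not used.
table_list is the single table, or the join result of several tables, on which the stream operation is performed.

Example 3:

As a demonstration, modify the statement as follows. The hint tells the optimizer to distribute the result of scanning item by broadcast.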

 
    explain performance
    select /*+ leading((s d)) no nestloop(s d) broadcast(i) */ a.ca_state state, count(*) cnt
      from customer_address a
          ,customer c
          ,store_sales s
          ,date_dim d
          ,item i
     where a.ca_address_sk = c.c_current_addr_sk
       and c.c_customer_sk = s.ss_customer_sk
       and s.ss_sold_date_sk = d.d_date_sk
       and s.ss_item_sk = i.i_item_sk
       and d.d_month_seq =
           (select distinct (d_month_seq)
              from date_dim
             where d_year = 2000
               and d_moy = 2)
       and i.i_current_price > 1.2 *
           (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
     group by a.ca_state
    having count(*) >= 10
     order by cnt
     limit 100;

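The query plan when the optimizer is told to broadcast the item result is as follows:

The result of the item scan, which was previously redistributed, is now broadcast. Broadcast is generally used for small result sets, and redistribute for large ones. Depending on the size of a single table's result set, or of a join's result set, in the execution plan, you can therefore use this hint to change how the result set is distributed and improve query performance.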
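
The rule of thumb that broadcast suits small result sets and redistribute suits large ones can be made concrete with a rough count of rows sent over the network. The Python sketch below is a deliberately simplified model and not the GaussDB(DWS) cost model. It assumes that broadcast sends a full copy of the inner result set to every datanode, that redistribute re-hashes both join inputs so each row is sent once, and that the number of datanodes (6 here) is arbitrary. All function names are hypothetical.

```python
def rows_moved_broadcast(inner_rows, datanodes):
    # Assumption: every datanode receives a full copy of the inner result set.
    return inner_rows * datanodes


def rows_moved_redistribute(inner_rows, outer_rows):
    # Assumption: both sides are re-hashed on the join key,
    # so each row is sent exactly once.
    return inner_rows + outer_rows


def cheaper_stream(inner_rows, outer_rows, datanodes):
    """Pick the stream method that moves fewer rows under this toy model."""
    b = rows_moved_broadcast(inner_rows, datanodes)
    r = rows_moved_redistribute(inner_rows, outer_rows)
    return "broadcast" if b < r else "redistribute"


# A small inner side against a store_sales-sized outer side.
print(cheaper_stream(inner_rows=1_000, outer_rows=2_880_404, datanodes=6))
# A large inner side against the same outer side.
print(cheaper_stream(inner_rows=2_000_000, outer_rows=2_880_404, datanodes=6))
```

Under these assumptions the first call favors broadcast and the second favors redistribute. The real optimizer also weighs existing distribution keys, row widths, and skew, so the plan output remains the final authority.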

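Row count hints

These state the size of an intermediate result set. Both absolute and relative values are supported.

Syntax:

    rows(table_list #|+|-|* const)

Description:

#, +, - and * are the four operators for row count hints. # uses the row count that follows it directly. +, - and * add to, subtract from, or multiply the original estimate; the row count after the operation has a minimum of 1.
const can be any non-negative number, and scientific notation is supported.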
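
The four operators can be summarized in a few lines of Python. This is a sketch of the arithmetic as described above, not GaussDB(DWS) code; apply_rows_hint is a hypothetical name, and leaving the # value unclamped is an assumption, since the minimum of 1 is stated only for the results of +, - and *.

```python
def apply_rows_hint(estimated_rows, op, const):
    """Return the row count the optimizer would use after a rows() hint."""
    if const < 0:
        raise ValueError("const must be non-negative")
    if op == "#":
        # Absolute hint: replace the estimate with the given value.
        # Assumption: no clamping is applied here.
        return const
    if op == "+":
        result = estimated_rows + const
    elif op == "-":
        result = estimated_rows - const
    elif op == "*":
        result = estimated_rows * const
    else:
        raise ValueError(f"unknown operator: {op!r}")
    # Relative hints never drop the estimate below one row.
    return max(1, result)


print(apply_rows_hint(1000, "#", 2880404))  # absolute value
print(apply_rows_hint(1000, "*", 10))       # scale the estimate
print(apply_rows_hint(1000, "-", 5000))     # clamped to the minimum
```

With an original estimate of 1000 rows, the three calls yield 2880404, 10000 and 1 respectively.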

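Because store_sales has no statistics, the plans above all show a very large gap between its estimated and actual row counts, which is what led to the inefficient initial plan. Let us look at the effect of a row count hint.

Example 4: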

 
    explain performance
    select /*+ rows(s #2880404) */ a.ca_state state, count(*) cnt
      from customer_address a
          ,customer c
          ,store_sales s
          ,date_dim d
          ,item i
     where a.ca_address_sk = c.c_current_addr_sk
       and c.c_customer_sk = s.ss_customer_sk
       and s.ss_sold_date_sk = d.d_date_sk
       and s.ss_item_sk = i.i_item_sk
       and d.d_month_seq =
           (select distinct (d_month_seq)
              from date_dim
             where d_year = 2000
               and d_moy = 2)
       and i.i_current_price > 1.2 *
           (select avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
     group by a.ca_state
    having count(*) >= 10
     order by cnt
     limit 100;

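The query plan is as follows:

With the exact row count of store_sales specified, the execution time of the plan generated by the optimizer drops from the initial 34268.322 ms straight to 1991.843 ms, about 17 times faster. This clearly shows how heavily the optimizer depends on accurate statistics.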
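
To compare the hints tried so far on one scale, the short Python snippet below computes each run's speedup over the unhinted statement. The timings are the ones reported in this article; the dictionary labels are just descriptive strings.

```python
# Execution times reported in this article, in milliseconds.
timings_ms = {
    "no hint": 34268.322,
    "leading((s d))": 11095.046,
    "leading((s d)) no nestloop(s d)": 4644.409,
    "rows(s #2880404)": 1991.843,
}

baseline = timings_ms["no hint"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name:35s} {timings_ms[name]:>10.3f} ms  {factor:5.1f}x")
```

The factors come out at roughly 3.1x for the join order hint, 7.4x once nestloop is also ruled out, and 17.2x for the row count hint alone, which is the "about 17 times" figure quoted above.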

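Besides the row count of a single table, you can also specify the row count of an intermediate result set. In the example above, the actual and estimated row counts of operator 8 also differ considerably, so let us specify the row count of that operator's result set and see the effect. This example also uses a sublink block name hint to give the sublink an alias, so that the sublink can be named in the row count hint.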

 
    explain performance
    select /*+ rows(s #2880404) rows(s i tt c a d #2512) */ a.ca_state state, count(*) cnt
      from customer_address a
          ,customer c
          ,store_sales s
          ,date_dim d
          ,item i
     where a.ca_address_sk = c.c_current_addr_sk
       and c.c_customer_sk = s.ss_customer_sk
       and s.ss_sold_date_sk = d.d_date_sk
       and s.ss_item_sk = i.i_item_sk
       and d.d_month_seq =
           (select distinct (d_month_seq)
              from date_dim
             where d_year = 2000
               and d_moy = 2)
       and i.i_current_price > 1.2 *
           (select /*+ blockname (tt) */ avg(j.i_current_price)
              from item j
             where j.i_category = i.i_category)
     group by a.ca_state
    having count(*) >= 10
     order by cnt
     limit 100;

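The query plan is as follows:

The estimated row count of operator 8 now matches the actual row count. Because operator 8 is not the bottleneck of the plan, the performance gain is not significant.

Skew value hints

These identify the redistribution keys and values that are skewed during redistribution at query run time, so that redistribution in Join and HashAgg operations can be optimized.

Syntax:

Single-table skew:

    skew(table (column) [(value)])

Intermediate result skew:

    skew((join_rel) (column) [(value)])

Description:

table is the name of the single table that is skewed.
join_rel is the set of two or more tables taking part in a join. For example, (t1 t2) means that the result of joining t1 and t2 is skewed.
column is one or more skewed columns in the skewed table.
value is one or more skewed values in the skewed column.

Example 5:

This section uses Q1 from TPC-DS as the example. The query and plan before any hint is applied are as follows: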

 
    explain performance
    with customer_total_return as
        (select sr_customer_sk as ctr_customer_sk
               ,sr_store_sk as ctr_store_sk
               ,sum(SR_FEE) as ctr_total_return
           from store_returns
               ,date_dim
          where sr_returned_date_sk = d_date_sk
            and d_year = 2000
          group by sr_customer_sk
                  ,sr_store_sk)
    select c_customer_id
      from customer_total_return ctr1
          ,store
          ,customer
     where ctr1.ctr_total_return > (select avg(ctr_total_return) * 1.2
                                      from customer_total_return ctr2
                                     where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
       and s_store_sk = ctr1.ctr_store_sk
       and s_state = 'NM'
       and ctr1.ctr_customer_sk = c_customer_sk
     order by c_customer_id
     limit 100;

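The group by inside the with expression is skewed when HashAgg redistributes its input; this corresponds to operators 10 and 27 in the plan above. Applying a hint to the HashAgg in the with expression gives the following query and plan: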

 
    explain performance
    with customer_total_return as
        (select /*+ skew(store_returns(sr_store_sk sr_customer_sk)) */ sr_customer_sk as ctr_customer_sk
               ,sr_store_sk as ctr_store_sk
               ,sum(SR_FEE) as ctr_total_return
           from store_returns
               ,date_dim
          where sr_returned_date_sk = d_date_sk
            and d_year = 2000
          group by sr_customer_sk
                  ,sr_store_sk)
    select c_customer_id
      from customer_total_return ctr1
          ,store
          ,customer
     where ctr1.ctr_total_return > (select avg(ctr_total_return) * 1.2
                                      from customer_total_return ctr2
                                     where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
       and s_store_sk = ctr1.ctr_store_sk
       and s_state = 'NM'
       and ctr1.ctr_customer_sk = c_customer_sk
     order by c_customer_id
     limit 100;

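The query plan with the skew hint applied is as follows:

The optimized plan shows that, because its redistribution is skewed, the HashAgg has been optimized into a two-level Agg.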
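
Why a two-level Agg helps with skew can be seen in a small simulation. The Python sketch below is a toy model, not GaussDB(DWS) internals: it assumes 4 datanodes, integer group keys, and a redistribution rule of key modulo node count, and it counts how many rows each node receives. The function names are hypothetical.

```python
from collections import Counter


def single_level(rows_per_node, num_nodes):
    """Redistribute raw rows by group key; return rows received per node."""
    received = Counter()
    for rows in rows_per_node:
        for key in rows:
            received[key % num_nodes] += 1
    return [received[n] for n in range(num_nodes)]


def two_level(rows_per_node, num_nodes):
    """Aggregate locally first, then redistribute one partial row per key."""
    received = Counter()
    totals = Counter()
    for rows in rows_per_node:
        partial = Counter(rows)              # first level: local count per key
        for key, cnt in partial.items():
            received[key % num_nodes] += 1   # one partial row sent per key
            totals[key] += cnt               # second level: merge the partials
    return [received[n] for n in range(num_nodes)], dict(totals)


# Assumed data: on every node, key 0 is heavily skewed.
num_nodes = 4
rows_per_node = [[0] * 1000 + [1, 2, 3] for _ in range(num_nodes)]

print(single_level(rows_per_node, num_nodes))
print(two_level(rows_per_node, num_nodes))
```

With single-level aggregation the node that owns key 0 receives 4000 rows while the others receive 4 each. With local pre-aggregation every node receives 4 partial rows, and the final counts per key are unchanged.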

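Conclusion

The examples in the sections above show how plan hints are used in GaussDB(DWS) and how they affect execution plans. Database users can draw on their knowledge of the database objects, data distribution and data volumes, or analyze a statement's query plan to find the parts where a poor plan was chosen, and then use plan hints correctly to steer the optimizer toward a more efficient plan. This can improve query performance substantially and makes plan hints a useful tool for performance tuning.

This article is shared from the Huawei Cloud community post "GaussDB(DWS) Performance Tuning Series, Hands-on Part 6: The Eighteen Arts, Using Plan Hints". Original author: wangxiaojuan8.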
