ClickHouse: optimizing a very slow ORDER BY (table engine + segmented SQL)


 

 

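1. How slow ORDER BY is on hundreds of millions of rows

In ClickHouse, once a table grows to hundreds of millions of rows, running ORDER BY on the result of a query is very slow.

For example, I have a table with 300 million rows. Its structure is: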

{
  dvid       device ID,
  json       per-device information,
  dt         time,
  filter     filter column,
  order_key  sort column
}

ENGINE = MergeTree
PARTITION BY toYYYYMMDD(toDate(dt))
PRIMARY KEY (dt, dvid)
ORDER BY (dt, dvid)
SETTINGS index_granularity = 256

 
        

Now we want to count the devices over 3 months (150 million rows), sorted by order_key.

The SQL:

select count(1) from
(select
dt, toUInt32(dvid) as dvid, json from [table] where [filter condition] order by [order_key sort expression])

A very simple query, but it is slow:

+---------+
| count() |
+---------+
| 1262100 |
+---------+
1 row in set (37.50 sec)

But once the ORDER BY is removed, the same query over the same amount of data is very fast:

+---------+
| count() |
+---------+
| 1262100 |
+---------+
1 row in set (1.74 sec)

 

2. Look at the execution plans to see why it is so slow

2.1 First, the plan without ORDER BY:

+-------------------------------------------------------------------------------------------------+
| explain                                                                                         |
+-------------------------------------------------------------------------------------------------+
| Expression ((Projection + Before ORDER BY))                                                     |
|   Aggregating                                                                                   |
|     Expression ((Before GROUP BY + Projection))                                                 |
|       SettingQuotaAndLimits (Set limits and quota after reading from storage)                   |
|         Union                                                                                   |
|           Expression ((Convert block structure for query from local replica + Before ORDER BY)) |
|             SettingQuotaAndLimits (Set limits and quota after reading from storage)             |
|               ReadFromStorage (MergeTree)                                                       |
|           ReadFromPreparedSource (Read from remote replica)                                     |
+-------------------------------------------------------------------------------------------------+
9 rows in set (0.05 sec)

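Roughly, this means:

Step 1: read the data from disk on each machine

Step 2: union the data read from each machine

Step 3: compute the count

Total time without ORDER BY: 1.7 s

2.2 The plan with ORDER BY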

+-------------------------------------------------------------------------------------------+
| explain                                                                                   |
+-------------------------------------------------------------------------------------------+
| Expression ((Projection + Before ORDER BY))                                               |
|   Aggregating                                                                             |
|     Expression ((Before GROUP BY + Projection))                                           |
|       MergingSorted (Merge sorted streams for ORDER BY)                                   |
|         SettingQuotaAndLimits (Set limits and quota after reading from storage)           |
|           Union                                                                           |
|             Expression (Convert block structure for query from local replica)             |
|               FinishSorting                                                               |
|                 Expression (Before ORDER BY)                                              |
|                   SettingQuotaAndLimits (Set limits and quota after reading from storage) |
|                     Expression (Remove unused columns after reading from storage)         |
|                       Union                                                               |
|                         MergingSorted (Merge sorting mark ranges)                         |
|                           Expression (Calculate sorting key prefix)                       |
|                             ReadFromStorage (MergeTree with order)                        |
|                         MergingSorted (Merge sorting mark ranges)                         |
|                           Expression (Calculate sorting key prefix)                       |
|                             ReadFromStorage (MergeTree with order)                        |
|                         MergingSorted (Merge sorting mark ranges)                         |
|                           Expression (Calculate sorting key prefix)                       |
|                             ReadFromStorage (MergeTree with order) 
... (N more identical lines omitted) ...
 

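Roughly, this means:

1. Read the required data from disk (in pieces, because ORDER BY sorts in memory and cannot load everything at once)

2. Sort each piece by the ORDER BY key

3. Combine the many sorted pieces, then sort the combined data again (this step is also split into several stages depending on data volume)

4. Finally aggregate to get the count

Total time with ORDER BY: 37.5 s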

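3. Optimization

3.1 Switching the engine

The previous step was slow, and my guess was that ClickHouse's MergeTree engine is not well suited to sorting. So I tried various engines and ended up with the one that handled ORDER BY best: ReplicatedAggregatingMergeTree

Recreate the table: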

ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{shard}/[db]/[table]', '{replica}')
PARTITION BY toYYYYMMDD(toDate(dt))
PRIMARY KEY dvid
ORDER BY dvid
SETTINGS index_granularity = 8196

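Then increase the amount of data in the table, from the original 150 million rows to 350 million.

Run the same statements and measure performance:

1. Without ORDER BY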

+---------+
| count() |
+---------+
| 3290193 |
+---------+
1 row in set (3.58 sec)

2. With ORDER BY

+---------+
| count() |
+---------+
| 3290193 |
+---------+
1 row in set (4.16 sec)

 

3.2 Execution plans after switching to ReplicatedAggregatingMergeTree

1. The plan without ORDER BY

+-------------------------------------------------------------------------------------------------+
| explain                                                                                         |
+-------------------------------------------------------------------------------------------------+
| Expression ((Projection + Before ORDER BY))                                                     |
|   Aggregating                                                                                   |
|     Expression ((Before GROUP BY + Projection))                                                 |
|       SettingQuotaAndLimits (Set limits and quota after reading from storage)                   |
|         Union                                                                                   |
|           Expression ((Convert block structure for query from local replica + Before ORDER BY)) |
|             SettingQuotaAndLimits (Set limits and quota after reading from storage)             |
|               ReadFromStorage (MergeTree)                                                       |
|           ReadFromPreparedSource (Read from remote replica)                                     |
+-------------------------------------------------------------------------------------------------+
9 rows in set (0.05 sec)

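The plan without ORDER BY is identical to the one under the MergeTree engine, so nothing changed there. The query time went from 1.74 s to 3.58 s because the underlying data grew from 150 million rows to 330 million.

2. The plan with ORDER BY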

+-----------------------------------------------------------------------------------------------+
| explain                                                                                       |
+-----------------------------------------------------------------------------------------------+
| Expression ((Projection + Before ORDER BY))                                                   |
|   Aggregating                                                                                 |
|     Expression ((Before GROUP BY + Projection))                                               |
|       MergingSorted (Merge sorted streams for ORDER BY)                                       |
|         SettingQuotaAndLimits (Set limits and quota after reading from storage)               |
|           Union                                                                               |
|             Expression (Convert block structure for query from local replica)                 |
|               MergingSorted (Merge sorted streams for ORDER BY)                               |
|                 MergeSorting (Merge sorted blocks for ORDER BY)                               |
|                   PartialSorting (Sort each block for ORDER BY)                               |
|                     Expression (Before ORDER BY)                                              |
|                       SettingQuotaAndLimits (Set limits and quota after reading from storage) |
|                         ReadFromStorage (MergeTree)                                           |
|             ReadFromPreparedSource (Read from remote replica)                                 |
+-----------------------------------------------------------------------------------------------+
14 rows in set (0.08 sec)

 

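Compared with the ORDER BY plan under the MergeTree engine, the ReplicatedAggregatingMergeTree plan is much simpler.

The improvement is easy to see: data is read and sorted per machine / per partition / per segment and then combined, and the sorting works block by block.

1. On each machine, PartialSorting sorts one block at a time instead of sorting the whole data set at once

2. The blocks sorted in step 1 are merged (each block is already in order, so the merge only has to compare the leading rows of the blocks)

3. Once the local sort is done, the distributed step performs the final merge across machines

4. Compute the count

This works much like merge sort, which suits distributed sorting of large data sets.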
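The four steps above can be imitated in a few lines of Python. This is only an illustrative sketch of the partial-sort-then-merge idea, not ClickHouse's implementation; the two "machines", the block size of 2, and the function names are assumptions made up for the example.

```python
import heapq
from itertools import islice

def partial_sort(rows, block_size, key):
    """Split rows into fixed-size blocks and sort each block on its own."""
    it = iter(rows)
    blocks = []
    while True:
        block = list(islice(it, block_size))
        if not block:
            return blocks
        blocks.append(sorted(block, key=key))

def merge_sorted(blocks, key):
    """K-way merge of already-sorted blocks; only the block heads are compared."""
    return list(heapq.merge(*blocks, key=key))

def by_order_key(row):
    return row[0]

# Two hypothetical machines, each holding unsorted (order_key, dvid) rows.
machine_a = [(5, "a"), (1, "b"), (9, "c"), (3, "d")]
machine_b = [(4, "e"), (8, "f"), (2, "g")]

# Steps 1-2: sort blocks locally, then merge them on each machine.
local_a = merge_sorted(partial_sort(machine_a, 2, by_order_key), by_order_key)
local_b = merge_sorted(partial_sort(machine_b, 2, by_order_key), by_order_key)

# Step 3: final merge of the per-machine sorted streams.
result = merge_sorted([local_a, local_b], by_order_key)

# Step 4: count.
total = len(result)
```

No stage ever sorts the full data set in one go: each block is sorted independently, and every later stage only merges streams that are already in order.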

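4. Optimizing the SQL

[The previous step already met the requirement, but our real need is not just a count: it is to get the first occurrence of something within a user-defined time range]

After switching the engine the query speed improved fundamentally. Next, optimize the SQL itself. If the original SQL covers 3 months of data, it can be split into segments, for example one ORDER BY per 15 days or per month.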
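Splitting a date range into fixed-size segments can be sketched as below. The helper name `split_range` and the 15-day window are assumptions chosen for illustration; the windows are inclusive on both ends so that they map directly onto `dt >= ... and dt <= ...` conditions.

```python
from datetime import date, timedelta

def split_range(start, end, days):
    """Split the inclusive range [start, end] into consecutive
    inclusive windows of at most `days` days each."""
    windows = []
    cur = start
    while cur <= end:
        last = min(cur + timedelta(days=days - 1), end)
        windows.append((cur, last))
        cur = last + timedelta(days=1)
    return windows

# Three months of data cut into 15-day segments.
windows = split_range(date(2021, 1, 1), date(2021, 3, 31), 15)
for lo, hi in windows:
    print(lo, hi)
```

Each window would then become one sorted subquery, so every individual sort works on a fraction of the rows.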

The original SQL:

select count(1) from
(select
dt,
toUInt32(dvid) as dvid,
json
from [table] where [filter condition] and dt >= '2021-01-01' and dt <= '2021-03-31'
order by [order_key sort expression])

The optimized SQL:

select count(1) from (
    select dt, toUInt32(dvid) as dvid, json from [table] where [filter condition] and dt >= '2021-01-01' and dt <= '2021-01-31' order by [order_key sort expression]
    union all
    select dt, toUInt32(dvid) as dvid, json from [table] where [filter condition] and dt >= '2021-02-01' and dt <= '2021-02-28' order by [order_key sort expression]
    union all
    select dt, toUInt32(dvid) as dvid, json from [table] where [filter condition] and dt >= '2021-03-01' and dt <= '2021-03-31' order by [order_key sort expression]
)
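Hand-written month boundaries are easy to get wrong (February has no 31st), so the segmented query can be generated instead. The sketch below is one possible way to do that; the function names, the table name `device_log`, and the condition `filter = 1` are placeholders invented for the example, not names from the original setup.

```python
import calendar
from datetime import date

def month_ranges(year, first_month, last_month):
    """Return (first_day, last_day) per calendar month, using real month lengths."""
    ranges = []
    for month in range(first_month, last_month + 1):
        last_day = calendar.monthrange(year, month)[1]
        ranges.append((date(year, month, 1), date(year, month, last_day)))
    return ranges

def build_segmented_count(table, filter_sql, order_sql, ranges):
    """Build one sorted subquery per date range and count over their UNION ALL."""
    parts = [
        f"select dt, toUInt32(dvid) as dvid, json from {table} "
        f"where {filter_sql} and dt >= '{lo}' and dt <= '{hi}' "
        f"order by {order_sql}"
        for lo, hi in ranges
    ]
    return "select count(1) from (\n    " + "\n    union all\n    ".join(parts) + "\n)"

ranges = month_ranges(2021, 1, 3)
# Placeholder table name and filter; substitute the real ones.
sql = build_segmented_count("device_log", "filter = 1", "order_key", ranges)
print(sql)
```

The generated text has the same shape as the optimized SQL above, with one `union all` branch per month.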

Query speed:

+---------+
| count() |
+---------+
| 3290193 |
+---------+
1 row in set (2.74 sec)

With the engine change plus the SQL change, the final query time dropped from the original 37.5 s to 2.74 s.

 

 

 

 

