0 ClickHouse Syntax Optimization Rules
ClickHouse's SQL optimizer is rule-based (RBO, Rule-Based Optimization). The sections below walk through some of its rewrite rules.
1 Preparing the test tables
1) Upload the official datasets
Upload visits_v1.tar and hits_v1.tar to the virtual machine, then extract them into the ClickHouse data path:
# Extract into the ClickHouse data path
sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse
sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse
# Change the owner to the clickhouse user
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets
2) Restart clickhouse-server
sudo clickhouse restart
3) Run the queries
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
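Note: the official tar packages contain the database and table creation statements as well as the data, so there is no need to create the database or tables by hand. This is the most convenient approach.
The hits_v1 table has more than 130 columns and more than 8.8 million rows.
The visits_v1 table has more than 180 columns and more than 1.6 million rows.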
2 COUNT optimization
When count() or count(*) is called without a WHERE condition, ClickHouse does not scan the table; it uses the total_rows value from system.tables directly. For example:
EXPLAIN SELECT count() FROM datasets.hits_v1;

Union
  Expression (Projection)
    Expression (Before ORDER BY and SELECT)
      MergingAggregated
        ReadNothing (Optimized trivial count)
Note the "Optimized trivial count" step: this is the count optimization.
If count is applied to a specific column, the optimization is not used:
EXPLAIN SELECT count(CounterID) FROM datasets.hits_v1;

Union
  Expression (Projection)
    Expression (Before ORDER BY and SELECT)
      Aggregating
        Expression (Before GROUP BY)
          ReadFromStorage (Read from MergeTree)
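To see where the shortcut gets its number, the metadata row count can be read directly and compared with count(). This is a minimal sketch that assumes the datasets database loaded in section 1. The optimize_trivial_count_query setting is assumed here to be the switch for this rule; check its name and default against your server version.

```sql
-- Row count kept in table metadata; no column data is scanned.
SELECT total_rows
FROM system.tables
WHERE database = 'datasets' AND name = 'hits_v1';

-- Should match the metadata value above.
SELECT count() FROM datasets.hits_v1;

-- Assumption: with the setting off, the plan falls back to reading the table.
EXPLAIN SELECT count() FROM datasets.hits_v1
SETTINGS optimize_trivial_count_query = 0;
```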
3 Eliminating duplicate columns in subqueries
In the statement below, the subquery selects UserID twice (the second time under the alias HaHa); the duplicate is removed:

EXPLAIN SYNTAX
SELECT
    a.UserID,
    b.VisitID,
    a.URL,
    b.UserID
FROM hits_v1 AS a
LEFT JOIN
(
    SELECT
        UserID,
        UserID AS HaHa,
        VisitID
    FROM visits_v1
) AS b USING (UserID)
LIMIT 3;

Optimized statement returned:

SELECT
    UserID,
    VisitID,
    URL,
    b.UserID
FROM hits_v1 AS a
ALL LEFT JOIN
(
    SELECT
        UserID,
        VisitID
    FROM visits_v1
) AS b USING (UserID)
LIMIT 3
4 Predicate pushdown
When a GROUP BY has a HAVING clause but no WITH CUBE, WITH ROLLUP, or WITH TOTALS modifier, the HAVING filter is pushed down into WHERE so that rows are filtered earlier. In the query below, HAVING UserID becomes WHERE UserID and is applied before the GROUP BY:

EXPLAIN SYNTAX
SELECT UserID
FROM hits_v1
GROUP BY UserID
HAVING UserID = '8585742290196126178';

Optimized statement returned:

SELECT UserID
FROM hits_v1
WHERE UserID = '8585742290196126178'
GROUP BY UserID
Predicate pushdown also works for subqueries:

EXPLAIN SYNTAX
SELECT *
FROM
(
    SELECT UserID
    FROM visits_v1
)
WHERE UserID = '8585742290196126178';

Optimized statement returned:

SELECT UserID
FROM
(
    SELECT UserID
    FROM visits_v1
    WHERE UserID = '8585742290196126178'
)
WHERE UserID = '8585742290196126178'
A more complex example, where the filter is pushed into both branches of a UNION ALL:

EXPLAIN SYNTAX
SELECT *
FROM
(
    SELECT *
    FROM
    (
        SELECT UserID
        FROM visits_v1
    )
    UNION ALL
    SELECT *
    FROM
    (
        SELECT UserID
        FROM visits_v1
    )
)
WHERE UserID = '8585742290196126178';

Optimized statement returned:

SELECT UserID
FROM
(
    SELECT UserID
    FROM
    (
        SELECT UserID
        FROM visits_v1
        WHERE UserID = '8585742290196126178'
    )
    WHERE UserID = '8585742290196126178'
    UNION ALL
    SELECT UserID
    FROM
    (
        SELECT UserID
        FROM visits_v1
        WHERE UserID = '8585742290196126178'
    )
    WHERE UserID = '8585742290196126178'
)
WHERE UserID = '8585742290196126178'
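One way to confirm that a rewrite comes from the optimizer is to run the same EXPLAIN SYNTAX with the rule switched off. The sketch below assumes that enable_optimize_predicate_expression is the setting that governs pushdown into subqueries and that the datasets database from section 1 is loaded; verify both against your ClickHouse version.

```sql
-- Default behaviour: the outer filter is expected to be copied into the subquery.
EXPLAIN SYNTAX
SELECT *
FROM (SELECT UserID FROM datasets.visits_v1)
WHERE UserID = '8585742290196126178';

-- Assumption: with the setting off, the subquery is left without a WHERE clause.
EXPLAIN SYNTAX
SELECT *
FROM (SELECT UserID FROM datasets.visits_v1)
WHERE UserID = '8585742290196126178'
SETTINGS enable_optimize_predicate_expression = 0;
```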
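5 Moving arithmetic out of aggregate functions
Arithmetic inside an aggregate function is moved outside it, so the multiplication runs once on the aggregated result instead of once per row. For example:

EXPLAIN SYNTAX
SELECT sum(UserID * 2)
FROM visits_v1;

Optimized statement returned:

SELECT sum(UserID) * 2
FROM visits_v1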
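The rewrite is safe because multiplication by a constant distributes over a sum. This can be checked on a small generated table, with no dependency on the test datasets. The setting name optimize_arithmetic_operations_in_aggregate_functions is assumed to be the switch for this rule.

```sql
-- Both columns evaluate to 90: sum(0..9) = 45, doubled.
SELECT
    sum(number * 2) AS multiplied_inside,
    sum(number) * 2 AS multiplied_outside
FROM numbers(10);

-- Assumption: with the setting off, sum(number * 2) should be left as written.
EXPLAIN SYNTAX
SELECT sum(number * 2)
FROM numbers(10)
SETTINGS optimize_arithmetic_operations_in_aggregate_functions = 0;
```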
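6 Aggregate function elimination
If min, max, or any is applied to an aggregation key (a GROUP BY key), the function call is removed, because the key has a single value within each group. For example:

EXPLAIN SYNTAX
SELECT
    sum(UserID * 2),
    max(VisitID),
    max(UserID)
FROM visits_v1
GROUP BY UserID;

Optimized statement returned:

SELECT
    sum(UserID) * 2,
    max(VisitID),
    UserID
FROM visits_v1
GROUP BY UserID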
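A small check of why dropping the function is harmless: on a generated table, max() of the grouping expression equals the key itself in every group. The second statement assumes that optimize_aggregators_of_group_by_keys is the setting behind this rule and that the datasets database from section 1 is loaded.

```sql
-- Within each group the key has one value, so k and max_of_key always agree.
SELECT
    number % 3 AS k,
    max(number % 3) AS max_of_key,
    count() AS rows_in_group
FROM numbers(10)
GROUP BY k
ORDER BY k;

-- Assumption: with the setting off, max(UserID) should be kept in the output.
EXPLAIN SYNTAX
SELECT max(UserID)
FROM datasets.visits_v1
GROUP BY UserID
SETTINGS optimize_aggregators_of_group_by_keys = 0;
```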
7 Removing duplicate ORDER BY keys
In the statement below, the repeated sort keys UserID and VisitID are deduplicated:

EXPLAIN SYNTAX
SELECT *
FROM visits_v1
ORDER BY
    UserID ASC,
    UserID ASC,
    VisitID ASC,
    VisitID ASC;

Optimized statement returned:

SELECT ...
FROM visits_v1
ORDER BY
    UserID ASC,
    VisitID ASC
8 Removing duplicate LIMIT BY keys
In the statement below, the VisitID key, which is declared twice, is deduplicated:

EXPLAIN SYNTAX
SELECT *
FROM visits_v1
LIMIT 3 BY VisitID, VisitID
LIMIT 10;

Optimized statement returned:

SELECT ...
FROM visits_v1
LIMIT 3 BY VisitID
LIMIT 10
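The ORDER BY and LIMIT BY deduplication rules can also be tried without the test datasets. A minimal sketch on the built-in numbers() table function; the alias k is only illustrative, and the exact text that EXPLAIN SYNTAX prints may vary between versions.

```sql
-- Duplicate sort key: expected to collapse to ORDER BY number ASC.
EXPLAIN SYNTAX
SELECT number
FROM numbers(10)
ORDER BY number ASC, number ASC;

-- Duplicate LIMIT BY key: expected to collapse to LIMIT 1 BY k.
EXPLAIN SYNTAX
SELECT number % 3 AS k, number
FROM numbers(10)
LIMIT 1 BY k, k
LIMIT 10;
```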
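9 Removing duplicate USING keys
In the statement below, the join key UserID, which is listed twice in USING, is deduplicated. Note that the repeated a.UserID in the SELECT list is kept; only the USING clause changes:

EXPLAIN SYNTAX
SELECT
    a.UserID,
    a.UserID,
    b.VisitID,
    a.URL,
    b.UserID
FROM hits_v1 AS a
LEFT JOIN visits_v1 AS b USING (UserID, UserID);

Optimized statement returned:

SELECT
    UserID,
    UserID,
    VisitID,
    URL,
    b.UserID
FROM hits_v1 AS a
ALL LEFT JOIN visits_v1 AS b USING (UserID)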
10 Scalar replacement
If a subquery returns only a single row, references to it are replaced with a scalar value, as with total_disk_usage in the statement below:

EXPLAIN SYNTAX
WITH
    (
        SELECT sum(bytes)
        FROM system.parts
        WHERE active
    ) AS total_disk_usage
SELECT
    (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
    table
FROM system.parts
GROUP BY table
ORDER BY table_disk_usage DESC
LIMIT 10;

Optimized statement returned:

WITH CAST(0, 'UInt64') AS total_disk_usage
SELECT
    (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
    table
FROM system.parts
GROUP BY table
ORDER BY table_disk_usage DESC
LIMIT 10
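The same pattern can be reproduced on generated data, which makes the substituted constant easier to reason about than live system.parts figures. A minimal sketch; the aliases total and share are illustrative, and the literal that EXPLAIN SYNTAX prints in place of the subquery may differ between versions.

```sql
-- The WITH subquery returns exactly one row, so total behaves like a constant.
WITH (SELECT sum(number) FROM numbers(10)) AS total
SELECT
    number,
    number / total AS share
FROM numbers(3);

-- Inspect how the subquery reference is rewritten.
EXPLAIN SYNTAX
WITH (SELECT sum(number) FROM numbers(10)) AS total
SELECT number / total AS share
FROM numbers(3);
```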
11 Ternary operator optimization
When the optimize_if_chain_to_multiif setting is enabled, chained ternary operators are rewritten into a single multiIf call. For example:

EXPLAIN SYNTAX
SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 'atguigu')
FROM numbers(10)
SETTINGS optimize_if_chain_to_multiif = 1;

Optimized statement returned:

SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'atguigu')
FROM numbers(10)
SETTINGS optimize_if_chain_to_multiif = 1
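To check that the two forms are interchangeable, both can be evaluated side by side on a few rows. A minimal sketch; the column aliases nested and flat are illustrative.

```sql
-- nested and flat hold the same value on every row:
-- 0 -> atguigu, 1 -> hello, 2 -> world, 3 -> atguigu
SELECT
    number,
    number = 1 ? 'hello' : (number = 2 ? 'world' : 'atguigu') AS nested,
    multiIf(number = 1, 'hello', number = 2, 'world', 'atguigu') AS flat
FROM numbers(4);
```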