Oracle: differences between "HASH GROUP BY" and "SORT GROUP BY", and cases where "HASH GROUP BY" cannot be used
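Before 10g, a GROUP BY clause would typically return a sorted result set even without an ORDER BY clause, although this was never guaranteed.
The reason is that the optimizer used "SORT GROUP BY", which sorts on the grouping columns as a side effect.
Starting with 10g, "HASH GROUP BY" was introduced. It aggregates by hashing instead of sorting, so GROUP BY no longer returns rows ordered by the grouping columns, and the order of the result set is not guaranteed at all.
To sort the groups, use an ORDER BY clause.
If no ORDER BY clause is specified, the order in which rows are returned depends on the method used to retrieve them from the database. In other words, it depends on the execution plan that was chosen.
A simple experiment follows: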
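Environment: 19.13.0.0.0
Create the table and insert the test data. The rows are inserted from a single session in a deliberately non-sequential order, so the data looks unordered (and in fact is):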
create table zkm (id int,name varchar2(20));
insert into zkm values(1,'a');
insert into zkm values(2,'b');
insert into zkm values(3,'c');
insert into zkm values(9,'i');
insert into zkm values(5,'e');
insert into zkm values(4,'d');
insert into zkm values(8,'h');
insert into zkm values(7,'g');
insert into zkm values(6,'f');
commit;
Target SQL: select id,count(name) from zkm group by id;
Session setting: alter session set statistics_level=all;
Use the hint NO_USE_HASH_AGGREGATION to disable "HASH GROUP BY". With it, the target SQL in this test always returns its result set sorted by the ID column.
The execution plan shows "SORT GROUP BY".
17:06:46 ZKM@dev-app73/pdb(9)> select /*+ NO_USE_HASH_AGGREGATION */ id,count(name) from zkm group by id;

        ID COUNT(NAME)
---------- -----------
         1           1
         2           1
         3           1
         4           1
         5           1
         6           1
         7           1
         8           1
         9           1

9 rows selected.

Elapsed: 00:00:00.01
17:06:47 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------
SQL_ID  a7kukqrrrvrra, child number 1
-------------------------------------
select /*+ NO_USE_HASH_AGGREGATION */ id,count(name) from zkm group by id

Plan hash value: 2238836816

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      9 |00:00:00.01 |       6 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |      9 |      9 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   2 |   TABLE ACCESS FULL| ZKM  |      1 |      9 |      9 |00:00:00.01 |       6 |       |       |          |
----------------------------------------------------------------------------------------------------------------

15 rows selected.

Elapsed: 00:00:00.06
With the hint removed, the same query returns an unordered result set.
The execution plan now shows "HASH GROUP BY".
17:09:33 ZKM@dev-app73/pdb(9)> select id,count(name) from zkm group by id;

        ID COUNT(NAME)
---------- -----------
         6           1
         1           1
         7           1
         2           1
         8           1
         5           1
         4           1
         3           1
         9           1

9 rows selected.

Elapsed: 00:00:00.01
17:09:34 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
SQL_ID  dqw15j89d8r1b, child number 2
-------------------------------------
select id,count(name) from zkm group by id

Plan hash value: 201225912

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      9 |00:00:00.01 |       6 |       |       |          |
|   1 |  HASH GROUP BY     |      |      1 |      9 |      9 |00:00:00.01 |       6 |  1558K|  1558K| 1063K (0)|
|   2 |   TABLE ACCESS FULL| ZKM  |      1 |      9 |      9 |00:00:00.01 |       6 |       |       |          |
----------------------------------------------------------------------------------------------------------------

14 rows selected.

Elapsed: 00:00:00.10
Looking at the work-area memory (the Used-Mem column), "HASH GROUP BY" used 1063K, while "SORT GROUP BY" used 2048 bytes.
The same information can be obtained from v$sql_workarea.last_memory_used.
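As a rough sketch of that lookup, the query below reads the work-area figures for the two cursors from this experiment. The SQL_ID values are the ones shown in the plans above. The sketch assumes the session has SELECT privilege on v$sql_workarea and that both cursors are still in the shared pool. LAST_MEMORY_USED is reported in bytes.

```sql
-- Work-area memory used by the last execution of each cursor.
-- SQL_IDs are taken from the dbms_xplan output earlier in this post;
-- substitute your own if the statements were re-parsed.
select sql_id,
       child_number,
       operation_id,        -- matches the Id column of the plan
       operation_type,      -- work-area operation of that plan line
       last_memory_used,    -- bytes used by the last execution
       last_execution       -- OPTIMAL, ONE PASS or MULTI-PASS
  from v$sql_workarea
 where sql_id in ('a7kukqrrrvrra', 'dqw15j89d8r1b')
 order by sql_id, child_number, operation_id;
```

Plan lines without a work area (for example the full table scan) do not appear in this view, so only the GROUP BY line of each cursor is expected.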
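The data set above is too small for a meaningful timing comparison. After loading a large amount of data, the elapsed times are: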
17:26:32 ZKM@dev-app73/pdb(9)> select id,count(name) from zkm group by id;

        ID COUNT(NAME)
---------- -----------
         6     2097152
         7     2097152
         1     2097152
         8     2097152
         2     2097152
         5     2097152
         4     2097152
         9     2097152
         3     2097152

9 rows selected.

Elapsed: 00:00:01.65
17:26:34 ZKM@dev-app73/pdb(9)> select /*+ NO_USE_HASH_AGGREGATION */ id,count(name) from zkm group by id;

        ID COUNT(NAME)
---------- -----------
         1     2097152
         2     2097152
         3     2097152
         4     2097152
         5     2097152
         6     2097152
         7     2097152
         8     2097152
         9     2097152

9 rows selected.

Elapsed: 00:00:03.13
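With this larger data set "HASH GROUP BY" is faster (1.65 s versus 3.13 s), although this alone does not prove that "HASH GROUP BY" is always faster.
The gain comes from avoiding the sort: that is why "HASH GROUP BY" beats "SORT GROUP BY" here.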
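The post does not show how the larger data set was built. One way that would reproduce the counts above (2097152 = 2^21 rows per ID) is to double the original 9-row table 21 times. This is an assumption for illustration, not necessarily the method actually used.

```sql
-- Assumed data-loading step (not from the original post):
-- each pass re-inserts the table into itself, doubling the row count.
-- Starting from the 9 rows created earlier, 21 passes give 9 * 2^21 rows.
begin
  for i in 1 .. 21 loop
    insert into zkm select * from zkm;
    commit;  -- commit per pass to limit undo held by one transaction
  end loop;
end;
/

-- Sanity check: should return 18874368 if the table started with 9 rows.
select count(*) from zkm;
```

The later plans still show E-Rows of 9, which suggests optimizer statistics were not regathered after the load; the sketch leaves statistics untouched for the same reason.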
Two cases where "HASH GROUP BY" cannot be used
Case 1: the GROUP BY is followed by an ORDER BY on the grouping column.
For example:
17:35:32 ZKM@dev-app73/pdb(9)> select id,count(name) from zkm group by id order by id;

        ID COUNT(NAME)
---------- -----------
         1     2097152
         2     2097152
         3     2097152
         4     2097152
         5     2097152
         6     2097152
         7     2097152
         8     2097152
         9     2097152

9 rows selected.

Elapsed: 00:00:03.36
17:36:22 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  cns02rbymv6b6, child number 0
-------------------------------------
select id,count(name) from zkm group by id order by id

Plan hash value: 2238836816

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      9 |00:00:03.36 |   28731 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |      9 |      9 |00:00:03.36 |   28731 |  2048 |  2048 | 2048  (0)|
|   2 |   TABLE ACCESS FULL| ZKM  |      1 |      9 |     18M|00:00:00.49 |   28731 |       |       |          |
----------------------------------------------------------------------------------------------------------------

14 rows selected.

Elapsed: 00:00:00.06
Workaround: do the GROUP BY in a subquery (inline view) first, then sort with ORDER BY in the outer query. Add /*+ no_merge */ to the subquery to prevent view merging.
17:37:19 ZKM@dev-app73/pdb(9)> select * from (select /*+ no_merge */ id,count(name) from zkm group by id) order by id;

        ID COUNT(NAME)
---------- -----------
         1     2097152
         2     2097152
         3     2097152
         4     2097152
         5     2097152
         6     2097152
         7     2097152
         8     2097152
         9     2097152

9 rows selected.

Elapsed: 00:00:01.69
17:37:37 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  bxh00bg36g809, child number 0
-------------------------------------
select * from (select /*+ no_merge */ id,count(name) from zkm group by id) order by id

Plan hash value: 970191995

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      9 |00:00:01.69 |   28731 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |      9 |      9 |00:00:01.69 |   28731 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               |      |      1 |      9 |      9 |00:00:01.69 |   28731 |       |       |          |
|   3 |    HASH GROUP BY     |      |      1 |      9 |      9 |00:00:01.69 |   28731 |  1558K|  1558K| 1065K (0)|
|   4 |     TABLE ACCESS FULL| ZKM  |      1 |      9 |     18M|00:00:00.48 |   28731 |       |       |          |
------------------------------------------------------------------------------------------------------------------

17 rows selected.

Elapsed: 00:00:00.06
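The rewritten SQL is clearly faster (1.69 s versus 3.36 s).
A sort still takes place, but on a much smaller row set: A-Rows shows that SORT ORDER BY receives only the 9 aggregated rows, whereas before the rewrite SORT GROUP BY had to process all 18M rows.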
Case 2: DISTINCT is used in more than one aggregate function, on different columns.
For example: select id,count(distinct name),count(distinct id) from zkm group by id order by id;
09:01:40 ZKM@dev-app73/pdb(9)> select id,count(distinct name),count(distinct id) from zkm group by id order by id;

        ID COUNT(DISTINCTNAME) COUNT(DISTINCTID)
---------- ------------------- -----------------
         1                   1                 1
         2                   1                 1
         3                   1                 1
         4                   1                 1
         5                   1                 1
         6                   1                 1
         7                   1                 1
         8                   1                 1
         9                   1                 1

9 rows selected.

Elapsed: 00:00:14.67
09:01:56 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  7ht3gbdz1z5ts, child number 0
-------------------------------------
select id,count(distinct name),count(distinct id) from zkm group by id order by id

Plan hash value: 2238836816

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      9 |00:00:14.66 |   28731 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |      1 |      9 |00:00:14.66 |   28731 |  2048 |  2048 | 2048  (0)|
|   2 |   TABLE ACCESS FULL| ZKM  |      1 |      9 |     18M|00:00:01.45 |   28731 |       |       |          |
----------------------------------------------------------------------------------------------------------------

15 rows selected.

Elapsed: 00:00:00.06
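As the plan shows, using DISTINCT in more than one aggregate function prevents "HASH GROUP BY", because each DISTINCT needs its own de-duplication. Judging from this result, Oracle can sort the same row source on two or more different columns, de-duplicate, and then count, but it cannot apply hash-based de-duplication for several columns to the same row source at once to avoid the sort.
Removing one of the DISTINCT keywords avoids the problem, for example: select id,count(distinct name),count(id) from zkm group by id order by id;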
09:13:56 ZKM@dev-app73/pdb(9)> select id,count(distinct name),count(id) from zkm group by id order by id;

        ID COUNT(DISTINCTNAME)  COUNT(ID)
---------- ------------------- ----------
         1                   1    2097152
         2                   1    2097152
         3                   1    2097152
         4                   1    2097152
         5                   1    2097152
         6                   1    2097152
         7                   1    2097152
         8                   1    2097152
         9                   1    2097152

9 rows selected.

Elapsed: 00:00:02.08
09:14:02 ZKM@dev-app73/pdb(9)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  9t4u0dtgn1q0q, child number 0
-------------------------------------
select id,count(distinct name),count(id) from zkm group by id order by id

Plan hash value: 1511739550

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |      1 |        |      9 |00:00:02.08 |   28731 |       |       |          |
|   1 |  SORT GROUP BY       |          |      1 |      9 |      9 |00:00:02.08 |   28731 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               | VW_DAG_0 |      1 |      9 |      9 |00:00:02.08 |   28731 |       |       |          |
|   3 |    HASH GROUP BY     |          |      1 |      9 |      9 |00:00:02.08 |   28731 |  1452K|  1452K| 1192K (0)|
|   4 |     TABLE ACCESS FULL| ZKM      |      1 |      9 |     18M|00:00:00.44 |   28731 |       |       |          |
----------------------------------------------------------------------------------------------------------------------

17 rows selected.

Elapsed: 00:00:00.07
Workaround: none found so far.
References
GROUP BY Clause Does Not Guarantee a Sort Without ORDER BY Clause in 10g and Above (Doc ID 345048.1)
