The effect of the DISTINCT keyword on execution plans

Reposted article, dated 2019-08-27 15:57, tagged oracle.

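I. Preface

I recently came across this statement: "For count(distinct column), if the column is indexed and either has a NOT NULL constraint or the WHERE clause uses IS NOT NULL, the optimizer chooses an index fast full scan. In all other cases it chooses a full table scan." I did not understand the reasoning behind it, so I ran the following experiment.

II. Preparation

1. Create table t1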

SQL> create table t1 as select * from dba_objects;
SQL> insert into t1 select * from t1;
SQL> insert into t1 select * from t1;
SQL> commit;

2. Set object_name to NULL in a small number of rows

SQL> update t1 set object_name = null where owner = 'SCOTT';

3. Create an ordinary (B-tree) index on object_name

SQL> create index idx_t1_name on t1(object_name);

4. Gather statistics for t1 and its index

SQL> begin
       dbms_stats.gather_table_stats(ownname => 'SCOTT',
                                     tabname => 'T1',
                                     estimate_percent => 100,
                                     cascade => true,
                                     no_invalidate => false,
                                     degree => 4);
     end;
     /

5. Count the total rows in t1 and the non-null object_name values

SQL> select count(*), count(object_name), count(distinct object_name) from t1;

  COUNT(*) COUNT(OBJECT_NAME) COUNT(DISTINCTOBJECT_NAME)
---------- ------------------ --------------------------
     54068              54060                      10472

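Preparation is now complete. t1 has 54068 rows, while count(object_name) returns 54060. The second figure is 8 lower than the total because count(column) does not count rows where that column is NULL.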
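The gap between the two counts can be checked directly. The sketch below runs against the same t1 and assumes nothing beyond the table built above; the alias null_rows is only illustrative. From the figures above, both queries should report 54068 - 54060 = 8.

```sql
-- Rows where object_name is NULL, derived from the two counts.
select count(*) - count(object_name) as null_rows from t1;

-- The same figure obtained by filtering for NULL explicitly.
select count(*) as null_rows from t1 where object_name is null;
```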

III. Examining the execution plans

Run the following four SQL statements in turn and observe the execution plan of each:
a. select count(object_name) from t1;
b. select count(object_name) from t1 where object_name is not null;
c. select count(distinct object_name) from t1 where object_name is not null;
d. select count(distinct object_name) from t1;

1. Run sql(a)

SQL> set autot on
SQL> select count(object_name) from t1;

COUNT(OBJECT_NAME)
------------------
             54060

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |    19 |    63   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |    19 |            |          |
|   2 |   INDEX FAST FULL SCAN| IDX_T1_NAME | 54068 |  1003K|    63   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

2. Run sql(b)

SQL> select count(object_name) from t1 where object_name is not null;

COUNT(OBJECT_NAME)
------------------
             54060

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |    19 |    63   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |    19 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IDX_T1_NAME | 54060 |  1003K|    63   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

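sql(a) and sql(b) return the same result and share the same plan shape. The identical result is easy to understand: count(object_name) never counts rows where object_name is NULL, so adding where object_name is not null makes no difference to the result.
The identical plan is also easy to understand. Both use an index fast full scan: all I want is the number of non-null object_name values, and a single-column B-tree index does not store entries for NULL keys, so counting the entries in the index on object_name is enough.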
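One way to see that NULL keys are absent from the index is to compare the statistics gathered earlier. This is a sketch that assumes the statistics from step 4 of the preparation are still current and that the session is connected as the owner of t1; user_tables and user_indexes are standard dictionary views, and the alias stat_source is only illustrative. For a single-column B-tree index, the index NUM_ROWS would be expected to match count(object_name) rather than count(*).

```sql
-- Compare the row count recorded for the table with the entry count recorded for the index.
select 'table' as stat_source, num_rows
  from user_tables
 where table_name = 'T1'
union all
select 'index' as stat_source, num_rows
  from user_indexes
 where index_name = 'IDX_T1_NAME';
```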

3. Run sql(c)

SQL> select count(distinct object_name) from t1 where object_name is not null;

COUNT(DISTINCTOBJECT_NAME)
--------------------------
                     10472

-----------------------------------------------------------------------------------------------
| Id  | Operation               | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |             |     1 |    66 |       |   220   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE         |             |     1 |    66 |       |            |          |
|   2 |   VIEW                  | VW_DAG_0    | 10472 |   674K|       |   220   (2)| 00:00:03 |
|   3 |    HASH GROUP BY        |             | 10472 |   194K|  1496K|   220   (2)| 00:00:03 |
|*  4 |     INDEX FAST FULL SCAN| IDX_T1_NAME | 54060 |  1003K|       |    63   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

sql(c) adds the distinct keyword to sql(b), and the plan still uses an index fast full scan.
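The VIEW step named VW_DAG_0, with a HASH GROUP BY beneath it, suggests that the optimizer first reduces the rows to one per distinct value and then counts them. A hand-written query with roughly the same shape is sketched below. It is offered only as an illustration of the plan, not as the optimizer's actual internal rewrite, and the alias v is hypothetical.

```sql
-- Inner query: one row per distinct non-null object_name (the HASH GROUP BY step).
-- Outer query: count those rows (the SORT AGGREGATE step).
select count(object_name)
  from (select object_name
          from t1
         where object_name is not null
         group by object_name) v;
```

This returns the same value as count(distinct object_name), because grouping leaves exactly one row per distinct name.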

4. Run sql(d)

SQL> select count(distinct object_name) from t1;

COUNT(DISTINCTOBJECT_NAME)
--------------------------
                     10472

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     1 |    66 |       |   349   (1)| 00:00:05 |
|   1 |  SORT AGGREGATE      |          |     1 |    66 |       |            |          |
|   2 |   VIEW               | VW_DAG_0 | 10472 |   674K|       |   349   (1)| 00:00:05 |
|   3 |    HASH GROUP BY     |          | 10472 |   194K|  1496K|   349   (1)| 00:00:05 |
|   4 |     TABLE ACCESS FULL| T1       | 54068 |  1003K|       |   192   (0)| 00:00:03 |
-----------------------------------------------------------------------------------------

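sql(d) is sql(c) with where object_name is not null removed. The result is unchanged, but the plan switches from an index fast full scan to a full table scan. In principle sql(d) could still be answered from an index fast full scan, since count(distinct object_name) ignores NULLs just as count(object_name) does, yet the optimizer chose the full table scan with the higher cost (349 versus 220). Why?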
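One way to probe whether the index could serve sql(d) at all is to request it with a hint and look at the resulting plan. This is a sketch only: INDEX_FFS is a documented Oracle hint and dbms_xplan.display is a standard way to show a plan, but the outcome was not tested in the original experiment, so none is shown here.

```sql
-- Ask for an index fast full scan on idx_t1_name, then display the plan chosen.
explain plan for
select /*+ index_ffs(t1 idx_t1_name) */ count(distinct object_name)
  from t1;

select * from table(dbms_xplan.display);
```

If the plan still shows TABLE ACCESS FULL, that would suggest the optimizer does not treat the index as a usable access path for this form of the statement. If it switches to INDEX FAST FULL SCAN, the original choice would look more like a costing decision.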

IV. The question

a. select count(object_name) from t1;
b. select count(object_name) from t1 where object_name is not null;
c. select count(distinct object_name) from t1 where object_name is not null;
d. select count(distinct object_name) from t1;

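sql(a) and sql(b) both use INDEX FAST FULL SCAN, with SORT AGGREGATE directly above it. In other words, they scan the index and count its entries.
sql(c) also uses INDEX FAST FULL SCAN, but above it comes HASH GROUP BY, then VIEW, and only then SORT AGGREGATE.
sql(d) uses a full table scan, with HASH GROUP BY above it, then VIEW, and finally SORT AGGREGATE.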

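For count(object_name), Oracle knows that NULLs do not affect the result, so it can use the index with or without the WHERE condition.
For count(distinct object_name), Oracle seems to lose track of that. It appears to check the statement for a filter first: if the NULLs have been filtered out, it happily uses the index; if not, it falls back to a full table scan.
Why is that?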
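The statement quoted in the preface also mentions a NOT NULL constraint, which the experiment above does not cover. A possible follow-up, sketched here with a hypothetical copy table t2 and index idx_t2_name, is to build a copy without the NULL rows, declare the column NOT NULL, and check the plan for the unfiltered query. The outcome is not shown because it was not part of the original test.

```sql
-- Hypothetical copy of t1 without the NULL rows, so the constraint can be added.
create table t2 as select * from t1 where object_name is not null;

alter table t2 modify (object_name not null);
create index idx_t2_name on t2(object_name);

-- Gather statistics for the copy and its index, as was done for t1.
begin
  dbms_stats.gather_table_stats(ownname => user,
                                tabname => 'T2',
                                estimate_percent => 100,
                                cascade => true);
end;
/

-- Same statement as sql(d), with no WHERE clause, against the constrained column.
explain plan for
select count(distinct object_name) from t2;

select * from table(dbms_xplan.display);
```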
