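I. Preface
I recently came across this statement: "For count(distinct column), if the column is indexed and either has a NOT NULL constraint or is filtered with IS NOT NULL in the WHERE clause, the optimizer chooses an index fast full scan; in all other cases it chooses a full table scan." I didn't understand the reasoning behind it, so I ran the following experiment.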
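The quoted rule can be restated as a small decision function, which makes its three conditions explicit. This is only a sketch of the claim as worded. The function name `claimed_access_path` and its parameters are hypothetical, and the function does not model how the Oracle optimizer actually costs plans.

```python
def claimed_access_path(column_indexed: bool,
                        not_null_constraint: bool,
                        is_not_null_predicate: bool) -> str:
    """Restate the quoted rule for count(distinct column).

    Hypothetical helper: it encodes the claim word for word,
    not Oracle's real cost-based decision.
    """
    if column_indexed and (not_null_constraint or is_not_null_predicate):
        return "INDEX FAST FULL SCAN"
    return "TABLE ACCESS FULL"


# SQL (c) later on: index present, no constraint, IS NOT NULL in WHERE
print(claimed_access_path(True, False, True))   # INDEX FAST FULL SCAN
# SQL (d) later on: index present, no constraint, no predicate
print(claimed_access_path(True, False, False))  # TABLE ACCESS FULL
```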
II. Preparation
1. Prepare table t1
SQL> create table t1 as select * from dba_objects;
SQL> insert into t1 select * from t1;
SQL> insert into t1 select * from t1;
SQL> commit;
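Each `insert into t1 select * from t1` doubles the table, so the two inserts leave four copies of every row of dba_objects. A quick check of the arithmetic follows. It assumes that dba_objects held 13517 rows at the time, a figure inferred from the 54068 total reported later (54068 / 4) rather than stated by the author. The helper name is hypothetical.

```python
def rows_after_self_inserts(base_rows: int, inserts: int) -> int:
    """Each 'insert into t1 select * from t1' doubles the row count."""
    rows = base_rows
    for _ in range(inserts):
        rows += rows  # the insert copies every existing row once
    return rows


# 13517 is inferred from the reported total (54068 / 4), not measured
base = 54068 // 4
print(base, rows_after_self_inserts(base, 2))  # 13517 54068
```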
2. Set object_name to null on a small number of rows
SQL> update t1 set object_name = null where owner = 'SCOTT';
3. Create a normal (B-tree) index on object_name
SQL> create index idx_t1_name on t1(object_name);
4. Gather statistics on t1 and its index
SQL> begin
  dbms_stats.gather_table_stats(ownname          => 'SCOTT',
                                tabname          => 'T1',
                                estimate_percent => 100,
                                cascade          => true,
                                no_invalidate    => false,
                                degree           => 4);
end;
/
5. Count the total rows in t1, the non-null object_name values, and the distinct object_name values
SQL> select count(*), count(object_name), count(distinct object_name) from t1;
  COUNT(*) COUNT(OBJECT_NAME) COUNT(DISTINCTOBJECT_NAME)
---------- ------------------ --------------------------
     54068              54060                      10472
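Preparation is now complete. t1 has 54068 rows, but count(object_name) returns only 54060. The difference arises because count(column) does not count rows where that column is null.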
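The difference between the three aggregates can be reproduced on a toy column in Python, using None to stand in for SQL NULL. This models the SQL semantics only. The helper names and sample values are made up for illustration.

```python
from typing import Optional, Sequence


def count_star(col: Sequence[Optional[str]]) -> int:
    """count(*): every row counts, NULL or not."""
    return len(col)


def count_col(col: Sequence[Optional[str]]) -> int:
    """count(col): NULLs are skipped."""
    return sum(1 for v in col if v is not None)


def count_distinct(col: Sequence[Optional[str]]) -> int:
    """count(distinct col): distinct non-NULL values only."""
    return len({v for v in col if v is not None})


# Hypothetical sample: 2 NULLs, 4 non-NULL values, 2 distinct values
sample = ["EMP", "DEPT", None, "EMP", None, "DEPT"]
print(count_star(sample), count_col(sample), count_distinct(sample))  # 6 4 2
```

Applied to the figures for t1, count(*) - count(object_name) = 54068 - 54060 = 8 rows have a null object_name.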
III. Examining the execution plans
Run each of the following four SQL statements and observe its execution plan:
a. select count(object_name) from t1;
b. select count(object_name) from t1 where object_name is not null;
c. select count(distinct object_name) from t1 where object_name is not null;
d. select count(distinct object_name) from t1;
1. Run SQL (a)
SQL> set autot on
SQL> select count(object_name) from t1;
COUNT(OBJECT_NAME)
------------------
             54060

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |    19 |    63   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |    19 |            |          |
|   2 |   INDEX FAST FULL SCAN| IDX_T1_NAME | 54068 |  1003K|    63   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
2. Run SQL (b)
SQL> select count(object_name) from t1 where object_name is not null;
COUNT(OBJECT_NAME)
------------------
             54060

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |    19 |    63   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |    19 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IDX_T1_NAME | 54060 |  1003K|    63   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
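SQL (a) and SQL (b) return the same result and use the same execution plan. The identical results are easy to understand. count(object_name) never counts rows where object_name is null, so adding where object_name is not null has no effect on the result.
The identical plans are also easy to understand: both use an index fast full scan. All I want is the number of non-null object_name values, and the nulls are irrelevant. A single-column B-tree index does not store entries for rows whose indexed value is null, so counting the entries in the index on object_name is enough.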
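To make this concrete, here is a toy model of a single-column B-tree index as a sorted list of (key, rowid) entries that skips null keys. It is an illustration under that simplifying assumption, not Oracle's on-disk structure. `build_index` and the sample rows are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[str]]  # (rowid, object_name); None stands for NULL


def build_index(rows: List[Row]) -> List[Tuple[str, int]]:
    """Sorted (key, rowid) entries; rows with a NULL key get no entry."""
    return sorted((name, rowid) for rowid, name in rows if name is not None)


def count_col_from_table(rows: List[Row]) -> int:
    """count(object_name) computed by scanning the whole table."""
    return sum(1 for _, name in rows if name is not None)


rows = [(1, "EMP"), (2, None), (3, "DEPT"), (4, "EMP"), (5, None)]
idx = build_index(rows)
# Counting index entries gives the same answer as count(col) on the table
print(len(idx), count_col_from_table(rows))  # 3 3
```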
3. Run SQL (c)
SQL> select count(distinct object_name) from t1 where object_name is not null;
COUNT(DISTINCTOBJECT_NAME)
--------------------------
                     10472

-----------------------------------------------------------------------------------------------
| Id  | Operation               | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |             |     1 |    66 |       |   220   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE         |             |     1 |    66 |       |            |          |
|   2 |   VIEW                  | VW_DAG_0    | 10472 |   674K|       |   220   (2)| 00:00:03 |
|   3 |    HASH GROUP BY        |             | 10472 |   194K|  1496K|   220   (2)| 00:00:03 |
|*  4 |     INDEX FAST FULL SCAN| IDX_T1_NAME | 54060 |  1003K|       |    63   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
SQL (c) is SQL (b) with DISTINCT added, and its plan still uses an index fast full scan.
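The same toy model shows why SQL (c) can be answered from the index alone. Deduplicating the index keys, which is the role HASH GROUP BY plays in the plan, yields exactly the distinct non-null values. This is again a sketch with hypothetical names and data, written to be self-contained.

```python
from typing import List, Optional, Tuple

# Hypothetical sample rows: (rowid, object_name); None stands for NULL
rows: List[Tuple[int, Optional[str]]] = [
    (1, "EMP"), (2, None), (3, "DEPT"), (4, "EMP"), (5, None),
]

# Index entries: NULL keys are never stored
index_keys = [name for _, name in rows if name is not None]

# Deduplicate the index keys (the HASH GROUP BY step), then count the groups
groups_from_index = set(index_keys)

# count(distinct object_name) ... where object_name is not null, on the table
distinct_from_table = {name for _, name in rows if name is not None}

print(len(groups_from_index), len(distinct_from_table))  # 2 2
```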
4. Run SQL (d)
SQL> select count(distinct object_name) from t1;
COUNT(DISTINCTOBJECT_NAME)
--------------------------
                     10472

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     1 |    66 |       |   349   (1)| 00:00:05 |
|   1 |  SORT AGGREGATE      |          |     1 |    66 |       |            |          |
|   2 |   VIEW               | VW_DAG_0 | 10472 |   674K|       |   349   (1)| 00:00:05 |
|   3 |    HASH GROUP BY     |          | 10472 |   194K|  1496K|   349   (1)| 00:00:05 |
|   4 |     TABLE ACCESS FULL| T1       | 54068 |  1003K|       |   192   (0)| 00:00:03 |
-----------------------------------------------------------------------------------------
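SQL (d) is SQL (c) with where object_name is not null removed. The result is unchanged, but the plan switches from an index fast full scan to a full table scan. In principle, SQL (d) could still get its answer from an index fast full scan. Yet the optimizer picks the full table scan, which has a higher cost (349 versus 220). Why is that?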
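Reading the Cost column of the last two plans side by side shows that the HASH GROUP BY step adds the same amount in both. The whole difference therefore comes from the access path. The figures below are copied from the plans, and the subtraction is plain arithmetic on them.

```python
# Cost figures copied from the execution plans of SQL (c) and SQL (d)
plan_c = {"scan": 63, "total": 220}    # INDEX FAST FULL SCAN
plan_d = {"scan": 192, "total": 349}   # TABLE ACCESS FULL

# Cost added on top of the scan by HASH GROUP BY / VIEW / SORT AGGREGATE
group_by_c = plan_c["total"] - plan_c["scan"]
group_by_d = plan_d["total"] - plan_d["scan"]

print(group_by_c, group_by_d)               # 157 157
print(plan_d["total"] - plan_c["total"])    # 129
print(plan_d["scan"] - plan_c["scan"])      # 129
```

By the optimizer's own estimates, the plan it chose for SQL (d) is 129 units more expensive than the one it used for SQL (c), and that is exactly what makes the choice puzzling.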
IV. The question
a. select count(object_name) from t1;
b. select count(object_name) from t1 where object_name is not null;
c. select count(distinct object_name) from t1 where object_name is not null;
d. select count(distinct object_name) from t1;
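SQL (a) and SQL (b) both use INDEX FAST FULL SCAN with SORT AGGREGATE directly above it. In other words, the database scans the index and counts its entries.
SQL (c) also uses INDEX FAST FULL SCAN, but above it sit HASH GROUP BY, then VIEW, and finally SORT AGGREGATE.
SQL (d) uses a full table scan, with the same HASH GROUP BY, VIEW and SORT AGGREGATE operations above it.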
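One SQL detail may help when reading the VIEW and HASH GROUP BY lines. GROUP BY places all NULL keys in a single group, whereas count(distinct ...) ignores NULLs. The sketch below assumes, purely for illustration, that this view behaves like `select object_name from t1 group by object_name`, with the outer SORT AGGREGATE counting its non-null keys. The plans themselves only show a view named VW_DAG_0 above a HASH GROUP BY. Under that assumption, grouping the table rows yields one extra (NULL) group compared with grouping the index entries, yet the final count is the same either way. The rows and helper names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical sample rows: (rowid, object_name); None stands for NULL
rows: List[Tuple[int, Optional[str]]] = [
    (1, "EMP"), (2, None), (3, "DEPT"), (4, "EMP"), (5, None),
]


def group_by(keys: List[Optional[str]]) -> Set[Optional[str]]:
    """GROUP BY semantics: all NULL keys collapse into a single group (None)."""
    return set(keys)


def count_non_null(keys: Set[Optional[str]]) -> int:
    """Outer count(...) over the view: the NULL group key is not counted."""
    return sum(1 for k in keys if k is not None)


# Assumed inner view fed by a full table scan: the NULL group is present
view_from_table = group_by([name for _, name in rows])
# The same view fed by the index: NULL keys never reach it
view_from_index = group_by([name for _, name in rows if name is not None])

print(len(view_from_table), len(view_from_index))                        # 3 2
print(count_non_null(view_from_table), count_non_null(view_from_index))  # 2 2
```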
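For count(object_name), Oracle knows that nulls have no effect on the result, so it can use the index whether or not the WHERE condition is present.
For count(distinct object_name), Oracle apparently gets confused. It first checks whether the SQL has a filter condition. If the nulls have been filtered out, it happily uses the index. If not, it dutifully falls back to a full table scan.
Why is that?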