Oracle Index Clustering Factor


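1. About this article

    While running a test today I found that a column had an index, yet the execution plan would not use it. After searching online I learned that the cause was an excessively high index clustering factor. This article is part repost, part hands-on reproduction.

2. Official documentation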

    The index clustering factor measures row order in relation to an indexed value such as employee last name. The more order that exists in row storage for this value, the lower the clustering factor.

    ----The more ordered the row storage, the lower the clustering factor.

    The clustering factor is useful as a rough measure of the number of I/Os required to read an entire table by means of an index:

     (1) If the clustering factor is high, then Oracle Database performs a relatively high number of I/Os during a large index range scan. The index entries point to random table blocks, so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.

     ----When the clustering factor is high, the index entries (rowids) point to table blocks more or less at random. During a large index range scan, the database may then have to read the same blocks again and again to fetch the rows those rowids point to.

     (2) If the clustering factor is low, then Oracle Database performs a relatively low number of I/Os during a large index range scan. The index keys in a range tend to point to the same data block, so the database does not have to read and reread the same blocks over and over.

     ----When the clustering factor is low, the index keys (rowids) in a range tend to point to rows stored in the same data block. The rows can then be read from that one block, which reduces the number of repeated block reads.

      The clustering factor is relevant for index scans because it can show:

           (1) Whether the database will use an index for large range scans;

           (2) The degree of table organization in relation to the index key;

           (3) Whether you should consider using an index-organized table, partitioning, or table cluster if rows must be ordered by the index key.

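3. The index clustering factor explained

    Put simply, the index clustering factor approximates the number of table data block visits needed to read a table through an index. It therefore reflects the I/O impact of index access, and it indicates how closely the physical storage order of the rows follows the index key order.

    (1) The more ordered the storage, that is, the more often adjacent key values are stored in the same block, the lower the clustering factor;

    (2) If the storage is not well ordered, that is, key values are scattered randomly across blocks, reading the keys in order may require visiting the same blocks again and again, which increases I/O.

    The clustering factor is calculated as follows:

     (1) Scan the index in key order (a large index range scan);

     (2) Compare the rowid of each entry with the rowid of the previous entry; if the two rowids do not belong to the same data block, increase the clustering factor by 1;

     (3) When the whole index has been scanned, the result is the clustering factor of that index.

            If the clustering factor is close to the number of blocks in the table, the table is stored in the order of the indexed column.

            If the clustering factor is close to the number of rows, the table is not stored in the order of the indexed column.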
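The counting procedure above can be sketched in a few lines of Python. This is an illustrative model, not Oracle's internal code: the function name `clustering_factor` and the idea of representing each index entry only by the table block number taken from its rowid are assumptions made for the example.

```python
def clustering_factor(block_ids_in_index_order):
    """Count how often consecutive index entries point to a different table block.

    block_ids_in_index_order: the table block number of each index entry,
    listed in index key order (a simplified stand-in for the rowid).
    """
    factor = 0
    previous_block = None
    for block in block_ids_in_index_order:
        # The first entry always counts; after that, only block changes count.
        if block != previous_block:
            factor += 1
        previous_block = block
    return factor


# Six rows stored in three blocks, listed in index key order.
well_clustered = [1, 1, 2, 2, 3, 3]
# The same six rows, but neighbouring keys live in different blocks.
scattered = [1, 2, 3, 1, 2, 3]

print(clustering_factor(well_clustered))  # 3, the number of blocks
print(clustering_factor(scattered))       # 6, the number of rows
```

The two inputs reproduce the two extremes described above: a value equal to the block count for ordered storage, and a value equal to the row count for scattered storage.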

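            This value is very useful when costing index access. The clustering factor multiplied by the selectivity estimates the number of table blocks visited through the index, which is usually the dominant part of the cost of an index access path.

            If this statistic does not reflect the real state of the index, the optimizer may choose the wrong execution plan. In addition, if most access to a table goes through index scans on one particular index, reorganizing the table data in the order of that index's column can improve access performance for the table.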
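A commonly cited approximation of the I/O cost of an index range scan followed by table access is blevel + ceil(leaf_blocks × selectivity) + ceil(clustering_factor × selectivity). The sketch below evaluates that approximation. The function name and all statistics passed to it are hypothetical, and the real optimizer also accounts for CPU cost and other adjustments, so treat this as an illustration of why the clustering factor dominates rather than as the exact cost model.

```python
from fractions import Fraction
from math import ceil


def index_range_scan_cost(blevel, leaf_blocks, clustering_factor, selectivity):
    """Approximate I/O cost of an index range scan plus table access by rowid."""
    # Cost of descending the B-tree and walking the matching leaf blocks.
    index_part = blevel + ceil(leaf_blocks * selectivity)
    # Estimated table block visits: this is where the clustering factor enters.
    table_part = ceil(clustering_factor * selectivity)
    return index_part + table_part


# Hypothetical statistics: same index, same 1% selectivity,
# only the clustering factor differs. Fraction keeps the arithmetic exact.
selectivity = Fraction(1, 100)
scattered_cost = index_range_scan_cost(2, 1000, 100_000, selectivity)
clustered_cost = index_range_scan_cost(2, 1000, 1_500, selectivity)

print(scattered_cost)  # 1012
print(clustered_cost)  # 27
```

With identical index statistics, the table-access term alone moves the estimate from 27 to 1012, which is the kind of gap that can push the optimizer toward a full table scan.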

4. Test

  4.1 Reproducing the problem:

  ----Check the database version----
SQL> select * from v$version where rownum=1;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

  ----Create a test table named jack----
SQL> create table jack as select * from dba_objects where 1=2;

Table created.

  ----Insert the data into jack in no useful order----
SQL> begin
  2    for i in 1..10 loop
  3      insert /*+ append */ into jack select * from dba_objects order by i;
  4      commit;
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select count(*) from jack;

  COUNT(*)
----------
    725460

  ----Check the size of the table----
SQL> set wrap off
SQL> col owner for a10;
SQL> col segment_name for a15;
SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';

SEGMENT_NAME        BLOCKS    EXTENTS size
--------------- ---------- ---------- --------
JACK                 11264         82 88M

  ----Create an index on object_id----
SQL> create index jack_ind on jack(object_id);

Index created.

  ----Check the size of the index----
SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';

SEGMENT_NAME    SEGMENT_TYPE           BLOCKS    EXTENTS size
--------------- ------------------ ---------- ---------- ---------
JACK_IND        INDEX                    1664         28 13M

  ----Before gathering any statistics, check the index clustering factor----
SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
--------------- ----------------- ----------
JACK_IND                   725460     725460

  ----Gather statistics----
SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);

PL/SQL procedure successfully completed.

  ----Check the index clustering factor again----
SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
--------------- ----------------- ----------
JACK_IND                   725460     725460

  ----The clustering factor is the same before and after gathering statistics, which shows that the index statistics, including the true row count, are already collected when the index is created. The clustering factor also equals num_rows, which shows that the table rows are not stored in index key order. This is expected: each object_id was loaded ten times in ten separate passes, so consecutive index entries almost always point to different blocks.
  ----Query a single value and check the execution plan----
SQL> explain plan for select * from jack where object_id=1501;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2860868395

--------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Ti
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    10 |   970 |    13   (0)| 00
|   1 |  TABLE ACCESS BY INDEX ROWID| JACK     |    10 |   970 |    13   (0)| 00
|*  2 |   INDEX RANGE SCAN          | JACK_IND |    10 |       |     3   (0)| 00
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   2 - access("OBJECT_ID"=1501)

14 rows selected.

  ----The index is used here, with a cost of 13.
SQL> alter system flush buffer_cache;

System altered.

SQL> set autotrace traceonly;

  ----Execution plan for a range query----
SQL> select * from jack where object_id>1000 and object_id<2000;

9880 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 949574992

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9657 |   914K|  1824   (1)| 00:00:22 |
|*  1 |  TABLE ACCESS FULL| JACK |  9657 |   914K|  1824   (1)| 00:00:22 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"<2000 AND "OBJECT_ID">1000)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      10993  consistent gets
      10340  physical reads
          0  redo size
     471945  bytes sent via SQL*Net to client
       7657  bytes received via SQL*Net from client
        660  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9880  rows processed

  ----Note that object_id is indexed, yet the index is not used here; a full table scan is chosen instead.

SQL> alter system flush buffer_cache;

System altered.

  ----Force the index with a hint and check the execution plan----
SQL> select /*+ index(jack jack_ind) */ * from jack where object_id>1000 and object_id<2000;

9880 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2860868395

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |  9657 |   914K|  9683   (1)| 00:01:57 |
|   1 |  TABLE ACCESS BY INDEX ROWID| JACK     |  9657 |   914K|  9683   (1)| 00:01:57 |
|*  2 |   INDEX RANGE SCAN          | JACK_IND |  9657 |       |    24   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      10561  consistent gets
        164  physical reads
          0  redo size
     988947  bytes sent via SQL*Net to client
       7657  bytes received via SQL*Net from client
        660  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9880  rows processed

----With the index forced, the plan uses an index range scan, but the cost rises to 9683, compared with 1824 for the full table scan.
----Compare the physical reads of the two queries as well: the full table scan did far more physical reads (10340) than the index plan (164), yet Oracle did not choose the index.
----The reason is that Oracle estimates the cost of the index plan to be several times higher than that of the full table scan, and the CBO chooses the execution plan based on cost.
----It follows that Oracle computes the cost of index access from the clustering factor. In this experiment the clustering factor is very high because the data is stored out of order, so Oracle estimates that the index plan costs more than the full table scan.

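   4.2 Fixing the problem:

  ----The analysis above shows that the clustering factor must be lowered to fix the problem, and lowering it requires reloading the table so that its rows are physically stored in index key order.----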
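Why reloading in key order helps can be illustrated with a small, simplified model of the test. It assumes, hypothetically, 1000 distinct keys, ten load passes that each append the keys in the same order, and 70 rows per block; none of these figures come from the real JACK table, and real block layout is more complicated than this.

```python
def clustering_factor(block_ids_in_index_order):
    """Count block changes between consecutive index entries (first entry counts)."""
    changes = 0
    previous = None
    for block in block_ids_in_index_order:
        if block != previous:
            changes += 1
        previous = block
    return changes


def simulate(load_order, rows_per_block):
    """Clustering factor of an index on the key, for a heap table whose
    rows were appended in `load_order`."""
    # The row at append position p lands in block p // rows_per_block.
    entries = [(key, pos // rows_per_block) for pos, key in enumerate(load_order)]
    # An index lists its entries by key, then by physical location.
    entries.sort()
    return clustering_factor(block for _key, block in entries)


KEYS = 1000          # hypothetical number of distinct key values
COPIES = 10          # ten load passes, as in the test loop
ROWS_PER_BLOCK = 70  # hypothetical rows per table block

# Original load: every pass appends one full copy of all keys.
ten_passes = [key for _ in range(COPIES) for key in range(KEYS)]
# Reorganized load: the same rows, appended in key order.
reloaded_in_key_order = sorted(ten_passes)

print(simulate(ten_passes, ROWS_PER_BLOCK))             # 10000, one per row
print(simulate(reloaded_in_key_order, ROWS_PER_BLOCK))  # 143, one per block
```

In this model the ten copies of each key sit in ten distant blocks, so every index entry forces a block change; after the ordered reload the index walks the blocks once from start to end.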
----Rebuild the jack table----
SQL> create table echo as select * from jack where 1=0;

Table created.

SQL> insert /*+ append */ into echo select * from jack order by object_id;

725460 rows created.

SQL> commit;

Commit complete.

SQL> truncate table jack;

Table truncated.

SQL> insert /*+ append */ into jack select * from echo;

725460 rows created.

SQL> commit;

Commit complete.

  ----Check the table and index information----
SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';

SEGMENT_NAME        BLOCKS    EXTENTS size
--------------- ---------- ---------- -----------
JACK                 11264         82 88M

SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';

SEGMENT_NAME    SEGMENT_TYPE           BLOCKS    EXTENTS size
--------------- ------------------ ---------- ---------- -------------
JACK_IND        INDEX                    1536         27 12M

SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
--------------- ----------------- ----------
JACK_IND                   725460     725460

  ----Rebuild the index----
SQL> alter index jack_ind rebuild;

Index altered.

  ----Check the clustering factor----
SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
--------------- ----------------- ----------
JACK_IND                    10327     725460

  ----Note that the clustering factor is now 10327. Next, gather the table statistics and compare this value with the number of blocks in the table.

SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);

PL/SQL procedure successfully completed.

SQL> select blocks from dba_tables where table_name='JACK';

    BLOCKS
----------
     10474

  ----The jack table actually uses 10474 blocks and the clustering factor is 10327. The two values are close, which shows that rows with adjacent keys are now stored in the same blocks.

SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME                     CLUSTERING_FACTOR   NUM_ROWS
------------------------------ ----------------- ----------
JACK_IND                                   10327     725460

SQL> alter system flush buffer_cache;

System altered.

SQL> set autotrace traceonly;

  ----Check the execution plan of the earlier SQL again----
SQL> select * from jack where object_id>1000 and object_id<2000;

9880 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2860868395

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |  9657 |   914K|   162   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| JACK     |  9657 |   914K|   162   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | JACK_IND |  9657 |       |    24   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1457  consistent gets
        151  physical reads
          0  redo size
     988947  bytes sent via SQL*Net to client
       7657  bytes received via SQL*Net from client
        660  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9880  rows processed

  ----Note that the cost has dropped to 162 and the index is now chosen without a hint. Consistent gets also fell from 10561 to 1457, a very clear improvement.
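As a rough cross-check of the plans in sections 4.1 and 4.2, the table-access part of the index plan cost can be approximated as ceil(clustering_factor × estimated rows / num_rows) and added to the INDEX RANGE SCAN cost of 24 shown in both plans. This is a sketch based on the commonly cited I/O approximation, not the optimizer's exact model. The function name is made up for illustration; every figure is taken from the transcripts above.

```python
NUM_ROWS = 725460      # rows in JACK, from user_indexes.num_rows
ESTIMATED_ROWS = 9657  # optimizer's row estimate for the range predicate
INDEX_SCAN_COST = 24   # cost of the INDEX RANGE SCAN step in both index plans
FULL_SCAN_COST = 1824  # cost of the TABLE ACCESS FULL plan


def estimated_index_plan_cost(clustering_factor):
    """Index scan cost plus ceil(clustering_factor * selectivity)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    table_part = -(-clustering_factor * ESTIMATED_ROWS // NUM_ROWS)
    return INDEX_SCAN_COST + table_part


before = estimated_index_plan_cost(725460)  # clustering factor before the reload
after = estimated_index_plan_cost(10327)    # clustering factor after the reload

print(before, before > FULL_SCAN_COST)  # 9681 True: full scan looks cheaper
print(after, after < FULL_SCAN_COST)    # 162 True: index plan looks cheaper
```

The estimate for the reorganized table matches the plan's cost of 162, and the estimate for the original table (9681) is close to the plan's 9683; the small gap is presumably the CPU component that this approximation ignores.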

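 5. Summary

    As the explanation and the test above show, the clustering factor is an important indicator of index health: the lower, the better. It affects whether the CBO chooses the right execution plan. Keep one thing in mind, though: the clustering factor tends to deteriorate over time as the table changes.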

   

 

 

