Oracle Index Clustering Factor

Reposted article, 2013-01-24 14:49, Performance Tuning II

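1. Introduction

While running some tests today, I noticed that a column had an index, yet the execution plan would not use it. After searching online, I found that the cause was the index's high clustering factor. This article is partly a repost and partly my own reproduction of the problem.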

2. Official Documentation

The index clustering factor measures row order in relation to an indexed value such as employee last name. The more order that exists in row storage for this value, the lower the clustering factor.

----The more ordered the row storage, the lower the clustering factor.

The clustering factor is useful as a rough measure of the number of I/Os required to read an entire table by means of an index:

(1) If the clustering factor is high, then Oracle Database performs a relatively high number of I/Os during a large index range scan. The index entries point to random table blocks, so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.

----When the clustering factor is high, the index entries (rowids) point to blocks more or less at random, so during a large index range scan the same blocks have to be read again and again to fetch the rows those rowids point to.

(2) If the clustering factor is low, then Oracle Database performs a relatively low number of I/Os during a large index range scan. The index keys in a range tend to point to the same data block, so the database does not have to read and reread the same blocks over and over.

----When the clustering factor is low, the rows that neighbouring index keys (rowids) point to are stored in the same block, so fetching those rows mostly stays within that block, which cuts down on repeated block reads.
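
To make the "read and reread" point concrete, here is a toy Python simulation (not Oracle code). It walks a table's rows in index order, counts how many times a new table block has to be visited, and counts physical reads through a small LRU cache. The table size, rows per block, cache size and the function name range_scan_io are made-up assumptions, and Oracle's real buffer cache is far more sophisticated than this LRU list.

```python
from collections import OrderedDict
import random


def range_scan_io(row_blocks, cache_blocks):
    """Visit table blocks in index order during a (toy) range scan.

    row_blocks: the block number of each row, listed in index-key order.
    cache_blocks: capacity of a toy LRU buffer cache (an assumption).
    Returns (block_visits, physical_reads).
    """
    cache = OrderedDict()
    visits = 0
    physical = 0
    prev = None
    for blk in row_blocks:
        if blk != prev:                       # moving to a different block
            visits += 1
            if blk in cache:
                cache.move_to_end(blk)        # cache hit: mark as recently used
            else:
                physical += 1                 # cache miss: read from "disk"
                cache[blk] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False) # evict the least recently used block
        prev = blk
    return visits, physical


ROWS, PER_BLOCK = 10_000, 50                      # assumed: 200 table blocks
ordered = [i // PER_BLOCK for i in range(ROWS)]   # rows stored in key order
scattered = ordered[:]
random.Random(42).shuffle(scattered)              # same rows, random placement

print(range_scan_io(ordered, cache_blocks=20))    # (200, 200)
print(range_scan_io(scattered, cache_blocks=20))  # thousands of visits and reads
```

With the rows in key order each block is visited once; with the same rows scattered, almost every row means a fresh block visit, and most of them miss the small cache.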

The clustering factor is relevant for index scans because it can show:

(1) Whether the database will use an index for large range scans;

(2) The degree of table organization in relation to the index key;

(3) Whether you should consider using an index-organized table, partitioning, or table cluster if rows must be ordered by the index key.

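3. About the Index Clustering Factor

Put simply, the index clustering factor is the number of table data blocks that must be visited when the table is read through the index. It therefore reflects the I/O cost of index access, and it also indicates whether rows are stored in index-key order.

(1) The more ordered the storage, i.e. the more often adjacent key values are stored in the same block, the lower the clustering factor;

(2) If the storage is not ordered, i.e. key values are scattered randomly across blocks, reading them may require visiting the same blocks again and again, which increases I/O.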

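The clustering factor is calculated as follows:

(1) Scan the whole index in key order (effectively one large index range scan);

(2) Compare each entry's rowid with the previous entry's rowid; if the two rowids are in different data blocks, increment the clustering factor by 1;

(3) When the scan of the index is complete, the counter holds the index's clustering factor.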
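
The three steps above can be sketched in Python as follows. This is a toy model, not Oracle's implementation: a rowid is represented by a hypothetical (block_no, row_no) tuple, and the first entry is assumed to count as one block visit, so a perfectly ordered table scores exactly its number of blocks.

```python
def clustering_factor(index_entries):
    """Count table-block changes while walking index entries in key order.

    index_entries: iterable of (key, (block_no, row_no)) pairs; the
    (block_no, row_no) tuple is a simplified, hypothetical stand-in for a rowid.
    Assumption: the first entry counts as one block visit, so a perfectly
    ordered table scores its number of blocks and an empty index scores 0.
    """
    cf = 0
    prev_block = None
    # Step (1): walk the index in order, by key and then by rowid.
    for _key, (block_no, _row_no) in sorted(index_entries):
        if block_no != prev_block:   # step (2): rowid is in a different block
            cf += 1
        prev_block = block_no
    return cf                        # step (3): the final count


# Six rows in three blocks, stored in key order: keys 10 and 11 in block 1, etc.
ordered = [(10, (1, 0)), (11, (1, 1)), (20, (2, 0)),
           (21, (2, 1)), (30, (3, 0)), (31, (3, 1))]
# Four rows in two blocks, with neighbouring keys stored in different blocks.
interleaved = [(1, (1, 0)), (3, (1, 1)), (2, (2, 0)), (4, (2, 1))]

print(clustering_factor(ordered))      # 3 (= number of blocks)
print(clustering_factor(interleaved))  # 4 (= number of rows)
```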

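If the clustering factor is close to the number of blocks in the table, the table is stored in the order of the indexed column.

If the clustering factor is close to the number of rows, the table is not stored in the order of the indexed column.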
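
A quick way to see both extremes is to build a toy table twice: once with rows stored in key order and once with the same rows placed at random, then count block changes along the index. The row count and rows per block below are arbitrary assumptions chosen to give 1,000 blocks.

```python
import random


def clustering_factor(blocks_in_key_order):
    """Block changes along index order, counting the first block as one."""
    cf, prev = 0, None
    for blk in blocks_in_key_order:
        if blk != prev:
            cf += 1
        prev = blk
    return cf


NUM_ROWS, ROWS_PER_BLOCK = 70_000, 70   # assumed sizes: 1,000 table blocks
num_blocks = NUM_ROWS // ROWS_PER_BLOCK

# Rows stored in key order: key k sits in block k // ROWS_PER_BLOCK.
ordered = [k // ROWS_PER_BLOCK for k in range(NUM_ROWS)]

# The same rows scattered at random: scattered[k] is the block holding key k.
scattered = ordered[:]
random.Random(1).shuffle(scattered)

print(num_blocks, clustering_factor(ordered))    # 1000 1000
print(NUM_ROWS, clustering_factor(scattered))    # CF lands just below NUM_ROWS
```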

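This value is very useful when the optimizer estimates the cost of an index access: the clustering factor multiplied by the selectivity gives the table-access part of that cost (the remainder comes from reading the index blocks themselves).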
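
As a rough illustration (not Oracle's actual code), the classic I/O-only cost model for an index range scan followed by table access by rowid, as described in Jonathan Lewis's Cost-Based Oracle Fundamentals, is approximately blevel + ceil(leaf_blocks × index selectivity) + ceil(clustering_factor × table selectivity). In the sketch below, the first two terms are taken directly from the INDEX RANGE SCAN cost in the plan, and table selectivity is approximated as estimated rows / num_rows; both are simplifying assumptions, and CPU costing is ignored.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b (avoids floating-point rounding surprises)."""
    return -(-a // b)


def index_access_io_cost(index_scan_cost, clustering_factor, est_rows, num_rows):
    """Approximate I/O cost of INDEX RANGE SCAN + TABLE ACCESS BY INDEX ROWID.

    index_scan_cost: cost of the INDEX RANGE SCAN step, read from the plan
                     (stands in for blevel + ceil(leaf_blocks * index selectivity)).
    Table part: ceil(clustering_factor * table selectivity), where table
    selectivity is approximated as est_rows / num_rows (an assumption).
    """
    table_part = ceil_div(clustering_factor * est_rows, num_rows)
    return index_scan_cost + table_part


NUM_ROWS = 725460
print(index_access_io_cost(3, 725460, 10, NUM_ROWS))     # 13   (object_id = 1501)
print(index_access_io_cost(24, 725460, 9657, NUM_ROWS))  # 9681 (range, before reorg)
print(index_access_io_cost(24, 10327, 9657, NUM_ROWS))   # 162  (range, after reorg)
```

Plugging in the figures from the test in section 4 gives 13, 9681 and 162, against plan costs of 13, 9683 and 162; the small gap in the second case is plausibly the CPU component that this sketch leaves out.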

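If this statistic does not reflect the actual state of the index, the optimizer may choose the wrong execution plan. Also, if most access to a table is through index scans on one particular index, reorganizing the table's data in that index's column order can improve access performance.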

4. Testing

4.1 Reproducing the problem:

  ---- Check the database version ----
  SQL> select * from v$version where rownum=1;

  BANNER
  --------------------------------------------------------------------------------
  Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

  ---- Create a test table, jack ----
  SQL> create table jack as select * from dba_objects where 1=2;

  Table created.

  ---- Insert data into jack in no particular order (10 copies of dba_objects) ----
  SQL> begin
    2      for i in 1..10 loop
    3        insert /*+ append */ into jack select * from dba_objects order by i;
    4      commit;
    5    end loop;
    6  end;
    7  /

  PL/SQL procedure successfully completed.

  SQL> select count(*) from jack;

    COUNT(*)
  ----------
      725460

  ---- Check the table size ----
  SQL> set wrap off
  SQL> col owner for a10;
  SQL> col segment_name for a15;
  SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';

  SEGMENT_NAME        BLOCKS    EXTENTS size
  --------------- ---------- ---------- --------
  JACK                 11264         82 88M

  ---- Create an index on object_id ----
  SQL> create index jack_ind on jack(object_id);

  Index created.

  ---- Check the index size ----
  SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';

  SEGMENT_NAME    SEGMENT_TYPE           BLOCKS    EXTENTS size
  --------------- ------------------ ---------- ---------- --------
  JACK_IND        INDEX                    1664         28 13M
  ---- Check the index clustering factor before gathering any statistics ----
  SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

  INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
  --------------- ----------------- ----------
  JACK_IND                   725460     725460

  ---- Gather basic statistics ----
  SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);

  PL/SQL procedure successfully completed.

  ---- Check the index clustering factor again ----
  SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

  INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
  --------------- ----------------- ----------
  JACK_IND                   725460     725460

  ---- The clustering factor is the same before and after gathering statistics, because index statistics (including the real row count) are already computed when the index is created. The clustering factor also equals num_rows, which shows that the table rows are stored in no order at all relative to object_id.

  ---- Query a single value and check the execution plan ----
  SQL> explain plan for select * from jack where object_id=1501;

  Explained.

  SQL> select * from table(dbms_xplan.display);

  PLAN_TABLE_OUTPUT
  --------------------------------------------------------------------------------
  Plan hash value: 2860868395

  --------------------------------------------------------------------------------
  | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Ti
  --------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT            |          |    10 |   970 |    13   (0)| 00
  |   1 |  TABLE ACCESS BY INDEX ROWID| JACK     |    10 |   970 |    13   (0)| 00
  |*  2 |   INDEX RANGE SCAN          | JACK_IND |    10 |       |     3   (0)| 00
  --------------------------------------------------------------------------------

  Predicate Information (identified by operation id):

  PLAN_TABLE_OUTPUT
  --------------------------------------------------------------------------------


     2 - access("OBJECT_ID"=1501)

  14 rows selected.

  ---- The index is used here, with a cost of 13.

  SQL> alter system flush buffer_cache;

  System altered.

  SQL> set autotrace traceonly;
  ---- Check the execution plan of a range query ----
  SQL> select * from jack where object_id>1000 and object_id<2000;

  9880 rows selected.


  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 949574992

  --------------------------------------------------------------------------
  | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
  --------------------------------------------------------------------------
  |   0 | SELECT STATEMENT  |      |  9657 |   914K|  1824   (1)| 00:00:22 |
  |*  1 |  TABLE ACCESS FULL| JACK |  9657 |   914K|  1824   (1)| 00:00:22 |
  --------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------

     1 - filter("OBJECT_ID"<2000 AND "OBJECT_ID">1000)


  Statistics
  ----------------------------------------------------------
            0  recursive calls
            0  db block gets
        10993  consistent gets
        10340  physical reads
            0  redo size
       471945  bytes sent via SQL*Net to client
         7657  bytes received via SQL*Net from client
          660  SQL*Net roundtrips to/from client
            0  sorts (memory)
            0  sorts (disk)
         9880  rows processed

  ---- Note: object_id has an index, yet it is not used here; Oracle chose a full table scan instead.

  SQL> alter system flush buffer_cache;

  System altered.

  ---- Force the index with a hint and check the execution plan ----
  SQL> select /*+ index(jack jack_ind) */ * from jack where object_id>1000 and object_id<2000;

  9880 rows selected.


  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 2860868395

  ----------------------------------------------------------------------------------------
  | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
  ----------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT            |          |  9657 |   914K|  9683   (1)| 00:01:57 |
  |   1 |  TABLE ACCESS BY INDEX ROWID| JACK     |  9657 |   914K|  9683   (1)| 00:01:57 |
  |*  2 |   INDEX RANGE SCAN          | JACK_IND |  9657 |       |    24   (0)| 00:00:01 |
  ----------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------

     2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)


  Statistics
  ----------------------------------------------------------
            0  recursive calls
            0  db block gets
        10561  consistent gets
          164  physical reads
            0  redo size
       988947  bytes sent via SQL*Net to client
         7657  bytes received via SQL*Net from client
          660  SQL*Net roundtrips to/from client
            0  sorts (memory)
            0  sorts (disk)
         9880  rows processed
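----After forcing the index, the plan uses an INDEX RANGE SCAN, but the cost rises to 9683, compared with 1824 for the full table scan.
----Compare the physical reads of the two runs as well: the full table scan did far more physical reads than the index path (10340 vs 164), yet Oracle did not choose the index. The logical reads, however, are almost the same (10993 vs 10561 consistent gets), since the index path keeps revisiting the same blocks.
----Oracle estimated the index path to be about five times as expensive as the full table scan, and the CBO chooses the execution plan based on cost.
----From this we can conclude that Oracle derives the index access cost largely from the clustering factor. In this test the clustering factor is very high because the data is stored unordered, which is why Oracle considered the index path more expensive than the full table scan.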
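
How far would the clustering factor have to fall before the index wins? The sketch below assumes the same simplified I/O-only model as before (index range scan cost of 24 from the plan, plus ceil(clustering_factor × 9657 / 725460), ignoring CPU costing) and binary-searches for the largest clustering factor at which that estimate still beats the full scan cost of 1824 from the plan above. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def index_io_cost(cf, index_scan_cost=24, est_rows=9657, num_rows=725460):
    """Assumed model: index range scan cost + ceil(CF * est_rows / num_rows)."""
    return index_scan_cost + ceil_div(cf * est_rows, num_rows)


def break_even_cf(full_scan_cost=1824, **kw):
    """Largest CF for which the modelled index cost is still below the full scan."""
    lo, hi = 0, kw.get("num_rows", 725460)   # CF can never exceed num_rows
    while lo < hi:                           # cost is non-decreasing in CF
        mid = (lo + hi + 1) // 2
        if index_io_cost(mid, **kw) < full_scan_cost:
            lo = mid
        else:
            hi = mid - 1
    return lo


cf_max = break_even_cf()
print(cf_max)                  # 135145
print(index_io_cost(725460))   # 9681: well above 1824, so the full scan wins
print(index_io_cost(10327))    # 162: well below 1824, so the index wins
```

Under this model the table would not need to be perfectly ordered: any clustering factor up to about 135 thousand (roughly 13 times the 10474 table blocks) would already make the index the cheaper estimate for this query.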

4.2 Solving the problem:

  ---- From the analysis above, the fix is to lower the clustering factor, and that means physically reordering the table's rows by the index key. ----
  ---- Rebuild the jack table ----
  SQL> create table echo as select * from jack where 1=0;

  Table created.

  SQL> insert /*+ append */ into echo select * from jack order by object_id;

  725460 rows created.

  SQL> commit;

  Commit complete.

  SQL> truncate table jack;

  Table truncated.

  SQL> insert /*+ append */ into jack select * from echo;

  725460 rows created.

  ---- (A query without ORDER BY has no guaranteed row order; adding order by object_id here would make the intent explicit.)

  SQL> commit;

  Commit complete.

  ---- Check the table and index information ----
  SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';

  SEGMENT_NAME        BLOCKS    EXTENTS size
  --------------- ---------- ---------- --------
  JACK                 11264         82 88M

  SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';

  SEGMENT_NAME    SEGMENT_TYPE           BLOCKS    EXTENTS size
  --------------- ------------------ ---------- ---------- --------
  JACK_IND        INDEX                    1536         27 12M

  SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

  INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
  --------------- ----------------- ----------
  JACK_IND                   725460     725460

  ---- Rebuild the index (this also recomputes its statistics) ----
  SQL> alter index jack_ind rebuild;

  Index altered.

  ---- Check the clustering factor ----
  SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

  INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
  --------------- ----------------- ----------
  JACK_IND                    10327     725460

  ---- Note that the clustering factor has dropped to 10327. Next, gather table statistics and compare it with the number of table blocks.

  SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);

  PL/SQL procedure successfully completed.

  SQL> select blocks from dba_tables where table_name='JACK';

      BLOCKS
  ----------
       10474

  ---- Table jack actually uses 10474 blocks, and the clustering factor of 10327 is close to that, which shows that rows with adjacent key values are now stored in the same blocks.

  SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

  INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
  --------------- ----------------- ----------
  JACK_IND                    10327     725460

  SQL> alter system flush buffer_cache;

  System altered.

  SQL> set autotrace traceonly;
  ---- Check the execution plan of the earlier SQL again ----
  SQL> select * from jack where object_id>1000 and object_id<2000;

  9880 rows selected.


  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 2860868395

  ----------------------------------------------------------------------------------------
  | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
  ----------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT            |          |  9657 |   914K|   162   (0)| 00:00:02 |
  |   1 |  TABLE ACCESS BY INDEX ROWID| JACK     |  9657 |   914K|   162   (0)| 00:00:02 |
  |*  2 |   INDEX RANGE SCAN          | JACK_IND |  9657 |       |    24   (0)| 00:00:01 |
  ----------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------

     2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)


  Statistics
  ----------------------------------------------------------
            1  recursive calls
            0  db block gets
         1457  consistent gets
          151  physical reads
            0  redo size
       988947  bytes sent via SQL*Net to client
         7657  bytes received via SQL*Net from client
          660  SQL*Net roundtrips to/from client
            0  sorts (memory)
            0  sorts (disk)
         9880  rows processed

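----Note that the cost has now dropped to 162, below the 1824 of the full table scan, so the CBO chooses the index on its own. Consistent gets also fell from 10561 to 1457, a very clear performance improvement.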
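
Why was the clustering factor exactly equal to num_rows before the reorganization? A plausible explanation, sketched as a toy Python model (not Oracle code): the loop appended ten full copies of dba_objects one after another, so the ten rows sharing each object_id sit in ten different copies, and every step along the index lands in a different block, even if each copy were internally ordered. Rewriting the rows in object_id order, as the echo round trip did, puts them side by side. The number of distinct ids and the rows per block are scaled-down assumptions, and the helper names are hypothetical.

```python
def clustering_factor(blocks_in_key_order):
    """Block changes along index order, counting the first block as one."""
    cf, prev = 0, None
    for blk in blocks_in_key_order:
        if blk != prev:
            cf += 1
        prev = blk
    return cf


def blocks_in_index_order(object_ids, rows_per_block):
    """object_ids: one value per row, in physical (insertion) order.

    The row at position pos is assumed to live in block pos // rows_per_block.
    Index order is (object_id, rowid); the position stands in for the rowid.
    """
    entries = sorted((obj_id, pos) for pos, obj_id in enumerate(object_ids))
    return [pos // rows_per_block for _obj_id, pos in entries]


DISTINCT_IDS, COPIES, ROWS_PER_BLOCK = 5_000, 10, 69   # scaled-down assumptions
one_copy = list(range(1, DISTINCT_IDS + 1))

before = one_copy * COPIES   # ten copies appended one after another
after = sorted(before)       # the same rows rewritten in object_id order

num_rows = len(before)
num_blocks = -(-num_rows // ROWS_PER_BLOCK)

cf_before = clustering_factor(blocks_in_index_order(before, ROWS_PER_BLOCK))
cf_after = clustering_factor(blocks_in_index_order(after, ROWS_PER_BLOCK))
print(num_rows, cf_before)    # 50000 50000
print(num_blocks, cf_after)   # 725 725
```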

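5. Summary

The explanation and tests above show that the clustering factor is an important indicator of index health: the lower it is, the better, because it affects whether the CBO chooses the right execution plan. Note, however, that the clustering factor tends to keep getting worse over time.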
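
One way such deterioration can happen is illustrated by the toy model below (an assumption, not a description of Oracle's space management): start from a table stored in key order, then delete random rows and reinsert them into newly appended blocks. In a real table, reinserted or updated rows may also land in any existing block with free space; either way, each moved row breaks the key order a little.

```python
import random


def clustering_factor(block_of_key):
    """block_of_key[k] is the block holding key k; walk keys in index order."""
    cf, prev = 0, None
    for blk in block_of_key:
        if blk != prev:
            cf += 1
        prev = blk
    return cf


def relocate_random_rows(block_of_key, moves, rows_per_block, seed=0):
    """Toy model: delete `moves` random rows and reinsert them into new blocks
    appended after the current last block (a simplification)."""
    rng = random.Random(seed)
    layout = block_of_key[:]
    next_block = max(layout) + 1
    filled = 0
    for key in rng.sample(range(len(layout)), moves):
        layout[key] = next_block          # the moved row lands in the newest block
        filled += 1
        if filled == rows_per_block:      # newest block is full: open another one
            next_block += 1
            filled = 0
    return layout


ROWS, PER_BLOCK = 20_000, 50              # assumed sizes: 400 blocks
ordered = [k // PER_BLOCK for k in range(ROWS)]
MOVES = (0, 500, 2_000, 8_000)
results = [clustering_factor(relocate_random_rows(ordered, m, PER_BLOCK)) for m in MOVES]
for m, cf in zip(MOVES, results):
    print(m, cf)   # CF starts at 400 and grows as more rows are moved
```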
