PostgreSQL: GiST indexes


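GiST stands for Generalized Search Tree. It is a balanced, tree-structured access method that serves as a base template in which arbitrary indexing schemes can be implemented. B-trees, R-trees, and many other indexing schemes can be implemented in GiST.

The official description above is dense, and there is no need here to implement other indexing schemes on top of GiST, so this post simply shows how to use a GiST index.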

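Pros and cons compared with a B-tree index:

Pros:

GiST indexes suit multidimensional and set-like data types and, like B-tree indexes, can also be used for ordinary scalar types (through the btree_gist extension). A multicolumn GiST index can be used with query conditions on any subset of the indexed columns. A multicolumn B-tree index is only efficient when the conditions constrain the leading column; without it the planner usually prefers a sequential scan, because scanning the whole index is rarely cheaper.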

 

Cons:

A GiST index takes much longer to build and occupies more space.

Test table

 

test=# create table tbl_index(a bigint,b timestamp without time zone,c varchar(12));
CREATE TABLE
test=# insert into tbl_index (a,b,c)  select generate_series(1,3000000),clock_timestamp()::timestamp(0) without time zone,'got u';
INSERT 0 3000000
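
Because clock_timestamp() is truncated to whole seconds here, many rows end up sharing the same value of b. A quick way to see that distribution is the query below. It is only a sketch against the tbl_index table created above, and the values and counts it returns depend on when and how fast the insert ran.

```sql
-- Count how many rows share each timestamp value in column b.
select b, count(*) as rows_per_value
from tbl_index
group by b
order by b;
```

This helps when reading the later plans: an equality condition on b matches a large share of the table, while an equality condition on a matches a single row.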

 

 

test=# \timing 
Timing is on.

 

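GiST itself is built into PostgreSQL. Indexing scalar types such as bigint and timestamp with GiST, however, requires the operator classes provided by the btree_gist extension, which must be compiled and installed first. I built all the extensions when compiling from source, so here it only needs to be created in the database.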

test=# create extension btree_gist;
CREATE EXTENSION
Time: 774.131 ms
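
To confirm that the extension supplied GiST operator classes for the two column types used in this post, the system catalogs can be queried. This is a minimal sketch; on a typical installation the result should include gist_int8_ops and gist_timestamp_ops, but the exact list depends on the PostgreSQL version and installed extensions.

```sql
-- List GiST operator classes available for bigint and timestamp columns.
select c.opcname,
       c.opcintype::regtype as indexed_type
from pg_opclass c
join pg_am am on am.oid = c.opcmethod
where am.amname = 'gist'
  and c.opcintype in ('bigint'::regtype,
                      'timestamp without time zone'::regtype);
```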

 

Create the index

test=# create index idx_gist_tbl_index_a_b on tbl_index using gist(a,b);
CREATE INDEX
Time: 168595.321 ms

 

 

Example 1. Query on column a

test=# explain analyze select * from tbl_index where a=3000000;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..21395.10 rows=1 width=22) (actual time=310.514..310.517 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbl_index  (cost=0.00..20395.00 rows=0 width=22) (actual time=289.432..289.433 rows=0 loops=3)
         Filter: (a = 3000000)
         Rows Removed by Filter: 1000000
 Planning time: 0.119 ms
 Execution time: 310.631 ms
(8 rows)

Time: 311.505 ms

 

test=# explain analyze select * from tbl_index where a='3000000';
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_gist_tbl_index_a_b on tbl_index  (cost=0.29..8.30 rows=1 width=22) (actual time=0.104..0.105 rows=1 loops=1)
   Index Cond: (a = '3000000'::bigint)
 Planning time: 0.109 ms
 Execution time: 0.297 ms
(4 rows)

Time: 1.124 ms

 

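The only difference between the two statements is the literal. In the first, the unquoted 3000000 is typed as integer, so the comparison is bigint = integer. The btree_gist operator class for bigint only supports comparisons against bigint, so the planner cannot use the GiST index and falls back to a sequential scan. In the second, the quoted '3000000' is an untyped literal that PostgreSQL resolves to the column's type, which is why the plan shows Index Cond: (a = '3000000'::bigint). Nothing is converted to char, either in the query or in the index. To get the index scan, make sure the literal has type bigint, either by quoting it as in the second statement or with an explicit cast.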
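
The two plans above differ in the type of the literal rather than in any char conversion, and this can be checked directly. The sketch below assumes the tbl_index table and GiST index from this post. The first statement shows the types PostgreSQL assigns to the literals. The second uses an explicit cast, which should let the planner consider the GiST index, though the plan actually chosen depends on statistics and settings.

```sql
-- A bare integer literal that fits in 32 bits is typed as integer;
-- the cast makes it bigint, matching column a.
select pg_typeof(3000000)         as bare_literal,  -- integer
       pg_typeof(3000000::bigint) as cast_literal;  -- bigint

-- Same query as Example 1, with the literal cast to the column type.
explain select * from tbl_index where a = 3000000::bigint;
```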

 

Example 2. Query on column b

test=# explain analyze select * from tbl_index where b='2016-06-29 14:54:00';
                                                                  QUERY PLAN                                                         
         
-------------------------------------------------------------------------------------------------------------------------------------
---------
 Bitmap Heap Scan on tbl_index  (cost=3373.54..10281.04 rows=171000 width=22) (actual time=37.200..53.564 rows=172824 loops=1)
   Recheck Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone)
   Heap Blocks: exact=276
   ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..3330.79 rows=171000 width=0) (actual time=37.139..37.139 rows=172824 loops=1)
         Index Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone)
 Planning time: 0.343 ms
 Execution time: 60.843 ms
(7 rows)

Time: 62.359 ms

 

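This query does not include the first index column, yet it still uses the index. With a B-tree index on (a,b), the same condition would typically result in a sequential scan.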

 

Example 3. Query on a AND b

test=# explain analyze select * from tbl_index where a='3000000' and b='2016-06-29 14:54:00';
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_gist_tbl_index_a_b on tbl_index  (cost=0.29..8.31 rows=1 width=22) (actual time=0.114..0.115 rows=1 loops=1)
   Index Cond: ((a = '3000000'::bigint) AND (b = '2016-06-29 14:54:00'::timestamp without time zone))
 Planning time: 0.376 ms
 Execution time: 0.258 ms
(4 rows)

Time: 1.747 ms

 

Example 4. Query on a OR b

test=# explain analyze select * from tbl_index where a='3000000' or b='2016-06-29 14:54:00';
                                                                     QUERY PLAN                                                      
               
-------------------------------------------------------------------------------------------------------------------------------------
---------------
 Bitmap Heap Scan on tbl_index  (cost=3420.58..10755.60 rows=171001 width=22) (actual time=31.142..49.728 rows=172824 loops=1)
   Recheck Cond: ((a = '3000000'::bigint) OR (b = '2016-06-29 14:54:00'::timestamp without time zone))
   Heap Blocks: exact=276
   ->  BitmapOr  (cost=3420.58..3420.58 rows=171001 width=0) (actual time=31.083..31.083 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..4.29 rows=1 width=0) (actual time=0.100..0.100 rows=1 loops=1)
               Index Cond: (a = '3000000'::bigint)
         ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..3330.79 rows=171000 width=0) (actual time=30.981..30.981 rows=172824 loops=1)
               Index Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone)
 Planning time: 0.143 ms
 Execution time: 57.193 ms
(10 rows)

Time: 58.067 ms

 

The AND and OR queries also use the index, but there is no performance gain over a B-tree index here; the B-tree timings are not shown.

 

Comparing the build time and size of the GiST and B-tree indexes

B-tree index build time:

 

test=# create index idx_btree_tbl_index_a_b on tbl_index using btree(a,b);
CREATE INDEX
Time: 5217.976 ms
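
With both indexes in place, the earlier remark that the AND and OR queries gain nothing over a B-tree index can be checked by temporarily hiding the GiST index. The sketch below relies on DROP INDEX being transactional in PostgreSQL. It holds an exclusive lock on the table until the rollback, so it is only suitable for a test database like this one. The timings it produces depend on the machine and cache state.

```sql
begin;

-- Hide the GiST index for the duration of this transaction only.
drop index idx_gist_tbl_index_a_b;

-- Re-run Examples 3 and 4; only the B-tree index is available now.
explain analyze select * from tbl_index
 where a = 3000000::bigint and b = '2016-06-29 14:54:00';

explain analyze select * from tbl_index
 where a = 3000000::bigint or b = '2016-06-29 14:54:00';

-- Undo the drop; the GiST index reappears without being rebuilt.
rollback;
```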

 

 

 

As shown earlier, building the GiST index took 168595.321 ms, about 32 times as long as the B-tree index.
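
The ratio can be reproduced from the two Time: values reported by psql above:

```sql
-- GiST build time divided by B-tree build time, both in milliseconds.
select round(168595.321 / 5217.976, 1) as gist_to_btree_build_ratio;  -- 32.3
```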

 

Size comparison: the GiST index is more than 3 times the size of the B-tree index.

test=# select relname,pg_size_pretty(pg_relation_size(oid)) from pg_class where relname like 'idx_%_tbl_index_a_b';
         relname         | pg_size_pretty 
-------------------------+----------------
 idx_gist_tbl_index_a_b  | 281 MB
 idx_btree_tbl_index_a_b | 89 MB
(2 rows)

Time: 4.068 ms
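
Going by the pretty-printed sizes above, 281 MB against 89 MB is a ratio of roughly 3.2. The same ratio can be computed from the exact byte counts. This is a sketch that assumes both index names from this post exist in the current database.

```sql
-- Ratio of GiST index size to B-tree index size, from exact byte counts.
select round(
         pg_relation_size('idx_gist_tbl_index_a_b')::numeric
         / pg_relation_size('idx_btree_tbl_index_a_b'),
         2) as gist_to_btree_size_ratio;
```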

 

