A Deep but Accessible Look at the PostgreSQL B-Tree Index Structure



Author

digoal

Date

2016-05-28

Tags

PostgreSQL, b-tree, index structure


Background

PostgreSQL's B-Tree implements a variant of the high-concurrency B-tree management algorithm (Lehman and Yao's). For the details of the algorithm, see

src/backend/access/nbtree/README

PostgreSQL B-Tree index pages fall into several categories:

meta page      #  btpo_flags=8 (BTP_META)
root page      #  btpo_flags=2
branch page    #  btpo_flags=0
leaf page      #  btpo_flags=1

If a page is both the leaf and the root, btpo_flags=3.

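A non-empty index always has a meta page and a root page. The meta page occupies a page of its own (block 0) and records the page id of the root page.

As the number of entries grows, a single root page can no longer hold all the heap items, so leaf pages appear, then branch pages, and eventually several levels of branch pages.

The number of levels is recorded as level in the meta page: level 0 means the root page is also the only leaf page, and each additional level adds one layer of pages above the leaves.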


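We can use the pageinspect extension to look inside a B-Tree.

The level of a page is given by the btpo column of bt_page_stats: it tells which level of the tree the index page is on.

Note that a level is not a single page: level 3 (btpo=3), for example, may consist of several pages.

As an analogy, Tencent's technical job grade T3 contains several sub-grades. It is much the same here, except that no value distinguishes the sub-grades; we will see them in the examples later.

btpo=0 is the bottom level: the items (ctids) stored in index pages at this level point to heap pages.

Page category and level are independent, and one category can span several levels. Only index pages at level 0 store ctids that point to heap pages; in index pages at any other level, the ctids point to index pages one level down. Links to neighboring pages on the same level are kept separately, in btpo_prev and btpo_next (a doubly linked list).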

1.

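Level-0 structure: only a meta page and a root page.

The maximum number of items the root page can hold depends on the length of the indexed column data and on the size of an index page.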


Example

postgres=# create extension pageinspect;    
    
postgres=# create table tab1(id int primary key, info text);    
CREATE TABLE    
postgres=# insert into tab1 select generate_series(1,100), md5(random()::text);    
INSERT 0 100    
postgres=# vacuum analyze tab1;    
VACUUM    

The meta page shows root page id = 1.

The index level = 0, meaning there are no separate branch or leaf pages: the root page is itself the only leaf page.

postgres=# select * from bt_metap('tab1_pkey');    
 magic  | version | root | level | fastroot | fastlevel     
--------+---------+------+-------+----------+-----------    
 340322 |       2 |    1 |     0 |        1 |         0    
(1 row)    

Look up the stats of the root page, using root page id = 1.

btpo=0 means this is already the bottom level.

btpo_flags=3 means the page is both a leaf page and the root page.

postgres=# select * from bt_page_stats('tab1_pkey',1);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     1 | l    |        100 |          0 |            16 |      8192 |      6148 |         0 |         0 |    0 |          3    
(1 row)    

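btpo_prev and btpo_next give this page's left and right neighbors on the same level. Pages on each level, branch and leaf alike, form a doubly linked list, and 0 means there is no neighbor in that direction.

The btpo_flags bits are defined in src/include/access/nbtree.h: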

/* Bits defined in btpo_flags */    
#define BTP_LEAF                (1 << 0)        /* leaf page, i.e. not internal page */    
#define BTP_ROOT                (1 << 1)        /* root page (has no parent) */    
#define BTP_DELETED             (1 << 2)        /* page has been deleted from tree */    
#define BTP_META                (1 << 3)        /* meta-page */    
#define BTP_HALF_DEAD   (1 << 4)        /* empty, but still in tree */    
#define BTP_SPLIT_END   (1 << 5)        /* rightmost page of split group */    
#define BTP_HAS_GARBAGE (1 << 6)        /* page has LP_DEAD tuples */    
#define BTP_INCOMPLETE_SPLIT (1 << 7)   /* right sibling's downlink is missing */    

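View the ctids (i.e. items) stored in the level-0 page

At level 0, a ctid is the address of a heap tuple. (In a multi-level structure, a ctid in a branch page is instead the address of a btree page one level down; its block number identifies the child page.)

When a ctid points to the heap, data is the corresponding column value. (In multi-level structures data means something different; this is covered later.)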

postgres=# select * from bt_page_items('tab1_pkey',1);    
 itemoffset |  ctid   | itemlen | nulls | vars |          data               
------------+---------+---------+-------+------+-------------------------    
          1 | (0,1)   |      16 | f     | f    | 01 00 00 00 00 00 00 00    
          2 | (0,2)   |      16 | f     | f    | 02 00 00 00 00 00 00 00    
...    
         99 | (0,99)  |      16 | f     | f    | 63 00 00 00 00 00 00 00    
        100 | (0,100) |      16 | f     | f    | 64 00 00 00 00 00 00 00    
(100 rows)    

Fetch the heap row by its ctid

postgres=# select * from tab1 where ctid='(0,100)';    
 id  |               info                   
-----+----------------------------------    
 100 | 68b63c269ee8cc2d99fe204f04d0ffcb    
(1 row)    

2.

Level-1 structure: a meta page, a root page and leaf pages.


Example

postgres=# truncate tab1;    
TRUNCATE TABLE    
postgres=# insert into tab1 select generate_series(1,1000), md5(random()::text);    
INSERT 0 1000    
postgres=# vacuum analyze tab1;    
VACUUM    

The meta page shows root page id = 3 and index level = 1.

level = 1 means the tree now has a separate level of leaf pages below the root.

postgres=# select * from bt_metap('tab1_pkey');    
 magic  | version | root | level | fastroot | fastlevel     
--------+---------+------+-------+----------+-----------    
 340322 |       2 |    3 |     1 |        3 |         1    
(1 row)    

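Look up the root page stats by its page id

btpo = 1 means this is not yet the bottom level (the bottom level has btpo=0, and only the ctids stored in such pages are heap page addresses).

btpo_flags=2 means this page is the root page.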

postgres=# select * from bt_page_stats('tab1_pkey',3);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     3 | r    |          3 |          0 |            13 |      8192 |      8096 |         0 |         0 |    1 |          2    
(1 row)    

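View the items stored in the root page (they point to leaf pages)

There are 3 leaf pages. data holds each child's separator key, the lower bound of the keys in that leaf page (here exactly its smallest key).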

postgres=# select * from bt_page_items('tab1_pkey',3);    
 itemoffset | ctid  | itemlen | nulls | vars |          data               
------------+-------+---------+-------+------+-------------------------    
          1 | (1,1) |       8 | f     | f    |     
          2 | (2,1) |      16 | f     | f    | 6f 01 00 00 00 00 00 00    
          3 | (4,1) |      16 | f     | f    | dd 02 00 00 00 00 00 00    
(3 rows)    

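The first entry has empty data because it points to the leftmost leaf page: the first downlink of an internal page stores no key and acts as minus infinity.

In every page that has a right sibling (every page except the rightmost one on its level), item 1 is the high key: an upper bound on the keys in the page, equal to the separator that the parent stores for the right sibling.

The real data items start from item 2.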

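Note that the high key is only a key, not a link. Pages on every level, leaf pages included, form a doubly linked list through btpo_prev and btpo_next in the page's special area, which bt_page_stats shows.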

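Look up the stats of each leaf page by its page id

Leftmost leaf page = 1

btpo_prev = 0 means there is no left sibling. 0 is P_NONE; block 0 is always the meta page, so it can never be a real sibling.

btpo = 0 now, so this is a bottom-level page.
btpo_flags=1 means it is a leaf page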
postgres=# select * from bt_page_stats('tab1_pkey',1);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     1 | l    |        367 |          0 |            16 |      8192 |       808 |         0 |         2 |    0 |          1    
(1 row)    

Rightmost leaf page = 4

btpo_next = 0 means there is no right sibling.

btpo_flags=1 means it is a leaf page

postgres=# select * from bt_page_stats('tab1_pkey',4);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     4 | l    |        268 |          0 |            16 |      8192 |      2788 |         2 |         0 |    0 |          1    
(1 row)    

Middle leaf page = 2

btpo_flags=1 means it is a leaf page

postgres=# select * from bt_page_stats('tab1_pkey',2);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     2 | l    |        367 |          0 |            16 |      8192 |       808 |         1 |         4 |    0 |          1    
(1 row)    

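View the heap ctids (i.e. heap items) stored in the leaf pages

Example of a page with a right sibling: index page 1

Item 1 is the high key, 367 (the smallest key on right sibling page 2); the data items start at item 2. The high key's ctid (3,7) is not a link to the right page (that link is btpo_next); it is most likely the heap address of key 367, carried over when that key was copied into page 1 as its high key during the split.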

postgres=# select * from bt_page_items('tab1_pkey',1);    
 itemoffset |  ctid   | itemlen | nulls | vars |          data               
------------+---------+---------+-------+------+-------------------------    
          1 | (3,7)   |      16 | f     | f    | 6f 01 00 00 00 00 00 00    
          2 | (0,1)   |      16 | f     | f    | 01 00 00 00 00 00 00 00    
          3 | (0,2)   |      16 | f     | f    | 02 00 00 00 00 00 00 00    
...    
        367 | (3,6)   |      16 | f     | f    | 6e 01 00 00 00 00 00 00    
(367 rows)    

Example of a page without a right sibling: index page 4

Here item 1 is already the first data ctid (i.e. item).

postgres=# select * from bt_page_items('tab1_pkey',4);    
 itemoffset |  ctid   | itemlen | nulls | vars |          data               
------------+---------+---------+-------+------+-------------------------    
          1 | (6,13)  |      16 | f     | f    | dd 02 00 00 00 00 00 00    
          2 | (6,14)  |      16 | f     | f    | de 02 00 00 00 00 00 00    
...    
        268 | (8,40)  |      16 | f     | f    | e8 03 00 00 00 00 00 00    
(268 rows)    

Fetch the heap row by its ctid

postgres=# select * from tab1 where ctid='(0,1)';    
 id |               info                   
----+----------------------------------    
  1 | 6ebc6b77aebf5dd11621a2ed846c08c4    
(1 row)    

3.

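When the number of entries exceeds what a 1-level index can hold, the index grows into a 2-level structure: besides the meta page and the root page, it may contain one level of branch pages and one level of leaf pages.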

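For an edge page (branch or leaf) there is no neighbor in one direction, and the link for that direction is 0 (P_NONE). Block 0 is the meta page, so 0 can never be a real sibling.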


Example

postgres=# create table tab2(id int primary key, info text);    
CREATE TABLE    
postgres=# select 285^2;    
 ?column?     
----------    
    81225    
(1 row)    
postgres=# insert into tab2 select trunc(random()*10000000), md5(random()::text) from generate_series(1,1000000) on conflict on constraint tab2_pkey do nothing;    
INSERT 0 951379    
postgres=# vacuum analyze tab2;    
VACUUM    

The meta page shows root page id = 412 and index level = 2, i.e. one level of branch pages and one level of leaf pages.

postgres=# select * from bt_metap('tab2_pkey');    
 magic  | version | root | level | fastroot | fastlevel     
--------+---------+------+-------+----------+-----------    
 340322 |       2 |  412 |     2 |      412 |         2    
(1 row)    

Look up the root page stats by its page id

btpo = 2: this page is on level 2, which also means the level below it is 1.

btpo_flags = 2 means it is the root page.

postgres=# select * from bt_page_stats('tab2_pkey', 412);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
   412 | r    |         11 |          0 |            15 |      8192 |      7936 |         0 |         0 |    2 |          2    
(1 row)    

View the items stored in the root page (they point to branch pages)

postgres=# select * from bt_page_items('tab2_pkey', 412);    
 itemoffset |   ctid   | itemlen | nulls | vars |          data               
------------+----------+---------+-------+------+-------------------------    
          1 | (3,1)    |       8 | f     | f    |     
          2 | (2577,1) |      16 | f     | f    | e1 78 0b 00 00 00 00 00    
          3 | (1210,1) |      16 | f     | f    | ec 3a 18 00 00 00 00 00    
          4 | (2316,1) |      16 | f     | f    | de 09 25 00 00 00 00 00    
          5 | (574,1)  |      16 | f     | f    | aa e8 33 00 00 00 00 00    
          6 | (2278,1) |      16 | f     | f    | 85 90 40 00 00 00 00 00    
          7 | (1093,1) |      16 | f     | f    | f6 e9 4e 00 00 00 00 00    
          8 | (2112,1) |      16 | f     | f    | a3 60 5c 00 00 00 00 00    
          9 | (411,1)  |      16 | f     | f    | b2 ea 6b 00 00 00 00 00    
         10 | (2073,1) |      16 | f     | f    | db de 79 00 00 00 00 00    
         11 | (1392,1) |      16 | f     | f    | df b0 8a 00 00 00 00 00    
(11 rows)    

Look up a branch page's stats by its page id

btpo = 1: this page is on level 1, which also means the level below it is 0.

btpo_flags = 0 means it is a branch page.

postgres=# select * from bt_page_stats('tab2_pkey', 3);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     3 | i    |        254 |          0 |            15 |      8192 |      3076 |         0 |      2577 |    1 |          0    
(1 row)    

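View the leaf page ctids stored in the branch page (they point to leaf pages)

Unless the page is the rightmost one on its level, item 1 is the high key: its data e1 78 0b equals the separator the root page stores for the right sibling (block 2577, which is also btpo_next). Its ctid (735,1) is not a link to that sibling.

Item 2 is the first downlink of the current page.

Note that the first downlink of every branch page has empty data: it is the minus-infinity item.

In other words, a branch page does not store the smallest key of the leaf pages below it; that lower bound is kept as the separator in its parent.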

postgres=# select * from bt_page_items('tab2_pkey', 3);    
 itemoffset |   ctid   | itemlen | nulls | vars |          data               
------------+----------+---------+-------+------+-------------------------    
          1 | (735,1)  |      16 | f     | f    | e1 78 0b 00 00 00 00 00    
          2 | (1,1)    |       8 | f     | f    |     
          3 | (2581,1) |      16 | f     | f    | a8 09 00 00 00 00 00 00    
          4 | (1202,1) |      16 | f     | f    | f8 13 00 00 00 00 00 00    
...    
        254 | (3322,1) |      16 | f     | f    | ee 6f 0b 00 00 00 00 00    
(254 rows)    

Look up the leaf page by its ctid (block 1)

btpo = 0: level 0, the bottom level, where heap ctids are stored.

btpo_flags = 1 means it is a leaf page.

postgres=# select * from bt_page_stats('tab2_pkey', 1);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     1 | l    |        242 |          0 |            16 |      8192 |      3308 |         0 |      2581 |    0 |          1    
(1 row)    
    
postgres=# select * from bt_page_items('tab2_pkey', 1);    
 itemoffset |    ctid    | itemlen | nulls | vars |          data               
------------+------------+---------+-------+------+-------------------------    
          1 | (4985,16)  |      16 | f     | f    | a8 09 00 00 00 00 00 00    
          2 | (7305,79)  |      16 | f     | f    | 01 00 00 00 00 00 00 00    
          3 | (2757,120) |      16 | f     | f    | 09 00 00 00 00 00 00 00    
...    
        242 | (1329,101) |      16 | f     | f    | a0 09 00 00 00 00 00 00    
(242 rows)    

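View the heap items referenced from the leaf page.

From the structure of index pages we can deduce that (7305,79) holds the minimum value: page 1 is the leftmost leaf page, and because it has a right sibling its item 1 is the high key, so item 2 holds the smallest key. Fetching it confirms this: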

postgres=# select * from tab2 where ctid='(7305,79)';    
 id |               info                   
----+----------------------------------    
  1 | 18aaeb74c359355311ac825ae2aeb22a    
(1 row)    
    
postgres=# select min(id) from tab2;    
 min     
-----    
   1    
(1 row)    

4.

Multi-level structure: besides the meta page and the root page, it may contain several levels of branch pages, plus one level of leaf pages.


Example

postgres=# create table tab3(id int primary key, info text);    
CREATE TABLE    
postgres=# insert into tab3 select generate_series(1, 100000000), md5(random()::text);      

Look at the meta page; note that level is already 3.

meta page    
postgres=# select * from bt_metap('tab3_pkey');    
 magic  | version |  root  | level | fastroot | fastlevel     
--------+---------+--------+-------+----------+-----------    
 340322 |       2 | 116816 |     3 |   116816 |         3    
(1 row)    

btpo_flags=2 means root page

btpo = 3 means level 3

postgres=# select * from bt_page_stats('tab3_pkey', 116816);    
 blkno  | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
--------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
 116816 | r    |          3 |          0 |            13 |      8192 |      8096 |         0 |         0 |    3 |          2    
(1 row)    
    
postgres=# select * from bt_page_items('tab3_pkey', 116816);    
 itemoffset |    ctid    | itemlen | nulls | vars |          data               
------------+------------+---------+-------+------+-------------------------    
          1 | (412,1)    |       8 | f     | f    |     
          2 | (116815,1) |      16 | f     | f    | 5f 9e c5 01 00 00 00 00    
          3 | (198327,1) |      16 | f     | f    | bd 3c 8b 03 00 00 00 00    
(3 rows)    

btpo_flags=0 means branch page

btpo = 2 means level 2

postgres=# select * from bt_page_stats('tab3_pkey', 412);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
   412 | i    |        286 |          0 |            15 |      8192 |      2436 |         0 |    116815 |    2 |          0    
(1 row)    
    
postgres=# select * from bt_page_items('tab3_pkey', 412);    
 itemoffset |   ctid    | itemlen | nulls | vars |          data               
------------+-----------+---------+-------+------+-------------------------    
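          1 | (81636,1) |      16 | f     | f    | 5f 9e c5 01 00 00 00 00  -- high key: same key as the root's separator for right sibling 116815 (btpo_next); not a link    
          2 | (3,1)     |       8 | f     | f    |    -- first downlink: minus infinity, so no key is stored    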
          3 | (411,1)   |      16 | f     | f    | 77 97 01 00 00 00 00 00    
          4 | (698,1)   |      16 | f     | f    | ed 2e 03 00 00 00 00 00    
...    
        286 | (81350,1) |      16 | f     | f    | e9 06 c4 01 00 00 00 00    
(286 rows)    

btpo_flags=0 means branch page

btpo = 1 means level 1

postgres=# select * from bt_page_stats('tab3_pkey', 3);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     3 | i    |        286 |          0 |            15 |      8192 |      2436 |         0 |       411 |    1 |          0    
(1 row)    
    
postgres=# select * from bt_page_items('tab3_pkey', 3);    
 itemoffset |  ctid   | itemlen | nulls | vars |          data               
------------+---------+---------+-------+------+-------------------------    
          1 | (287,1) |      16 | f     | f    | 77 97 01 00 00 00 00 00    
          2 | (1,1)   |       8 | f     | f    |     
          3 | (2,1)   |      16 | f     | f    | 6f 01 00 00 00 00 00 00    
          4 | (4,1)   |      16 | f     | f    | dd 02 00 00 00 00 00 00    
...    
        286 | (286,1) |      16 | f     | f    | 09 96 01 00 00 00 00 00    
(286 rows)    

btpo_flags=1 means leaf page

btpo = 0 means level 0

postgres=# select * from bt_page_stats('tab3_pkey', 1);    
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags     
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------    
     1 | l    |        367 |          0 |            16 |      8192 |       808 |         0 |         2 |    0 |          1    
(1 row)    
    
postgres=# select * from bt_page_items('tab3_pkey', 1);    
 itemoffset |  ctid   | itemlen | nulls | vars |          data               
------------+---------+---------+-------+------+-------------------------    
          1 | (3,7)   |      16 | f     | f    | 6f 01 00 00 00 00 00 00    
          2 | (0,1)   |      16 | f     | f    | 01 00 00 00 00 00 00 00    
          3 | (0,2)   |      16 | f     | f    | 02 00 00 00 00 00 00 00    
...    
        367 | (3,6)   |      16 | f     | f    | 6e 01 00 00 00 00 00 00    
(367 rows)    

The level-0 ctids lead straight to the heap.

Heap tuple example

postgres=# select * from tab3 where ctid='(0,1)';    
 id |               info                   
----+----------------------------------    
  1 | 370ee1989a2b7f5d8a5b43243596d91f    
(1 row)    

How to interpret the number of btree pages read in explain analyze

Hands-on example 1

postgres=# create table tbl1(id int primary key, info text);    
CREATE TABLE    
postgres=# insert into tbl1 select trunc(random()*10000000), md5(random()::text) from generate_series(1,5000000) on conflict on constraint tbl1_pkey do nothing;    
INSERT 0 3934875    
postgres=# select ctid,* from tbl1 limit 10;    
  ctid  |   id    |               info                   
--------+---------+----------------------------------    
 (0,1)  | 2458061 | 5c91812b54bdcae602321dceaf22e276    
 (0,2)  | 8577271 | fe8e7a8be0d71a94e13b1b5a7786010b    
 (0,3)  | 4612744 | 56983e47f044b5a4655300e1868d2850    
 (0,4)  | 3690167 | 4a5ec8abf67bc018dcc113be829a59da    
 (0,5)  | 2646638 | 7686b47dcb94e56c11d69ec04d6017f3    
 (0,6)  | 6023272 | 4779d9a849c8287490be9d37a27b4637    
 (0,7)  | 7163674 | 35af37f479f48caa65033a5ef56cd75e    
 (0,8)  | 4049257 | 12fa110d927c88dce0773b546cc600c6    
 (0,9)  | 5815903 | 69ed9770ede59917d15ac2373ca8c797    
 (0,10) | 4068194 | 738595f73670da7ede40aefa8cb3d00c    
(10 rows)    
postgres=# vacuum analyze tbl1;    
VACUUM    

First we need to know the index level; only then can we work out how many index pages must be read to fetch one row.

postgres=# select * from bt_metap('tbl1_pkey');    
 magic  | version | root | level | fastroot | fastlevel     
--------+---------+------+-------+----------+-----------    
 340322 |       2 |  412 |     2 |      412 |         2    
(1 row)    

A btree with level = 2 has this shape: root page (level 2) → branch pages (level 1) → leaf pages (level 0).


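1. The following query matches 1 row and uses an index only scan.

It touches 4 buffers: 1 root page, 1 branch page and 1 leaf page of the index, plus 1 visibility map page, which the index only scan checks to confirm that the row's heap page is all-visible (hence Heap Fetches: 0). The meta page is not read: PostgreSQL caches the metapage contents, including the root location, in the index's relcache entry, so each descent starts directly at the root.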

postgres=#  explain (analyze,verbose,timing,costs,buffers) select id from tbl1 where id = 1;    
                                                         QUERY PLAN                                                             
----------------------------------------------------------------------------------------------------------------------------    
 Index Only Scan using tbl1_pkey on public.tbl1  (cost=0.42..1.44 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1)    
   Output: id    
   Index Cond: (tbl1.id = 1)    
   Heap Fetches: 0    
   Buffers: shared hit=4    
 Planning time: 0.072 ms    
 Execution time: 0.072 ms    
(7 rows)    

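2. The following query matches 0 rows, also with an index only scan.

explain reports only 3 buffers: 1 root page, 1 branch page and 1 leaf page.

This is not a bug: no row was found, so the scan never had to consult the visibility map, and the meta page comes from the relcache cache rather than being read.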

postgres=# explain (analyze,verbose,timing,costs,buffers) select id from tbl1 where id in (3);    
                                                         QUERY PLAN                                                             
----------------------------------------------------------------------------------------------------------------------------    
 Index Only Scan using tbl1_pkey on public.tbl1  (cost=0.43..1.45 rows=1 width=4) (actual time=0.010..0.010 rows=0 loops=1)    
   Output: id    
   Index Cond: (tbl1.id = 3)    
   Heap Fetches: 0    
   Buffers: shared hit=3    
 Planning time: 0.073 ms    
 Execution time: 0.031 ms    
(7 rows)    

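3. The following query matches 7 rows, using an index only scan.

It touches 22 buffers:

7 * (1 root + 1 branch + 1 leaf) + 1 visibility map page = 22

In other words, every value in the IN list gets its own descent through root, branch and leaf. The visibility map page stays pinned from one row to the next, so it is counted once (assuming, as is very likely for a table this size, that all seven rows map to the same visibility map page).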

postgres=#  explain (analyze,verbose,timing,costs,buffers) select id from tbl1 where id in (1,2,3,4,100,1000,10000);    
                                                         QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------    
 Index Only Scan using tbl1_pkey on public.tbl1  (cost=0.42..10.10 rows=7 width=4) (actual time=0.018..0.033 rows=7 loops=1)    
   Output: id    
   Index Cond: (tbl1.id = ANY ('{1,2,3,4,100,1000,10000}'::integer[]))    
   Heap Fetches: 0    
   Buffers: shared hit=22    
 Planning time: 0.083 ms    
 Execution time: 0.056 ms    
(7 rows)    

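4. The following query matches 2 rows, using an index only scan.

Again 22 buffers:

7 * (1 root + 1 branch + 1 leaf) + 1 visibility map page = 22

Every value in the IN list is still looked up with its own descent through root, branch and leaf, whether or not it matches.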

postgres=# explain (analyze,verbose,timing,costs,buffers) select id from tbl1 where id in (1,2,3,4,5,6,7);    
                                                         QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------    
 Index Only Scan using tbl1_pkey on public.tbl1  (cost=0.43..10.13 rows=7 width=4) (actual time=0.039..0.046 rows=2 loops=1)    
   Output: id    
   Index Cond: (tbl1.id = ANY ('{1,2,3,4,5,6,7}'::integer[]))    
   Heap Fetches: 0    
   Buffers: shared hit=22    
 Planning time: 0.232 ms    
 Execution time: 0.086 ms    
(7 rows)    

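5. The following query returns the same result as the previous one, 2 rows, also via an index only scan.

But it touches only 4 buffers:

1 root + 1 branch + 1 leaf + 1 visibility map page

A range condition needs a single descent, after which the scan reads forward along the leaf page.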

postgres=# explain (analyze,verbose,timing,costs,buffers) select id from tbl1 where id>0 and id <=7;    
                                                         QUERY PLAN                                                             
----------------------------------------------------------------------------------------------------------------------------    
 Index Only Scan using tbl1_pkey on public.tbl1  (cost=0.43..1.49 rows=3 width=4) (actual time=0.008..0.009 rows=2 loops=1)    
   Output: id    
   Index Cond: ((tbl1.id > 0) AND (tbl1.id <= 7))    
   Heap Fetches: 0    
   Buffers: shared hit=4    
 Planning time: 0.127 ms    
 Execution time: 0.028 ms    
(7 rows)    

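For the fourth query, 22 blocks were read, and there is room for improvement. For example, taking 1 and 7 as the boundary values: once the first value has been found on its leaf page, the page's high key (the smallest key of the next page) shows that every value from 1 to 7 can be resolved on the current page, so there is no need to descend from the root again for each value.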

 

