PostgreSQL分區表(Table Partitioning)應用


一、簡介

  在數據庫日漸龐大的今天,為了方便對數據庫數據的管理,比如按時間,按地區去統計一些數據時,基數過於龐大,多有不便。很多商業數據庫都提供分區的概念,按不同的維度去存放數據,便於后期的管理,PostgreSQL也不例外。

  PostgresSQL分區的意思是把邏輯上的一個大表分割成物理上的幾塊兒。分區不僅能帶來訪問速度的提升,關鍵的是,它能帶來管理和維護上的方便。

  分區的具體好處是:

  • 某些類型的查詢性能可以得到極大提升。
  • 更新的性能也可以得到提升,因為表的每塊的索引要比在整個數據集上的索引要小。如果索引不能全部放在內存里,那么在索引上的讀和寫都會產生更多的磁盤訪問。
  • 批量刪除可以用簡單的刪除某個分區來實現。
  • 可以將很少用的數據移動到便宜的、轉速慢的存儲介質上。

  在PG里表分區是通過表繼承來實現的,一般都是建立一個主表,里面是空,然后每個分區都去繼承它。無論何時,都應保證主表里面是空的。

  小表分區不實際,表在多大情況下才考慮分區呢?PostgresSQL官方給出的建議是:當表本身大小超過了機器物理內存的實際大小時(the size of the table should exceed the physical memory of the database server),可以考慮分區。

  PG目前(9.2.2)僅支持范圍分區和列表分區,尚未支持散列分區。

二、環境

系統環境:CentOS release 6.3 (Final)

PostgreSQL版本:PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

三、實現分區

3.1 創建主表

david=# create table tbl_partition (
david(# id integer,
david(# name varchar(20),
david(# gender boolean,
david(# join_date date,
david(# dept char(4));
CREATE TABLE
david=#

3.2 創建分區表

david=# create table tbl_partition_201211 (
check ( join_date >= DATE '2012-11-01' AND join_date < DATE '2012-12-01' )       
) INHERITS (tbl_partition);
CREATE TABLE
david=# create table tbl_partition_201212 (
check ( join_date >= DATE '2012-12-01' AND join_date < DATE '2013-01-01' )      
) INHERITS (tbl_partition);
CREATE TABLE
david=# create table tbl_partition_201301 (
check ( join_date >= DATE '2013-01-01' AND join_date < DATE '2013-02-01' )      
) INHERITS (tbl_partition);
CREATE TABLE
david=# create table tbl_partition_201302 (
check ( join_date >= DATE '2013-02-01' AND join_date < DATE '2013-03-01' )  
) INHERITS (tbl_partition);
CREATE TABLE
david=# create table tbl_partition_201303 (
check ( join_date >= DATE '2013-03-01' AND join_date < DATE '2013-04-01' ) 
) INHERITS (tbl_partition);
CREATE TABLE
david=# create table tbl_partition_201304 (
check ( join_date >= DATE '2013-04-01' AND join_date < DATE '2013-05-01' )  
) INHERITS (tbl_partition);
CREATE TABLE
david=# create table tbl_partition_201305 (
check ( join_date >= DATE '2013-05-01' AND join_date < DATE '2013-06-01' )  
) INHERITS (tbl_partition);
CREATE TABLE
david=# 

3.3 分區鍵上建索引

david=# create index tbl_partition_201211_joindate on tbl_partition_201211 (join_date);
CREATE INDEX
david=# create index tbl_partition_201212_joindate on tbl_partition_201212 (join_date); 
CREATE INDEX
david=# create index tbl_partition_201301_joindate on tbl_partition_201301 (join_date);    
CREATE INDEX
david=# create index tbl_partition_201302_joindate on tbl_partition_201302 (join_date); 
CREATE INDEX
david=# create index tbl_partition_201303_joindate on tbl_partition_201303 (join_date); 
CREATE INDEX
david=# create index tbl_partition_201304_joindate on tbl_partition_201304 (join_date); 
CREATE INDEX
david=# create index tbl_partition_201305_joindate on tbl_partition_201305 (join_date); 
CREATE INDEX
david=# 

對於開發人員來說,希望數據庫是透明的,只管 insert into tbl_partition。對於數據插向哪個分區,則希望由DB決定。這點,ORACLE實現了,但是PG不行,需要前期人工處理下。

3.4 創建觸發器函數

david=# CREATE OR REPLACE FUNCTION tbl_partition_insert_trigger()                      
RETURNS TRIGGER AS $$  
BEGIN  
    IF ( NEW.join_date >= DATE '2012-11-01' AND    
         NEW.join_date < DATE '2012-12-01' ) THEN  
        INSERT INTO tbl_partition_201211 VALUES (NEW.*);  
    ELSIF ( NEW.join_date >= DATE '2012-12-01' AND  
            NEW.join_date < DATE '2013-01-01' ) THEN  
        INSERT INTO tbl_partition_201212 VALUES (NEW.*);  
    ELSIF ( NEW.join_date >= DATE '2013-01-01' AND  
            NEW.join_date < DATE '2013-02-01' ) THEN  
        INSERT INTO tbl_partition_201301 VALUES (NEW.*);  
    ELSIF ( NEW.join_date >= DATE '2013-02-01' AND  
            NEW.join_date < DATE '2013-03-01' ) THEN  
        INSERT INTO tbl_partition_201302 VALUES (NEW.*);  
    ELSIF ( NEW.join_date >= DATE '2013-03-01' AND  
            NEW.join_date < DATE '2013-04-01' ) THEN  
        INSERT INTO tbl_partition_201303 VALUES (NEW.*);  
    ELSIF ( NEW.join_date >= DATE '2013-04-01' AND  
            NEW.join_date < DATE '2013-05-01' ) THEN  
        INSERT INTO tbl_partition_201304 VALUES (NEW.*);
    ELSIF ( NEW.join_date >= DATE '2013-05-01' AND
            NEW.join_date < DATE '2013-06-01' ) THEN
        INSERT INTO tbl_partition_201305 VALUES (NEW.*); 
    ELSE  
        RAISE EXCEPTION 'Date out of range. Fix the tbl_partition_insert_trigger() function!';  
    END IF;  
    RETURN NULL;  
END;  
$$  
LANGUAGE plpgsql;
CREATE FUNCTION
david=# 

說明:如果不想丟失數據,上面的ELSE 條件可以改成 INSERT INTO tbl_partition_error_join_date VALUES (NEW.*); 同時需要創建一張結構和tbl_partition 一樣的表tbl_partition_error_join_date,這樣,錯誤的join_date 數據就可以插入到這張表中而不是報錯了。

3.5 創建觸發器

david=# CREATE TRIGGER insert_tbl_partition_trigger
david-#     BEFORE INSERT ON tbl_partition
david-#     FOR EACH ROW EXECUTE PROCEDURE tbl_partition_insert_trigger();
CREATE TRIGGER
david=# 

四、查看表

4.1 查看所有表

david=# \dt
                List of relations
 Schema |         Name         | Type  |  Owner   
--------+----------------------+-------+----------
 public | tbl_partition        | table | postgres
 public | tbl_partition_201211 | table | postgres
 public | tbl_partition_201212 | table | postgres
 public | tbl_partition_201301 | table | postgres
 public | tbl_partition_201302 | table | postgres
 public | tbl_partition_201303 | table | postgres
 public | tbl_partition_201304 | table | postgres
 public | tbl_partition_201305 | table | postgres
(8 rows)

david=# 

4.2 查看主表

david=# \d tbl_partition
         Table "public.tbl_partition"
  Column   |         Type          | Modifiers 
-----------+-----------------------+-----------
 id        | integer               | 
 name      | character varying(20) | 
 gender    | boolean               | 
 join_date | date                  | 
 dept      | character(4)          | 
Triggers:
    insert_tbl_partition_trigger BEFORE INSERT ON tbl_partition FOR EACH ROW EXECUTE PROCEDURE tbl_partition_insert_trigger()
Number of child tables: 7 (Use \d+ to list them.)

david=# 

4.3 查看分區表

david=# \d tbl_partition_201304
      Table "public.tbl_partition_201304"
  Column   |         Type          | Modifiers 
-----------+-----------------------+-----------
 id        | integer               | 
 name      | character varying(20) | 
 gender    | boolean               | 
 join_date | date                  | 
 dept      | character(4)          | 
Indexes:
    "tbl_partition_201304_joindate" btree (join_date)
Check constraints:
    "tbl_partition_201304_join_date_check" CHECK (join_date >= '2013-04-01'::date AND join_date < '2013-05-01'::date)
Inherits: tbl_partition

david=#

五、測試

5.1 插入數據

david=# insert into tbl_partition values (1, 'David', '1', '2013-01-10', 'TS');
INSERT 0 0
david=# insert into tbl_partition values (2, 'Sandy', '0', '2013-02-10', 'TS');
INSERT 0 0
david=# insert into tbl_partition values (3, 'Eagle', '1', '2012-11-01', 'TS');
INSERT 0 0
david=# insert into tbl_partition values (4, 'Miles', '1', '2012-12-15', 'SD');
INSERT 0 0
david=# insert into tbl_partition values (5, 'Simon', '1', '2012-12-10', 'SD');
INSERT 0 0
david=# insert into tbl_partition values (6, 'Rock', '1', '2012-11-10', 'SD');
INSERT 0 0
david=# insert into tbl_partition values (7, 'Peter', '1', '2013-01-11', 'SD');
INSERT 0 0
david=# insert into tbl_partition values (8, 'Sally', '0', '2013-03-10', 'BCSC');
INSERT 0 0
david=# insert into tbl_partition values (9, 'Carrie', '0', '2013-04-02', 'BCSC');
INSERT 0 0
david=# insert into tbl_partition values (10, 'Lee', '1', '2013-01-05', 'BMC');
INSERT 0 0
david=# insert into tbl_partition values (11, 'Nicole', '0', '2012-11-10', 'PROJ');
INSERT 0 0
david=# insert into tbl_partition values (12, 'Renee', '0', '2013-01-10', 'TS');
INSERT 0 0
david=# 

5.2 查看主表數據

david=# select * from tbl_partition;
 id |  name  | gender | join_date  | dept 
----+--------+--------+------------+------
  3 | Eagle  | t      | 2012-11-01 | TS  
  6 | Rock   | t      | 2012-11-10 | SD  
 11 | Nicole | f      | 2012-11-10 | PROJ
  4 | Miles  | t      | 2012-12-15 | SD  
  5 | Simon  | t      | 2012-12-10 | SD  
  1 | David  | t      | 2013-01-10 | TS  
  7 | Peter  | t      | 2013-01-11 | SD  
 10 | Lee    | t      | 2013-01-05 | BMC 
 12 | Renee  | f      | 2013-01-10 | TS  
  2 | Sandy  | f      | 2013-02-10 | TS  
  8 | Sally  | f      | 2013-03-10 | BCSC
  9 | Carrie | f      | 2013-04-02 | BCSC
(12 rows)

david=# 

5.3 查看分區表數據

david=# select * from tbl_partition_201301 ;
 id | name  | gender | join_date  | dept 
----+-------+--------+------------+------
  1 | David | t      | 2013-01-10 | TS  
  7 | Peter | t      | 2013-01-11 | SD  
 10 | Lee   | t      | 2013-01-05 | BMC 
 12 | Renee | f      | 2013-01-10 | TS  
(4 rows)

david=# 

六、管理分區

6.1 移除數據/分區

實現分區表之后,我們就可以很容易地移除不再使用的舊數據了,最簡單的方法就是:

david=# drop table tbl_partition_201304;

這樣可以快速移除大量數據,而不是逐條刪除數據。

另一個推薦做法是將分區從分區表中移除,但是保留訪問權限。

david=# alter table tbl_partition_201304 no inherit tbl_partition;
ALTER TABLE
david=#

和直接DROP 相比,該方式僅僅是使子表脫離了原有的主表,而存儲在子表中的數據仍然可以得到訪問,因為此時該表已經被還原成一個普通的數據表了。這樣對於數據庫的DBA來說,就可以在此時對該表進行必要的維護操作,如數據清理、歸檔等,在完成諸多例行性的操作之后,就可以考慮是直接刪除該表(DROP TABLE),還是先清空該表的數據(TRUNCATE TABLE),之后再讓該表重新繼承主表。

david=# alter table tbl_partition_201304 inherit tbl_partition;                                                            
ALTER TABLE
david=# 

6.2 增加分區

我們可以像之前那樣增加一個分區

david=# create table tbl_partition_201306 (
check ( join_date >= DATE '2013-06-01' AND join_date < DATE '2013-07-01' )  
) INHERITS (tbl_partition);
david=# create index tbl_partition_201306_joindate on tbl_partition_201306 (join_date);

同時,需要修改觸發器函數,將插入條件改成相應的值。

說明:創建觸發器函數時,最好把插入條件寫更未來一點,比如多寫十年,這樣以后增加新分區時就不需要重新創建觸發器函數了,也可以避免一些不必要的錯誤。

另外,還可以如下增加新的分區:

david=# create table tbl_partition_201307
david-# (LIKE tbl_partition INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
CREATE TABLE
david=#

david=# alter table tbl_partition_201307 add constraint tbl_partition_201307_join_date_check 
david-# check ( join_date >= DATE '2013-07-01' AND join_date < DATE '2013-08-01' );
ALTER TABLE
david=#

david=# create index tbl_partition_201307_joindate on tbl_partition_201307 (join_date);

david=# copy tbl_partition_201307 from '/tmp/tbl_partition_201307.sql'; //從文件中拷貝數據,這些數據可以是事前准備的 david
=# alter table tbl_partition_201307 inherit tbl_partition;

七、約束排除

約束排除(Constraint exclusion)是一種查詢優化技巧,它改進了用上面方法定義的表分區的性能。

確保postgresql.conf 里的配置參數constraint_exclusion 是打開的。沒有這個參數,查詢不會按照需要進行優化。這里我們需要做的是確保該選項在配置文件中沒有被注釋掉。

如果沒有約束排除,查詢會掃描tbl_partition 表中的每一個分區。打開了約束排除之后,規划器將檢查每個分區的約束然后再試圖證明該分區不需要被掃描,因為它不能包含任何符合WHERE子句條件的數據行。如果規划器可以證明這個,它就把該分區從查詢規划里排除出去。

可以使用EXPLAIN 命令顯示一個規划在constraint_exclusion 關閉和打開情況下的不同:

7.1 約束排除關閉

david=# set constraint_exclusion = off;                                                 
SET
david=# explain select count(*) from tbl_partition where join_date >= DATE '2013-04-01';
                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 Aggregate  (cost=172.80..172.81 rows=1 width=0)
   ->  Append  (cost=0.00..167.62 rows=2071 width=0)
         ->  Seq Scan on tbl_partition  (cost=0.00..0.00 rows=1 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201211 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201212 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201301 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201302 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201303 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201305 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201304 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201306 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201307 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
(22 rows)

david=# 

從上面的查詢計划中可以看出,PostgreSQL 掃描了所有分區。下面我們再看一下打開約束排除之后的查詢計划: 

7.2 約束排除開啟

david=# set constraint_exclusion = on;                                                  
SET
david=# explain select count(*) from tbl_partition where join_date >= DATE '2013-04-01';
                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 Aggregate  (cost=76.80..76.81 rows=1 width=0)
   ->  Append  (cost=0.00..74.50 rows=921 width=0)
         ->  Seq Scan on tbl_partition  (cost=0.00..0.00 rows=1 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201305 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201304 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201306 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
         ->  Seq Scan on tbl_partition_201307 tbl_partition  (cost=0.00..18.62 rows=230 width=0)
               Filter: (join_date >= '2013-04-01'::date)
(12 rows)

david=#

可以看到,PostgreSQL 只掃描四月份以后的分區表。

八、可選的分區方式

還可以通過建立規則的方式進行分區。

CREATE RULE insert_tbl_partition_201211 AS
ON INSERT TO tbl_partition WHERE
    ( join_date >= DATE '2012-11-01' AND join_date < DATE '2012-12-01' )
DO INSTEAD
    INSERT INTO tbl_partition_201211 VALUES (NEW.*);

CREATE RULE insert_tbl_partition_201212 AS
ON INSERT TO tbl_partition WHERE
    ( join_date >= DATE '2012-12-01' AND join_date < DATE '2013-01-01' )
DO INSTEAD
    INSERT INTO tbl_partition_201212 VALUES (NEW.*);

...
    
CREATE RULE insert_tbl_partition_201306 AS
ON INSERT TO tbl_partition WHERE
    ( join_date >= DATE '2013-06-01' AND join_date < DATE '2013-07-01' )
DO INSTEAD
    INSERT INTO tbl_partition_201306 VALUES (NEW.*);
    
CREATE RULE insert_tbl_partition_201307 AS
ON INSERT TO tbl_partition WHERE
    ( join_date >= DATE '2013-07-01' AND join_date < DATE '2013-08-01' )
DO INSTEAD
    INSERT INTO tbl_partition_201307 VALUES (NEW.*);
    
CREATE RULE insert_tbl_partition_error_join_date AS
ON INSERT TO tbl_partition WHERE
    ( join_date >= DATE '2013-08-01' OR join_date < DATE '2012-11-01' )
DO INSTEAD
    INSERT INTO tbl_partition_error_join_date VALUES (NEW.*);

九、注意事項

VACUUM 或 ANALYZE tbl_partition 只會對主表起作用,要想分析表,需要分別分析每個分區表。

十、參考資料


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM