PG_Converting a regular table to a partitioned table online


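Because of poor up-front planning, many tables at customer sites were never created as partitioned tables. This article describes a way to convert them online.

The method combines table inheritance, triggers, asynchronous data migration and a table-name swap to turn a non-partitioned table into a partitioned table online. Only the name swap needs a brief blocking lock.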

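Key techniques:

1. Inherited tables (child partitions)

SELECT, UPDATE, DELETE and TRUNCATE on the parent automatically include the child tables, so they are transparent to the application (dropping the parent together with its children requires DROP ... CASCADE).

2. Triggers

INSERT: a BEFORE trigger routes each row to the matching child table.

UPDATE: a BEFORE trigger deletes the row from the parent and inserts the updated row into the matching child table.

3. Background data migration: a CTE picks rows from ONLY the parent with SKIP LOCKED, deletes them from ONLY the parent, and inserts them back into old so that the trigger stores them in the child tables.

4. When the migration is finished (the parent holds no more rows), take a short lock, remove the INHERIT relationships, switch to native partitioning, and swap the table names.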

The experiment below uses the common case of time-based range partitioning:

1. Create the test table

create table old (id int primary key, info text, create_time timestamp);  

2. Insert 1 million test rows

insert into old select generate_series(1,1000000),md5(random()::text),(now()+ ((random()*100)::int ||' day')::interval);
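Because create_time is derived from now(), the test rows only land in the child tables created in step 3 if now() is close to the start of the period those children cover (2020-12-01 to 2021-06-01). A quick check, sketched below, counts the rows that fall outside that window; such rows would stay in the parent for the whole test. If out_of_range is not 0, one option (an assumption, not part of the original test) is to replace now() in the insert with a fixed timestamp inside the window.

```sql
-- Count rows whose create_time falls outside the ranges of the child tables from step 3
-- (2020-12-01 inclusive to 2021-06-01 exclusive); such rows would remain in the parent.
select min(create_time) as first_ts,
       max(create_time) as last_ts,
       count(*) filter (where create_time is null
                           or create_time <  '2020-12-01'::timestamp
                           or create_time >= '2021-06-01'::timestamp) as out_of_range
from old;
```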

3. Create the child tables (simulating the monthly range partitions used on site)

do language plpgsql $$    
declare    
  rec record;    
begin    
  for rec in (select t as beginmonth,t+interval '1 month' as endmonth from generate_series('2020-12-01'::timestamp,'2021-05-01'::timestamp,interval '1 month') g(t)) 
  loop    
    execute format('create table old_%s (like old including all) inherits (old)', to_char(rec.beginmonth,'yyyyMM'));    
    execute format('alter table old_%s add constraint ck check(create_time>=%L::timestamp and create_time<%L::timestamp)', to_char(rec.beginmonth,'yyyyMM'), rec.beginmonth, rec.endmonth);
  end loop;    
end;    
$$;

4. Inheritance relationships of old

postgres=# \d+ old
                                                 Table "public.old"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 id          | integer                     |           | not null |         | plain    |              | 
 info        | text                        |           |          |         | extended |              | 
 create_time | timestamp without time zone |           |          |         | plain    |              | 
Indexes:
    "old_pkey" PRIMARY KEY, btree (id)
Child tables: 
              old_202012,
              old_202101,
              old_202102,
              old_202103,
              old_202104,
              old_202105
Access method: heap

postgres=# \d+ old_202109
                                             Table "public.old_202109"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 id          | integer                     |           | not null |         | plain    |              | 
 info        | text                        |           |          |         | extended |              | 
 create_time | timestamp without time zone |           |          |         | plain    |              | 
Indexes:
    "old_202109_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "ck" CHECK (create_time >= '2021-09-01 00:00:00'::timestamp without time zone AND create_time < '2021-10-01 00:00:00'::timestamp without time zone)
Inherits: old
Access method: heap

5. INSERT: route rows to the child tables with a BEFORE trigger (adapt the ranges to your own data; the test partitions only go up to '2021-05')

create or replace function ins_tbl() returns trigger as $$    
declare    
begin    
  case when (new.create_time>='2020-12-01'::timestamp and new.create_time<'2021-01-01'::timestamp) then
      insert into old_202012 values (NEW.*);
    when (new.create_time>='2021-01-01'::timestamp and new.create_time<'2021-02-01'::timestamp) then    
      insert into old_202101 values (NEW.*);    
    when (new.create_time>='2021-02-01'::timestamp and new.create_time<'2021-03-01'::timestamp) then    
      insert into old_202102 values (NEW.*);    
    when (new.create_time>='2021-03-01'::timestamp and new.create_time<'2021-04-01'::timestamp) then    
      insert into old_202103 values (NEW.*);    
    when (new.create_time>='2021-04-01'::timestamp and new.create_time<'2021-05-01'::timestamp) then    
      insert into old_202104 values (NEW.*);    
    when (new.create_time>='2021-05-01'::timestamp and new.create_time<'2021-06-01'::timestamp) then    
      insert into old_202105 values (NEW.*); 
    else    
      return NEW;  -- create_time is NULL or matches no child: keep the row in the parent table
  end case;    
  return null;    
end;    
$$ language plpgsql strict;    
  
create trigger tg1 before insert on old for each row execute procedure ins_tbl();
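The CASE in ins_tbl() has to be edited every time a month is added. A possible alternative, sketched below under the assumption that every child follows the old_YYYYMM naming from step 3, derives the target table from create_time and inserts through dynamic SQL. The function name ins_tbl_dyn is hypothetical. Statements run with EXECUTE are planned again on every call, so this is usually slower per row than the static CASE; it trades speed for easier maintenance.

```sql
-- Hypothetical generic routing function: picks the child table old_YYYYMM from create_time.
create or replace function ins_tbl_dyn() returns trigger as $$
declare
  child text;
begin
  if NEW.create_time is null then
    return NEW;                      -- no partition key: keep the row in the parent
  end if;

  child := 'old_' || to_char(NEW.create_time, 'yyyyMM');

  if to_regclass(child) is null then
    return NEW;                      -- no child table for that month: keep the row in the parent
  end if;

  -- insert the whole row into the child; ($1).* expands the record passed with USING
  execute format('insert into %I select ($1).*', child) using NEW;
  return null;                       -- row already stored in the child, skip the parent insert
end;
$$ language plpgsql;

-- to use it instead of ins_tbl():
-- drop trigger tg1 on old;
-- create trigger tg1 before insert on old for each row execute procedure ins_tbl_dyn();
```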

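6. UPDATE: route rows to the child tables with a BEFORE trigger (the partition key is not supposed to change; updating create_time of a row that already lives in a child table to a value outside that child's range violates its CHECK constraint and raises an error)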

create or replace function upd_tbl () returns trigger as $$  
declare  
begin  
  case when (new.create_time>='2020-12-01'::timestamp and new.create_time<'2021-01-01'::timestamp) then
      insert into old_202012 values (NEW.*);
    when (new.create_time>='2021-01-01'::timestamp and new.create_time<'2021-02-01'::timestamp) then    
      insert into old_202101 values (NEW.*);    
    when (new.create_time>='2021-02-01'::timestamp and new.create_time<'2021-03-01'::timestamp) then    
      insert into old_202102 values (NEW.*);    
    when (new.create_time>='2021-03-01'::timestamp and new.create_time<'2021-04-01'::timestamp) then    
      insert into old_202103 values (NEW.*);    
    when (new.create_time>='2021-04-01'::timestamp and new.create_time<'2021-05-01'::timestamp) then    
      insert into old_202104 values (NEW.*);    
    when (new.create_time>='2021-05-01'::timestamp and new.create_time<'2021-06-01'::timestamp) then    
      insert into old_202105 values (NEW.*);    
    else    
      return NEW;  -- create_time is NULL or matches no child: keep the row in the parent table
  end case;    
  
  delete from only old where id=NEW.id;  
  return null;    
end;    
$$ language plpgsql strict;    
  
create trigger tg2 before update on old for each row execute procedure upd_tbl(); 
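The caveat about the partition key can be reproduced in a test environment. tg2 only fires for rows that still live in the parent; a row that already lives in a child is updated in place in that child, so shifting its create_time into the next month should be rejected by the child's CHECK constraint ck. The statement below assumes old_202012 already holds at least one row (for example after the migration in step 9); otherwise it updates nothing.

```sql
-- Test environment only: try to move one row of old_202012 into January.
-- The row is updated inside old_202012 itself (tg2 is not involved),
-- so the new create_time violates old_202012's CHECK constraint ck.
update old
set create_time = create_time + interval '1 month'
where id = (select id from only old_202012 limit 1);
```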

7. Test that DELETE, INSERT, UPDATE and SELECT behave correctly

--DELETE
postgres=# select tableoid::regclass,* from old where id=1;
 tableoid | id |               info               |        create_time         
----------+----+----------------------------------+----------------------------
 old      |  1 | 9f6bd5bc6e54e549b8380c8d6c70c9b4 | 2021-01-14 15:13:05.442282
(1 row)

postgres=# delete from old  where id=1;
DELETE 1
postgres=# select tableoid::regclass,* from old where id=1;
 tableoid | id | info | create_time 
----------+----+------+-------------
(0 rows)

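--INSERT (the command tag is "INSERT 0 0" because tg1 stores the row in the child table and then returns NULL, so nothing is inserted into old itself)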
postgres=# INSERT INTO old values(1,md5(random()::text),(now()+ ((random()*100)::int ||' day')::interval));
INSERT 0 0
postgres=# select tableoid::regclass,* from old where id=1;
  tableoid  | id |               info               |        create_time         
------------+----+----------------------------------+----------------------------
 old_202101 |  1 | adfcee05df6437fabb21f40b13320ce0 | 2021-01-05 15:17:26.066304
(1 row)

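--UPDATE (id=2 is still in the parent, so tg2 moves it to old_202103 and returns NULL; RETURNING and the UPDATE count therefore only include id=1, which is updated in place in old_202101)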
postgres=# select tableoid::regclass,* from old where id in(1,2);
  tableoid  | id |               info               |        create_time         
------------+----+----------------------------------+----------------------------
 old        |  2 | ca46fe7d0fe21f33ec46fb07dd669e32 | 2021-03-13 15:13:05.442282
 old_202101 |  1 | adfcee05df6437fabb21f40b13320ce0 | 2021-01-05 15:17:26.066304
(2 rows)

postgres=# update old set info='test' where id in(1,2) returning tableoid::regclass,*;
  tableoid  | id | info |        create_time         
------------+----+------+----------------------------
 old_202101 |  1 | test | 2021-01-05 15:17:26.066304
(1 row)

UPDATE 1
postgres=# select tableoid::regclass,* from old where id in(1,2);
  tableoid  | id | info |        create_time         
------------+----+------+----------------------------
 old_202101 |  1 | test | 2021-01-05 15:17:26.066304
 old_202103 |  2 | test | 2021-03-13 15:13:05.442282
(2 rows)
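Besides looking up single ids, it can help to see how the rows are spread across the parent and the children, for example while the migration in step 9 is running. A simple sketch groups by tableoid; note that it counts the whole table, so it is best run sparingly on large tables.

```sql
-- Rows per physical table: "old" is the parent, old_YYYYMM are the children
select tableoid::regclass as stored_in, count(*) as row_count
from old
group by tableoid
order by tableoid::regclass::text;
```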

8. Start a benchmark load and migrate the original table's data in the background

create or replace function test_ins(int) returns void as $$  
declare  
begin  
  insert into old values ($1,'test',(now()+ ((random()*100)::int ||' day')::interval));  
  exception when others then  
  return;  
end;  
$$ language plpgsql strict;  

vi test.sql  
  
\set id1 random(10000001,200000000)  
\set id2 random(1,50000)  
\set id3 random(50001,100000)  
delete from old where id=:id2;  
update old set info=md5(random()::text) where id=:id3;  
select test_ins(:id1); 

Start the load:

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 4 -j 4 -T 1200 

9. Migrate the data online

Migrate in batches of N rows each (N = 10000 below) by running the following SQL:

with a as (  
delete from only old where ctid = any (array (select ctid from only old limit 10000 for update skip locked) ) returning *  
)  
insert into old select * from a; 

Keep running the SQL above until ONLY old contains no rows; at that point all the data has been moved into the partitions:

postgres=# select count(*) from only old;
 count 
-------
     0
(1 row)

postgres=# select count(*) from old;
  count  
---------
 1023111
(1 row)
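Instead of calling the batch statement by hand, it could be wrapped in a procedure that commits after every batch, keeping each transaction short. This is a sketch, not part of the original article: the procedure name migrate_old_batches and its parameters are hypothetical, and COMMIT inside a procedure needs PostgreSQL 11 or later (CALL must run outside an explicit transaction block). The loop checks for an empty parent rather than using the statement's row count, because tg1 returns NULL after routing and the INSERT therefore reports 0 rows (compare INSERT 0 0 in step 7). Rows whose create_time matches no child are routed back into the parent by tg1, so p_max_rounds caps the loop.

```sql
-- Hypothetical helper: repeats the batch move above and commits after every batch.
create or replace procedure migrate_old_batches(p_batch int default 10000,
                                                p_max_rounds int default 100000)
language plpgsql as $$
declare
  v_rounds int := 0;
begin
  loop
    -- move up to p_batch unlocked rows out of ONLY old;
    -- re-inserting into old lets trigger tg1 route them to the children
    with a as (
      delete from only old
      where ctid = any (array(select ctid from only old limit p_batch for update skip locked))
      returning *
    )
    insert into old select * from a;

    commit;                               -- end the batch so locks are released quickly
    v_rounds := v_rounds + 1;

    -- stop once the parent is empty, or after p_max_rounds batches
    exit when not exists (select 1 from only old) or v_rounds >= p_max_rounds;
    perform pg_sleep(0.01);               -- short pause to limit the impact on the workload
  end loop;
end;
$$;

call migrate_old_batches(10000);
```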

10. Switch to a native partitioned table
Create the partitioned table:

create table new (id int, info text, create_time timestamp) partition by range (create_time); 

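Swap the table names. To avoid an avalanche of blocked sessions, set a lock timeout: if a lock cannot be acquired within 3 s, the statement fails and the transaction rolls back, instead of leaving every other query queued behind the pending ALTER TABLE. The switch consists mostly of catalog changes, so it is normally very fast (ATTACH PARTITION may still scan a child to validate its rows when the existing constraints do not already prove the partition bound).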

begin;  
set local lock_timeout = '3s';  -- applies only to this transaction
alter table old_202012 no inherit old;
alter table old_202101 no inherit old;
alter table old_202102 no inherit old;
alter table old_202103 no inherit old;
alter table old_202104 no inherit old;
alter table old_202105 no inherit old;
alter table old rename to old_tmp;  
alter table new rename to old;  
alter table old ATTACH PARTITION old_202012 for values from ('2020-12-01'::timestamp) to ('2021-01-01'::timestamp);    
alter table old ATTACH PARTITION old_202101 for values from ('2021-01-01'::timestamp) to ('2021-02-01'::timestamp);    
alter table old ATTACH PARTITION old_202102 for values from ('2021-02-01'::timestamp) to ('2021-03-01'::timestamp);     
alter table old ATTACH PARTITION old_202103 for values from ('2021-03-01'::timestamp) to ('2021-04-01'::timestamp); 
alter table old ATTACH PARTITION old_202104 for values from ('2021-04-01'::timestamp) to ('2021-05-01'::timestamp); 
alter table old ATTACH PARTITION old_202105 for values from ('2021-05-01'::timestamp) to ('2021-06-01'::timestamp); 
end;
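The twelve hard-coded NO INHERIT / ATTACH PARTITION statements in the transaction above follow a fixed pattern. Assuming every child is named old_YYYYMM and covers exactly one calendar month (as in step 3), they could be generated from the catalog with a query like the sketch below, run before the switch while the children still inherit from old. The detach_sql lines go before the two renames and the attach_sql lines after them. This is a convenience sketch, not part of the original procedure.

```sql
-- Generate the detach/attach statements for every child table of old named old_YYYYMM.
-- substr(relname, 5) strips the 'old_' prefix, leaving YYYYMM.
select format('alter table %I no inherit old;', c.relname) as detach_sql,
       format('alter table old attach partition %I for values from (%L) to (%L);',
              c.relname,
              to_date(substr(c.relname, 5), 'YYYYMM')::timestamp,
              to_date(substr(c.relname, 5), 'YYYYMM')::timestamp + interval '1 month') as attach_sql
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
where i.inhparent = 'old'::regclass
order by c.relname;
```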

After the switch, the partitions look like this:

postgres=# \d+ old
                                           Partitioned table "public.old"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 id          | integer                     |           |          |         | plain    |              | 
 info        | text                        |           |          |         | extended |              | 
 create_time | timestamp without time zone |           |          |         | plain    |              | 
Partition key: RANGE (create_time)
Partitions: old_202012 FOR VALUES FROM ('2020-12-01 00:00:00') TO ('2021-01-01 00:00:00'),
            old_202101 FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2021-02-01 00:00:00'),
            old_202102 FOR VALUES FROM ('2021-02-01 00:00:00') TO ('2021-03-01 00:00:00'),
            old_202103 FOR VALUES FROM ('2021-03-01 00:00:00') TO ('2021-04-01 00:00:00'),
            old_202104 FOR VALUES FROM ('2021-04-01 00:00:00') TO ('2021-05-01 00:00:00'),
            old_202105 FOR VALUES FROM ('2021-05-01 00:00:00') TO ('2021-06-01 00:00:00')
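Two follow-ups may be worth checking after the switch. First, as the \d+ output shows, the partitioned old has no primary key: PostgreSQL only allows a unique constraint on a partitioned table if it includes the partition key, so id is now unique only within each child. Second, the former parent lives on as old_tmp and still carries tg1 and tg2. The sketch below confirms the table kind, lists the partitions with their bounds from the catalog and counts the rows left in old_tmp; the DROP is left commented out because the original article does not cover that step.

```sql
-- 'p' means old is now a natively partitioned table
select relkind from pg_class where oid = 'old'::regclass;

-- Partitions of old with their bounds
select c.relname as partition,
       pg_get_expr(c.relpartbound, c.oid) as bound
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
where i.inhparent = 'old'::regclass
order by c.relname;

-- The former parent should be empty after the migration
select count(*) from old_tmp;
-- drop table old_tmp;   -- only once you are sure nothing references it any more
```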

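Query tests. Without a create_time predicate, the primary-key index of every partition is probed; with a create_time range, partitions outside the range are pruned. BETWEEN is inclusive, so the upper bound '2021-03-01' also pulls in old_202103: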

postgres=# explain analyze select * from old where id=1;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.42..50.26 rows=6 width=40) (actual time=0.265..0.266 rows=0 loops=1)
   ->  Index Scan using old_202012_pkey on old_202012  (cost=0.42..8.44 rows=1 width=40) (actual time=0.077..0.077 rows=0 loops=1)
         Index Cond: (id = 1)
   ->  Index Scan using old_202101_pkey on old_202101  (cost=0.42..8.44 rows=1 width=40) (actual time=0.046..0.046 rows=0 loops=1)
         Index Cond: (id = 1)
   ->  Index Scan using old_202102_pkey on old_202102  (cost=0.42..8.44 rows=1 width=39) (actual time=0.052..0.052 rows=0 loops=1)
         Index Cond: (id = 1)
   ->  Index Scan using old_202103_pkey on old_202103  (cost=0.42..8.44 rows=1 width=40) (actual time=0.037..0.038 rows=0 loops=1)
         Index Cond: (id = 1)
   ->  Index Scan using old_202104_pkey on old_202104  (cost=0.29..8.30 rows=1 width=39) (actual time=0.036..0.036 rows=0 loops=1)
         Index Cond: (id = 1)
   ->  Index Scan using old_202105_pkey on old_202105  (cost=0.15..8.17 rows=1 width=44) (actual time=0.011..0.011 rows=0 loops=1)
         Index Cond: (id = 1)
 Planning Time: 2.582 ms
 Execution Time: 0.424 ms
(15 rows)

postgres=# explain analyze select * from old where id=1 and create_time between '2020-12-01'::timestamp and '2021-03-01'::timestamp;
                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.42..33.80 rows=4 width=40) (actual time=0.122..0.122 rows=0 loops=1)
   ->  Index Scan using old_202012_pkey on old_202012  (cost=0.42..8.44 rows=1 width=40) (actual time=0.038..0.039 rows=0 loops=1)
         Index Cond: (id = 1)
         Filter: ((create_time >= '2020-12-01 00:00:00'::timestamp without time zone) AND (create_time <= '2021-03-01 00:00:00'::timestamp without time zone))
   ->  Index Scan using old_202101_pkey on old_202101  (cost=0.42..8.45 rows=1 width=40) (actual time=0.040..0.040 rows=0 loops=1)
         Index Cond: (id = 1)
         Filter: ((create_time >= '2020-12-01 00:00:00'::timestamp without time zone) AND (create_time <= '2021-03-01 00:00:00'::timestamp without time zone))
   ->  Index Scan using old_202102_pkey on old_202102  (cost=0.42..8.45 rows=1 width=39) (actual time=0.018..0.019 rows=0 loops=1)
         Index Cond: (id = 1)
         Filter: ((create_time >= '2020-12-01 00:00:00'::timestamp without time zone) AND (create_time <= '2021-03-01 00:00:00'::timestamp without time zone))
   ->  Index Scan using old_202103_pkey on old_202103  (cost=0.42..8.45 rows=1 width=40) (actual time=0.022..0.022 rows=0 loops=1)
         Index Cond: (id = 1)
         Filter: ((create_time >= '2020-12-01 00:00:00'::timestamp without time zone) AND (create_time <= '2021-03-01 00:00:00'::timestamp without time zone))
 Planning Time: 0.773 ms
 Execution Time: 0.202 ms
(15 rows)
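For comparison, a half-open range that matches exactly one month should let the planner prune every other partition, so only old_202101 is expected in the plan (partition pruning is on by default, controlled by enable_partition_pruning). A sketch:

```sql
-- Only the partition covering January 2021 should be scanned
explain
select * from old
where create_time >= '2021-01-01'::timestamp
  and create_time <  '2021-02-01'::timestamp;
```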

Row count after the switch:

postgres=# select count(*) from old;
  count  
---------
 1162055
(1 row)    

 

 

References:

https://github.com/digoal/blog/blob/master/201901/20190131_01.md

