PostgreSQL: starting from a stuck pg_basebackup


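Background:

While rebuilding a standby in a production environment, I ran:

pg_basebackup -D -P -v --wal-method=stream

The size of the data directory did not grow at all, yet the pg_basebackup process was still alive: it had simply hung. That prompted me to check what pg_basebackup needs before it can start, and how the official documentation describes it:

At the beginning of the backup, a checkpoint needs to be written on the server the backup is taken from. Especially if the option --checkpoint=fast is not used, this can take some time, during which pg_basebackup will appear to be idle.

So the backup was most likely stuck in the checkpoint phase. The pg_basebackup source shows the same wait: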

/*
 * Start the actual backup
 */
PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);

if (maxrate > 0)
    maxrate_clause = psprintf("MAX_RATE %u", maxrate);

if (verbose)
    pg_log_info("initiating base backup, waiting for checkpoint to complete");

if (showprogress && !verbose)
{
    fprintf(stderr, "waiting for checkpoint");
    if (isatty(fileno(stderr)))
        fprintf(stderr, "\r");
    else
        fprintf(stderr, "\n");
}

Before the backup actually starts, pg_basebackup is "waiting for checkpoint".

So when is a checkpoint (ckp) triggered?

/*
 * RequestCheckpoint
 *      Called in backend processes to request a checkpoint
 *
 * flags is a bitwise OR of the following:
 *  CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
 *  CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
 *  CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
 *      ignoring checkpoint_completion_target parameter.
 *  CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
 *      since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
 *      CHECKPOINT_END_OF_RECOVERY).
 *  CHECKPOINT_WAIT: wait for completion before returning (otherwise,
 *      just signal checkpointer to do it, and return).
 *  CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling.
 *      (This affects logging, and in particular enables CheckPointWarning.)
 */

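  • When the database is shut down

  • When pg_basebackup starts a backup

  • When checkpoint_timeout is reached

  • When the WAL written since the last checkpoint approaches max_wal_size (checkpoint_completion_target only paces the writes, it does not trigger a checkpoint)

  • On a manual CHECKPOINT

A scheduled (spread) checkpoint is paced by these parameters. pg_basebackup waits until the checkpoint it depends on has completed, so for as long as that checkpoint has not finished, pg_basebackup looks stuck.

To get the backup going right away, I ran CHECKPOINT manually on the primary, and the data directory then started to grow.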
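One way to read this behaviour in terms of the flags quoted earlier is sketched below. It assumes, from my reading of the server source rather than anything quoted in this post, that a base backup requests its checkpoint with CHECKPOINT_FORCE and CHECKPOINT_WAIT, and adds CHECKPOINT_IMMEDIATE only when --checkpoint=fast is given. The class, the helper names, and the numeric flag values are illustrative and are not PostgreSQL's.

```python
from enum import IntFlag, auto


class CheckpointFlag(IntFlag):
    """Illustrative stand-ins for three CHECKPOINT_* request flags.

    The numeric values come from auto() and are NOT PostgreSQL's values.
    """
    IMMEDIATE = auto()  # finish ASAP, ignoring checkpoint_completion_target
    FORCE = auto()      # checkpoint even if no WAL activity has occurred
    WAIT = auto()       # caller blocks until the checkpoint completes


def backup_request(fast: bool) -> CheckpointFlag:
    """Flags a base backup is assumed to request (hypothetical helper)."""
    flags = CheckpointFlag.FORCE | CheckpointFlag.WAIT
    if fast:
        flags |= CheckpointFlag.IMMEDIATE
    return flags


def is_throttled(flags: CheckpointFlag) -> bool:
    """A request without IMMEDIATE is paced by checkpoint_completion_target."""
    return not (flags & CheckpointFlag.IMMEDIATE)


print(is_throttled(backup_request(fast=False)))  # True
print(is_throttled(backup_request(fast=True)))   # False
```

Under that assumption, the default (spread) request blocks on a throttled checkpoint, which would explain why a manual CHECKPOINT on the primary let the backup proceed.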

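What happens after a ckp:

  • Dirty pages are flushed to disk
  • WAL written before this checkpoint is no longer needed for crash recovery and can be removed or recycled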

Checkpoint-related parameters:

postgres=# select name,short_desc from pg_settings where name like '%checkpoint%';
             name             |                                        short_desc                                        
------------------------------+------------------------------------------------------------------------------------------
 checkpoint_completion_target | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
 checkpoint_flush_after       | Number of pages after which previously performed writes are flushed to disk.
 checkpoint_timeout           | Sets the maximum time between automatic WAL checkpoints.
 checkpoint_warning           | Enables warnings if checkpoint segments are filled more frequently than this.
 log_checkpoints              | Logs each checkpoint.
(5 rows)

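checkpoint_completion_target:

A checkpoint happens every 5 minutes (the default checkpoint_timeout) or whenever the max_wal_size threshold is reached. If all dirty pages in shared buffers were flushed to disk at once at that moment, the result would be a large burst of I/O.
checkpoint_completion_target comes to the rescue here.
It slows the flushing down: PostgreSQL aims to spend checkpoint_completion_target * checkpoint_timeout writing the data.
For example, with checkpoint_completion_target set to 0.5 and checkpoint_timeout at 5 minutes, the database throttles its writes so that the last write completes after about 2.5 minutes.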
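The arithmetic in that example can be checked with a few lines. This is only a sketch of the formula checkpoint_completion_target * checkpoint_timeout; the function name is made up for illustration, and the 300-second timeout is the 5-minute value used above.

```python
def checkpoint_write_window(timeout_s: float, completion_target: float) -> float:
    """Seconds over which a spread checkpoint aims to finish its writes."""
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("checkpoint_completion_target must be within 0..1")
    return timeout_s * completion_target


# 5-minute checkpoint_timeout, checkpoint_completion_target = 0.5
window = checkpoint_write_window(timeout_s=300, completion_target=0.5)
print(f"{window / 60:.1f} minutes")  # 2.5 minutes
```

The same window is roughly how long a spread checkpoint may take to complete, which is why a pg_basebackup that waits for one can sit idle for minutes.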

checkpoint_timeout:

The maximum time between automatic WAL checkpoints.

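checkpoint_flush_after:

While a checkpoint is running, whenever more than checkpoint_flush_after of data has been written, PostgreSQL tries to force the OS to flush those writes to the underlying storage. This limits the amount of dirty data in the kernel page cache and so reduces the chance of a stall when fsync is issued at the end of the checkpoint.
This setting may have no effect on some platforms.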

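What checkpoints are for:

They speed up crash recovery, since only WAL written after the last checkpoint has to be replayed, and they ease performance pressure on the server.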

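pg_basebackup options:

To start the backup without waiting for a spread checkpoint, use the -c, --checkpoint=fast|spread option with the value fast. A checkpoint is still performed, but it is requested as an immediate one instead of being paced.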
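As a hedged illustration, the command line could be assembled like this. The host, user, and target directory are placeholders invented for the example, not values from the incident, and the helper name is hypothetical; the flags themselves (-h, -U, -D, -P, -v, --wal-method, --checkpoint) are regular pg_basebackup options.

```python
import shlex


def basebackup_argv(target_dir: str, host: str, user: str,
                    fast: bool = True) -> list[str]:
    """Build a pg_basebackup argument list (hypothetical helper)."""
    argv = [
        "pg_basebackup",
        "-h", host,            # primary to copy from (placeholder)
        "-U", user,            # replication user (placeholder)
        "-D", target_dir,      # new standby data directory (placeholder)
        "-P", "-v",            # progress reporting and verbose output
        "--wal-method=stream",
    ]
    # fast asks for an immediate checkpoint; spread is the default behaviour.
    argv.append("--checkpoint=fast" if fast else "--checkpoint=spread")
    return argv


argv = basebackup_argv("/path/to/standby/data", "primary.example", "replicator")
print(shlex.join(argv))
```

The list form can be passed to subprocess.run as is. Note that an immediate checkpoint trades the wait for a burst of I/O on the primary, so spread may still be the better choice on a busy system.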

