Background:
In production, while rebuilding a standby, I ran:
pg_basebackup -D <target_dir> -P -v --wal-method=stream
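The size of the data directory was not growing, yet the pg_basebackup process was still alive: it had hung. That made me check what pg_basebackup needs before it can start, and how the official documentation describes it:
"At the beginning of the backup, a checkpoint needs to be written on the server the backup is taken from. Especially if the option --checkpoint=fast is not used, this can take some time, during which pg_basebackup will appear to be idle."
So it was probably stuck in the checkpoint phase. The pg_basebackup source agrees: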
    /*
     * Start the actual backup
     */
    PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);
    if (maxrate > 0)
        maxrate_clause = psprintf("MAX_RATE %u", maxrate);
    if (verbose)
        pg_log_info("initiating base backup, waiting for checkpoint to complete");
    if (showprogress && !verbose)
    {
        fprintf(stderr, "waiting for checkpoint");
        if (isatty(fileno(stderr)))
            fprintf(stderr, "\r");
        else
            fprintf(stderr, "\n");
    }
Before the backup starts, it waits for a checkpoint.
So when is a checkpoint (ckp) triggered?
/*
 * RequestCheckpoint
 *      Called in backend processes to request a checkpoint
 *
 * flags is a bitwise OR of the following:
 *  CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
 *  CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
 *  CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
 *      ignoring checkpoint_completion_target parameter.
 *  CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
 *      since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
 *      CHECKPOINT_END_OF_RECOVERY).
 *  CHECKPOINT_WAIT: wait for completion before returning (otherwise,
 *      just signal checkpointer to do it, and return).
 *  CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling.
 *      (This affects logging, and in particular enables CheckPointWarning.)
 */
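- When the database shuts down
- When pg_basebackup starts a base backup
- When checkpoint_timeout is reached
- When the WAL written since the last checkpoint approaches max_wal_size (checkpoint_completion_target only paces the writes; it does not trigger a checkpoint by itself)
- When CHECKPOINT is run manually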
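Scheduled checkpoints are driven and paced by these parameters. As long as the checkpoint that pg_basebackup is waiting for has not completed, pg_basebackup just sits there.
To get the backup started right away, I ran CHECKPOINT manually on the primary, and the data directory began to grow.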
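The flag comment above explains why the manual CHECKPOINT helped. Below is a minimal Python sketch of that reasoning, not PostgreSQL code. It assumes (from my reading of the server's backup start code, which is not quoted here) that a base backup requests FORCE | WAIT and adds IMMEDIATE only with --checkpoint=fast. The class name, helper names and numeric flag values are illustrative, not the real constants.

```python
from enum import IntFlag, auto


class CheckpointFlag(IntFlag):
    # Names mirror the RequestCheckpoint comment; the values are illustrative only.
    IS_SHUTDOWN = auto()
    END_OF_RECOVERY = auto()
    IMMEDIATE = auto()
    FORCE = auto()
    WAIT = auto()
    CAUSE_XLOG = auto()


def base_backup_flags(fast: bool) -> CheckpointFlag:
    """Flags a base backup is assumed to request (hypothetical helper)."""
    flags = CheckpointFlag.FORCE | CheckpointFlag.WAIT
    if fast:
        # --checkpoint=fast: finish ASAP, ignoring checkpoint_completion_target.
        flags |= CheckpointFlag.IMMEDIATE
    return flags


def is_throttled(flags: CheckpointFlag) -> bool:
    """Without IMMEDIATE the checkpoint is paced by checkpoint_completion_target."""
    return not (flags & CheckpointFlag.IMMEDIATE)


if __name__ == "__main__":
    for fast in (False, True):
        flags = base_backup_flags(fast)
        print(f"fast={fast}: waits={bool(flags & CheckpointFlag.WAIT)}, "
              f"throttled={is_throttled(flags)}")
```

In both modes the caller waits (WAIT is set); only the pacing differs. A manual CHECKPOINT is an immediate one, which would explain why it unblocked the waiting backup.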
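What happens at a checkpoint:
- Dirty data is flushed to disk
- Once it completes, WAL from before this checkpoint is no longer needed for crash recovery and can be cleaned up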
Checkpoint-related parameters:
postgres=# select name,short_desc from pg_settings where name like '%checkpoint%';
name | short_desc
------------------------------+------------------------------------------------------------------------------------------
checkpoint_completion_target | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
checkpoint_flush_after | Number of pages after which previously performed writes are flushed to disk.
checkpoint_timeout | Sets the maximum time between automatic WAL checkpoints.
checkpoint_warning | Enables warnings if checkpoint segments are filled more frequently than this.
log_checkpoints | Logs each checkpoint.
(5 rows)
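checkpoint_completion_target:
A checkpoint happens every checkpoint_timeout (5 minutes by default) or whenever the max_wal_size threshold is reached. At each checkpoint, all dirty pages in shared buffers have to be flushed to disk, which can cause a huge burst of I/O.
checkpoint_completion_target comes to the rescue here.
It slows the flushing down: PostgreSQL aims to spend checkpoint_completion_target * checkpoint_timeout writing the data.
For example, with checkpoint_completion_target = 0.5 and checkpoint_timeout = 5 minutes, the database throttles its writes so that the last one completes about 2.5 minutes after the checkpoint starts.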
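The arithmetic above can be written as a small Python helper. The function name is made up for illustration; it only multiplies the two settings and does not query a server.

```python
def checkpoint_write_window(checkpoint_timeout_s: float,
                            completion_target: float) -> float:
    """Seconds over which a spread checkpoint aims to write its dirty pages."""
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("checkpoint_completion_target must be between 0 and 1")
    return checkpoint_timeout_s * completion_target


if __name__ == "__main__":
    # checkpoint_timeout = 5 min (300 s), checkpoint_completion_target = 0.5
    window = checkpoint_write_window(300, 0.5)
    print(f"writes are spread over about {window / 60:.1f} minutes")
```

A larger target spreads the same writes over a longer window. This is also why a pg_basebackup without --checkpoint=fast can appear idle for minutes.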
checkpoint_timeout:
The maximum time between automatic WAL checkpoints.
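checkpoint_flush_after:
Whenever more than checkpoint_flush_after of data has been written while performing a checkpoint, PostgreSQL tries to force the OS to issue these writes to the underlying storage. This limits the amount of dirty data in the kernel page cache and so reduces the likelihood of stalls when fsync is issued at the end of the checkpoint.
This setting may have no effect on some platforms.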
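As a rough illustration of the idea, the following Python sketch counts how often a flush hint would be issued while a checkpoint writes pages. It is a simplified model under assumed semantics (one counter, reset after each hint), not PostgreSQL's actual implementation, and the function name is hypothetical.

```python
def simulate_checkpoint_writes(total_pages: int, flush_after_pages: int):
    """Return (flush_hints_issued, pages_still_pending_in_os_cache)."""
    pending = 0       # pages written but not yet pushed toward storage
    flush_hints = 0   # how many times the OS was asked to start writeback
    for _ in range(total_pages):
        pending += 1
        # flush_after_pages == 0 models the setting being disabled.
        if flush_after_pages > 0 and pending >= flush_after_pages:
            flush_hints += 1
            pending = 0
    return flush_hints, pending


if __name__ == "__main__":
    print(simulate_checkpoint_writes(100, 32))
    print(simulate_checkpoint_writes(100, 0))
```

In this model, with the setting enabled the pages left for the final fsync stay bounded by the threshold. With it disabled, everything piles up until the end.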
What checkpoints are for:
They shorten crash recovery and ease the performance pressure on the server.
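pg_basebackup option:
If you do not want to wait for a spread checkpoint before the backup starts, pass --checkpoint=fast, which asks the server for an immediate checkpoint (spread is the default):
-c, --checkpoint=fast|spread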
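Putting it together, here is a small Python sketch that builds the corrected command line with --checkpoint=fast. The helper name and the target directory are placeholders. Connection options such as host and user are left out and would be needed in a real setup.

```python
import shlex


def build_basebackup_cmd(target_dir: str, checkpoint_mode: str = "fast") -> list:
    """Assemble a pg_basebackup argument list (hypothetical helper)."""
    if checkpoint_mode not in ("fast", "spread"):
        raise ValueError("checkpoint_mode must be 'fast' or 'spread'")
    return [
        "pg_basebackup",
        "-D", target_dir,            # target data directory (placeholder)
        "-P",                        # show progress
        "-v",                        # verbose
        "--wal-method=stream",       # stream WAL while the backup runs
        f"--checkpoint={checkpoint_mode}",
    ]


if __name__ == "__main__":
    cmd = build_basebackup_cmd("/path/to/standby/data")
    # Print only; pass cmd to subprocess.run(cmd, check=True) to actually run it.
    print(shlex.join(cmd))
```

Keeping the arguments in a list avoids quoting problems if the command is later handed to subprocess.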