PostgreSQL: Starting from a Stuck pg_basebackup


Background:

While rebuilding a standby in production, I ran:

pg_basebackup -D <data_dir> -P -v --wal-method=stream

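The data directory stopped growing, yet the pg_basebackup process was still alive: the backup had hung. So I looked at what pg_basebackup needs before it can start, and at what the official documentation says: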

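At the beginning of the backup, a checkpoint needs to be performed on the source server. This can take some time (especially if the option --checkpoint=fast is not used), during which pg_basebackup will appear to be idle.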

So the hang was probably in the checkpoint phase. The relevant code in pg_basebackup.c:

/*
 * Start the actual backup
 */
PQescapeStringConn(conn, escaped_label, label, sizeof(escaped_label), &i);

if (maxrate > 0)
    maxrate_clause = psprintf("MAX_RATE %u", maxrate);

if (verbose)
    pg_log_info("initiating base backup, waiting for checkpoint to complete");

if (showprogress && !verbose)
{
    fprintf(stderr, "waiting for checkpoint");
    if (isatty(fileno(stderr)))
        fprintf(stderr, "\r");
    else
        fprintf(stderr, "\n");
}

Before the backup starts, it prints "waiting for checkpoint" and waits for that checkpoint to complete.

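So when is a checkpoint triggered? The comment on RequestCheckpoint() in the checkpointer lists the flags a caller can pass: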

/*
 * RequestCheckpoint
 *      Called in backend processes to request a checkpoint
 *
 * flags is a bitwise OR of the following:
 *  CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
 *  CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
 *  CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
 *      ignoring checkpoint_completion_target parameter.
 *  CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
 *      since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
 *      CHECKPOINT_END_OF_RECOVERY).
 *  CHECKPOINT_WAIT: wait for completion before returning (otherwise,
 *      just signal checkpointer to do it, and return).
 *  CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling.
 *      (This affects logging, and in particular enables CheckPointWarning.)
 */
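The flags are combined with bitwise OR. Below is a minimal sketch of how a base backup's request differs between spread and fast mode. The numeric values are illustrative copies from xlog.h in recent releases and may differ in your version, and base_backup_checkpoint_flags() and request_is_paced() are hypothetical helpers. The combination itself mirrors the RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | (fast ? CHECKPOINT_IMMEDIATE : 0)) call made when a backup starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative copies of flag bits from PostgreSQL's xlog.h (recent
 * releases). Names and values may change between versions. */
#define CHECKPOINT_IS_SHUTDOWN     0x0001
#define CHECKPOINT_END_OF_RECOVERY 0x0002
#define CHECKPOINT_IMMEDIATE       0x0004
#define CHECKPOINT_FORCE           0x0008
#define CHECKPOINT_WAIT            0x0020
#define CHECKPOINT_CAUSE_XLOG      0x0080

/* Hypothetical helper: the flags a base backup passes when it starts.
 * spread (default): forced and waited for, but paced by
 *                   checkpoint_completion_target.
 * fast:             additionally IMMEDIATE, so writes are not paced. */
int base_backup_checkpoint_flags(bool fast)
{
    int flags = CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                (fast ? CHECKPOINT_IMMEDIATE : 0);
    assert(flags & CHECKPOINT_WAIT);   /* a backup always waits for it */
    return flags;
}

/* Hypothetical helper: may the checkpointer sleep between writes
 * (i.e. pace the checkpoint) for this request? */
bool request_is_paced(int flags)
{
    return (flags & CHECKPOINT_IMMEDIATE) == 0;
}
```

With the values above, spread mode yields 0x28 and fast mode 0x2C.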

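  • When the database shuts down

  • When pg_basebackup starts a backup

  • When checkpoint_timeout has elapsed since the last checkpoint

  • When the WAL written since the last checkpoint reaches a threshold derived from max_wal_size and checkpoint_completion_target

  • When CHECKPOINT is run manually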
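The two automatic triggers in this list can be sketched in C. This is a simplified model, not PostgreSQL's code: checkpoint_segments() follows the formula in CalculateCheckpointSegments() in xlog.c for PostgreSQL 11 and later (max_wal_size divided by 1 + checkpoint_completion_target, rounded down), which is why checkpoint_completion_target appears next to max_wal_size. checkpoint_due() is a hypothetical helper; the real WAL check works on segment numbers with a slightly different boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* Approximate WAL distance (in segments) that triggers a checkpoint,
 * following CalculateCheckpointSegments() in xlog.c (PostgreSQL 11+):
 * max_wal_size / (1 + checkpoint_completion_target), rounded down, min 1. */
int checkpoint_segments(int max_wal_size_mb, int wal_segment_mb,
                        double completion_target)
{
    assert(max_wal_size_mb > 0 && wal_segment_mb > 0);
    double total_segs = (double) (max_wal_size_mb / wal_segment_mb);
    int segs = (int) (total_segs / (1.0 + completion_target)); /* round down */
    return segs < 1 ? 1 : segs;
}

/* Hypothetical helper: is an automatic (time- or WAL-triggered)
 * checkpoint due? */
bool checkpoint_due(int elapsed_s, int checkpoint_timeout_s,
                    int segs_since_last, int threshold_segs)
{
    return elapsed_s >= checkpoint_timeout_s
        || segs_since_last >= threshold_segs;
}
```

With the defaults of max_wal_size = 1GB, 16 MB segments and checkpoint_completion_target = 0.9, checkpoint_segments(1024, 16, 0.9) gives 33 segments, i.e. a WAL-triggered checkpoint after roughly 528 MB of WAL.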

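Scheduled checkpoints are paced by the parameters below. By default (--checkpoint=spread), pg_basebackup requests a checkpoint and waits for it to finish; that checkpoint is throttled, so until it completes pg_basebackup appears stuck.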

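To start the backup right away, I ran CHECKPOINT manually on the primary, and the data directory immediately began to grow. A manual CHECKPOINT is an immediate request, so the checkpointer stops throttling the checkpoint it is running.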

What a checkpoint does:

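  • Flushes dirty pages to disk
  • Once it completes, WAL older than this checkpoint's redo point is no longer needed for crash recovery and can be removed or recycled (unless replication slots, wal_keep_size, or archiving still need it)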
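To make the WAL-cleanup point concrete, here is a sketch that maps a checkpoint's redo LSN to the WAL segment file containing it. wal_file_name() follows the layout of PostgreSQL's XLogFileName macro (timeline, then the segment number split into two 8-digit hex parts) and is_wal_file_name() mirrors the IsXLogFileName check. segment_removable() is a hypothetical simplification that ignores replication slots, wal_keep_size and archiving.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Segment number that contains a WAL position (LSN). */
uint64_t lsn_to_segno(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    return lsn / seg_size;
}

/* Format a WAL file name like PostgreSQL's XLogFileName macro:
 * 8 hex digits of timeline, then the segment number split into
 * "xlogid" (high part) and segment within it (low part). */
void wal_file_name(char *buf, size_t len, uint32_t tli,
                   uint64_t segno, uint64_t seg_size)
{
    assert(len >= 25);                      /* 24 hex digits + NUL */
    assert(seg_size > 0 && UINT64_C(0x100000000) % seg_size == 0);
    uint64_t segs_per_xlogid = UINT64_C(0x100000000) / seg_size;
    snprintf(buf, len, "%08X%08X%08X", (unsigned int) tli,
             (unsigned int) (segno / segs_per_xlogid),
             (unsigned int) (segno % segs_per_xlogid));
}

/* Same test as PostgreSQL's IsXLogFileName: exactly 24 upper-case hex
 * digits (so .history and .backup files do not match). */
bool is_wal_file_name(const char *name)
{
    return strlen(name) == 24 && strspn(name, "0123456789ABCDEF") == 24;
}

/* Hypothetical simplification: a segment older than the one holding the
 * checkpoint's redo point is not needed for crash recovery. Replication
 * slots, wal_keep_size and WAL archiving can still retain it. */
bool segment_removable(uint64_t segno, uint64_t redo_lsn, uint64_t seg_size)
{
    return segno < lsn_to_segno(redo_lsn, seg_size);
}
```

For a redo point at 0/3000028 on timeline 1 with 16 MB segments, the redo segment is 000000010000000000000003, so 000000010000000000000002 and older are no longer needed for crash recovery.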

Checkpoint-related parameters:

postgres=# select name,short_desc from pg_settings where name like '%checkpoint%' 
;
             name             |                                        short_desc                                        
------------------------------+------------------------------------------------------------------------------------------
 checkpoint_completion_target | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
 checkpoint_flush_after       | Number of pages after which previously performed writes are flushed to disk.
 checkpoint_timeout           | Sets the maximum time between automatic WAL checkpoints.
 checkpoint_warning           | Enables warnings if checkpoint segments are filled more frequently than this.
 log_checkpoints              | Logs each checkpoint.
(5 rows)

checkpoint_completion_target:

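Because a checkpoint occurs every 5 minutes by default (checkpoint_timeout) or whenever the max_wal_size threshold is reached, every dirty page in shared buffers has to be flushed to disk during the checkpoint, which can cause a large burst of I/O.
checkpoint_completion_target comes to the rescue here.
It slows the flushing down: PostgreSQL aims to spend about checkpoint_completion_target × checkpoint_timeout writing the data out.
For example, with checkpoint_completion_target = 0.5 and the 5-minute default timeout, the database throttles its writes so that the last write finishes about 2.5 minutes after the checkpoint starts. (The default was 0.5 up to PostgreSQL 13 and is 0.9 since PostgreSQL 14.)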
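The arithmetic can be written out as a small sketch. checkpoint_write_budget_s() is just checkpoint_completion_target × checkpoint_timeout; checkpoint_on_schedule() is a hypothetical, time-only version of the pacing test the checkpointer applies between writes (the real IsCheckpointOnSchedule() also compares WAL consumption).

```c
#include <assert.h>
#include <stdbool.h>

/* Target length of the write phase of a time-triggered checkpoint:
 * checkpoint_completion_target * checkpoint_timeout, in seconds. */
double checkpoint_write_budget_s(double completion_target, int checkpoint_timeout_s)
{
    assert(completion_target >= 0.0 && completion_target <= 1.0);
    assert(checkpoint_timeout_s > 0);
    return completion_target * checkpoint_timeout_s;
}

/* Hypothetical, time-only pacing test: the checkpoint is on schedule when
 * the fraction of dirty buffers already written is at least the fraction
 * of the write budget already used. While on schedule the checkpointer
 * sleeps between writes; once behind, it writes without sleeping. */
bool checkpoint_on_schedule(double fraction_written, double elapsed_s, double budget_s)
{
    assert(budget_s > 0.0);
    return fraction_written >= elapsed_s / budget_s;
}
```

checkpoint_write_budget_s(0.5, 300) returns 150.0 seconds, the 2.5 minutes from the example. If half the dirty buffers are written after 60 s, 0.5 ≥ 60/150, so the checkpoint is ahead of schedule and would sleep.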

checkpoint_timeout:

The maximum time between automatic WAL checkpoints (default 5min).

checkpoint_flush_after:

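During a checkpoint, whenever more than checkpoint_flush_after bytes have been written, PostgreSQL tries to force the OS to issue these writes to the underlying storage. This limits the amount of dirty data in the kernel page cache, reducing the likelihood of stalls when fsync is issued at the end of the checkpoint. The value is counted in pages; on Linux the default is 256kB.
This setting may have no effect on some platforms.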
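The flush-after behaviour can be modelled as a page counter. This is an assumption-laden sketch: struct flush_tracker and note_page_written() are hypothetical names, the setting is counted in pages (8 kB blocks by default), and the real implementation sorts and merges pending writeback requests before handing them to the kernel (via sync_file_range() on Linux where available).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of checkpoint_flush_after. The setting counts pages
 * (8 kB blocks by default); 0 disables forced writeback. */
struct flush_tracker {
    int flush_after_pages;
    int pending_pages;
};

/* Record one page written by the checkpoint. Returns true when the caller
 * should ask the kernel to start writing the pending pages to storage
 * instead of leaving them dirty in the page cache until the final fsync. */
bool note_page_written(struct flush_tracker *t)
{
    assert(t != NULL);
    if (t->flush_after_pages <= 0)
        return false;                 /* forced writeback disabled */
    if (++t->pending_pages >= t->flush_after_pages) {
        t->pending_pages = 0;         /* this batch has been handed off */
        return true;
    }
    return false;
}
```

With the Linux default of 256kB, i.e. 32 pages, 100 page writes produce three writeback requests and leave 4 pages pending.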

Why checkpoints matter:

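They speed up crash recovery, since replay only has to start from the last checkpoint, and, when spread out, they ease the I/O pressure on the server.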

pg_basebackup options:

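To avoid waiting for a throttled checkpoint, pass -c fast (--checkpoint=fast). The option takes fast or spread: the default, spread, waits for a paced checkpoint, while fast asks the server for an immediate one. A checkpoint is still performed either way.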
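The -c value is validated as either fast or spread. The sketch below is a hypothetical parse_checkpoint_mode(), modelled on pg_basebackup's own case-insensitive check; it uses POSIX strcasecmp() from <strings.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical parser for the -c/--checkpoint value, modelled on
 * pg_basebackup's case-insensitive check.
 * Returns 1 for "fast" (immediate checkpoint), 0 for "spread"
 * (the default, paced checkpoint) and -1 for anything else. */
int parse_checkpoint_mode(const char *arg)
{
    assert(arg != NULL);
    if (strcasecmp(arg, "fast") == 0)
        return 1;
    if (strcasecmp(arg, "spread") == 0)
        return 0;
    return -1;
}
```

A result of 1 corresponds to adding CHECKPOINT_IMMEDIATE to the checkpoint request; 0 leaves the checkpoint paced. Any other value makes pg_basebackup exit with an error.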

