PostgreSQL point-in-time recovery (PITR)


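Mistakes happen fairly often when working with PostgreSQL, and most of them can be repaired. When a mistake really cannot be undone, though, the only hope is that a backup exists.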

 

The recovery procedure below also assumes that a backup exists. I have not yet worked out what to do when there is none.

 

1. First, configure WAL archiving on the database

1) Create the archive directory

mkdir -p /var/lib/pgsql/pg10/archive/

 

2) Edit postgresql.conf

wal_level = replica
archive_mode = on
archive_command = 'test ! -f /var/lib/pgsql/pg10/archive/%f && cp %p /var/lib/pgsql/pg10/archive/%f'
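The one-line archive_command can also be moved into a small script, which makes the "never overwrite an existing segment" check easier to read and to test. The following is a minimal sketch: the helper name archive_wal is hypothetical and not part of PostgreSQL, and the copy-then-rename step is an addition to what the one-liner does.

```shell
#!/bin/sh
# archive_wal SRC DEST_DIR NAME
# Hypothetical helper: copies one WAL segment into the archive directory
# and refuses to overwrite a file that is already there.
archive_wal() {
    src=$1
    dest_dir=$2
    name=$3
    # A non-zero status tells PostgreSQL that archiving failed, so it retries later.
    if [ -f "$dest_dir/$name" ]; then
        echo "archive_wal: $dest_dir/$name already exists" >&2
        return 1
    fi
    # Copy under a temporary name first so a half-written file never looks complete.
    cp "$src" "$dest_dir/$name.tmp" && mv "$dest_dir/$name.tmp" "$dest_dir/$name"
}
```

If this function were wrapped in a script that calls archive_wal "$1" /var/lib/pgsql/pg10/archive "$2", archive_command would pass %p and %f to that script as its two arguments.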

3) Restart the database

pg_ctl restart

 

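2. Take a full backup of the database. Since this is only a test, a plain copy of the data directory is used here. Note that a plain copy of a running server's data directory is not guaranteed to be consistent.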

cp -r $PGDATA ~/pg10/full_back
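A more robust alternative to the plain copy is pg_basebackup, which takes a consistent base backup from a running server. The sketch below is only an illustration: the wrapper name take_base_backup is hypothetical, and it assumes the connecting role is allowed to make replication connections and that the target directory is empty or does not exist yet.

```shell
#!/bin/sh
# take_base_backup TARGET_DIR
# Hypothetical wrapper around pg_basebackup.
take_base_backup() {
    target=$1
    # -D  target directory for the backup
    # -Fp plain file layout, like a copy of the data directory
    # -X stream  also stream the WAL generated while the backup runs
    # -P  show progress
    pg_basebackup -D "$target" -Fp -X stream -P
}
```

It would be called as take_base_backup ~/pg10/full_back, with connection settings taken from the usual PGHOST, PGPORT and PGUSER environment variables.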

 

3. Make some changes to the database and record the transaction ID and timestamp at each step

select txid_current();
 txid_current
--------------
          557
(1 row)

              now
-------------------------------
 2018-09-03 18:07:14.288787+08
(1 row)

delete first 100 tuple of run_command
DELETE 99
 count
-------
 99901
(1 row)

select txid_current();
 txid_current
--------------
          559
(1 row)

              now
-------------------------------
 2018-09-03 18:07:14.500745+08
(1 row)

delete last 100 tuple of run_command
DELETE 100
 count
-------
 99801
(1 row)

select txid_current();
 txid_current
--------------
          561
(1 row)

              now
-------------------------------
 2018-09-03 18:07:14.571154+08
(1 row)
checkpoint
CHECKPOINT
pg_switch_wal
 pg_switch_wal
---------------
 0/3005FA0
(1 row)

checkpoint
CHECKPOINT
pg_switch_wal
 pg_switch_wal
---------------
 0/40000E8
(1 row)
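The markers recorded above (transaction ID and timestamp) can be collected in one place, together with a named restore point for later use with recovery_target_name. This is a minimal sketch, not the script that produced the output above: the helper name marker_sql is hypothetical, and the label is assumed to be a plain identifier without quote characters.

```shell
#!/bin/sh
# marker_sql LABEL
# Hypothetical helper: prints SQL that records the current transaction ID
# and timestamp and creates a named restore point.
marker_sql() {
    label=$1
    cat <<EOF
SELECT txid_current(), now();
SELECT pg_create_restore_point('$label');
EOF
}
```

The output is meant to be piped into psql, for example marker_sql before_delete | psql -X -At, run as a role that is allowed to call pg_create_restore_point().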

 

 

4. Set up the recovery.conf file

restore_command = 'cp /var/lib/pgsql/pg10/archive/%f %p' 
recovery_target_xid = '557' 
recovery_target_inclusive = false    
recovery_target_timeline = 'latest'
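Because the same file is rewritten for each test run (XID, time, or named restore point), it can help to generate it. The sketch below is an assumption-laden illustration: write_recovery_conf is a hypothetical helper, and it adds recovery_target_action = 'promote', which is not in the file above, so that the server finishes recovery at the target instead of pausing there.

```shell
#!/bin/sh
# write_recovery_conf DATADIR ARCHIVE_DIR KIND VALUE
# Hypothetical helper: KIND is one of xid, time, name.
write_recovery_conf() {
    datadir=$1
    archive_dir=$2
    kind=$3
    value=$4
    case $kind in
        xid|time|name) ;;
        *) echo "unsupported target kind: $kind" >&2; return 1 ;;
    esac
    {
        echo "restore_command = 'cp $archive_dir/%f %p'"
        echo "recovery_target_$kind = '$value'"
        # recovery_target_inclusive only matters for xid and time targets.
        [ "$kind" = name ] || echo "recovery_target_inclusive = false"
        echo "recovery_target_timeline = 'latest'"
        # Assumption: promote at the target rather than the default pause.
        echo "recovery_target_action = 'promote'"
    } > "$datadir/recovery.conf"
}
```

For the XID target used here the call would be write_recovery_conf "$PGDATA" /var/lib/pgsql/pg10/archive xid 557.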

 

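5. I thought the recovery had succeeded, but the system turned out to be read-only and could not accept writes: recovery was paused! Details follow. Judging from its "starting point-in-time recovery to ..." line, the log below comes from a run with a time target rather than the XID target shown above: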

-bash-4.1$ cat log/postgresql-2018-09-03_181007.log
2018-09-03 18:10:07.160 CST [850] LOG:  database system was interrupted; last known up at 2018-09-03 18:07:12 CST
2018-09-03 18:10:07.160 CST [850] LOG:  creating missing WAL directory "pg_wal/archive_status"
cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory
2018-09-03 18:10:07.431 CST [850] LOG:  starting point-in-time recovery to 2018-09-03 18:07:14.500745+08
2018-09-03 18:10:07.448 CST [850] LOG:  restored log file "000000010000000000000002" from archive
2018-09-03 18:10:07.596 CST [850] LOG:  redo starts at 0/2000028
2018-09-03 18:10:07.613 CST [850] LOG:  consistent recovery state reached at 0/2003C30
2018-09-03 18:10:07.613 CST [848] LOG:  database system is ready to accept read only connections

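In other people's write-ups, recovery simply completes on a new timeline even though this history file is missing. They were all using versions earlier than 10, which may explain the difference. The missing 00000002.history is only the probe for a newer timeline triggered by recovery_target_timeline = 'latest' and is harmless by itself. The pause is more likely caused by recovery_target_action, which defaults to pause (see section 8) and takes effect when hot standby is enabled, as it is by default from PostgreSQL 10.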

 

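6. After several rounds of analysis I still found no "00000002.history" file in pg_wal under the data directory either, so I tried resuming WAL replay, and recovery finally completed: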

postgres=# select pg_wal_replay_resume();
 pg_wal_replay_resume
----------------------

(1 row)

postgres=# select pg_wal_replay_resume();
ERROR:  recovery is not in progress
HINT:  Recovery control functions can only be executed during recovery

postgres=# select count(*) from run_command ;
 count
-------
 99901
(1 row)

postgres=# insert into run_command values (1, 'test new');
INSERT 0 1
postgres=# \q

 

The log after running pg_wal_replay_resume():

-bash-4.1$ cat log/postgresql-2018-09-03_181007.log
2018-09-03 18:10:07.160 CST [850] LOG:  database system was interrupted; last known up at 2018-09-03 18:07:12 CST
2018-09-03 18:10:07.160 CST [850] LOG:  creating missing WAL directory "pg_wal/archive_status"
cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory
2018-09-03 18:10:07.431 CST [850] LOG:  starting point-in-time recovery to 2018-09-03 18:07:14.500745+08
2018-09-03 18:10:07.448 CST [850] LOG:  restored log file "000000010000000000000002" from archive
2018-09-03 18:10:07.596 CST [850] LOG:  redo starts at 0/2000028
2018-09-03 18:10:07.613 CST [850] LOG:  consistent recovery state reached at 0/2003C30
2018-09-03 18:10:07.613 CST [848] LOG:  database system is ready to accept read only connections
2018-09-03 18:10:07.646 CST [850] LOG:  restored log file "000000010000000000000003" from archive
2018-09-03 18:10:07.779 CST [866] LOG:  duration: 28.273 ms  statement: select count(*) from run_command
2018-09-03 18:10:07.797 CST [868] ERROR:  cannot execute INSERT in a read-only transaction
2018-09-03 18:10:07.797 CST [868] STATEMENT:  insert into run_command values(1, 'test new')
2018-09-03 18:10:07.804 CST [850] LOG:  recovery stopping before commit of transaction 560, time 2018-09-03 18:07:14.52735+08
2018-09-03 18:10:07.804 CST [850] LOG:  recovery has paused
2018-09-03 18:10:07.804 CST [850] HINT:  Execute pg_wal_replay_resume() to continue.
2018-09-03 18:10:21.263 CST [870] LOG:  duration: 0.697 ms  statement: select pg_wal_replay_resume();
2018-09-03 18:10:21.818 CST [850] LOG:  redo done at 0/3005E90
2018-09-03 18:10:21.818 CST [850] LOG:  last completed transaction was at log time 2018-09-03 18:07:14.496615+08
cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory
2018-09-03 18:10:21.886 CST [850] LOG:  selected new timeline ID: 2
cp: cannot stat `/var/lib/pgsql/pg10/archive/00000001.history': No such file or directory
2018-09-03 18:10:22.145 CST [850] LOG:  archive recovery complete
2018-09-03 18:10:22.476 CST [848] LOG:  database system is ready to accept connections
2018-09-03 18:10:22.775 CST [870] ERROR:  recovery is not in progress
2018-09-03 18:10:22.775 CST [870] HINT:  Recovery control functions can only be executed during recovery.
2018-09-03 18:10:22.775 CST [870] STATEMENT:  select pg_wal_replay_resume();

 

7. I then also tried recovering to a timestamp and to a named restore point; both worked.

 

8. Appendix: recovery.conf settings

During recovery, the individual recovery parameters are specified in the recovery.conf file, as follows:

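Archive recovery settings
restore_command: the command used to fetch an archived WAL segment file
archive_cleanup_command: the command used to remove archived WAL files that are no longer needed
recovery_end_command: a command to run once archive recovery has finished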

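Recovery target settings (by default, recovery proceeds to the end of the WAL)
recovery_target = 'immediate': end recovery as soon as a consistent state is reached; when restoring from an online backup, this is the point at which the backup ended
recovery_target_name (string): the named restore point (created with pg_create_restore_point()) to recover to
recovery_target_time (timestamp): the timestamp to recover to
recovery_target_xid (string): the transaction ID to recover to
recovery_target_inclusive (boolean): whether to stop just after the recovery target (true) or just before it (false); applies when recovery_target_time or recovery_target_xid is specified, and controls whether transactions with exactly the target commit time or ID are included in the recovery; the default is true
recovery_target_timeline (string): recover into a particular timeline
recovery_target_action (enum): the action the server takes as soon as the recovery target is reached: pause (pause recovery), promote (finish recovery and accept connections), or shutdown (stop the server); pause is the default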

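Standby server settings
standby_mode (boolean): on means the server runs as a standby; otherwise it does not
primary_conninfo (string): the connection string the standby uses to connect to the primary
primary_slot_name (string): the replication slot on the primary to use when replicating via streaming replication; has no effect if primary_conninfo is not set
trigger_file (string): a trigger file whose presence ends recovery on the standby, promoting it to a standalone primary
recovery_min_apply_delay (integer): delays recovery by a fixed amount of time; milliseconds are assumed if no unit is given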
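
If recovery_target_xid, recovery_target_name and recovery_target_time are all specified in recovery.conf, PostgreSQL is described as choosing the final recovery target by the priority RECOVERY_TARGET_XID > RECOVERY_TARGET_NAME > RECOVERY_TARGET_TIME. This appears to reflect older releases: the PostgreSQL 10 documentation says that at most one of these should be set and that the last entry in the file is used if several are given, so it is safest to specify only one.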

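If recovery_target_timeline is set to latest in recovery.conf, the newest timeline is located starting from the current TimeLineID:

Look for the timeline history file "XXX.history" of the current TimeLineID; if it exists, continue, otherwise exit with an error.
TimeLineIDs grow linearly, so increment the current TimeLineID by 1 and check whether a timeline history file exists for it, repeating until no such file is found; the last ID that had one is the newest timeline.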
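The upward probing in the second step can be sketched as a short loop. This is a simplified illustration rather than PostgreSQL's actual code: latest_timeline is a hypothetical helper, it looks only in one directory, and it skips the existence check on the starting timeline from the first step. History files are named after the timeline ID written as 8 hexadecimal digits, which is why the log above shows probes for 00000002.history.

```shell
#!/bin/sh
# latest_timeline DIR START_TLI
# Hypothetical helper: prints the newest timeline ID reachable from
# START_TLI by probing for consecutive history files in DIR.
latest_timeline() {
    dir=$1
    tli=$2
    while :; do
        next=$((tli + 1))
        # Stop at the first timeline ID that has no history file.
        if [ -f "$dir/$(printf '%08X' "$next").history" ]; then
            tli=$next
        else
            break
        fi
    done
    echo "$tli"
}
```

With an archive that contains no history files at all, as in the test above, the probe for timeline 2 fails immediately and the function reports the starting timeline.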

 

 

Next, I plan to look into how deleted data can be recovered when there is no backup.

