2021-09-10 17:22:42.417183T @ startup 00000 [2021-09-10 17:22:42 CST] 0 [9298] LOCATION: StartupXLOG, xlog.c:6347
2021-09-10 17:22:42.417206T @ startup XX000 [2021-09-10 17:22:42 CST] 0 [9298] FATAL: XX000: required WAL directory "pg_wal" does not exist
2021-09-10 17:22:42.417206T @ startup XX000 [2021-09-10 17:22:42 CST] 0 [9298] LOCATION: ValidateXLOGDirectoryStructure, xlog.c:4262
2021-09-10 17:22:42.417407T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG: 00000: startup process (PID 9298) exited with exit code 1
2021-09-10 17:22:42.417407T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION: LogChildExit, postmaster.c:3714
2021-09-10 17:22:42.417417T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG: 00000: aborting startup due to startup process failure
2021-09-10 17:22:42.417417T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION: reaper, postmaster.c:2969
2021-09-10 17:22:42.427171T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG: 00000: database system is shut down
2021-09-10 17:22:42.427171T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION: UnlinkLockFiles, miscinit.c:928
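Running pg_resetwal -f PGDATA reinitializes the WAL files, but the transaction log is lost and the data can end up inconsistent, because changes not yet covered by a completed full checkpoint may be lost; in extreme cases some data blocks are lost. After the reset, the WAL directory looks like this: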
[zjh@lightdb1 pgsql13.2]$ cd data/pg_wal/
[zjh@lightdb1 pg_wal]$ ll
total 1048576
-rw------- 1 zjh zjh 1073741824 Sep 10 21:44 00000001000000BB00000001
drwx------ 2 zjh zjh 6 Sep 10 21:42 archive_status
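To read a segment name such as the one in the listing, it can help to split it into its three 8-hex-digit fields. The sketch below is only an illustration: the function name decode_wal_name is hypothetical, and the labels "log" and "seg" are informal names for the high and low parts of the segment number.

```shell
# Hypothetical helper: split a 24-hex-digit WAL segment file name into
# its three 8-digit fields and print them as decimal numbers.
decode_wal_name() {
    name=$1
    tli=$(printf '%s' "$name" | cut -c1-8)    # timeline ID
    log=$(printf '%s' "$name" | cut -c9-16)   # high part of the segment number
    seg=$(printf '%s' "$name" | cut -c17-24)  # low part of the segment number
    printf 'timeline=%d log=%d seg=%d\n' "0x$tli" "0x$log" "0x$seg"
}
```

Calling decode_wal_name 00000001000000BB00000001 prints timeline=1 log=187 seg=1.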
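Then start PostgreSQL, take a backup, and rebuild the instance.
To see roughly how much data was lost, check the latest checkpoint fields in the pg_controldata output.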
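As a minimal sketch of reading that information in a script, the function below pulls the "Latest checkpoint's REDO WAL file" value out of pg_controldata output supplied on standard input. The function name latest_redo_wal_file is hypothetical, and the sketch assumes the English field labels that pg_controldata prints by default.

```shell
# Hypothetical helper: read pg_controldata output on stdin and print the
# file name from the "Latest checkpoint's REDO WAL file" line.
# Assumes English (untranslated) field labels.
latest_redo_wal_file() {
    awk -F': *' '/REDO WAL file/ { print $2; exit }'
}

# Example (assumes PGDATA points at the data directory):
#   pg_controldata "$PGDATA" | latest_redo_wal_file
```

Reading from standard input keeps the parsing separate from the pg_controldata call, so the same function also works on saved output.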
If wal_size is set fairly large and you want to delete historical WAL, you can use pg_archivecleanup to remove the WAL files that precede the latest checkpoint, as follows:
pg_archivecleanup /data1/zjh/coordinator/pg_wal/ 000000010000000900000023
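This removes the WAL files that precede 000000010000000900000023.
Indeed, the files with smaller names are gone, but the problem is that the earlier logs have still not been deleted.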
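To check in advance which files count as "earlier", pg_archivecleanup has a -n dry-run option that only prints the names it would remove. The sketch below approximates that selection in plain shell so it can be tried on any directory. It assumes, as I understand the tool's behaviour, that names are compared as strings with the leading 8-digit timeline field ignored; the function name list_wal_before is hypothetical, and .partial or .backup files are not handled.

```shell
# Hypothetical helper: list WAL segment files in directory $1 whose names
# sort before the segment $2. The first 8 characters (timeline ID) are
# skipped in the comparison, which is an assumption about how
# pg_archivecleanup decides what is "earlier".
list_wal_before() {
    ls "$1" | LC_ALL=C awk -v keep="$2" '
        # Keep only plain 24-character upper-case hex names.
        length($0) == 24 && $0 !~ /[^0-9A-F]/ {
            if (substr($0, 9) < substr(keep, 9)) print
        }
    '
}

# Example:
#   list_wal_before /data1/zjh/coordinator/pg_wal 000000010000000900000023
```

The function only prints names and deletes nothing, so it is safe to run before the real cleanup.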