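Problem description

When using a PostgreSQL database configured with a hot standby, running a large number of transactions, and in particular a single insert transaction that inserts tens of millions of rows (typically by repeatedly running insert into t select * from t;), produces the following error in the csv log (here written by the standby's WAL receiver):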

2015-07-01 13:25:29.430 CST,,,27738,,51d112c8.6c5a,1,,2015-07-01 13:25:28 CST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,"libpqrcv_connect, libpqwalreceiver.c:171",""
2015-07-01 13:25:29.430 CST,,,27738,,51d112c8.6c5a,2,,2015-07-01 13:25:28 CST,,0,FATAL,XX000,"could not receive data from WAL stream:FATAL:  requested WAL segment 0000000800002A0000000000 has already been removed
",,,,,,,,"libpqrcv_receive, libpqwalreceiver.c:389",""

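Problem analysis

Judging from the error message, the likely cause is that the large transaction on the primary generated xlog (WAL) much faster than the standby could receive it. Streaming replication ships xlog continuously as it is written rather than waiting for commit, but by default nothing forces the primary to keep segments that the standby has not yet fetched.

Because the transaction ran longer than the default checkpoint interval, checkpoints on the primary removed or recycled old xlog segments before they had been sent to the standby.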
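
To see exactly which part of the xlog the standby asked for, the segment name in the error message can be decoded. The following is a minimal Python sketch, assuming the default 16 MB segment size and the file naming used since PostgreSQL 9.3 (8 hex digits each for timeline, log id and segment id); parse_wal_segment_name is a hypothetical helper, not a PostgreSQL API.

```python
def parse_wal_segment_name(name, segment_size=16 * 1024 * 1024):
    """Split a 24-hex-digit WAL segment file name into its parts.

    Assumes the default 16 MB segment size unless told otherwise.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hexadecimal digits")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_id = int(name[16:24], 16)
    # Each log id covers 4 GB, so it holds 2**32 / segment_size segments.
    segs_per_log = 0x100000000 // segment_size
    start = (log_id * segs_per_log + seg_id) * segment_size
    # LSNs are conventionally printed as two hex halves: high/low 32 bits.
    start_lsn = "%X/%X" % (start >> 32, start & 0xFFFFFFFF)
    return {"timeline": timeline, "log_id": log_id,
            "seg_id": seg_id, "start_lsn": start_lsn}


info = parse_wal_segment_name("0000000800002A0000000000")
print(info)
```

For the segment in the log above this gives timeline 8 and a starting position of 2A00/0, which can be compared with the oldest file still present in the primary's pg_xlog directory.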

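Solutions

The commonly available options for this problem are:

Method 1: increase wal_keep_segments

Set the GUC parameter wal_keep_segments to a larger value, for example 2000. Each segment is 16 MB by default, so this retains 2000 × 16 MB = 32000 MB, or roughly 31 GB of xlog; once that limit is exceeded, the oldest xlog segments are removed.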
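
The arithmetic behind that setting is simple enough to script. The sketch below, assuming the default 16 MB segment size, computes how much xlog a given wal_keep_segments value retains and, in the other direction, how many segments are needed to cover an expected amount of xlog. Both function names are illustrative only.

```python
import math

SEGMENT_SIZE_MB = 16  # PostgreSQL's default WAL segment size


def retained_wal_mb(wal_keep_segments, segment_size_mb=SEGMENT_SIZE_MB):
    """Amount of xlog (in MB) kept by a given wal_keep_segments value."""
    return wal_keep_segments * segment_size_mb


def segments_needed(expected_wal_gb, segment_size_mb=SEGMENT_SIZE_MB):
    """Smallest wal_keep_segments that covers expected_wal_gb of xlog."""
    return math.ceil(expected_wal_gb * 1024 / segment_size_mb)


mb = retained_wal_mb(2000)
print(mb, "MB =", mb / 1024, "GB")   # 32000 MB = 31.25 GB
print(segments_needed(30))           # 1920
```

Estimating the expected xlog volume of the largest transaction is the hard part; the numbers here only show the conversion.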

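However, this does not fundamentally solve the problem. In production, or when loading data for benchmarks such as TPC-C, a single transaction that inserts billions of rows can still trigger the same error.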

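Method 2: enable archiving

Archiving copies completed xlog segments to a separate directory. When the standby cannot obtain a segment over streaming replication, it restores the segment from that directory using restore_command.

Example GUC settings:

In postgresql.conf on the primary: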

wal_level = hot_standby
archive_mode = on
archive_command = 'rsync -zaq %p postgres@pg-slave:/var/lib/pgsql/wal_restore/%f && test ! -f /var/lib/pgsql/backup/wal_archive/%f && cp %p /var/lib/pgsql/backup/wal_archive/'
archive_timeout = 300
max_wal_senders = 5
wal_keep_segments = 0

In postgresql.conf on the standby:

wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f /var/lib/pgsql/backup/wal_archive/%f && cp -i %p /var/lib/pgsql/backup/wal_archive/%f < /dev/null'
hot_standby = on
wal_keep_segments = 1

In recovery.conf on the standby:

standby_mode = 'on'
primary_conninfo = 'host=pg-master port=5432 user=replicator'
restore_command = 'cp /var/lib/pgsql/wal_restore/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_restore/ %r'
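
The archive_command and restore_command settings above rely on placeholder substitution: %p stands for the path of the segment file, %f for its file name alone, and %% for a literal percent sign. The following Python sketch imitates that substitution so a command template can be checked before it goes into the configuration. It is an illustration, not PostgreSQL's own implementation; expand_command is a hypothetical helper, and leaving unknown placeholders untouched is an assumption of this sketch.

```python
import re


def expand_command(template, values):
    """Expand %-placeholders in an archive/restore command template.

    values maps a placeholder letter (e.g. "p", "f") to its replacement.
    "%%" becomes a literal "%"; unknown placeholders are left as-is.
    """
    def repl(match):
        key = match.group(1)
        if key == "%":
            return "%"
        return values.get(key, match.group(0))

    return re.sub(r"%(.)", repl, template)


segment = "0000000800002A0000000000"
cmd = expand_command(
    "cp /var/lib/pgsql/wal_restore/%f %p",
    {"f": segment, "p": "pg_xlog/RECOVERYXLOG"},  # example values only
)
print(cmd)
```

Running the expanded command by hand as the postgres user is a quick way to catch path typos and permission problems.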

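Method 3: enable a replication slot (supported since PG 9.4)

This is the fundamental fix: no xlog is lost, because the primary does not remove a segment until it has been copied to the standby. The trade-off is that xlog keeps accumulating on the primary for as long as the standby stays disconnected.

To enable it:

Add the following to postgresql.conf: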

max_replication_slots = 2000

Before copying data to the standby, create a slot on the primary:

postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
  slot_name  | xlog_position
-------------+---------------
 node_a_slot |

postgres=# SELECT * FROM pg_replication_slots;
  slot_name  | slot_type | datoid | database | active | xmin | restart_lsn
-------------+-----------+--------+----------+--------+------+-------------
 node_a_slot | physical  |        |          | f      |      |
(1 row)

In recovery.conf on the standby, add the primary_slot_name line:

standby_mode = 'on'
primary_conninfo = 'host=192.168.4.225 port=19000 user=wslu password=xxxx'
primary_slot_name = 'node_a_slot'
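
Since a slot makes the primary hold on to xlog until the standby has consumed it, it is worth checking how much xlog a slot is currently holding back. The Python sketch below shows the underlying arithmetic: an LSN such as 2A00/1000000 is two hexadecimal halves of a 64-bit byte position, so the gap between the primary's current position and the slot's restart_lsn is a plain subtraction. On 9.4 the two input values would come from pg_current_xlog_location() and the restart_lsn column of pg_replication_slots, and the server-side function pg_xlog_location_diff() performs the same calculation. The helper names and the sample values are illustrative.

```python
def lsn_to_int(lsn):
    """Convert an LSN string like '2A00/1000000' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_bytes(current_lsn, restart_lsn):
    """Bytes of xlog between a slot's restart_lsn and the current position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


# Sample values only; real ones come from the primary.
gap = retained_bytes("2A00/1000000", "2A00/0")
print(gap // (1024 * 1024), "MB retained")
```

A gap that keeps growing usually means the standby is down or too slow, and the primary's pg_xlog directory will grow with it.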

References

https://www.postgresql.org/docs/9.4/static/runtime-config-replication.html

https://www.postgresql.org/docs/9.4/static/warm-standby.html#CASCADING-REPLICATION
http://blog.2ndquadrant.com/postgresql-9-4-slots/

http://grokbase.com/t/postgresql/pgsql-general/13654jchy3/trouble-with-replication

http://stackoverflow.com/questions/28201475/how-do-i-fix-a-postgresql-9-3-slave-that-cannot-keep-up-with-the-master

