Building a High-Availability Cluster with repmgr and PostgreSQL 12 (Part 3)

Reposted article, published 2020-04-24 23:24. Category: Databases

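1. The previous parts set up a basic repmgr cluster. Checking the cluster status and the repmgr service status at this point shows that repmgrd is not running on either node.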

[postgres@localhost bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=192.168.101.9 port=5432 user=postgres  dbname=postgres
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=192.168.101.7 port=5432 user=postgres  dbname=postgres

[postgres@localhost bin]$ ./repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd     | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+-------------+-----+---------+--------------------
 1  | node1 | primary | * running |          | not running | n/a | n/a     | n/a
 2  | node2 | standby |   running | node1    | not running | n/a | n/a     | n/a

2. Edit the failover parameters in repmgr.conf

vim /etc/repmgr/12/repmgr.conf
failover='automatic'  
promote_command='/usr/pgsql-12/bin/repmgr standby promote' 
follow_command='/usr/pgsql-12/bin/repmgr standby follow'

The failover parameter accepts two values:
automatic: enables automatic failover
manual: disables automatic failover

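With automatic failover disabled (failover='manual'), the standby logs the following after it detects that the primary has failed. As the log shows, the standby is not promoted to primary automatically: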

[2020-04-24 22:49:17] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 22:49:17] [WARNING] unable to reconnect to node 1 after 6 attempts
[2020-04-24 22:49:17] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate, and will not follow the new primary
[2020-04-24 22:49:17] [DETAIL] "failover" is set to "manual" in repmgr.conf
[2020-04-24 22:49:17] [HINT] manually execute "repmgr standby follow" to have this node follow the new primary
[2020-04-24 22:49:17] [INFO] follower node awaiting notification from a candidate node
[2020-04-24 22:50:17] [WARNING] no notification received from new primary after 60 seconds

3. Start the repmgrd process on the cluster nodes

Run the following from the bin directory on both the primary and the standby:
./repmgrd -d

4. Check the service status after starting repmgrd

[postgres@localhost bin]$ ./repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 11558 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 10818 | no      | 0 second(s) ago
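
The status table can be checked by script as well as by eye. The following is a minimal sketch, assuming the pipe-delimited column layout shown above stays the same (repmgrd is the sixth column). The function name repmgrd_not_running is hypothetical and is not part of repmgr.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of repmgr): reads "repmgr service status"
# output on stdin and prints the name of every node whose repmgrd column
# is anything other than "running".
# Assumed column order: ID | Name | Role | Status | Upstream | repmgrd | ...
repmgrd_not_running() {
    awk -F'|' '
        # Only data rows start with a numeric node ID; this skips the
        # header, the separator line, and blank lines.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            name = $2; state = $6
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (state != "running") print name
        }'
}
```

Used as `./repmgr service status | repmgrd_not_running`, it would print nothing for the output in step 4, and node1 and node2 for the output in step 1.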

5. Simulate a failure of the primary. The standby's log reads as follows:

[2020-04-24 23:14:02] [INFO] monitoring connection to upstream node "node1" (ID: 1)
[2020-04-24 23:14:38] [WARNING] unable to ping "host=192.168.101.9 port=5432 user=postgres  dbname=postgres"
[2020-04-24 23:14:38] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 23:14:38] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2020-04-24 23:14:38] [INFO] checking state of node 1, 1 of 6 attempts
[2020-04-24 23:14:38] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2020-04-24 23:14:38] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 23:14:38] [INFO] sleeping 10 seconds until next reconnection attempt
[2020-04-24 23:14:48] [INFO] checking state of node 1, 2 of 6 attempts
[2020-04-24 23:14:48] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2020-04-24 23:14:48] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 23:14:48] [INFO] sleeping 10 seconds until next reconnection attempt
[2020-04-24 23:14:58] [INFO] checking state of node 1, 3 of 6 attempts
[2020-04-24 23:14:58] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2020-04-24 23:14:58] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 23:14:58] [INFO] sleeping 10 seconds until next reconnection attempt
[2020-04-24 23:15:08] [INFO] checking state of node 1, 4 of 6 attempts
[2020-04-24 23:15:08] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2020-04-24 23:15:08] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 23:15:08] [INFO] sleeping 10 seconds until next reconnection attempt
[2020-04-24 23:15:18] [INFO] checking state of node 1, 5 of 6 attempts
[2020-04-24 23:15:18] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2020-04-24 23:15:18] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 23:15:18] [INFO] sleeping 10 seconds until next reconnection attempt
[2020-04-24 23:15:28] [INFO] checking state of node 1, 6 of 6 attempts
[2020-04-24 23:15:28] [WARNING] unable to ping "user=postgres dbname=postgres host=192.168.101.9 port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2020-04-24 23:15:28] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-04-24 23:15:28] [WARNING] unable to reconnect to node 1 after 6 attempts
[2020-04-24 23:15:28] [INFO] 0 active sibling nodes registered
[2020-04-24 23:15:28] [INFO] primary node  "node1" (ID: 1) and this node have the same location ("default")
[2020-04-24 23:15:28] [INFO] no other sibling nodes - we win by default
[2020-04-24 23:15:28] [NOTICE] this node is the only available candidate and will now promote itself
[2020-04-24 23:15:28] [INFO] promote_command is:
  "/usr/pgsql-12/bin/repmgr standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
[2020-04-24 23:15:29] [INFO] 0 followers to notify
[2020-04-24 23:15:29] [INFO] switching to primary monitoring mode
[2020-04-24 23:15:29] [NOTICE] monitoring cluster primary "node2" (ID: 2)

The log shows that the standby was correctly promoted to primary and is now serving requests.
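
The timestamps in the log also show how long detection takes: six checks with a 10-second sleep between them put the promotion decision about 50 seconds after the first failed check (23:14:38 to 23:15:28). A rough sketch of that arithmetic follows. The function name is hypothetical, and the estimate ignores the time each connection attempt itself may take. To my understanding the two inputs correspond to the reconnect_attempts and reconnect_interval settings in repmgr.conf, which the original post does not set explicitly, so treat that mapping as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimates the seconds between the first failed check
# and the promotion decision, given the number of reconnection attempts and
# the sleep between them. The first attempt happens immediately, so only
# (attempts - 1) sleeps occur. Per-attempt connection timeouts are ignored.
failover_delay() {
    local attempts=$1 interval=$2
    echo $(( (attempts - 1) * interval ))
}

# Values read from the log above: 6 attempts, 10-second sleep.
failover_delay 6 10   # prints 50
```

Lowering either value shortens the outage window but makes a brief network glitch more likely to trigger a promotion.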

6. Check the cluster status

[postgres@localhost bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------
 1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=192.168.101.9 port=5432 user=postgres  dbname=postgres
 2  | node2 | primary | * running |          | default  | 100      | 2        | host=192.168.101.7 port=5432 user=postgres  dbname=postgres
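
In this output node1 is still registered as a primary but is marked as failed, while node2 is the running primary on timeline 2. A monitoring script could flag such nodes. Here is a minimal sketch, assuming the table layout shown above (Status is the fourth column). The function name failed_nodes is hypothetical and is not part of repmgr.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of repmgr): reads "repmgr cluster show"
# output on stdin and prints the name of every node whose Status column
# contains "failed".
# Assumed column order: ID | Name | Role | Status | ...
failed_nodes() {
    awk -F'|' '
        # Data rows start with a numeric node ID; header and separator do not.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ && $4 ~ /failed/ {
            name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            print name
        }'
}
```

Piping the output above through it, as in `./repmgr cluster show | failed_nodes`, prints node1.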
