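I. MySQL Group Replication has to deal with two problems by design:
1. How to recover after the primary node crashes.
2. How the remaining nodes keep serving traffic when a majority of the nodes are offline.
This article covers only the first problem: once the primary has crashed, how do we bring it back into the high-availability group? The problem splits into two cases:
Case 1, mild damage: the primary's data is intact, and the binlogs written by the other members during the outage are still available.
In this case, restarting MySQL Group Replication on the node is enough to fix the problem.
Case 2, total loss: the primary's data is gone.
In this case, restore the crashed node from a backup of one of the other members, then restart MySQL Group Replication.
The detailed recovery steps are shown in the examples that follow.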
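The choice between the two cases can be written down as a small decision function. The Python sketch below is only an illustration: the name recovery_plan and the wording of the steps are made up for this example, and treating "data intact but binlogs no longer available" as a restore-from-backup case is an assumption that this article does not test.

```python
from typing import List


def recovery_plan(data_intact: bool, binlogs_available: bool) -> List[str]:
    """Return the ordered recovery steps for a crashed member (illustrative only)."""
    if data_intact and binlogs_available:
        # Case 1: the node can catch up from the binlogs of the other members.
        return [
            "start mysqld on the crashed node",
            "START GROUP_REPLICATION",
        ]
    # Case 2 (and, by assumption, any case where catching up is impossible):
    # rebuild the node from a backup of a healthy member first.
    return [
        "back up a healthy member",
        "copy the backup to the crashed node",
        "restore the backup into the data directory",
        "start mysqld on the crashed node",
        "load the GTID state saved with the backup",
        "START GROUP_REPLICATION",
    ]


if __name__ == "__main__":
    for step in recovery_plan(data_intact=False, binlogs_available=True):
        print(step)
```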
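II. Environment:
Overview
Hostname    IP address      MGR role
mtls17      10.186.19.17    primary
mtls18      10.186.19.18    secondary
mtls19      10.186.19.19    secondary
Note: the second host shows up as mtsl18 in the shell prompts and query output below; it is the same machine as mtls18.
Group status: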
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
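Notes:
The output above shows that the MySQL instance on mtls17 is currently the group's primary (its MEMBER_ID matches group_replication_primary_member) and that every member of the group is in a healthy state.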
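The same check (match the primary's UUID against MEMBER_ID and confirm that every member is ONLINE) can be scripted. The Python sketch below parses the table text printed by the mysql client; the helper names parse_mysql_table and group_health are hypothetical, and the sample data is copied from the output above. It is a rough illustration, not a monitoring tool.

```python
SAMPLE_MEMBERS = """
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
"""
SAMPLE_PRIMARY_UUID = "12b6f8d9-d655-11e7-936a-9a17854b700d"


def parse_mysql_table(text):
    """Parse the ASCII table printed by the mysql client into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines (+-----+) and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def group_health(members, primary_uuid):
    """Summarise the group: who is primary, and is every member ONLINE?"""
    by_id = {m["MEMBER_ID"]: m for m in members}
    primary = by_id.get(primary_uuid)
    return {
        "primary_host": primary["MEMBER_HOST"] if primary else None,
        "all_online": bool(members)
        and all(m["MEMBER_STATE"] == "ONLINE" for m in members),
        "member_count": len(members),
    }


if __name__ == "__main__":
    print(group_health(parse_mysql_table(SAMPLE_MEMBERS), SAMPLE_PRIMARY_UUID))
```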
III. Case 1: simulating the failure and fixing it:
1. Simulate a crash of the mtls17 node
ps -ef | grep mysql
mysql    24125     1  0 00:04 ?        00:00:14 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root     26125 22481  0 00:36 pts/0    00:00:00 grep --color=auto mysql
[root@mtls17 data]# kill -9 24125
[root@mtls17 data]# ps -ef | grep mysql
root     26128 22481  0 00:37 pts/0    00:00:00 grep --color=auto mysql
2. Check the state of the two remaining nodes
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
The output above shows that after mysqld on mtls17 was killed, the two remaining nodes formed a new group, and the MySQL instance on mtls18 became the primary.
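The group kept running because two of its three members were still alive: Group Replication needs a majority of the members to agree, so a three-node group tolerates the loss of one node. A minimal Python sketch of that arithmetic follows; the function name has_majority is made up for illustration.

```python
def has_majority(total_members: int, alive_members: int) -> bool:
    """True when the alive members form a strict majority of the group."""
    return alive_members > total_members // 2


if __name__ == "__main__":
    # The situation in this step: 3 members configured, 2 still alive.
    print(has_majority(3, 2))
    # Losing a second node would leave 1 of 3, which is not a majority.
    print(has_majority(3, 1))
```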
3. Bring the crashed primary back into the group
systemctl start mysql
[root@mtls17 data]# mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.20-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> start group_replication;
Query OK, 0 rows affected (4.03 sec)

mysql>
4. Verify that the problem is resolved
select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
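Conclusion: after the former primary crashed, restarting the MySQL service and then restarting MySQL Group Replication on it resolved the problem. The node rejoins as a secondary; mtls18 remains the primary.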
IV. Case 2: the primary's data has been lost; how to recover the node:
1. Stop the service and delete the data
[root@mtsl18 ~]# ps -ef | grep mysql
mysql    10843     1  0 00:04 ?        00:00:19 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root     13290  9197  0 00:50 pts/0    00:00:00 grep --color=auto mysql
[root@mtsl18 ~]# kill -9 10843
[root@mtsl18 ~]# rm -rf /database/mysql/data/3306
[root@mtsl18 ~]# ps -ef | grep mysql
root     13339  9197  0 00:50 pts/0    00:00:00 grep --color=auto mysql
This experiment continues from case 1, so the primary is now on mtls18; that is why the service is killed and the data deleted on mtls18.
2. Check the group status:
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)
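Notes: after mtls18 went down, the primary role moved from mtls18 to mtls17.
3. Back up mtls19 with MEB (MySQL Enterprise Backup); the backup is used to restore the crashed mtls18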
mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp \
--host=localhost --user=root --password=mtls0352 \
--backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging \
backup-to-image
MySQL Enterprise Backup version 4.1.0 Linux-2.6.39-400.215.10.el5uek-x86_64 [2017/03/01]
Copyright (c) 2003, 2017, Oracle and/or its affiliates. All Rights Reserved.

171201 01:01:36 MAIN INFO: A thread created with Id '140141436434240'
171201 01:01:36 MAIN INFO: Starting with following command line ...
 mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp --host=localhost --user=root --password=xxxxxxxx --backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging backup-to-image
171201 01:01:36 MAIN INFO:
171201 01:01:36 MAIN INFO: MySQL server version is '5.7.20-log'
.......
........
171201 01:01:40 MAIN INFO: Full Image Backup operation completed successfully.
171201 01:01:40 MAIN INFO: Backup image created successfully.
171201 01:01:40 MAIN INFO: Image Path = /tmp/2017-12-01T12:30:00.mbi
171201 01:01:40 MAIN INFO: MySQL binlog position: filename mysql-bin.000002, position 1082

-------------------------------------------------------------
   Parameters Summary
-------------------------------------------------------------
   Start LSN                  : 2609664
   End LSN                    : 2610075
-------------------------------------------------------------

mysqlbackup completed OK!
4. Copy the backup to mtls18
scp /tmp/2017-12-01T12:30:00.mbi mtls18:/tmp/
5. Restore the backup
mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi \
> --backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ \
> copy-back-and-apply-log
MySQL Enterprise Backup version 4.1.0 Linux-2.6.39-400.215.10.el5uek-x86_64 [2017/03/01]
Copyright (c) 2003, 2017, Oracle and/or its affiliates. All Rights Reserved.

171201 01:09:59 MAIN INFO: A thread created with Id '140530650736448'
171201 01:09:59 MAIN INFO: Starting with following command line ...
 mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi --backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ copy-back-and-apply-log
171201 01:09:59 MAIN INFO:
IMPORTANT: Please check that mysqlbackup run completes successfully.
.....
.....
171201 01:10:08 PCR1 INFO: The first data file is '/database/mysql/data/3306/ibdata1' and the new created log files are at '/database/mysql/data/3306/'
171201 01:10:08 MAIN INFO: MySQL server version is '5.7.20-log'
171201 01:10:08 MAIN INFO: Restoring ...5.7.20-log version
171201 01:10:08 MAIN INFO: Apply-log operation completed successfully.
171201 01:10:08 MAIN INFO: Full Backup has been restored successfully.

mysqlbackup completed OK!
6. Fix the ownership of the restored data directory and start MySQL on mtls18
[root@mtsl18 tmp]# chown -R mysql:mysql /database/mysql/data/3306
[root@mtsl18 tmp]# systemctl start mysql
[root@mtsl18 tmp]# ps -ef | grep mysql
mysql    14205     1 24 01:11 ?        00:00:01 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root     14237  9197  0 01:11 pts/0    00:00:00 grep --color=auto mysql
7. Restart MySQL Group Replication
mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.20-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> reset master;
Query OK, 0 rows affected (0.10 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source /database/mysql/data/3306/backup_gtid_executed.sql ;
Query OK, 0 rows affected (0.10 sec)

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to
    -> master_user='mgr_usr',
    -> master_password='mgr10352'
    -> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.21 sec)

mysql> start group_replication;
Query OK, 0 rows affected (3.46 sec)
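The statements above clear the local binlog and replication state, load the GTID state recorded in backup_gtid_executed.sql, set the credentials for the group_replication_recovery channel, and then start Group Replication. As a rough illustration, the Python sketch below assembles the same sequence into a script for the mysql client. The function build_rejoin_script and its parameters are hypothetical names, and the password is passed in as an argument instead of being hard-coded. Because source is a mysql client command, the result is meant to be fed to the mysql client and not sent through a database driver.

```python
def build_rejoin_script(gtid_file: str, user: str, password: str) -> str:
    """Build the statement sequence used in this step as one client script."""
    for value in (user, password):
        # This sketch does no SQL escaping, so refuse values that would need it.
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported here")
    statements = [
        "reset master;",
        "reset slave;",
        # Keep the GTID bookkeeping below out of the binary log.
        "set sql_log_bin=0;",
        f"source {gtid_file};",
        "set sql_log_bin=1;",
        f"change master to master_user='{user}', master_password='{password}' "
        "for channel 'group_replication_recovery';",
        "start group_replication;",
    ]
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    print(
        build_rejoin_script(
            "/database/mysql/data/3306/backup_gtid_executed.sql",
            "mgr_usr",
            "example-password",  # placeholder, not a real credential
        )
    )
```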
8. Verify that the group status is back to normal
mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
| group_replication_applier | 85f82fce-d65e-11e7-9e92-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.01 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)
Notes: all three members are ONLINE again and mtls17 remains the primary. mtsl18 rejoined with a new MEMBER_ID (85f82fce-d65e-11e7-9e92-1e1b3511358e instead of 12bfe200-d655-11e7-a264-1e1b3511358e), because its data directory was rebuilt and the server received a new UUID.
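V. Summary:
Recovering from the two kinds of primary failure:
1. Data and binlogs are both intact: simply restart MySQL Group Replication on the node, and it catches up automatically.
2. Data is lost: restore the node from a backup first, then restart MySQL Group Replication.
Operational complexity of MySQL Group Replication:
Overall, MySQL Group Replication is fairly DBA-friendly; a handful of small operations is enough to repair a failed group.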
VI. My other articles on MySQL Group Replication
1. MySQL Group Replication installation and configuration in detail: http://www.cnblogs.com/JiangLe/p/6727281.html#3849996
2. MySQL Group Replication availability report on mysql-5.7.20: http://www.cnblogs.com/JiangLe/p/7809229.html
3. MySQL Group Replication: recovering a crashed primary node https://i.cnblogs.com/EditPosts.aspx?postid=7941929
4. MySQL Group Replication: recovery after losing a majority of the data nodes
5. My open-source tool for fully automated mysql-group-replication installation https://github.com/Neeky/mysqltools
----