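I haven't blogged in a long time, and this place has gotten overgrown. Our new services went onto 5.7 without running into any problems, but not hitting problems doesn't mean there aren't any. I have a habit of browsing the Percona blog in my spare time, which is where I came across two bugs in 5.7.17, the latest GA release at the time of writing, so I set up an environment to reproduce them. The two bugs I know of are:
1. When slave_parallel_workers > 0, i.e. multi-threaded replication is enabled, Seconds_Behind_Master stays at 0 and never changes even when the slave is lagging. This value is not accurate anyway, but it is still a useful indicator. For an accurate way to judge replication lag, see my earlier post, "Judging Master-Slave Replication Lag".
2. With super_read_only enabled, compression of the gtid_executed table in the mysql database fails. For what this table is for, see the article "The new gtid_executed table in MySQL 5.7: does it solve your pain point?". The original author is 姜承堯, but the link to the original no longer opens.
Environment: 5.7.17, one master and two slaves. Let's reproduce the first bug. One slave uses ordinary replication, i.e. multi-threaded replication is not enabled; the other slave has multi-threaded replication enabled.
First, use sysbench to load 1,000,000 rows, then run a DELETE on the master to simulate lag, and compare the two slaves.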
sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 --mysql-user=root --mysql-socket=/data/mysql/3306/mysqltmp/mysql.sock --mysql-password=123 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex prepare
Ordinary replication:
mysql> show variables like '%parallel%';
+------------------------+----------+
| Variable_name          | Value    |
+------------------------+----------+
| slave_parallel_type    | DATABASE |
| slave_parallel_workers | 0        |
+------------------------+----------+
2 rows in set (0.00 sec)
Multi-threaded replication:
mysql> show variables like '%parallel%';
+------------------------+---------------+
| Variable_name          | Value         |
+------------------------+---------------+
| slave_parallel_type    | LOGICAL_CLOCK |
| slave_parallel_workers | 8             |
+------------------------+---------------+
2 rows in set (0.02 sec)
Prepare a script to watch the replication lag:
for i in {1..1000}; do
  (
    mysql -uroot -p123 -h 192.168.0.20 -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print "slave_1_not-multi-threaded-repl: " $2}' &
    sleep 0.1
    mysql -uroot -p123 -h 192.168.0.30 -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print "slave_2_multi-threaded-repl: " $2}' &
  )
  sleep 1
done
Start the script, then delete data on the master and watch the replication lag:
delete from sbtest where id>100;
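The script's output is shown below. The slave with multi-threaded replication reports a Seconds_Behind_Master of 0 the whole time, while the slave with ordinary replication shows the lag growing.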
[root@dbserver-yayun-04 ~]# sh a.sh
mysql: [Warning] Using a password on the command line interface can be insecure.
slave_1_not-multi-threaded-repl: 103
mysql: [Warning] Using a password on the command line interface can be insecure.
slave_2_multi-threaded-repl: 0
mysql: [Warning] Using a password on the command line interface can be insecure.
slave_1_not-multi-threaded-repl: 104
mysql: [Warning] Using a password on the command line interface can be insecure.
slave_2_multi-threaded-repl: 0
mysql: [Warning] Using a password on the command line interface can be insecure.
slave_1_not-multi-threaded-repl: 105
mysql: [Warning] Using a password on the command line interface can be insecure.
slave_2_multi-threaded-repl: 0
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
slave_1_not-multi-threaded-repl: 106
slave_2_multi-threaded-repl: 0
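If these readings are to be consumed by a monitoring job instead of grep and awk, the value can be pulled straight out of the SHOW SLAVE STATUS\G text. Here is a minimal Python sketch of that idea. The function name parse_seconds_behind_master and the sample text are made up for illustration, and the sketch assumes the client's password warning goes to stderr and is therefore not part of the text being parsed.

```python
import re
from typing import Optional


def parse_seconds_behind_master(status_text: str) -> Optional[int]:
    """Extract Seconds_Behind_Master from `SHOW SLAVE STATUS\\G` output.

    Returns None when the field is missing or reported as NULL
    (for example when the SQL thread is not running).
    """
    match = re.search(
        r"^\s*Seconds_Behind_Master:\s*(\S+)\s*$", status_text, re.MULTILINE
    )
    if match is None or match.group(1).upper() == "NULL":
        return None
    return int(match.group(1))


# Hypothetical excerpt of the vertical (\G) output, for illustration only.
sample = """\
        Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
   Seconds_Behind_Master: 103
"""
print(parse_seconds_behind_master(sample))
```

On the multi-threaded slave in this test the parsed value would simply be 0, so a parser like this only makes the symptom easier to log; it does not work around the bug.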
The workaround Percona gives is:
SELECT PROCESSLIST_TIME
FROM performance_schema.threads
WHERE NAME = 'thread/sql/slave_worker'
  AND (PROCESSLIST_STATE IS NULL
       OR PROCESSLIST_STATE != 'Waiting for an event from Coordinator')
ORDER BY PROCESSLIST_TIME DESC
LIMIT 1;
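To make the selection logic of that query explicit, here is a small Python sketch that applies the same filter and ordering to rows already fetched from performance_schema.threads. The function name estimate_worker_lag and the sample rows are assumptions for illustration; how the rows are fetched (which driver, which connection) is left out on purpose.

```python
from typing import Iterable, Optional, Tuple

# State reported by a worker thread that currently has nothing to apply.
IDLE_STATE = "Waiting for an event from Coordinator"


def estimate_worker_lag(
    rows: Iterable[Tuple[Optional[str], Optional[int]]]
) -> Optional[int]:
    """rows holds (PROCESSLIST_STATE, PROCESSLIST_TIME) per slave_worker thread.

    Mirrors the query: keep workers whose state is NULL or not idle,
    then return the largest PROCESSLIST_TIME among them (None if none qualify).
    """
    busy_times = [
        seconds
        for state, seconds in rows
        if (state is None or state != IDLE_STATE) and seconds is not None
    ]
    return max(busy_times) if busy_times else None


# Hypothetical rows, not taken from a real server.
sample_rows = [
    (IDLE_STATE, 0),
    ("System lock", 95),
    (None, 12),
]
print(estimate_worker_lag(sample_rows))
```

When every worker is idle the function returns None, which corresponds to the query returning no row.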
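Next, let's reproduce the bug that is triggered once super_read_only is enabled:
1. On one of the slaves, set gtid_executed_compression_period=1. This variable controls how many transactions are executed between compressions of the table; the default is 1000.
2. Enable super_read_only, so that even super users cannot change data on the slave.
3. Disable log_slave_updates. If it is enabled, the gtid_executed table is not updated in real time and is not compressed either. (The Percona post also triggered the bug with log_slave_updates enabled; I think that is a mistake in the post.)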
mysql> show variables like '%gtid_ex%';
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| gtid_executed_compression_period | 1     |
+----------------------------------+-------+
1 row in set (0.01 sec)
mysql> show variables like '%log_slave_updates%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| log_slave_updates | OFF   |
+-------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like '%super%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| super_read_only | ON    |
+-----------------+-------+
1 row in set (0.00 sec)
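The three settings can also be checked in one place before starting the test. This is a minimal Python sketch under the assumption that the SHOW VARIABLES results have already been collected into a name-to-value mapping; the function name matches_repro_setup is hypothetical.

```python
from typing import Mapping


def matches_repro_setup(variables: Mapping[str, str]) -> bool:
    """True when the slave is configured as in the three steps of this test."""
    return (
        variables.get("gtid_executed_compression_period") == "1"
        and variables.get("super_read_only", "").upper() == "ON"
        and variables.get("log_slave_updates", "").upper() == "OFF"
    )


# Values as shown by the three SHOW VARIABLES statements in this post.
slave_variables = {
    "gtid_executed_compression_period": "1",
    "log_slave_updates": "OFF",
    "super_read_only": "ON",
}
print(matches_repro_setup(slave_variables))
```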
Now run a sysbench load on the master to generate transactions.
sysbench --test=oltp --oltp-table-size=100000 --oltp-read-only=off --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 --mysql-user=root --mysql-socket=/data/mysql/3306/mysqltmp/mysql.sock --mysql-password=123 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex run
Check the slave:
mysql> select count(*) from gtid_executed;
+----------+
| count(*) |
+----------+
|       93 |
+----------+
1 row in set (0.44 sec)
mysql> select count(*) from gtid_executed;
+----------+
| count(*) |
+----------+
|      113 |
+----------+
1 row in set (0.66 sec)
The table is clearly not being compressed; the row count keeps growing.
Running show engine innodb status shows that a thread is trying to compress the table, but it does not succeed and is rolling back:
---TRANSACTION 10909611, ACTIVE 2 sec rollback
mysql tables in use 1, locked 1
ROLLING BACK 4 lock struct(s), heap size 1136, 316 row lock(s)
MySQL thread id 1, OS thread handle 140435435284224, query id 0 Compressing gtid_executed table
The INNODB_TRX table also shows a transaction that is rolling back.
mysql> select trx_id,trx_state,trx_operation_state,trx_isolation_level from information_schema.INNODB_TRX;
+-----------------+--------------+---------------------+---------------------+
| trx_id          | trx_state    | trx_operation_state | trx_isolation_level |
+-----------------+--------------+---------------------+---------------------+
| 10919604        | ROLLING BACK | rollback            | REPEATABLE READ     |
| 421910840085200 | RUNNING      | starting index read | REPEATABLE READ     |
+-----------------+--------------+---------------------+---------------------+
2 rows in set (0.00 sec)
By now the table has accumulated a lot of rows:
mysql> select count(*) from gtid_executed;
+----------+
| count(*) |
+----------+
|     2448 |
+----------+
1 row in set (0.00 sec)
Disable super_read_only:
mysql> set global super_read_only=0;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from gtid_executed;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.07 sec)
mysql> select count(*) from gtid_executed;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
Everything goes back to normal right away.
References:
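The symptom seen in this test, a row count that only ever goes up, could be turned into a simple check. This is a rough Python sketch of such a heuristic, not something taken from the Percona post. The function name compression_looks_stalled and the min_samples threshold are assumptions, and the samples are expected to be successive results of select count(*) from gtid_executed taken while the slave is applying transactions.

```python
from typing import Sequence


def compression_looks_stalled(row_counts: Sequence[int], min_samples: int = 3) -> bool:
    """Heuristic: True if every sampled row count is larger than the previous one.

    With too few samples the function declines to judge and returns False.
    """
    if len(row_counts) < min_samples:
        return False
    return all(
        later > earlier for earlier, later in zip(row_counts, row_counts[1:])
    )


# Counts observed in this post: with super_read_only ON, then after turning it OFF.
print(compression_looks_stalled([93, 113, 2448]))
print(compression_looks_stalled([2, 2, 2]))
```

A strictly growing count is only a hint. The rolling-back transaction in show engine innodb status or INNODB_TRX is the more direct evidence.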
https://www.percona.com/blog/2017/02/08/mysql-super_read_only-bugs/
https://www.percona.com/blog/2017/01/27/wrong-seconds_behind_master-with-slave_parallel_workers-0/
http://keithlan.github.io/2017/02/15/gtid_practice/?utm_source=tuicool&utm_medium=referral