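Yesterday quite a few people were discussing "Careful, Landmines Ahead: sql_slave_skip_counter". Some said the author was just playing with words: after all that explanation, sql_slave_skip_counter=1 still skips one transaction. I read the original article several times, and that does seem to be the case, but I still did not see how the slave_exec_mode parameter comes into it. A hundred readers will have a hundred interpretations, and a reader's view can even change with their angle of approach and what they know.
I plan to write two articles on the three parameters related to skipping replication errors: sql_slave_skip_counter, slave_skip_errors and slave_exec_mode.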
1. Basic Environment
VMware 10.0 + CentOS 6.9 + MySQL 5.7.19
| ROLE | HOSTNAME | BASEDIR | DATADIR | IP | PORT |
|------|----------|---------|---------|----|------|
| Master | ZST1 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.132 | 3306 |
| Slave | ZST2 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.133 | 3306 |
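A one-master, one-slave asynchronous replication setup using row-based binlog format and file/position-based replication: Master->{Slave}
2. Official Description of sql_slave_skip_counter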
https://dev.mysql.com/doc/refman/5.7/en/set-global-sql-slave-skip-counter.html
SET GLOBAL sql_slave_skip_counter = N
This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.
When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.
• For transactional tables, an event group corresponds to a transaction.
• For nontransactional tables, an event group corresponds to a single SQL statement.
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.
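The rule quoted above can be sketched as a small Python model. This is a toy written under stated assumptions, not MySQL's source code. Each event is reduced to its type name plus a flag that marks whether it closes an event group. `Event`, `skip_events` and `STATEMENT` are hypothetical names. The model follows only the manual's wording: skip N events, then keep skipping until the end of the group that the last skipped event belongs to.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    name: str         # event type as printed by mysqlbinlog, e.g. "Write_rows"
    ends_group: bool  # True only for the event that closes an event group


def skip_events(events: List[Event], n: int) -> Tuple[int, int]:
    """Toy model of SET GLOBAL sql_slave_skip_counter = n.

    Returns (index of the first event that will be executed, counter left over).
    """
    pos, counter = 0, n
    # Step 1: skip the next n events, decrementing the counter once per event.
    while counter > 0 and pos < len(events):
        pos, counter = pos + 1, counter - 1
    # Step 2: if the last skipped event did not close its group, keep skipping
    # until the end of that group; execution restarts at the next group.
    while 0 < pos < len(events) and not events[pos - 1].ends_group:
        pos += 1
    return pos, counter


# Two groups, each shaped like one autocommitted statement (five events per group).
STATEMENT = ["Anonymous_GTID", "Query(BEGIN)", "Table_map", "Write_rows", "Query(COMMIT)"]
binlog = [Event(name, name == "Query(COMMIT)") for name in STATEMENT * 2]

print(skip_events(binlog, 1))  # (5, 0): the whole first group is skipped
print(skip_events(binlog, 6))  # (10, 0): six events reach into group 2, so it goes too
```

The second call shows that N counts events, not statements. Six events end in the middle of the second group, so the model also skips the rest of that group.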
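3. Test Cases
The official description tells us that sql_slave_skip_counter skips in units of events. It stops only after reaching the end of the event group that contains the N-th event. For transactional tables, an event group corresponds to a transaction; for nontransactional tables, it corresponds to a single SQL statement. An event group consists of multiple events.
Here I only simulate an insert inside an explicit transaction that hits Duplicate entry (error 1062). Once the underlying mechanism is clear, error 1032 from delete/update can be analyzed the same way.
3.1 Test Data
Create one transactional table and one nontransactional table on the master. Then insert a row with id=1 into each table on the slave.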

```
# Create the test tables on the master
mydba@192.168.85.132,3306 [replcrash]> create table repl_innodb(id int primary key,name1 char(10),name2 char(10)) engine=innodb;
mydba@192.168.85.132,3306 [replcrash]> create table repl_myisam(id int primary key,name1 char(10),name2 char(10)) engine=myisam;
# Insert data into the test tables on the slave without writing it to the binlog
mydba@192.168.85.133,3306 [replcrash]> set sql_log_bin=0;
mydba@192.168.85.133,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'s1062-1','s1062-1');
mydba@192.168.85.133,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'s1062-1','s1062-1');
mydba@192.168.85.133,3306 [replcrash]> set sql_log_bin=1;
```
3.2 Transactional Tables
Insert data into the transactional table on the master.

```
# Insert data into the transactional table on the master
mydba@192.168.85.132,3306 [replcrash]> begin;
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> commit;
mydba@192.168.85.132,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
```
The slave wrote its row first, so id=1 is already taken there. When the master then writes its rows, replication ships the master's id=1 insert to the slave, which causes a key conflict (error 1062).
Let's try to skip the error with sql_slave_skip_counter. In a real 1062 key conflict, we should instead use the Duplicate entry message to delete the corresponding row on the slave.

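```
# Skip "one" error on the slave and start the sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
+----+---------+---------+
```
The slave skipped not only the id=1 row but also the id=2 row.
Analysis: on the master, the operations on the transactional table between begin and commit are recorded as one transaction, which is one event group. Applying id=1 on the slave hits the Duplicate entry error. After sql_slave_skip_counter skips one event, the SQL thread is still inside this group, so it keeps skipping the remaining events of the group. That is why the slave has no id=2 row.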
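A minimal simulation of this case applies the manual's rule to the events of this test. Everything here is hypothetical and simplified. Tables are dicts keyed by id, and each event is a (type, row) tuple that mirrors the binlog of this test. `apply` assumes two things: a failed group is rolled back, and the SQL thread stops at the group's first event, which is where the next start slave sql_thread resumes.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]                 # (id, name1); name2 carries the same value here
Event = Tuple[str, Optional[Row]]     # (event type, row carried by a Write_rows event)
GROUP_END = {"Xid", "Query(COMMIT)"}  # event types that close an event group


def skip(events: List[Event], pos: int, n: int) -> int:
    """Toy sql_slave_skip_counter: skip n events from pos, then finish that group."""
    while n > 0 and pos < len(events):
        pos, n = pos + 1, n - 1
    while 0 < pos < len(events) and events[pos - 1][0] not in GROUP_END:
        pos += 1
    return pos


def apply(events: List[Event], table: Dict[int, str],
          pos: int) -> Tuple[int, Optional[str]]:
    """Toy SQL thread: apply whole groups from pos. On a duplicate key the group is
    discarded (assumed rollback) and pos is left at the group's first event."""
    while pos < len(events):
        start = pos
        pending: Dict[int, str] = {}
        for kind, row in events[start:]:
            pos += 1
            if kind == "Write_rows" and row is not None:
                rid, name = row
                if rid in table or rid in pending:
                    return start, f"Duplicate entry '{rid}' for key 'PRIMARY' (1062)"
                pending[rid] = name
            if kind in GROUP_END:
                break
        table.update(pending)  # the whole group becomes visible at once
    return pos, None


slave = {1: "s1062-1"}  # the row inserted on the slave with sql_log_bin=0 in 3.1
binlog_32: List[Event] = [
    ("Anonymous_GTID", None), ("Query(BEGIN)", None),
    ("Table_map", None), ("Write_rows", (1, "m1062-1")),
    ("Table_map", None), ("Write_rows", (2, "m1062-2")),
    ("Xid", None),
]

pos, err = apply(binlog_32, slave, 0)    # the SQL thread stops on id=1
print(pos, err)                          # 0 Duplicate entry '1' for key 'PRIMARY' (1062)
pos = skip(binlog_32, pos, 1)            # set global sql_slave_skip_counter=1
pos, err = apply(binlog_32, slave, pos)  # start slave sql_thread
print(slave)                             # {1: 's1062-1'}: id=2 was skipped as well
```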

```
[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=1869
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 1869
#171201 10:15:11 server id 1323306 end_log_pos 1934 CRC32 0x3a86cd44 Anonymous_GTID last_committed=5 sequence_number=6 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 1934
#171201 10:14:43 server id 1323306 end_log_pos 2011 CRC32 0x83c239df Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512094483/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
==================== Start: repl_innodb rows id=1 and id=2, written in one transaction ====================
BEGIN
/*!*/;
# at 2011
#171201 10:14:43 server id 1323306 end_log_pos 2076 CRC32 0x0f3612fe Table_map: `replcrash`.`repl_innodb` mapped to number 263
# at 2076
#171201 10:14:43 server id 1323306 end_log_pos 2132 CRC32 0x01de5dbd Write_rows: table id 263 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
### @1=1
### @2='m1062-1'
### @3='m1062-1'
# at 2132
#171201 10:14:50 server id 1323306 end_log_pos 2197 CRC32 0xf838b054 Table_map: `replcrash`.`repl_innodb` mapped to number 263
# at 2197
#171201 10:14:50 server id 1323306 end_log_pos 2253 CRC32 0xbd9ae02a Write_rows: table id 263 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
### @1=2
### @2='m1062-2'
### @3='m1062-2'
# at 2253
#171201 10:15:11 server id 1323306 end_log_pos 2284 CRC32 0x0292df6a Xid = 60
COMMIT/*!*/;
==================== End: repl_innodb rows id=1 and id=2, written in one transaction ====================
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST1 logs]#
```
3.3 Nontransactional Tables
Insert data into the nontransactional table on the master.

```
# Insert data into the nontransactional table on the master
mydba@192.168.85.132,3306 [replcrash]> begin;
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> commit;
mydba@192.168.85.132,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
```
Likewise, the slave wrote its row first, so id=1 is already taken there. When the master then writes its rows, replication ships the master's id=1 insert to the slave, which causes a key conflict (error 1062).
Let's try to skip the error with sql_slave_skip_counter. In a real 1062 key conflict, we should instead use the Duplicate entry message to delete the corresponding row on the slave.

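```
# Skip "one" error on the slave and start the sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
```
The slave skipped the id=1 row but replicated the id=2 row.
Analysis: on the master, the operations on the nontransactional table between begin and commit are recorded as several transactions, and each SQL statement is its own event group. Applying id=1 on the slave hits the Duplicate entry error. After sql_slave_skip_counter=1 skips one event, the SQL thread skips the rest of that statement's group. Execution then resumes at the next event group, which here is the statement that inserts repl_myisam.id=2. That is why the slave does have the id=2 row.
The timing also differs. Here the slave reports error 1062 as soon as the first insert runs on the master. With the transactional table in 3.2, the slave reports the error only after the transaction commits.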
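Replaying this binlog in a small toy model shows the difference from the transactional case. The helpers `skip`, `apply` and `autocommit_group` are hypothetical sketches. `skip` follows the manual's rule. `apply` applies whole groups and stops at the first event of a group that hits a duplicate key. Each MyISAM insert is modeled as its own five-event group, as in the binlog of this test, so skipping one event discards only the id=1 group.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]                 # (id, name1); name2 carries the same value here
Event = Tuple[str, Optional[Row]]     # (event type, row carried by a Write_rows event)
GROUP_END = {"Xid", "Query(COMMIT)"}  # event types that close an event group


def skip(events: List[Event], pos: int, n: int) -> int:
    """Toy sql_slave_skip_counter: skip n events from pos, then finish that group."""
    while n > 0 and pos < len(events):
        pos, n = pos + 1, n - 1
    while 0 < pos < len(events) and events[pos - 1][0] not in GROUP_END:
        pos += 1
    return pos


def apply(events: List[Event], table: Dict[int, str],
          pos: int) -> Tuple[int, Optional[str]]:
    """Toy SQL thread: apply whole groups from pos; on a duplicate key, stop and
    leave pos at the first event of the failing group."""
    while pos < len(events):
        start = pos
        pending: Dict[int, str] = {}
        for kind, row in events[start:]:
            pos += 1
            if kind == "Write_rows" and row is not None:
                rid, name = row
                if rid in table or rid in pending:
                    return start, f"Duplicate entry '{rid}' (1062)"
                pending[rid] = name
            if kind in GROUP_END:
                break
        table.update(pending)
    return pos, None


def autocommit_group(row: Row) -> List[Event]:
    """One MyISAM insert: its own group of five events."""
    return [("Anonymous_GTID", None), ("Query(BEGIN)", None), ("Table_map", None),
            ("Write_rows", row), ("Query(COMMIT)", None)]


slave = {1: "s1062-1"}
binlog_33 = autocommit_group((1, "m1062-1")) + autocommit_group((2, "m1062-2"))

pos, err = apply(binlog_33, slave, 0)    # stops on id=1, at the start of group 1
pos = skip(binlog_33, pos, 1)            # skips group 1 only, so pos becomes 5
pos, err = apply(binlog_33, slave, pos)  # group 2 applies normally
print(slave)                             # {1: 's1062-1', 2: 'm1062-2'}
```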

```
[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=2284
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 2284
#171201 10:30:31 server id 1323306 end_log_pos 2349 CRC32 0x5d208979 Anonymous_GTID last_committed=6 sequence_number=7 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 2349
#171201 10:30:31 server id 1323306 end_log_pos 2426 CRC32 0xe4ce4da8 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512095431/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
==================== Start: repl_myisam row id=1 ====================
BEGIN
/*!*/;
# at 2426
#171201 10:30:31 server id 1323306 end_log_pos 2491 CRC32 0x76a45e15 Table_map: `replcrash`.`repl_myisam` mapped to number 261
# at 2491
#171201 10:30:31 server id 1323306 end_log_pos 2547 CRC32 0xd187097a Write_rows: table id 261 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
### @1=1
### @2='m1062-1'
### @3='m1062-1'
# at 2547
#171201 10:30:31 server id 1323306 end_log_pos 2625 CRC32 0xc8210551 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512095431/*!*/;
COMMIT
/*!*/;
# at 2625
==================== End: repl_myisam row id=1 ====================
#171201 10:30:44 server id 1323306 end_log_pos 2690 CRC32 0x22b268fd Anonymous_GTID last_committed=7 sequence_number=8 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 2690
#171201 10:30:44 server id 1323306 end_log_pos 2767 CRC32 0x43061ce5 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512095444/*!*/;
==================== Start: repl_myisam row id=2 ====================
BEGIN
/*!*/;
# at 2767
#171201 10:30:44 server id 1323306 end_log_pos 2832 CRC32 0xe1c084b9 Table_map: `replcrash`.`repl_myisam` mapped to number 261
# at 2832
#171201 10:30:44 server id 1323306 end_log_pos 2888 CRC32 0x56bacb73 Write_rows: table id 261 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
### @1=2
### @2='m1062-2'
### @3='m1062-2'
# at 2888
#171201 10:30:44 server id 1323306 end_log_pos 2966 CRC32 0x6527c3b6 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512095444/*!*/;
COMMIT
/*!*/;
==================== End: repl_myisam row id=2 ====================
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST1 logs]#
```
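3.4 One Transaction Touching Both Transactional and Nontransactional Tables
For convenience, I reset the tables to their initial state: both tables are empty on the master, and each table on the slave holds an id=1 row.
Insert data into the transactional and nontransactional tables on the master.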

```
# Insert data into the transactional and nontransactional tables on the master
mydba@192.168.85.132,3306 [replcrash]> begin;
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> commit;
mydba@192.168.85.132,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
mydba@192.168.85.132,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
```
From the analysis above, we know that both repl_innodb and repl_myisam on the slave will hit key conflicts (error 1062).
Let's try to skip the errors with sql_slave_skip_counter. In a real 1062 key conflict, we should instead use the Duplicate entry message to delete the corresponding row on the slave.

```
# Skip "one" error on the slave and start the sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
+----+---------+---------+
mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
```
On the slave, repl_innodb has not changed yet. repl_myisam skipped the id=1 row and replicated the id=2 row.
Note: the error skipped this time is the Duplicate entry error on replcrash.repl_myisam. For a nontransactional table, one SQL statement is one event group, so the SQL thread resumes at the next event group, which is the repl_myisam.id=2 statement. That is why repl_myisam on the slave has the id=2 row.
Replication immediately reports another key conflict (error 1062), this time on the repl_innodb.id=1 key, so we skip again.

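```
# Skip "one" error on the slave and start the sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
+----+---------+---------+
mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
```
On the slave, repl_innodb skipped the id=1 row and the id=2 row as well. repl_myisam did not change this time.
On the master, the statements clearly inserted into repl_innodb first and into repl_myisam second. Why, then, did sql_slave_skip_counter skip the error on repl_myisam first and the one on repl_innodb second?
The answer lies in the difference between transactional and nontransactional tables. The master operated on a transactional table and a nontransactional table inside one explicit transaction. In reality, only the operations on the transactional table belong to that explicit transaction; each SQL statement on the nontransactional table is a transaction of its own. So the master's operations can be read as follows:
Start explicit transaction 1 and write rows id=1 and id=2 to repl_innodb -> start transaction 2, write row id=1 to repl_myisam, commit transaction 2 -> start transaction 3, write row id=2 to repl_myisam, commit transaction 3 -> commit explicit transaction 1
When transaction 2 commits, the slave reports the Duplicate entry error on repl_myisam, and we skip it. When transaction 3 commits, the slave writes the repl_myisam.id=2 row. When transaction 1 commits, the slave reports the Duplicate entry error on repl_innodb. We skip it again, and replication is back to normal.
Let's look at the corresponding binlog.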
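The ordering explained above can be sketched as a toy model. The assumption behind it comes from the binlog of this test, not from MySQL internals. Changes to the InnoDB table wait in a per-transaction cache and reach the binlog as one group at commit. Each MyISAM statement is written as its own group as soon as it runs. The `binlog_groups` function and the (engine, table, id) tuples are hypothetical.

```python
from typing import List, Tuple

Op = Tuple[str, str, int]  # (engine, table, id) for one insert inside the explicit transaction


def binlog_groups(ops: List[Op]) -> List[List[str]]:
    """Toy model: InnoDB rows wait in a transaction cache until COMMIT,
    while each MyISAM statement is written immediately as its own group."""
    trx_cache: List[str] = []
    binlog: List[List[str]] = []
    for engine, table, rid in ops:
        change = f"{table}.id={rid}"
        if engine == "innodb":
            trx_cache.append(change)   # held back until the explicit COMMIT
        else:
            binlog.append([change])    # its own group, written right away
    if trx_cache:
        binlog.append(trx_cache)       # the explicit transaction is written at COMMIT
    return binlog


# Statement order on the master in 3.4.
master_trx = [("innodb", "repl_innodb", 1), ("innodb", "repl_innodb", 2),
              ("myisam", "repl_myisam", 1), ("myisam", "repl_myisam", 2)]
for group in binlog_groups(master_trx):
    print(group)
# ['repl_myisam.id=1']
# ['repl_myisam.id=2']
# ['repl_innodb.id=1', 'repl_innodb.id=2']
```

The printed order matches the order of the groups in the binlog for this test. The slave therefore meets the repl_myisam conflict before the repl_innodb one.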

```
[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=2966
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 2966
#171201 10:54:02 server id 1323306 end_log_pos 3031 CRC32 0x9a009a72 Anonymous_GTID last_committed=8 sequence_number=9 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 3031
#171201 10:54:02 server id 1323306 end_log_pos 3108 CRC32 0x9738837a Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512096842/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
==================== Start: repl_myisam row id=1 ====================
BEGIN
/*!*/;
# at 3108
#171201 10:54:02 server id 1323306 end_log_pos 3173 CRC32 0x8c4283c5 Table_map: `replcrash`.`repl_myisam` mapped to number 265
# at 3173
#171201 10:54:02 server id 1323306 end_log_pos 3229 CRC32 0xd8953aae Write_rows: table id 265 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
### @1=1
### @2='m1062-1'
### @3='m1062-1'
# at 3229
#171201 10:54:02 server id 1323306 end_log_pos 3307 CRC32 0x218fdb23 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512096842/*!*/;
COMMIT
/*!*/;
# at 3307
==================== End: repl_myisam row id=1 ====================
#171201 10:54:36 server id 1323306 end_log_pos 3372 CRC32 0x0ba119ad Anonymous_GTID last_committed=9 sequence_number=10 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 3372
#171201 10:54:36 server id 1323306 end_log_pos 3449 CRC32 0x9bcdeee5 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512096876/*!*/;
==================== Start: repl_myisam row id=2 ====================
BEGIN
/*!*/;
# at 3449
#171201 10:54:36 server id 1323306 end_log_pos 3514 CRC32 0xd52491e6 Table_map: `replcrash`.`repl_myisam` mapped to number 265
# at 3514
#171201 10:54:36 server id 1323306 end_log_pos 3570 CRC32 0x23bcd75d Write_rows: table id 265 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
### @1=2
### @2='m1062-2'
### @3='m1062-2'
# at 3570
#171201 10:54:36 server id 1323306 end_log_pos 3648 CRC32 0x3ba9a1a1 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512096876/*!*/;
COMMIT
/*!*/;
# at 3648
==================== End: repl_myisam row id=2 ====================
#171201 10:54:41 server id 1323306 end_log_pos 3713 CRC32 0x122cdb79 Anonymous_GTID last_committed=10 sequence_number=11 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 3713
#171201 10:53:22 server id 1323306 end_log_pos 3790 CRC32 0x68d45d7b Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1512096802/*!*/;
==================== Start: repl_innodb rows id=1 and id=2, written in one transaction ====================
BEGIN
/*!*/;
# at 3790
#171201 10:53:22 server id 1323306 end_log_pos 3855 CRC32 0xf4359a8d Table_map: `replcrash`.`repl_innodb` mapped to number 264
# at 3855
#171201 10:53:22 server id 1323306 end_log_pos 3911 CRC32 0x9975aac8 Write_rows: table id 264 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
### @1=1
### @2='m1062-1'
### @3='m1062-1'
# at 3911
#171201 10:53:30 server id 1323306 end_log_pos 3976 CRC32 0xc5ac7f71 Table_map: `replcrash`.`repl_innodb` mapped to number 264
# at 3976
#171201 10:53:30 server id 1323306 end_log_pos 4032 CRC32 0x1ad72c78 Write_rows: table id 264 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
### @1=2
### @2='m1062-2'
### @3='m1062-2'
# at 4032
#171201 10:54:41 server id 1323306 end_log_pos 4063 CRC32 0x4f265b37 Xid = 83
COMMIT/*!*/;
==================== End: repl_innodb rows id=1 and id=2, written in one transaction ====================
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST1 logs]#
```
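The binlog matches our analysis. It also shows indirectly that the binlog is written in transaction commit order, whereas the redo log is written in the order the changes happen.
3.5 How Large Should N Be?
```
# Insert data into the nontransactional table on the master
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');
```
When skipping errors, we habitually set N to 1. Without quite noticing, we come to believe that 1 means "skip one error". Whether that error is an event, an event group, a SQL statement or a transaction, it gets skipped either way.
According to the official description, this insert on the nontransactional table corresponds to one event group. That group actually contains several events, not just one!
With set global sql_slave_skip_counter=1, one event is skipped. Because that leaves the SQL thread inside the event group, it keeps skipping the remaining events of the group! Judging only by what you see, you could easily believe the statement is a single event in the binlog (⊙_⊙)
You can verify that the statement corresponds to more than one event as follows: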

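```
# Set sql_slave_skip_counter on the slave
mydba@192.168.85.133,3306 [replcrash]> stop slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=100;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
# Insert data into the nontransactional table on the master (the master table is empty; the slave has an id=1 row)
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');
# Check Skip_Counter on the slave
mydba@192.168.85.133,3306 [replcrash]> pager grep Skip_Counter;
mydba@192.168.85.133,3306 [replcrash]> show slave status\G
Skip_Counter: 95
```
We set the slave to skip 100 events, ran the statement on the master, and then checked the Skip_Counter column in show slave status\G on the slave. The counter did not drop from 100 to 99; it dropped to 95, so this single statement accounted for 100 - 95 = 5 events. This agrees with the 3.3 binlog, where each repl_myisam statement consists of Anonymous_GTID, Query (BEGIN), Table_map, Write_rows and Query (COMMIT).
So don't assume that in 3.4, set global sql_slave_skip_counter=3; would skip the three SQL statements for repl_myisam.id=1, repl_myisam.id=2 and repl_innodb.id=1.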
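Both numbers in this section can be checked with a toy version of the manual's rule. The `skip` helper is hypothetical, and the event lists copy only the event type names from the 3.3 and 3.4 binlogs. A single MyISAM insert is five events, so a counter of 100 is left at 95. N=3 stops in the middle of the first repl_myisam group, so only that group is skipped.

```python
from typing import List, Tuple

GROUP_END = {"Xid", "Query(COMMIT)"}  # event types that close an event group


def skip(events: List[str], n: int) -> Tuple[int, int]:
    """Toy sql_slave_skip_counter starting at the first event.

    Returns (number of events skipped, counter left over)."""
    pos = 0
    while n > 0 and pos < len(events):
        pos, n = pos + 1, n - 1
    # Finish the group the last skipped event belongs to.
    while 0 < pos < len(events) and events[pos - 1] not in GROUP_END:
        pos += 1
    return pos, n


# One autocommitted repl_myisam insert, event types as in the 3.3 binlog.
myisam_stmt = ["Anonymous_GTID", "Query(BEGIN)", "Table_map", "Write_rows", "Query(COMMIT)"]
# The repl_innodb transaction with two rows, as in the 3.4 binlog.
innodb_trx = ["Anonymous_GTID", "Query(BEGIN)", "Table_map", "Write_rows",
              "Table_map", "Write_rows", "Xid"]
# 3.4 binlog order: repl_myisam id=1, repl_myisam id=2, then repl_innodb.
binlog_34 = myisam_stmt + myisam_stmt + innodb_trx

print(skip(myisam_stmt, 100))  # (5, 95): consistent with Skip_Counter: 95
print(skip(binlog_34, 3))      # (5, 0): only the repl_myisam.id=1 group is skipped
```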
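4. Summary
After writing all this, I feel I am playing with words too. sql_slave_skip_counter skips in units of events. It stops only after reaching the end of the event group that contains the N-th event. For transactional tables, an event group corresponds to a transaction; for nontransactional tables, it corresponds to a single SQL statement. An event group contains multiple events.
When a delete cannot find its row on the slave (error 1032), sql_slave_skip_counter may look like the easy way out. However, it is very likely to skip other events as well and leave the master and slave inconsistent. For 1032 and 1062 errors, repair the data wherever possible and let the replication thread apply the changes on the slave.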
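To repair data instead of skipping, a small helper can turn a 1062 error into the DELETE that removes the conflicting slave row. This is a hypothetical sketch with several assumptions:

- The message shape it parses (`on table db.tbl; Duplicate entry 'k' for key 'PRIMARY'`) is modeled on MySQL 5.7 row-based replication errors.
- The sample string is hand-written, not captured from this test.
- The table has a single-column integer primary key named `id`, as repl_innodb and repl_myisam do.

Review the generated statement before running it on the slave.

```python
import re
from typing import Optional

# Assumed shape of a 1062 Last_SQL_Error on a MySQL 5.7 row-based slave.
DUP_RE = re.compile(
    r"on table (?P<table>[\w$]+\.[\w$]+); "
    r"Duplicate entry '(?P<key>[^']*)' for key 'PRIMARY'"
)


def fix_for_duplicate(last_sql_error: str, pk_column: str = "id") -> Optional[str]:
    """Hypothetical helper: build the DELETE that removes the conflicting slave row.

    Assumes a single-column integer primary key named pk_column; returns None
    when the message does not match the assumed format.
    """
    m = DUP_RE.search(last_sql_error)
    if m is None or not m.group("key").isdigit():
        return None
    return f"DELETE FROM {m.group('table')} WHERE {pk_column} = {int(m.group('key'))};"


# Hand-written sample in the assumed format (not output captured from this test).
sample = ("Could not execute Write_rows event on table replcrash.repl_innodb; "
          "Duplicate entry '1' for key 'PRIMARY', Error_code: 1062")
print(fix_for_duplicate(sample))  # DELETE FROM replcrash.repl_innodb WHERE id = 1;
```

To use it, run the generated DELETE on the slave, for example between set sql_log_bin=0 and set sql_log_bin=1 as in 3.1. Then run start slave sql_thread, so the master's row is applied instead of skipped. A 1032 message does not name the missing key in this assumed format, so the helper deliberately returns None for it.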