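Several people in the group chat have asked why pt-table-checksum 3.0.4 fails to detect a table whose data differs between master and slave. I tested this a while ago but never wrote it up ^_-
1. Basic environment
VMware 10.0 + CentOS 6.9 + MySQL 5.7.19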
ROLE | HOSTNAME | BASEDIR | DATADIR | IP | PORT |
Master | ZST1 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.132 | 3306 |
Slave | ZST2 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.133 | 3306 |
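A one-master, one-slave replication topology built on ROW binlog format + GTID: Master -> Slave
2. Constructing inconsistent data
The test uses the sakila sample database.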

# Master: flush logs
mydba@192.168.85.132,3306 [sakila]> flush logs;
# Master: enable the general log
[root@ZST1 ~]# rm -rf /data/mysql/mysql3306/data/mysql-general.log
mydba@192.168.85.132,3306 [sakila]> set global general_log_file='/data/mysql/mysql3306/data/mysql-general.log';
mydba@192.168.85.132,3306 [sakila]> set global general_log =1;
mydba@192.168.85.132,3306 [sakila]> show variables like 'general_log%';
# Slave: modify some rows to make the data inconsistent
mydba@192.168.85.133,3306 [sakila]> delete from sakila.actor where actor_id<=3;
# the delete fails because of a foreign key constraint
mydba@192.168.85.133,3306 [sakila]> update sakila.actor set last_name=first_name where actor_id<=3;
# sakila.actor data on the master
mydba@192.168.85.132,3306 [sakila]> select * from sakila.actor limit 3;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|        1 | PENELOPE   | GUINESS   | 2006-02-15 04:34:33 |
|        2 | NICK       | WAHLBERG  | 2006-02-15 04:34:33 |
|        3 | ED         | CHASE     | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
# sakila.actor data on the slave
mydba@192.168.85.133,3306 [sakila]> select * from sakila.actor limit 3;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|        1 | PENELOPE   | PENELOPE  | 2017-11-08 09:54:10 |
|        2 | NICK       | NICK      | 2017-11-08 09:54:10 |
|        3 | ED         | ED        | 2017-11-08 09:54:10 |
+----------+------------+-----------+---------------------+
Modifying some rows on the slave leaves master and slave inconsistent. last_update changed as well, because sakila defines that column with ON UPDATE CURRENT_TIMESTAMP.
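3. pt-table-checksum
3.1 Checking whether the data is consistent
pt-table-checksum can run on any machine that can connect to the master. I ran it on the slave host, with the connection options at the end pointing at the master.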

# Run pt-table-checksum
[root@ZST2 ~]# pt-table-checksum --nocheck-binlog-format --nocheck-replication-filters --recursion-method=hosts --replicate=sakila.checksums --databases=sakila --tables=actor,city --host=192.168.85.132 --port=3306 --user=mydba --password=mysql5719
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
11-08T09:57:01      0      0      200       1       0   0.075 sakila.actor
11-08T09:57:01      0      0      600       1       0   0.034 sakila.city
[root@ZST2 ~]#
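DIFFS=0 means no differences were found. Master and slave really are inconsistent, but the tool did not detect it.
Copy the general log and binlog generated on the master to another directory for later analysis.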

# Copy the general log and binlog files
[root@ZST1 ~]# cp /data/mysql/mysql3306/data/mysql-general.log /data/backup/mysql-general.log.ptchecksum3306
[root@ZST1 ~]# cp /data/mysql/mysql3306/logs/mysql-bin.000083 /data/backup/mysql-bin.000083.ptchecksum3306
3.2 Examining the general log

[root@ZST1 ~]# cat /data/backup/mysql-general.log.ptchecksum3306
/usr/local/mysql/bin/mysqld, Version: 5.7.19-log (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /tmp/mysql3306.sock
Time Id Command Argument
2017-11-08T01:57:01.750917Z 20 Connect mydba@192.168.85.133 on using TCP/IP
2017-11-08T01:57:01.751564Z 20 Query set autocommit=1
2017-11-08T01:57:01.752220Z 20 Query SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout'
2017-11-08T01:57:01.757028Z 20 Query SET SESSION innodb_lock_wait_timeout=1
2017-11-08T01:57:01.757521Z 20 Query SHOW VARIABLES LIKE 'wait\_timeout'
2017-11-08T01:57:01.760950Z 20 Query SET SESSION wait_timeout=10000
2017-11-08T01:57:01.761400Z 20 Query SELECT @@SQL_MODE
2017-11-08T01:57:01.761772Z 20 Query SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
2017-11-08T01:57:01.762140Z 20 Query SELECT @@server_id /*!50038 , @@hostname*/
2017-11-08T01:57:01.762475Z 20 Query SELECT @@SQL_MODE
2017-11-08T01:57:01.762772Z 20 Query SET SQL_MODE=',NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
2017-11-08T01:57:01.763098Z 20 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2017-11-08T01:57:01.763504Z 20 Query SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.766949Z 20 Query SELECT @@SERVER_ID
2017-11-08T01:57:01.767470Z 20 Query SHOW SLAVE HOSTS
2017-11-08T01:57:01.787329Z 20 Query SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.790712Z 20 Query SELECT @@SERVER_ID
2017-11-08T01:57:01.794388Z 20 Query SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.797637Z 20 Query SELECT @@SERVER_ID
2017-11-08T01:57:01.801356Z 20 Query SHOW DATABASES LIKE 'sakila'
2017-11-08T01:57:01.802164Z 20 Query CREATE DATABASE IF NOT EXISTS `sakila` /* pt-table-checksum */
2017-11-08T01:57:01.802951Z 20 Query USE `sakila`
2017-11-08T01:57:01.803300Z 20 Query SHOW TABLES FROM `sakila` LIKE 'checksums'
2017-11-08T01:57:01.806111Z 20 Query CREATE TABLE IF NOT EXISTS `sakila`.`checksums` ( db CHAR(64) NOT NULL, tbl CHAR(64) NOT NULL, chunk INT NOT NULL, chunk_time FLOAT NULL, chunk_index VARCHAR(200) NULL, lower_boundary TEXT NULL, upper_boundary TEXT NULL, this_crc CHAR(40) NOT NULL, this_cnt INT NOT NULL, master_crc CHAR(40) NULL, master_cnt INT NULL, ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (db, tbl, chunk), INDEX ts_db_tbl (ts, db, tbl) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
2017-11-08T01:57:01.825908Z 20 Query SHOW GLOBAL STATUS LIKE 'Threads_running'
2017-11-08T01:57:01.828793Z 20 Query SELECT CONCAT(@@hostname, @@port)
2017-11-08T01:57:01.844001Z 20 Query SELECT CRC32('test-string')
2017-11-08T01:57:01.844518Z 20 Query SELECT CRC32('a')
2017-11-08T01:57:01.845025Z 20 Query SELECT CRC32('a')
2017-11-08T01:57:01.845517Z 20 Query SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.849157Z 20 Query SHOW DATABASES
2017-11-08T01:57:01.850038Z 20 Query SHOW /*!50002 FULL*/ TABLES FROM `sakila`
2017-11-08T01:57:01.851486Z 20 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
2017-11-08T01:57:01.851943Z 20 Query USE `sakila`
2017-11-08T01:57:01.852408Z 20 Query SHOW CREATE TABLE `sakila`.`actor`
2017-11-08T01:57:01.853034Z 20 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
2017-11-08T01:57:01.854092Z 20 Query EXPLAIN SELECT * FROM `sakila`.`actor` WHERE 1=1
2017-11-08T01:57:01.857374Z 20 Query USE `sakila`
2017-11-08T01:57:01.857990Z 20 Query DELETE FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor'
2017-11-08T01:57:01.877626Z 20 Query USE `sakila`
2017-11-08T01:57:01.878413Z 20 Query EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`, convert(`first_name` using utf8mb4), convert(`last_name` using utf8mb4), UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`actor` /*explain checksum table*/
2017-11-08T01:57:01.879347Z 20 Query REPLACE INTO `sakila`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sakila', 'actor', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`, convert(`first_name` using utf8mb4), convert(`last_name` using utf8mb4), UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`actor` /*checksum table*/
2017-11-08T01:57:01.881166Z 20 Query SHOW WARNINGS
2017-11-08T01:57:01.881764Z 20 Query SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
2017-11-08T01:57:01.897051Z 20 Query UPDATE `sakila`.`checksums` SET chunk_time = '0.001821', master_crc = '6816983c', master_cnt = '200' WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
2017-11-08T01:57:01.900914Z 20 Query SHOW GLOBAL STATUS LIKE 'Threads_running'
2017-11-08T01:57:01.930534Z 20 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
2017-11-08T01:57:01.931387Z 20 Query USE `sakila`
2017-11-08T01:57:01.932194Z 20 Query SHOW CREATE TABLE `sakila`.`city`
2017-11-08T01:57:01.933399Z 20 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
2017-11-08T01:57:01.935136Z 20 Query EXPLAIN SELECT * FROM `sakila`.`city` WHERE 1=1
2017-11-08T01:57:01.940169Z 20 Query USE `sakila`
2017-11-08T01:57:01.941026Z 20 Query DELETE FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'city'
2017-11-08T01:57:01.942010Z 20 Query USE `sakila`
2017-11-08T01:57:01.943012Z 20 Query EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `city_id`, convert(`city` using utf8mb4), `country_id`, UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`city` /*explain checksum table*/
2017-11-08T01:57:01.945033Z 20 Query REPLACE INTO `sakila`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sakila', 'city', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `city_id`, convert(`city` using utf8mb4), `country_id`, UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`city` /*checksum table*/
2017-11-08T01:57:01.960088Z 20 Query SHOW WARNINGS
2017-11-08T01:57:01.960938Z 20 Query SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'city' AND chunk = '1'
2017-11-08T01:57:01.961674Z 20 Query UPDATE `sakila`.`checksums` SET chunk_time = '0.015889', master_crc = '4d700c4', master_cnt = '600' WHERE db = 'sakila' AND tbl = 'city' AND chunk = '1'
2017-11-08T01:57:01.964712Z 20 Query SHOW GLOBAL STATUS LIKE 'Threads_running'
2017-11-08T01:57:01.972503Z 20 Quit
[root@ZST1 ~]#
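What the general log shows:
• Set the session options
• Create the checksums table
• For each table to be checked, do the following:
  • DELETE: remove any existing rows for that table (sakila.actor, then sakila.city) from the checksums table
  • EXPLAIN: check the execution plan of the query that computes this_cnt and this_crc
  • REPLACE INTO: compute this_cnt and this_crc for the table
  • UPDATE: read this_cnt and this_crc back, then write those values into master_cnt and master_crc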
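All of the actual checksum work happens in the REPLACE INTO step. The Python sketch below models the per-chunk expression from the general log: COUNT(*) plus BIT_XOR(CRC32(CONCAT_WS('#', ...))), printed as lowercase hex. It shows why a change in any single column changes the chunk's crc. This is a model, not pt-table-checksum's code. The rows are hypothetical tuples. The last_update values are placeholder integers standing in for UNIX_TIMESTAMP(), whose real result depends on the session time zone. MySQL's CRC32() is assumed to produce the same CRC-32 as Python's zlib.crc32().

```python
import zlib
from functools import reduce

def concat_ws(sep, *values):
    """Model of MySQL CONCAT_WS(): NULL (None) arguments are skipped, others are stringified."""
    return sep.join(str(v) for v in values if v is not None)

def chunk_checksum(rows):
    """Model of COUNT(*) and
    COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', ...)) AS UNSIGNED)), 10, 16)), 0)."""
    # Assumption: MySQL CRC32() matches zlib.crc32() (unsigned 32-bit result)
    crcs = [zlib.crc32(concat_ws('#', *row).encode('utf-8')) for row in rows]
    # BIT_XOR is order-independent; over zero rows it yields 0
    xor_all = reduce(lambda a, b: a ^ b, crcs, 0)
    # CONV(x, 10, 16) gives hex without leading zeros; LOWER() lowercases it
    return len(rows), format(xor_all, 'x')

# Hypothetical rows: (actor_id, first_name, last_name, UNIX_TIMESTAMP(last_update)).
# The timestamp integers are placeholders, not values read from the servers.
master_rows = [
    (1, 'PENELOPE', 'GUINESS', 1139949273),
    (2, 'NICK', 'WAHLBERG', 1139949273),
    (3, 'ED', 'CHASE', 1139949273),
]
slave_rows = [
    (1, 'PENELOPE', 'PENELOPE', 1510105850),
    (2, 'NICK', 'NICK', 1510105850),
    (3, 'ED', 'ED', 1510105850),
]

print('master:', chunk_checksum(master_rows))
print('slave: ', chunk_checksum(slave_rows))
```

XOR is commutative, so row order does not matter. Barring a 32-bit collision, one changed column changes that row's CRC32 and therefore the chunk's crc. The catch is that this comparison is only useful if each server computes the value on its own data.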
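On the master all of these run as ordinary SQL statements, and at no point does the session set binlog_format to STATEMENT. Because the master runs with binlog_format='ROW', the binlog records the resulting row changes (the concrete values) rather than the SQL statements themselves.
3.3 Examining the binlog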

[root@ZST1 ~]# mysqlbinlog -v --base64-output=decode-rows /data/backup/mysql-bin.000083.ptchecksum3306
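The binlog shows the checksums table being created, followed by DELETE -> INSERT -> UPDATE row events on checksums that carry the concrete values.
The last two statements the master ran for sakila.actor: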

SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
UPDATE `sakila`.`checksums` SET chunk_time = '0.001821', master_crc = '6816983c', master_cnt = '200' WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
The UPDATE appears in the binlog as the following row event, which the slave applies unchanged:

[root@ZST1 ~]# mysqlbinlog -v --base64-output=decode-rows /data/backup/mysql-bin.000083.ptchecksum3306
...
COMMIT/*!*/;
# at 1604
#171108 9:57:01 server id 1323306 end_log_pos 1669 CRC32 0x16bf0702 GTID last_committed=3 sequence_number=4 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '8ab82362-9c37-11e7-a858-000c29c1025c:575'/*!*/;
# at 1669
#171108 9:57:01 server id 1323306 end_log_pos 1743 CRC32 0xee6b7639 Query thread_id=20 exec_time=0 error_code=0
SET TIMESTAMP=1510106221/*!*/;
BEGIN
/*!*/;
# at 1743
#171108 9:57:01 server id 1323306 end_log_pos 1823 CRC32 0x589cc01f Table_map: `sakila`.`checksums` mapped to number 248
# at 1823
#171108 9:57:01 server id 1323306 end_log_pos 1950 CRC32 0xc0604f63 Update_rows: table id 248 flags: STMT_END_F
### UPDATE `sakila`.`checksums`
### WHERE
### @1='sakila'
### @2='actor'
### @3=1
### @4=NULL
### @5=NULL
### @6=NULL
### @7=NULL
### @8='6816983c'
### @9=200
### @10=NULL
### @11=NULL
### @12=1510106221
### SET
### @1='sakila'
### @2='actor'
### @3=1
### @4=0.001821
### @5=NULL
### @6=NULL
### @7=NULL
### @8='6816983c'
### @9=200
### @10='6816983c'
### @11=200
### @12=1510106221
# at 1950
#171108 9:57:01 server id 1323306 end_log_pos 1981 CRC32 0x1f197fe6 Xid = 198
COMMIT/*!*/;
# at 1981
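In other words, the slave never computes CRC32 itself. It copies the master's checksums rows verbatim, including this_crc and this_cnt. On the slave, this_crc/this_cnt therefore always equal master_crc/master_cnt, so DIFFS is always 0.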
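Row events identify columns only by position (@1, @2, ...). The small Python helper below labels those positions using the column order from the CREATE TABLE for checksums in the general log. The before and after images are copied by hand from the Update_rows event above, with NULL written as None. The output makes explicit that the event only fills in chunk_time, master_crc and master_cnt. The master_* values are the master's own this_* values, and the slave applies them verbatim.

```python
# Column order of sakila.checksums, taken from the CREATE TABLE in the general log
CHECKSUMS_COLUMNS = [
    "db", "tbl", "chunk", "chunk_time", "chunk_index", "lower_boundary",
    "upper_boundary", "this_crc", "this_cnt", "master_crc", "master_cnt", "ts",
]

def label_row_image(values):
    """Map a positional row image (@1..@12) to column names."""
    return dict(zip(CHECKSUMS_COLUMNS, values))

def changed_columns(before, after):
    """Return {column: (old, new)} for every column whose value differs."""
    b, a = label_row_image(before), label_row_image(after)
    return {c: (b[c], a[c]) for c in CHECKSUMS_COLUMNS if b[c] != a[c]}

# Before/after images from the Update_rows event (NULL -> None)
before = ['sakila', 'actor', 1, None, None, None, None, '6816983c', 200, None, None, 1510106221]
after = ['sakila', 'actor', 1, 0.001821, None, None, None, '6816983c', 200, '6816983c', 200, 1510106221]

for col, (old, new) in changed_columns(before, after).items():
    print(f"{col}: {old!r} -> {new!r}")
```

Run against the event above, this prints the three changed columns. The after image also shows this_crc equal to master_crc, which is exactly what the slave ends up storing.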
3.4 How to fix it
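In my view this check can only work with statement-based logging. Each server must compute CRC32 on its own data, the master's master_crc and master_cnt values are then replicated to the slave, and finally the master_* and this_* columns are compared on the slave. pt-table-checksum 3.0.4 does not run SET @@binlog_format='STATEMENT' in its session (no such statement appears in the general log above), so I recommend not using this version.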
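To make the difference between the two formats concrete, here is a toy Python model of what ends up in the slave's checksums row under each binlog format. It rests on simplifying assumptions. There is one chunk per table. Replication is reduced to "copy the row image" (ROW) or "re-run the statement against local data" (STATEMENT). The checksum is reduced to the CRC32/BIT_XOR expression from the general log. The function names are hypothetical, and this is not how MySQL or pt-table-checksum is implemented internally.

```python
import zlib
from functools import reduce

def checksum(rows):
    """COUNT(*) and BIT_XOR(CRC32(CONCAT_WS('#', ...))) as lowercase hex (NULLs skipped)."""
    crcs = [zlib.crc32('#'.join(str(v) for v in r if v is not None).encode('utf-8')) for r in rows]
    return len(rows), format(reduce(lambda a, b: a ^ b, crcs, 0), 'x')

def slave_checksum_row(master_rows, slave_rows, binlog_format):
    """Return the slave's checksums row after REPLACE INTO ... SELECT and the UPDATE."""
    # Master: REPLACE INTO ... SELECT computes this_cnt/this_crc from master data
    m_cnt, m_crc = checksum(master_rows)
    if binlog_format == "ROW":
        # The row image is replicated: the slave stores the master's computed values
        s_cnt, s_crc = m_cnt, m_crc
    elif binlog_format == "STATEMENT":
        # The statement is re-executed on the slave, against the slave's own data
        s_cnt, s_crc = checksum(slave_rows)
    else:
        raise ValueError(f"unsupported binlog_format: {binlog_format}")
    # UPDATE ... SET master_crc = '<literal>', master_cnt = '<literal>':
    # the literals are the master's this_* values under either format
    return {"this_cnt": s_cnt, "this_crc": s_crc, "master_cnt": m_cnt, "master_crc": m_crc}

def has_diff(row):
    return row["this_cnt"] != row["master_cnt"] or row["this_crc"] != row["master_crc"]

# Hypothetical data mirroring the test: last_name overwritten on the slave
master = [(1, 'PENELOPE', 'GUINESS'), (2, 'NICK', 'WAHLBERG'), (3, 'ED', 'CHASE')]
slave = [(1, 'PENELOPE', 'PENELOPE'), (2, 'NICK', 'NICK'), (3, 'ED', 'ED')]

for fmt in ("ROW", "STATEMENT"):
    print(fmt, "DIFFS =", int(has_diff(slave_checksum_row(master, slave, fmt))))
```

Under ROW the comparison is master-versus-master and can never report a difference. Under STATEMENT the slave's this_* columns come from its own rows, so the modified actor rows show up.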
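There is a crude workaround, good only for viewing the differences (do not use it in production): before running pt-table-checksum, run set global binlog_format='STATEMENT'; on the master. A global change only affects sessions opened after it, and the new connection pt-table-checksum makes is one of them.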

# Master: change binlog_format to STATEMENT
mydba@192.168.85.132,3306 [sakila]> set global binlog_format='STATEMENT';
# Run pt-table-checksum from the slave host
[root@ZST2 ~]# pt-table-checksum --nocheck-binlog-format --nocheck-replication-filters --recursion-method=hosts --replicate=sakila.checksums --databases=sakila --tables=actor,city --host=192.168.85.132 --port=3306 --user=mydba --password=mysql5719
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
11-08T12:40:27      0      1      200       1       0   0.015 sakila.actor
11-08T12:40:27      0      0      600       1       0   0.024 sakila.city
[root@ZST2 ~]#
DIFFS=1 now shows that sakila.actor contains differing data.
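When the check is scripted, the summary table can be scanned for rows with a non-zero DIFFS or ERRORS. Below is a minimal Python sketch. It assumes the whitespace-separated layout shown above and looks columns up by header name, so an extra column in another release would not break it. parse_checksum_report and tables_needing_attention are hypothetical helpers, not part of Percona Toolkit.

```python
def parse_checksum_report(text):
    """Parse pt-table-checksum's summary table into a list of dicts keyed by header name."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header_idx = next(i for i, cols in enumerate(lines) if cols[:3] == ["TS", "ERRORS", "DIFFS"])
    header = lines[header_idx]
    rows = []
    for cols in lines[header_idx + 1:]:
        # Skip anything that is not a table line (warnings, a shell prompt, ...)
        if len(cols) != len(header) or not cols[1].isdigit():
            continue
        row = dict(zip(header, cols))
        for key in ("ERRORS", "DIFFS", "ROWS", "CHUNKS", "SKIPPED"):
            if key in row:
                row[key] = int(row[key])
        rows.append(row)
    return rows

def tables_needing_attention(rows):
    """Tables with at least one differing chunk or an error."""
    return [r["TABLE"] for r in rows if r["DIFFS"] > 0 or r["ERRORS"] > 0]

report = """
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
11-08T12:40:27      0      1      200       1       0   0.015 sakila.actor
11-08T12:40:27      0      0      600       1       0   0.024 sakila.city
"""
print(tables_needing_attention(parse_checksum_report(report)))  # prints ['sakila.actor']
```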

# Difference details (on the slave)
mydba@192.168.85.133,3306 [sakila]> SELECT db,tbl,SUM(this_cnt) AS total_rows,COUNT(*) AS chunks FROM sakila.checksums WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db,tbl;
+--------+-------+------------+--------+
| db     | tbl   | total_rows | chunks |
+--------+-------+------------+--------+
| sakila | actor |        200 |      1 |
+--------+-------+------------+--------+
1 row in set (0.00 sec)
The key is comparing master_cnt with this_cnt, and master_crc with this_crc.
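The WHERE clause of that query relies on SQL's three-valued logic. master_crc <> this_crc evaluates to NULL (not true) when master_crc is NULL, which is why the extra ISNULL(master_crc) <> ISNULL(this_crc) term is there. Below is a Python sketch of the same filter and GROUP BY, with None standing for NULL. The sql_ne/sql_or helpers are hypothetical, and the sample checksums rows, including the "film" row, are made up for illustration.

```python
from collections import defaultdict

def sql_ne(a, b):
    """SQL '<>': NULL (None) if either side is NULL, otherwise a boolean."""
    if a is None or b is None:
        return None
    return a != b

def sql_or(*vals):
    """SQL OR: TRUE if any operand is TRUE, else NULL if any is NULL, else FALSE."""
    if any(v is True for v in vals):
        return True
    if any(v is None for v in vals):
        return None
    return False

def diff_summary(checksum_rows):
    """Mirror of SELECT db, tbl, SUM(this_cnt), COUNT(*) ... WHERE <diff> GROUP BY db, tbl."""
    groups = defaultdict(lambda: [0, 0])
    for r in checksum_rows:
        cond = sql_or(
            sql_ne(r["master_cnt"], r["this_cnt"]),
            sql_ne(r["master_crc"], r["this_crc"]),
            # ISNULL() returns 0/1, never NULL, so this term is always TRUE or FALSE
            sql_ne(r["master_crc"] is None, r["this_crc"] is None),
        )
        if cond is True:  # WHERE keeps a row only when the condition is TRUE
            g = groups[(r["db"], r["tbl"])]
            g[0] += r["this_cnt"]
            g[1] += 1
    return {k: tuple(v) for k, v in groups.items()}

# Hypothetical checksums rows as seen on the slave
sample_rows = [
    {"db": "sakila", "tbl": "actor", "this_cnt": 200, "this_crc": "1a2b3c4d", "master_cnt": 200, "master_crc": "6816983c"},
    {"db": "sakila", "tbl": "city", "this_cnt": 600, "this_crc": "4d700c4", "master_cnt": 600, "master_crc": "4d700c4"},
    # made-up chunk whose master_* values never arrived (still NULL)
    {"db": "sakila", "tbl": "film", "this_cnt": 10, "this_crc": "abc123", "master_cnt": None, "master_crc": None},
]
print(diff_summary(sample_rows))
```

Without the ISNULL term, the "film" row would evaluate to NULL and silently drop out of the result.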
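4. pt-table-sync
4.1 Repairing the inconsistent data
Now that the inconsistency between master and slave has been detected, the next step is to repair the data with pt-table-sync.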

# Print the statements
[root@ZST2 ~]# pt-table-sync --replicate=sakila.checksums --sync-to-master h=192.168.85.133,u=mydba,p=mysql5719,P=3306 --databases=sakila --charset=utf8 --print
REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('1', 'PENELOPE', 'GUINESS', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/;
REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('2', 'NICK', 'WAHLBERG', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/;
REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('3', 'ED', 'CHASE', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/;
[root@ZST2 ~]#
# Execute the statements
[root@ZST2 ~]# pt-table-sync --replicate=sakila.checksums --sync-to-master h=192.168.85.133,u=mydba,p=mysql5719,P=3306 --databases=sakila --charset=utf8 --execute
REPLACE statements on sakila.actor can adversely affect child table `sakila`.`film_actor` because it has an ON UPDATE CASCADE foreign key constraint. See --[no]check-child-tables in the documentation for more information. --check-child-tables error while doing sakila.actor on 192.168.85.133
[root@ZST2 ~]#
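--execute runs the statements printed by --print. REPLACE INTO is carried out as a DELETE followed by an INSERT, and film_actor references actor through a foreign key, so the DELETE would fail (just as the DELETE did while the test data was being set up). pt-table-sync's --[no]check-child-tables check caught this child table and refused to continue, so the repair did not succeed.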
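The underlying problem is REPLACE's semantics. MySQL documents REPLACE as deleting the old row with the same primary key before inserting the new one. sakila's film_actor still references those actor rows, so the delete half is what gets rejected. The toy Python model below uses hypothetical in-memory tables and a hypothetical ForeignKeyError, with a child table that behaves like ON DELETE RESTRICT. It shows the same failure the author hit with the manual DELETE, which is also the failure pt-table-sync's check is guarding against. It is an illustration, not how InnoDB enforces foreign keys.

```python
class ForeignKeyError(Exception):
    """Raised when a delete would orphan rows in an ON DELETE RESTRICT child table."""

class Table:
    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> row tuple
        self.children = []       # (child_table, fk_position) pairs, treated as ON DELETE RESTRICT

    def delete(self, pk):
        for child, pos in self.children:
            if any(r[pos] == pk for r in child.rows.values()):
                raise ForeignKeyError(f"cannot delete row {pk}: still referenced by a child row")
        del self.rows[pk]

    def insert(self, pk, row):
        if pk in self.rows:
            raise KeyError(f"duplicate primary key {pk}")
        self.rows[pk] = row

    def replace(self, pk, row):
        # REPLACE: if the key already exists, delete the old row first, then insert
        if pk in self.rows:
            self.delete(pk)
        self.insert(pk, row)

def try_replace(table, pk, row):
    """Attempt a REPLACE; report and return False if the foreign key blocks it."""
    try:
        table.replace(pk, row)
        return True
    except ForeignKeyError as exc:
        print("REPLACE failed:", exc)
        return False

actor = Table({1: (1, 'PENELOPE', 'PENELOPE'), 2: (2, 'NICK', 'NICK')})
film_actor = Table({(1, 1): (1, 1)})    # hypothetical child row (actor_id=1, film_id=1)
actor.children.append((film_actor, 0))  # film_actor.actor_id references actor.actor_id

try_replace(actor, 1, (1, 'PENELOPE', 'GUINESS'))  # rejected: actor 1 is still referenced
try_replace(actor, 2, (2, 'NICK', 'WAHLBERG'))     # succeeds only because this toy model has no child row for actor 2
```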
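For more detail on pt-table-checksum and pt-table-sync, see "pt-table-checksum解讀" (pt-table-checksum explained) and "使用pt-table-checksum及pt-table-sync校驗復制一致性" (Verifying replication consistency with pt-table-checksum and pt-table-sync).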