【MySQL】你以為設置了並行復制就降低延遲了?這個你絕對想不到!


在MySQL官方版本中,為了保證其的高可用性,一般情況我們會采用主從復制的方式來解決。當然,方法很多。而我們今天所要處理的是采用GTID方式並且開了多線程復制后,仍然延遲的情況,糟糕的是,延遲還在不斷擴大!

  • 環境概要
  • 序號 清單 說明
    1 系統 Redhat 6.x(4c,32g)
    2 數據庫 MySQL-5.7.25
    3 同步方式 基於GTID主從同步

  • 環境排查
  • 1)已經配置的重要參數:

    
    #relay for slave
    slave_parallel_type             = LOGICAL_CLOCK
    slave_parallel_workers          = 6
    master_info_repository          = TABLE
    relay_log_info_repository       = TABLE 
    relay_log_recovery              = on
    sync_relay_log                  = 10000
    

    注:此時沒有設置slave_preserve_commit_order參數。

    2)從庫延遲狀態查詢

    
    mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Queueing master event to the relay log
                      Master_Host: xxx.xxx.xxx.xxx
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.008978
              Read_Master_Log_Pos: 696914605
                   Relay_Log_File: DB41-relay-bin.001259 Relay_Log_Pos: 207377582 Relay_Master_Log_File: mysql-bin.008970 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: neteagle3 Replicate_Ignore_DB: mysql Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1068770059 Relay_Log_Space: 8425484286 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 187358 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 42 Master_UUID: eab7fcac-3cda-11e6-ada8-fa163e648db2 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: eab7fcac-3cda-11e6-ada8-fa163e648db2:58031191-59927276 Executed_Gtid_Set: eab7fcac-3cda-11e6-ada8-fa163e648db2:1-58080239:58080241 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) 
    簡單介紹一下幾個指標信息:

    1. Master_Log_File
    2. Read_Master_Log_Pos
    3. Seconds_Behind_Master
    4. Relay_Log_File
    5. Relay_Log_Pos
    6. Relay_Master_Log_File
    7. Exec_Master_Log_Pos

    Master_Log_File, Read_Master_Log_Pos
    這兩個參數是成對的,表示的是從庫IO thread傳輸主庫的binlog日志號及具體位置。

    Relay_Log_File, Relay_Log_Pos
    這兩個參數也是成對的,表示的是從庫sql thread應用中繼日志(relay log)號及具體位置。

    Relay_Master_Log_File, Relay_Log_Pos
    這兩個參數也是成對的,表示的是上一項中的中繼日志對應的主庫binlog日志及具體位置(有點繞)。

    Seconds_Behind_Master
    此參數可簡單理解為主從延遲時間,單位為秒。

    從上面這段MySQL從狀態信息中,我們可以看到,Seconds_Behind_Master: 187358這是從庫sql應用延遲主庫的時間為187358秒,轉換成天,大概兩天多。這說明,我們從庫復制的數據是兩天前的。

    3)驗證並行復制

    
    mysql> show full processlist;
     ---- ------------- -------------- ----------- ------------------ -------- --------------------------------------------------------------- ----------------------- 
    | Id | User        | Host         | db        | Command          | Time   | State                                                         | Info                  |
     ---- ------------- -------------- ----------- ------------------ -------- --------------------------------------------------------------- ----------------------- 
    |  1 | system user |              | NULL      | Connect          |  18204 | Waiting for master to send event                              | NULL                  |
    |  2 | system user |              | NULL      | Connect          |      0 | Waiting for dependent transaction to commit                   | NULL                  |
    |  3 | system user |              | NULL      | Connect          | 154914 | System lock                                                   | NULL                  |
    |  4 | system user |              | NULL      | Connect          | 154914 | Waiting for an event from Coordinator                         | NULL                  |
    |  5 | system user |              | NULL      | Connect          | 154918 | Waiting for an event from Coordinator                         | NULL                  |
    |  6 | system user |              | NULL      | Connect          | 155525 | Waiting for an event from Coordinator                         | NULL                  |
    |  7 | system user |              | NULL      | Connect          | 180427 | Waiting for an event from Coordinator                         | NULL                  |
    |  8 | system user |              | NULL      | Connect          |  18204 | Waiting for an event from Coordinator                         | NULL                  |
    | 10 | root        | localhost    | neteagle3 | Query            |      0 | starting                                                      | show full processlist |
    | 11 | repl        | DBSlave:9683 | NULL      | Binlog Dump GTID |  18156 | Master has sent all binlog to slave; waiting for more updates | NULL                  |
    | 13 | root        | localhost    | neteagle3 | Sleep            |   4962 |                                                               | NULL                  |
     ---- ------------- -------------- ----------- ------------------ -------- --------------------------------------------------------------- ----------------------- 
    
    
    mysql> select * from  performance_schema.replication_applier_status_by_worker ;
     -------------- ----------- ----------- --------------- ----------------------------------------------- ------------------- -------------------- ---------------------- 
    | CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION                         | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
     -------------- ----------- ----------- --------------- ----------------------------------------------- ------------------- -------------------- ---------------------- 
    |              |         1 |        51 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240805 |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         2 |        52 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240210 |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         3 |        53 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80235089 |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         4 |        54 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80191268 |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         5 |        55 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:75296683 |                 0 |                    | 0000-00-00 00:00:00  |
    |              |         6 |        56 | ON            |                                               |                 0 |                    | 0000-00-00 00:00:00  |
     -------------- ----------- ----------- --------------- ----------------------------------------------- ------------------- -------------------- ---------------------- 
    6 rows in set (0.00 sec)
    通過本條查詢,可以看到開啟了6個並行進行復制。
    

  • 細找瓶頸?
  • 通過以上來看,所有一切似乎都正常,並行復制開了,並且查看到CPU,IO,內存均沒有達到瓶頸地步。主庫寫binlog日志大概為2MB/s,這樣的日質量並不是非常高。

    從庫也同樣查看了是否存在鎖的情況,也沒有發現。

    因此反復的在查看slave狀態,看能否發現一些細節,結果還真看到了一些異常現象。Relay_Log_Pos這個參數在頻繁的刷slave狀態時,發現時常會卡着不動(此時已確認沒有看到鎖)。或許問題真正的原因正在這里!

  • 順藤摸瓜!
  • 分析binlog或者relay log日志,看有啥線索:
    
    [mysql@xxx data]$ mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS  relay-bin.001384 --start-position=420090430|more
    
              
              
              
                      
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 420090430 #190923 9:24:28 server id 42 end_log_pos 420090282 CRC32 0xd9097eaf GTID last_committed=57148 sequence_number=57149 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= 'eab7fcac-3cda-11e6-ada8-fa163e648db2:69415610'/*!*/; \# at 420090495 #190923 9:24:28 server id 42 end_log_pos 420090364 CRC32 0x82b57dfd Query thread_id=95 exec_time=0 error_code=0 SET TIMESTAMP=1569201868/*!*/; SET @@session.pseudo_thread_id=95/*!*/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; SET @@session.sql_mode=1075838976/*!*/; SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/; /*!\C gbk *//*!*/; SET @@session.character_set_client=28,@@session.collation_connection=28,@@session.collation_server=8/*!*/; SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; BEGIN /*!*/; \# at 420090577 #190923 9:24:28 server id 42 end_log_pos 420090585 CRC32 0x752e27cf Table_map: `net`.`f_event` mapped to number 108 \# at 420090798 #190923 9:24:28 server id 42 end_log_pos 420090812 CRC32 0x72b8e10d Table_map: `net`.`f_eventstorage` mapped to number 245 \# at 420091025 #190923 9:24:28 server id 42 end_log_pos 420091039 CRC32 0x1797f9d8 Table_map: `net`.`f_eventstorage` mapped to number 245 \# at 420091252 #190923 9:24:28 server id 42 end_log_pos 420091106 CRC32 0x8af14ad2 Table_map: `net`.`f_eventdetail` mapped to number 243 \# at 420091319 #190923 9:24:28 server id 42 end_log_pos 420091177 CRC32 0xf1ce87c8 Table_map: `net`.`f_eventoperation` mapped to number 244 \# at 420091390 #190923 9:24:28 server id 42 end_log_pos 420091244 CRC32 0x586c0b9d Table_map: `net`.`f_eventaudit` mapped to number 242 \# at 420091457 #190923 9:24:28 server id 42 end_log_pos 420093382 CRC32 0x505e5408 Update_rows: table id 108 \# at 420093595 #190923 9:24:28 server id 42 end_log_pos 420098858 CRC32 0x0f404509 Update_rows: table id 245 \# at 420099071 #190923 9:24:28 server id 42 end_log_pos 420098910 CRC32 0xb8d9ed15 Write_rows: table id 243 \# at 420099123 #190923 9:24:28 server id 42 end_log_pos 420098966 CRC32 0x3c489a7f Write_rows: table id 244 flags: STMT_END_F 

    我們查看了中繼日志relay-bin.001384卡住的位置號420090430,為設置GTID_NEXT,信息沒什么用。

    繼續看在卡住時刻,數據庫open的是什么表?

    
    mysql>   show open tables where In_use=1;
     ----------- --------------------- -------- ------------- 
    | Database  | Table               | In_use | Name_locked |
     ----------- --------------------- -------- ------------- 
    | net       | f_currentxxx        |      1 |           0 |
     ----------- --------------------- -------- ------------- 
    

    這張表有什么特殊的么?查看其表結構

    
    mysql>         show create table net.f_currentxxx\G
    *************************** 1. row ***************************
           Table: f_currentxxx
    Create Table: CREATE TABLE `f_currentxxx` (
      `serial` int(20) NOT NULL COMMENT 'xxx',
      `audittime` bigint(20) NOT NULL COMMENT 'xxx',
      `type` int(11) DEFAULT NULL COMMENT 'xxx',
      `severity` int(11) DEFAULT NULL COMMENT 'xxx,
      KEY `audittime` (`audittime`)
    ) ENGINE=MEMORY DEFAULT CHARSET=gbk COMMENT='xxx'
    1 row in set (0.00 sec)
    

    有沒有看到什么與眾不同?

    沒錯,就是表的存儲引擎ENGINE=MEMORY。MEMORY的表進行主從復制,首先來說如果從庫不做查詢,一點意義沒有,另外對Memory表做復制,性能是非常堪憂的。如果必須進行同步,考慮將表的存儲引擎改為InnoDB

    
    mysql> select table_name from information_schema.tables where TABLE_SCHEMA='net' and ENGINE='memory';
    
     ---------------------- 
    | table_name           |
     ---------------------- 
    | f_currentxxx         |
     ---------------------- 
    1 row in set, 6 warnings (0.01 sec)
    出於嚴謹,我們將要復制的數據庫進行徹底排查,確實只有這一張表是Memory存儲引擎。
    

  • 措施
  • 停止復制進程,將選項中添加```replicate-ignore-table=net.f_currentxxx```,重新啟動復制進程,觀察slave狀態。
    
    mysql> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: xxx.xxx.xxx.xxx
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.009194
              Read_Master_Log_Pos: 939698255
                   Relay_Log_File: relay-bin.001964
                    Relay_Log_Pos: 444060572
            Relay_Master_Log_File: mysql-bin.009027
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: net
              Replicate_Ignore_DB: mysql
               Replicate_Do_Table: 
           Replicate_Ignore_Table: net.f_currentxxx Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 444060359 Relay_Log_Space: 180287882098 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 179221 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 42 Master_UUID: eab7fcac-3cda-11e6-ada8-fa163e648db2 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: eab7fcac-3cda-11e6-ada8-fa163e648db2:69497322-107886661 Executed_Gtid_Set: 1264a536-da12-11e9-81ea-005056856ba5:1, eab7fcac-3cda-11e6-ada8-fa163e648db2:1-71980857 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) 

    我們可以看到net.f_currentxxx表已經被忽略復制。持續觀察一段時間后,Seconds_Behind_Master在逐漸縮小。
    中繼日志應用速度大約5分鍾一個(每個中繼日志為1GB大小),而主庫binlog日志大約為10分鍾一個(每個binlog日志為1GB大小)。

  • 總結
  • 在梳理了整個處理過程后,其實難度不高,主要是要細心,細心去排查每一個想到的點。 在非輕量級的數據庫中問題發生的概率也會隨着量級的增加而增多。而這恰恰是能夠磨練個人的成長。
    同時,知識的儲備也要充足,這是進階高手的必要前提!


    免責聲明!

    本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



     
    粵ICP備18138465號   © 2018-2025 CODEPRJ.COM