【環境介紹】
系統環境:騰訊雲 5.7.18 + pt-online-schema-change 3.0.5
【情況描述】
使用 pt-online-schema-change對表添加字段操作報錯:
腳本:
[18:07:19][root@ipa2 shengchan]# cat pt_add_column.sh
[18:07:19]#!/bin/bash
[18:07:19]table=$1
[18:07:19]alter_conment=$2
[18:07:19]
[18:07:19]cnn_host='10.1.xx.xx'
[18:07:19]cnn_user='monitor'
[18:07:19]cnn_pwd='xxxx'
[18:07:19]cnn_db='xx'
[18:07:19]
[18:07:19]echo "$table"
[18:07:19]echo "$alter_conment"
[18:07:19]pt-online-schema-change --charset=utf8 --no-version-check --user=${cnn_user} --password=${cnn_pwd} --host=${cnn_host} P=3306,D=${cnn_db},t=$table --alter "${alter_conment}" --alter-foreign-keys-method=auto --no-check-replication-filters --execute
執行:
[18:07:31][root@ipa2 shengchan]# sh pt_add_column.sh X_XXXXXX_MESSAGE "ADD COLUMN FBUSINESS_ID varchar(64) NULL COMMENT '業務id' AFTER FVISIBLE_RANGE,ADD COLUMN FBUSINESS_TYPE varchar(30) NULL COMMENT '業務類型' AFTER FBUSINESS_ID"
[18:07:31]X_XXXXXX_MESSAGE
[18:07:31]ADD COLUMN FBUSINESS_ID varchar(64) NULL COMMENT '業務id' AFTER FVISIBLE_RANGE,ADD COLUMN FBUSINESS_TYPE varchar(30) NULL COMMENT '業務類型' AFTER FBUSINESS_ID
[18:07:31]Cannot connect to A=utf8,P=20127,h=,p=...,u=monitor
[18:07:31]Cannot connect to A=utf8,P=20121,h=,p=...,u=monitor
[18:07:31]No slaves found. See --recursion-method if host TENCENT64.site has slaves.
[18:07:31]Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
[18:07:31]XXeration, tries, wait:
[18:07:31] analyze_table, 10, 1
[18:07:31] cXXy_rows, 10, 0.25
[18:07:31] create_triggers, 10, 1
[18:07:31] drXX_triggers, 10, 1
[18:07:31] swap_tables, 10, 1
[18:07:31] update_foreign_keys, 10, 1
[18:07:32]Child tables:
[18:07:32] `XX`.`X_XXXX_GROUP_MESSAGE_XXXXX` (approx. 5906 rows)
[18:07:32] `XX`.`X_XXXX_MESSAGE_XXX_WAY` (approx. 17489 rows)
[18:07:32]Will automatically choose the method to update foreign keys.
[18:07:32]Altering `XX`.`X_XXXX_MESSAGE`...
[18:07:32]Creating new table...
[18:07:32]Created new table XX._X_XXXX_MESSAGE_new OK.
[18:07:32]Altering new table...
[18:07:32]Altered `XX`.`_X_XXXX_MESSAGE_new` OK.
[18:07:32]2020-10-14T18:06:28 Creating triggers...
[18:07:32]2020-10-14T18:06:28 Created triggers OK.
[18:07:32]2020-10-14T18:06:28 CXXying approximately 1740449 rows...
[18:07:32]Cannot connect to A=utf8,P=20127,h=,p=...,u=monitor
[18:07:32]Cannot connect to A=utf8,P=20121,h=,p=...,u=monitor
......
[18:07:43]Cannot connect to A=utf8,P=20127,h=,p=...,u=monitor
[18:07:43]Cannot connect to A=utf8,P=20121,h=,p=...,u=monitor
[18:07:43]^C# Exiting on SIGINT.
[18:07:43]Not drXXping triggers because the tool was interrupted. To drXX the triggers, execute:
[18:07:43]DRXX TRIGGER IF EXISTS `XX`.`pt_osc_XX_X_XXXX_MESSAGE_del`
[18:07:43]DRXX TRIGGER IF EXISTS `XX`.`pt_osc_XX_X_XXXX_MESSAGE_upd`
[18:07:43]DRXX TRIGGER IF EXISTS `XX`.`pt_osc_XX_X_XXXX_MESSAGE_ins`
[18:07:43]Not drXXping the new table `XX`.`_X_XXXX_MESSAGE_new` because the tool was interrupted. To drXX the new table, execute:
[18:07:43]DRXX TABLE IF EXISTS `XX`.`_X_XXXX_MESSAGE_new`;
[18:07:43]`XX`.`X_XXXX_MESSAGE` was not altered.
[18:07:44][root@ipa2 shengchan]#
【情況分析】
從上面信息查看,沒有檢測連接到兩個端口的數據庫
通過命令查看數據庫從庫的信息,與連接的信息一致:
(monitor@xx.xx.xx.xx) [(none)]> show slave hosts;
+-----------+------+-------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+------+-------+-----------+--------------------------------------+
| 51567 | | 20127 | 51565 | 46439a24-0273-11eb-96a5-a4fa76fdf28e |
| 2180 | | 20121 | 51565 | 78e1111b-0273-11eb-bea1-6c92bf48d2a0 |
+-----------+------+-------+-----------+--------------------------------------+
查看其它文章這個報錯,說是主從的用戶密碼一致,經過驗證用戶,權限都沒有問題(從庫只讀開啟,嚴格把控,出現用戶密碼不一致問題可能性非常小)。
查看OSC資料涉及嘗試查找從庫的參數信息,查看到--recursion-method方式發現從庫:
從這里查看建議使用processlist方式查找從庫,hosts方式不可靠
添加參數--recursion-method=processlist即可;
【注意事項】
注意:重新操作時注意一開始后面提示的信息,顯示的是工具沒有進行刪除觸發器跟新創建的表,需要手工刪除觸發器跟新創建的表(確認好信息操作,防止誤操作)。
查看觸發器:
select TRIGGER_SCHEMA,TRIGGER_NAME,EVENT_OBJECT_TABLE from triggers where EVENT_OBJECT_TABLE='';