Orchestrator: A MySQL High-Availability and Replication Management Tool (Introduction)

Reposted article. 2019-02-18 11:21. Tags: Orchestrator, MHA, Replication, high availability / MySQL / notes

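Background

Among MySQL high-availability architectures, the most widely used today are Percona's PXC, Galera, and MGR (available since MySQL 5.7); MHA is another option. This article introduces another handy MySQL high-availability and replication management tool: Orchestrator (orch).

Orchestrator (orch) is a MySQL high-availability and replication topology management tool written in Go. It supports adjusting the replication topology, automatic failover, and manual master-replica switchover. It stores its metadata in a MySQL or SQLite backend and provides a web interface that shows the replication topology and status of the MySQL instances; through the web interface you can change the replication relationships and some configuration of the instances. It also offers a command line and an API, which makes operations easier. Compared with MHA, its most important advantage is that it removes the single point of failure of the manager node: Orchestrator keeps itself highly available through the raft protocol. GitHub also uses the tool to manage part of its infrastructure. For a more detailed introduction, see the project's GitHub page. Its main features are:

① Automatically discovers MySQL replication topologies and displays them in the web interface.

② Refactors replication relationships; you can change them by dragging nodes in the web interface.

③ Detects master failures and recovers from them automatically or manually, with hooks for custom scripts.

④ Supports managing replication from both the command line and the web interface.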

Environment:

OS:
Ubuntu 16.04

Three hosts:
test1:192.168.163.131
test2:192.168.163.132
test3:192.168.163.133

Backend MySQL port: 3306
Test MySQL port: 3307

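Installation

Download the package that suits your system. This article installs on Ubuntu 16.04: after downloading the deb package, install the jq dependency (apt-get install jq). After installation, the directory looks like this:

/usr/local/orchestrator
-rwxr-xr-x 1 root root  20M Jan 16 21:49 orchestrator
-rw-r--r-- 1 root root 5.1K Jan 16 21:49 orchestrator-sample.conf.json
-rw-r--r-- 1 root root 4.4K Jan 16 21:49 orchestrator-sample-sqlite.conf.json
drwxr-xr-x 7 root root 4.0K Feb 15 19:03 resources

orchestrator: the application binary

*.json: default configuration templates

resources: files used by orchestrator, such as the client, the web assets, and Pseudo-GTID related files.

Configuration

After going through the configuration parameters, here is a rough description of each (it may contain inaccuracies and will be updated from time to time):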

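        Debug: false, -- enable debug mode
        EnableSyslog: false, -- whether to also write logs to syslog
        ListenAddress: ":3000", -- HTTP (TCP) listen address of the web interface
        ListenSocket: "", -- Unix socket file; empty by default; mutually exclusive with ListenAddress
        HTTPAdvertise: "", -- optional, for raft setups: the HTTP address this node advertises to its peers
        AgentsServerPort: ":3001", -- port on which agents call back
        StatusEndpoint: "/api/status", -- status endpoint; defaults to "/api/status"
        StatusOUVerify: false, -- if true, try to verify OUs when mutual TLS is on; defaults to false
        BackendDB: "mysql", -- backend database type: mysql or sqlite3
        SQLite3DataFile: "", -- data file for sqlite3
        SkipOrchestratorDatabaseUpdate: false, -- if true, do not check the backend database schema or attempt to update it; useful when running several orchestrator versions
        PanicIfDifferentDatabaseDeploy: false, -- if true, panic when this process finds that the backend database was provisioned by a different version
        RaftBind: "127.0.0.1:10008",
        RaftAdvertise: "",
        RaftDataDir: "",
        DefaultRaftPort: 10008, -- port used when an entry in RaftNodes does not specify one
        RaftNodes: []string{}, -- raft nodes to connect to at startup
        ExpectFailureAnalysisConcensus: true,
        MySQLOrchestratorMaxPoolConnections: 128, -- cap on concurrent connections to the backend database
        MySQLOrchestratorPort: 3306, -- backend database port
        MySQLTopologyUseMutualTLS: false, -- whether to use mutual TLS authentication with topology instances
        MySQLTopologyUseMixedTLS: true, -- whether to allow a mix of TLS and non-TLS authentication with topology instances
        MySQLOrchestratorUseMutualTLS: false, -- whether to use mutual TLS authentication with the orchestrator backend MySQL instance
        MySQLConnectTimeoutSeconds: 2, -- database connection timeout, in seconds
        MySQLOrchestratorReadTimeoutSeconds: 30, -- read timeout for the backend database
        MySQLDiscoveryReadTimeoutSeconds: 10, -- timeout for discovery queries
        MySQLTopologyReadTimeoutSeconds: 600, -- timeout for all queries other than discovery queries
        MySQLConnectionLifetimeSeconds: 0, -- how long a connection may stay alive
        DefaultInstancePort: 3306, -- default database port
        TLSCacheTTLFactor: 100, -- factor of InstancePollSeconds used as the expiry of the TLS info cache
        InstancePollSeconds: 5, -- seconds between reads of each instance
        InstanceWriteBufferSize: 100, -- size of the instance write buffer
        BufferInstanceWrites: false, -- set to true to optimize writes to the backend table; the tradeoff is that writes may be stale and overwrite non-stale data
        InstanceFlushIntervalMilliseconds: 100, -- maximum interval between flushes of the instance write buffer
        SkipMaxScaleCheck: false, -- if you have no MaxScale BinlogServer, set this to true to save some pointless queries
        UnseenInstanceForgetHours: 240, -- hours after which an unseen instance is forgotten
        SnapshotTopologiesIntervalHours: 0, -- interval in hours between topology snapshots; default 0 (disabled)
        DiscoverByShowSlaveHosts: false, -- attempt SHOW SLAVE HOSTS before PROCESSLIST
        UseSuperReadOnly: false, -- whether orchestrator should also set super_read_only whenever it sets read_only
        DiscoveryMaxConcurrency: 300, -- maximum number of goroutines used for instance discovery
        DiscoveryQueueCapacity: 100000, -- buffer size of the discovery queue; should be larger than the number of database instances discovered
        DiscoveryQueueMaxStatisticsSize: 120, -- maximum number of per-second statistics kept for the discovery queue
        DiscoveryCollectionRetentionSeconds: 120, -- seconds to retain discovery collection information
        InstanceBulkOperationsWaitTimeoutSeconds: 10, -- time to wait on a single instance during bulk operations
        HostnameResolveMethod: "default",
        MySQLHostnameResolveMethod: "@@hostname",
        SkipBinlogServerUnresolveCheck: true, -- skip checking whether an unresolved hostname resolves back to the same hostname for binlog servers
        ExpiryHostnameResolvesMinutes: 60, -- minutes before a hostname resolution expires
        RejectHostnameResolvePattern: "", -- regular expression for resolved hostnames that are not accepted; this avoids storing errors caused by network glitches
        ReasonableReplicationLagSeconds: 10, -- replication lag above this value is considered a problem
        ProblemIgnoreHostnameFilters: []string{}, -- minimize problem reporting for hostnames matching the given regexp filters
        VerifyReplicationFilters: false, -- check replication filters before refactoring the topology
        ReasonableMaintenanceReplicationLagSeconds: 20, -- above this value, move-up and move-below operations are blocked
        CandidateInstanceExpireMinutes: 60, -- minutes after which a suggestion to use an instance as a candidate replica expires
        AuditLogFile: "", -- file name of the audit log; disabled when empty
        AuditToSyslog: false, -- whether audit messages are written to syslog
        AuditToBackendDB: false, -- whether audit messages are written to the backend audit table (the description says the default is true, although the value listed here is false)
        RemoveTextFromHostnameDisplay: "", -- text to strip from hostnames on the cluster/clusters pages
        ReadOnly: false,
        AuthenticationMethod: "", -- authentication type; possible values:
            "" for none
            "basic" for BasicAuth
            "multi" for advanced BasicAuth
            "proxy" for credentials forwarded via a reverse proxy
            "token" for token-based access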

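        HTTPAuthUser: "", -- username for HTTP basic authentication; empty disables authentication
        HTTPAuthPassword: "", -- password for HTTP basic authentication; empty means no password
        AuthUserHeader: "X-Forwarded-User", -- when AuthenticationMethod is "proxy", the HTTP header that carries the authenticated user
        PowerAuthUsers: []string{"*"}, -- when AuthenticationMethod is "proxy", the users allowed to make changes; all others are read-only
        PowerAuthGroups: []string{}, -- Unix groups the authenticated user must belong to
        AccessTokenUseExpirySeconds: 60, -- time within which an issued token must be used
        AccessTokenExpiryMinutes: 1440, -- time after which access expires
        ClusterNameToAlias: make(map[string]string),
        DetectClusterAliasQuery: "", -- optional query (run on topology instances) that returns the cluster alias
        DetectClusterDomainQuery: "", -- optional query (run on topology instances) that returns the VIP / CNAME / alias / any domain name of the cluster's master
        DetectInstanceAliasQuery: "", -- optional query (run on topology instances) that returns the instance alias
        DetectPromotionRuleQuery: "", -- optional query (run on topology instances) that returns the instance's promotion rule
        DataCenterPattern: "", -- regular expression with one group that extracts the data center name from the hostname
        PhysicalEnvironmentPattern: "", -- regular expression with one group that extracts the physical environment from the hostname
        DetectDataCenterQuery: "", -- optional query (run on topology instances) that returns the instance's data center; overrides DataCenterPattern and is useful when the DC cannot be inferred from the hostname
        DetectPhysicalEnvironmentQuery: "", -- optional query (run on topology instances) that returns the instance's physical environment; overrides PhysicalEnvironmentPattern and is useful when the environment cannot be inferred from the hostname
        DetectSemiSyncEnforcedQuery: "", -- optional query (run on topology instances) to determine whether semi-sync is fully enforced for master writes
        SupportFuzzyPoolHostnames: true, -- whether the "submit-pool-instances" command accepts a list of fuzzy instances (fuzzy means non-FQDN, but unique enough to identify); defaults to true, which implies more queries on the backend database
        InstancePoolExpiryMinutes: 60, -- minutes after which entries in database_instance_pool expire
        PromotionIgnoreHostnameFilters: []string{}, -- replicas whose hostname matches these patterns are not promoted
        ServeAgentsHttp: false, -- spawn another HTTP interface dedicated to orchestrator-agent
        AgentsUseSSL: false, -- when true, orchestrator listens on the agents port with SSL and connects to agents via SSL
        AgentsUseMutualTLS: false, -- when true, use mutual TLS to communicate with agents
        AgentSSLValidOUs: []string{}, -- valid organizational units when using mutual TLS with agents
        AgentSSLSkipVerify: false, -- when using SSL for agents, whether to ignore SSL certification errors
        AgentSSLPrivateKeyFile: "",
        AgentSSLCertFile: "",
        AgentSSLCAFile: "",
        UseSSL: false, -- use SSL on the web port
        UseMutualTLS: false, -- when true, use mutual TLS for the server's web and API connections
        SSLValidOUs: []string{}, -- valid organizational units when using mutual TLS
        SSLSkipVerify: false, -- when using SSL, whether to ignore SSL certification errors
        SSLPrivateKeyFile: "",
        SSLCertFile: "",
        SSLCAFile: "",
        AgentPollMinutes: 60, -- minutes between agent polls
        UnseenAgentForgetHours: 6, -- hours after which an unseen agent is forgotten
        StaleSeedFailMinutes: 60, -- minutes after which a stale (no progress) seed is considered failed
        SeedAcceptableBytesDiff: 8192, -- difference in bytes between seed source and target data size that still counts as a successful copy
        SeedWaitSecondsBeforeSend: 2, -- seconds to wait before starting the send-data command on the agent
        AutoPseudoGTID: false, -- whether to automatically inject Pseudo-GTID entries into masters
        PseudoGTIDPattern: "", -- pattern to look for in the binary logs that makes for a unique entry (Pseudo-GTID); when empty, Pseudo-GTID based refactoring is disabled
        PseudoGTIDPatternIsFixedSubstring: false, -- if true, PseudoGTIDPattern is treated as a fixed substring rather than a regular expression
        PseudoGTIDMonotonicHint: "", -- substring in a Pseudo-GTID entry indicating that Pseudo-GTID entries are expected to increase monotonically
        DetectPseudoGTIDQuery: "", -- optional query that determines whether Pseudo-GTID is enabled on an instance
        BinlogEventsChunkSize: 10000, -- chunk size for SHOW BINLOG | RELAYLOG EVENTS LIMIT; smaller means less locking but more work to be done
        SkipBinlogEventsContaining: []string{}, -- when scanning/comparing binlogs for Pseudo-GTID, skip entries containing the given texts; these are not regular expressions (they would consume too much CPU while scanning binlogs), just substrings to look for
        ReduceReplicationAnalysisCount: true, -- if true, replication analysis only reports instances where a problem is possible in the first place; if false, it provides an entry for every known instance
        FailureDetectionPeriodBlockMinutes: 60, -- time for which an instance's failure detection is kept "active", to avoid concurrent "detections" of the same failure; this precedes any recovery process, if any
        RecoveryPeriodBlockMinutes: 60, -- time for which an instance's recovery is kept "active", to avoid concurrent recoveries
        RecoveryPeriodBlockSeconds: 3600, -- time for which an instance's recovery is kept "active", to avoid concurrent recoveries
        RecoveryIgnoreHostnameFilters: []string{}, -- recovery analysis completely ignores hosts matching the given patterns
        RecoverMasterClusterFilters: []string{}, -- only perform master recovery on clusters matching these regular expression patterns (the "*" pattern matches everything)
        RecoverIntermediateMasterClusterFilters: []string{}, -- only perform intermediate master recovery on clusters matching these regular expression patterns (the "*" pattern matches everything)
        ProcessesShellCommand: "bash", -- shell used to execute command scripts
        OnFailureDetectionProcesses: []string{}, -- executed when a failover scenario is detected (before deciding whether to fail over). May and should use some of these placeholders: {failureType}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {autoMasterRecovery}, {autoIntermediateMasterRecovery}
        PreGracefulTakeoverProcesses: []string{}, -- executed immediately before the master becomes read-only. May and should use some of these placeholders: {failureType}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}
        PreFailoverProcesses: []string{}, -- executed immediately before the recovery action. Failure of any of these processes (non-zero exit code) aborts the recovery. Hint: this gives you the chance to abort recovery based on some internal state of your system. May and should use some of these placeholders: {failureType}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}
        PostMasterFailoverProcesses: []string{}, -- executed at the end of a successful master recovery (order of execution undefined). Uses the same placeholders as PostFailoverProcesses
        PostIntermediateMasterFailoverProcesses: []string{}, -- executed at the end of a successful intermediate master recovery (order of execution undefined). Uses the same placeholders as PostFailoverProcesses
        PostFailoverProcesses: []string{}, -- executed at the end of any successful recovery (in addition to the two above). May and should use some of these placeholders: {failureType}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {isSuccessful}, {lostReplicas}
        PostUnsuccessfulFailoverProcesses: []string{}, -- executed at the end of any unsuccessful recovery (order of execution undefined). May and should use some of these placeholders: {failureType}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {isSuccessful}, {lostReplicas}
        PostGracefulTakeoverProcesses: []string{}, -- executed after the old master has been placed under the newly promoted master. Uses the same placeholders as PostFailoverProcesses
        CoMasterRecoveryMustPromoteOtherCoMaster: true, -- when false, any instance can get promoted (candidates are preferred over others); when true, orchestrator promotes the other co-master or else fails
        DetachLostSlavesAfterMasterFailover (DetachLostReplicasAfterMasterFailover): true, -- some replicas may get lost during recovery; if true, their replication is forcibly broken via the detach-replica command so that nobody assumes they are fully functional
        ApplyMySQLPromotionAfterMasterFailover: true, -- runs RESET SLAVE ALL and sets read_only = 0 on the promoted master; defaults to true
        PreventCrossDataCenterMasterFailover: false, -- if true (default: false), cross-DC master failover is not allowed; orchestrator does its best to fail over within the same DC only, and otherwise does not fail over at all
        MasterFailoverLostInstancesDowntimeMinutes: 0, -- minutes of downtime set on any server lost in a master failover (including the failed master and lost replicas); 0 disables
        MasterFailoverDetachSlaveMasterHost (MasterFailoverDetachReplicaMasterHost): false, -- whether orchestrator should issue detach-replica-master-host on the newly promoted master (this ensures the new master does not try to replicate from the old master if it comes back to life); defaults to false; meaningless if ApplyMySQLPromotionAfterMasterFailover is true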

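        FailMasterPromotionIfSQLThreadNotUpToDate: false, -- if true, and a master failover takes place, abort with an error when the candidate master has not consumed all relay logs (it is lagging)
        PostponeSlaveRecoveryOnLagMinutes (PostponeReplicaRecoveryOnLagMinutes): 0, -- on crash recovery, replicas lagging by more than the given number of minutes are only resurrected late in the recovery process, after the master/intermediate master has been elected and the hook processes executed; 0 disables this feature
        RemoteSSHForMasterFailover: false, -- whether orchestrator should attempt remote-ssh relaylog-synching upon master failover; requires RemoteSSHCommand
        RemoteSSHCommand: "", -- an `ssh` command used by the recovery process to read/apply relay logs; if provided, it must contain the text "{hostname}"; the remote SSH login must have the privileges to read/write relay logs. Example: "setuidgid remoteuser ssh {hostname}"
        RemoteSSHCommandUseSudo: true, -- whether orchestrator should apply 'sudo' on the remote host when running the SSH command
        OSCIgnoreHostnameFilters: []string{}, -- recommendations ignore replica hostnames matching the given patterns
        GraphiteAddr: "",
        GraphitePath: "",
        GraphiteConvertHostnameDotsToUnderscores: true,
        GraphitePollSeconds: 60,
        URLPrefix: "",
        DiscoveryIgnoreReplicaHostnameFilters: []string{},
        ConsulAddress: "",
        ConsulAclToken: "",
        ZkAddress: "",
        KVClusterMasterPrefix: "mysql/master",
        WebMessage: "",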
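
To make the hook placeholders listed above more concrete, here is a minimal Python sketch of how a hook command template could be filled in before it is handed to the shell. This is not orchestrator's own code: the `render_hook` helper and the sample values are hypothetical and only illustrate the substitution idea.

```python
import re


def render_hook(template, values):
    """Replace each {name} token with values[name]; unknown tokens stay as-is.

    Hypothetical helper for illustration only, not part of orchestrator.
    """
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"\{(\w+)\}", substitute, template)


# Sample values invented for this sketch.
failure = {
    "failureType": "DeadMaster",
    "failureCluster": "test1:3307",
    "failedHost": "test1",
    "failedPort": 3307,
    "successorHost": "test2",
    "successorPort": 3307,
}

template = (
    "echo 'Recovered from {failureType} on {failureCluster}. "
    "Failed: {failedHost}:{failedPort}; "
    "Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
)

print(render_hook(template, failure))
```

Leaving unknown tokens untouched is a design choice of this sketch: a typo in a placeholder name then shows up verbatim in the log instead of silently becoming an empty string.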


For the test case in this article, I put together a customized template (/etc/orchestrator.conf.json):

{
  "Debug": true,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "Aa123456",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "123456",
  "MySQLOrchestratorCredentialsConfigFile": "",
  "MySQLOrchestratorSSLPrivateKeyFile": "",
  "MySQLOrchestratorSSLCertFile": "",
  "MySQLOrchestratorSSLCAFile": "",
  "MySQLOrchestratorSSLSkipVerify": true,
  "MySQLOrchestratorUseMutualTLS": false,
  "MySQLConnectTimeoutSeconds": 1,
  "MySQLTopologyReadTimeoutSeconds": 3,
  "MySQLDiscoveryReadTimeoutSeconds": 3,
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 3,
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,
  "SkipMaxScaleCheck":true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 1440,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ":3306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "test suite"
  },
  "SlaveLagQuery": "",
  "DetectClusterAliasQuery":  "SELECT cluster_name FROM meta.cluster WHERE cluster_name = left(@@hostname,4) ",
  "DetectClusterDomainQuery": "SELECT cluster_domain FROM meta.cluster WHERE cluster_name = left(@@hostname,4) ",
  "DetectInstanceAliasQuery": "SELECT @@hostname as instance_alias",
  "DetectPromotionRuleQuery": "",
  "DetectDataCenterQuery": "SELECT data_center FROM meta.cluster WHERE cluster_name = left(@@hostname,4) ",
  "PhysicalEnvironmentPattern": "",
  "PromotionIgnoreHostnameFilters": [],
  "DetachLostReplicasAfterMasterFailover": true,
  "DetectSemiSyncEnforcedQuery": "SELECT 0 AS semisync FROM DUAL WHERE NOT EXISTS (SELECT 1 FROM performance_schema.global_variables WHERE VARIABLE_NAME = 'rpl_semi_sync_master_wait_no_slave' AND VARIABLE_VALUE = 'ON') UNION SELECT 1 FROM DUAL WHERE EXISTS (SELECT 1 FROM performance_schema.global_variables WHERE VARIABLE_NAME = 'rpl_semi_sync_master_wait_no_slave' AND VARIABLE_VALUE = 'ON')",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "AutoPseudoGTID":true,
  "PseudoGTIDPattern": "drop view if exists `meta`.`_pseudo_gtid_hint__asc:",
  "PseudoGTIDPatternIsFixedSubstring": true,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "select count(*) as pseudo_gtid_exists from meta.pseudo_gtid_status where anchor = 1 and time_generated > now() - interval 2 hour",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 31,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "OnFailureDetectionProcesses": [
    "echo '②  Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [
    "echo '①   Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo '③  Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostMasterFailoverProcesses": [
    "echo '④  Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '⑤  (for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [
    "echo '⑧' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo '⑥ Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [
    "echo '⑦ Planned takeover complete' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "MasterFailoverDetachSlaveMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeSlaveRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true,

  "RaftEnabled": true,
  "BackendDB": "mysql",
  "RaftBind": "192.168.163.131",
  "RaftDataDir": "/var/lib/orchestrator",
  "DefaultRaftPort": 10008,
  "RaftNodes": [
    "192.168.163.131",
    "192.168.163.132",
    "192.168.163.133"
    ],
 "ConsulAddress": "",
 "ConsulAclToken": ""
}
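
Before starting orchestrator with a hand-edited file like the one above, a quick sanity check of the raft-related keys can catch simple mistakes. The Python sketch below is only an illustration under assumptions: the rules it applies (RaftBind should appear in RaftNodes, an odd node count is preferable, RaftDataDir should be set) are common-sense checks chosen for this example, not validation that orchestrator itself is known to perform, and both function names are hypothetical.

```python
import json


def check_raft_settings(config):
    """Return a list of warnings; an empty list means nothing looked suspicious.

    The rules are assumptions made for this sketch, not orchestrator's own checks.
    """
    warnings = []
    if not config.get("RaftEnabled", False):
        return warnings
    nodes = config.get("RaftNodes", [])
    # Compare host parts only, so "ip" and "ip:port" entries both match (IPv4/hostnames).
    node_hosts = [node.split(":")[0] for node in nodes]
    bind_host = config.get("RaftBind", "").split(":")[0]
    if bind_host not in node_hosts:
        warnings.append("RaftBind is not listed in RaftNodes")
    if len(nodes) < 3:
        warnings.append("fewer than 3 RaftNodes: no node failure can be tolerated")
    elif len(nodes) % 2 == 0:
        warnings.append("even number of RaftNodes adds no extra fault tolerance")
    if not config.get("RaftDataDir"):
        warnings.append("RaftDataDir is empty")
    return warnings


def tolerated_failures(node_count):
    """A raft group of n nodes needs a majority (n // 2 + 1) to keep operating."""
    quorum = node_count // 2 + 1
    return node_count - quorum


sample = json.loads("""
{
  "RaftEnabled": true,
  "RaftBind": "192.168.163.131",
  "RaftDataDir": "/var/lib/orchestrator",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["192.168.163.131", "192.168.163.132", "192.168.163.133"]
}
""")

print(check_raft_settings(sample))                   # → []
print(tolerated_failures(len(sample["RaftNodes"])))  # → 1
```

With the three nodes used in this article, the raft group keeps working as long as two of them are up.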


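Here are some of the more important parameters:

ListenAddress
HTTP listen address of the web interface
MySQLOrchestratorHost
Address of the orch backend database
MySQLOrchestratorPort
Port of the orch backend database
MySQLOrchestratorDatabase
Name of the orch backend database
MySQLOrchestratorUser
Username for the orch backend database (plaintext)
MySQLOrchestratorPassword
Password for the orch backend database (plaintext)
MySQLOrchestratorCredentialsConfigFile
Config file holding the backend database username and password, for example "MySQLOrchestratorCredentialsConfigFile": "/etc/mysql/orchestrator-backend.cnf". Format: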
```
[client]
user=orchestrator_srv
password=${ORCHESTRATOR_PASSWORD}
```
The backend MySQL user needs the following privileges:
```
CREATE USER 'orchestrator_srv'@'orc_host' IDENTIFIED BY 'orc_server_password';
GRANT ALL ON orchestrator.* TO 'orchestrator_srv'@'orc_host';
```
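MySQLTopologyUser
Username for the managed MySQL instances (plaintext)
MySQLTopologyPassword
Password for the managed MySQL instances (plaintext)
MySQLTopologyCredentialsConfigFile
Config file holding the username and password for the managed MySQL instances, for example "/etc/mysql/orchestrator-topology.cnf". Format: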
```
[client]
user=orchestrator_srv
password=${ORCHESTRATOR_PASSWORD}
```

The user on the managed MySQL instances needs the following privileges:

CREATE USER 'orchestrator'@'orc_host' IDENTIFIED BY 'orc_topology_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, REPLICATION CLIENT, RELOAD ON *.* TO 'orchestrator'@'orc_host';
GRANT SELECT ON meta.* TO 'orchestrator'@'orc_host';
GRANT SELECT ON ndbinfo.processes TO 'orchestrator'@'orc_host'; -- Only for NDB Cluster
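
The credentials files described above use a `[client]` section whose password refers to an environment variable. As a rough illustration of that idea, the following Python sketch parses such a file and expands `${ORCHESTRATOR_PASSWORD}` from the environment. It only mimics the format; how orchestrator itself reads the file is not shown in this article, and the `load_client_credentials` helper is hypothetical.

```python
import configparser
import os
from string import Template


def load_client_credentials(cnf_text, environ=None):
    """Parse a [client] section and expand ${VAR} references from the environment.

    Hypothetical helper for illustration; not orchestrator's implementation.
    """
    env = os.environ if environ is None else environ
    # interpolation=None keeps configparser from interpreting special characters itself.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    client = parser["client"]
    # safe_substitute leaves unknown ${VAR} references in place instead of raising.
    user = Template(client["user"]).safe_substitute(env)
    password = Template(client["password"]).safe_substitute(env)
    return user, password


cnf = """
[client]
user=orchestrator_srv
password=${ORCHESTRATOR_PASSWORD}
"""

# The environment mapping is passed explicitly here so the example is reproducible.
print(load_client_credentials(cnf, {"ORCHESTRATOR_PASSWORD": "s3cret"}))
# → ('orchestrator_srv', 's3cret')
```

Keeping the password out of the main JSON config this way means the config file can be shared or versioned without exposing the secret.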

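InstancePollSeconds
Interval, in seconds, at which orch probes the MySQL instances
MySQLConnectTimeoutSeconds
Timeout for orch connecting to MySQL
MySQLOrchestratorReadTimeoutSeconds
Read timeout for the backend MySQL
MySQLTopologyReadTimeoutSeconds
Read timeout for the managed MySQL instances, used for all queries other than discovery queries
MySQLDiscoveryReadTimeoutSeconds
Read timeout for the managed MySQL instances, used for discovery queries
DefaultInstancePort
Default port of the managed MySQL instances
DiscoverByShowSlaveHosts
Discover the topology via SHOW SLAVE HOSTS
UnseenInstanceForgetHours
Hours after which an unseen instance is forgotten
HostnameResolveMethod
How hostnames are resolved: "default" uses the hostname; "none" skips resolution and uses the IP directly
MySQLHostnameResolveMethod
How hostnames are resolved on the MySQL side: "@@hostname" issues select @@hostname; "@@report_host" issues select @@report_host (requires report_host to be configured); "" skips resolution and uses the IP directly.
InstanceBulkOperationsWaitTimeoutSeconds
Time to wait on a single instance during bulk operations
ReasonableReplicationLagSeconds
Replication lag above this value is considered a problem
VerifyReplicationFilters
Check replication filters before refactoring the topology
ReasonableMaintenanceReplicationLagSeconds
When replication lag is above this value, move-up and move-below operations on the topology are blocked
CandidateInstanceExpireMinutes
Time after which a suggestion to use an instance as a candidate replica (to be promoted on master failover) expires
ReplicationLagQuery (SlaveLagQuery)
By default, lag is taken from SHOW SLAVE STATUS, with a granularity of one second. With pt-heartbeat, which offers sub-second granularity, you can supply your own query, for example "select absolute_lag from meta.heartbeat_view"
DetectClusterAliasQuery
Query that returns the cluster alias; the information is stored in the cluster table of the meta schema on each managed instance, for example "select ifnull(max(cluster_name), '') as cluster_alias from meta.cluster where anchor=1"
DetectClusterDomainQuery
Query that returns the cluster domain; the information is stored in the cluster table of the meta schema on each managed instance, for example "select ifnull(max(cluster_domain), '') as cluster_domain from meta.cluster where anchor=1"
DetectInstanceAliasQuery
Query that returns the instance alias
DetectDataCenterQuery
Query that returns the data center. In this article the information is stored in the cluster table of the meta schema on each managed instance; it can also be derived from the hostname, for example "select substring_index(substring_index(@@hostname, '-',3), '-', -1) as dc"
DetachLostReplicasAfterMasterFailover (DetachLostSlavesAfterMasterFailover)
Whether to forcibly detach replicas that get lost during master recovery
DetectSemiSyncEnforcedQuery
Query that detects whether semi-sync is enforced
AutoPseudoGTID
Whether to automatically inject Pseudo-GTID entries into masters; recommended for replication that does not use GTID. With GTID-based replication, set it to "false".
RecoveryPeriodBlockSeconds
If another failure occurs within this window, no failover is performed; this avoids concurrent recoveries and flapping.
FailureDetectionPeriodBlockMinutes
If another failure occurs within this window, it is not detected again.
RecoverMasterClusterFilters
Only perform master recovery on clusters matching these regular expression patterns (the "*" pattern matches everything).
RecoverIntermediateMasterClusterFilters
Only perform intermediate master recovery on clusters matching these regular expression patterns (the "*" pattern matches everything).
OnFailureDetectionProcesses
Hook: executed when a failover scenario is detected.
PreGracefulTakeoverProcesses
Hook: executed immediately before the master becomes read-only.
PreFailoverProcesses
Hook: executed immediately before the recovery action.
PostMasterFailoverProcesses
Hook: executed at the end of a successful master recovery.
PostFailoverProcesses
Hook: executed at the end of any successful recovery.
PostUnsuccessfulFailoverProcesses
Hook: executed at the end of any unsuccessful recovery.
PostIntermediateMasterFailoverProcesses
Hook: executed at the end of a successful intermediate master recovery.
PostGracefulTakeoverProcesses
Hook: executed after the old master has been placed under the newly promoted master.
CoMasterRecoveryMustPromoteOtherCoMaster
When 'false', any instance can get promoted; when 'true', orchestrator promotes the other co-master or else fails.
ApplyMySQLPromotionAfterMasterFailover
Runs RESET SLAVE ALL and sets read_only = 0 on the promoted master; defaults to true
PreventCrossDataCenterMasterFailover
If true (default: false), cross-DC master failover is not allowed; orchestrator does its best to fail over within the same DC only, and otherwise does not fail over at all.
MasterFailoverDetachReplicaMasterHost (MasterFailoverDetachSlaveMasterHost)
Whether to issue detach-replica-master-host on the newly promoted master, which ensures the new master does not try to replicate from the old master once the old master comes back. Meaningless when ApplyMySQLPromotionAfterMasterFailover is true.
MasterFailoverLostInstancesDowntimeMinutes
Minutes of downtime set on any server lost in a master failover (including the failed master and lost replicas). 0 disables.
PostponeReplicaRecoveryOnLagMinutes (PostponeSlaveRecoveryOnLagMinutes)
On crash recovery, replicas lagging by more than the given number of minutes are only resurrected after the master has been elected. 0 disables this feature.
BackendDB
Backend database type.
RaftEnabled
Whether to enable raft, which keeps orch itself highly available.
RaftDataDir
Raft data directory.
RaftBind
Raft bind address.
DefaultRaftPort
Raft port.
RaftNodes
Raft nodes.
ConsulAddress
Consul address.
ConsulAclToken
Consul token.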
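
The anti-flapping behaviour of RecoveryPeriodBlockSeconds described above can be pictured with a small Python sketch. It is a simplified model under assumptions, not orchestrator's actual logic, which may take further conditions into account; the function name `recovery_is_blocked` is hypothetical, and the value 31 is the one used in this article's sample configuration.

```python
from datetime import datetime, timedelta


def recovery_is_blocked(last_recovery, now, block_seconds):
    """True while a previous recovery is still inside the block window.

    Simplified model for illustration; not orchestrator's implementation.
    """
    if last_recovery is None:
        # No earlier recovery on this cluster: nothing blocks a new one.
        return False
    return now - last_recovery < timedelta(seconds=block_seconds)


last = datetime(2019, 2, 18, 11, 0, 0)

# A second failure 20 seconds after the last recovery falls inside the 31-second window.
print(recovery_is_blocked(last, last + timedelta(seconds=20), 31))  # → True

# A failure 40 seconds later is outside the window, so recovery may proceed.
print(recovery_is_blocked(last, last + timedelta(seconds=40), 31))  # → False
```

A short window such as 31 seconds suits testing; a production setup would normally keep the window much longer so that a flapping master does not trigger a chain of failovers.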

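Running and Deployment

Environment:

Install two MySQL instances on each of the three test machines: the backend MySQL used by orch (3306) and the MySQL managed by orch (3307). Following the configuration template above, first create the account on the backend database instance: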

CREATE USER 'orchestrator'@'127.0.0.1' IDENTIFIED BY '123456';
GRANT ALL ON orchestrator.* TO 'orchestrator'@'127.0.0.1';

Then create the account on the managed MySQL (3307) instances:

CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'Aa123456';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'%';
GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'%';
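GRANT SELECT ON meta.* TO 'orchestrator'@'%';

The meta schema holds the tables used by the custom queries, such as cluster and pseudo_gtid_status; these are described later.

At this point the orch environment is ready. The last step is to set up the three managed MySQL instances as one master with two replicas:

Master: 192.168.163.131:3307
Slave: 192.168.163.132:3307
Slave: 192.168.163.133:3307

Finally, because the configuration refers to instances by hostname, the hosts file on the three managed MySQL machines needs the following entries: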

192.168.163.131 test1
192.168.163.132 test2
192.168.163.133 test3

Setup:

1. Start orchestrator

./orchestrator --debug --config=/etc/orchestrator.conf.json http

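2. Add the configured replication instances to orchestrator. Because orch discovers every instance in a topology automatically, adding any single instance is enough; if some instances are not discovered, add them as well.

Add the instance in the web interface via Clusters -> Discover in the navigation bar.

Once the instance has been added, the web interface shows the resulting replication topology.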
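
Since orchestrator also exposes an HTTP API, the same discovery step can be scripted instead of clicked. The Python sketch below builds a request to the `/api/discover/:host/:port` endpoint using the addresses from this article. Treat it as a sketch under assumptions: the helper names are hypothetical, the shape of the JSON reply is not described here, and in a raft setup such requests presumably need to reach the current leader.

```python
import json
import urllib.request


def discover_url(orchestrator_base, host, port):
    """Build the discovery URL for one MySQL instance."""
    return f"{orchestrator_base.rstrip('/')}/api/discover/{host}/{port}"


def discover(orchestrator_base, host, port, timeout=5.0):
    """Ask orchestrator to discover an instance and return the decoded JSON reply.

    Requires a running orchestrator; not called in this example.
    """
    url = discover_url(orchestrator_base, host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is printed here, so the example runs without a live server.
print(discover_url("http://192.168.163.131:3000", "test1", 3307))
# → http://192.168.163.131:3000/api/discover/test1/3307
```

Against a live setup, calling `discover("http://192.168.163.131:3000", "test1", 3307)` would issue the request; adding one instance should be enough, as orch then walks the rest of the topology on its own.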

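Summary:

For reasons of length, this article only gives a brief description of Orchestrator's parameters and deployment. Failover and HA are covered in the next article, "MySQL High-Availability and Replication Management Tool: Using Orchestrator".

References: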

https://github.com/github/orchestrator

https://www.percona.com/blog/2016/03/08/orchestrator-mysql-replication-topology-manager/
