MySQL High Availability Master-Slave Cluster (Part 3): Orchestrator Cluster Deployment

This article introduces Orchestrator, a high availability and replication management tool for MySQL, and walks through its deployment: download, installation, and configuration of MySQL cluster discovery, so that when the master node fails a new master is elected automatically and the cluster keeps serving traffic.


The previous post, MySQL High Availability Master-Slave Cluster (Part 2), covered deploying the ProxySQL middleware, proxying the MySQL master-slave cluster through it, and enabling read/write splitting, which greatly improves the cluster's performance. One problem remains: when the MySQL cluster fails, for example when the master node goes down, the cluster can no longer serve requests. Orchestrator addresses this by monitoring the MySQL cluster: when the master node fails, it elects a new master from the remaining replicas, keeping the cluster highly available.

1 Introduction to Orchestrator

GitHub: https://github.com/openark/orchestrator

orchestrator is a MySQL high availability and replication management tool. It runs as a service and provides command-line access, an HTTP API, and a web interface. orchestrator supports the following:

Discovery
orchestrator actively crawls through your topologies and maps them. It reads basic MySQL information such as replication status and configuration, and provides a slick visualization of your topologies, including replication problems, even in the face of failures.

Refactoring
orchestrator understands replication rules. It knows about binlog file:position, GTID, Pseudo-GTID, and binlog servers. Refactoring a replication topology can be as simple as dragging a replica under another master. Relocating replicas is safe: orchestrator rejects illegal refactoring attempts. Fine-grained control is available via various command-line options.
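For example, a replica can be relocated under a different master from the command line with the standard relocate command (the hostnames here are the ones used later in this guide):

orchestrator-client -c relocate -i mysql-0003:3306 -d mysql-0002:3306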

Recovery
orchestrator uses a holistic approach to detect master and intermediate-master failures. Based on information gained from the topology itself, it recognizes a variety of failure scenarios.

It is configurable: it may choose to perform automated recovery, or allow the user to choose the type of manual recovery. Intermediate-master recovery is handled internally by orchestrator; master failover is supported by pre/post failure hooks.

The recovery process leverages orchestrator's understanding of the topology and its ability to refactor it. It is based on state, not configuration: orchestrator picks the best recovery method by investigating and evaluating the topology at the time of recovery.
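Recovery can also be exercised as a planned operation. A graceful (lossless) master takeover, which fires the Pre/PostGracefulTakeoverProcesses hooks configured later in section 2.3.3, can be requested like this (cluster alias and hosts follow this guide's naming and are illustrative):

orchestrator-client -c graceful-master-takeover -a mysql-0001 -d mysql-0002:3306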

2 Deploying the orchestrator Cluster

We deploy a one-leader, two-follower orchestrator cluster. If the leader node goes down, a new leader is elected from the remaining nodes, so the orchestrator cluster itself stays highly available and keeps monitoring the MySQL cluster.

2.1 Download orchestrator

wget https://github.com/openark/orchestrator/releases/download/v3.2.2/orchestrator-client-3.2.2-1.x86_64.rpm

wget https://github.com/openark/orchestrator/releases/download/v3.2.2/orchestrator-cli-3.2.2-1.x86_64.rpm

wget https://github.com/openark/orchestrator/releases/download/v3.2.2/orchestrator-3.2.2-1.x86_64.rpm

2.2 Install orchestrator with yum

Change into the directory where the downloaded packages are stored, then install them:

yum localinstall orchestrator-3.2.2-1.x86_64.rpm

yum localinstall orchestrator-client-3.2.2-1.x86_64.rpm

yum localinstall orchestrator-cli-3.2.2-1.x86_64.rpm

2.3 Modify the orchestrator Configuration

2.3.1 Modify orchestrator.service

sudo sed -i 's@ExecStart=/usr/local/orchestrator/orchestrator http@ExecStart=/usr/local/orchestrator/orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http@g' /etc/systemd/system/orchestrator.service
## Reload systemd unit files
systemctl daemon-reload
## Create the orchestrator data directory (used by RaftDataDir and SQLite3DataFile below)
mkdir -p /orchestrator
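To confirm that the unit file now points at the configuration file:

grep ExecStart /etc/systemd/system/orchestrator.service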

2.3.2 Configure the DB hosts

Configure the system hosts file: on every orchestrator node, add the MySQL cluster's IP-to-hostname mappings, and apply the same configuration on every MySQL node (sudo tee -a is used here because with a plain sudo cat >> the redirection would be performed by the calling, unprivileged shell):

sudo tee -a /etc/hosts << EOF
10.0.0.1    mysql-0001
10.0.0.2    mysql-0002
10.0.0.3    mysql-0003
EOF
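A quick check that the names resolve on each node (getent reads /etc/hosts):

getent hosts mysql-0001 mysql-0002 mysql-0003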

Append the same mappings to orchestrator's host file:

sudo tee -a /usr/local/orchestrator/host.cnf << EOF
10.0.0.1    mysql-0001
10.0.0.2    mysql-0002
10.0.0.3    mysql-0003
EOF
2.3.3 Configure orchestrator.conf.json

JSON does not allow comments, so the field notes are listed here instead of inline. Adjust these for your environment:

MySQLTopologyUser / MySQLTopologyPassword: the account orchestrator uses to access the managed MySQL cluster (created in section 2.4.1).
BackendDB: orchestrator's own persistence backend; sqlite is used here. When set to mysql, the MySQLOrchestrator* fields must be filled in.
RaftBind: the IP of the local orchestrator node.
DefaultRaftPort: the port orchestrator cluster nodes use to talk to each other.
RaftNodes: the IPs of all orchestrator cluster nodes.

sudo tee /usr/local/orchestrator/orchestrator.conf.json << EOF
{
  "Debug": false,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orch_monitor",
  "MySQLTopologyPassword": "123456",
  "MySQLTopologyUseMutualTLS": false,
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/orchestrator/orchestrator.sqlite3",
  "MySQLOrchestratorHost": "",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "",
  "MySQLOrchestratorUser": "",
  "MySQLOrchestratorPassword": "",
  "UseSuperReadOnly": true,
  "MySQLConnectTimeoutSeconds": 1,
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": false,
  "InstancePollSeconds": 5,
  "RaftEnabled": true,
  "RaftDataDir": "/orchestrator",
  "RaftBind": "10.0.0.6",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["10.0.0.6","10.0.0.7","10.0.0.8"],
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 20,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 120,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [""],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "orch local"
  },
  "ReplicationLagQuery": "",
  "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 30,
  "FailMasterPromotionOnLagMinutes": 0,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [
    "*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "_intermediate_master_pattern_"
  ],
  "OnFailureDetectionProcesses": [
    "echo '1 Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [
    "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "/usr/bin/prefailover.sh {failedHost} {failedPort} >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "/usr/bin/postfailover.sh {failedHost} {failedPort} {successorHost} {successorPort}  >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo '4 Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [
    "echo '5 Planned takeover complete' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "PreventCrossRegionMasterFailover": false,
  "MasterFailoverDetachReplicaMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeReplicaRecoveryOnLagMinutes": 0,
  "DelayMasterPromotionIfSQLThreadNotUpToDate": true,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true,
  "ConsulAddress": "",
  "ConsulAclToken": ""
}
EOF
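A stray comma or leftover comment will prevent orchestrator from starting, so it is worth validating the file as JSON first (assuming Python is available on the host):

python -m json.tool /usr/local/orchestrator/orchestrator.conf.json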

2.3.4 Configure the orchestrator recovery scripts

Create the prefailover.sh script referenced in PreFailoverProcesses in orchestrator.conf.json; it runs immediately before orchestrator performs a MySQL cluster recovery.

Create the exec_prefailover.sh script invoked from prefailover.sh; it carries the site-specific work done immediately before recovery.

Create the postfailover.sh script referenced in PostFailoverProcesses in orchestrator.conf.json; it runs at the end of a successful recovery.

Create the exec_postfailover.sh script invoked from postfailover.sh; it carries the site-specific work done at the end of a successful recovery. A minimal sketch follows.
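The contents of the original scripts are not given, so the following is only a minimal sketch: each hook logs the event and delegates to its exec_* counterpart, whose body (fencing, VIP moves, repointing ProxySQL, and so on) is a site-specific assumption.

sudo tee /usr/bin/prefailover.sh << 'EOF'
#!/bin/bash
# Invoked by orchestrator as: prefailover.sh {failedHost} {failedPort}
FAILED_HOST=$1
FAILED_PORT=$2
echo "$(date '+%F %T') prefailover: master ${FAILED_HOST}:${FAILED_PORT} failed" >> /tmp/recovery.log
# Delegate site-specific pre-failover work to exec_prefailover.sh if present
[ -x /usr/bin/exec_prefailover.sh ] && /usr/bin/exec_prefailover.sh "${FAILED_HOST}" "${FAILED_PORT}"
exit 0
EOF

sudo tee /usr/bin/postfailover.sh << 'EOF'
#!/bin/bash
# Invoked by orchestrator as: postfailover.sh {failedHost} {failedPort} {successorHost} {successorPort}
FAILED_HOST=$1; FAILED_PORT=$2; NEW_HOST=$3; NEW_PORT=$4
echo "$(date '+%F %T') postfailover: ${FAILED_HOST}:${FAILED_PORT} -> ${NEW_HOST}:${NEW_PORT}" >> /tmp/recovery.log
# Delegate site-specific post-failover work (e.g. repointing ProxySQL) if present
[ -x /usr/bin/exec_postfailover.sh ] && /usr/bin/exec_postfailover.sh "${FAILED_HOST}" "${FAILED_PORT}" "${NEW_HOST}" "${NEW_PORT}"
exit 0
EOF

sudo chmod +x /usr/bin/prefailover.sh /usr/bin/postfailover.sh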

## Reload the service configuration
systemctl daemon-reload

2.3.5 Start orchestrator

## Create the persistence directory (must match SQLite3DataFile and RaftDataDir in the config; already created in 2.3.1, so this is idempotent)
mkdir -p /orchestrator
## Manage the orchestrator service
systemctl start orchestrator
systemctl restart orchestrator
systemctl stop orchestrator
systemctl status orchestrator
## Follow the logs
tail -f -n 200 /var/log/messages
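Once all three nodes are running, each node's health and raft role can be checked over the HTTP API; /api/status is the StatusEndpoint configured above, and the raft-state endpoint exposed by orchestrator's API should report one leader and two followers across the cluster:

curl -s http://127.0.0.1:3000/api/status
curl -s http://127.0.0.1:3000/api/raft-state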

2.3.6 Configure the orchestrator cluster environment variables

ORCHESTRATOR_API lists every node's API endpoint so that orchestrator-client can find the current raft leader and send commands to it. Configure it on each orchestrator node:

sudo tee /etc/profile.d/orchestrator.sh << EOF
export ORCHESTRATOR_API="http://10.0.0.6:3000/api http://10.0.0.7:3000/api http://10.0.0.8:3000/api"
export ORCHESTRATOR_AUTH_USER=admin
export ORCHESTRATOR_AUTH_PASSWORD=123456
EOF
## Apply the environment variables
source /etc/profile
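As a quick sanity check that the client can reach the cluster, list the known clusters (the list stays empty until discovery is run in section 2.4):

orchestrator-client -c clusters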

2.3.7 Open firewall ports

firewall-cmd --zone=public --add-port=3000/tcp --permanent
firewall-cmd --zone=public --add-port=10008/tcp --permanent
firewall-cmd --reload

2.4 Configure MySQL Cluster Discovery

2.4.1 Create the orchestrator management user on the MySQL master

Log in to the MySQL cluster's master node and run:

CREATE USER 'orch_monitor'@'%' IDENTIFIED BY '123456';
GRANT RELOAD, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'orch_monitor'@'%';
GRANT SELECT ON meta.* TO 'orch_monitor'@'%';
ALTER USER 'orch_monitor'@'%' IDENTIFIED WITH mysql_native_password BY '123456';
FLUSH PRIVILEGES;
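To verify the account and its grants from any host that can reach the master (an illustrative check; adjust host and credentials):

mysql -h 10.0.0.1 -u orch_monitor -p123456 -e "SHOW GRANTS FOR 'orch_monitor'@'%';"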

2.4.2 Append configuration on every MySQL node (my.cnf)

orchestrator requires the following settings in the configuration of every managed MySQL instance:

# this MySQL node's own IP
report_host=10.0.0.1
report_port=3306
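report_host and report_port are not dynamic, so each mysqld must be restarted after editing my.cnf. The values can then be confirmed from a client (illustrative check):

mysql -h 10.0.0.1 -u orch_monitor -p123456 -e "SELECT @@report_host, @@report_port;"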

2.4.3 Run discovery

On the orchestrator leader node, run discovery against the MySQL master:

orchestrator-client -c discover -i 10.0.0.1:3306

orchestrator then automatically discovers the full topology of the MySQL master-slave cluster; it can be inspected as shown below.
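The discovered replication tree can be printed with the standard topology command:

orchestrator-client -c topology -i mysql-0001:3306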

2.4.4 Set the promotion rule

On the orchestrator leader node, set the promotion rule. Here one of the two replicas of the one-master, two-replica business cluster is designated as the preferred failover target: when the MySQL master goes down, orchestrator preferentially promotes this replica to be the new master.

/usr/bin/orchestrator-client -c register-candidate -i mysql-0002:3306 --promotion-rule prefer

2.4.5 Schedule periodic re-registration of the candidate

Candidate registrations expire (CandidateInstanceExpireMinutes is set to 60 in the config above), so re-register the preferred candidate periodically on every orchestrator node:

crontab -e
# Add this job: every 2 minutes, with a random sleep of up to 10 seconds so the nodes do not all fire at once
*/2 * * * * /usr/bin/perl -le 'sleep rand 10' && /usr/bin/orchestrator-client -c register-candidate -i mysql-0002:3306 --promotion-rule prefer

2.5 Access the orchestrator Web UI

http://10.0.0.6:3000/

Check whether the MySQL cluster shows any anomalies.

MySQL High Availability Master-Slave Cluster (Part 1): Building the MySQL Master-Slave Cluster

MySQL High Availability Master-Slave Cluster (Part 2): Deploying the ProxySQL Middleware
