本文档介绍TDSQL数据库的高可用性最佳实践,包括高可用架构设计、故障切换、监控和测试等方面。风哥教程参考TDSQL官方文档和生产环境经验,提供实用的高可用性配置和操作步骤。
高可用性是数据库系统的重要特性,通过合理的架构设计和配置,可以确保系统在面对各种故障时能够持续提供服务,学习交流加群风哥微信: itpux-com。
本文档将从基础概念、生产环境规划、实施方案、案例分析和经验总结等方面,全面介绍TDSQL高可用性的最佳实践方法。
目录大纲
Part01-基础概念与理论知识
1.1 高可用性基础概念
高可用性是指系统在面对各种故障时能够持续提供服务的能力。对于数据库系统来说,高可用性意味着即使在硬件故障、软件故障或网络故障等情况下,数据库仍然能够正常运行,保证业务的连续性。
高可用性的核心目标是减少系统 downtime,提高服务的可靠性和可用性,风哥提示:高可用性是一个系统性工程,需要从架构设计、硬件配置、软件配置、监控等多个方面进行考虑。
1.2 高可用性指标
常见的高可用性指标包括:
- 可用性:系统能够正常提供服务的时间占总时间的比例,通常用几个9来表示,如99.9%(3个9)、99.99%(4个9)等
- 恢复时间目标(RTO):系统从故障中恢复到正常运行所需的时间
- 恢复点目标(RPO):系统从故障中恢复后,可能丢失的数据量
- 故障切换时间:从主节点故障到备节点接管服务所需的时间
1.3 高可用性架构模式
常见的高可用性架构模式包括:
- 主从架构:一个主节点负责写操作,多个从节点负责读操作,当主节点故障时,从节点晋升为主节点
- 多主架构:多个节点都可以进行写操作,数据通过复制机制同步
- 集群架构:多个节点组成一个集群,共同提供服务,当某个节点故障时,其他节点接管其工作
- Proxy架构:通过代理层实现请求的分发和故障切换
Part02-生产环境规划与建议
2.1 高可用架构规划
高可用架构规划需要考虑以下因素:
- 业务需求:根据业务的重要性和对可用性的要求,选择合适的高可用架构
- 预算限制:根据预算情况,选择成本效益最优的高可用方案
- 技术复杂度:考虑团队的技术能力,选择易于管理和维护的高可用方案
- 扩展性:考虑未来业务的增长,选择具有良好扩展性的高可用方案
2.2 网络与存储规划
网络与存储规划是高可用性的重要组成部分,需要考虑以下因素:
- 网络架构:采用冗余网络架构,确保网络连接的可靠性
- 网络延迟:确保节点之间的网络延迟较低,以保证数据同步的及时性
- 存储方案:采用共享存储或分布式存储,确保数据的可靠性和一致性
- 存储性能:确保存储性能满足业务需求,避免存储成为性能瓶颈
2.3 高可用配置建议
高可用配置建议包括:
- 使用至少3个节点,确保集群的稳定性
- 采用跨机房部署,提高容灾能力
- 配置合理的复制策略,如半同步复制或同步复制
- 设置合理的故障切换阈值,避免误切换
- 定期进行故障切换测试,确保切换流程的可靠性
Part03-生产环境项目实施方案
3.1 主从复制配置
主从复制是实现高可用性的基础,需要正确配置复制参数和监控复制状态。
# 配置主节点
vim /etc/my.cnf
# 添加以下配置
server-id = 1
log-bin = mysql-bin
binlog-format = ROW
sync-binlog = 1
innodb-flush-log-at-trx-commit = 1
# 配置文件编辑成功,无输出
# 重启主节点
systemctl restart mysqld
# 服务重启成功,无输出
# 创建复制用户
mysql -u fgedu -p -e “CREATE USER ‘repl’@’192.168.1.%’ IDENTIFIED BY ‘Repl123!’; GRANT REPLICATION SLAVE ON *.* TO ‘repl’@’192.168.1.%’;”
Enter password:
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
# 查看主节点状态
mysql -u fgedu -p -e “SHOW MASTER STATUS;”
Enter password:
+——————+———-+————–+——————+——————-+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+——————+———-+————–+——————+——————-+
| mysql-bin.000001 | 123456 | | | |
+——————+———-+————–+——————+——————-+
# 配置从节点
vim /etc/my.cnf
# 添加以下配置
server-id = 2
relay-log = relay-bin
read-only = 1
# 配置文件编辑成功,无输出
# 重启从节点
systemctl restart mysqld
# 服务重启成功,无输出
# 配置从节点复制
mysql -u fgedu -p -e “CHANGE MASTER TO MASTER_HOST=’192.168.1.100′, MASTER_USER=’repl’, MASTER_PASSWORD=’Repl123!’, MASTER_LOG_FILE=’mysql-bin.000001′, MASTER_LOG_POS=123456; START SLAVE;”
Enter password:
Query OK, 0 rows affected, 2 warnings (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
# 查看从节点状态
mysql -u fgedu -p -e “SHOW SLAVE STATUS\G”
Enter password:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.100
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 123456
Relay_Log_File: relay-bin.000001
Relay_Log_Pos: 123456
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 123456
Relay_Log_Space: 123456
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 12345678-1234-1234-1234-1234567890ab
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
3.2 故障切换配置
故障切换是高可用性的核心,需要配置自动故障切换机制,确保在主节点故障时能够快速切换到备节点。
# 安装和配置Keepalived
yum install keepalived -y
vim /etc/keepalived/keepalived.conf
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Resolving Dependencies
–> Running transaction check
—> Package keepalived.x86_64 0:2.2.7-1.el9 will be installed
–> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
keepalived x86_64 2.2.7-1.el9 ol9_appstream 324 k
Transaction Summary
================================================================================
Install 1 Package
Total download size: 324 k
Installed size: 934 k
Is this ok [y/d/N]: y
Downloading packages:
keepalived-2.2.7-1.el9.x86_64.rpm | 324 kB 00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : keepalived-2.2.7-1.el9.x86_64 1/1
Running scriptlet: keepalived-2.2.7-1.el9.x86_64 1/1
Verifying : keepalived-2.2.7-1.el9.x86_64 1/1
Installed:
keepalived.x86_64 0:2.2.7-1.el9
Complete!
# 配置Keepalived主节点
cat > /etc/keepalived/keepalived.conf << 'EOF' ! Configuration File for keepalived vrrp_script chk_mysql { script "/etc/keepalived/check_mysql.sh" interval 2 weight 2 } vrrp_instance VI_1 { state MASTER interface eth0 virtual_router_id 51 priority 100 advert_int 1 authentication { auth_type PASS auth_pass 1111 } virtual_ipaddress { 192.168.1.200 } track_script { chk_mysql } } EOF
# 配置文件创建成功,无输出
# 创建MySQL检查脚本
cat > /etc/keepalived/check_mysql.sh << 'EOF'
#!/bin/bash
# check_mysql.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
mysql -u fgedu -pFgedu123! -e "SELECT 1" > /dev/null 2>&1
if [ $? -ne 0 ]; then
systemctl stop keepalived
fi
EOF
chmod +x /etc/keepalived/check_mysql.sh
# 脚本创建成功,无输出
# 启动Keepalived服务
systemctl start keepalived
systemctl enable keepalived
# 服务启动成功,无输出
3.3 读写分离配置
读写分离可以提高系统的并发处理能力,减轻主节点的压力。
# 安装和配置ProxySQL
yum install proxysql -y
vim /etc/proxysql.cnf
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Resolving Dependencies
–> Running transaction check
—> Package proxysql.x86_64 0:2.5.4-1.el9 will be installed
–> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
proxysql x86_64 2.5.4-1.el9 ol9_appstream 10 M
Transaction Summary
================================================================================
Install 1 Package
Total download size: 10 M
Installed size: 30 M
Is this ok [y/d/N]: y
Downloading packages:
proxysql-2.5.4-1.el9.x86_64.rpm | 10 MB 00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : proxysql-2.5.4-1.el9.x86_64 1/1
Running scriptlet: proxysql-2.5.4-1.el9.x86_64 1/1
Verifying : proxysql-2.5.4-1.el9.x86_64 1/1
Installed:
proxysql.x86_64 0:2.5.4-1.el9
Complete!
# 配置ProxySQL
mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e “INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (1, ‘192.168.1.100’, 3306), (2, ‘192.168.1.101’, 3306); INSERT INTO mysql_users(username, password, default_hostgroup) VALUES (‘fgedu’, ‘Fgedu123!’, 1); INSERT INTO mysql_query_rules(rule_id, active, match_pattern, destination_hostgroup) VALUES (1, 1, ‘^SELECT.*’, 2); LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK; LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;”
# 配置成功,无输出
3.4 高可用监控配置
高可用监控是确保系统稳定运行的重要手段,需要配置实时监控和告警机制。
# 安装和配置Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -xzf prometheus-2.45.0.linux-amd64.tar.gz
mv prometheus-2.45.0.linux-amd64 /tdsql/app/prometheus
–2026-04-09 12:00:00– https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
Resolving github.com (github.com)… 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443… connected.
HTTP request sent, awaiting response… 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/9484247/a6f1f0c6-1a5a-4129-88f4-015049577a9a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20260409%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260409T120000Z&X-Amz-Expires=300&X-Amz-Signature=1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=9484247&response-content-disposition=attachment%3B%20filename%3Dprometheus-2.45.0.linux-amd64.tar.gz&response-content-type=application%2Foctet-stream [following]
–2026-04-09 12:00:00– https://objects.githubusercontent.com/github-production-release-asset-2e65be/9484247/a6f1f0c6-1a5a-4129-88f4-015049577a9a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20260409%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260409T120000Z&X-Amz-Expires=300&X-Amz-Signature=1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=9484247&response-content-disposition=attachment%3B%20filename%3Dprometheus-2.45.0.linux-amd64.tar.gz&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)… 185.199.108.133, 185.199.109.133, 185.199.110.133, …
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.108.133|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 92345678 (88M) [application/octet-stream]
Saving to: ‘prometheus-2.45.0.linux-amd64.tar.gz’
prometheus-2.45.0.linux-amd64.tar.gz 100%[=====================================================================================>] 88.06M 10.0MB/s in 8.8s
2026-04-09 12:00:09 (10.0 MB/s) – ‘prometheus-2.45.0.linux-amd64.tar.gz’ saved [92345678/92345678]
# 配置Prometheus监控MySQL
cat > /tdsql/app/prometheus/prometheus.yml << 'EOF' global: scrape_interval: 15s scrape_configs: - job_name: 'mysql' static_configs: - targets: ['192.168.1.100:9104', '192.168.1.101:9104'] EOF
# 配置文件创建成功,无输出
Part04-生产案例与实战讲解
4.1 主从架构高可用案例
某企业采用主从架构实现TDSQL的高可用性:
- 1个主节点负责写操作,2个从节点负责读操作
- 使用Keepalived实现VIP漂移,确保故障时自动切换
- 使用ProxySQL实现读写分离,提高系统并发处理能力
- 配置Prometheus监控,实时监控系统状态
- 定期进行故障切换测试,确保切换流程的可靠性
# 测试故障切换
systemctl stop mysqld
ping 192.168.1.200
mysql -u fgedu -p -h 192.168.1.200 -e “SELECT @@server_id;”
PING 192.168.1.200 (192.168.1.200) 56(84) bytes of data.
64 bytes from 192.168.1.200: icmp_seq=1 ttl=64 time=0.123 ms
64 bytes from 192.168.1.200: icmp_seq=2 ttl=64 time=0.123 ms
64 bytes from 192.168.1.200: icmp_seq=3 ttl=64 time=0.123 ms
Enter password:
+————-+
| @@server_id |
+————-+
| 2 |
+————-+
4.2 多主架构高可用案例
某企业采用多主架构实现TDSQL的高可用性:
- 3个节点都可以进行写操作,数据通过半同步复制机制同步
- 使用ProxySQL实现请求分发和故障检测
- 配置跨机房部署,提高容灾能力
- 使用Prometheus和Grafana监控系统状态
- 定期进行数据一致性检查,确保数据同步正常
# 检查多主复制状态
mysql -u fgedu -p -e “SHOW SLAVE STATUS\G”
Enter password:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 123456
Relay_Log_File: relay-bin.000001
Relay_Log_Pos: 123456
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 123456
Relay_Log_Space: 123456
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: 12345678-1234-1234-1234-1234567890ab
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
4.3 集群架构高可用案例
某企业采用集群架构实现TDSQL的高可用性:
- 5个节点组成一个集群,共同提供服务
- 使用分布式共识协议确保数据一致性
- 配置自动故障检测和故障切换机制
- 使用负载均衡器分发请求
- 配置跨区域部署,提高容灾能力
# 检查集群状态
mysql -u fgedu -p -e “SHOW GLOBAL STATUS LIKE ‘%cluster%’;”
Enter password:
+——————————+———————-+
| Variable_name | Value |
+——————————+———————-+
| cluster_name | fgedu_cluster |
| cluster_type | multi-master |
| cluster_member_states | ONLINE,ONLINE,ONLINE,ONLINE,ONLINE |
| cluster_size | 5 |
+——————————+———————-+
Part05-风哥经验总结与分享
5.1 高可用性最佳实践
- 采用合适的高可用架构,根据业务需求选择主从架构、多主架构或集群架构
- 配置合理的复制策略,如半同步复制或同步复制,确保数据一致性
- 实现自动故障切换机制,减少人工干预,提高故障恢复速度
- 配置读写分离,提高系统的并发处理能力
- 采用跨机房或跨区域部署,提高容灾能力
- 配置实时监控和告警机制,及时发现和解决问题
- 定期进行故障切换测试,确保切换流程的可靠性
- 建立完善的高可用运维流程,包括故障处理、演练和总结
5.2 常见高可用问题与解决方案
| 问题 | 原因 | 解决方案 |
|---|---|---|
| 复制延迟 | 网络延迟、主节点负载高、从节点性能不足 | 优化网络环境、增加主节点资源、优化从节点配置 |
| 故障切换失败 | Keepalived配置错误、网络问题、脚本执行失败 | 检查Keepalived配置、修复网络问题、测试脚本执行 |
| 数据不一致 | 复制中断、网络分区、人为操作失误 | 定期检查复制状态、配置监控告警、规范操作流程 |
| 性能下降 | 负载均衡配置不当、资源不足、SQL语句优化不足 | 优化负载均衡配置、增加资源、优化SQL语句 |
| 脑裂 | 网络分区、节点间通信故障 | 配置quorum机制、使用仲裁节点、优化网络架构 |
5.3 高可用工具与资源
- 故障切换工具: Keepalived, Pacemaker
- 读写分离工具: ProxySQL, MySQL Router
- 监控工具: Prometheus + Grafana, Zabbix
- 集群管理工具: MySQL Group Replication, PostgreSQL Patroni
- 云平台工具: TDSQL控制台、云负载均衡
- 官方资源: TDSQL官方文档、MySQL/PostgreSQL高可用指南
更多视频教程www.fgedu.net.cn,学习交流加群风哥QQ113257174。
风哥提示:高可用性是一个系统性工程,需要从架构设计、硬件配置、软件配置、监控等多个方面进行考虑。
更多学习教程公众号风哥教程itpux_com
from tdsql视频:www.itpux.com
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
