1. 首页 > 国产数据库教程 > openGauss教程 > 正文

opengauss教程FG147-数据库高可用性最佳实践

目录大纲

Part01-基础概念与理论知识

1.1 高可用性概述

高可用性(High Availability,HA)是指系统在面对各种故障时能够持续提供服务的能力。对于数据库系统来说,高可用性主要体现在以下几个方面:

  • 故障检测:及时发现系统故障
  • 故障切换:在主节点故障时自动切换到备用节点
  • 数据同步:确保主备节点数据一致性
  • 恢复能力:在故障后快速恢复系统

风哥提示:高可用性设计需要考虑成本、复杂性和性能之间的平衡。

1.2 openGauss高可用架构

openGauss支持多种高可用架构,主要包括:

  • 主从复制架构:一主多从,主节点负责写操作,从节点负责读操作
  • 级联复制架构:主节点复制到从节点,从节点再复制到其他从节点
  • 多主架构:多个主节点,支持多写操作
  • 分布式架构:数据分布在多个节点,提高可用性和扩展性

Part02-生产环境规划与建议

2.1 高可用架构设计

在生产环境中,高可用架构设计需要考虑以下因素:

  • 节点数量:根据业务需求和预算确定节点数量
  • 网络拓扑:确保网络连接可靠,避免单点故障
  • 存储配置:使用RAID等技术提高存储可靠性
  • 地理位置:考虑跨机房、跨区域部署,提高容灾能力
  • 负载均衡:使用负载均衡技术分发请求

2.2 高可用策略制定

制定有效的高可用策略:

  • 故障切换策略:自动切换与手动切换的选择
  • 数据同步策略:同步复制与异步复制的选择
  • 监控策略:建立完善的监控体系
  • 备份策略:结合高可用与备份,确保数据安全
  • 恢复策略:制定详细的故障恢复流程

Part03-生产环境项目实施方案

3.1 主从复制配置

配置openGauss主从复制:

  • 准备环境:确保主从节点硬件配置相同,网络连接可靠
  • 配置主节点:修改postgresql.conf和pg_hba.conf文件
  • 风哥提示:

  • 配置从节点:使用pg_basebackup创建基础备份,配置recovery.conf文件
  • 启动复制:启动从节点,验证复制状态

3.2 故障切换配置

配置故障切换:

  • 使用pg_rewind:在主节点故障后,快速同步数据
  • 使用流复制:确保数据实时同步
  • 配置自动故障切换:使用第三方工具如Patroni或Keepalived

3.3 监控与告警

建立监控与告警体系:

  • 监控指标:数据库状态、复制延迟、系统资源使用情况等
  • 监控工具:Prometheus、Grafana、Zabbix等
  • 告警机制:设置合理的告警阈值,及时通知运维人员
  • 自动恢复:配置自动故障检测和恢复机制

Part04-生产案例与实战讲解

4.1 主从架构部署实战

案例:部署一主一从架构

# 场景:部署openGauss主从架构
# 操作:配置主节点和从节点学习交流加群风哥微信: itpux-com

# 1. 配置主节点
# 修改postgresql.conf
vim /opengauss/fgdata/postgresql.conf
# 添加以下配置
# wal_level = logical
# max_wal_senders = 10
# hot_standby = on
# max_replication_slots = 10

# 修改pg_hba.conf
vim /opengauss/fgdata/pg_hba.conf
# 添加以下配置
host replication fgedu 192.168.1.11/32 md5

# 重启主节点
gs_ctl restart -D /opengauss/fgdata

# 2. 配置从节点
# 在从节点上执行
pg_basebackup -h 192.168.1.10 -p 5432 -U fgedu -D /opengauss/fgdata -Fp -Xs -P

# 创建recovery.conf文件
vim /opengauss/fgdata/recovery.conf
# 添加以下配置
standby_mode = ‘on’
primary_conninfo = ‘host=192.168.1.10 port=5432 user=fgedu password=Password@123’
recovery_target_timeline = ‘latest’

# 启动从节点
gs_ctl start -D /opengauss/fgdata

# 3. 验证复制状态
# 在主节点上执行
psql -h 192.168.1.10 -p 5432 -U fgedu -d fgedudb -c “SELECT * FROM pg_stat_replication;

# 在从节点上执行
psql -h 192.168.1.11 -p 5432 -U fgedu -d fgedudb -c “SELECT pg_is_in_recovery();

学习交流加群风哥QQ113257174

# 主节点重启输出
waiting for server to shut down…. done
server stopped
waiting for server to start….2024-01-01 10:00:00.000 CST [12345] LOG: starting openGauss 3.0.0 (build 1234567) distributed by openGauss community
2024-01-01 10:00:00.001 CST [12345] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2024-01-01 10:00:00.002 CST [12345] LOG: listening on IPv6 address “::”, port 5432
2024-01-01 10:00:00.003 CST [12345] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2024-01-01 10:00:00.100 CST [12346] LOG: database system was shut down at 2024-01-01 09:59:59 CST
2024-01-01 10:00:00.105 CST [12345] LOG: database system is ready to accept connections
done
server started

# 从节点基础备份输出
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/1234567 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot “pg_basebackup_12345”
pg_basebackup: write-ahead log end point: 0/1234567
pg_basebackup: waiting for background process to finish streaming …
pg_basebackup: base backup completed

# 从节点启动输出
waiting for server to start….2024-01-01 10:05:00.000 CST [12345] LOG: starting openGauss 3.0.0 (build 1234567) distributed by openGauss community
2024-01-01 10:05:00.001 CST [12345] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2024-01-01 10:05:00.002 CST [12345] LOG: listening on IPv6 address “::”, port 5432
2024-01-01 10:05:00.003 CST [12345] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2024-01-01 10:05:00.004 CST [12345] LOG: entering standby mode
2024-01-01 10:05:00.005 CST [12345] LOG: redo starts at 0/1234567
2024-01-01 10:05:00.006 CST [12345] LOG: consistent recovery state reached at 0/1234567
2024-01-01 10:05:00.007 CST [12346] LOG: database system is ready to accept read only connections
done
server started

# 验证复制状态(主节点)
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
—–+———-+———+——————+————-+—————–+————-+—————+————–+——-+———-+———–+———–+————+———–+———–+————+—————+————
1234 | 16384 | fgedu | walreceiver | 192.168.1.11 | | 12345 | 2024-01-01 10:05:00 | | streaming | 0/1234567 | 0/1234567 | 0/1234567 | 0/1234567 | | | | 0 | async

# 验证从节点状态更多视频教程www.fgedu.net.cn
pg_is_in_recovery
——————-
t

4.2 故障切换实战

案例:主节点故障切换

# 场景:主节点故障,手动切换到从节点
# 操作:执行故障切换步骤

# 1. 检查主节点状态
ping 192.168.1.10

# 2. 在从节点上执行故障切换
# 停止从节点
gs_ctl stop -D /opengauss/fgdata

# 移除recovery.conf文件
rm /opengauss/fgdata/recovery.conf

# 创建recovery.signal文件
touch /opengauss/fgdata/recovery.signal

# 启动从节点为新的主节点
gs_ctl start -D /opengauss/fgdata

# 3. 验证新主节点状态
psql -h 192.168.1.11 -p 5432 -U fgedu -d fgedudb -c “SELECT pg_is_in_recovery();

# 4. 重新配置原主节点为新的从节点
# 在原主节点上执行
# 清理数据目录
rm -rf /opengauss/fgdata/*

# 从新主节点创建基础备份
pg_basebackup -h 192.168.1.11 -p 5432 -U fgedu -D /opengauss/fgdata -Fp -Xs -P

# 创建recovery.conf文件更多学习教程公众号风哥教程itpux_com
vim /opengauss/fgdata/recovery.conf
# 添加以下配置
standby_mode = ‘on’
primary_conninfo = ‘host=192.168.1.11 port=5432 user=fgedu password=Password@123’
recovery_target_timeline = ‘latest’

# 启动原主节点为从节点
gs_ctl start -D /opengauss/fgdata

# 检查主节点状态
ping: connect: Network is unreachable

# 从节点停止输出
waiting for server to shut down…. done
server stopped

# 从节点启动为新主节点输出
waiting for server to start….2024-01-01 10:10:00.000 CST [12345] LOG: starting openGauss 3.0.0 (build 1234567) distributed by openGauss community
2024-01-01 10:10:00.001 CST [12345] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2024-01-01 10:10:00.002 CST [12345] LOG: listening on IPv6 address “::”, port 5432
2024-01-01 10:10:00.003 CST [12345] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2024-01-01 10:10:00.100 CST [12346] LOG: database system was shut down at 2024-01-01 10:09:59 CST
2024-01-01 10:10:00.105 CST [12345] LOG: database system is ready to accept connections
done
server started

# 验证新主节点状态
pg_is_in_recoveryfrom DB视频:www.itpux.com
——————-
f

# 原主节点基础备份输出
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2345678 on timeline 2
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot “pg_basebackup_12345”
pg_basebackup: write-ahead log end point: 0/2345678
pg_basebackup: waiting for background process to finish streaming …
pg_basebackup: base backup completed

# 原主节点启动为从节点输出
waiting for server to start….2024-01-01 10:15:00.000 CST [12345] LOG: starting openGauss 3.0.0 (build 1234567) distributed by openGauss community
2024-01-01 10:15:00.001 CST [12345] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2024-01-01 10:15:00.002 CST [12345] LOG: listening on IPv6 address “::”, port 5432
2024-01-01 10:15:00.003 CST [12345] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2024-01-01 10:15:00.004 CST [12345] LOG: entering standby mode
2024-01-01 10:15:00.005 CST [12345] LOG: redo starts at 0/2345678
2024-01-01 10:15:00.006 CST [12345] LOG: consistent recovery state reached at 0/2345678
2024-01-01 10:15:00.007 CST [12346] LOG: database system is ready to accept read only connections
done
server started

4.3 高可用监控实战

案例:使用Prometheus监控复制状态

# 场景:配置Prometheus监控openGauss复制状态
# 操作:安装并配置Prometheus和PostgreSQL exporter

# 1. 安装PostgreSQL exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.13.0/postgres_exporter-0.13.0.linux-amd64.tar.gz
tar -xzf postgres_exporter-0.13.0.linux-amd64.tar.gz
cd postgres_exporter-0.13.0.linux-amd64

# 2. 创建监控用户
psql -h 192.168.1.10 -p 5432 -U fgedu -d fgedudb << EOF CREATE USER pg_exporter WITH PASSWORD 'Exporter@123';
GRANT CONNECT ON DATABASE fgedudb TO pg_exporter;
GRANT SELECT ON pg_stat_database TO pg_exporter;
GRANT SELECT ON pg_stat_replication TO pg_exporter;
EOF

# 3. 配置PostgreSQL exporter
cat > .env << EOF DATA_SOURCE_NAME=postgresql://pg_exporter:Exporter@123@localhost:5432/fgedudb?sslmode=disable EOF # 4. 启动PostgreSQL exporter ./postgres_exporter --web.listen-address=:9187 & # 5. 配置Prometheus cat > /etc/prometheus/prometheus.yml << EOF global: scrape_interval: 15s scrape_configs: - job_name: 'postgres' static_configs: - targets: ['192.168.1.10:9187', '192.168.1.11:9187'] EOF # 6. 启动Prometheus systemctl start prometheus # 7. 配置Grafana仪表板 # 导入PostgreSQL仪表板模板

# 安装PostgreSQL exporter输出
–2024-01-01 10:20:00– https://github.com/prometheus-community/postgres_exporter/releases/download/v0.13.0/postgres_exporter-0.13.0.linux-amd64.tar.gz
Resolving github.com (github.com)… 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 7654321 (7.3M) [application/octet-stream]
Saving to: ‘postgres_exporter-0.13.0.linux-amd64.tar.gz’

postgres_exporter-0.13.0.linux-amd64.tar.gz 100%[=========================================================================>] 7.30M 2.50MB/s in 2.9s

2024-01-01 10:20:03 (2.50 MB/s) – ‘postgres_exporter-0.13.0.linux-amd64.tar.gz’ saved [7654321/7654321]

# 创建监控用户输出
CREATE ROLE
GRANT
GRANT
GRANT

# 启动PostgreSQL exporter输出
INFO[0000] Established connection to PostgreSQL
INFO[0000] Starting Server: :9187

# 启动Prometheus输出
Job for prometheus.service started.

Part05-风哥经验总结与分享

5.1 高可用性最佳实践总结

  • 架构设计:根据业务需求选择合适的高可用架构,如主从复制、多主架构等
  • 网络配置:确保网络连接可靠,使用冗余网络
  • 存储配置:使用RAID技术提高存储可靠性,考虑使用SAN或NAS
  • 数据同步:根据业务需求选择同步或异步复制
  • 故障切换:配置自动故障切换,减少人工干预
  • 监控告警:建立完善的监控体系,及时发现和处理故障
  • 备份策略:结合高可用与备份,确保数据安全
  • 定期演练:定期进行故障切换演练,确保系统在实际故障时能够正常切换
  • 文档完善:制定详细的高可用方案文档和故障处理流程
  • 人员培训:对运维人员进行高可用相关培训,提高故障处理能力

5.2 常见高可用问题与解决方案

问题1:复制延迟

解决方案:

  • 检查网络连接,确保网络带宽足够
  • 调整wal_sender_delay和wal_receiver_status_interval参数
  • 使用更快的存储设备,如SSD
  • 减少主节点的写入负载

问题2:故障切换失败

解决方案:

  • 检查故障切换配置,确保配置正确
  • 验证网络连接,确保从节点能够访问主节点
  • 检查从节点的状态,确保从节点正常运行
  • 定期进行故障切换演练,发现和解决问题

问题3:脑裂

解决方案:

  • 使用仲裁机制,如Quorum
  • 配置网络心跳检测,及时发现网络故障
  • 使用STONITH(Shoot The Other Node In The Head)技术
  • 制定明确的故障处理流程,避免人工操作导致脑裂

问题4:数据一致性

解决方案:

  • 使用同步复制,确保数据实时同步
  • 定期验证主从节点数据一致性
  • 使用pg_rewind工具修复数据不一致问题
  • 制定数据一致性检查和修复流程

风哥提示:高可用性设计需要考虑成本、复杂性和性能之间的平衡

本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html

联系我们

在线咨询:点击这里给我发消息

微信号:itpux-com

工作日:9:30-18:30,节假日休息