风哥教程参考DB2官方文档HADR、High Availability等内容,详细介绍HADR运维、HADR监控、HADR故障处理等。更多视频教程www.fgedu.net.cn
目录大纲
Part01-HADR运维概述
1.1 HADR架构
HADR(High Availability Disaster Recovery)架构:
- 主库:Primary数据库,处理读写操作
- 备库:Standby数据库,接收日志并应用
- 日志传输:主库向备库传输日志
- 日志应用:备库应用日志保持同步
1.2 运维内容
- HADR状态监控
- HADR性能优化
- HADR故障切换
- HADR维护操作
- HADR演练测试
Part02-HADR监控
2.1 HADR状态监控
db2pd -db FGEDB -hadr
# 查看HADR详细信息
db2pd -db FGEDB -hadr -detail
# 查看HADR配置
db2 get db cfg for FGEDB | grep HADR
# 查看HADR日志传输状态
db2pd -db FGEDB -hadr -logstream
# HADR监控脚本
#!/bin/bash
# hadr_monitor.sh
DBNAME=FGEDB
LOG_FILE=/db2/logs/hadr_monitor_$(date +%Y%m%d).log
echo “=== HADR Monitor ===” > $LOG_FILE
echo “Time: $(date)” >> $LOG_FILE
# 1. 检查HADR状态
echo “1. HADR Status:” >> $LOG_FILE
db2pd -db $DBNAME -hadr >> $LOG_FILE
# 2. 检查HADR角色
echo “2. HADR Role:” >> $LOG_FILE
db2pd -db $DBNAME -hadr | grep “HADR_ROLE” >> $LOG_FILE
# 3. 检查HADR状态
echo “3. HADR State:” >> $LOG_FILE
db2pd -db $DBNAME -hadr | grep “HADR_STATE” >> $LOG_FILE
# 4. 检查日志延迟
echo “4. Log Gap:” >> $LOG_FILE
db2pd -db $DBNAME -hadr | grep “LOG_GAP” >> $LOG_FILE
# 5. 检查同步模式
echo “5. Sync Mode:” >> $LOG_FILE
db2pd -db $DBNAME -hadr | grep “HADR_SYNCMODE” >> $LOG_FILE
echo “=== Monitor Complete ===” >> $LOG_FILE
2.2 HADR性能监控
db2pd -db FGEDB -hadr -perf
# 监控日志传输速率
db2pd -db FGEDB -hadr -logstream
# 监控网络延迟
ping -c 3
# HADR性能监控脚本
#!/bin/bash
# hadr_perf_monitor.sh
DBNAME=FGEDB
STANDBY_HOST=192.168.1.11
echo “=== HADR Performance Monitor ===”
echo “Time: $(date)”
# 1. 日志传输速率
echo “1. Log Transfer Rate:”
db2pd -db $DBNAME -hadr -logstream | grep “LOG_STREAM”
# 2. 网络延迟
echo “2. Network Latency:”
ping -c 3 $STANDBY_HOST
# 3. 日志延迟
echo “3. Log Gap:”
db2pd -db $DBNAME -hadr | grep “LOG_GAP”
# 4. 重放延迟
echo “4. Replay Delay:”
db2pd -db $DBNAME -hadr | grep “REPLAY_DELAY”
# 5. 缓冲池命中率
echo “5. Bufferpool Hit Ratio:”
db2pd -db $DBNAME -bufferpools | grep -A 5 “BP_NAME”
echo “=== Performance Monitor Complete ===”
Part03-HADR维护
3.1 HADR配置调整
UPDATE DATABASE CONFIGURATION FOR FGEDB USING HADR_SYNCMODE ASYNC IMMEDIATE;
# 修改HADR超时时间
UPDATE DATABASE CONFIGURATION FOR FGEDB USING HADR_TIMEOUT 120 IMMEDIATE;
# 修改HADR窗口大小
UPDATE DATABASE CONFIGURATION FOR FGEDB USING HADR_SPOKEN_LEN 64 IMMEDIATE;
# 重启HADR
db2stop
db2start
db2 start hadr on database FGEDB as primary
# HADR配置调整脚本
#!/bin/bash
# hadr_config_adjust.sh
DBNAME=FGEDB
SYNC_MODE=$1
if [ -z “$SYNC_MODE” ]; then
echo “Usage: $0
echo “Sync modes: SYNC, NEARSYNC, ASYNC”
exit 1
fi
echo “Adjusting HADR sync mode to: $SYNC_MODE”
# 1. 检查当前HADR状态
echo “1. Checking current HADR status…”
db2pd -db $DBNAME -hadr
# 2. 修改同步模式
echo “2. Updating sync mode…”
db2 “UPDATE DATABASE CONFIGURATION FOR $DBNAME USING HADR_SYNCMODE $SYNC_MODE IMMEDIATE”
# 3. 验证修改
echo “3. Verifying configuration…”
db2 get db cfg for $DBNAME | grep HADR_SYNCMODE
echo “Configuration adjusted successfully”
3.2 HADR切换操作
db2 takeover hadr on database FGEDB
# 强制切换
db2 takeover hadr on database FGEDB by force
# 切换后恢复原主库
db2 start hadr on database FGEDB as standby
# HADR切换脚本
#!/bin/bash
# hadr_takeover.sh
DBNAME=FGEDB
TAKEOVER_MODE=$1
if [ -z “$TAKEOVER_MODE” ]; then
TAKEOVER_MODE=”normal”
fi
echo “=== HADR Takeover ===”
echo “Time: $(date)”
echo “Mode: $TAKEOVER_MODE”
# 1. 检查HADR状态
echo “1. Checking HADR status…”
db2pd -db $DBNAME -hadr
# 2. 执行切换
echo “2. Performing takeover…”
if [ “$TAKEOVER_MODE” = “force” ]; then
db2 takeover hadr on database $DBNAME by force
else
db2 takeover hadr on database $DBNAME
fi
# 3. 验证切换结果
echo “3. Verifying takeover…”
db2pd -db $DBNAME -hadr
# 4. 测试连接
echo “4. Testing connection…”
db2 connect to $DBNAME
db2 “SELECT COUNT(*) FROM syscat.tables”
db2 connect reset
echo “=== Takeover Complete ===”
Part04-HADR故障处理
4.1 主库故障处理
db2 takeover hadr on database FGEDB by force
# 检查备库状态
db2pd -db FGEDB -hadr
# 重建原主库为备库
db2 start hadr on database FGEDB as standby
# 主库故障处理脚本
#!/bin/bash
# primary_failure_handling.sh
DBNAME=FGEDB
STANDBY_HOST=192.168.1.11
echo “=== Primary Failure Handling ===”
echo “Time: $(date)”
# 1. 检查主库状态
echo “1. Checking primary status…”
PRIMARY_STATUS=$(db2pd -db $DBNAME -hadr | grep “HADR_ROLE” | awk ‘{print $3}’)
if [ “$PRIMARY_STATUS” = “PRIMARY” ]; then
echo “Primary is still active”
exit 0
fi
# 2. 在备库执行强制切换
echo “2. Performing forced takeover on standby…”
ssh db2inst1@$STANDBY_HOST “db2 takeover hadr on database $DBNAME by force”
# 3. 验证切换结果
echo “3. Verifying takeover…”
ssh db2inst1@$STANDBY_HOST “db2pd -db $DBNAME -hadr”
# 4. 更新应用连接
echo “4. Updating application connections…”
echo “Please update application to connect to new primary: $STANDBY_HOST”
echo “=== Primary Failure Handling Complete ===”
4.2 备库故障处理
db2pd -db FGEDB -hadr
# 重启备库HADR
db2 start hadr on database FGEDB as standby
# 重建备库
db2 restore database FGEDB from /db2backup taken at
db2 rollforward database FGEDB_STANDBY to end of logs and complete
db2 start hadr on database FGEDB_STANDBY as standby
# 备库故障处理脚本
#!/bin/bash
# standby_failure_handling.sh
DBNAME=FGEDB
STANDBY_HOST=192.168.1.11
echo “=== Standby Failure Handling ===”
echo “Time: $(date)”
# 1. 检查备库状态
echo “1. Checking standby status…”
STANDBY_STATUS=$(ssh db2inst1@$STANDBY_HOST “db2pd -db $DBNAME -hadr” | grep “HADR_ROLE” | awk ‘{print $3}’)
if [ “$STANDBY_STATUS” = “STANDBY” ]; then
echo “Standby is active”
exit 0
fi
# 2. 尝试重启备库HADR
echo “2. Restarting standby HADR…”
ssh db2inst1@$STANDBY_HOST “db2 start hadr on database $DBNAME as standby”
# 3. 验证重启结果
echo “3. Verifying standby…”
ssh db2inst1@$STANDBY_HOST “db2pd -db $DBNAME -hadr”
echo “=== Standby Failure Handling Complete ===”
Part05-风哥经验总结与分享
5.1 HADR运维要点
- 定期监控HADR状态
- 定期检查日志延迟
- 定期演练HADR切换
- 制定故障处理预案
- 建立运维知识库
- 持续优化HADR配置
5.2 运维建议
| 运维项目 | 频率 | 责任人 |
|---|---|---|
| HADR状态检查 | 每天 | 运维人员 |
| 日志延迟检查 | 每小时 | 监控系统 |
| HADR切换演练 | 每季度 | 运维团队 |
5.3 运维要点
- 建立完善的监控体系
- 制定详细的运维流程
- 定期进行切换演练
- 建立运维知识库
- 持续优化HADR配置
- 培训运维人员
学习交流加群风哥微信: itpux-com
风哥Oracle/MySQL/PostgreSQL/Greenplum/DB2/Redis等数据库培训课程,10年一线实战经验,企业级培训,真正掌握数据库核心技术!
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
