yashandb教程FG177-YashanDB灾备自动化
本文档风哥主要介绍YashanDB灾备自动化的相关知识,包括灾备自动化的概念、类型、优势、规划策略、配置方法、监控管理等内容,风哥教程参考YashanDB官方文档灾备管理相关内容编写,适合DBA人员在学习和生产环境中使用。更多视频教程www.fgedu.net.cn
Part01-基础概念与理论知识
1.1 灾备自动化概念
灾备自动化是指通过自动化工具和脚本,实现数据库灾备的配置、监控、切换和恢复等操作的过程。灾备自动化可以提高灾备系统的可靠性和效率,减少人工干预,确保在灾难发生时能够快速、准确地进行故障转移和恢复。
- 自动化配置:自动完成灾备系统的配置
- 实时监控:实时监控灾备系统的状态
- 自动切换:在主库故障时自动切换到备库
- 自动恢复:在故障解决后自动恢复主备关系
- 日志管理:自动管理灾备相关的日志
1.2 灾备类型
灾备主要包括以下类型:
- 本地灾备:在同一数据中心内建立备库
- 异地灾备:在不同数据中心建立备库
- 多活灾备:多个数据库实例同时提供服务
- 混合灾备:结合多种灾备方式
1.3 YashanDB灾备自动化优势
YashanDB灾备自动化的主要优势包括:
- 提高可靠性:减少人工干预,降低人为错误
- 快速响应:在灾难发生时快速切换和恢复
- 降低成本:减少人工运维成本
- 简化管理:统一管理灾备系统
- 提高可用性:确保系统的持续可用
Part02-生产环境规划与建议
2.1 YashanDB灾备自动化规划
YashanDB灾备自动化规划要点:
– RTO(恢复时间目标):评估可接受的恢复时间
– RPO(恢复点目标):评估可接受的数据丢失量
– 灾备级别:确定灾备的级别(本地、异地、多活)
– 业务需求:分析业务对可用性的要求
# 架构设计
– 灾备架构:选择合适的灾备架构
– 网络设计:规划网络连接和带宽
– 存储设计:规划存储配置和同步
– 安全设计:设计安全的灾备方案
# 资源规划
– 硬件资源:配置足够的服务器资源
– 网络资源:确保网络带宽和延迟
– 存储资源:规划足够的存储容量
– 人力资源:安排专业的运维人员
# 自动化规划
– 自动化工具:选择合适的自动化工具
– 脚本开发:开发自动化脚本
– 监控系统:配置监控和告警
– 测试计划:制定测试和演练计划
2.2 YashanDB灾备自动化策略
YashanDB灾备自动化策略建议:
– 主从复制:使用主从复制实现灾备
– 多活架构:实现多数据中心多活
– 混合灾备:结合本地和异地灾备
– 云灾备:利用云平台实现灾备
# 切换策略
– 自动切换:在主库故障时自动切换
– 手动切换:由运维人员手动触发切换
– 计划切换:定期进行计划内切换演练
– 故障切换:在主库故障时进行切换
# 恢复策略
– 自动恢复:在故障解决后自动恢复主备关系
– 手动恢复:由运维人员手动恢复
– 增量恢复:使用增量备份进行恢复
– 全量恢复:使用全量备份进行恢复
# 监控策略
– 实时监控:实时监控灾备系统状态
– 告警机制:配置合理的告警阈值
– 日志管理:集中管理灾备相关日志
– 性能监控:监控灾备系统性能
2.3 YashanDB灾备自动化考虑
YashanDB灾备自动化考虑:
- 成本控制:平衡灾备成本和业务需求
- 性能影响:评估灾备对主库性能的影响
- 网络带宽:确保足够的网络带宽
- 安全合规:确保灾备符合安全合规要求
- 测试演练:定期进行灾备测试和演练
Part03-生产环境项目实施方案
3.1 YashanDB灾备自动化配置
3.1.1 YashanDB主从复制配置
# 编辑主库参数文件
vi /yashandb/fgdata/fgedudb/pfile.ora
# 添加以下参数
log_archive_dest_2 = ‘service=standby_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=standby_db’
log_archive_config = ‘dg_config=(primary_db,standby_db)’
db_unique_name = ‘primary_db’
fal_server = ‘standby_db’
fal_client = ‘primary_db’
— 步骤2:配置备库
# 编辑备库参数文件
vi /yashandb/fgdata/standbydb/pfile.ora
# 添加以下参数
log_archive_dest_2 = ‘service=primary_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=primary_db’
log_archive_config = ‘dg_config=(primary_db,standby_db)’
db_unique_name = ‘standby_db’
fal_server = ‘primary_db’
fal_client = ‘standby_db’
standby_file_management = ‘AUTO’
— 步骤3:创建密码文件
# 在主库上创建密码文件
orapwd file=/yashandb/app/dbs/orapwprimary_db password=sys123 force=y
# 复制密码文件到备库
scp /yashandb/app/dbs/orapwprimary_db oracle@standby_host:/yashandb/app/dbs/orapwstandby_db
— 步骤4:配置网络
# 在主库和备库上配置tnsnames.ora
vi /yashandb/app/network/admin/tnsnames.ora
PRIMARY_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = primary_host)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = primary_db)
)
)
STANDBY_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = standby_host)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = standby_db)
)
)
— 步骤5:启动备库到挂载状态
# 在备库上执行
startup mount;
— 步骤6:配置主从复制
# 在主库上执行
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;
# 在备库上执行
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
— 步骤7:验证主从复制状态
# 在主库上执行
SELECT status FROM v$archive_dest WHERE dest_id=2;
# 在备库上执行
SELECT status FROM v$managed_standby;
3.1.2 YashanDB灾备自动化脚本
# vi /yashandb/app/scripts/dr_monitor.sh
#!/bin/bash
# dr_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
# 配置信息
PRIMARY_HOST=”primary_host”
STANDBY_HOST=”standby_host”
PRIMARY_SID=”primary_db”
STANDBY_SID=”standby_db”
USERNAME=”sys”
PASSWORD=”sys123″
# 检查主库状态
PRIMARY_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${PRIMARY_HOST}:1521/${PRIMARY_SID} as sysdba << EOF
SELECT status FROM v\$instance;
EOF
)
# 检查备库状态
STANDBY_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${STANDBY_HOST}:1521/${STANDBY_SID} as sysdba << EOF
SELECT status FROM v\$instance;
EOF
)
# 检查复制状态
REPLICATION_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${STANDBY_HOST}:1521/${STANDBY_SID} as sysdba << EOF
SELECT status FROM v\$managed_standby;
EOF
)
# 输出监控结果
echo "Primary Database Status: ${PRIMARY_STATUS}"
echo "Standby Database Status: ${STANDBY_STATUS}"
echo "Replication Status: ${REPLICATION_STATUS}"
# 检查主库是否故障
if [[ "${PRIMARY_STATUS}" != *"OPEN"* ]]; then
echo "Primary database is down, initiating failover..."
# 执行故障转移脚本
/yashandb/app/scripts/dr_failover.sh
fi
-- 步骤2:创建故障转移脚本
# vi /yashandb/app/scripts/dr_failover.sh
#!/bin/bash
# dr_failover.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
# 配置信息
STANDBY_HOST="standby_host"
STANDBY_SID="standby_db"
USERNAME="sys"
PASSWORD="sys123"
# 执行故障转移
ssh oracle@${STANDBY_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${STANDBY_SID} as sysdba << EOF
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE OPEN;
EOF
"
# 输出结果
echo "Failover completed successfully"
-- 步骤3:创建恢复脚本
# vi /yashandb/app/scripts/dr_recover.sh
#!/bin/bash
# dr_recover.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
# 配置信息
PRIMARY_HOST="primary_host"
STANDBY_HOST="standby_host"
PRIMARY_SID="primary_db"
STANDBY_SID="standby_db"
USERNAME="sys"
PASSWORD="sys123"
# 重启原主库到挂载状态
ssh oracle@${PRIMARY_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${PRIMARY_SID} as sysdba << EOF
STARTUP MOUNT;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
EOF
"
# 输出结果
echo "Recovery completed successfully"
-- 步骤4:设置脚本执行权限
chmod +x /yashandb/app/scripts/dr_monitor.sh
chmod +x /yashandb/app/scripts/dr_failover.sh
chmod +x /yashandb/app/scripts/dr_recover.sh
-- 步骤5:创建定时任务
crontab -e
# 每5分钟执行一次监控脚本
*/5 * * * * /yashandb/app/scripts/dr_monitor.sh >> /yashandb/app/scripts/dr_monitor.log 2>&1
3.2 YashanDB灾备自动化监控
3.2.1 YashanDB灾备监控配置
CREATE TABLE fgedu.dr_monitor (
monitor_id NUMBER(10) PRIMARY KEY,
check_time TIMESTAMP DEFAULT SYSTIMESTAMP,
primary_status VARCHAR2(50),
standby_status VARCHAR2(50),
replication_status VARCHAR2(50),
lag_seconds NUMBER,
error_message VARCHAR2(500)
);
— 步骤2:创建灾备监控存储过程
CREATE OR REPLACE PROCEDURE fgedu.monitor_dr AS
v_primary_status VARCHAR2(50);
v_standby_status VARCHAR2(50);
v_replication_status VARCHAR2(50);
v_lag_seconds NUMBER;
v_error_message VARCHAR2(500);
v_monitor_id NUMBER;
BEGIN
— 获取主库状态
SELECT status INTO v_primary_status
FROM v$instance;
— 获取备库状态
SELECT status INTO v_standby_status
FROM v$instance@standby_db;
— 获取复制状态
SELECT status INTO v_replication_status
FROM v$managed_standby@standby_db
WHERE process = ‘MRP0’;
— 获取延迟时间
SELECT NVL(MAX(sequence#) – NVL(MAX(sequence#), 0), 0) INTO v_lag_seconds
FROM (
SELECT sequence# FROM v$archived_log WHERE applied = ‘YES’
UNION ALL
SELECT sequence# FROM v$log_history
);
— 生成监控ID
SELECT fgedu.dr_monitor_seq.NEXTVAL INTO v_monitor_id FROM DUAL;
— 插入监控记录
INSERT INTO fgedu.dr_monitor (
monitor_id, check_time, primary_status, standby_status, replication_status, lag_seconds, error_message
) VALUES (
v_monitor_id, SYSTIMESTAMP, v_primary_status, v_standby_status, v_replication_status, v_lag_seconds, v_error_message
);
COMMIT;
EXCEPTION
WHEN OTHERS THEN
v_error_message := SQLERRM;
INSERT INTO fgedu.dr_monitor (
monitor_id, check_time, primary_status, standby_status, replication_status, lag_seconds, error_message
) VALUES (
fgedu.dr_monitor_seq.NEXTVAL, SYSTIMESTAMP, ‘ERROR’, ‘ERROR’, ‘ERROR’, NULL, v_error_message
);
COMMIT;
END;
/
— 步骤3:创建灾备监控视图
CREATE OR REPLACE VIEW fgedu.v_dr_monitor AS
SELECT
monitor_id,
check_time,
primary_status,
standby_status,
replication_status,
lag_seconds,
error_message
FROM fgedu.dr_monitor
ORDER BY check_time DESC;
— 步骤4:创建定时任务
BEGIN
DBMS_SCHEDULER.CREATE_JOB(
job_name => ‘FGEDU.MONITOR_DR’,
job_type => ‘PLSQL_BLOCK’,
job_action => ‘BEGIN fgedu.monitor_dr; END;’,
start_date => SYSTIMESTAMP,
repeat_interval => ‘FREQ=MINUTELY; INTERVAL=5;’,
enabled => TRUE,
comments => ‘Monitor disaster recovery status every 5 minutes’
);
END;
/
— 步骤5:查看监控结果
SELECT * FROM fgedu.v_dr_monitor;
— 输出结果
MONITOR_ID CHECK_TIME PRIMARY_STATUS STANDBY_STATUS REPLICATION_STATUS LAG_SECONDS ERROR_MESSAGE
———- ——————- ————– ————– —————— ———– ————–
1 2025-01-20 10:00:00 OPEN MOUNTED APPLYING_LOG 0
2 2025-01-20 09:55:00 OPEN MOUNTED APPLYING_LOG 0
3 2025-01-20 09:50:00 OPEN MOUNTED APPLYING_LOG 0
3.3 YashanDB灾备自动化测试
3.3.1 YashanDB灾备测试脚本
# vi /yashandb/app/scripts/dr_test.sh
#!/bin/bash
# dr_test.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
# 配置信息
PRIMARY_HOST=”primary_host”
STANDBY_HOST=”standby_host”
PRIMARY_SID=”primary_db”
STANDBY_SID=”standby_db”
USERNAME=”fgedu”
PASSWORD=”fgedu”
# 步骤1:在主库上创建测试表
ssh oracle@${PRIMARY_HOST} “sqlplus -S ${USERNAME}/${PASSWORD}@${PRIMARY_SID} << EOF
CREATE TABLE fgedu.dr_test (
id NUMBER PRIMARY KEY,
test_data VARCHAR2(100),
create_time TIMESTAMP DEFAULT SYSTIMESTAMP
);
INSERT INTO fgedu.dr_test VALUES (1, 'Test data for DR', SYSTIMESTAMP);
COMMIT;
EOF
"
# 步骤2:等待复制完成
sleep 10
# 步骤3:在备库上验证数据
STANDBY_DATA=$(ssh oracle@${STANDBY_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${STANDBY_SID} << EOF
SELECT test_data FROM fgedu.dr_test WHERE id = 1;
EOF
")
echo "Standby database data: ${STANDBY_DATA}"
# 步骤4:模拟主库故障
ssh oracle@${PRIMARY_HOST} "sqlplus -S sys/sys123@${PRIMARY_SID} as sysdba << EOF
SHUTDOWN ABORT;
EOF
"
# 步骤5:执行故障转移
/yashandb/app/scripts/dr_failover.sh
# 步骤6:验证备库成为主库
NEW_PRIMARY_STATUS=$(ssh oracle@${STANDBY_HOST} "sqlplus -S sys/sys123@${STANDBY_SID} as sysdba << EOF
SELECT status FROM v\$instance;
EOF
")
echo "New primary database status: ${NEW_PRIMARY_STATUS}"
# 步骤7:在新主库上插入数据
ssh oracle@${STANDBY_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${STANDBY_SID} << EOF
INSERT INTO fgedu.dr_test VALUES (2, 'Test data after failover', SYSTIMESTAMP);
COMMIT;
EOF
"
# 步骤8:恢复原主库
ssh oracle@${PRIMARY_HOST} "sqlplus -S sys/sys123@${PRIMARY_SID} as sysdba << EOF
STARTUP MOUNT;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
EOF
"
# 步骤9:验证复制恢复
ssh oracle@${PRIMARY_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${PRIMARY_SID} << EOF
SELECT test_data FROM fgedu.dr_test WHERE id = 2;
EOF
"
# 输出结果
echo "DR test completed successfully"
-- 步骤2:设置脚本执行权限
chmod +x /yashandb/app/scripts/dr_test.sh
-- 步骤3:执行测试
/yashandb/app/scripts/dr_test.sh
-- 输出结果
Standby database data: Test data for DR
New primary database status: OPEN
DR test completed successfully
Part04-生产案例与实战讲解
4.1 YashanDB主从复制灾备自动化实战
案例背景:某企业需要实现YashanDB主从复制灾备自动化,确保在主库故障时能够自动切换到备库。
# 主库服务器:192.168.1.100
# 备库服务器:192.168.1.101
— 步骤2:配置主库
# 编辑主库参数文件
vi /yashandb/fgdata/fgedudb/pfile.ora
# 添加以下参数
log_archive_dest_2 = ‘service=standby_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=standby_db’
log_archive_config = ‘dg_config=(primary_db,standby_db)’
db_unique_name = ‘primary_db’
fal_server = ‘standby_db’
fal_client = ‘primary_db’
— 步骤3:配置备库
# 编辑备库参数文件
vi /yashandb/fgdata/standbydb/pfile.ora
# 添加以下参数
log_archive_dest_2 = ‘service=primary_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=primary_db’
log_archive_config = ‘dg_config=(primary_db,standby_db)’
db_unique_name = ‘standby_db’
fal_server = ‘primary_db’
fal_client = ‘standby_db’
standby_file_management = ‘AUTO’
— 步骤4:创建密码文件
# 在主库上创建密码文件
orapwd file=/yashandb/app/dbs/orapwprimary_db password=sys123 force=y
# 复制密码文件到备库
scp /yashandb/app/dbs/orapwprimary_db oracle@192.168.1.101:/yashandb/app/dbs/orapwstandby_db
— 步骤5:配置网络
# 在主库和备库上配置tnsnames.ora
vi /yashandb/app/network/admin/tnsnames.ora
PRIMARY_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.100)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = primary_db)
)
)
STANDBY_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.101)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = standby_db)
)
)
— 步骤6:启动备库到挂载状态
# 在备库上执行
startup mount;
— 步骤7:配置主从复制
# 在主库上执行
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;
# 在备库上执行
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
— 步骤8:创建灾备自动化脚本
# 创建监控脚本
vi /yashandb/app/scripts/dr_monitor.sh
#!/bin/bash
# dr_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
PRIMARY_HOST=”192.168.1.100″
STANDBY_HOST=”192.168.1.101″
PRIMARY_SID=”primary_db”
STANDBY_SID=”standby_db”
USERNAME=”sys”
PASSWORD=”sys123″
PRIMARY_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${PRIMARY_HOST}:1521/${PRIMARY_SID} as sysdba << EOF SELECT status FROM v\$instance; EOF ) if [[ "${PRIMARY_STATUS}" != *"OPEN"* ]]; then echo "Primary database is down, initiating failover..." ssh oracle@${STANDBY_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${STANDBY_SID} as sysdba << EOF ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH; ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY; ALTER DATABASE OPEN; EOF " echo "Failover completed successfully" fi # 创建故障转移脚本 vi /yashandb/app/scripts/dr_failover.sh #!/bin/bash # dr_failover.sh # from:www.itpux.com.qq113257174.wx:itpux-com # web: `http://www.fgedu.net.cn` STANDBY_HOST="192.168.1.101" STANDBY_SID="standby_db" USERNAME="sys" PASSWORD="sys123" ssh oracle@${STANDBY_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${STANDBY_SID} as sysdba << EOF ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH; ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY; ALTER DATABASE OPEN; EOF " echo "Failover completed successfully" -- 步骤9:设置脚本执行权限 chmod +x /yashandb/app/scripts/dr_monitor.sh chmod +x /yashandb/app/scripts/dr_failover.sh -- 步骤10:创建定时任务 crontab -e # 每5分钟执行一次监控脚本 */5 * * * * /yashandb/app/scripts/dr_monitor.sh >> /yashandb/app/scripts/dr_monitor.log 2>&1
— 步骤11:测试灾备自动化
# 模拟主库故障
ssh oracle@192.168.1.100 “sqlplus -S sys/sys123@primary_db as sysdba << EOF
SHUTDOWN ABORT;
EOF
"
# 查看监控日志
cat /yashandb/app/scripts/dr_monitor.log
-- 输出结果
Primary database is down, initiating failover...
Failover completed successfully
-- 验证备库成为主库
ssh oracle@192.168.1.101 "sqlplus -S sys/sys123@standby_db as sysdba << EOF
SELECT status FROM v\$instance;
EOF
"
-- 输出结果
STATUS
------------
OPEN
4.2 YashanDB多活灾备自动化实战
案例背景:某企业需要实现YashanDB多活灾备自动化,确保多个数据中心的数据库实例同时提供服务。
# 数据中心1:192.168.1.100(主库)
# 数据中心2:192.168.2.100(备库)
— 步骤2:配置主库
# 编辑主库参数文件
vi /yashandb/fgdata/fgedudb/pfile.ora
# 添加以下参数
db_unique_name = ‘dc1_db’
log_archive_config = ‘dg_config=(dc1_db, dc2_db)’
log_archive_dest_2 = ‘service=dc2_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=dc2_db’
fal_server = ‘dc2_db’
fal_client = ‘dc1_db’
— 步骤3:配置备库
# 编辑备库参数文件
vi /yashandb/fgdata/fgedudb/pfile.ora
# 添加以下参数
db_unique_name = ‘dc2_db’
log_archive_config = ‘dg_config=(dc1_db, dc2_db)’
log_archive_dest_2 = ‘service=dc1_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=dc1_db’
fal_server = ‘dc1_db’
fal_client = ‘dc2_db’
standby_file_management = ‘AUTO’
— 步骤4:配置网络
# 在两个数据中心配置tnsnames.ora
vi /yashandb/app/network/admin/tnsnames.ora
DC1_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.100)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = dc1_db)
)
)
DC2_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.100)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = dc2_db)
)
)
— 步骤5:启动备库到挂载状态
# 在备库上执行
startup mount;
— 步骤6:配置主从复制
# 在主库上执行
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;
# 在备库上执行
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
— 步骤7:配置多活
# 在备库上启用只读模式
ALTER DATABASE OPEN READ ONLY;
— 步骤8:创建多活监控脚本
vi /yashandb/app/scripts/multi_active_monitor.sh
#!/bin/bash
# multi_active_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
DC1_HOST=”192.168.1.100″
DC2_HOST=”192.168.2.100″
DC1_SID=”dc1_db”
DC2_SID=”dc2_db”
USERNAME=”sys”
PASSWORD=”sys123″
# 检查数据中心1状态
DC1_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${DC1_HOST}:1521/${DC1_SID} as sysdba << EOF
SELECT status FROM v\$instance;
EOF
)
# 检查数据中心2状态
DC2_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${DC2_HOST}:1521/${DC2_SID} as sysdba << EOF
SELECT status FROM v\$instance;
EOF
)
# 检查复制状态
REPLICATION_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${DC2_HOST}:1521/${DC2_SID} as sysdba << EOF
SELECT status FROM v\$managed_standby;
EOF
)
# 输出监控结果
echo "Data Center 1 Status: ${DC1_STATUS}"
echo "Data Center 2 Status: ${DC2_STATUS}"
echo "Replication Status: ${REPLICATION_STATUS}"
# 检查数据中心1是否故障
if [[ "${DC1_STATUS}" != *"OPEN"* ]]; then
echo "Data Center 1 is down, promoting Data Center 2 to primary..."
ssh oracle@${DC2_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${DC2_SID} as sysdba << EOF
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE OPEN;
EOF
"
echo "Data Center 2 promoted to primary successfully"
fi
-- 步骤9:设置脚本执行权限
chmod +x /yashandb/app/scripts/multi_active_monitor.sh
-- 步骤10:创建定时任务
crontab -e
# 每5分钟执行一次监控脚本
*/5 * * * * /yashandb/app/scripts/multi_active_monitor.sh >> /yashandb/app/scripts/multi_active_monitor.log 2>&1
— 步骤11:测试多活灾备
# 模拟数据中心1故障
ssh oracle@192.168.1.100 “sqlplus -S sys/sys123@dc1_db as sysdba << EOF
SHUTDOWN ABORT;
EOF
"
# 查看监控日志
cat /yashandb/app/scripts/multi_active_monitor.log
-- 输出结果
Data Center 1 is down, promoting Data Center 2 to primary...
Data Center 2 promoted to primary successfully
-- 验证数据中心2成为主库
ssh oracle@192.168.2.100 "sqlplus -S sys/sys123@dc2_db as sysdba << EOF
SELECT status FROM v\$instance;
EOF
"
-- 输出结果
STATUS
------------
OPEN
4.3 YashanDB跨区域灾备自动化实战
案例背景:某企业需要实现YashanDB跨区域灾备自动化,确保在区域故障时能够快速切换到异地备库。
# 北京区域:10.0.1.100(主库)
# 上海区域:10.0.2.100(备库)
— 步骤2:配置主库
# 编辑主库参数文件
vi /yashandb/fgdata/fgedudb/pfile.ora
# 添加以下参数
db_unique_name = ‘beijing_db’
log_archive_config = ‘dg_config=(beijing_db, shanghai_db)’
log_archive_dest_2 = ‘service=shanghai_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=shanghai_db’
fal_server = ‘shanghai_db’
fal_client = ‘beijing_db’
— 步骤3:配置备库
# 编辑备库参数文件
vi /yashandb/fgdata/fgedudb/pfile.ora
# 添加以下参数
db_unique_name = ‘shanghai_db’
log_archive_config = ‘dg_config=(beijing_db, shanghai_db)’
log_archive_dest_2 = ‘service=beijing_db lgwr async valid_for=(all_logfiles,all_roles) db_unique_name=beijing_db’
fal_server = ‘beijing_db’
fal_client = ‘shanghai_db’
standby_file_management = ‘AUTO’
— 步骤4:配置网络
# 在两个区域配置tnsnames.ora
vi /yashandb/app/network/admin/tnsnames.ora
BEIJING_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.100)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = beijing_db)
)
)
SHANGHAI_DB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.2.100)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = shanghai_db)
)
)
— 步骤5:启动备库到挂载状态
# 在备库上执行
startup mount;
— 步骤6:配置主从复制
# 在主库上执行
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;
# 在备库上执行
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
— 步骤7:创建跨区域灾备监控脚本
vi /yashandb/app/scripts/cross_region_dr_monitor.sh
#!/bin/bash
# cross_region_dr_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
BEIJING_HOST=”10.0.1.100″
SHANGHAI_HOST=”10.0.2.100″
BEIJING_SID=”beijing_db”
SHANGHAI_SID=”shanghai_db”
USERNAME=”sys”
PASSWORD=”sys123″
# 检查北京区域状态
BEIJING_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${BEIJING_HOST}:1521/${BEIJING_SID} as sysdba << EOF
SELECT status FROM v\$instance;
EOF
)
# 检查上海区域状态
SHANGHAI_STATUS=$(sqlplus -S ${USERNAME}/${PASSWORD}@${SHANGHAI_HOST}:1521/${SHANGHAI_SID} as sysdba << EOF
SELECT status FROM v\$instance;
EOF
)
# 输出监控结果
echo "Beijing Region Status: ${BEIJING_STATUS}"
echo "Shanghai Region Status: ${SHANGHAI_STATUS}"
# 检查北京区域是否故障
if [[ "${BEIJING_STATUS}" != *"OPEN"* ]]; then
echo "Beijing region is down, promoting Shanghai region to primary..."
ssh oracle@${SHANGHAI_HOST} "sqlplus -S ${USERNAME}/${PASSWORD}@${SHANGHAI_SID} as sysdba << EOF
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE OPEN;
EOF
"
echo "Shanghai region promoted to primary successfully"
fi
-- 步骤8:设置脚本执行权限
chmod +x /yashandb/app/scripts/cross_region_dr_monitor.sh
-- 步骤9:创建定时任务
crontab -e
# 每5分钟执行一次监控脚本
*/5 * * * * /yashandb/app/scripts/cross_region_dr_monitor.sh >> /yashandb/app/scripts/cross_region_dr_monitor.log 2>&1
— 步骤10:测试跨区域灾备
# 模拟北京区域故障
ssh oracle@10.0.1.100 “sqlplus -S sys/sys123@beijing_db as sysdba << EOF
SHUTDOWN ABORT;
EOF
"
# 查看监控日志
cat /yashandb/app/scripts/cross_region_dr_monitor.log
-- 输出结果
Beijing region is down, promoting Shanghai region to primary...
Shanghai region promoted to primary successfully
-- 验证上海区域成为主库
ssh oracle@10.0.2.100 "sqlplus -S sys/sys123@shanghai_db as sysdba << EOF
SELECT status FROM v\$instance;
EOF
"
-- 输出结果
STATUS
------------
OPEN
Part05-风哥经验总结与分享
5.1 YashanDB灾备自动化最佳实践
YashanDB灾备自动化最佳实践:
- 合理规划灾备架构:根据业务需求选择合适的灾备架构
- 自动化配置:使用脚本自动化灾备配置和管理
- 实时监控:建立完善的灾备监控机制
- 定期测试:定期进行灾备测试和演练
- 文档记录:详细记录灾备配置和流程
- 人员培训:对运维人员进行灾备培训
- 持续优化:根据实际情况不断优化灾备方案
- 应急预案:制定详细的灾备应急预案
5.2 YashanDB灾备自动化检查清单
– [ ] 灾备架构是否合理
– [ ] 自动化脚本是否完善
– [ ] 监控机制是否建立
– [ ] 测试演练是否定期进行
– [ ] 文档记录是否完整
– [ ] 人员培训是否到位
– [ ] 应急预案是否制定
– [ ] 性能影响是否评估
– [ ] 安全措施是否实施
– [ ] 成本控制是否合理
# 灾备自动化步骤
1. 需求分析和架构设计
2. 环境准备和配置
3. 自动化脚本开发
4. 监控系统配置
5. 测试和演练
6. 文档编写和培训
7. 上线和维护
8. 定期评估和优化
# 灾备自动化监控要点
– 主库和备库状态
– 复制状态和延迟
– 网络连接状态
– 存储使用情况
– 系统性能指标
– 告警和错误信息
5.3 YashanDB灾备自动化常见问题处理
YashanDB灾备自动化常见问题及处理方法:
– 现象:备库复制延迟增加
– 处理:检查网络带宽、主库负载和备库性能
# 常见问题2:切换失败
– 现象:故障转移时切换失败
– 处理:检查备库状态、复制状态和网络连接
# 常见问题3:监控失效
– 现象:监控系统无法获取灾备状态
– 处理:检查监控配置、网络连接和权限设置
# 常见问题4:脚本执行失败
– 现象:自动化脚本执行失败
– 处理:检查脚本权限、路径和配置
# 常见问题5:密码文件不同步
– 现象:主备库密码文件不同步
– 处理:重新创建和复制密码文件
# 常见问题6:网络连接问题
– 现象:主备库之间网络连接中断
– 处理:检查网络配置、防火墙规则和路由
# 常见问题7:存储空间不足
– 现象:备库存储空间不足
– 处理:增加存储空间或清理空间
# 常见问题8:权限不足
– 现象:脚本执行权限不足
– 处理:调整脚本权限和执行用户
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
