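Outline
Part01-Basic Concepts and Theory
1.1 Backup Strategy Overview
1.2 Backup Types
1.3 Recovery Strategy Overview
Part02-Production Planning and Recommendations
2.1 Backup Schedule Planning
2.2 Backup Storage Planning
2.3 Recovery Drill Planning
Part03-Production Implementation
3.1 NameNode Metadata Backup
3.2 HDFS Data Backup
3.3 Snapshot Backup
3.4 Disaster Recovery
Part04-Production Cases and Walkthroughs
4.1 Metadata Recovery Case
4.2 Accidental Deletion Recovery Case
4.3 Disaster Recovery Drill Case
Part05-Fengge's Lessons Learned
5.1 Backup and Recovery Best Practices
5.2 Disaster Recovery Lessons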
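Part01-Basic Concepts and Theory
1.1 Backup Strategy Overview
Backing up a Hadoop cluster is an essential safeguard for its data. The backup strategy should be driven by how important each dataset is and by how quickly it must be recoverable.
1.2 Backup Types
Hadoop backups fall into several types:
– Metadata backup: NameNode metadata (fsimage, edits)
– Data backup: HDFS file data
– Snapshot backup: HDFS snapshots
– Configuration backup: Hadoop configuration files
1.3 Recovery Strategy Overview
The recovery strategy should be chosen according to the type of failure. A recovery typically follows these steps:
– Determine the scope of the recovery
– Choose the recovery method
– Verify the recovery result
– Document the recovery process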
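Part02-Production Planning and Recommendations
2.1 Backup Schedule Planning
The backup schedule should follow business requirements. A typical baseline:
– Metadata backup: hourly
– Incremental data backup: daily
– Full data backup: weekly
– Configuration backup: after every change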
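The time-based parts of this schedule could be wired into cron. The following is a minimal sketch, not a definitive setup: only namenode_backup.sh appears in this article, while the script directory, the two other script names, and the run times are hypothetical placeholders.

```shell
# Print crontab entries that match the schedule above.
# ASSUMPTION: the script directory, the script names other than
# namenode_backup.sh, and the chosen run times are all hypothetical.
print_backup_crontab() {
    bin_dir="${1:-/opt/backup/bin}"
    cat <<EOF
# Metadata backup: every hour, on the hour
0 * * * * ${bin_dir}/namenode_backup.sh
# Incremental data backup: every day at 01:00
0 1 * * * ${bin_dir}/hdfs_incremental_backup.sh
# Full data backup: every Sunday at 02:00
0 2 * * 0 ${bin_dir}/hdfs_full_backup.sh
EOF
}
```

Calling `print_backup_crontab /path/to/scripts` prints the entries for review before they are added with `crontab -e`. The configuration backup is triggered by changes rather than by time, so it is left out of the cron sketch.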
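2.2 Backup Storage Planning
Backup storage must be planned for both capacity and security.
# Check the backup storage capacity
df -h /backup
# Check the size of the backup data
du -sh /backup/hadoop/*
# Backup storage capacity
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 50T 20T 30T 40% /backup
# Backup data size
5.0G /backup/hadoop/namenode
100G /backup/hadoop/hdfs
10G /backup/hadoop/config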
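The capacity check above can be turned into an automated guard. The sketch below parses `df -P` output and warns when usage crosses a threshold. The function names and the 80% default are illustrative assumptions, not values from the article.

```shell
# Print the used-space percentage (integer, no % sign) of the filesystem
# that holds the given path, using POSIX-format df output.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is below the threshold, 1 if it is at or above it,
# 2 if the usage could not be determined.
# ASSUMPTION: the 80% default threshold is illustrative.
check_backup_space() {
    path="$1"
    threshold="${2:-80}"
    pct=$(usage_pct "$path")
    [ -n "$pct" ] || { echo "Cannot read usage for ${path}"; return 2; }
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: ${path} is ${pct}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: ${path} is ${pct}% full"
}
```

For example, `check_backup_space /backup 80` could run before each backup job so that a nearly full volume is noticed before a copy fails halfway.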
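2.3 Recovery Drill Planning
A recovery drill is the main way to prove that backups actually work. Fengge's tip: run a recovery drill once a quarter.
– Draw up the drill plan
– Prepare the drill environment
– Carry out the recovery
– Verify the recovery result
– Summarize the lessons from the drill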
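Part03-Production Implementation
3.1 NameNode Metadata Backup
3.1.1 Manual Metadata Backup
# List the current metadata files
ls -la /bigdata/fgdata/namenode/current/
# Checkpoint the namespace (saveNamespace only works in safe mode, which blocks writes)
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
# Copy the metadata to a dated backup directory
mkdir -p /backup/hadoop/namenode/$(date +%Y%m%d)
cp -r /bigdata/fgdata/namenode/current/* /backup/hadoop/namenode/$(date +%Y%m%d)/
# Verify the backup
ls -la /backup/hadoop/namenode/$(date +%Y%m%d)/
# Current metadata files
total 102400
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 edits_0000001-0000002
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 fsimage_0000002
-rw-r--r-- 1 hdfs hdfs 32 Jan 17 20:00 seen_txid
# Save the namespace
Save namespace successful
# Backup verification
total 102400
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 edits_0000001-0000002
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 fsimage_0000002
-rw-r--r-- 1 hdfs hdfs 32 Jan 17 20:00 seen_txid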
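Where access to the NameNode's local metadata directory is not available, one alternative is `hdfs dfsadmin -fetchImage`, which downloads the most recent fsimage from the NameNode into a local directory. The sketch below wraps it; the function name is hypothetical and the default backup root is assumed to be the one used in this article. It fetches the fsimage only, not the edit logs, so it complements the file copy above rather than replacing it.

```shell
# Download the most recent fsimage from the NameNode into a dated directory
# using `hdfs dfsadmin -fetchImage <local directory>`.
# ASSUMPTION: the default backup root is the path used in this article.
fetch_fsimage() {
    dest="${1:-/backup/hadoop/namenode}/$(date +%Y%m%d)"
    mkdir -p "$dest" || return 1
    hdfs dfsadmin -fetchImage "$dest"
}
```

A call such as `fetch_fsimage /backup/hadoop/namenode` would place the image under today's dated subdirectory.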
3.1.2 Automated Backup Script
#!/bin/bash
# namenode_backup.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
# NameNode metadata backup script
BACKUP_DIR="/backup/hadoop/namenode"
RETENTION_DAYS=30
TODAY=$(date +%Y%m%d)
# Create the backup directory
mkdir -p "${BACKUP_DIR}/${TODAY}"
# Save the namespace (saveNamespace only works in safe mode)
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
# Copy the metadata
cp -r /bigdata/fgdata/namenode/current/* "${BACKUP_DIR}/${TODAY}/"
# Remove backups older than the retention period (never the backup root itself)
find "${BACKUP_DIR}" -mindepth 1 -maxdepth 1 -type d -mtime +${RETENTION_DAYS} -exec rm -rf {} +
echo "NameNode backup completed at $(date)"
# Run the backup script
./namenode_backup.sh
Save namespace successful
NameNode backup completed at Wed Jan 17 20:30:00 CST 2024
# Backup verification
ls -la /backup/hadoop/namenode/20240117/
total 102400
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:30 edits_0000001-0000002
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:30 fsimage_0000002
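The cleanup step in the script ages backups by modification time. Because the backup directories are named by date (YYYYMMDD), retention can also be decided from the name alone, which stays correct even if a directory is touched or copied later. A minimal sketch of that variant follows; the function name is hypothetical.

```shell
# Delete dated backup directories (named YYYYMMDD) that are older than a cutoff.
# Names are compared instead of modification times, so a backup that was
# touched or copied later is still aged by the date in its name.
prune_backups() {
    root="$1"
    cutoff="$2"    # YYYYMMDD; directories with a smaller name are removed
    for dir in "$root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        if [ "$name" -lt "$cutoff" ]; then
            echo "Removing ${dir}"
            rm -rf "$dir"
        fi
    done
}
```

With GNU date, a 30-day retention could be expressed as `prune_backups /backup/hadoop/namenode "$(date -d '30 days ago' +%Y%m%d)"`. Directories whose names are not eight digits are never touched.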
3.2 HDFS Data Backup
3.2.1 DistCp Data Backup
# Copy the data to the backup cluster with DistCp
# (-skipcrccheck disables the source/target CRC comparison; omit it when checksum validation is wanted)
hadoop distcp -update -skipcrccheck /bigdata/warehouse/fgedu hdfs://backup-cluster/bigdata/warehouse/fgedu
# Check the backup progress
yarn application -list -appStates RUNNING | grep distcp
# Verify the backup data
hdfs dfs -ls hdfs://backup-cluster/bigdata/warehouse/fgedu/
# DistCp job log
24/01/17 21:00:00 INFO tools.DistCp: DistCp job: job_1705473600000_0100
24/01/17 21:00:05 INFO tools.DistCp: Number of paths: 1000
24/01/17 21:30:00 INFO tools.DistCp: DistCp completed successfully
# Backup progress
Total Applications:1
Application-Id Application-Name State
application_1705473600000_0100 distcp RUNNING
# Backup verification
Found 100 items
drwxr-xr-x - fgedu fgedu 0 2024-01-17 21:30 hdfs://backup-cluster/bigdata/warehouse/fgedu/ods
drwxr-xr-x - fgedu fgedu 0 2024-01-17 21:30 hdfs://backup-cluster/bigdata/warehouse/fgedu/dwd
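Listing the target directory shows that something arrived, but not that everything did. A cheap additional check is to compare file count and total bytes on both sides with `hdfs dfs -count`, whose output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. The sketch below is a sanity check under that assumption, with hypothetical function names; matching counts do not prove that the contents are identical.

```shell
# Print "<file_count> <content_size>" for an HDFS path.
# `hdfs dfs -count` prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
hdfs_count() {
    hdfs dfs -count "$1" | awk '{ print $2, $3 }'
}

# Return 0 when source and target agree on file count and total bytes.
compare_backup() {
    src=$(hdfs_count "$1")
    dst=$(hdfs_count "$2")
    if [ -n "$src" ] && [ "$src" = "$dst" ]; then
        echo "MATCH: ${src} (files bytes)"
    else
        echo "MISMATCH: source='${src}' target='${dst}'"
        return 1
    fi
}
```

For the backup above this would be called as `compare_backup /bigdata/warehouse/fgedu hdfs://backup-cluster/bigdata/warehouse/fgedu`.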
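3.2.2 Export-Based Backup
# Export the data to the local backup directory
hdfs dfs -get /bigdata/warehouse/fgedu/important_data.parquet /backup/hadoop/hdfs/
# Compress the backup (-C keeps the absolute path out of the archive)
tar -czf /backup/hadoop/hdfs/fgedu_data_$(date +%Y%m%d).tar.gz -C /backup/hadoop/hdfs important_data.parquet
# Verify the backup
ls -la /backup/hadoop/hdfs/fgedu_data_$(date +%Y%m%d).tar.gz
# Export completed
# Compress the backup
# Compression completed
# Backup verification
-rw-r--r-- 1 root root 1073741824 Jan 17 21:30 /backup/hadoop/hdfs/fgedu_data_20240117.tar.gz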
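A file listing confirms that the archive exists, not that it is still intact weeks later. One way to make that verifiable is to store a checksum next to each archive and re-check it before a restore. This sketch assumes GNU coreutils `sha256sum` is installed; the function names are hypothetical.

```shell
# Write a SHA-256 checksum file next to an archive.
# The checksum file stores the bare file name, so the pair can be moved together.
write_checksum() {
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Re-verify the archive against its checksum file; non-zero exit on mismatch.
verify_checksum() {
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" )
}
```

After the tar step, `write_checksum /backup/hadoop/hdfs/fgedu_data_20240117.tar.gz` records the checksum, and `verify_checksum` with the same path can run during the periodic backup checks.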
3.3 Snapshot Backup
3.3.1 Creating Snapshots
# Allow snapshots on the directory
hdfs dfsadmin -allowSnapshot /bigdata/warehouse/fgedu
# Create a snapshot
hdfs dfs -createSnapshot /bigdata/warehouse/fgedu daily_backup_$(date +%Y%m%d)
# List the snapshots
hdfs dfs -ls /bigdata/warehouse/fgedu/.snapshot/
# Allow snapshots
Allowing snapshot on /bigdata/warehouse/fgedu succeeded
# Create a snapshot
Created snapshot /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117
# Snapshot list
Found 7 items
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240111
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240112
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240113
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240114
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240115
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240116
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117
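Daily snapshots also offer one possible way to implement the daily incremental backup from section 2.1: DistCp can copy only the difference between two source snapshots with `-update -diff`. The sketch below is an assumption about how this could be set up, not the article's own procedure, and the function name is hypothetical. DistCp's documented preconditions apply: the target must be snapshottable, must already hold a snapshot with the same name as the older source snapshot, and must not have changed since then.

```shell
# Incremental DistCp driven by the difference between two source snapshots.
# Preconditions: the target directory is snapshottable, already holds a
# snapshot named <from_snap>, and has not been modified since it was taken.
incremental_distcp() {
    from_snap="$1"; to_snap="$2"; src="$3"; dst="$4"
    hadoop distcp -update -diff "$from_snap" "$to_snap" "$src" "$dst" || return 1
    # Snapshot the target under the same name so the next run can diff from it.
    hdfs dfs -createSnapshot "$dst" "$to_snap"
}
```

A nightly run might look like `incremental_distcp daily_backup_20240116 daily_backup_20240117 /bigdata/warehouse/fgedu hdfs://backup-cluster/bigdata/warehouse/fgedu`, after an initial full copy and a matching first snapshot on the backup cluster.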
3.3.2 Restoring from a Snapshot
# Restore a file from a snapshot (create the target directory first)
hdfs dfs -mkdir -p /bigdata/warehouse/fgedu/restored
hdfs dfs -cp /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240116/data.parquet /bigdata/warehouse/fgedu/restored/
# Check the restore result
hdfs dfs -ls /bigdata/warehouse/fgedu/restored/
# Delete a snapshot
hdfs dfs -deleteSnapshot /bigdata/warehouse/fgedu daily_backup_20240110
# Restore completed
# Restore result
Found 1 items
-rw-r--r-- 3 fgedu fgedu 1073741824 2024-01-17 22:30 /bigdata/warehouse/fgedu/restored/data.parquet
# Delete a snapshot
Deleted snapshot /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240110
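Deleting old snapshots by hand, as above, is easy to forget. Since the snapshot names carry their date, the expired ones can be selected automatically. The sketch below assumes the `daily_backup_YYYYMMDD` naming used in this article; the function names and the idea of passing a cutoff date are illustrative.

```shell
# Read snapshot names (one per line) on stdin and print those whose
# daily_backup_YYYYMMDD date is older than the cutoff (YYYYMMDD).
expired_snapshots() {
    cutoff="$1"
    while IFS= read -r name; do
        case "$name" in
            daily_backup_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
                day=${name#daily_backup_}
                [ "$day" -lt "$cutoff" ] && echo "$name"
                ;;
        esac
    done
    return 0
}

# Delete every expired snapshot of a snapshottable directory.
prune_snapshots() {
    dir="$1"; cutoff="$2"
    hdfs dfs -ls "$dir/.snapshot" | awk '{ print $NF }' | sed 's|.*/||' |
        expired_snapshots "$cutoff" |
        while IFS= read -r snap; do
            hdfs dfs -deleteSnapshot "$dir" "$snap" </dev/null
        done
}
```

Keeping seven daily snapshots, as in the listing in 3.3.1, would correspond to `prune_snapshots /bigdata/warehouse/fgedu 20240111` on 2024-01-17. Names that do not match the pattern, including the "Found N items" header line, are ignored.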
3.4 Disaster Recovery
3.4.1 NameNode Recovery
# Stop the NameNode
hdfs --daemon stop namenode
# Clear the current metadata
rm -rf /bigdata/fgdata/namenode/current/*
# Restore the metadata
cp -r /backup/hadoop/namenode/20240117/* /bigdata/fgdata/namenode/current/
# Start the NameNode
hdfs --daemon start namenode
# Verify the recovery
hdfs dfsadmin -report
# Stop the NameNode
stopping namenode
# Restore the metadata
# Restore completed
# Start the NameNode
starting namenode, logging to /bigdata/app/hadoop/logs/hadoop-hdfs-namenode-fgedu01.log
# Verify the recovery
Live datanodes: 6
Dead datanodes: 0
Total storage: 120 TB
Used storage: 60 TB
# NameNode recovery succeeded
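Because the procedure above deletes the current metadata before copying the backup in, it is worth confirming first that the backup directory is usable. The sketch below performs a structural check only, based on the files shown in the listings in 3.1 (an fsimage_* file and seen_txid). The function name is hypothetical, and passing this check does not guarantee that the NameNode can load the image.

```shell
# Return 0 if the directory looks like a usable NameNode metadata backup:
# a seen_txid file and at least one non-empty fsimage_* file.
validate_nn_backup() {
    dir="$1"
    [ -d "$dir" ] || { echo "Missing directory: ${dir}"; return 1; }
    [ -f "$dir/seen_txid" ] || { echo "Missing seen_txid in ${dir}"; return 1; }
    for img in "$dir"/fsimage_*; do
        # Skip the .md5 companion files that sit next to each image.
        case "$img" in *.md5) continue ;; esac
        if [ -s "$img" ]; then
            echo "Backup looks usable: ${img}"
            return 0
        fi
    done
    echo "No non-empty fsimage_* file in ${dir}"
    return 1
}
```

Running `validate_nn_backup /backup/hadoop/namenode/20240117 || exit 1` before the `rm -rf` step would stop the restore while the original metadata is still in place.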
3.4.2 Data Recovery
# Restore the data from the backup cluster
hadoop distcp hdfs://backup-cluster/bigdata/warehouse/fgedu /bigdata/warehouse/fgedu_restored
# Verify the restored data
hdfs dfs -ls /bigdata/warehouse/fgedu_restored/
# Check data integrity
hdfs fsck /bigdata/warehouse/fgedu_restored -files -blocks
# DistCp job log
24/01/17 23:00:00 INFO tools.DistCp: DistCp job: job_1705473600000_0200
24/01/17 23:30:00 INFO tools.DistCp: DistCp completed successfully
# Restored data verification
Found 100 items
drwxr-xr-x - fgedu fgedu 0 2024-01-17 23:30 /bigdata/warehouse/fgedu_restored/ods
drwxr-xr-x - fgedu fgedu 0 2024-01-17 23:30 /bigdata/warehouse/fgedu_restored/dwd
# Data integrity check
Total size: 65970697543680 B
Total blocks: 1000000
Under-replicated blocks: 0
# Data recovery succeeded
Part04-Production Cases and Walkthroughs
4.1 Metadata Recovery Case
Corrupted NameNode metadata is a severe failure that must be recovered quickly.
#!/bin/bash
# namenode_recovery.sh
# NameNode metadata recovery script
BACKUP_DIR="/backup/hadoop/namenode"
LATEST_BACKUP=$(ls -t "${BACKUP_DIR}" | head -1)
echo "=== NameNode Recovery Process ==="
echo "Latest backup: ${LATEST_BACKUP}"
# Abort if no backup was found; an empty name would copy every backup at once
if [ -z "${LATEST_BACKUP}" ]; then
    echo "No backup found in ${BACKUP_DIR}" >&2
    exit 1
fi
# 1. Stop the NameNode
echo "Stopping NameNode..."
hdfs --daemon stop namenode
# 2. Set the current (corrupt) metadata aside
echo "Backing up current metadata..."
mv /bigdata/fgdata/namenode/current /bigdata/fgdata/namenode/current.corrupt
# 3. Restore the metadata
echo "Restoring metadata..."
mkdir -p /bigdata/fgdata/namenode/current
cp -r "${BACKUP_DIR}/${LATEST_BACKUP}"/* /bigdata/fgdata/namenode/current/
# 4. Start the NameNode
echo "Starting NameNode..."
hdfs --daemon start namenode
# 5. Verify the recovery
echo "Verifying recovery..."
sleep 30
hdfs dfsadmin -report
echo "=== Recovery Completed ==="
# Run the recovery script
./namenode_recovery.sh
=== NameNode Recovery Process ===
Latest backup: 20240117
Stopping NameNode...
stopping namenode
Backing up current metadata...
Restoring metadata...
Starting NameNode...
starting namenode
Verifying recovery...
Live datanodes: 6
Dead datanodes: 0
=== Recovery Completed ===
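The fixed `sleep 30` in the script is a guess at how long the NameNode needs to start and leave safe mode; on a large namespace it may take much longer. A more robust variant is to poll `hdfs dfsadmin -safemode get` until it reports that safe mode is off. The sketch below does that with a bounded number of attempts; the function name and the default values are illustrative assumptions.

```shell
# Poll the NameNode until it reports that safe mode is off.
# Arguments: max attempts (default 30) and seconds between attempts (default 10);
# both defaults are illustrative.
wait_for_namenode() {
    attempts="${1:-30}"
    interval="${2:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if hdfs dfsadmin -safemode get 2>/dev/null | grep -q "Safe mode is OFF"; then
            echo "NameNode is out of safe mode after ${i} attempt(s)"
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    echo "NameNode still unavailable or in safe mode after ${attempts} attempts"
    return 1
}
```

In the recovery script, `wait_for_namenode 30 10 || exit 1` could stand in for the sleep, so that the final report only runs once the NameNode is actually serving requests.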
4.2 Accidental Deletion Recovery Case
Accidental deletion is a common incident that calls for fast recovery. Deleted files land in the trash only when fs.trash.interval is greater than 0 and the delete did not use -skipTrash; otherwise a snapshot is the fallback.
# 1. Check the trash
hdfs dfs -ls /user/fgedu/.Trash/Current/bigdata/warehouse/fgedu/
# 2. Restore from the trash
hdfs dfs -mv /user/fgedu/.Trash/Current/bigdata/warehouse/fgedu/deleted_data.parquet /bigdata/warehouse/fgedu/
# 3. If the file is not in the trash, restore it from a snapshot
hdfs dfs -ls /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117/
hdfs dfs -cp /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117/deleted_data.parquet /bigdata/warehouse/fgedu/
# 4. Verify the recovery
hdfs dfs -ls /bigdata/warehouse/fgedu/deleted_data.parquet
# Check the trash
Found 1 items
-rw-r--r-- 3 fgedu fgedu 1073741824 2024-01-17 23:30 /user/fgedu/.Trash/Current/bigdata/warehouse/fgedu/deleted_data.parquet
# Restore from the trash
# Restore completed
# Verify the recovery
-rw-r--r-- 3 fgedu fgedu 1073741824 2024-01-17 23:35 /bigdata/warehouse/fgedu/deleted_data.parquet
# Data recovery succeeded
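The decision in steps 1 to 3 (trash first, snapshot second) can be captured in one function so that it is applied the same way under pressure. This is a sketch under stated assumptions: the function name is hypothetical, only the trash's `Current` directory is checked (older trash checkpoints are not searched), and the file is assumed to live inside the snapshottable directory.

```shell
# Restore one file: try the user's trash first, then fall back to a snapshot.
# Arguments: absolute HDFS file path, HDFS user name, snapshottable directory,
# snapshot name.
restore_deleted() {
    file="$1"; user="$2"; snap_dir="$3"; snap="$4"
    trash_copy="/user/${user}/.Trash/Current${file}"
    rel="${file#"$snap_dir"/}"
    snap_copy="${snap_dir}/.snapshot/${snap}/${rel}"
    if hdfs dfs -test -e "$trash_copy"; then
        echo "Restoring from trash: ${trash_copy}"
        hdfs dfs -mv "$trash_copy" "$file"
    elif hdfs dfs -test -e "$snap_copy"; then
        echo "Restoring from snapshot: ${snap_copy}"
        hdfs dfs -cp "$snap_copy" "$file"
    else
        echo "Not found in trash or snapshot ${snap}: ${file}"
        return 1
    fi
}
```

The case above would be `restore_deleted /bigdata/warehouse/fgedu/deleted_data.parquet fgedu /bigdata/warehouse/fgedu daily_backup_20240117`. The trash copy is moved because it is the only copy, while the snapshot copy is duplicated because snapshots are read-only.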
4.3 Disaster Recovery Drill Case
4.3.1 Drill Plan
#!/bin/bash
# disaster_recovery_drill.sh
echo "=== Disaster Recovery Drill ==="
echo "Date: $(date)"
# 1. Simulate a NameNode failure
echo "Simulating NameNode failure..."
# Run this step in the drill environment only
# 2. Run the recovery procedure
echo "Executing recovery procedure..."
# Run the recovery script here
# 3. Verify the recovery results
echo "Verifying recovery results..."
hdfs dfsadmin -report
hdfs fsck / -files -blocks | tail -10
# 4. Performance test
echo "Performance testing..."
hadoop jar /bigdata/app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 /test/teragen
# 5. Record the drill results (sample values; replace them with the measured ones)
echo "Recording drill results..."
echo "Recovery time: 30 minutes" >> /backup/drill_results.log
echo "Data integrity: PASSED" >> /backup/drill_results.log
echo "=== Drill Completed ==="
# Run the drill script
./disaster_recovery_drill.sh
=== Disaster Recovery Drill ===
Date: Wed Jan 17 23:45:00 CST 2024
Simulating NameNode failure...
Executing recovery procedure...
Verifying recovery results...
Live datanodes: 6
Total blocks: 1000000
Under-replicated blocks: 0
Performance testing...
Job job_1705473600000_0300 completed successfully
Recording drill results...
=== Drill Completed ===
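The drill script writes a fixed recovery time and a fixed PASSED verdict to the log, which is fine as a template but records nothing about the actual run. A small helper can time each step and log its real outcome instead. The sketch below is one way to do that; the function name and the log line format are assumptions.

```shell
# Run a command, measure how long it takes, and append the outcome to a log.
# Usage: timed_step <log file> <label> <command> [args...]
timed_step() {
    log_file="$1"; label="$2"; shift 2
    start=$(date +%s)
    if "$@"; then status="PASSED"; else status="FAILED"; fi
    elapsed=$(( $(date +%s) - start ))
    echo "$(date '+%Y-%m-%d %H:%M:%S') ${label}: ${status} in ${elapsed}s" >> "$log_file"
    # Propagate the step result to the caller.
    [ "$status" = "PASSED" ]
}
```

In the drill, step 2 could become `timed_step /backup/drill_results.log "NameNode recovery" ./namenode_recovery.sh`, which records the measured duration and whether the recovery script exited successfully.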
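Part05-Fengge's Lessons Learned
5.1 Backup and Recovery Best Practices
In production, backup and recovery work should cover the following points:
1. Establish a complete backup mechanism
2. Verify backup validity regularly
3. Run recovery drills regularly
4. Set up off-site disaster recovery
5. Document every backup and recovery operation
5.2 Disaster Recovery Lessons
5.2.1 Disaster Recovery Recommendations
– Store backup data off-site
– Verify backup data integrity regularly
– Define a detailed recovery procedure
– Run recovery drills regularly
– Retain enough backup history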
5.2.2 Backup Check Script
#!/bin/bash
# backup_check.sh
echo "=== Backup Status Check ==="
echo "Date: $(date)"
# 1. Check the metadata backups
echo "=== NameNode Backup ==="
ls -la /backup/hadoop/namenode/ | tail -5
# 2. Check the data backups
echo "=== HDFS Data Backup ==="
ls -la /backup/hadoop/hdfs/ | tail -5
# 3. Check the snapshots
echo "=== Snapshot Status ==="
hdfs dfs -ls /bigdata/warehouse/fgedu/.snapshot/ | tail -5
# 4. Check the backup storage
echo "=== Backup Storage ==="
df -h /backup
# 5. Check backup integrity
echo "=== Backup Integrity ==="
LATEST_BACKUP=$(ls -t /backup/hadoop/namenode | head -1)
# compgen -G succeeds if the glob matches at least one file ([ -f ] breaks on several matches)
if [ -n "${LATEST_BACKUP}" ] && compgen -G "/backup/hadoop/namenode/${LATEST_BACKUP}/fsimage_*" > /dev/null; then
    echo "Backup integrity: OK"
else
    echo "Backup integrity: FAILED"
fi
# Run the check script
./backup_check.sh
=== Backup Status Check ===
Date: Thu Jan 18 00:00:00 CST 2024
=== NameNode Backup ===
drwxr-xr-x 2 hdfs hdfs 4096 Jan 15 00:00 20240115
drwxr-xr-x 2 hdfs hdfs 4096 Jan 16 00:00 20240116
drwxr-xr-x 2 hdfs hdfs 4096 Jan 17 00:00 20240117
=== HDFS Data Backup ===
-rw-r--r-- 1 root root 1073741824 Jan 17 00:00 fgedu_data_20240117.tar.gz
=== Snapshot Status ===
drwxr-xr-x - fgedu fgedu 0 2024-01-17 00:00 daily_backup_20240117
=== Backup Storage ===
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 50T 20T 30T 40% /backup
=== Backup Integrity ===
Backup integrity: OK
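The integrity check above confirms that the newest backup contains an fsimage, but not that the newest backup is recent. A silently failing cron job would still pass it for weeks. The sketch below adds a freshness check based on directory modification time. The function name and the 2-hour default (hourly backups plus some slack) are assumptions, and the `find` options used are the common GNU/BSD extensions rather than strict POSIX.

```shell
# Return 0 if at least one backup directory directly under the root was
# modified within the last N hours.
# ASSUMPTION: the 2-hour default is illustrative (hourly backups plus slack).
backup_is_fresh() {
    root="$1"
    max_age_hours="${2:-2}"
    recent=$(find "$root" -mindepth 1 -maxdepth 1 -type d -mmin "-$((max_age_hours * 60))" | head -n 1)
    if [ -n "$recent" ]; then
        echo "Fresh backup found: ${recent}"
    else
        echo "No backup newer than ${max_age_hours}h under ${root}"
        return 1
    fi
}
```

Adding `backup_is_fresh /backup/hadoop/namenode 2` to the check script would flag a stalled metadata backup the first time the check runs after the gap.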
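This article was compiled and published by Fengge Tutorials (风哥教程) and is intended for learning and testing only. When reposting, credit the source: http://www.fgedu.net.cn/10327.html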
