
Big Data Tutorial FG024-Hadoop Cluster Backup and Recovery in Practice

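Summary: This article walks through Hadoop cluster backup and recovery in practice, covering NameNode metadata backup, HDFS data backup, snapshot backup, and disaster recovery. The Fengge tutorial draws on the Backup, Restore, and Snapshot material in the official Hadoop documentation.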

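Contents

Part01-Basic Concepts and Theory
  1.1 Backup Strategy Overview
  1.2 Backup Types
  1.3 Recovery Strategy Overview
Part02-Production Planning and Recommendations
  2.1 Backup Schedule Planning
  2.2 Backup Storage Planning
  2.3 Recovery Drill Planning
Part03-Production Implementation
  3.1 NameNode Metadata Backup
  3.2 HDFS Data Backup
  3.3 Snapshot Backup
  3.4 Disaster Recovery
Part04-Production Cases and Walkthroughs
  4.1 Metadata Recovery Case
  4.2 Accidental Deletion Recovery Case
  4.3 Disaster Recovery Drill Case
Part05-Fengge's Lessons Learned
  5.1 Backup and Recovery Best Practices
  5.2 Disaster Recovery Lessons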

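Part01-Basic Concepts and Theory

1.1 Backup Strategy Overview

Backing up a Hadoop cluster is a key safeguard for its data. The backup strategy should be set according to how important the data is and how quickly it must be restored after a failure.

Fengge's tip: Backups are the last line of defense for data safety; a production environment must have a complete backup mechanism in place.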

1.2 Backup Types

Hadoop backups fall into several types:

- Metadata backup: NameNode metadata (fsimage, edits)
- Data backup: HDFS file data
- Snapshot backup: HDFS snapshots
- Configuration backup: Hadoop configuration files

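1.3 Recovery Strategy Overview

The recovery strategy depends on the type of failure. Key points:

- Determine the scope of the recovery
- Choose the recovery method
- Verify the recovery result
- Record the recovery process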

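Part02-Production Planning and Recommendations

2.1 Backup Schedule Planning

The backup schedule should follow business requirements. A suggested schedule:

- Metadata backup: hourly
- Incremental data backup: daily
- Full data backup: weekly
- Configuration backup: after every change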
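
The daily-incremental and weekly-full split above can be driven by a single nightly job that decides which kind of backup to run. The following is a minimal sketch only: the function name `backup_type_for_weekday` is hypothetical, and running the full backup on Sunday is an assumption, since the schedule does not fix a day.

```shell
# Hypothetical helper: pick the data-backup type for an ISO weekday
# (1 = Monday ... 7 = Sunday), following a weekly-full / daily-incremental plan.
# Choosing Sunday for the full backup is an assumption made for illustration.
backup_type_for_weekday() {
    case "$1" in
        7) echo "full" ;;
        1|2|3|4|5|6) echo "incremental" ;;
        *) echo "unknown weekday: $1" >&2; return 1 ;;
    esac
}

# Example: decide what tonight's job should do.
today_type=$(backup_type_for_weekday "$(date +%u)")
echo "Tonight's data backup: ${today_type}"
```

Keeping the decision in one function means the cron entry itself stays the same every night, and the day chosen for the full backup can be changed in one place.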

2.2 Backup Storage Planning

Backup storage planning has to account for both capacity and security.

# Check free space on the backup volume
df -h /backup
# Check the size of each backup set
du -sh /backup/hadoop/*

# Backup volume space
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 50T 20T 30T 40% /backup

# Backup set sizes
5.0G /backup/hadoop/namenode
100G /backup/hadoop/hdfs
10G /backup/hadoop/config
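
The `df` check above can be turned into a guard that a backup job runs before it starts writing. This is a sketch under stated assumptions: the helper names `usage_percent` and `backup_space_ok` are hypothetical, and the 80 percent default threshold is only an illustrative value.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the usage
# percentage (without the percent sign) of the first filesystem listed.
usage_percent() {
    awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Hypothetical helper: succeed only while usage of the given mount point is
# below the threshold. The default of 80 is an assumption, not a recommendation
# taken from the tutorial.
backup_space_ok() {
    mount_point=$1
    threshold=${2:-80}
    used=$(df -P "$mount_point" | usage_percent)
    [ "$used" -lt "$threshold" ]
}
```

A job could then begin with something like `backup_space_ok /backup 80 || exit 1`, so that a nearly full volume stops the run before a partial backup is written.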

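2.3 Recovery Drill Planning

Recovery drills are an important way to confirm that backups can actually be restored. Fengge's tip: run a recovery drill once a quarter.

Drill checklist:
- Draw up a drill plan
- Prepare a drill environment
- Run the recovery
- Verify the recovery result
- Review the lessons from the drill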

Part03-Production Implementation

3.1 NameNode Metadata Backup

3.1.1 Manual Metadata Backup

# List the current metadata
ls -la /bigdata/fgdata/namenode/current/
# Checkpoint the namespace (saveNamespace only runs while the NameNode is in safe mode)
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
# Copy the metadata to a dated backup directory
mkdir -p /backup/hadoop/namenode/$(date +%Y%m%d)
cp -r /bigdata/fgdata/namenode/current/* /backup/hadoop/namenode/$(date +%Y%m%d)/
# Verify the backup
ls -la /backup/hadoop/namenode/$(date +%Y%m%d)/

# Current metadata
total 102400
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 edits_0000001-0000002
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 fsimage_0000002
-rw-r--r-- 1 hdfs hdfs 32 Jan 17 20:00 seen_txid

# Save namespace
Save namespace successful

# Backup verification
total 102400
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 edits_0000001-0000002
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:00 fsimage_0000002
-rw-r--r-- 1 hdfs hdfs 32 Jan 17 20:00 seen_txid
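
Listing the backup directory only shows that files exist, not that they were copied intact. One way to make the verification step stronger is to store a checksum manifest next to the backup and re-check it later. The sketch below assumes GNU `sha256sum` is available; the helper names and the manifest file name `SHA256SUMS` are hypothetical.

```shell
# Hypothetical helper: write a SHA-256 manifest covering every file in a
# backup directory. The manifest itself is excluded from the listing.
write_manifest() {
    ( cd "$1" && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# Hypothetical helper: re-check the files against the manifest.
# Returns non-zero if any file is missing or its content has changed.
verify_manifest() {
    ( cd "$1" && sha256sum -c SHA256SUMS > /dev/null )
}
```

Under these assumptions, `write_manifest` would run right after the `cp -r` step and `verify_manifest` before any restore, for example `verify_manifest /backup/hadoop/namenode/20240117`.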

3.1.2 Automated Backup Script

#!/bin/bash
# namenode_backup.sh
# NameNode metadata backup script

BACKUP_DIR="/backup/hadoop/namenode"
RETENTION_DAYS=30
# Compute the date once so a run that crosses midnight still uses one directory
TODAY=$(date +%Y%m%d)

# Create the backup directory
mkdir -p "${BACKUP_DIR}/${TODAY}"

# Checkpoint the namespace (saveNamespace only runs while the NameNode is in safe mode)
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

# Copy the metadata
cp -r /bigdata/fgdata/namenode/current/* "${BACKUP_DIR}/${TODAY}/"

# Remove old backups (dated subdirectories only, never BACKUP_DIR itself)
find "${BACKUP_DIR}" -mindepth 1 -maxdepth 1 -type d -mtime +${RETENTION_DAYS} -exec rm -rf {} \;

echo "NameNode backup completed at $(date)"

# Running the backup (sample output, abridged)
./namenode_backup.sh
Save namespace successful
NameNode backup completed at Wed Jan 17 20:30:00 CST 2024

# Backup verification
ls -la /backup/hadoop/namenode/20240117/
total 102400
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:30 edits_0000001-0000002
-rw-r--r-- 1 hdfs hdfs 1048576 Jan 17 20:30 fsimage_0000002
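
The cleanup step in the script relies on directory modification times, which change whenever a file is added or removed inside a backup directory. Because the backup directories are named by date, an alternative is to decide expiry from the name alone. This is a sketch of that idea, not the script's own method; `list_expired_backups` is a hypothetical helper that only lists candidates and deletes nothing.

```shell
# Hypothetical helper: print the names of dated backup directories (YYYYMMDD)
# that are strictly older than a cutoff date. Entries whose names are not
# eight digits are ignored.
list_expired_backups() {
    backup_dir=$1
    cutoff=$2   # YYYYMMDD
    for path in "$backup_dir"/*/; do
        [ -d "$path" ] || continue
        name=$(basename "$path")
        case "$name" in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
                [ "$name" -lt "$cutoff" ] && echo "$name"
                ;;
        esac
    done
    return 0
}
```

With GNU `date`, a 30-day cutoff could be computed as `date -d "30 days ago" +%Y%m%d`; that option is GNU-specific, so it is shown here as an assumption about the host rather than as part of the helper.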

3.2 HDFS Data Backup

3.2.1 DistCp Data Backup

# Back up data with DistCp
# Note: -skipcrccheck disables the CRC comparison between source and target files
hadoop distcp -update -skipcrccheck /bigdata/warehouse/fgedu hdfs://backup-cluster/bigdata/warehouse/fgedu
# Check backup progress
yarn application -list -appStates RUNNING | grep distcp
# Verify the backed-up data
hdfs dfs -ls hdfs://backup-cluster/bigdata/warehouse/fgedu/

# DistCp run
24/01/17 21:00:00 INFO tools.DistCp: DistCp job: job_1705473600000_0100
24/01/17 21:00:05 INFO tools.DistCp: Number of paths: 1000
24/01/17 21:30:00 INFO tools.DistCp: DistCp completed successfully

# Backup progress
Total Applications:1
Application-Id Application-Name State
app_1705473600100 distcp RUNNING

# Backup verification
Found 100 items
drwxr-xr-x - fgedu fgedu 0 2024-01-17 21:30 hdfs://backup-cluster/bigdata/warehouse/fgedu/ods
drwxr-xr-x - fgedu fgedu 0 2024-01-17 21:30 hdfs://backup-cluster/bigdata/warehouse/fgedu/dwd
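
For the daily incremental copy, DistCp can also work from the difference between two snapshots instead of scanning both trees. The sketch below only builds and prints such a command, so it can be reviewed before it is run. The helper name `distcp_diff_command` is hypothetical, and it assumes that snapshots are enabled on the source, that the older snapshot also exists on the target, and that the target has not changed since that snapshot was taken.

```shell
# Hypothetical helper: print a DistCp command for a snapshot-diff based copy.
# -diff is only valid together with -update.
distcp_diff_command() {
    src=$1; dst=$2; old_snap=$3; new_snap=$4
    echo "hadoop distcp -update -diff ${old_snap} ${new_snap} ${src} ${dst}"
}

# Example: copy only what changed between two daily snapshots.
distcp_diff_command /bigdata/warehouse/fgedu \
    hdfs://backup-cluster/bigdata/warehouse/fgedu \
    daily_backup_20240116 daily_backup_20240117
```

Printing the command first is a deliberate choice for a sketch: if the target was modified after the older snapshot, the real job would fail, so it is worth checking the snapshot names by hand.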

3.2.2 Export-Based Backup

# Export data to local disk
hdfs dfs -get /bigdata/warehouse/fgedu/important_data.parquet /backup/hadoop/hdfs/
# Compress the export (-C keeps absolute paths out of the archive)
tar -czf /backup/hadoop/hdfs/fgedu_data_$(date +%Y%m%d).tar.gz -C /backup/hadoop/hdfs important_data.parquet
# Verify the backup
ls -la /backup/hadoop/hdfs/fgedu_data_$(date +%Y%m%d).tar.gz

# Data export
# Export completed

# Compression
# Compression completed

# Backup verification
-rw-r--r-- 1 root root 1073741824 Jan 17 21:30 /backup/hadoop/hdfs/fgedu_data_20240117.tar.gz

3.3 Snapshot Backup

3.3.1 Creating a Snapshot

# Allow snapshots on the directory
hdfs dfsadmin -allowSnapshot /bigdata/warehouse/fgedu
# Create a snapshot
hdfs dfs -createSnapshot /bigdata/warehouse/fgedu daily_backup_$(date +%Y%m%d)
# List snapshots
hdfs dfs -ls /bigdata/warehouse/fgedu/.snapshot/

# Allow snapshots
Allowing snapshot on /bigdata/warehouse/fgedu succeeded

# Create snapshot
Created snapshot /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117

# Snapshot list
Found 7 items
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240111
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240112
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240113
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240114
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240115
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240116
drwxr-xr-x - fgedu fgedu 0 2024-01-17 22:00 /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117
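
Daily snapshots accumulate, so a rotation rule is needed to keep the list at a fixed length such as the seven shown above. The following sketch selects which snapshots fall outside the newest N. It relies on the `daily_backup_YYYYMMDD` naming, which sorts chronologically; the helper name `snapshots_to_delete` is hypothetical.

```shell
# Hypothetical helper: read snapshot names on stdin (one per line) and print
# the ones that are NOT among the newest N, newest of those first.
snapshots_to_delete() {
    keep=$1
    sort -r | tail -n +"$((keep + 1))"
}
```

Each name it prints could then be passed to `hdfs dfs -deleteSnapshot /bigdata/warehouse/fgedu <name>`. Keeping selection separate from deletion makes it easy to review the list before anything is removed.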

3.3.2 Restoring from a Snapshot

# Restore a file from a snapshot (create the target directory first)
hdfs dfs -mkdir -p /bigdata/warehouse/fgedu/restored
hdfs dfs -cp /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240116/data.parquet /bigdata/warehouse/fgedu/restored/
# Check the restored file
hdfs dfs -ls /bigdata/warehouse/fgedu/restored/
# Delete a snapshot that is no longer needed
hdfs dfs -deleteSnapshot /bigdata/warehouse/fgedu daily_backup_20240110

# Snapshot restore
# Restore completed

# Restore result
Found 1 items
-rw-r--r-- 3 fgedu fgedu 1073741824 2024-01-17 22:30 /bigdata/warehouse/fgedu/restored/data.parquet

# Delete snapshot
Deleted snapshot /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240110

3.4 Disaster Recovery

3.4.1 NameNode Recovery

# Stop the NameNode
hdfs --daemon stop namenode
# Clear the current metadata (irreversible; moving the directory aside, as the script in 4.1 does, is safer)
rm -rf /bigdata/fgdata/namenode/current/*
# Restore the metadata
cp -r /backup/hadoop/namenode/20240117/* /bigdata/fgdata/namenode/current/
# Start the NameNode
hdfs --daemon start namenode
# Verify the recovery
hdfs dfsadmin -report

# Stop NameNode
stopping namenode

# Restore metadata
# Restore completed

# Start NameNode
starting namenode, logging to /bigdata/app/hadoop/logs/hadoop-hdfs-namenode-fgedu01.log

# Verify recovery
Live datanodes: 6
Dead datanodes: 0
Total storage: 120 TB
Used storage: 60 TB
# NameNode recovered successfully
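
Before the current metadata is cleared, it is worth confirming that the backup being restored is complete. The sketch below checks for the files a NameNode `current/` directory normally contains. That layout (a `VERSION` file, `seen_txid`, and each `fsimage_*` file paired with a `.md5` file) is an assumption based on a standard HDFS installation, and `validate_nn_backup` is a hypothetical helper.

```shell
# Hypothetical helper: check that a NameNode backup directory looks restorable.
# Assumed layout: VERSION, seen_txid, and at least one non-empty fsimage_* file
# that has a matching .md5 file next to it.
validate_nn_backup() {
    dir=$1
    for required in VERSION seen_txid; do
        [ -s "$dir/$required" ] || { echo "missing or empty: $required" >&2; return 1; }
    done
    found=0
    for image in "$dir"/fsimage_*; do
        case "$image" in *.md5) continue ;; esac
        [ -s "$image" ] || continue
        [ -s "$image.md5" ] || { echo "missing checksum for $(basename "$image")" >&2; return 1; }
        found=1
    done
    [ "$found" -eq 1 ] || { echo "no fsimage found in $dir" >&2; return 1; }
}
```

A restore could be gated on it, for example `validate_nn_backup /backup/hadoop/namenode/20240117 || exit 1`, so that an incomplete backup is rejected while the existing metadata is still in place.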

3.4.2 Data Recovery

# Restore data from the backup cluster
hadoop distcp hdfs://backup-cluster/bigdata/warehouse/fgedu /bigdata/warehouse/fgedu_restored
# Check the restored data
hdfs dfs -ls /bigdata/warehouse/fgedu_restored/
# Check block health
hdfs fsck /bigdata/warehouse/fgedu_restored -files -blocks

# Data restore
24/01/17 23:00:00 INFO tools.DistCp: DistCp job: job_1705473600000_0200
24/01/17 23:30:00 INFO tools.DistCp: DistCp completed successfully

# Restored data check
Found 100 items
drwxr-xr-x - fgedu fgedu 0 2024-01-17 23:30 /bigdata/warehouse/fgedu_restored/ods
drwxr-xr-x - fgedu fgedu 0 2024-01-17 23:30 /bigdata/warehouse/fgedu_restored/dwd

# Block health check
Total size: 65970697543680 B
Total blocks: 1000000
Under-replicated blocks: 0
# Data restored successfully
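
In a scripted restore, the `fsck` result has to be turned into a pass or fail decision rather than read by eye. The sketch below does this by looking for the status line that `hdfs fsck` prints at the end of its report. The helper name `fsck_is_healthy` is hypothetical, and matching on the text `is HEALTHY` is an assumption about the report wording.

```shell
# Hypothetical helper: read `hdfs fsck` output on stdin and succeed only if
# the report states that the checked path is HEALTHY.
fsck_is_healthy() {
    grep -q "is HEALTHY"
}
```

Under that assumption, a restore script could run `hdfs fsck /bigdata/warehouse/fgedu_restored | fsck_is_healthy || exit 1` and stop when the restored tree reports corrupt or missing blocks.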

Part04-Production Cases and Walkthroughs

4.1 Metadata Recovery Case

Corrupted NameNode metadata is a severe failure and has to be recovered quickly.

#!/bin/bash
# namenode_recovery.sh
# NameNode metadata recovery script

BACKUP_DIR="/backup/hadoop/namenode"
LATEST_BACKUP=$(ls -t "${BACKUP_DIR}" | head -1)

echo "=== NameNode Recovery Process ==="
echo "Latest backup: ${LATEST_BACKUP}"

# 1. Stop the NameNode
echo "Stopping NameNode..."
hdfs --daemon stop namenode

# 2. Move the current (corrupt) metadata aside
echo "Backing up current metadata..."
mv /bigdata/fgdata/namenode/current /bigdata/fgdata/namenode/current.corrupt

# 3. Restore the metadata
echo "Restoring metadata..."
mkdir -p /bigdata/fgdata/namenode/current
cp -r "${BACKUP_DIR}/${LATEST_BACKUP}"/* /bigdata/fgdata/namenode/current/

# 4. Start the NameNode
echo "Starting NameNode..."
hdfs --daemon start namenode

# 5. Verify the recovery
echo "Verifying recovery..."
sleep 30
hdfs dfsadmin -report

echo "=== Recovery Completed ==="

# Running the recovery
./namenode_recovery.sh
=== NameNode Recovery Process ===
Latest backup: 20240117

Stopping NameNode...
stopping namenode

Backing up current metadata...

Restoring metadata...

Starting NameNode...
starting namenode

Verifying recovery...
Live datanodes: 6
Dead datanodes: 0

=== Recovery Completed ===
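
The fixed `sleep 30` in the recovery script is a guess at how long the NameNode needs to start. A common alternative is to poll until a check succeeds or a timeout is reached. The sketch below is a generic retry helper; `wait_until` is a hypothetical name, and the timeout and interval values in the usage note are illustrative only.

```shell
# Hypothetical helper: retry a command until it succeeds or a time limit is hit.
# Usage: wait_until <limit_seconds> <interval_seconds> <command> [args...]
wait_until() {
    limit=$1; interval=$2; shift 2
    waited=0
    until "$@"; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

In the recovery script this could replace the sleep with something like `wait_until 300 10 hdfs dfsadmin -safemode wait`, which keeps retrying while the NameNode is still coming up and returns once it has left safe mode.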

4.2 Accidental Deletion Recovery Case

Accidental deletion is a common problem and calls for a quick recovery: check the trash first, then fall back to a snapshot.

# Recovering accidentally deleted data
# 1. Check the trash
hdfs dfs -ls /user/fgedu/.Trash/Current/bigdata/warehouse/fgedu/

# 2. Restore from the trash
hdfs dfs -mv /user/fgedu/.Trash/Current/bigdata/warehouse/fgedu/deleted_data.parquet /bigdata/warehouse/fgedu/

# 3. If the file is not in the trash, restore it from a snapshot
hdfs dfs -ls /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117/
hdfs dfs -cp /bigdata/warehouse/fgedu/.snapshot/daily_backup_20240117/deleted_data.parquet /bigdata/warehouse/fgedu/

# 4. Verify the recovery
hdfs dfs -ls /bigdata/warehouse/fgedu/deleted_data.parquet

# Trash check
Found 1 items
-rw-r--r-- 3 fgedu fgedu 1073741824 2024-01-17 23:30 /user/fgedu/.Trash/Current/bigdata/warehouse/fgedu/deleted_data.parquet

# Restore from trash
# Restore completed

# Verify recovery
-rw-r--r-- 3 fgedu fgedu 1073741824 2024-01-17 23:35 /bigdata/warehouse/fgedu/deleted_data.parquet
# Data recovered successfully
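
The trash location used in steps 1 and 2 follows a fixed pattern: the original absolute path is appended to the deleting user's `.Trash/Current` directory. The sketch below derives it for any path. `trash_path_for` is a hypothetical helper, and it assumes the default trash layout with trash enabled (a non-zero `fs.trash.interval`) and a delete that did not use `-skipTrash`.

```shell
# Hypothetical helper: print where HDFS trash would hold a deleted path.
# Assumes the default layout /user/<user>/.Trash/Current/<original path>.
trash_path_for() {
    user=$1
    original=$2   # absolute HDFS path of the deleted file or directory
    echo "/user/${user}/.Trash/Current${original}"
}

# Example: where to look for the file from this case.
trash_path_for fgedu /bigdata/warehouse/fgedu/deleted_data.parquet
```

If the file is no longer under `Current`, it may have been rolled into a timestamped checkpoint directory under `.Trash` or already purged, in which case the snapshot route in step 3 applies.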

4.3 Disaster Recovery Drill Case

4.3.1 Drill Plan

#!/bin/bash
# disaster_recovery_drill.sh
# Disaster recovery drill script (template)

echo "=== Disaster Recovery Drill ==="
echo "Date: $(date)"

# 1. Simulate a NameNode failure
echo "Simulating NameNode failure..."
# Run this in the drill environment only

# 2. Run the recovery procedure
echo "Executing recovery procedure..."
# Run the recovery script here

# 3. Verify the recovery result
echo "Verifying recovery results..."
hdfs dfsadmin -report
hdfs fsck / -files -blocks | tail -10

# 4. Performance test
echo "Performance testing..."
hadoop jar /bigdata/app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 /test/teragen

# 5. Record the drill results (placeholder values; replace with the measured results)
echo "Recording drill results..."
echo "Recovery time: 30 minutes" >> /backup/drill_results.log
echo "Data integrity: PASSED" >> /backup/drill_results.log

echo "=== Drill Completed ==="

# Running the drill
./disaster_recovery_drill.sh
=== Disaster Recovery Drill ===
Date: Wed Jan 17 23:45:00 CST 2024

Simulating NameNode failure...

Executing recovery procedure...

Verifying recovery results...
Live datanodes: 6
Total blocks: 1000000
Under-replicated blocks: 0

Performance testing...
Job job_1705473600000_0300 completed successfully

Recording drill results...

=== Drill Completed ===
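
The drill script records a fixed recovery time, which says nothing about how long the recovery really took. A small addition can measure it instead. The sketch below assumes `date +%s` is available (it is on GNU and BSD systems); `format_duration` is a hypothetical helper.

```shell
# Hypothetical helper: format a number of seconds as "<minutes>m <seconds>s".
format_duration() {
    echo "$(( $1 / 60 ))m $(( $1 % 60 ))s"
}

# Take a timestamp before and after the recovery steps and log the difference.
drill_start=$(date +%s)
# ... the recovery procedure would run here ...
drill_end=$(date +%s)
echo "Recovery time: $(format_duration $((drill_end - drill_start)))"
```

Logging the measured value each quarter gives a history that can be compared against the recovery time the business expects.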

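Part05-Fengge's Lessons Learned

5.1 Backup and Recovery Best Practices

In production, backup and recovery work should follow these practices.

Fengge's summary:
1. Build a complete backup mechanism
2. Verify regularly that backups are usable
3. Run recovery drills regularly
4. Set up off-site disaster recovery
5. Record every backup and recovery operation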

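5.2 Disaster Recovery Lessons

5.2.1 Disaster Recovery Recommendations

Fengge's tip: Disaster recovery is the last line of defense for data safety and must be taken seriously.

Points to watch:
- Store backup data off-site
- Verify the integrity of backup data regularly
- Write down a detailed recovery procedure
- Run recovery drills regularly
- Keep enough backup history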

5.2.2 Backup Check Script

#!/bin/bash
# backup_check.sh
# Backup status check script

echo "=== Backup Status Check ==="
echo "Date: $(date)"

# 1. Check metadata backups
echo "=== NameNode Backup ==="
ls -la /backup/hadoop/namenode/ | tail -5

# 2. Check data backups
echo "=== HDFS Data Backup ==="
ls -la /backup/hadoop/hdfs/ | tail -5

# 3. Check snapshots
echo "=== Snapshot Status ==="
hdfs dfs -ls /bigdata/warehouse/fgedu/.snapshot/ | tail -5

# 4. Check backup storage space
echo "=== Backup Storage ==="
df -h /backup

# 5. Check backup integrity (the latest backup must contain at least one fsimage file)
echo "=== Backup Integrity ==="
LATEST_BACKUP=$(ls -t /backup/hadoop/namenode | head -1)
# compgen -G handles several matching files, where [ -f pattern* ] would fail
if compgen -G "/backup/hadoop/namenode/${LATEST_BACKUP}/fsimage_*" > /dev/null; then
    echo "Backup integrity: OK"
else
    echo "Backup integrity: FAILED"
fi

# Backup check result
=== Backup Status Check ===
Date: Thu Jan 18 00:00:00 CST 2024

=== NameNode Backup ===
drwxr-xr-x 2 hdfs hdfs 4096 Jan 15 00:00 20240115
drwxr-xr-x 2 hdfs hdfs 4096 Jan 16 00:00 20240116
drwxr-xr-x 2 hdfs hdfs 4096 Jan 17 00:00 20240117

=== HDFS Data Backup ===
-rw-r--r-- 1 root root 1073741824 Jan 17 00:00 fgedu_data_20240117.tar.gz

=== Snapshot Status ===
drwxr-xr-x - fgedu fgedu 0 2024-01-17 00:00 daily_backup_20240117

=== Backup Storage ===
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 50T 20T 30T 40% /backup

=== Backup Integrity ===
Backup integrity: OK
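
The integrity check confirms that the newest backup contains an fsimage, but not that the newest backup is recent. A freshness check catches the case where the backup job has silently stopped running. The sketch below assumes a `find` that supports `-mindepth`, `-maxdepth`, and `-mmin` (GNU and BSD do); `has_recent_backup` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the backup root contains at least one
# subdirectory that was modified within the last N minutes.
has_recent_backup() {
    backup_root=$1
    max_age_minutes=$2
    find "$backup_root" -mindepth 1 -maxdepth 1 -type d -mmin "-${max_age_minutes}" | grep -q .
}
```

For an hourly metadata backup, a check such as `has_recent_backup /backup/hadoop/namenode 120 || echo "Backup freshness: FAILED"` would flag two missed runs in a row; the 120-minute window is an illustrative choice.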

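This article was compiled and published by Fengge Tutorials for study and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html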
