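This document by Fengge covers Rancher cluster backup and ETCD snapshot restore in practice. Topics include Rancher backup concepts, ETCD concepts, snapshot concepts, backup preparation, backup requirements, backup planning, backing up ETCD, backing up the cluster, restoring ETCD, restoring the cluster, verifying backups, and optimizing backups. It draws on the official Rancher documentation on backups, ETCD, and snapshots. It is intended for operations staff to use for learning and testing. Before applying any of it in production, verify it yourself.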
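Part01-Basic Concepts and Theory
1.1 Rancher Backup Concepts
A Rancher backup copies the Rancher cluster and its ETCD data so that the data stays safe and can be recovered. A backup covers ETCD data, cluster configuration, and application data. Backups protect against data loss and make the system more reliable.
- Data safety: protect cluster state against loss or corruption
- Recoverability: data can be restored when needed
- Regular backups: run backups on a fixed schedule
- Multi-site backups: store copies in more than one location
- Fast recovery: restore quickly to keep downtime short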
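1.2 Rancher ETCD Concepts
ETCD is the distributed key-value store of a Kubernetes cluster and holds all of the cluster's state. It is a core Kubernetes component that stores cluster configuration, node information, Pod state, and more. ETCD availability is critical to the stability of the whole cluster.
- Distributed storage: a distributed key-value store
- Strong consistency: members agree on every write through the Raft consensus protocol
- High availability: supports multi-member HA deployments (typically an odd number such as 3 or 5)
- Snapshot support: supports snapshot backups
- Transactions: supports transactional operations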
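1.3 Rancher Snapshot Concepts
A snapshot is a complete, point-in-time copy of the ETCD data. Restoring a snapshot returns the cluster to its state at that moment, which limits data loss. Snapshots can be created on a schedule or right before major operations such as upgrades.
- Point-in-time copy: a complete backup as of one moment
- Fast recovery: quickly return to the state captured in the snapshot
- Scheduled creation: snapshots can be created periodically
- Full copies: each etcdctl snapshot contains the whole keyspace; etcd has no built-in incremental snapshots
- Compressed storage: snapshot files can be compressed (for example with gzip) to save space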
Part02-Production Environment Planning and Recommendations
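2.1 Rancher Backup Preparation
Rancher backup preparation:
# 1. Rancher Server
- Rancher Server is deployed
- Rancher Server is reachable
- Rancher Server is configured correctly
# 2. Backup requirements analysis
- Decide the backup types
- Decide the backup frequency
- Decide the retention period
- Decide the backup storage locations
# 3. Storage
- Local storage: >= 100GB
- Remote storage: >= 1TB
- Cloud storage: >= 10TB
# 4. Network
- Bandwidth: >= 1Gbps
- Latency: < 10ms
- Open ports: 2379 (etcd client), 2380 (etcd peer), etc.
# 5. Tools
- etcdctl: >= v3.5.0 (etcd v3.5+ also ships etcdutl for offline snapshot restore and status)
- Backup scripts: custom scripts
- Monitoring: Prometheus, Grafana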
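The checklist above can be turned into a preflight check that a backup script runs before it starts. The following is a minimal sketch, assuming bash with GNU coreutils; `check_tools` and `check_free_space` are hypothetical helper names, and sizes are in 1024-based GB.

```shell
#!/usr/bin/env bash
# Preflight checks to run before a backup (sketch; helper names are hypothetical).

# check_tools CMD...: fail if any required command is missing from PATH
check_tools() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing tool: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# check_free_space DIR MIN_GB: fail if the filesystem holding DIR has less than MIN_GB free
check_free_space() {
  local dir=$1 min_gb=$2 avail_kb
  # df -P prints one line per filesystem; with -k, column 4 is available space in KiB
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 {print $4}')
  if [ -z "$avail_kb" ]; then
    echo "cannot read free space for $dir" >&2
    return 1
  fi
  if [ "$avail_kb" -lt $((min_gb * 1024 * 1024)) ]; then
    echo "only $((avail_kb / 1024 / 1024)) GB free in $dir, need $min_gb GB" >&2
    return 1
  fi
}
```

For example, `check_tools kubectl etcdctl gzip && check_free_space /Rancher/backup 100 || exit 1` near the top of `etcd_backup.sh` would stop the run before taking a snapshot that cannot be stored. The 100 GB figure simply mirrors the local-storage line in the checklist.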
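2.2 Rancher Backup Requirements
Rancher backup requirements:
# Backup type requirements
ETCD backup: at least daily
Cluster backup: at least weekly
Application backup: at least daily
# Backup frequency requirements
ETCD backup: daily
Cluster backup: weekly
Application backup: daily
# Retention requirements
ETCD backup: >= 30 days
Cluster backup: >= 90 days
Application backup: >= 30 days
# Storage requirements
Local storage: >= 100GB
Remote storage: >= 1TB
Cloud storage: >= 10TB
# Recovery requirements
RTO: <= 1 hour
RPO: <= 15 minutes
Restore test: at least once a month
Note: with only one ETCD backup per day, up to 24 hours of cluster changes can be lost, so the 15-minute RPO requires ETCD snapshots at least every 15 minutes (or another continuous protection mechanism).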
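An RPO target is only meaningful if something checks it. Below is a minimal sketch of a freshness check. It assumes backups land in a local directory and that each file's modification time reflects when that backup finished. `check_backup_age` is a hypothetical helper name, not a Rancher or etcd command.

```shell
#!/usr/bin/env bash
# check_backup_age DIR PATTERN MAX_MINUTES
# Succeeds if at least one file matching PATTERN in DIR was modified within the last MAX_MINUTES.
check_backup_age() {
  local dir=$1 pattern=$2 max_minutes=$3 newest
  # -mmin -N matches files modified less than N minutes ago
  newest=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -mmin "-$max_minutes" 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no $pattern in $dir newer than $max_minutes minutes: RPO target at risk" >&2
    return 1
  fi
  echo "ok: $newest"
}
```

Usage: `check_backup_age /Rancher/backup/etcd 'etcd-snapshot-*.db.gz' 15`. Run it from cron or a monitoring agent and alert on a non-zero exit status.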
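2.3 Rancher Backup Plan
Rancher backup plan:
# Backup types
ETCD backup: daily
Cluster backup: weekly
Application backup: daily
# Backup schedule
ETCD backup: 02:00 every day
Cluster backup: 03:00 every Sunday
Application backup: 02:00 every day
# Backup storage
Local storage: /Rancher/backup (etcd/ and cluster/ subdirectories, as used by the scripts in Part03)
Remote storage: backup@192.168.1.200:/backup/rancher
Cloud storage: s3://fgedu-backup/rancher
# Retention
ETCD backup: 30 days
Cluster backup: 90 days
Application backup: 30 days
# Recovery
Restore test: monthly
Recovery drill: quarterly
Disaster recovery drill: annually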
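The plan names an S3 location, but the scripts later in this article only copy to local disk and to a remote host with rsync. One way to cover the cloud copy is a small upload step. This sketch assumes the AWS CLI is installed and has credentials for the `fgedu-backup` bucket; `upload_to_s3` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# upload_to_s3 FILE S3_PREFIX
# Copies one local backup file to S3 under S3_PREFIX, keeping its file name.
# Assumes the AWS CLI is installed and credentials are configured (instance role, ~/.aws, ...).
upload_to_s3() {
  local file=$1 prefix=${2%/}   # strip a trailing slash from the prefix
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  aws s3 cp "$file" "$prefix/$(basename "$file")"
}
```

Usage: `upload_to_s3 /Rancher/backup/etcd/etcd-snapshot-20260410_020000.db.gz s3://fgedu-backup/rancher/etcd`. Retention on the S3 side (30/90 days) is usually handled with a bucket lifecycle rule rather than by the script.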
Part03-Production Implementation Plan
3.1 Rancher ETCD Backup
3.1.1 Backing Up ETCD with etcdctl
[root@rancher ~]# kubectl get pods -n kube-system | grep etcd
etcd-fgedu-control-plane-1 1/1 Running 0 10m
etcd-fgedu-control-plane-2 1/1 Running 0 10m
etcd-fgedu-control-plane-3 1/1 Running 0 10m
# Create the ETCD backup script
[root@rancher ~]# cat > /Rancher/scripts/etcd_backup.sh <<'EOF'
#!/bin/bash
# etcd_backup.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
set -euo pipefail
BACKUP_DIR="/Rancher/backup/etcd"
DATE=$(date +%Y%m%d_%H%M%S)
ETCD_POD=$(kubectl get pods -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
# Create the backup directory
mkdir -p "$BACKUP_DIR"
# Take the snapshot inside the etcd Pod (the client endpoint uses TLS)
kubectl exec -n kube-system "$ETCD_POD" -- etcdctl snapshot save "/tmp/etcd-snapshot-$DATE.db" \
--endpoints=https://127.0.0.1:2379 \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key
# Copy the snapshot out of the Pod
# (kubectl cp needs tar inside the container; if the etcd image does not ship tar,
# run etcdctl directly on a control-plane node instead)
kubectl cp "kube-system/$ETCD_POD:/tmp/etcd-snapshot-$DATE.db" "$BACKUP_DIR/etcd-snapshot-$DATE.db"
# Compress the snapshot
gzip "$BACKUP_DIR/etcd-snapshot-$DATE.db"
# Delete backups older than 30 days
find "$BACKUP_DIR" -name "etcd-snapshot-*.db.gz" -mtime +30 -delete
echo "ETCD backup completed: $BACKUP_DIR/etcd-snapshot-$DATE.db.gz"
EOF
# Add execute permission
[root@rancher ~]# chmod +x /Rancher/scripts/etcd_backup.sh
# Run the ETCD backup
[root@rancher ~]# /Rancher/scripts/etcd_backup.sh
ETCD backup completed: /Rancher/backup/etcd/etcd-snapshot-20260410_020000.db.gz
# List the backup files
[root@rancher ~]# ls -lh /Rancher/backup/etcd/
total 100M
-rw-r--r-- 1 root root 100M Apr 10 02:00 etcd-snapshot-20260410_020000.db.gz
# Schedule the job with cron
[root@rancher ~]# crontab -e
# Add the following line
0 2 * * * /Rancher/scripts/etcd_backup.sh >> /Rancher/logs/etcd_backup.log 2>&1
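`gzip -t` only proves the compressed stream is readable. A checksum recorded at backup time also lets you confirm later that a copy on another host or in S3 is byte-identical to the original. A minimal sketch, assuming GNU coreutils `sha256sum`; `write_checksum` and `verify_checksum` are hypothetical helpers that could be called at the end of `etcd_backup.sh`.

```shell
#!/usr/bin/env bash
# write_checksum FILE: store FILE's SHA-256 in FILE.sha256 (standard sha256sum format)
write_checksum() {
  local file=$1
  # run inside the file's directory so the .sha256 file records a relative name
  (cd "$(dirname "$file")" && sha256sum "$(basename "$file")" > "$(basename "$file").sha256")
}

# verify_checksum FILE: re-hash FILE and compare it against FILE.sha256
verify_checksum() {
  local file=$1
  (cd "$(dirname "$file")" && sha256sum -c --quiet "$(basename "$file").sha256")
}
```

If you adopt this, widen the 30-day cleanup to also delete `etcd-snapshot-*.db.gz.sha256`; otherwise orphaned checksum files accumulate.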
3.2 Rancher Cluster Backup
3.2.1 Backing Up the RKE2 Cluster Configuration
[root@rancher ~]# cat > /Rancher/scripts/cluster_backup.sh <<'EOF'
#!/bin/bash
# cluster_backup.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
set -euo pipefail
BACKUP_DIR="/Rancher/backup/cluster"
DATE=$(date +%Y%m%d_%H%M%S)
CLUSTER_NAME="fgedu-rke2-cluster"
# Create the backup directory
mkdir -p "$BACKUP_DIR"
# Export cluster resources
# (the Secrets export is only base64-encoded, not encrypted)
kubectl get all --all-namespaces -o yaml > "$BACKUP_DIR/cluster-all-$DATE.yaml"
kubectl get configmaps --all-namespaces -o yaml > "$BACKUP_DIR/cluster-configmaps-$DATE.yaml"
kubectl get secrets --all-namespaces -o yaml > "$BACKUP_DIR/cluster-secrets-$DATE.yaml"
kubectl get deployments --all-namespaces -o yaml > "$BACKUP_DIR/cluster-deployments-$DATE.yaml"
kubectl get services --all-namespaces -o yaml > "$BACKUP_DIR/cluster-services-$DATE.yaml"
kubectl get ingress --all-namespaces -o yaml > "$BACKUP_DIR/cluster-ingress-$DATE.yaml"
kubectl get pvc --all-namespaces -o yaml > "$BACKUP_DIR/cluster-pvc-$DATE.yaml"
# Archive the exported files
tar czf "$BACKUP_DIR/cluster-backup-$DATE.tar.gz" -C "$BACKUP_DIR" \
cluster-all-$DATE.yaml \
cluster-configmaps-$DATE.yaml \
cluster-secrets-$DATE.yaml \
cluster-deployments-$DATE.yaml \
cluster-services-$DATE.yaml \
cluster-ingress-$DATE.yaml \
cluster-pvc-$DATE.yaml
# Remove the temporary YAML files
rm -f "$BACKUP_DIR"/cluster-*"$DATE".yaml
# Delete backups older than 90 days
find "$BACKUP_DIR" -name "cluster-backup-*.tar.gz" -mtime +90 -delete
echo "Cluster backup completed: $BACKUP_DIR/cluster-backup-$DATE.tar.gz"
EOF
# Add execute permission
[root@rancher ~]# chmod +x /Rancher/scripts/cluster_backup.sh
# Run the cluster backup
[root@rancher ~]# /Rancher/scripts/cluster_backup.sh
Cluster backup completed: /Rancher/backup/cluster/cluster-backup-20260410_020000.tar.gz
# List the backup files
[root@rancher ~]# ls -lh /Rancher/backup/cluster/
total 50M
-rw-r--r-- 1 root root 50M Apr 10 02:00 cluster-backup-20260410_020000.tar.gz
# Schedule the job with cron
[root@rancher ~]# crontab -e
# Add the following line
0 3 * * 0 /Rancher/scripts/cluster_backup.sh >> /Rancher/logs/cluster_backup.log 2>&1
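The cluster archive includes `cluster-secrets-*.yaml`, and `kubectl get secrets -o yaml` only base64-encodes Secret values, so anyone who can read the archive can read every Secret. Below is a minimal sketch of creating the archive with owner-only permissions. `create_private_archive` is a hypothetical helper; encrypting the archive at rest would be a further step not shown here.

```shell
#!/usr/bin/env bash
# create_private_archive ARCHIVE DIR FILE...
# Packs FILE... (relative to DIR) into ARCHIVE with owner-only permissions (0600).
create_private_archive() {
  local archive=$1 dir=$2
  shift 2
  # subshell so the stricter umask does not leak into the calling script
  (umask 077 && tar czf "$archive" -C "$dir" "$@")
}
```

Putting `umask 077` near the top of `cluster_backup.sh` has the same effect for the intermediate YAML files, which are otherwise created with the default umask before being archived.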
3.3 Rancher ETCD Restore
3.3.1 Restoring ETCD from a Snapshot
[root@rancher ~]# cat > /Rancher/scripts/etcd_restore.sh <<'EOF'
#!/bin/bash
# etcd_restore.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
# Run on EVERY control-plane node with that node's name and IP, for example:
# ./etcd_restore.sh /Rancher/backup/etcd/etcd-snapshot-20260410_020000.db.gz fgedu-control-plane-1 192.168.1.10
# etcd runs as a static Pod (kubeadm-style layout, matching the certificate paths above), so it is
# stopped by moving its manifest; kubectl scale does not apply to static Pods.
# On RKE2 nodes use `rke2 server --cluster-reset --cluster-reset-restore-path=<snapshot>` instead.
set -euo pipefail
BACKUP_FILE=$1
NODE_NAME=$2
NODE_IP=$3
MANIFESTS=/etc/kubernetes/manifests
# Check the backup file
if [ ! -f "$BACKUP_FILE" ]; then
echo "Error: Backup file not found: $BACKUP_FILE"
exit 1
fi
# Decompress the backup file
gunzip -c "$BACKUP_FILE" > /tmp/etcd-snapshot.db
# Stop kube-apiserver and etcd by moving their static Pod manifests away
mkdir -p /tmp/manifests-backup
mv "$MANIFESTS/kube-apiserver.yaml" "$MANIFESTS/etcd.yaml" /tmp/manifests-backup/
# Give the kubelet time to stop the Pods
sleep 30
# Restore into a new data directory (offline operation, no TLS flags needed;
# etcdutl ships with etcd v3.5+, older releases use `ETCDCTL_API=3 etcdctl snapshot restore`)
etcdutl snapshot restore /tmp/etcd-snapshot.db \
--data-dir /var/lib/etcd-restore \
--name "$NODE_NAME" \
--initial-cluster fgedu-control-plane-1=https://192.168.1.10:2380,fgedu-control-plane-2=https://192.168.1.11:2380,fgedu-control-plane-3=https://192.168.1.12:2380 \
--initial-advertise-peer-urls "https://$NODE_IP:2380"
# Keep the old data directory and switch to the restored one
mv /var/lib/etcd "/var/lib/etcd.bak-$(date +%Y%m%d_%H%M%S)"
mv /var/lib/etcd-restore /var/lib/etcd
# Start etcd and kube-apiserver again
mv /tmp/manifests-backup/etcd.yaml /tmp/manifests-backup/kube-apiserver.yaml "$MANIFESTS/"
echo "ETCD restore completed on $NODE_NAME from: $BACKUP_FILE"
EOF
# Add execute permission
[root@rancher ~]# chmod +x /Rancher/scripts/etcd_restore.sh
# Check the ETCD Pods
[root@rancher ~]# kubectl get pods -n kube-system | grep etcd
etcd-fgedu-control-plane-1 1/1 Running 0 10m
etcd-fgedu-control-plane-2 1/1 Running 0 10m
etcd-fgedu-control-plane-3 1/1 Running 0 10m
# Check ETCD health
[root@rancher ~]# kubectl exec -n kube-system etcd-fgedu-control-plane-1 -- etcdctl endpoint health \
--endpoints=https://127.0.0.1:2379 \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 12.345678ms
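A single `endpoint health` call against 127.0.0.1 only checks the local member. After a restore it is more useful to wait until every member reports healthy. This sketch assumes the three-member cluster, Pod name, and certificate paths used above. `etcd_health` and `wait_etcd_healthy` are hypothetical helper names; `--cluster` makes etcdctl check every endpoint from the member list.

```shell
#!/usr/bin/env bash
# etcd_health: print `etcdctl endpoint health --cluster` output (stdout and stderr).
# Pod name and certificate paths are the ones used in this article (assumption).
etcd_health() {
  kubectl exec -n kube-system etcd-fgedu-control-plane-1 -- etcdctl endpoint health --cluster \
    --endpoints=https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key 2>&1
}

# wait_etcd_healthy EXPECTED TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Polls etcd_health until at least EXPECTED members report "is healthy" or the timeout expires.
wait_etcd_healthy() {
  local expected=$1 timeout=$2 interval=${3:-5} waited=0 healthy=0
  while [ "$waited" -lt "$timeout" ]; do
    # unhealthy members print "... is unhealthy: ...", which does not match " is healthy"
    healthy=$(etcd_health | grep -c ' is healthy')
    if [ "$healthy" -ge "$expected" ]; then
      echo "$healthy/$expected etcd members healthy"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out: $healthy/$expected etcd members healthy" >&2
  return 1
}
```

For example, `wait_etcd_healthy 3 300` waits up to five minutes for all three members after `etcd_restore.sh` has run on every control-plane node.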
Part04-Production Cases and Hands-On Walkthrough
4.1 Rancher Cluster Restore
4.1.1 Restoring the Cluster Configuration
[root@rancher ~]# cat > /Rancher/scripts/cluster_restore.sh <<'EOF'
#!/bin/bash
# cluster_restore.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
BACKUP_FILE=$1
# Check the backup file
if [ ! -f "$BACKUP_FILE" ]; then
echo "Error: Backup file not found: $BACKUP_FILE"
exit 1
fi
# Extract into a private temporary directory
WORKDIR=$(mktemp -d)
tar xzf "$BACKUP_FILE" -C "$WORKDIR"
# Restore cluster resources, dependencies (ConfigMaps, Secrets, PVCs) first.
# Objects exported with `kubectl get -o yaml` keep server-set fields such as resourceVersion and uid,
# so kubectl apply may reject or report conflicts for some objects; review its error output.
kubectl apply -f "$WORKDIR"/cluster-configmaps-*.yaml
kubectl apply -f "$WORKDIR"/cluster-secrets-*.yaml
kubectl apply -f "$WORKDIR"/cluster-pvc-*.yaml
kubectl apply -f "$WORKDIR"/cluster-services-*.yaml
kubectl apply -f "$WORKDIR"/cluster-deployments-*.yaml
kubectl apply -f "$WORKDIR"/cluster-ingress-*.yaml
kubectl apply -f "$WORKDIR"/cluster-all-*.yaml
# Remove the temporary files
rm -rf "$WORKDIR"
echo "Cluster restore completed from: $BACKUP_FILE"
EOF
# Add execute permission
[root@rancher ~]# chmod +x /Rancher/scripts/cluster_restore.sh
# Check cluster status
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready <none> 10m v1.28.0
fgedu-worker-2 Ready <none> 10m v1.28.0
# Check Pod status
[root@rancher ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-1234567890-abcde 1/1 Running 0 10m
kube-system etcd-fgedu-control-plane-1 1/1 Running 0 10m
kube-system etcd-fgedu-control-plane-2 1/1 Running 0 10m
kube-system etcd-fgedu-control-plane-3 1/1 Running 0 10m
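Scanning the full `kubectl get pods` listing by eye does not scale on a real cluster. Below is a minimal sketch that prints only Pods that are neither `Running` nor `Completed`, plus those that are `Running` but not fully ready. It assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (NAMESPACE, NAME, READY, STATUS, ...). `list_unready_pods` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# list_unready_pods: print "namespace/name READY STATUS" for Pods that need attention.
list_unready_pods() {
  # columns: NAMESPACE NAME READY STATUS RESTARTS AGE; READY looks like "1/2"
  kubectl get pods --all-namespaces --no-headers |
    awk '{ split($3, r, "/") }
         ($4 != "Running" && $4 != "Completed") || ($4 == "Running" && r[1] != r[2]) {
           print $1 "/" $2 " " $3 " " $4
         }'
}
```

An empty result after `cluster_restore.sh` is a quick signal that workloads came back; any line printed is a Pod to inspect with `kubectl describe`.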
4.2 Rancher Backup Verification
4.2.1 Verifying Backup Integrity
# Verify the ETCD snapshot: snapshot status reads the uncompressed .db file offline, so decompress it first (no TLS flags needed)
[root@rancher ~]# gunzip -c /Rancher/backup/etcd/etcd-snapshot-20260410_020000.db.gz > /tmp/etcd-verify.db
[root@rancher ~]# etcdctl snapshot status /tmp/etcd-verify.db -w table
# The table includes the snapshot hash, revision, total key count and total size
# (on etcd v3.5+ this etcdctl subcommand is deprecated in favor of etcdutl snapshot status)
# Verify the cluster backup
[root@rancher ~]# tar tzf /Rancher/backup/cluster/cluster-backup-20260410_020000.tar.gz
cluster-all-20260410_020000.yaml
cluster-configmaps-20260410_020000.yaml
cluster-secrets-20260410_020000.yaml
cluster-deployments-20260410_020000.yaml
cluster-services-20260410_020000.yaml
cluster-ingress-20260410_020000.yaml
cluster-pvc-20260410_020000.yaml
# Check backup file sizes
[root@rancher ~]# ls -lh /Rancher/backup/etcd/
total 100M
-rw-r--r-- 1 root root 100M Apr 10 02:00 etcd-snapshot-20260410_020000.db.gz
[root@rancher ~]# ls -lh /Rancher/backup/cluster/
total 50M
-rw-r--r-- 1 root root 50M Apr 10 02:00 cluster-backup-20260410_020000.tar.gz
# Check compressed file integrity (no output means the files are readable)
[root@rancher ~]# gzip -t /Rancher/backup/etcd/etcd-snapshot-20260410_020000.db.gz
[root@rancher ~]# tar tzf /Rancher/backup/cluster/cluster-backup-20260410_020000.tar.gz > /dev/null
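The checks above inspect one file at a time. This sketch tests every compressed backup under a directory and reports the bad ones. It assumes the `.db.gz` / `.tar.gz` naming used by the scripts in this article; `verify_all_backups` is a hypothetical helper. It only proves the gzip and tar containers are readable; `etcdctl snapshot status` on the decompressed file remains the deeper check for ETCD snapshots.

```shell
#!/usr/bin/env bash
# verify_all_backups DIR: test every *.gz under DIR, print corrupt files,
# and return non-zero if any file fails.
verify_all_backups() {
  local dir=$1 failed=0 f
  while IFS= read -r f; do
    case $f in
      *.tar.gz) tar tzf "$f" >/dev/null 2>&1 || { echo "CORRUPT: $f"; failed=$((failed + 1)); } ;;
      *.gz)     gzip -t "$f" 2>/dev/null    || { echo "CORRUPT: $f"; failed=$((failed + 1)); } ;;
    esac
  done < <(find "$dir" -type f -name '*.gz')   # process substitution keeps $failed in this shell
  [ "$failed" -eq 0 ]
}
```

`verify_all_backups /Rancher/backup` could run weekly from cron, with a non-zero exit status feeding an alert.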
4.3 Rancher Backup Optimization
4.3.1 Optimizing Backup Performance
[root@rancher ~]# cat > /Rancher/scripts/etcd_backup_optimized.sh <<'EOF'
#!/bin/bash
# etcd_backup_optimized.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
set -euo pipefail
BACKUP_DIR="/Rancher/backup/etcd"
DATE=$(date +%Y%m%d_%H%M%S)
ETCD_POD=$(kubectl get pods -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
# Create the backup directory
mkdir -p "$BACKUP_DIR"
# Take a full snapshot (etcdctl has no incremental mode and no --max-snapshots flag)
kubectl exec -n kube-system "$ETCD_POD" -- etcdctl snapshot save "/tmp/etcd-snapshot-$DATE.db" \
--endpoints=https://127.0.0.1:2379 \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key
# Copy the snapshot out of the Pod
kubectl cp "kube-system/$ETCD_POD:/tmp/etcd-snapshot-$DATE.db" "$BACKUP_DIR/etcd-snapshot-$DATE.db"
# Use a higher compression level: pigz is a parallel gzip and writes a normal .gz file (pigz must be installed)
pigz -9 "$BACKUP_DIR/etcd-snapshot-$DATE.db"
# Delete backups older than 30 days
find "$BACKUP_DIR" -name "etcd-snapshot-*.db.gz" -mtime +30 -delete
echo "ETCD backup completed: $BACKUP_DIR/etcd-snapshot-$DATE.db.gz"
EOF
# Add execute permission
[root@rancher ~]# chmod +x /Rancher/scripts/etcd_backup_optimized.sh
# Run the optimized ETCD backup
[root@rancher ~]# /Rancher/scripts/etcd_backup_optimized.sh
ETCD backup completed: /Rancher/backup/etcd/etcd-snapshot-20260410_020000.db.gz
# Check the backup file size
[root@rancher ~]# ls -lh /Rancher/backup/etcd/
total 80M
-rw-r--r-- 1 root root 80M Apr 10 02:00 etcd-snapshot-20260410_020000.db.gz
# Configure the remote backup
[root@rancher ~]# cat > /Rancher/scripts/remote_backup.sh <<'EOF'
#!/bin/bash
# remote_backup.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
LOCAL_BACKUP_DIR="/Rancher/backup"
REMOTE_BACKUP_DIR="backup@192.168.1.200:/backup/rancher"
# Sync backups to the remote server
# (--delete mirrors local deletions, including retention cleanup, to the remote copy)
rsync -avz --delete "$LOCAL_BACKUP_DIR/" "$REMOTE_BACKUP_DIR/"
echo "Remote backup completed"
EOF
# Add execute permission
[root@rancher ~]# chmod +x /Rancher/scripts/remote_backup.sh
# Run the remote backup
[root@rancher ~]# /Rancher/scripts/remote_backup.sh
sending incremental file list
etcd/
etcd/etcd-snapshot-20260410_020000.db.gz
cluster/
cluster/cluster-backup-20260410_020000.tar.gz

sent 150.00M bytes  received 123.45K bytes  10.23M bytes/sec
total size is 150.00M  speedup is 1.00
Remote backup completed
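Both backup scripts prune with `find ... -mtime +N -delete`. If backups silently stop for longer than the retention window, that command eventually deletes every remaining copy, and `rsync --delete` then propagates the deletion to the remote host. Below is a sketch of a safer retention rule that always keeps the newest few files regardless of age. It assumes GNU `find` (for `-printf`) and file names without newlines; `prune_backups` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# prune_backups DIR PATTERN DAYS KEEP_MIN
# Deletes files matching PATTERN that are older than DAYS days,
# but never touches the KEEP_MIN newest files.
prune_backups() {
  local dir=$1 pattern=$2 days=$3 keep=$4 f
  # "<mtime epoch> <path>" per file, newest first; skip the first KEEP_MIN entries
  find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@ %p\n' |
    sort -rn | tail -n +"$((keep + 1))" | cut -d' ' -f2- |
    while IFS= read -r f; do
      # delete a remaining file only if it is also older than DAYS days
      if [ -n "$(find "$f" -mtime +"$days")" ]; then
        rm -f -- "$f"
        echo "pruned: $f"
      fi
    done
}
```

For example, `prune_backups /Rancher/backup/etcd 'etcd-snapshot-*.db.gz' 30 7` could replace the `find ... -delete` line in `etcd_backup.sh` while always keeping at least seven snapshots.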
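Part05-Fengge's Experience Summary and Sharing
5.1 Rancher Backup Best Practices
Rancher backup best practices:
- Regular backups: back up ETCD data and cluster configuration on a schedule
- Multi-site backups: keep copies in several locations so that losing one location does not lose the backups
- Regular testing: test restores regularly to prove the backups are usable
- Fast recovery: prepare a documented, scripted recovery procedure
- Monitoring and alerting: alert when a backup fails or is missing
- Documentation: record the backup configuration and every change to it
- Disaster drills: run disaster recovery drills regularly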
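For the monitoring and alerting practice, one common pattern is to have each backup script publish its result through the Prometheus node_exporter textfile collector, which fits the Prometheus/Grafana tooling listed in 2.1. This sketch assumes node_exporter runs with `--collector.textfile.directory` pointing at the directory passed in. The metric names and `write_backup_metric` are hypothetical, not a Rancher or Prometheus standard.

```shell
#!/usr/bin/env bash
# write_backup_metric TEXTFILE_DIR NAME STATUS
# Writes last-run metrics for one backup job (STATUS: 1 = success, 0 = failure)
# in the Prometheus text exposition format.
write_backup_metric() {
  local dir=$1 name=$2 status=$3 tmp
  # temp name does not end in .prom, so the collector ignores it while it is being written
  tmp=$(mktemp "$dir/.${name}.XXXXXX")
  {
    echo "# HELP rancher_backup_last_run_timestamp_seconds Unix time of the last backup run."
    echo "# TYPE rancher_backup_last_run_timestamp_seconds gauge"
    echo "rancher_backup_last_run_timestamp_seconds{backup=\"$name\"} $(date +%s)"
    echo "# HELP rancher_backup_last_run_success 1 if the last backup run succeeded, else 0."
    echo "# TYPE rancher_backup_last_run_success gauge"
    echo "rancher_backup_last_run_success{backup=\"$name\"} $status"
  } > "$tmp"
  chmod 644 "$tmp"   # mktemp creates 0600; node_exporter may run as another user
  # rename within the same directory so node_exporter never reads a half-written file
  mv "$tmp" "$dir/${name}_backup.prom"
}
```

An alert rule such as `time() - rancher_backup_last_run_timestamp_seconds{backup="etcd"} > 93600` (26 hours) or `rancher_backup_last_run_success == 0` would then flag missed or failed daily backups.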
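5.2 Rancher Backup Troubleshooting
Rancher backup troubleshooting:
# Issue 1: ETCD backup fails
# Symptom: the ETCD backup script reports an error
# Causes: the ETCD Pod is unavailable, permissions are insufficient, or storage is full
# Resolution:
[root@rancher ~]# kubectl get pods -n kube-system | grep etcd
[root@rancher ~]# kubectl describe pod -n kube-system etcd-fgedu-control-plane-1
[root@rancher ~]# kubectl logs -n kube-system etcd-fgedu-control-plane-1
[root@rancher ~]# df -h /Rancher/backup/etcd
# Issue 2: Cluster backup fails
# Symptom: the cluster backup script reports an error
# Causes: the Kubernetes API is unavailable, permissions are insufficient, or storage is full
# Resolution:
[root@rancher ~]# kubectl cluster-info
[root@rancher ~]# kubectl get nodes
[root@rancher ~]# df -h /Rancher/backup/cluster
# Issue 3: ETCD restore fails
# Symptom: the ETCD restore reports an error
# Causes: a corrupted backup file, an incompatible ETCD version, or a configuration error
# Resolution (snapshot status needs the decompressed .db file):
[root@rancher ~]# gunzip -c /Rancher/backup/etcd/etcd-snapshot-20260410_020000.db.gz > /tmp/etcd-verify.db
[root@rancher ~]# etcdctl snapshot status /tmp/etcd-verify.db -w table
[root@rancher ~]# kubectl exec -n kube-system etcd-fgedu-control-plane-1 -- etcdctl version
[root@rancher ~]# kubectl version
[root@rancher ~]# kubectl get pods -n kube-system | grep etcd
# Issue 4: Cluster restore fails
# Symptom: the cluster restore reports an error
# Causes: a corrupted backup file, an incompatible Kubernetes version, or a configuration error
# Resolution:
[root@rancher ~]# tar tzf /Rancher/backup/cluster/cluster-backup-20260410_020000.tar.gz
[root@rancher ~]# kubectl version
[root@rancher ~]# kubectl get nodes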
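5.3 Rancher Backup Maintenance
Rancher backup maintenance:
# 1. Regular checks
- Check ETCD backup status
- Check cluster backup status
- Check backup file integrity
- Check backup storage space
# 2. Regular optimization
- Optimize backup performance
- Optimize backup compression
- Optimize backup transfer
- Optimize backup storage
# 3. Regular cleanup
- Remove expired backups
- Remove temporary files
- Remove unneeded logs
- Remove unneeded snapshots
# 4. Regular testing
- Test ETCD restore
- Test cluster restore
- Test backup integrity
- Measure restore time against the RTO
# 5. Regular audits
- Audit backup configuration
- Audit backup logs
- Audit restore records
- Audit operation logs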
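This article was compiled and published by Fengge Tutorials for learning and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html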
