KubeSphere-045: Multi-Region Cluster Deployment and Cross-Regional Disaster Recovery Practice

HTML-GF-Middleware Training Document

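1. Basic Concepts

1.1 Multi-Region Cluster Overview

Multi-region cluster deployment means running several Kubernetes clusters in different geographic locations. KubeSphere's multi-cluster management lets you manage all of them from one place and recover from regional disasters. It offers the following advantages:

  • High availability: deploying across regions keeps the system available when one region fails
  • Low latency: applications run in the region closest to their users
  • Disaster recovery: cross-region backups allow recovery after a regional outage
  • Compliance: data residency and regulatory requirements can be met per region
  • Load balancing: traffic can be spread across regions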

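1.2 Cross-Regional Disaster Recovery

Cross-regional disaster recovery backs up data and synchronizes applications between regions. When the primary region fails, workloads can then be switched quickly to the standby region. It covers:

  • Data synchronization: replicating data between regions
  • Application synchronization: keeping application deployments consistent between regions
  • Failure detection: detecting that the primary region has failed
  • Failover: switching traffic to the standby region
  • Failback: switching back once the primary region has recovered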
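The detect, failover and failback cycle above can be modelled as a small state machine. The sketch below is a hypothetical helper; the function and state names are illustrative and are not part of KubeSphere. Given the current state and whether the primary region is currently healthy, it prints the next state.

```shell
#!/usr/bin/env bash
# Hypothetical DR state machine (illustrative names, not a KubeSphere API).
# States: PRIMARY_ACTIVE -> FAILED_OVER -> FAILING_BACK -> PRIMARY_ACTIVE
# Usage: dr_next_state <current_state> <primary_healthy: yes|no>
dr_next_state() {
  local state=$1 healthy=$2
  case "$state" in
    PRIMARY_ACTIVE)
      # Failure detected: move traffic to the standby region
      if [ "$healthy" = "no" ]; then echo FAILED_OVER; else echo PRIMARY_ACTIVE; fi ;;
    FAILED_OVER)
      # Primary is back: sync data back before moving traffic
      if [ "$healthy" = "yes" ]; then echo FAILING_BACK; else echo FAILED_OVER; fi ;;
    FAILING_BACK)
      # Primary failed again during failback: stay on the standby region
      if [ "$healthy" = "yes" ]; then echo PRIMARY_ACTIVE; else echo FAILED_OVER; fi ;;
    *)
      echo "unknown state: $state" >&2
      return 1 ;;
  esac
}
```

FAILING_BACK is a separate state to match the order in the list above: data is synchronized back before traffic returns to the primary region.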

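1.3 KubeSphere Multi-Cluster Management

KubeSphere provides multi-cluster management features, including:

  • Cluster registration: register and manage multiple clusters
  • Unified view: see all clusters in a single view
  • Application distribution: distribute applications across clusters
  • Cross-cluster monitoring: monitor the status of all clusters
  • Cross-cluster logging: collect and view logs from all clusters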

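2. Production Environment Planning

2.1 Architecture Planning

2.1.1 Active-Standby Architecture

# Active-standby architecture
# - Primary cluster: Beijing region
# - Secondary cluster: Shanghai region
# - Data synchronization: real-time
# - Failover: manual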

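2.1.2 Active-Active Architecture

# Active-active architecture
# - Cluster 1: Beijing region
# - Cluster 2: Shanghai region
# - Data synchronization: bidirectional
# - Traffic distribution: load balanced across both clusters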
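In the active-active layout, a load balancer splits traffic between the clusters by weight. Below is a minimal sketch of deterministic weighted selection. It assumes each request carries a numeric hash, for example one derived from the client IP. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical weighted routing between two clusters.
# Usage: pick_cluster <request_hash> <weight_cluster1> <weight_cluster2>
# Hashes falling in [0, w1) modulo (w1 + w2) go to cluster1, the rest to cluster2.
pick_cluster() {
  local hash=$1 w1=$2 w2=$3
  local total=$(( w1 + w2 ))
  if [ "$total" -le 0 ]; then
    echo "no cluster has a positive weight" >&2
    return 1
  fi
  if [ $(( hash % total )) -lt "$w1" ]; then
    echo cluster1
  else
    echo cluster2
  fi
}
```

The same hash always maps to the same cluster, so a client stays in one region. Setting a weight to 0, as the global load balancer configuration in 3.5.2 does for the secondary cluster, drains that cluster.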

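2.2 Network Planning

2.2.1 Network Connectivity

# Network connectivity
# - VPN: connects the private networks of the two regions
# - Dedicated line: high-speed, stable leased line
# - Bandwidth: at least 1 Gbps
# - Latency: under 50 ms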
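The bandwidth and latency targets above can be turned into a quick pass/fail check. This is a sketch only. It assumes you feed in integer values measured with your own tools, for example iperf3 and ping. The function name and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Compare measured link figures with the planning targets:
# bandwidth >= 1 Gbps and latency < 50 ms.
MIN_BANDWIDTH_MBPS=1000
MAX_LATENCY_MS=50

# Usage: check_link <bandwidth_mbps> <latency_ms>   (integers)
check_link() {
  local bw=$1 lat=$2 ok=0
  if [ "$bw" -lt "$MIN_BANDWIDTH_MBPS" ]; then
    echo "FAIL: bandwidth ${bw} Mbps < ${MIN_BANDWIDTH_MBPS} Mbps"; ok=1
  fi
  if [ "$lat" -ge "$MAX_LATENCY_MS" ]; then
    echo "FAIL: latency ${lat} ms >= ${MAX_LATENCY_MS} ms"; ok=1
  fi
  [ "$ok" -eq 0 ] && echo "OK"
  return "$ok"
}
```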

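2.2.2 DNS Configuration

# DNS configuration
# - Primary DNS: Beijing region
# - Secondary DNS: Shanghai region
# - Health checks: periodically check cluster health
# - Failover: automatically switch to a healthy cluster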
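A DNS failover service decides which address to return based on a health check. Below is a minimal sketch that assumes a plain TCP connect is an adequate health signal; real DNS and GLB products usually also offer HTTP checks. The function names are hypothetical. The `/dev/tcp` pseudo-device requires bash, and `timeout` comes from coreutils.

```shell
#!/usr/bin/env bash
# TCP health probe using bash's /dev/tcp pseudo-device.
# Usage: tcp_probe <host> <port> [timeout_seconds]
# Returns 0 if a TCP connection opens within the timeout, non-zero otherwise.
tcp_probe() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Choose the DNS answer: the primary if it passes the probe, else the secondary.
# Usage: pick_endpoint <primary_ip> <secondary_ip> <port>
pick_endpoint() {
  if tcp_probe "$1" "$3"; then echo "$1"; else echo "$2"; fi
}
```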

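2.3 Data Synchronization Planning

2.3.1 Synchronization Strategy

# Synchronization strategy
# - Real-time sync: for critical data
# - Scheduled sync: for non-critical data
# - Incremental sync: transfer only changed data
# - Conflict resolution: the primary region wins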
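The "primary region wins" rule can be illustrated on configuration-style data. The sketch assumes both regions export their data as KEY=VALUE lines. That is an assumption made for illustration only; databases need their own replication and conflict handling. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical "primary wins" merge of two KEY=VALUE files.
# Keys in both files take the primary's value; keys in only one file are kept.
# Output is sorted by key so results can be compared across runs.
# Usage: merge_primary_wins <primary_file> <secondary_file>
merge_primary_wins() {
  awk -F= '
    FILENAME == ARGV[1] { primary[$1] = $0; next }  # primary region lines
    !($1 in primary) { print }                      # secondary-only keys
    END { for (k in primary) print primary[k] }     # primary values win
  ' "$1" "$2" | sort -t= -k1,1
}
```

`FILENAME == ARGV[1]` is used instead of the common `NR == FNR` idiom. That idiom misclassifies lines when the primary file is empty.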

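2.3.2 Data Consistency

# Data consistency
# - Strong consistency: for critical data
# - Eventual consistency: for non-critical data
# - Data verification: check consistency periodically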
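Periodic verification can start with a checksum comparison of the same dataset exported from each region. The sketch assumes both regions can produce a deterministic export, such as a sorted dump. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Compare two exports (one per region) by CRC checksum and size.
# Usage: same_content <file_a> <file_b>   -> prints CONSISTENT or INCONSISTENT
same_content() {
  [ -r "$1" ] && [ -r "$2" ] || { echo "cannot read input" >&2; return 2; }
  local a b
  a=$(cksum < "$1")   # "CRC SIZE" for stdin input
  b=$(cksum < "$2")
  if [ "$a" = "$b" ]; then
    echo CONSISTENT
  else
    echo INCONSISTENT
    return 1
  fi
}
```

CRC checksums catch accidental differences, not deliberate tampering. Use a cryptographic hash such as `sha256sum` where that matters.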

3. Implementation Steps

3.1 Deploy the Primary Cluster

3.1.1 Prepare the Primary Cluster

# Download KubeKey (KubeKey is versioned separately from KubeSphere; the KubeSphere v3.4.1 docs use KubeKey v3.0.13)
wget https://github.com/kubesphere/kubekey/releases/download/v3.0.13/kubekey-v3.0.13-linux-amd64.tar.gz
--2026-01-15 10:00:00--  https://github.com/kubesphere/kubekey/releases/download/v3.0.13/kubekey-v3.0.13-linux-amd64.tar.gz
Resolving github.com... 140.82.112.4
Connecting to github.com|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12345678 (12M) [application/octet-stream]
Saving to: 'kubekey-v3.0.13-linux-amd64.tar.gz'

# Extract the package
tar -xzf kubekey-v3.0.13-linux-amd64.tar.gz

# Create the primary cluster configuration file
cat > primary-cluster-config.yaml <<EOF
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: primary-cluster
spec:
  hosts:
  - {name: primary-node1, address: 192.168.1.101, internalAddress: 192.168.1.101, user: root, password: "password"}
  - {name: primary-node2, address: 192.168.1.102, internalAddress: 192.168.1.102, user: root, password: "password"}
  - {name: primary-node3, address: 192.168.1.103, internalAddress: 192.168.1.103, user: root, password: "password"}
  roleGroups:
    etcd:
    - primary-node1
    - primary-node2
    - primary-node3
    control-plane:
    - primary-node1
    - primary-node2
    - primary-node3
    worker:
    - primary-node1
    - primary-node2
    - primary-node3
  controlPlaneEndpoint:
    internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.26.5
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.4.1
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  local_registry: ""
  namespace_override: ""
  dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: 192.168.1.101,192.168.1.102,192.168.1.103
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: nginx
    openldap:
      enabled: false
    redis:
      enabled: false
      enableHA: false
    s3gateway:
      enabled: false
    minio:
      replicas: 1
      resources: {}
    monitoring:
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - nvidia.com/gpu
      - nvidia.com/gpumig
      - nvidia.com/gpumig2
    resourceSharing:
      enabled: false
  openpitrix:
    store:
      enabled: true
  servicemesh:
    enabled: true
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
          - ""
          nodeLimit: 100
      iptables-manager:
        enabled: true
        mode: "external"
      edgeService:
        enabled: false
  gatekeeper:
    enabled: false
    base:
      enabled: false
      replicas: 1
    audit:
      enabled: false
      replicas: 1
    violation:
      enabled: false
      replicas: 1
  networkpolicy:
    enabled: true
  multicluster:
    clusterRole: host   # the primary cluster is the host that manages member clusters
  terminal:
    timeout: 600
  logging:
    enabled: true
    logsidecar:
      enabled: true
      replicas: 2
  metrics_server:
    enabled: false
  tracing:
    enabled: true
    provider: jaeger
    jaeger:
      replicas: 1
  notification:
    enabled: true
  alerting:
    enabled: true
EOF
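The primary and secondary configuration files differ only in node names and IP addresses. The small generator below prints the `hosts` entries for one region, indented to sit under `spec.hosts`. The function name is hypothetical. The `password: "password"` placeholder is copied from the original and should be replaced, ideally with SSH key authentication.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print KubeKey "hosts" entries for one region.
# Usage: render_hosts <name_prefix> <subnet_prefix> <first_host_octet> <count>
render_hosts() {
  local prefix=$1 subnet=$2 first=$3 count=$4 i ip
  for (( i = 0; i < count; i++ )); do
    ip="${subnet}.$(( first + i ))"
    printf '  - {name: %s-node%d, address: %s, internalAddress: %s, user: root, password: "password"}\n' \
      "$prefix" "$(( i + 1 ))" "$ip" "$ip"
  done
}
```

For example, `render_hosts primary 192.168.1 101 3` and `render_hosts secondary 192.168.2 101 3` produce the two host lists used in this section.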

# Create the primary cluster
./kk create cluster -f primary-cluster-config.yaml
Cluster creation started!

3.1.2 Verify the Primary Cluster

# Check node status
kubectl get nodes
NAME STATUS ROLES AGE VERSION
primary-node1 Ready control-plane 10m v1.26.5
primary-node2 Ready control-plane 10m v1.26.5
primary-node3 Ready control-plane 10m v1.26.5

# Check the KubeSphere system pods
kubectl get pods -n kubesphere-system
NAME READY STATUS RESTARTS AGE
ks-apiserver-7d6f8b9c5d-abc123 1/1 Running 0 10m
ks-console-7d6f8b9c5d-def456 1/1 Running 0 10m
ks-controller-manager-7d6f8b9c5d-ghi789 1/1 Running 0 10m
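In scripts, the node check can wait until every node reports Ready. The helper below is a hypothetical sketch. It parses `kubectl get nodes` output from stdin, so it can be tested without a cluster.

```shell
#!/usr/bin/env bash
# Count nodes whose STATUS (column 2) is not exactly "Ready".
# The header line is skipped. Cordoned nodes ("Ready,SchedulingDisabled") count as not ready.
# Usage: kubectl get nodes | count_not_ready
count_not_ready() {
  awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}
```

A wait loop could then be `until [ "$(kubectl get nodes | count_not_ready)" = 0 ]; do sleep 10; done`.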

3.2 Deploy the Secondary Cluster

3.2.1 Prepare the Secondary Cluster

# Create the secondary cluster configuration file
cat > secondary-cluster-config.yaml <<EOF
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: secondary-cluster
spec:
  hosts:
  - {name: secondary-node1, address: 192.168.2.101, internalAddress: 192.168.2.101, user: root, password: "password"}
  - {name: secondary-node2, address: 192.168.2.102, internalAddress: 192.168.2.102, user: root, password: "password"}
  - {name: secondary-node3, address: 192.168.2.103, internalAddress: 192.168.2.103, user: root, password: "password"}
  roleGroups:
    etcd:
    - secondary-node1
    - secondary-node2
    - secondary-node3
    control-plane:
    - secondary-node1
    - secondary-node2
    - secondary-node3
    worker:
    - secondary-node1
    - secondary-node2
    - secondary-node3
  controlPlaneEndpoint:
    internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.26.5
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.4.1
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""   # a member cluster must use the primary (host) cluster's jwtSecret
  local_registry: ""
  namespace_override: ""
  dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: 192.168.2.101,192.168.2.102,192.168.2.103
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: nginx
    openldap:
      enabled: false
    redis:
      enabled: false
      enableHA: false
    s3gateway:
      enabled: false
    minio:
      replicas: 1
      resources: {}
    monitoring:
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - nvidia.com/gpu
      - nvidia.com/gpumig
      - nvidia.com/gpumig2
    resourceSharing:
      enabled: false
  openpitrix:
    store:
      enabled: true
  servicemesh:
    enabled: true
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
          - ""
          nodeLimit: 100
      iptables-manager:
        enabled: true
        mode: "external"
      edgeService:
        enabled: false
  gatekeeper:
    enabled: false
    base:
      enabled: false
      replicas: 1
    audit:
      enabled: false
      replicas: 1
    violation:
      enabled: false
      replicas: 1
  networkpolicy:
    enabled: true
  multicluster:
    clusterRole: member   # managed by the primary (host) cluster
  terminal:
    timeout: 600
  logging:
    enabled: true
    logsidecar:
      enabled: true
      replicas: 2
  metrics_server:
    enabled: false
  tracing:
    enabled: true
    provider: jaeger
    jaeger:
      replicas: 1
  notification:
    enabled: true
  alerting:
    enabled: true
EOF
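Both configuration files use the same kubePodsCIDR (10.233.64.0/18) and kubeServiceCIDR (10.233.0.0/18). That is harmless while the clusters only reach each other through their API servers. If pod or service networks are ever routed across the VPN or dedicated line, however, the ranges must not overlap. Below is a hedged, pure-bash overlap check; the function names are hypothetical. Two IPv4 blocks overlap exactly when their network addresses match under the shorter of the two prefix lengths.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  # shellcheck disable=SC2086
  set -- $1   # split on dots into four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Usage: cidr_overlap <cidr_a> <cidr_b>   -> prints "overlap" or "disjoint"
cidr_overlap() {
  local a_ip=${1%/*} a_len=${1#*/} b_ip=${2%/*} b_len=${2#*/}
  local len=$(( a_len < b_len ? a_len : b_len ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  if [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]; then
    echo overlap
  else
    echo disjoint
  fi
}
```

For example, `cidr_overlap 10.233.64.0/18 10.233.64.0/18` reports `overlap` for the two clusters' pod ranges as configured.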

# Create the secondary cluster
./kk create cluster -f secondary-cluster-config.yaml
Cluster creation started!

3.2.2 Verify the Secondary Cluster

# Check node status
kubectl get nodes
NAME STATUS ROLES AGE VERSION
secondary-node1 Ready control-plane 10m v1.26.5
secondary-node2 Ready control-plane 10m v1.26.5
secondary-node3 Ready control-plane 10m v1.26.5

# Check the KubeSphere system pods
kubectl get pods -n kubesphere-system
NAME READY STATUS RESTARTS AGE
ks-apiserver-7d6f8b9c5d-abc123 1/1 Running 0 10m
ks-console-7d6f8b9c5d-def456 1/1 Running 0 10m
ks-controller-manager-7d6f8b9c5d-ghi789 1/1 Running 0 10m
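The pod check can be scripted the same way. The hypothetical helper below reads `kubectl get pods` output from stdin. It lists every pod that is not Running or whose READY count (x/y) is incomplete.

```shell
#!/usr/bin/env bash
# Print the names of pods that are not fully ready.
# Columns: NAME READY STATUS ...; a pod is ready when STATUS is Running and READY is x/x.
# Pods from finished Jobs (STATUS Completed) are listed too.
# Usage: kubectl get pods -n kubesphere-system | pods_not_ready
pods_not_ready() {
  awk 'NR > 1 {
    split($2, r, "/")
    if ($3 != "Running" || r[1] != r[2]) print $1
  }'
}
```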

3.3 Configure Multi-Cluster Management

3.3.1 Obtain the Secondary Cluster Token

# On the secondary cluster, retrieve the connection token
kubectl -n kubesphere-system get cm ks-installer -o yaml | grep -i "cluster_connection_password" | awk '{print $2}' | base64 -d
abc123def456

# Get the secondary cluster's API server address
kubectl config view --minify | grep server
server: https://192.168.2.101:6443
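In KubeSphere 3.x, a member cluster is accepted by the host only if its `authentication.jwtSecret` matches the host's. The KubeSphere multi-cluster documentation reads the host value from the `kubesphere-config` ConfigMap and then sets it in the member's `ks-installer` ClusterConfiguration. The hypothetical helpers below sketch that step. One parses jwtSecret from YAML on stdin; the other builds a JSON merge patch.

```shell
#!/usr/bin/env bash
# Print the first jwtSecret value found in YAML on stdin, without quotes.
# Usage (on the primary cluster):
#   kubectl -n kubesphere-system get cm kubesphere-config -o yaml | extract_jwt_secret
extract_jwt_secret() {
  awk '$1 == "jwtSecret:" { v = $2; gsub(/"/, "", v); print v; exit }'
}

# Build a JSON merge patch that sets spec.authentication.jwtSecret.
# Assumes the secret contains no double quotes (KubeSphere generates alphanumeric values).
# Usage: jwt_patch <secret>
jwt_patch() {
  printf '{"spec":{"authentication":{"jwtSecret":"%s"}}}\n' "$1"
}
```

With SECRET holding the value read on the primary cluster, the secondary cluster can be patched with `kubectl -n kubesphere-system patch cc ks-installer --type merge -p "$(jwt_patch "$SECRET")"`.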

3.3.2 Register the Secondary Cluster

# On the primary cluster, register the secondary cluster
cat <<EOF | kubectl apply -f -
apiVersion: cluster.kubesphere.io/v1alpha1
kind: Cluster
metadata:
  name: secondary-cluster
  labels:
    cluster.kubesphere.io/role: member
spec:
  connection:
    kubeconfig: |-
      apiVersion: v1
      kind: Config
      clusters:
      - cluster:
          certificate-authority-data: LS0tLS1CRUdJTi…
          server: https://192.168.2.101:6443
        name: secondary-cluster
      contexts:
      - context:
          cluster: secondary-cluster
          user: kubesphere
        name: secondary-cluster
      current-context: secondary-cluster
      users:
      - name: kubesphere
        user:
          token: abc123def456
  enabled: true
  provider: ""
  region: ""
  zones:
  - ""
EOF
cluster.cluster.kubesphere.io/secondary-cluster created

# Check cluster status (on the primary cluster)
kubectl get clusters
NAME STATUS AGE
primary-cluster Ready 30m
secondary-cluster Ready 10m

3.4 Configure Data Synchronization

3.4.1 Deploy Velero

# Deploy Velero on the primary cluster.
# For cross-region restores, both clusters must use the same bucket on an object store reachable from both regions.
# minio.velero.svc.cluster.local resolves only inside the cluster that runs it, so point s3Url at the shared store.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups \
  --secret-file ./minio-credentials \
  --use-volume-snapshots=false \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc.cluster.local:9000
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the logs.

# Deploy Velero on the secondary cluster, using the same bucket and credentials as the primary
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups \
  --secret-file ./minio-credentials \
  --use-volume-snapshots=false \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc.cluster.local:9000
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the logs.
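Both `velero install` commands read `./minio-credentials`. With velero-plugin-for-aws, this file is an AWS-style shared credentials file. The hedged helper below writes it; the function name is hypothetical. For MinIO, the key values are its access key and secret key. Use the same keys on both clusters so that both can read the shared bucket.

```shell
#!/usr/bin/env bash
# Write an AWS-style credentials file for "velero install --secret-file".
# Usage: write_velero_credentials <path> <access_key_id> <secret_access_key>
write_velero_credentials() {
  local path=$1
  ( umask 077   # subshell: new file is readable only by the current user
    printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$2" "$3" > "$path" )
}
```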

3.4.2 Configure Scheduled Backups

# On the primary cluster, schedule a daily backup at 02:00
velero schedule create primary-daily-backup --schedule="0 2 * * *" --include-namespaces myapp
Schedule "primary-daily-backup" created successfully.

# List backup schedules
velero schedule get
NAME STATUS CREATED SCHEDULE BACKUP TTL LAST BACKUP SELECTOR
primary-daily-backup Enabled 2026-01-15 11:00:00 +0000 UTC 0 2 * * * 720h0m0s 1m ago <none>
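The schedule above runs once a day (`0 2 * * *`), and the output shows a backup TTL of 720h (30 days). If the primary fails just before the next run, up to 24 hours of changes can be lost. That is much looser than the real-time synchronization planned in section 2.3. A small arithmetic sketch (hypothetical function name) makes the trade-off explicit:

```shell
#!/usr/bin/env bash
# Rough budget for a fixed-interval backup schedule.
# Usage: backup_budget <interval_hours> <ttl_hours>
# Prints the worst-case RPO and the approximate number of backups kept at steady state.
backup_budget() {
  local interval=$1 ttl=$2
  echo "worst-case RPO: ${interval}h"
  echo "retained backups: $(( ttl / interval ))"
}
```

`backup_budget 24 720` prints a 24h worst-case RPO and about 30 retained backups. Shortening the interval tightens the RPO but multiplies the number of stored backups.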

3.5 Configure Failover

3.5.1 Configure DNS Failover

# Create a Service without a selector. Its traffic goes to the manually managed Endpoints object
# of the same name (an ExternalName Service ignores Endpoints, so it cannot be switched this way).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: myapp-dns
  namespace: default
spec:
  ports:
  - port: 80
EOF
service/myapp-dns created

# Point the Service at the primary cluster's endpoint
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: myapp-dns
  namespace: default
subsets:
- addresses:
  - ip: 192.168.1.101
  ports:
  - port: 80
EOF
endpoints/myapp-dns created
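Failover and failback differ only in the IP address written to the `myapp-dns` Endpoints object. A hypothetical template function keeps the two switches identical apart from that argument:

```shell
#!/usr/bin/env bash
# Print the Endpoints manifest that points myapp-dns at one IP address.
# Usage: render_endpoints <ip> [port]
render_endpoints() {
  local ip=$1 port=${2:-80}
  cat <<EOF
apiVersion: v1
kind: Endpoints
metadata:
  name: myapp-dns
  namespace: default
subsets:
- addresses:
  - ip: ${ip}
  ports:
  - port: ${port}
EOF
}
```

`render_endpoints 192.168.2.101 | kubectl apply -f -` switches to the secondary cluster, and the same command with 192.168.1.101 switches back.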

3.5.2 Configure Load Balancing

# Store the global load balancer (GLB) configuration.
# A ConfigMap does not route traffic by itself; an external GLB or a controller must read it.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: glb-config
  namespace: default
data:
  config.yaml: |-
    clusters:
    - name: primary-cluster
      endpoint: http://myapp.primary-cluster.svc.cluster.local:80
      weight: 100
    - name: secondary-cluster
      endpoint: http://myapp.secondary-cluster.svc.cluster.local:80
      weight: 0
    healthCheck:
      interval: 30s
      timeout: 5s
      unhealthyThreshold: 3
      healthyThreshold: 2
EOF
configmap/glb-config created
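The `unhealthyThreshold` and `healthyThreshold` settings add hysteresis: a single failed or passed check does not flip the state. The sketch below models that behaviour under the values in glb-config. It assumes "consecutive" checks, which is how such thresholds are commonly defined. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Health state with hysteresis: turn unhealthy after 3 consecutive failures,
# healthy again after 2 consecutive passes (values from glb-config).
UNHEALTHY_THRESHOLD=3
HEALTHY_THRESHOLD=2

# Usage: health_after <healthy|unhealthy> <result>...   (each result is ok or fail)
health_after() {
  local state=$1 fails=0 oks=0 r
  shift
  for r in "$@"; do
    if [ "$r" = ok ]; then
      oks=$(( oks + 1 )); fails=0
      if [ "$state" = unhealthy ] && [ "$oks" -ge "$HEALTHY_THRESHOLD" ]; then state=healthy; fi
    else
      fails=$(( fails + 1 )); oks=0
      if [ "$state" = healthy ] && [ "$fails" -ge "$UNHEALTHY_THRESHOLD" ]; then state=unhealthy; fi
    fi
  done
  echo "$state"
}
```

With a 30s interval, three consecutive failures mean roughly 90 seconds pass before traffic moves away from a failed cluster.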

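4. Hands-On Cases

4.1 Cross-Region Application Deployment

4.1.1 Create the Application

# On the primary cluster, create the application namespace
kubectl create namespace myapp
namespace/myapp created

# Deploy the application
kubectl create deployment nginx --image=nginx:latest -n myapp
deployment.apps/nginx created

# Create a Service
kubectl expose deployment nginx --port=80 -n myapp
service/nginx exposed

# View the application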
kubectl get all -n myapp
NAME READY STATUS RESTARTS AGE
pod/nginx-7d6f8b9c5d-abc123 1/1 Running 0 1m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx ClusterIP 10.233.12.45 <none> 80/TCP 1m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 1/1 1 1 1m

NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-7d6f8b9c5d 1 1 1 1m

4.1.2 Create a Backup

# Create a backup
velero backup create myapp-backup --include-namespaces myapp
Backup request "myapp-backup" submitted successfully.
Run `velero backup describe myapp-backup` or `velero backup logs myapp-backup` for more details.

# Wait for the backup to finish (Velero has no "backup wait" subcommand; check the Phase, or pass --wait to "velero backup create")
velero backup describe myapp-backup | grep Phase
Phase:  Completed

# List backups
velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
myapp-backup Completed 0 0 2026-01-15 11:10:00 +0000 UTC 29d default <none>
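`velero backup create` and `velero restore create` accept `--wait`. When a script needs its own timeout instead, it can poll the Phase that `velero ... describe` reports. Below is a hedged sketch; the function name is hypothetical. The status command is passed in, so the same loop serves both backups and restores.

```shell
#!/usr/bin/env bash
# Poll a describe command until it reports the wanted Phase, fails, or times out.
# Usage: wait_for_phase <phase> <max_tries> <sleep_seconds> <command...>
#   e.g. wait_for_phase Completed 60 10 velero backup describe myapp-backup
wait_for_phase() {
  local want=$1 tries=$2 pause=$3 i phase
  shift 3
  for (( i = 1; i <= tries; i++ )); do
    # describe output contains a line such as "Phase:  Completed"
    phase=$("$@" 2>/dev/null | awk '$1 == "Phase:" { print $2; exit }')
    if [ "$phase" = "$want" ]; then return 0; fi
    case "$phase" in
      Failed|PartiallyFailed) echo "phase: $phase" >&2; return 1 ;;
    esac
    sleep "$pause"
  done
  echo "timed out waiting for phase $want (last: ${phase:-unknown})" >&2
  return 1
}
```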

4.1.3 Restore on the Secondary Cluster

# On the secondary cluster, restore the application
velero restore create --from-backup myapp-backup
Restore request "myapp-backup-20260115111500" submitted successfully.
Run `velero restore describe myapp-backup-20260115111500` or `velero restore logs myapp-backup-20260115111500` for more details.

# Wait for the restore to finish (check the Phase, or pass --wait to "velero restore create")
velero restore describe myapp-backup-20260115111500 | grep Phase
Phase:  Completed

# View the application
kubectl get all -n myapp
NAME READY STATUS RESTARTS AGE
pod/nginx-7d6f8b9c5d-abc123 1/1 Running 0 30s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx ClusterIP 10.233.12.45 <none> 80/TCP 30s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 1/1 1 1 30s

NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-7d6f8b9c5d 1 1 1 30s

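4.2 Failover Drill

4.2.1 Simulate a Primary Cluster Failure

# Stop kube-apiserver on every primary control-plane node to simulate a regional failure.
# Stopping kubelet is not enough: running containers keep serving, and the other two control-plane nodes stay up.
# KubeKey builds the cluster with kubeadm, which runs kube-apiserver as a static pod, so moving its manifest away stops it.
for node in primary-node1 primary-node2 primary-node3; do
  ssh "$node" "sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/"
done

# Wait a moment
sleep 30

# Try to reach the primary cluster
kubectl get nodes
The connection to the server 192.168.1.101:6443 was refused - did you specify the right host or port?

4.2.2 Switch to the Secondary Cluster

# Switch kubectl to the secondary cluster
kubectl config use-context secondary-cluster
Switched to context "secondary-cluster".

# Verify the secondary cluster
kubectl get nodes
NAME STATUS ROLES AGE VERSION
secondary-node1 Ready control-plane 30m v1.26.5
secondary-node2 Ready control-plane 30m v1.26.5
secondary-node3 Ready control-plane 30m v1.26.5

# Verify the application
kubectl get pods -n myapp
NAME READY STATUS RESTARTS AGE
nginx-7d6f8b9c5d-abc123 1/1 Running 0 10m

# Point myapp-dns at the secondary cluster. In production this switch is usually made in external DNS
# or a global load balancer, which stays reachable when the primary region is down.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: myapp-dns
  namespace: default
subsets:
- addresses:
  - ip: 192.168.2.101
  ports:
  - port: 80
EOF
endpoints/myapp-dns configured

4.2.3 Recover the Primary Cluster

# Put the kube-apiserver manifests back so kubelet restarts the API servers
for node in primary-node1 primary-node2 primary-node3; do
  ssh "$node" "sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/"
done

# Wait for the primary cluster to recover
sleep 60

# Switch back to the primary cluster
kubectl config use-context primary-cluster
Switched to context "primary-cluster".

# Verify the primary cluster
kubectl get nodes
NAME STATUS ROLES AGE VERSION
primary-node1 Ready control-plane 30m v1.26.5
primary-node2 Ready control-plane 30m v1.26.5
primary-node3 Ready control-plane 30m v1.26.5

# Back up the current application state from the secondary cluster
# (--kubecontext targets the secondary cluster while the primary context is active)
velero backup create secondary-to-primary --include-namespaces myapp --kubecontext secondary-cluster
Backup request "secondary-to-primary" submitted successfully.

# Once that backup reports Phase: Completed, restore it on the primary cluster.
# Velero skips resources that already exist unless --existing-resource-policy=update is set (Velero 1.9+).
velero restore create --from-backup secondary-to-primary
Restore request "secondary-to-primary-20260115113000" submitted successfully.

# Point myapp-dns back at the primary cluster
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: myapp-dns
  namespace: default
subsets:
- addresses:
  - ip: 192.168.1.101
  ports:
  - port: 80
EOF
endpoints/myapp-dns configured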

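5. Lessons Learned

5.1 Best Practices

5.1.1 Multi-Cluster Deployment Best Practices

  • Unified management: manage all clusters through KubeSphere
  • Network isolation: keep the networks of different regions isolated
  • Security hardening: apply security policies to every cluster
  • Monitoring and alerting: configure cross-cluster monitoring and alerts
  • Regular drills: rehearse failover on a regular schedule

5.1.2 Data Synchronization Best Practices

  • Real-time sync: replicate critical data in real time
  • Incremental sync: use incremental transfers to save bandwidth
  • Data verification: check data consistency regularly
  • Conflict resolution: define an explicit conflict-resolution policy
  • Backup verification: regularly verify that backups are complete

5.2 Common Issues

5.2.1 Cluster Connection Issues

  • Issue 1: a cluster cannot be reached
  • Solution: check network connectivity and firewall rules
  • Issue 2: the token has expired
  • Solution: generate a new token and update the configuration
  • Issue 3: the API server is unreachable
  • Solution: check the API server status and the network configuration

5.2.2 Data Synchronization Issues

  • Issue 1: synchronization fails
  • Solution: check network connectivity and the storage configuration
  • Issue 2: synchronization lag is too high
  • Solution: tune the synchronization strategy and add bandwidth
  • Issue 3: data is inconsistent
  • Solution: review the synchronization configuration and the conflict-resolution policy

5.3 Performance Optimization

5.3.1 Network Optimization

  • Dedicated lines: use dedicated lines to improve network performance
  • Bandwidth: provision enough bandwidth for the expected load
  • Compressed transfer: compress data to reduce transfer volume
  • Caching: use caches to avoid transferring the same data repeatedly

5.3.2 Synchronization Optimization

  • Incremental sync: reduce the amount of data transferred
  • Parallel sync: run synchronization jobs in parallel
  • Priority sync: synchronize critical data first
  • Batch sync: group changes to reduce the number of synchronization runs

5.4 Security Recommendations

5.4.1 Network Security

  • Encrypted transport: encrypt data in transit
  • Network isolation: isolate the networks of different regions
  • Access control: enforce strict access-control policies
  • Firewall rules: restrict access with firewall rules

5.4.2 Data Security

  • Data encryption: encrypt sensitive data
  • Key management: manage keys with a key management system
  • Access auditing: enable auditing to record operations
  • Regular backups: back up data on a regular schedule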

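This article was compiled and published by Fengge Tutorials (风哥教程) for learning and testing only. When reposting, please cite the source: http://www.fgedu.net.cn/10327.html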
