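Kubernetes Tutorial FG031: Cluster Master-Slave Architecture, Principles and Hands-On Walkthrough
In this document, Fengge (风哥) covers the principles and practice of the Kubernetes cluster master-slave architecture: an overview of master-slave architecture, the Kubernetes master-slave architecture, high-availability architecture, master-slave architecture planning, high-availability planning, best-practice planning, the master-slave implementation plan, the high-availability implementation plan, the backup and restore implementation plan, and cases for each of these three areas. The tutorial draws on the official Kubernetes documentation and on high-availability architecture documentation. It is intended for DevOps engineers and system administrators to use for study and testing; validate everything yourself before applying it in production.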
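Part01-Basic Concepts and Theory
1.1 Master-Slave Architecture Overview
Master-slave architecture is a common distributed system design made up of one master node (Master) and multiple slave nodes (Slave). The master node manages and coordinates the whole system, and the slave nodes carry out the actual tasks. Its main characteristics are:
- Centralized management: the master node manages the whole system in one place, which reduces management complexity
- Task distribution: the master node assigns tasks to slave nodes, which improves system efficiency
- Data synchronization: data is synchronized between the master node and the slave nodes to keep it consistent
- Fault tolerance: when the master node fails, a slave node can take over its work and keep the system available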
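1.2 Kubernetes Master-Slave Architecture
Kubernetes uses a master-slave architecture made up of the control plane (Control Plane) and worker nodes (Worker Nodes). The control plane acts as the master and manages the whole cluster; the worker nodes act as slaves and run containerized applications.
The Kubernetes control plane components are:
- kube-apiserver: the API server, which handles all API requests
- etcd: a distributed key-value store that holds cluster configuration and state
- kube-scheduler: the scheduler, which assigns Pods to nodes
- kube-controller-manager: the controller manager, which runs the various controllers
- cloud-controller-manager: the cloud controller manager, which interacts with the cloud provider
The Kubernetes worker node components are:
- kubelet: the node agent, which manages the container lifecycle
- kube-proxy: the network proxy, which maintains network rules and load balancing
- Container runtime: for example containerd or Docker, which runs the containers (since Kubernetes v1.24 removed dockershim, Docker Engine needs the cri-dockerd adapter)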
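1.3 High-Availability Architecture
A Kubernetes high-availability architecture deploys multiple control plane nodes so that when one of them fails, the others take over its work and the cluster stays available. Its main characteristics are:
- Multiple control plane nodes: deploy several control plane nodes to avoid a single point of failure
- Load balancing: a load balancer distributes API requests across the control plane nodes
- etcd cluster: deploy an etcd cluster to keep data consistent and available
- Automatic failover: when a control plane node fails, traffic switches to the other nodes automatically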
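Why an etcd cluster is usually sized at 3 or 5 members follows from etcd's majority quorum: writes need floor(N/2)+1 members, so an even member count adds no extra failure tolerance. The sketch below only does that arithmetic; the function names `quorum` and `fault_tolerance` are illustrative and not part of any Kubernetes or etcd tool.

```shell
#!/usr/bin/env bash
# Quorum arithmetic for an etcd cluster with N members.
#   quorum             = floor(N / 2) + 1
#   tolerated failures = N - quorum

# quorum N: number of members that must be healthy for writes to succeed
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: number of members that may fail without losing quorum
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Going from 3 to 4 members raises the quorum from 2 to 3 while the tolerated failures stay at 1, which is why odd sizes are the usual choice.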
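Part02-Production Planning and Recommendations
2.1 Master-Slave Architecture Planning
Planning a Kubernetes master-slave architecture for production:
- Node planning:
  - Control plane nodes: at least 1; 3 or 5 recommended for production
  - Worker nodes: sized according to application needs
  - Node configuration: choose CPU, memory, and storage based on load and application needs
- Network planning:
  - Control plane network: ensure connectivity between control plane nodes
  - Worker node network: ensure connectivity between worker nodes
  - Service network: allocate a CIDR for cluster Services
  - Pod network: allocate a CIDR for Pods
- Storage planning:
  - etcd storage: use high-performance storage such as SSDs
  - Application storage: choose the storage type based on application needs
  - Backup storage: holds backup data
- Security planning:
  - Network policies: restrict network traffic between Pods
  - RBAC: configure role-based access control
  - Certificate management: encrypt communication with TLS
  - Key management: store keys and certificates securely
- Monitoring planning:
  - Control plane monitoring: monitor the state of the control plane components
  - Worker node monitoring: monitor the state of the worker nodes
  - Application monitoring: monitor the running state of applications
  - Network monitoring: monitor network performance and connectivity
- Backup planning:
  - etcd backup: back up etcd data regularly
  - Application data backup: back up application data regularly
  - Configuration backup: back up the cluster configuration
- Scalability planning:
  - Control plane scaling: add control plane nodes as load grows
  - Worker node scaling: add worker nodes as application needs grow
  - Storage scaling: expand storage as data grows
- Upgrade planning:
  - Control plane upgrades: roll upgrades through the control plane components
  - Worker node upgrades: roll upgrades through the worker nodes
  - Application upgrades: use rolling upgrades for applications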
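The network plan above (an API endpoint, a Service CIDR, and a Pod CIDR) can be written down as a kubeadm configuration file instead of command-line flags. This is a minimal sketch: the endpoint, version, and Pod CIDR reuse values that appear in this tutorial, while the Service CIDR `10.96.0.0/12` (the kubeadm default) and the file name `kubeadm-config.yaml` are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Write a kubeadm ClusterConfiguration that records the network plan.
# The file name and the serviceSubnet value are illustrative assumptions.
cat > kubeadm-config.yaml << 'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.24.0
# Address of the load balancer in front of the API servers
controlPlaneEndpoint: "192.168.1.200:6443"
networking:
  # CIDR for Pods; must match the network plugin's configuration
  podSubnet: "10.244.0.0/16"
  # CIDR for Services; must not overlap the Pod or host networks
  serviceSubnet: "10.96.0.0/12"
EOF

# On the first control plane node the file would be used like this (not run here):
#   sudo kubeadm init --config kubeadm-config.yaml --upload-certs
```

Keeping the plan in a file makes it easy to review and to store alongside the other cluster configuration backups.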
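2.2 High-Availability Planning
Planning Kubernetes high availability for production:
- Control plane high availability:
  - Deploy 3 or 5 control plane nodes
  - Configure a load balancer to distribute API requests
  - Use an etcd cluster to keep data consistent
  - Configure health checks to detect failed nodes promptly
- Worker node high availability:
  - Deploy multiple worker nodes to avoid a single point of failure
  - Use Pod anti-affinity to spread an application across nodes
  - Configure liveness and readiness probes to keep applications healthy
  - Use autoscaling to adjust the number of Pods to the load
- Network high availability:
  - Use a highly available network plugin such as Calico or Flannel
  - Configure network policies to improve network security
  - Monitor network performance to catch network problems early
  - Apply network isolation to improve network reliability
- Storage high availability:
  - Use a highly available storage solution such as Ceph or GlusterFS
  - Configure PersistentVolumeClaims so that data persists
  - Implement a storage backup and restore strategy
  - Monitor storage performance to catch storage problems early
- Application high availability:
  - Run multiple Pod replicas to keep the application available
  - Use a Deployment or StatefulSet to manage the application lifecycle
  - Configure service discovery and load balancing
  - Implement health checks to detect application failures promptly
- Disaster recovery planning:
  - Build a disaster recovery cluster in another region or cloud
  - Synchronize data so the disaster recovery cluster stays consistent
  - Run disaster recovery drills regularly to confirm the plan works
  - Establish a disaster recovery procedure for emergencies
- Monitoring and alerting:
  - Deploy Prometheus and Grafana to monitor cluster state
  - Configure alert rules that report anomalies promptly
  - Build monitoring dashboards that show system state at a glance
  - Analyze monitoring data regularly to tune system performance
- Operations automation:
  - Implement CI/CD pipelines to automate deployments and upgrades
  - Configure autoscaling to adjust resources to the load
  - Automate the backup and restore process
  - Automate testing and validation to ensure system reliability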
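The worker node items above (Pod anti-affinity plus liveness and readiness probes) could look roughly like the following Deployment. It is a sketch under assumptions: the file name `fgedu-app-ha.yaml`, the probe timings, and the choice of the softer `preferredDuringSchedulingIgnoredDuringExecution` rule are illustrative, not settings prescribed by this tutorial.

```shell
#!/usr/bin/env bash
# Write a Deployment manifest that spreads replicas across nodes and
# defines health probes. File name and probe timings are illustrative.
cat > fgedu-app-ha.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fgedu-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fgedu-app
  template:
    metadata:
      labels:
        app: fgedu-app
    spec:
      affinity:
        podAntiAffinity:
          # Prefer (but do not require) placing replicas on different nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: fgedu-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: fgedu-app
        image: nginx:latest
        ports:
        - containerPort: 80
        # Traffic is only sent to the Pod once this probe succeeds
        readinessProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
        # The container is restarted if this probe keeps failing
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
EOF

# Apply with (not run here):
#   kubectl apply -f fgedu-app-ha.yaml
```

The preferred form still schedules all replicas when there are fewer nodes than replicas; the `required...` form would leave the extra replicas Pending instead.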
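2.3 Best-Practice Planning
Best-practice planning for a Kubernetes cluster master-slave architecture in production:
- Architecture design:
  - Use multiple control plane nodes for high availability
  - Use a load balancer to distribute API requests
  - Deploy an etcd cluster to keep data consistent
  - Design a sound network architecture for network reliability
- Resource planning:
  - Give control plane nodes enough CPU and memory
  - Give worker nodes enough resources to meet application needs
  - Set resource requests and limits to avoid resource contention
  - Apply resource quotas to cap resource usage per namespace
- Security management:
  - Configure RBAC to restrict access
  - Use network policies to restrict traffic between Pods
  - Use TLS encryption to protect communication
  - Run regular security audits to find and fix vulnerabilities
- Monitoring and alerting:
  - Deploy a comprehensive monitoring system for cluster state
  - Configure sensible alert rules to catch anomalies promptly
  - Build monitoring dashboards that show system state at a glance
  - Analyze monitoring data regularly to tune system performance
- Backup and restore:
  - Back up etcd data regularly to keep it safe
  - Back up application data to ensure business continuity
  - Test the restore process to confirm the backups are usable
  - Maintain a disaster recovery plan for emergencies
- Upgrades and migration:
  - Write a detailed upgrade plan so upgrades are safe
  - Test the upgrade process to avoid failed upgrades
  - Have a rollback mechanism for upgrade problems
  - Plan the application migration strategy so migrations go smoothly
- Documentation and processes:
  - Write detailed architecture documentation to guide system design
  - Maintain an operations manual for day-to-day work
  - Record common problems and their solutions to speed up troubleshooting
  - Update documentation regularly to keep it current
- Team collaboration:
  - Set up cross-team collaboration to maintain the system together
  - Define clear responsibilities to keep the system stable
  - Hold regular technical meetings to discuss system improvements
  - Share experience and knowledge to raise the team's skill level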
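For the resource planning item on namespace quotas, a ResourceQuota object is the usual mechanism. The sketch below writes one for the `default` namespace; the quota name, file name, and every number in it are placeholder assumptions to be replaced with values that fit the actual workload.

```shell
#!/usr/bin/env bash
# Write a ResourceQuota manifest that caps total requests, limits, and
# Pod count in one namespace. All numbers are illustrative placeholders.
cat > fgedu-quota.yaml << 'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: fgedu-quota
  namespace: default
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
EOF

# Apply and inspect with (not run here):
#   kubectl apply -f fgedu-quota.yaml
#   kubectl describe resourcequota fgedu-quota -n default
```

Once a quota on CPU or memory is active in a namespace, Pods there must declare the corresponding requests and limits, otherwise the API server rejects them.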
Part03-Production Implementation Plans
3.1 Master-Slave Architecture Implementation Plan
Implementation plan for a Kubernetes master-slave architecture in production:
- Single control plane node deployment:
1. Prepare the control plane node:
$ sudo hostnamectl set-hostname fgedu-master
$ echo "192.168.1.100 fgedu-master" | sudo tee -a /etc/hosts
2. Install Docker:
$ sudo yum install -y yum-utils
$ sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install -y docker-ce docker-ce-cli containerd.io
$ sudo systemctl start docker
$ sudo systemctl enable docker
3. Install kubeadm, kubelet, and kubectl:
$ cat <
8. Install Docker and the Kubernetes components:
# Same installation steps as on the control plane node
9. Join the worker node:
$ sudo kubeadm join 192.168.1.100:6443 --token
10. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
- Multiple control plane node deployment:
1. Prepare the control plane nodes:
$ sudo hostnamectl set-hostname fgedu-master1
$ echo "192.168.1.100 fgedu-master1" | sudo tee -a /etc/hosts
$ echo "192.168.1.101 fgedu-master2" | sudo tee -a /etc/hosts
$ echo "192.168.1.102 fgedu-master3" | sudo tee -a /etc/hosts
2. Install Docker and the Kubernetes components:
# Same installation steps as for the single control plane node
3. Initialize the first control plane node:
$ sudo kubeadm init --control-plane-endpoint="192.168.1.200:6443" --upload-certs --pod-network-cidr=10.244.0.0/16
4. Configure kubectl:
# Same configuration steps as for the single control plane node
5. Install the network plugin:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
6. Join the other control plane nodes:
$ sudo kubeadm join 192.168.1.200:6443 --token
7. Join the worker nodes:
$ sudo kubeadm join 192.168.1.200:6443 --token
8. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
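A `kubeadm join` command generally carries a bootstrap token and a CA certificate hash, and a control plane join additionally carries `--control-plane` and a certificate key. The helper below only assembles that command line as text so the two variants can be compared; `build_join_cmd` is a hypothetical function, and `TOKEN`, `HASH`, and `CERT_KEY` are placeholders for the values that `kubeadm init --upload-certs` prints on the first control plane node.

```shell
#!/usr/bin/env bash
# build_join_cmd ENDPOINT TOKEN CA_CERT_HASH [CERTIFICATE_KEY]
# Prints a kubeadm join command. With a certificate key the command joins
# the node as an additional control plane node, otherwise as a worker.
build_join_cmd() {
  local endpoint=$1 token=$2 ca_hash=$3 cert_key=${4:-}
  local cmd="kubeadm join ${endpoint} --token ${token} --discovery-token-ca-cert-hash sha256:${ca_hash}"
  if [ -n "$cert_key" ]; then
    cmd="${cmd} --control-plane --certificate-key ${cert_key}"
  fi
  echo "$cmd"
}

# Placeholder values only; real values come from the kubeadm init output.
build_join_cmd 192.168.1.200:6443 TOKEN HASH            # worker node
build_join_cmd 192.168.1.200:6443 TOKEN HASH CERT_KEY   # control plane node
```

If the original token has expired, `kubeadm token create --print-join-command` on a control plane node prints a fresh worker join command.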
3.2 High-Availability Implementation Plan
Implementation plan for Kubernetes high availability in production:
- Load balancer configuration:
1. Deploy HAProxy:
$ sudo yum install -y haproxy
$ sudo tee /etc/haproxy/haproxy.cfg > /dev/null << 'EOF'
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend kubernetes-frontend
    bind 192.168.1.200:6443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    server fgedu-master1 192.168.1.100:6443 check fall 3 rise 2
    server fgedu-master2 192.168.1.101:6443 check fall 3 rise 2
    server fgedu-master3 192.168.1.102:6443 check fall 3 rise 2
EOF
$ sudo systemctl start haproxy
$ sudo systemctl enable haproxy
2. Deploy Keepalived:
$ sudo yum install -y keepalived
$ sudo tee /etc/keepalived/keepalived.conf > /dev/null << 'EOF'
global_defs {
    notification_email {
        admin@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.200
    }
    track_script {
        check_haproxy
    }
}
EOF
$ sudo systemctl start keepalived
$ sudo systemctl enable keepalived
- etcd cluster configuration:
1. Check etcd status:
$ kubectl get pods -n kube-system | grep etcd
etcd-fgedu-master1 1/1 Running 0 1d
etcd-fgedu-master2 1/1 Running 0 1d
etcd-fgedu-master3 1/1 Running 0 1d
2. Back up etcd data:
$ kubectl -n kube-system exec etcd-fgedu-master1 -- etcdctl snapshot save /tmp/etcd-snapshot.db --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key
3. Restore etcd data:
$ kubectl -n kube-system exec etcd-fgedu-master1 -- etcdctl snapshot restore /tmp/etcd-snapshot.db --data-dir=/var/lib/etcd-new
- High-availability test:
1. Simulate a control plane node failure:
$ ssh fgedu-master1 sudo systemctl stop kube-apiserver
2. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 NotReady control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
3. Verify API access:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
fgedu-app-6d6f58987b-7f5f8 1/1 Running 0 1d
4. Restore the control plane node:
$ ssh fgedu-master1 sudo systemctl start kube-apiserver
5. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
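During a failover test it helps to probe each API server directly as well as through the virtual IP, so that it is clear which backend is down and whether the load balancer still answers. The following is a minimal sketch, assuming the API servers listen on port 6443 and that anonymous access to `/healthz` is left at its default; `check_apiserver` and `check_all` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# check_apiserver HOST
# Prints "HOST ok" when https://HOST:6443/healthz answers "ok",
# otherwise "HOST FAILED".
check_apiserver() {
  local host=$1
  # -k skips certificate verification: acceptable for a quick lab check only.
  # --max-time keeps a dead node from blocking the loop.
  if [ "$(curl -ks --max-time 3 "https://${host}:6443/healthz")" = "ok" ]; then
    echo "${host} ok"
  else
    echo "${host} FAILED"
  fi
}

# check_all HOST...
# Runs check_apiserver for every host given on the command line.
check_all() {
  local host
  for host in "$@"; do
    check_apiserver "$host"
  done
}
```

With the addresses used in this tutorial, the call would be `check_all 192.168.1.100 192.168.1.101 192.168.1.102 192.168.1.200`; during the test the stopped node should report FAILED while the virtual IP still reports ok.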
3.3 Backup and Restore Implementation Plan
Implementation plan for Kubernetes backup and restore in production:
- etcd backup:
1. Back up etcd data regularly:
$ cat > etcd-backup.sh << 'EOF'
#!/bin/bash
# etcd-backup.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
TIMESTAMP=$(date +%Y%m%d%H%M%S)
BACKUP_DIR="/backup/etcd"
mkdir -p "$BACKUP_DIR"
# Run etcdctl on the host so the snapshot is written to the host's $BACKUP_DIR
# (assumes etcdctl is installed on the control plane node, as the restore step does)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save "$BACKUP_DIR/etcd-snapshot-$TIMESTAMP.db"
# Keep the last 7 days of backups
find "$BACKUP_DIR" -name "etcd-snapshot-*.db" -mtime +7 -delete
EOF
$ chmod +x etcd-backup.sh
$ sudo mv etcd-backup.sh /usr/local/bin/
2. Add it to root's crontab:
$ sudo crontab -e
0 0 * * * /usr/local/bin/etcd-backup.sh
- etcd restore:
1. Stop kube-apiserver:
$ sudo systemctl stop kube-apiserver
2. Restore etcd data:
$ sudo ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd/etcd-snapshot-20240101000000.db --name=fgedu-master1 --data-dir=/var/lib/etcd-new --initial-cluster=fgedu-master1=https://192.168.1.100:2380,fgedu-master2=https://192.168.1.101:2380,fgedu-master3=https://192.168.1.102:2380 --initial-cluster-token=etcd-cluster-1 --initial-advertise-peer-urls=https://192.168.1.100:2380
3. Edit the etcd configuration:
$ sudo vi /etc/kubernetes/manifests/etcd.yaml
# Change the --data-dir argument to /var/lib/etcd-new
4. Restart etcd:
$ sudo systemctl restart etcd
5. Start kube-apiserver:
$ sudo systemctl start kube-apiserver
- Application data backup:
1. Back up with Velero:
$ kubectl apply -f https://github.com/vmware-tanzu/velero/releases/download/v1.9.0/velero-install.yaml
2. Create a backup:
$ velero backup create fgedu-backup --include-namespaces default
3. Restore the backup:
$ velero restore create --from-backup fgedu-backup
- Cluster configuration backup:
1. Back up the cluster configuration:
$ cat > cluster-backup.sh << 'EOF'
#!/bin/bash
# cluster-backup.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
TIMESTAMP=$(date +%Y%m%d%H%M%S)
BACKUP_DIR="/backup/cluster"
mkdir -p "$BACKUP_DIR"
# Back up kubeconfig
cp "$HOME/.kube/config" "$BACKUP_DIR/kubeconfig-$TIMESTAMP"
# Back up cluster information
kubectl get nodes -o yaml > "$BACKUP_DIR/nodes-$TIMESTAMP.yaml"
kubectl get namespaces -o yaml > "$BACKUP_DIR/namespaces-$TIMESTAMP.yaml"
kubectl get deployments --all-namespaces -o yaml > "$BACKUP_DIR/deployments-$TIMESTAMP.yaml"
kubectl get services --all-namespaces -o yaml > "$BACKUP_DIR/services-$TIMESTAMP.yaml"
kubectl get configmaps --all-namespaces -o yaml > "$BACKUP_DIR/configmaps-$TIMESTAMP.yaml"
kubectl get secrets --all-namespaces -o yaml > "$BACKUP_DIR/secrets-$TIMESTAMP.yaml"
# Keep the last 7 days of backups
find "$BACKUP_DIR" -type f \( -name "*-*.yaml" -o -name "kubeconfig-*" \) -mtime +7 -delete
EOF
$ chmod +x cluster-backup.sh
$ sudo mv cluster-backup.sh /usr/local/bin/
2. Add it to crontab:
$ crontab -e
0 1 * * * /usr/local/bin/cluster-backup.sh
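A backup is only useful if the file that gets restored is the file that was written. One low-cost safeguard, sketched here as an assumption rather than as part of the scripts above, is to store a SHA-256 checksum next to each backup and verify it before any restore. `record_checksum` and `verify_checksum` are hypothetical helper names; `sha256sum` is the GNU coreutils tool.

```shell
#!/usr/bin/env bash
# record_checksum FILE
# Writes FILE.sha256 next to FILE. The checksum is taken from inside the
# directory so the .sha256 file holds a relative name and stays valid
# if the whole backup directory is moved.
record_checksum() {
  local file=$1
  ( cd "$(dirname "$file")" &&
    sha256sum "$(basename "$file")" > "$(basename "$file").sha256" )
}

# verify_checksum FILE
# Succeeds only if FILE still matches the checksum stored in FILE.sha256.
verify_checksum() {
  local file=$1
  ( cd "$(dirname "$file")" &&
    sha256sum --check --quiet "$(basename "$file").sha256" )
}
```

In the backup scripts this would mean calling `record_checksum` right after a snapshot is written and `verify_checksum` before `etcdctl snapshot restore`. For etcd snapshots, `etcdctl snapshot status <file>` gives an additional consistency check of the snapshot itself.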
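Part04-Production Cases and Hands-On Walkthroughs
4.1 Master-Slave Architecture Cases
Cases of the Kubernetes master-slave architecture in production.
# Scenario: deploy a Kubernetes cluster with a single control plane node for test and development environments
# Problem:
- A Kubernetes cluster needs to be deployed
- Resources are limited, so only a single control plane node can be deployed
- The cluster is for test and development, so high availability is not a priority
# Solution:
1. Prepare the control plane node:
$ sudo hostnamectl set-hostname fgedu-master
$ echo "192.168.1.100 fgedu-master" | sudo tee -a /etc/hosts
2. Install Docker:
$ sudo yum install -y yum-utils
$ sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install -y docker-ce docker-ce-cli containerd.io
$ sudo systemctl start docker
$ sudo systemctl enable docker
3. Install kubeadm, kubelet, and kubectl:
$ cat <
8. Install Docker and the Kubernetes components:
# Same installation steps as on the control plane node
9. Join the worker node:
$ sudo kubeadm join 192.168.1.100:6443 --token
10. Verify the cluster status:
$ kubectl get nodes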
NAME STATUS ROLES AGE VERSION
fgedu-master Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
11. Deploy the application:
$ cat > fgedu-app-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fgedu-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fgedu-app
  template:
    metadata:
      labels:
        app: fgedu-app
    spec:
      containers:
      - name: fgedu-app
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: fgedu-app
  namespace: default
spec:
  selector:
    app: fgedu-app
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
EOF
$ kubectl apply -f fgedu-app-deployment.yaml
12. Verify the application status:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
fgedu-app-6d6f58987b-7f5f8 1/1 Running 0 5m
fgedu-app-6d6f58987b-8d2k3 1/1 Running 0 5m
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fgedu-app LoadBalancer 10.96.123.45 192.168.1.200 80:30080/TCP 5m
kubernetes ClusterIP 10.96.0.1
# Case: deploying a cluster with multiple control plane nodes
# Scenario: deploy a Kubernetes cluster with multiple control plane nodes for production
# Problem:
- A highly available Kubernetes cluster needs to be deployed
- Production demands high availability, so single points of failure must be avoided
- The cluster must be stable and reliable
# Solution:
1. Prepare the control plane nodes:
$ sudo hostnamectl set-hostname fgedu-master1
$ echo "192.168.1.100 fgedu-master1" | sudo tee -a /etc/hosts
$ echo "192.168.1.101 fgedu-master2" | sudo tee -a /etc/hosts
$ echo "192.168.1.102 fgedu-master3" | sudo tee -a /etc/hosts
2. Install Docker and the Kubernetes components:
# Same installation steps as for the single control plane node
3. Deploy the load balancer:
$ sudo yum install -y haproxy keepalived
$ sudo tee /etc/haproxy/haproxy.cfg > /dev/null << 'EOF'
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend kubernetes-frontend
    bind 192.168.1.200:6443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    server fgedu-master1 192.168.1.100:6443 check fall 3 rise 2
    server fgedu-master2 192.168.1.101:6443 check fall 3 rise 2
    server fgedu-master3 192.168.1.102:6443 check fall 3 rise 2
EOF
$ sudo tee /etc/keepalived/keepalived.conf > /dev/null << 'EOF'
global_defs {
    notification_email {
        admin@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.200
    }
    track_script {
        check_haproxy
    }
}
EOF
$ sudo systemctl start haproxy keepalived
$ sudo systemctl enable haproxy keepalived
4. Initialize the first control plane node:
$ sudo kubeadm init --control-plane-endpoint="192.168.1.200:6443" --upload-certs --pod-network-cidr=10.244.0.0/16
5. Configure kubectl:
# Same configuration steps as for the single control plane node
6. Install the network plugin:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
7. Join the other control plane nodes:
$ sudo kubeadm join 192.168.1.200:6443 --token
8. Join the worker nodes:
$ sudo kubeadm join 192.168.1.200:6443 --token
9. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
10. Deploy the application:
$ cat > fgedu-app-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fgedu-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fgedu-app
  template:
    metadata:
      labels:
        app: fgedu-app
    spec:
      containers:
      - name: fgedu-app
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: fgedu-app
  namespace: default
spec:
  selector:
    app: fgedu-app
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
EOF
$ kubectl apply -f fgedu-app-deployment.yaml
11. Verify the application status:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
fgedu-app-6d6f58987b-7f5f8 1/1 Running 0 5m
fgedu-app-6d6f58987b-8d2k3 1/1 Running 0 5m
fgedu-app-6d6f58987b-9e3l4 1/1 Running 0 5m
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fgedu-app LoadBalancer 10.96.123.45 192.168.1.201 80:30080/TCP 5m
kubernetes ClusterIP 10.96.0.1
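4.2 High-Availability Cases
Cases of Kubernetes high availability in production.
# Scenario: test the high availability of the Kubernetes control plane by simulating a control plane node failure
# Problem:
- The high availability of the control plane needs to be verified
- A control plane node failure needs to be simulated
- The cluster must keep working while a control plane node is down
# Solution:
1. Deploy a cluster with multiple control plane nodes:
# Follow the multiple control plane node deployment steps
2. Verify the cluster status: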
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
3. Deploy the test application:
$ cat > test-app-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: test-app
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: test-app
  namespace: default
spec:
  selector:
    app: test-app
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
EOF
$ kubectl apply -f test-app-deployment.yaml
4. Verify the test application status:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-app-6d6f58987b-7f5f8 1/1 Running 0 5m
test-app-6d6f58987b-8d2k3 1/1 Running 0 5m
test-app-6d6f58987b-9e3l4 1/1 Running 0 5m
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
test-app LoadBalancer 10.96.123.46 192.168.1.202 80:30081/TCP 5m
kubernetes ClusterIP 10.96.0.1
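5. Simulate a control plane node failure:
# This assumes kube-apiserver runs as a systemd service; on kubeadm-built clusters it runs as a static Pod managed by the kubelet
$ ssh fgedu-master1 sudo systemctl stop kube-apiserver
6. Verify the cluster status: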
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 NotReady control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
7. Verify API access:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-app-6d6f58987b-7f5f8 1/1 Running 0 10m
test-app-6d6f58987b-8d2k3 1/1 Running 0 10m
test-app-6d6f58987b-9e3l4 1/1 Running 0 10m
8. Verify application access:
$ curl http://192.168.1.202:80
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.
Thank you for using nginx.
9. Restore the control plane node:
$ ssh fgedu-master1 sudo systemctl start kube-apiserver
10. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
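# Case: worker node high-availability test
# Scenario: test the high availability of Kubernetes worker nodes by simulating a worker node failure
# Problem:
- The high availability of the worker nodes needs to be verified
- A worker node failure needs to be simulated
- The application must keep working while a worker node is down
# Solution:
1. Deploy a cluster with multiple control plane nodes:
# Follow the multiple control plane node deployment steps
2. Verify the cluster status: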
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
3. Deploy the test application:
$ cat > test-app-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: test-app
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: test-app
  namespace: default
spec:
  selector:
    app: test-app
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
EOF
$ kubectl apply -f test-app-deployment.yaml
4. Verify the test application status:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-app-6d6f58987b-7f5f8 1/1 Running 0 5m 10.244.1.2 fgedu-node1
test-app-6d6f58987b-8d2k3 1/1 Running 0 5m 10.244.2.2 fgedu-node2
test-app-6d6f58987b-9e3l4 1/1 Running 0 5m 10.244.1.3 fgedu-node1
5. Simulate a worker node failure:
$ ssh fgedu-node1 sudo poweroff
6. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 NotReady
fgedu-node2 Ready
7. Verify the application status:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-app-6d6f58987b-7f5f8 1/1 Terminating 0 10m
test-app-6d6f58987b-8d2k3 1/1 Running 0 10m
test-app-6d6f58987b-9e3l4 1/1 Terminating 0 10m
test-app-6d6f58987b-abcde 1/1 Running 0 1m
test-app-6d6f58987b-fghij 1/1 Running 0 1m
8. Verify application access:
$ curl http://192.168.1.202:80
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.
Thank you for using nginx.
9. Restore the worker node:
# Power the worker node server back on
10. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
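Before powering off a worker node it is worth checking how the replicas are spread, since a Deployment whose Pods all sit on one node will see a full outage during the test. The sketch below counts running Pods per node from `kubectl get pods -o wide` output. `pods_per_node` is a hypothetical helper, and it assumes the plain column layout shown in this tutorial, where NODE is the seventh column and RESTARTS is a single number.

```shell
#!/usr/bin/env bash
# pods_per_node
# Reads `kubectl get pods -o wide` output on stdin and prints
# "<node> <number of Running pods>" per node, sorted by node name.
pods_per_node() {
  # NR > 1 skips the header line; column 3 is STATUS, column 7 is NODE.
  awk 'NR > 1 && $3 == "Running" { count[$7]++ }
       END { for (node in count) print node, count[node] }' | sort
}
```

Against a live cluster the usage would be `kubectl get pods -o wide -l app=test-app | pods_per_node`. For the three test-app Pods listed in step 4 of this case, it reports 2 Pods on fgedu-node1 and 1 on fgedu-node2.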
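4.3 Backup and Restore Cases
Cases of Kubernetes backup and restore in production.
# Scenario: back up and restore etcd data to keep cluster data safe
# Problem:
- etcd data needs to be backed up regularly
- etcd data needs to be restored after a cluster failure
- The data must stay consistent and reliable
# Solution:
1. Create the etcd backup script: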
$ cat > etcd-backup.sh << 'EOF'
#!/bin/bash
# etcd-backup.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
TIMESTAMP=$(date +%Y%m%d%H%M%S)
BACKUP_DIR="/backup/etcd"
mkdir -p "$BACKUP_DIR"
# Run etcdctl on the host so the snapshot is written to the host's $BACKUP_DIR
# (assumes etcdctl is installed on the control plane node, as the restore step does)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save "$BACKUP_DIR/etcd-snapshot-$TIMESTAMP.db"
# Keep the last 7 days of backups
find "$BACKUP_DIR" -name "etcd-snapshot-*.db" -mtime +7 -delete
EOF
$ chmod +x etcd-backup.sh
$ sudo mv etcd-backup.sh /usr/local/bin/
2. Add it to root's crontab:
$ sudo crontab -e
0 0 * * * /usr/local/bin/etcd-backup.sh
3. Run the backup:
$ sudo /usr/local/bin/etcd-backup.sh
$ ls -la /backup/etcd/
total 102400
drwxr-xr-x 2 root root 4096 Jan 1 00:00 .
drwxr-xr-x 3 root root 4096 Jan 1 00:00 ..
-rw-r--r-- 1 root root 104857600 Jan 1 00:00 etcd-snapshot-20240101000000.db
4. Simulate an etcd failure:
$ ssh fgedu-master1 sudo rm -rf /var/lib/etcd/*
$ ssh fgedu-master1 sudo systemctl restart etcd
5. Verify the cluster status:
$ kubectl get nodes
The connection to the server 192.168.1.200:6443 was refused - did you specify the right host or port?
6. Restore etcd data:
$ ssh fgedu-master1 sudo ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd/etcd-snapshot-20240101000000.db --name=fgedu-master1 --data-dir=/var/lib/etcd-new --initial-cluster=fgedu-master1=https://192.168.1.100:2380,fgedu-master2=https://192.168.1.101:2380,fgedu-master3=https://192.168.1.102:2380 --initial-cluster-token=etcd-cluster-1 --initial-advertise-peer-urls=https://192.168.1.100:2380
7. Edit the etcd configuration:
$ ssh -t fgedu-master1 sudo vi /etc/kubernetes/manifests/etcd.yaml
# Change the --data-dir argument to /var/lib/etcd-new
8. Restart etcd:
$ ssh fgedu-master1 sudo systemctl restart etcd
9. Verify the cluster status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-master1 Ready control-plane,master 1d v1.24.0
fgedu-master2 Ready control-plane,master 1d v1.24.0
fgedu-master3 Ready control-plane,master 1d v1.24.0
fgedu-node1 Ready
fgedu-node2 Ready
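# Case: application data backup and restore
# Scenario: back up and restore application data to ensure business continuity
# Problem:
- Application data needs to be backed up regularly
- Application data needs to be restored after an application failure
- The data must stay consistent and reliable
# Solution:
1. Install Velero:
$ kubectl apply -f https://github.com/vmware-tanzu/velero/releases/download/v1.9.0/velero-install.yaml
2. Create a backup:
$ velero backup create fgedu-backup --include-namespaces default
3. Verify the backup: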
$ velero backup get
NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR
fgedu-backup Completed 2024-01-01 00:00:00 +0000 UTC 29d default
4. Simulate an application failure:
$ kubectl delete deployment test-app
deployment.apps "test-app" deleted
5. Verify the application status:
$ kubectl get pods
No resources found in default namespace.
6. Restore the backup:
$ velero restore create --from-backup fgedu-backup
7. Verify the application status:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-app-6d6f58987b-7f5f8 1/1 Running 0 5m
test-app-6d6f58987b-8d2k3 1/1 Running 0 5m
test-app-6d6f58987b-9e3l4 1/1 Running 0 5m
8. Verify application access:
$ curl http://192.168.1.202:80
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
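This article was compiled and published by Fengge Tutorials (风哥教程) and is for study and testing only. When reposting, credit the source: http://www.fgedu.net.cn/10327.html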
