Kubernetes Tutorial FG017: A Hands-On Guide to the Kubernetes Cluster Master-Worker Architecture
This tutorial introduces the master-worker architecture of Kubernetes clusters, covering an architecture overview, control-plane high availability, worker node management, architecture planning, high-availability design, load balancing, master node setup, worker node setup, and cluster configuration. The material follows the official Kubernetes documentation and is intended for DevOps engineers and system administrators in learning and test environments; validate everything independently before applying it to production.
Part01-Basic Concepts and Theory
1.1 Master-Worker Architecture Overview
The master-worker architecture of a Kubernetes cluster consists of one or more master nodes (control-plane nodes) and multiple worker nodes. Master nodes control and manage the cluster; worker nodes run the application containers.
1.2 Control-Plane High Availability
Control-plane high availability means deploying multiple master nodes so the cluster keeps running even if one of them fails. It requires high availability for the API server, etcd, the scheduler, and the controller manager.
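Only one scheduler instance and one controller manager instance are active at any time; the active instance holds a Lease object in the kube-system namespace. A quick, read-only way to see which master currently leads (the holder names below are illustrative):
# Inspect the leader-election leases for the scheduler and controller manager
$ kubectl get lease -n kube-system kube-scheduler kube-controller-manager
NAME                      HOLDER                                     AGE
kube-scheduler            master2_0f8f1c3a-9d2e-4b7a-a1c4-example    1d
kube-controller-manager   master1_7d2e9b41-3c8f-4e6d-b5a2-example    1d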
1.3 Worker Node Management
Worker node management covers configuring, monitoring, maintaining, and scaling worker nodes. Worker nodes are where application containers run, so their stability and availability must be ensured.
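For routine maintenance, a worker is normally cordoned and drained first and uncordoned afterwards; a minimal sketch using the standard kubectl commands, assuming a node named worker1:
# Mark the node unschedulable so no new Pods land on it
$ kubectl cordon worker1
# Evict the Pods running on the node (DaemonSet Pods are left in place)
$ kubectl drain worker1 --ignore-daemonsets --delete-emptydir-data
# ...perform the maintenance work...
# Make the node schedulable again once maintenance is done
$ kubectl uncordon worker1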
Part02-Production Planning and Recommendations
2.1 Master-Worker Architecture Planning
Master-worker architecture planning for a production Kubernetes cluster:
# Master node planning
- Count: 3 or 5; keep the number odd so etcd leader election works cleanly
- Specs: at least 4 CPU cores, 16 GB RAM, and 100 GB of storage
- Network: 10 Gbps bandwidth with low latency
- Storage: SSDs, to keep etcd performant
# Worker node planning
- Count: driven by application demand; at least 2
- Specs: driven by application demand; at least 8 CPU cores, 32 GB RAM, and 500 GB of storage
- Network: 10 Gbps bandwidth with low latency
- Storage: choose a solution that matches application needs
# Cluster sizing
- Small cluster: 1-3 masters, 2-5 workers
- Medium cluster: 3 masters, 6-10 workers
- Large cluster: 5 masters, 10+ workers
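A minimal kubeadm configuration sketch that encodes this plan, assuming the load-balancer address 192.168.1.100:6443 used in Part03 (the Kubernetes version and pod subnet here are illustrative; the subnet matches Calico's default):
$ cat > kubeadm-config.yaml << 'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.24.0
# Stable VIP or load-balancer address that fronts all API servers
controlPlaneEndpoint: "192.168.1.100:6443"
networking:
  podSubnet: "192.168.0.0/16"
EOF
# Initialize the first master from the config file
$ kubeadm init --config kubeadm-config.yaml --upload-certs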
2.2 High-Availability Design
High-availability design for a production Kubernetes cluster:
# Control-plane high availability
- API server: run multiple API server instances behind a load balancer
- etcd: run an etcd cluster of at least 3 members to guarantee data consistency
- Scheduler: run multiple scheduler instances; leader election keeps exactly one active
- Controller manager: run multiple controller manager instances; leader election keeps exactly one active
# Worker node high availability
- Node redundancy: deploy multiple worker nodes so applications stay available
- Pod scheduling: use a PodDisruptionBudget to guarantee application availability (see the sketch after this list)
- Health checks: configure Pod and node health checks to catch problems early
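A minimal PodDisruptionBudget sketch, assuming a Deployment labeled app: nginx like the one created in Part04; it keeps at least two replicas running through voluntary disruptions such as node drains:
$ cat > nginx-pdb.yaml << 'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  # Never evict below two ready replicas during voluntary disruptions
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx
EOF
$ kubectl apply -f nginx-pdb.yaml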
# Network high availability
- Network plugin: choose a highly available network plugin such as Calico or Flannel
- Network policies: configure network policies to secure cluster traffic
- Load balancing: use a load balancer to keep services highly available
# Storage high availability
- Storage solution: choose a highly available storage solution such as Ceph or NFS
- Storage classes: define multiple storage classes to cover different application needs
- Persistent volumes: use persistent volumes to keep data durable
2.3 Load Balancing
Load balancing for a production Kubernetes cluster:
# Load balancer options
- External load balancer: use a hardware load balancer or a cloud provider's load-balancing service
- Internal load balancing: use Kubernetes' built-in load-balancing capabilities
- Health checks: configure health checks on the load balancer so traffic only reaches healthy nodes
# Service load balancing
- ClusterIP: load balancing inside the cluster
- NodePort: load balancing through a port exposed on every node
- LoadBalancer: load balancing through a cloud provider's load-balancing service
- Ingress: load balancing through an Ingress controller
# Load-balancing policies
- Round robin: hand requests to the backends in turn
- Least connections: send each request to the backend with the fewest active connections
- IP hash: hash the client IP so the same client always reaches the same backend
- Weighted round robin: distribute requests in proportion to backend weights
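The control-plane endpoint 192.168.1.100:6443 used throughout Part03 implies a load balancer in front of the API servers. A minimal HAProxy sketch for that role, assuming the master IPs 192.168.1.101-103 seen in Part04 (pair it with keepalived if the load balancer itself must be redundant):
$ cat > /etc/haproxy/haproxy.cfg << 'EOF'
# TCP passthrough to the API servers; TLS still terminates on the masters
frontend k8s-api
    bind 192.168.1.100:6443
    mode tcp
    default_backend k8s-masters
backend k8s-masters
    mode tcp
    # Round robin across the masters, removing unhealthy ones via TCP checks
    balance roundrobin
    option tcp-check
    server master1 192.168.1.101:6443 check
    server master2 192.168.1.102:6443 check
    server master3 192.168.1.103:6443 check
EOF
$ systemctl restart haproxy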
Part03-Production Implementation
3.1 Master Node Setup
Master node setup for a production Kubernetes cluster:
# Initialize the cluster on the first master
$ kubeadm init --control-plane-endpoint="192.168.1.100:6443" --upload-certs
# Sample output
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You can now join any number of control-plane nodes by running the following command on each as root:
kubeadm join 192.168.1.100:6443 --token abcdef.1234567890abcdef \
--discovery-token-ca-cert-hash sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef \
--control-plane --certificate-key 1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.100:6443 --token abcdef.1234567890abcdef \
--discovery-token-ca-cert-hash sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
# Configure kubectl
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install a network plugin (Calico)
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Check the master node status
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 10m v1.24.0
# Join the remaining masters
$ kubeadm join 192.168.1.100:6443 --token abcdef.1234567890abcdef \
--discovery-token-ca-cert-hash sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef \
--control-plane --certificate-key 1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
# Check all master nodes
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 20m v1.24.0
master2 Ready control-plane,master 10m v1.24.0
master3 Ready control-plane,master 5m v1.24.0
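The bootstrap token and the certificate key used above both expire (24 hours and 2 hours by default), so masters or workers joined later usually need fresh values; these standard kubeadm commands regenerate them:
# Print a fresh worker join command (creates a new token as a side effect)
$ kubeadm token create --print-join-command
# Re-upload the control-plane certificates and print a new certificate key
$ kubeadm init phase upload-certs --upload-certs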
3.2 Worker Node Setup
Worker node setup for a production Kubernetes cluster:
# Join the worker nodes
$ kubeadm join 192.168.1.100:6443 --token abcdef.1234567890abcdef \
--discovery-token-ca-cert-hash sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
# Check the worker node status
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 30m v1.24.0
master2 Ready control-plane,master 20m v1.24.0
master3 Ready control-plane,master 15m v1.24.0
worker1 Ready <none> 10m v1.24.0
worker2 Ready <none> 5m v1.24.0
# Label the worker nodes
$ kubectl label node worker1 node-role.kubernetes.io/worker=worker
$ kubectl label node worker2 node-role.kubernetes.io/worker=worker
# Check the node labels
$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master1 Ready control-plane,master 35m v1.24.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
master2 Ready control-plane,master 25m v1.24.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
master3 Ready control-plane,master 20m v1.24.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master3,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
worker1 Ready worker 15m v1.24.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=worker
worker2 Ready worker 10m v1.24.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=worker
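Once the worker label is in place it can steer scheduling. A minimal sketch, assuming a hypothetical Deployment named demo, that pins Pods to the labeled workers via nodeSelector:
$ cat > demo-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      # Schedule only onto nodes carrying the worker label applied above
      nodeSelector:
        node-role.kubernetes.io/worker: worker
      containers:
      - name: demo
        image: nginx
EOF
$ kubectl apply -f demo-deployment.yaml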
3.3 Cluster Configuration
Cluster configuration for a production Kubernetes cluster:
# Check that the network plugin Pods are running
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6d4b75cb6d-7f5f8 1/1 Running 0 30m
calico-node-4q7k8 1/1 Running 0 30m
calico-node-7c9x6 1/1 Running 0 30m
calico-node-8d2k3 1/1 Running 0 30m
calico-node-9f5g7 1/1 Running 0 30m
calico-node-b7c4d 1/1 Running 0 30m
# Configure cluster storage
$ cat > storage-class.yaml << 'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF
$ kubectl apply -f storage-class.yaml
# Configure cluster monitoring
$ kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.50.0/bundle.yaml
# Configure cluster logging
$ kubectl apply -f https://raw.githubusercontent.com/grafana/loki/v2.4.0/production/kubernetes/loki-stack.yaml
# Check the cluster status
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.1.100:6443
CoreDNS is running at https://192.168.1.100:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
# Check the control-plane component status
$ kubectl get componentstatuses
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
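Note that kubernetes.io/no-provisioner creates nothing automatically: every PersistentVolume behind this StorageClass must be created by hand. A minimal local-volume sketch, assuming a hypothetical /mnt/disks/vol1 directory on worker1:
$ cat > local-pv.yaml << 'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-1
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  local:
    path: /mnt/disks/vol1
  # A local volume must declare which node physically holds the disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker1
EOF
$ kubectl apply -f local-pv.yaml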
Part04-Production Cases and Walkthroughs
4.1 Cluster High-Availability Case
A high-availability walkthrough on a production Kubernetes cluster:
# Check the master node status
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 1d v1.24.0
master2 Ready control-plane,master 1d v1.24.0
master3 Ready control-plane,master 1d v1.24.0
worker1 Ready worker 1d v1.24.0
worker2 Ready worker 1d v1.24.0
# Check the control-plane component status
$ kubectl get pods -n kube-system | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd'
kube-apiserver-master1 1/1 Running 0 1d
kube-apiserver-master2 1/1 Running 0 1d
kube-apiserver-master3 1/1 Running 0 1d
kube-controller-manager-master1 1/1 Running 0 1d
kube-controller-manager-master2 1/1 Running 0 1d
kube-controller-manager-master3 1/1 Running 0 1d
kube-scheduler-master1 1/1 Running 0 1d
kube-scheduler-master2 1/1 Running 0 1d
kube-scheduler-master3 1/1 Running 0 1d
etcd-master1 1/1 Running 0 1d
etcd-master2 1/1 Running 0 1d
etcd-master3 1/1 Running 0 1d
# Simulate a master node failure
$ ssh root@master1 "systemctl stop kubelet"
# Check the master node status
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 NotReady control-plane,master 1d v1.24.0
master2 Ready control-plane,master 1d v1.24.0
master3 Ready control-plane,master 1d v1.24.0
worker1 Ready worker 1d v1.24.0
worker2 Ready worker 1d v1.24.0
# Check the cluster status
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.1.100:6443
CoreDNS is running at https://192.168.1.100:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
# Verify that the cluster still works
$ kubectl create deployment nginx --image=nginx
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-6d6f58987b-7f5f8 1/1 Running 0 5m
# Recover the master node
$ ssh root@master1 "systemctl start kubelet"
# Check the master node status
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 1d v1.24.0
master2 Ready control-plane,master 1d v1.24.0
master3 Ready control-plane,master 1d v1.24.0
worker1 Ready worker 1d v1.24.0
worker2 Ready worker 1d v1.24.0
4.2 Failover Case
A failover walkthrough for a production Kubernetes cluster:
# Check the etcd cluster health
$ etcdctl endpoint health --cluster
192.168.1.101:2379 is healthy: successfully committed proposal: took = 3.264801ms
192.168.1.102:2379 is healthy: successfully committed proposal: took = 2.876543ms
192.168.1.103:2379 is healthy: successfully committed proposal: took = 3.123456ms
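Against kubeadm's TLS-secured etcd, the bare command above works only if the endpoint and certificate flags are already exported in the environment; a typical explicit invocation from a master node looks like this, assuming the kubeadm default certificate paths:
# Same health check with the connection details spelled out
$ ETCDCTL_API=3 etcdctl endpoint health --cluster \
  --endpoints=https://192.168.1.101:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key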
# Simulate an etcd member failure (with kubeadm's stacked etcd, which runs as a static Pod rather than a systemd service, stop it by moving /etc/kubernetes/manifests/etcd.yaml out of the manifests directory instead)
$ ssh root@master1 "systemctl stop etcd"
# Check the etcd cluster health
$ etcdctl endpoint health --cluster
192.168.1.101:2379 is unhealthy: failed to connect: context deadline exceeded
192.168.1.102:2379 is healthy: successfully committed proposal: took = 2.987654ms
192.168.1.103:2379 is healthy: successfully committed proposal: took = 3.012345ms
# Check the cluster status
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.1.100:6443
CoreDNS is running at https://192.168.1.100:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
# Verify that the cluster still works
$ kubectl create deployment apache --image=httpd
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-6d6f58987b-7f5f8 1/1 Running 0 10m
apache-7d6f58987b-8d2k3 1/1 Running 0 5m
# Recover the etcd member
$ ssh root@master1 "systemctl start etcd"
# Check the etcd cluster health
$ etcdctl endpoint health --cluster
192.168.1.101:2379 is healthy: successfully committed proposal: took = 3.345678ms
192.168.1.102:2379 is healthy: successfully committed proposal: took = 2.765432ms
192.168.1.103:2379 is healthy: successfully committed proposal: took = 3.098765ms
4.3 Load-Balancing Case
A load-balancing walkthrough for a production Kubernetes cluster:
# Create a Deployment
$ cat > nginx-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
$ kubectl apply -f nginx-deployment.yaml
# Create a Service
$ cat > nginx-service.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
EOF
$ kubectl apply -f nginx-service.yaml
# Check Pod status
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-6d6f58987b-7f5f8 1/1 Running 0 5m
nginx-6d6f58987b-8d2k3 1/1 Running 0 5m
nginx-6d6f58987b-9f5g7 1/1 Running 0 5m
# Check Service status
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1d
nginx LoadBalancer 10.100.123.45 192.168.1.200 80:30080/TCP 5m
# Test the load balancing (grep keeps only the page title from each response)
$ for i in {1..10}; do curl -s http://192.168.1.200 | grep -m1 -o "Welcome to nginx!"; done
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Welcome to nginx!
Part05-Experience Summary and Best Practices
5.1 Master-Worker Architecture Best Practices
Best practices for the Kubernetes cluster master-worker architecture:
- Master count: use an odd number of masters, such as 3 or 5, to keep etcd leader election reliable
- Hardware: give masters enough CPU, memory, and storage to keep the control plane performant
- Network: use high-bandwidth, low-latency links so control-plane components communicate reliably
- Storage: use SSDs to keep etcd fast and reliable
- Load balancing: put a load balancer in front of the API servers for high availability
- Monitoring and alerting: build thorough monitoring so problems are found and handled promptly
- Backups: back up etcd data regularly so it can always be restored (see the sketch after this list)
- Disaster recovery: maintain a DR plan so the cluster can be rebuilt quickly after a disaster
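A minimal etcd backup sketch using etcdctl's built-in snapshot facility, assuming the kubeadm default certificate paths and a hypothetical /backup directory:
# Take a point-in-time snapshot of the etcd keyspace
$ ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db \
  --endpoints=https://192.168.1.101:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
# Verify the snapshot's integrity and size
$ ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-$(date +%Y%m%d).db --write-out=table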
5.2 High-Availability Tuning
Tuning high availability in a Kubernetes cluster:
- Control plane: tune the API server, etcd, scheduler, and controller manager configuration for performance and reliability
- Network: tune the network configuration to reduce latency and packet loss
- Storage: tune the storage configuration for performance and reliability
- Resource management: size resources sensibly so shortages do not cause failures
- Failure detection: configure sound health checks and failure detection so faults are found and handled quickly
- Failover: streamline the failover mechanism to shorten failover time
- Load balancing: tune the load-balancing configuration for service availability and performance
- Monitoring: tune the monitoring system for accuracy and timeliness
5.3 Cluster Management Tips
Tips for managing a Kubernetes cluster:
- Regular checks: inspect cluster state and health regularly to catch problems early
- Version management: upgrade the cluster regularly to stay secure and stable
- Configuration management: manage cluster configuration with configuration-management tooling
- Automation: automate routine work to reduce manual steps and improve efficiency
- Documentation: keep thorough documentation of the cluster's configuration and operations
- Training: train the operations team to raise cluster-management skills
- Drills: run failure drills regularly to sharpen incident response
- Collaboration: work with teammates and the community to solve problems together
