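In this document, Feng Ge (风哥) walks through hands-on Rancher node management and maintenance mode. Topics: node concepts, node roles, maintenance mode, node preparation, node requirements, node planning, adding nodes, removing nodes, node maintenance, draining nodes, cordoning nodes, and node optimization. The tutorial draws on the node, maintenance mode, and node management sections of the official Rancher documentation. It is intended for operators to use in learning and testing; verify everything yourself before applying it in production.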
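Part01-Basic Concepts and Theory
1.1 Rancher Node Concepts
A Rancher node is a worker machine in a Kubernetes cluster that runs Pods. A node can be a physical server, a virtual machine, or a cloud instance. Every node needs the Kubernetes node components (kubelet, kube-proxy) and a container runtime such as Docker or containerd.
- Worker node: runs Pods
- Resource management: manages CPU, memory, and storage
- Network management: manages network configuration
- Health checks: reports the node's health status
- Autoscaling: supports automatic scaling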
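1.2 Rancher Node Roles
A node role is the function a node serves in the cluster: control plane, worker, or etcd. Control plane nodes manage the cluster, worker nodes run Pods, and etcd nodes store the cluster data.
- Control plane: manages the cluster
- Worker: runs Pods
- etcd: stores cluster data
- Mixed: one node holding several roles
- Custom: user-defined roles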
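In `kubectl get nodes`, the ROLES column is derived from node labels of the form `node-role.kubernetes.io/<role>`. The sketch below reproduces that derivation from a `--show-labels` style label string. This is useful when scripting role checks. `node_roles` is a hypothetical helper, and it only handles the `node-role.kubernetes.io/` prefix. Roles are sorted and comma-joined, and `<none>` is printed when no role label is present, mirroring how the ROLES column is displayed.

```shell
# node_roles LABELS: print the roles encoded in a comma-separated label string,
# e.g. the LABELS column of `kubectl get nodes --show-labels`.
# Hypothetical helper: only the node-role.kubernetes.io/<role> prefix is handled.
node_roles() (
  # one label per line, keep the <role> part of node-role labels, sort, re-join
  roles=$(printf '%s\n' "$1" | tr ',' '\n' \
    | sed -n 's|^node-role\.kubernetes\.io/\([^=]*\)=.*$|\1|p' \
    | sort | paste -sd, -)
  # kubectl shows <none> when a node carries no role labels
  printf '%s\n' "${roles:-<none>}"
)
```

Usage, taking the last (LABELS) column of a single node:
`node_roles "$(kubectl get node fgedu-control-plane-1 --no-headers --show-labels | awk '{print $NF}')"`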
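1.3 Rancher Maintenance Mode
Maintenance mode puts a node into a state where no new Pods are scheduled onto it. It consists of two operations: cordon and drain. Cordoning a node stops new Pods from being scheduled on it. Draining a node cordons it and then evicts the Pods running on it (DaemonSet-managed Pods are skipped when --ignore-daemonsets is used).
- Cordon: block scheduling of new Pods
- Drain: evict the Pods on the node
- Maintenance window: the scheduled time for maintenance
- Recovery: return the node to normal service afterward (kubectl uncordon)
- Notification: announce the maintenance status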
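The cordon, drain, and recover cycle described above can be wrapped in one helper. This is a minimal sketch with a few assumptions. `maintain_node` is a hypothetical function name, the 300-second drain timeout is an assumed value, and the caller passes in the maintenance work as a command. If the drain or the maintenance command fails, the helper leaves the node cordoned, so nothing is scheduled back onto a node in an unknown state.

```shell
# maintain_node NODE CMD [ARGS...]: cordon and drain NODE, run the maintenance
# command, then uncordon NODE. Hypothetical helper; the timeout is an assumption.
maintain_node() {
  local node="$1"
  shift
  # Step 1: stop new Pods from being scheduled (kubectl drain would also do this)
  kubectl cordon "$node" || return 1
  # Step 2: evict the remaining Pods; DaemonSet Pods are left in place
  if ! kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s; then
    echo "drain of $node failed; node left cordoned" >&2
    return 1
  fi
  # Step 3: the actual maintenance work, supplied by the caller
  if ! "$@"; then
    echo "maintenance command failed; $node left cordoned" >&2
    return 1
  fi
  # Step 4: return the node to service
  kubectl uncordon "$node"
}
```

For example, `maintain_node fgedu-worker-1 ssh root@fgedu-worker-1 'yum -y update'` would patch the node between the drain and the uncordon. Remote execution over SSH is just one assumed way to run the work.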
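Part02-Production Planning and Recommendations
2.1 Rancher Node Preparation
Node preparation checklist:
# 1. Operating system
– OS: Oracle Linux 9.3 / RHEL 9.3 / 8.x / 7.x
– Kernel version: >= 5.4.0
– System updates: latest patches applied
# 2. Hardware
– CPU: >= 4 cores
– Memory: >= 8GB
– Disk: >= 100GB
– Network: >= 1Gbps
# 3. Network
– Connectivity: all cluster nodes can reach each other
– DNS: configured
– Time sync: NTP configured
– Firewall: required ports open
# 4. Software
– Docker: >= 20.10.0
– containerd: >= 1.6.0
– Kubernetes: >= 1.23.0
– RKE2: >= 1.25.0
# 5. Configuration
– Hostname: set
– SSH keys: configured
– User permissions: configured
– System parameters: tuned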
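Part of this checklist can be verified on a node before it joins the cluster. The sketch below checks the CPU, memory, and kernel thresholds listed above. `check_prereqs` and `version_ge` are hypothetical helpers. `MemTotal` in `/proc/meminfo` reports slightly less than the installed RAM, so the memory check uses an assumed allowance of 7.5 GiB (7864320 kB) for the 8GB requirement. Disk, network, and software checks are left out.

```shell
# version_ge A B: succeed if dotted version A >= B. Fields are compared
# numerically by their leading digits, so distro suffixes such as
# "-362.el9.x86_64" do not count. Hypothetical helper.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# check_prereqs CORES MEM_KB KERNEL: print each unmet requirement from the
# checklist; return 1 if any is unmet. Hypothetical helper.
check_prereqs() {
  local cores="$1" mem_kb="$2" kernel="$3" rc=0
  [ "$cores" -ge 4 ] || { echo "CPU: $cores cores < 4"; rc=1; }
  # 7864320 kB = 7.5 GiB: assumed allowance for an "8GB" machine
  [ "$mem_kb" -ge 7864320 ] || { echo "Memory: $mem_kb kB < 8GB"; rc=1; }
  version_ge "$kernel" 5.4.0 || { echo "Kernel: $kernel < 5.4.0"; rc=1; }
  return "$rc"
}

# On the candidate node:
# check_prereqs "$(nproc)" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(uname -r)"
```

With these thresholds, a stock RHEL 8 kernel (4.18.x) or RHEL 7 kernel (3.10.x) fails the kernel check, while RHEL 9 (5.14.x) passes.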
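2.2 Rancher Node Requirements
Node requirements:
# Control plane nodes
CPU: >= 4 cores
Memory: >= 8GB
Disk: >= 100GB
Network: >= 1Gbps
Node count: >= 3
# Worker nodes
CPU: >= 4 cores
Memory: >= 8GB
Disk: >= 100GB
Network: >= 1Gbps
Node count: >= 3
# etcd nodes
CPU: >= 4 cores
Memory: >= 8GB
Disk: >= 100GB
Network: >= 1Gbps
Node count: >= 3 (an odd number, to preserve etcd quorum)
# Operating system
Oracle Linux: >= 9.3
RHEL: >= 9.3
CentOS: >= 7.9
Ubuntu: >= 20.04
# Network
Bandwidth: >= 1Gbps
Latency: < 10ms
Open ports: 6443, 2379, 2380, 10250, etc.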
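On firewalld-based systems such as RHEL or Oracle Linux, the ports can be opened with `firewall-cmd`. The sketch prints the commands instead of running them, so they can be reviewed first. `rke2_firewall_cmds` is a hypothetical helper. Its port list is the one above plus 9345/tcp, the RKE2 supervisor port that agents register through. CNI-specific ports (for example 8472/udp for Canal VXLAN) and the NodePort range are not included; check the RKE2 port requirements for your CNI.

```shell
# rke2_firewall_cmds ROLE: print firewall-cmd commands for an RKE2 node.
# ROLE is "server" (control plane + etcd) or "agent" (worker).
# Hypothetical helper; CNI and NodePort ports are intentionally omitted.
rke2_firewall_cmds() {
  local ports p
  case "$1" in
    server) ports="6443/tcp 9345/tcp 2379-2380/tcp 10250/tcp" ;;
    agent)  ports="10250/tcp" ;;
    *) echo "usage: rke2_firewall_cmds server|agent" >&2; return 2 ;;
  esac
  for p in $ports; do
    echo "firewall-cmd --permanent --add-port=$p"
  done
  # permanent rules only take effect after a reload
  echo "firewall-cmd --reload"
}
```

Review the output, then apply it as root: run `rke2_firewall_cmds server | sh` on each control plane node and `rke2_firewall_cmds agent | sh` on each worker.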
2.3 Rancher Node Planning
Node plan:
# Control plane nodes
Node 1: fgedu-control-plane-1 (192.168.1.10)
Node 2: fgedu-control-plane-2 (192.168.1.11)
Node 3: fgedu-control-plane-3 (192.168.1.12)
# Worker nodes
Node 1: fgedu-worker-1 (192.168.1.20)
Node 2: fgedu-worker-2 (192.168.1.21)
Node 3: fgedu-worker-3 (192.168.1.22)
# etcd nodes
Node 1: fgedu-control-plane-1 (192.168.1.10)
Node 2: fgedu-control-plane-2 (192.168.1.11)
Node 3: fgedu-control-plane-3 (192.168.1.12)
# Resources
Control plane: 4 CPU cores, 8GB memory
Worker: 8 CPU cores, 16GB memory
etcd: 4 CPU cores, 8GB memory
# Network
Management network: 192.168.1.0/24
Pod network: 10.244.0.0/16
Service network: 10.96.0.0/12
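RKE2 defaults to 10.42.0.0/16 for Pods and 10.43.0.0/16 for Services. The planned ranges therefore have to be written into the server configuration before the first server starts, and they are generally not changeable once the cluster is initialized. Below is a minimal sketch of `/etc/rancher/rke2/config.yaml` for the control plane nodes. `write_server_config` is a hypothetical helper. Listing the three control plane IPs under `tls-san` is an assumption about how the API server will be reached.

```shell
# write_server_config DIR: write an RKE2 server config.yaml into DIR.
# Hypothetical helper; use the same CIDRs on every server node.
write_server_config() {
  mkdir -p "$1" || return 1
  cat > "$1/config.yaml" <<'EOF'
# Pod and Service ranges from the network plan (RKE2 defaults: 10.42.0.0/16, 10.43.0.0/16)
cluster-cidr: "10.244.0.0/16"
service-cidr: "10.96.0.0/12"
# Extra IPs to include in the API server certificate
tls-san:
  - "192.168.1.10"
  - "192.168.1.11"
  - "192.168.1.12"
EOF
}
```

Run `write_server_config /etc/rancher/rke2` as root on fgedu-control-plane-1 before starting `rke2-server`. The second and third servers additionally need `server:` and `token:` entries that point at the first server.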
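Part03-Production Implementation
3.1 Adding Rancher Nodes
3.1.1 Adding a Node Through the Web UI
# Step 1: Log in to the Rancher UI
# Step 2: Click "Cluster", select the cluster, then click "Nodes"
# Step 3: Click the "Add Node" button
# Step 4: Choose the node type: Worker
# Step 5: Fill in the node details:
# Node name: fgedu-worker-4
# Node description: Rancher worker node 4
# Node IP: 192.168.1.23
# Step 6: Click the "Create" button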
# Adding a node from the CLI
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready
fgedu-worker-2 Ready
fgedu-worker-3 Ready
# Install RKE2 on the new node (INSTALL_RKE2_TYPE="agent" selects the agent install)
[root@fgedu-worker-4 ~]# curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
[root@fgedu-worker-4 ~]# yum install -y rke2-1.28.0-1.el7.x86_64
# Configure the RKE2 agent
[root@fgedu-worker-4 ~]# cat > /etc/rancher/rke2/config.yaml <
fgedu-worker-2 Ready
fgedu-worker-3 Ready
fgedu-worker-4 Ready
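An RKE2 agent joins the cluster through a `config.yaml` that names the supervisor URL (port 9345 on a server node) and the cluster join token. On a server node, the token is stored in `/var/lib/rancher/rke2/server/node-token`. The sketch below is a hypothetical helper, `write_agent_config`. The server address 192.168.1.10 comes from the node plan; copy the token from your own server.

```shell
# write_agent_config DIR SERVER_IP TOKEN: write an RKE2 agent config.yaml into DIR.
# Hypothetical helper; read TOKEN on a server with:
#   cat /var/lib/rancher/rke2/server/node-token
write_agent_config() {
  if [ "$#" -ne 3 ]; then
    echo "usage: write_agent_config DIR SERVER_IP TOKEN" >&2
    return 2
  fi
  mkdir -p "$1" || return 1
  # The token is a secret: create the file readable by its owner only.
  # The subshell keeps the umask change from leaking into the caller.
  (
    umask 077
    cat > "$1/config.yaml" <<EOF
server: https://$2:9345
token: $3
EOF
  )
}
```

On fgedu-worker-4: run `write_agent_config /etc/rancher/rke2 192.168.1.10 "$TOKEN"`, then `systemctl enable --now rke2-agent.service`, and follow progress with `journalctl -u rke2-agent -f`.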
# View node details
[root@rancher ~]# kubectl describe node fgedu-worker-4
Name: fgedu-worker-4
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=fgedu-worker-4
kubernetes.io/os=linux
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Fri, 10 Apr 2026 10:00:00 +0000
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 10 Apr 2026 10:00:00 +0000 Fri, 10 Apr 2026 10:00:00 +0000 Kubelet has sufficient network available
MemoryPressure False Fri, 10 Apr 2026 10:00:00 +0000 Fri, 10 Apr 2026 10:00:00 +0000 Kubelet has sufficient memory available
DiskPressure False Fri, 10 Apr 2026 10:00:00 +0000 Fri, 10 Apr 2026 10:00:00 +0000 Kubelet has no disk pressure
PIDPressure False Fri, 10 Apr 2026 10:00:00 +0000 Fri, 10 Apr 2026 10:00:00 +0000 Kubelet has sufficient PID available
Ready True Fri, 10 Apr 2026 10:00:00 +0000 Fri, 10 Apr 2026 10:00:00 +0000 Kubelet is posting ready status
Addresses:
InternalIP: 192.168.1.23
Hostname: fgedu-worker-4
Capacity:
cpu: 8
ephemeral-storage: 100Gi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16384200Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 100Gi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16384200Ki
pods: 110
System Info:
Machine ID: 1234567890abcdef
System UUID: 12345678-90ab-cdef-1234-567890abcdef
Boot ID: 12345678-90ab-cdef-1234-567890abcdef
Kernel Version: 5.4.0-91-generic
OS Image: Oracle Linux Server 9.3
Operating System: Linux
Architecture: amd64
Container Runtime Version: containerd://1.6.0
Kubelet Version: v1.28.0
Kube-Proxy Version: v1.28.0
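The Conditions table is the quickest health signal for a new node. `Ready` should be `True`, and the pressure and network conditions should be `False`. The sketch below turns that rule into a check over `kubectl` JSONPath output. `node_condition_problems` is a hypothetical helper name.

```shell
# node_condition_problems: read "Type=Status" lines on stdin and print the
# unhealthy ones; exit status is non-zero if any were found.
# Hypothetical helper. Expected input, e.g.:
#   kubectl get node fgedu-worker-4 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
node_condition_problems() {
  awk -F= '
    NF == 0 { next }                                        # skip blank lines
    $1 == "Ready" { if ($2 != "True") { print; bad = 1 }; next }
    $2 != "False" { print; bad = 1 }                        # pressure/network conditions
    END { exit (bad ? 1 : 0) }
  '
}
```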
3.2 Removing Rancher Nodes
3.2.1 Removing a Node from the CLI
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready
fgedu-worker-2 Ready
fgedu-worker-3 Ready
fgedu-worker-4 Ready
# Drain the Pods off the node
[root@rancher ~]# kubectl drain fgedu-worker-4 --ignore-daemonsets --delete-emptydir-data
node/fgedu-worker-4 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-1234567890-abcde
evicting pod fgedu-dev/fgedu-nginx-1234567890-abcde
pod/fgedu-nginx-1234567890-abcde evicted
# Check Pod status
[root@rancher ~]# kubectl get pods -n fgedu-dev
NAME READY STATUS RESTARTS AGE
fgedu-nginx-1234567890-fghij 1/1 Running 0 1m
fgedu-nginx-1234567890-klmno 1/1 Running 0 1m
fgedu-nginx-1234567890-opqr 1/1 Running 0 1m
# Delete the node
[root@rancher ~]# kubectl delete node fgedu-worker-4
node "fgedu-worker-4" deleted
# Check node status
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready
fgedu-worker-2 Ready
fgedu-worker-3 Ready
# Stop the RKE2 agent on the node
[root@fgedu-worker-4 ~]# systemctl stop rke2-agent
[root@fgedu-worker-4 ~]# systemctl disable rke2-agent
# Clean up the node configuration
[root@fgedu-worker-4 ~]# rm -rf /etc/rancher/rke2
[root@fgedu-worker-4 ~]# rm -rf /var/lib/rancher/rke2
3.3 Rancher Node Maintenance
3.3.1 Maintaining a Node from the CLI
[root@rancher ~]# kubectl cordon fgedu-worker-1
node/fgedu-worker-1 cordoned
# Check node status
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready,SchedulingDisabled
fgedu-worker-2 Ready
fgedu-worker-3 Ready
# Uncordon the node
[root@rancher ~]# kubectl uncordon fgedu-worker-1
node/fgedu-worker-1 uncordoned
# Check node status
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready
fgedu-worker-2 Ready
fgedu-worker-3 Ready
# Add a node label
[root@rancher ~]# kubectl label node fgedu-worker-1 environment=production
node/fgedu-worker-1 labeled
# View node labels
[root@rancher ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,environment=production,kubernetes.io/arch=amd64,kubernetes.io/hostname=fgedu-control-plane-1,kubernetes.io/os=linux
fgedu-worker-1 Ready
# Add a node taint
[root@rancher ~]# kubectl taint node fgedu-worker-1 key=value:NoSchedule
node/fgedu-worker-1 tainted
# View node taints
[root@rancher ~]# kubectl describe node fgedu-worker-1 | grep -A 5 Taints
Taints: key=value:NoSchedule
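After these two changes, a Pod can only be placed on fgedu-worker-1 if it tolerates the `key=value:NoSchedule` taint. A `nodeSelector` on `environment=production` then keeps it on the labeled nodes. The sketch prints such a Pod manifest. The Pod name, the `nginx` image, and the `print_pinned_pod` helper are hypothetical; the namespace `fgedu-dev` comes from the examples in this document. The last two commented commands show the `kubectl` syntax for removing the taint and the label again (a trailing `-`).

```shell
# print_pinned_pod NAME: emit a Pod manifest that selects environment=production
# nodes and tolerates the key=value:NoSchedule taint. Hypothetical helper.
print_pinned_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
  namespace: fgedu-dev
spec:
  nodeSelector:
    environment: production
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx
EOF
}

# print_pinned_pod fgedu-nginx-pinned | kubectl apply -f -
# kubectl taint node fgedu-worker-1 key=value:NoSchedule-   # remove the taint
# kubectl label node fgedu-worker-1 environment-            # remove the label
```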
Part04-Production Cases and Walkthroughs
4.1 Draining Rancher Nodes
4.1.1 Draining a Node from the CLI
[root@rancher ~]# kubectl get pods -o wide -n fgedu-dev
NAME READY STATUS RESTARTS AGE IP NODE
fgedu-nginx-1234567890-abcde 1/1 Running 0 5m 10.244.0.5 fgedu-worker-1
fgedu-nginx-1234567890-fghij 1/1 Running 0 5m 10.244.0.6 fgedu-worker-2
fgedu-nginx-1234567890-klmno 1/1 Running 0 5m 10.244.0.7 fgedu-worker-3
# Drain the Pods off the node
[root@rancher ~]# kubectl drain fgedu-worker-1 --ignore-daemonsets --delete-emptydir-data
node/fgedu-worker-1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-1234567890-abcde
evicting pod fgedu-dev/fgedu-nginx-1234567890-abcde
pod/fgedu-nginx-1234567890-abcde evicted
# Check Pod status
[root@rancher ~]# kubectl get pods -o wide -n fgedu-dev
NAME READY STATUS RESTARTS AGE IP NODE
fgedu-nginx-1234567890-fghij 1/1 Running 0 5m 10.244.0.6 fgedu-worker-2
fgedu-nginx-1234567890-klmno 1/1 Running 0 5m 10.244.0.7 fgedu-worker-3
fgedu-nginx-1234567890-opqr 1/1 Running 0 1m 10.244.0.8 fgedu-worker-2
# Check node status
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready,SchedulingDisabled
fgedu-worker-2 Ready
fgedu-worker-3 Ready
# Uncordon the node
[root@rancher ~]# kubectl uncordon fgedu-worker-1
node/fgedu-worker-1 uncordoned
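After a drain, it is worth confirming what is still running on the node. With `--ignore-daemonsets`, `kubectl drain` leaves DaemonSet Pods in place. It also skips static (mirror) Pods, which recent Kubernetes versions give an owner of kind `Node`. It refuses to delete Pods that no controller manages unless `--force` is given. The sketch filters a per-node Pod list down to everything else. `remaining_workload_pods` is a hypothetical helper.

```shell
# remaining_workload_pods: read "namespace/name OwnerKind" lines on stdin and
# print Pods not owned by a DaemonSet or a Node. Hypothetical helper.
# Expected input, e.g.:
#   kubectl get pods -A --field-selector spec.nodeName=fgedu-worker-1 \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.metadata.ownerReferences[0].kind}{"\n"}{end}'
remaining_workload_pods() {
  # A Pod with no owner has an empty second field and is reported too
  awk 'NF > 0 && $2 != "DaemonSet" && $2 != "Node" { print $1 }'
}
```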
4.2 Cordoning Rancher Nodes
4.2.1 Cordoning a Node from the CLI
[root@rancher ~]# kubectl cordon fgedu-worker-1
node/fgedu-worker-1 cordoned
# Check node status
[root@rancher ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
fgedu-control-plane-1 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-2 Ready control-plane,etcd,master 10m v1.28.0
fgedu-control-plane-3 Ready control-plane,etcd,master 10m v1.28.0
fgedu-worker-1 Ready,SchedulingDisabled
fgedu-worker-2 Ready
fgedu-worker-3 Ready
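Cordoning only sets `spec.unschedulable: true` on the Node object, which the scheduler honors. A Pod that names the node directly in `spec.nodeName` bypasses the scheduler and is not blocked. Below is a small check for the flag, with `is_cordoned` as a hypothetical helper name.

```shell
# is_cordoned NODE: succeed if NODE is cordoned (spec.unschedulable is true).
# Hypothetical helper; an uncordoned node prints false or nothing for the field.
is_cordoned() {
  [ "$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}
```

Usage: `is_cordoned fgedu-worker-1 && echo "fgedu-worker-1 is cordoned"`.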
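# Try to schedule a Pod onto the cordoned node
[root@rancher ~]# cat <
4.3 Rancher Node Optimization
4.3.1 Optimizing Node Performance
[root@rancher ~]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
fgedu-control-plane-1 2.5 62% 5Gi 62%
fgedu-control-plane-2 2.5 62% 5Gi 62%
fgedu-control-plane-3 2.5 62% 5Gi 62%
fgedu-worker-1 5.0 62% 10Gi 62%
fgedu-worker-2 5.0 62% 10Gi 62%
fgedu-worker-3 5.0 62% 10Gi 62%
# Optimize node resource limits (this patch clears spec.unschedulable, the same effect as kubectl uncordon)
[root@rancher ~]# kubectl patch node fgedu-worker-1 -p '{"spec":{"unschedulable":false}}'
# View node details
[root@rancher ~]# kubectl describe node fgedu-worker-1 | grep -A 13 Capacity
Capacity:
cpu: 8
ephemeral-storage: 100Gi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16384200Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 100Gi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16384200Ki
pods: 110
# Optimize node kernel parameters
[root@fgedu-worker-1 ~]# cat >> /etc/sysctl.conf <
Part05-Feng Ge's Experience and Tips
5.1 Rancher Node Best Practices
Rancher node best practices:
5.2 Rancher Node Troubleshooting
Rancher node troubleshooting:
# Problem 1: the node is NotReady
# Symptom: the node status is NotReady
# Cause: kubelet failure, loss of network connectivity, or insufficient resources
# Resolution (on RKE2 nodes the kubelet runs under the rke2-agent service, rke2-server on control plane nodes):
[root@rancher ~]# kubectl get nodes
[root@rancher ~]# kubectl describe node fgedu-worker-1
[root@fgedu-worker-1 ~]# systemctl status rke2-agent
[root@fgedu-worker-1 ~]# journalctl -u rke2-agent -n 50
# Problem 2: Pods cannot be scheduled onto the node
# Symptom: Pods are not scheduled onto the node
# Cause: the node is cordoned, resources are insufficient, or a taint blocks the Pods
# Resolution:
[root@rancher ~]# kubectl get nodes
[root@rancher ~]# kubectl describe node fgedu-worker-1
[root@rancher ~]# kubectl describe pod
[root@rancher ~]# kubectl get events --all-namespaces
# Problem 3: the node is short of resources
# Symptom: high resource usage on the node
# Cause: too many Pods, poorly set resource limits, or applications that consume a lot of resources
# Resolution:
[root@rancher ~]# kubectl top nodes
[root@rancher ~]# kubectl top pods --all-namespaces
[root@rancher ~]# kubectl describe node fgedu-worker-1
[root@rancher ~]# kubectl get pods --all-namespaces -o wide
# Problem 4: node network failure
# Symptom: the node's network is unreachable
# Cause: network misconfiguration, firewall rules, or high network latency
# Resolution:
[root@rancher ~]# kubectl get nodes
[root@rancher ~]# kubectl describe node fgedu-worker-1
[root@fgedu-worker-1 ~]# ping 192.168.1.10
[root@fgedu-worker-1 ~]# traceroute 192.168.1.10
5.3 Rancher Node Maintenance
Rancher node maintenance:
# 1. Regular checks
– Check node status
– Check node resources
– Check node health
– Check node network
# 2. Regular optimization
– Optimize node resources
– Optimize node performance
– Optimize node configuration
– Optimize node network
# 3. Regular backups
– Back up node configuration
– Back up node data
– Back up node logs
– Back up node certificates
# 4. Regular cleanup
– Remove unused Pods
– Remove unused images
– Remove unused logs
– Remove unused caches
# 5. Regular audits
– Audit node configuration
– Audit node changes
– Audit node logs
– Audit operation records
This article was compiled and published by Feng Ge's tutorials (风哥教程) for learning and testing only; credit the source when republishing: http://www.fgedu.net.cn/10327.html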
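The regular status checks can start from a single pass over `kubectl get nodes`. The sketch reports every node whose STATUS is not plain `Ready`, which covers both `NotReady` and cordoned (`SchedulingDisabled`) nodes. `report_node_status` is a hypothetical helper; scheduling it from cron or a monitoring system is left to your environment.

```shell
# report_node_status: read `kubectl get nodes --no-headers` output on stdin and
# print nodes whose STATUS is anything other than "Ready".
# Exit status is 1 if any such node was found. Hypothetical helper.
# Usage: kubectl get nodes --no-headers | report_node_status
report_node_status() {
  awk 'NF >= 2 && $2 != "Ready" { print $1 ": " $2; bad = 1 }
       END { exit (bad ? 1 : 0) }'
}
```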
