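In this document, 风哥 (Fengge) walks through importing existing K8s clusters into Rancher and managing them there. It covers imported-cluster concepts, use cases, architecture, preparation, requirements, and network planning, then importing K8s clusters, verifying the import, cluster configuration, cluster management, application deployment, and cluster monitoring. The tutorial draws on the cluster management and cluster import sections of the official Rancher documentation. It is intended for operations engineers to use for study and testing; verify everything yourself before applying it to a production environment.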
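Part01-Basic Concepts and Theory
1.1 Rancher Imported-Cluster Concepts
Importing a cluster into Rancher means registering an existing Kubernetes cluster with the Rancher management platform so that it can be managed centrally. Once a cluster is imported, cluster management, application deployment, monitoring, and alerting can all be done through the Rancher web UI. Rancher can import many kinds of Kubernetes clusters, including self-built clusters and cloud-provider managed clusters. Importing does not change the cluster's existing configuration; it only deploys the Rancher Agent into the cluster, and the Agent handles communication with the Rancher Server.
- Does not affect the existing cluster configuration
- Supports multiple types of Kubernetes clusters
- Manages multiple clusters in one place
- Provides centralized access control
- Supports monitoring and alerting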
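1.2 Rancher Imported-Cluster Use Cases
Importing clusters into Rancher fits the following scenarios:
- Multi-cluster management: manage several Kubernetes clusters from one place
- Onboarding existing clusters: bring clusters that already exist under Rancher management
- Cloud-provider clusters: import managed clusters such as EKS, ACK, and GKE
- Hybrid cloud: manage on-premises and cloud clusters together
- Unified permissions: manage cluster access rights centrally
1.3 Rancher Imported-Cluster Architecture
Architecture of imported clusters in Rancher: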
┌─────────────────────────────────────────────────────────┐
│ Rancher Server │
│ 192.168.1.100 │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Rancher Management │ │
│ │ – Web UI │ │
│ │ – API Server │ │
│ │ – Cluster Management │ │
│ │ – RBAC │ │
│ │ – Monitoring │ │
│ └───────────────────────────────────────────────┘ │
└────────────────────┬────────────────────────────────────┘
│
│ HTTPS
↓
┌─────────────────────────────────────────────────────────┐
│ Imported Clusters │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Cluster 1 (EKS) │ │
│ │ – Rancher Agent │ │
│ │ – Kubernetes Cluster │ │
│ └───────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Cluster 2 (ACK) │ │
│ │ – Rancher Agent │ │
│ │ – Kubernetes Cluster │ │
│ └───────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Cluster 3 (Self-built) │ │
│ │ – Rancher Agent │ │
│ │ – Kubernetes Cluster │ │
│ └───────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
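# Architecture notes
1. Rancher Server: provides the unified management UI and API
2. Rancher Agent: deployed in every imported cluster; it initiates the connection to the Rancher Server
3. Imported Clusters: the Kubernetes clusters that have been imported
4. Communication: secured over HTTPS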
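Because the Agent talks to the Rancher Server over HTTPS, a quick reachability test from a cluster node can save time before importing. The following is a minimal sketch, assuming the Rancher Server answers `pong` on its `/ping` endpoint (the same endpoint queried in the troubleshooting section) and that `curl` is installed. The function name `check_rancher_ping` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the Rancher Server answers "pong" on /ping.
# -k skips certificate verification, which is only appropriate for test setups
# that use self-signed certificates.
check_rancher_ping() {
  local server_url="$1"
  local body
  body="$(curl -ks --max-time 5 "${server_url}/ping")" || return 1
  [ "${body}" = "pong" ]
}

# Example (address taken from the architecture diagram):
# check_rancher_ping https://192.168.1.100 && echo "Rancher Server reachable"
```

Run it from a node inside the cluster to be imported, since that is where the Agent will connect from.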
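Part02-Production Planning and Recommendations
2.1 Rancher Imported-Cluster Preparation
Preparation checklist for importing a cluster into Rancher:
# 1. Rancher Server
- Rancher Server is deployed
- Rancher Server is reachable
- Rancher Server is configured correctly
# 2. Kubernetes cluster
- The Kubernetes cluster is running normally
- The Kubernetes version is compatible
- Cluster nodes have sufficient resources
- Cluster networking is configured correctly
# 3. Network
- The Rancher Server and the cluster can reach each other
- Firewall rules are configured
- DNS resolution is configured
- Certificates are configured
# 4. Permissions
- Kubernetes cluster administrator privileges
- Rancher Server administrator privileges
- Cluster access credentials
# 5. Resources
- Resources for the Rancher Agent
- Resources for the monitoring components
- Resources for the logging components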
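2.2 Rancher Imported-Cluster Requirements
Requirements for importing a cluster into Rancher:
# Kubernetes version (the supported range depends on the Rancher release; check the support matrix for your version)
- Minimum: v1.23.x
- Recommended: v1.25.x - v1.28.x
- Maximum: v1.28.x
# Cluster
- Cluster state: Active
- Node count: at least 1
- Node resources: CPU >= 2 cores, memory >= 4GB
- Network plugin: Calico, Flannel, and similar plugins are supported
# Network
- Latency: < 100ms
- Bandwidth: > 100Mbps
- Availability: 99.9% or higher
- Open ports: 443, 6443, 10250, etc.
# Permissions
- Kubernetes administrator privileges
- Ability to create namespaces
- Ability to deploy Pods
- Ability to create Services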
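The version range above can be checked mechanically before an import. The following is a minimal sketch in plain shell; the function name `k8s_version_supported` is hypothetical, and the bounds 23 and 28 simply mirror the minimum and maximum listed above, so adjust them to whatever your Rancher release supports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a version string such as "v1.27.4"
# falls inside the v1.23 - v1.28 range listed above.
k8s_version_supported() {
  local version="${1#v}"        # strip the leading "v"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text before the second dot
  [ "${major}" -eq 1 ] && [ "${minor}" -ge 23 ] && [ "${minor}" -le 28 ]
}

# Example, reading the server version from the current kubectl context:
# k8s_version_supported "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" \
#   || echo "Kubernetes version is outside the planned range"
```

Vendor suffixes such as `-eks-...` after the patch number do not affect the result, because only the major and minor fields are compared.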
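2.3 Rancher Imported-Cluster Network Planning
Network plan for importing clusters into Rancher:
# IP address plan
Rancher Server: 192.168.1.100
Cluster 1 (EKS): 10.0.1.0/24
Cluster 2 (ACK): 10.0.2.0/24
Cluster 3 (Self-built): 10.0.3.0/24
# Port plan
Rancher Server:
  443/tcp: HTTPS access
  6443/tcp: Kubernetes API
Kubernetes cluster:
  6443/tcp: Kubernetes API
  10250/tcp: Kubelet API
  10259/tcp: kube-scheduler (secure port; replaces the deprecated insecure port 10251)
  10257/tcp: kube-controller-manager (secure port; replaces the deprecated insecure port 10252)
# Firewall rules
Open ports 443/tcp and 6443/tcp
Restrict the source IPs that are allowed to connect
Configure port-forwarding rules
# DNS records
rancher.fgedu.net.cn -> 192.168.1.100
cluster1.fgedu.net.cn -> 10.0.1.100
cluster2.fgedu.net.cn -> 10.0.2.100
cluster3.fgedu.net.cn -> 10.0.3.100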
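The firewall items above (open specific ports, restrict source IPs) can be expressed as firewalld rich rules. The following is a minimal sketch, assuming the host runs firewalld, which the original does not state. The helper `print_firewall_rules` is hypothetical and only prints the commands so they can be reviewed before anyone runs them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) firewalld commands that allow
# the given TCP ports only from one source CIDR, followed by a reload.
print_firewall_rules() {
  local source_cidr="$1"
  shift
  local port
  for port in "$@"; do
    printf '%s\n' "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"${source_cidr}\" port port=\"${port}\" protocol=\"tcp\" accept'"
  done
  printf '%s\n' "firewall-cmd --reload"
}

# Example: on the Rancher Server, allow the Cluster 1 subnet to reach port 443.
# print_firewall_rules 10.0.1.0/24 443
```

Printing instead of executing is a deliberate choice here: firewall changes on a management server are easier to audit when the exact commands are reviewed first.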
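Part03-Production Implementation Plan
3.1 Importing K8s Clusters into Rancher
3.1.1 Importing an EKS Cluster into Rancher
# Step 1: Log in to the Rancher management UI
# Step 2: Click "Clusters" - "Import Existing Cluster"
# Step 3: Select "Amazon EKS"
# Step 4: Fill in the cluster details:
#   Cluster name: fgedu-eks-cluster
#   Cluster description: Rancher EKS test cluster
#   EKS cluster name: fgedu-eks
#   AWS region: us-west-2
# Step 5: Click "Next"
# Step 6: Choose an import method:
#   Use the AWS CLI
#   Use a kubeconfig file
# Step 7: Copy the import command
# Step 8: Run the import command locally
# Step 9: Click "Done"
# Import the EKS cluster with kubectl
[root@fgedu ~]# aws eks update-kubeconfig --name fgedu-eks --region us-west-2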
Added new context arn:aws:eks:us-west-2:123456789012:cluster/fgedu-eks to /root/.kube/config
# Verify the connection to the EKS cluster
[root@fgedu ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-1-100.us-west-2.compute.internal Ready
ip-10-0-1-101.us-west-2.compute.internal Ready
ip-10-0-1-102.us-west-2.compute.internal Ready
# Create the Rancher Agent namespace
[root@fgedu ~]# kubectl create namespace cattle-system
namespace/cattle-system created
# Apply the Rancher Agent manifest
[root@fgedu ~]# kubectl apply -f https://192.168.1.100/v3/import/1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef.yaml
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver created
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master created
namespace/cattle-system unchanged
serviceaccount/cattle created
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding created
secret/cattle-credentials-1234567890 created
deployment.apps/cattle-cluster-agent created
daemonset.apps/cattle-node-agent created
# Check the Agent status
[root@fgedu ~]# kubectl get pods -n cattle-system
NAME READY STATUS RESTARTS AGE
cattle-cluster-agent-1234567890-abcde 1/1 Running 0 1m
cattle-node-agent-12345678 1/1 Running 0 1m
cattle-node-agent-2345678901 1/1 Running 0 1m
cattle-node-agent-3456789012 1/1 Running 0 1m
# View the Agent logs
[root@fgedu ~]# kubectl logs -n cattle-system cattle-cluster-agent-1234567890-abcde -f
INFO: Starting Rancher Agent
INFO: Connected to Rancher Server
INFO: Cluster registered successfully
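Two small wrappers can make the import step more repeatable. This is a sketch under stated assumptions, not Rancher's official procedure, and both function names are hypothetical. `apply_import_manifest` downloads the manifest with `curl --insecure` and pipes it to `kubectl apply -f -`, which helps when the Rancher Server uses a self-signed certificate that `kubectl apply -f <url>` would reject. `wait_for_cluster_agent` blocks until the `cattle-cluster-agent` Deployment has rolled out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch the import manifest without verifying the
# server certificate (test environments only) and apply it to the cluster.
apply_import_manifest() {
  local manifest_url="$1"
  curl --insecure -sfL "${manifest_url}" | kubectl apply -f -
}

# Hypothetical helper: wait until the cattle-cluster-agent Deployment is
# fully rolled out, or fail once the timeout (default 300s) expires.
wait_for_cluster_agent() {
  local timeout="${1:-300s}"
  kubectl -n cattle-system rollout status deployment/cattle-cluster-agent \
    --timeout="${timeout}"
}

# Example (use the manifest URL that the Rancher UI generated for your cluster):
# apply_import_manifest "https://192.168.1.100/v3/import/<token>.yaml"
# wait_for_cluster_agent 120s
```

Using `rollout status` avoids polling `kubectl get pods` by hand and gives a non-zero exit code when the Agent never becomes ready.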
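3.1.2 Importing an ACK Cluster into Rancher
# Step 1: Log in to the Rancher management UI
# Step 2: Click "Clusters" - "Import Existing Cluster"
# Step 3: Select "Alibaba ACK"
# Step 4: Fill in the cluster details:
#   Cluster name: fgedu-ack-cluster
#   Cluster description: Rancher ACK test cluster
#   ACK cluster ID: c1234567890abcdef1234567890abcdef
#   ACK region: cn-hangzhou
# Step 5: Click "Next"
# Step 6: Choose an import method:
#   Use the ACK CLI
#   Use a kubeconfig file
# Step 7: Copy the import command
# Step 8: Run the import command locally
# Step 9: Click "Done"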
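# Import the ACK cluster with kubectl: fetch its kubeconfig first
# (kubectl config use-context does not read from stdin, so the config field of the API response is written to a file instead; the file name config-ack is an assumption)
[root@fgedu ~]# aliyun cs GET /k8s/c1234567890abcdef1234567890abcdef/user_config | jq -r '.config' > ~/.kube/config-ack
[root@fgedu ~]# export KUBECONFIG=~/.kube/config-ack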
# Verify the connection to the ACK cluster
[root@fgedu ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
cn-hangzhou.192.168.1.100 Ready
cn-hangzhou.192.168.1.101 Ready
cn-hangzhou.192.168.1.102 Ready
# Create the Rancher Agent namespace
[root@fgedu ~]# kubectl create namespace cattle-system
namespace/cattle-system created
# Apply the Rancher Agent manifest
[root@fgedu ~]# kubectl apply -f https://192.168.1.100/v3/import/2345678901bcdef2345678901bcdef2345678901bcdef2345678901bcdef2345678.yaml
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver created
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master created
namespace/cattle-system unchanged
serviceaccount/cattle created
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding created
secret/cattle-credentials-2345678901 created
deployment.apps/cattle-cluster-agent created
daemonset.apps/cattle-node-agent created
# Check the Agent status
[root@fgedu ~]# kubectl get pods -n cattle-system
NAME READY STATUS RESTARTS AGE
cattle-cluster-agent-2345678901-abcde 1/1 Running 0 1m
cattle-node-agent-23456789 1/1 Running 0 1m
cattle-node-agent-34567890 1/1 Running 0 1m
cattle-node-agent-45678901 1/1 Running 0 1m
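3.2 Verifying the Rancher Import
3.2.1 Verifying the Imported Clusters
# Step 1: Log in to the Rancher management UI
# Step 2: Click "Clusters"
# Step 3: Review the list of imported clusters
# Step 4: Click a cluster name to open its details
# Step 5: Check the cluster state, nodes, Pods, and so on
# Verify the import through the API
# (admin:password is a placeholder; the Rancher API normally authenticates with an API key passed as access-key:secret-key)
[root@fgedu ~]# curl -k -u "admin:password" \
  https://192.168.1.100/v3/clusters
{
  "data": [
    {
      "id": "c-1234567890",
      "type": "cluster",
      "name": "fgedu-eks-cluster",
      "description": "Rancher EKS test cluster",
      "state": "active",
      "transitioning": "no"
    },
    {
      "id": "c-2345678901",
      "type": "cluster",
      "name": "fgedu-ack-cluster",
      "description": "Rancher ACK test cluster",
      "state": "active",
      "transitioning": "no"
    }
  ]
}
# List the cluster's nodes
[root@fgedu ~]# curl -k -u "admin:password" \
  https://192.168.1.100/v3/clusters/c-1234567890/nodes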
{
  "data": [
    {
      "id": "node-1234567890",
      "type": "node",
      "name": "ip-10-0-1-100.us-west-2.compute.internal",
      "state": "active",
      "transitioning": "no"
    },
    {
      "id": "node-2345678901",
      "type": "node",
      "name": "ip-10-0-1-101.us-west-2.compute.internal",
      "state": "active",
      "transitioning": "no"
    },
    {
      "id": "node-3456789012",
      "type": "node",
      "name": "ip-10-0-1-102.us-west-2.compute.internal",
      "state": "active",
      "transitioning": "no"
    }
  ]
}
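When many clusters are registered, scanning the JSON by eye gets tedious. The following is a minimal sketch that filters a `/v3/clusters` response for clusters whose `state` is not `active`, assuming `jq` is installed and the response has the `data[].name` and `data[].state` fields shown above. The function name `list_inactive_clusters` and the `RANCHER_API_KEY` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a /v3/clusters response on stdin and prints
# "name<TAB>state" for every cluster whose state is not "active".
list_inactive_clusters() {
  jq -r '.data[] | select(.state != "active") | "\(.name)\t\(.state)"'
}

# Example (RANCHER_API_KEY is an assumed variable holding access-key:secret-key):
# curl -ks -u "${RANCHER_API_KEY}" https://192.168.1.100/v3/clusters | list_inactive_clusters
```

Empty output means every cluster in the response reported `active`.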
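3.3 Rancher Cluster Configuration
3.3.1 Configuring Cluster Monitoring
# Step 1: Log in to the Rancher management UI
# Step 2: Select the cluster - click "Tools" - "Monitoring"
# Step 3: Click "Enable"
# Step 4: Select the monitoring version: v0.45.0
# Step 5: Set the monitoring parameters:
#   Prometheus storage: 50Gi
#   Grafana storage: 10Gi
#   Alert rules: enabled
# Step 6: Click "Install"
# Configure cluster monitoring from the CLI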
[root@fgedu ~]# kubectl create namespace cattle-monitoring-system
namespace/cattle-monitoring-system created
[root@fgedu ~]# helm repo add rancher-monitoring https://charts.rancher.io
"rancher-monitoring" has been added to your repositories
[root@fgedu ~]# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "rancher-monitoring" chart repository
Update Complete. ⎈Happy Helming!⎈
[root@fgedu ~]# helm install rancher-monitoring rancher-monitoring/rancher-monitoring \
  --namespace cattle-monitoring-system \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=gp2 \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.size=10Gi \
  --set grafana.persistence.storageClassName=gp2
NAME: rancher-monitoring
LAST DEPLOYED: Fri Apr 10 10:00:00 2026
NAMESPACE: cattle-monitoring-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rancher Monitoring Stack has been installed successfully!
Access Grafana:
kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-grafana 3000:80
Access Prometheus:
kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090:9090
# Check the status of the monitoring components
[root@fgedu ~]# kubectl get pods -n cattle-monitoring-system
NAME READY STATUS RESTARTS AGE
rancher-monitoring-operator-1234567890-abcde 1/1 Running 0 2m
rancher-monitoring-prometheus-0 2/2 Running 0 2m
rancher-monitoring-grafana-1234567890-abcde 1/1 Running 0 2m
rancher-monitoring-alertmanager-0 1/1 Running 0 2m
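Part04-Production Cases and Hands-On Walkthrough
4.1 Rancher Cluster Management
4.1.1 Viewing Cluster Information
# Step 1: Log in to the Rancher management UI
# Step 2: Click "Clusters"
# Step 3: Select the cluster
# Step 4: Review the cluster overview, nodes, workloads, storage, and so on
# View cluster information with kubectl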
[root@fgedu ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-1-100.us-west-2.compute.internal Ready
ip-10-0-1-101.us-west-2.compute.internal Ready
ip-10-0-1-102.us-west-2.compute.internal Ready
# View node details
[root@fgedu ~]# kubectl describe node ip-10-0-1-100.us-west-2.compute.internal
Name: ip-10-0-1-100.us-west-2.compute.internal
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m5.large
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-west-2
failure-domain.beta.kubernetes.io/zone=us-west-2a
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-0-1-100
kubernetes.io/os=linux
node-role.kubernetes.io/worker=
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Wed, 01 Apr 2026 00:00:00 +0000
Taints:
Unschedulable: false
Conditions:
Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
----            ------  -----------------                ------------------               ------                      -------
MemoryPressure  False   Fri, 10 Apr 2026 10:00:00 +0000  Wed, 01 Apr 2026 00:00:00 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory available
DiskPressure    False   Fri, 10 Apr 2026 10:00:00 +0000  Wed, 01 Apr 2026 00:00:00 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
PIDPressure     False   Fri, 10 Apr 2026 10:00:00 +0000  Wed, 01 Apr 2026 00:00:00 +0000  KubeletHasSufficientPID     kubelet has sufficient PID available
Ready           True    Fri, 10 Apr 2026 10:00:00 +0000  Wed, 01 Apr 2026 00:00:00 +0000  KubeletReady                kubelet is posting ready status
Addresses:
InternalIP: 10.0.1.100
ExternalIP: 54.123.45.67
Hostname: ip-10-0-1-100.us-west-2.compute.internal
Capacity:
cpu: 2
ephemeral-storage: 104857600Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8192000Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 104857600Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8089580Ki
pods: 110
System Info:
Machine ID: 1234567890abcdef1234567890abcdef
System UUID: 12345678-90AB-CDEF-1234-567890ABCDEF
Boot ID: abcdef12-3456-7890-abcd-ef1234567890
Kernel Version: 5.4.0-1049-aws
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.7
Kubelet Version: v1.28.5
Kube-Proxy Version: v1.28.5
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
Non-terminated Pods: (8 in total)
Namespace      Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits
---------      ----                                   ------------  ----------  ---------------  -------------
cattle-system  cattle-cluster-agent-1234567890-abcde  100m (5%)     0 (0%)      256Mi (3%)       0 (0%)
cattle-system  cattle-node-agent-12345678             100m (5%)     0 (0%)      256Mi (3%)       0 (0%)
kube-system    coredns-1234567890-abcde               100m (5%)     0 (0%)      128Mi (1%)       0 (0%)
kube-system    kube-proxy-12345678                    100m (5%)     0 (0%)      128Mi (1%)       0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource  Requests    Limits
--------  --------    ------
cpu       400m (20%)  0 (0%)
memory    768Mi (9%)  0 (0%)
Events:
Type    Reason                   Age  From     Message
----    ------                   ---  ----     -------
Normal  Starting                 10d  kubelet  Starting kubelet.
Normal  NodeAllocatableEnforced  10d  kubelet  Updated Node Allocatable limit across pods
Normal  NodeReady                10d  kubelet  Node ip-10-0-1-100.us-west-2.compute.internal status is now: NodeReady
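4.2 Deploying Applications with Rancher
4.2.1 Deploying an Application
# Step 1: Log in to the Rancher management UI
# Step 2: Select the cluster - click "Workloads" - "Deploy"
# Step 3: Fill in the application details:
#   Name: fgedu-nginx
#   Namespace: default
#   Image: nginx:latest
#   Port mapping: 80:80
#   Replicas: 3
# Step 4: Click "Deploy"
# Deploy the application with kubectl
[root@fgedu ~]# kubectl create deployment fgedu-nginx --image=nginx:latest --replicas=3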
deployment.apps/fgedu-nginx created
# Check the deployment status
[root@fgedu ~]# kubectl get deployments fgedu-nginx
NAME READY UP-TO-DATE AVAILABLE AGE
fgedu-nginx 3/3 3 3 1m
# Check the Pod status
[root@fgedu ~]# kubectl get pods -l app=fgedu-nginx
NAME READY STATUS RESTARTS AGE
fgedu-nginx-1234567890-abcde 1/1 Running 0 1m
fgedu-nginx-1234567890-fghij 1/1 Running 0 1m
fgedu-nginx-1234567890-klmno 1/1 Running 0 1m
# Create the Service
[root@fgedu ~]# kubectl expose deployment fgedu-nginx --port=80 --type=LoadBalancer
service/fgedu-nginx exposed
# Check the Service status
[root@fgedu ~]# kubectl get svc fgedu-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fgedu-nginx LoadBalancer 10.100.123.45 a1b2c3d4e5f6g7h8 80:31234/TCP 1m
# Test access to the application
[root@fgedu ~]# curl http://a1b2c3d4e5f6g7h8
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.
Thank you for using nginx.
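A LoadBalancer Service does not receive its external address immediately, and cloud providers differ in whether they publish a hostname (typical on EKS) or an IP. The following is a minimal sketch that reads whichever one is present through a `kubectl` JSONPath query. The function name `lb_address` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the external hostname of a LoadBalancer Service,
# falling back to its external IP; returns non-zero while neither is assigned.
lb_address() {
  local svc="$1"
  local ns="${2:-default}"
  local addr
  addr="$(kubectl -n "${ns}" get svc "${svc}" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
  if [ -z "${addr}" ]; then
    addr="$(kubectl -n "${ns}" get svc "${svc}" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
  fi
  [ -n "${addr}" ] && printf '%s\n' "${addr}"
}

# Example: retry until the address is assigned, then test the application.
# until addr="$(lb_address fgedu-nginx)"; do sleep 5; done
# curl "http://${addr}"
```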
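4.3 Rancher Cluster Monitoring
4.3.1 Viewing Monitoring Data
# Step 1: Log in to the Rancher management UI
# Step 2: Select the cluster - click "Monitoring"
# Step 3: Open the Grafana dashboards
# Step 4: Review the Prometheus data
# Step 5: Review the alert rules
# Check the monitoring components with kubectl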
[root@fgedu ~]# kubectl get pods -n cattle-monitoring-system
NAME READY STATUS RESTARTS AGE
rancher-monitoring-operator-1234567890-abcde 1/1 Running 0 10m
rancher-monitoring-prometheus-0 2/2 Running 0 10m
rancher-monitoring-grafana-1234567890-abcde 1/1 Running 0 10m
rancher-monitoring-alertmanager-0 1/1 Running 0 10m
# Port-forward to reach Grafana
[root@fgedu ~]# kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-grafana 3000:80 &
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
# Access Grafana
# URL: http://localhost:3000
# Username: admin
# Password: prom-operator
# Port-forward to reach Prometheus
[root@fgedu ~]# kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090:9090 &
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
# Access Prometheus
# URL: http://localhost:9090
# List Prometheus metric names
[root@fgedu ~]# curl http://localhost:9090/api/v1/label/__name__/values | jq -r '.data[]' | head -20
up
process_start_time_seconds
promhttp_metric_handler_requests_total
promhttp_metric_handler_requests_in_flight
go_gc_duration_seconds
go_gc_duration_seconds_count
go_gc_duration_seconds_sum
go_goroutines
go_info
go_memstats_alloc_bytes
go_memstats_alloc_bytes_total
go_memstats_buck_hash_sys_bytes
go_memstats_gc_sys_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_idle_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_objects
go_memstats_heap_released_bytes
go_memstats_heap_sys_bytes
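Part05-风哥's Lessons Learned
5.1 Best Practices for Importing Clusters into Rancher
Best practices for importing clusters into Rancher:
- Network planning: make sure the Rancher Server and the clusters can reach each other
- Certificates: use valid SSL certificates to secure communication
- Access control: configure appropriate RBAC permissions
- Monitoring and alerting: set up monitoring and alerting so problems are found and handled promptly
- Backup and recovery: back up cluster configuration and data regularly, and keep off-site backups
- Documentation: record the import process and configuration so the knowledge can be handed over
- Regular checks: check cluster state regularly and confirm the Agent is running
5.2 Troubleshooting Rancher Cluster Imports
Troubleshooting cluster imports in Rancher:
# Problem 1: The Agent cannot connect to the Rancher Server
# Symptom: the Agent Pod is in CrashLoopBackOff or Error
# Causes: no network connectivity, invalid certificate, misconfiguration
# Resolution: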
[root@fgedu ~]# kubectl get pods -n cattle-system
[root@fgedu ~]# kubectl logs -n cattle-system cattle-cluster-agent-1234567890-abcde
[root@fgedu ~]# kubectl describe pod -n cattle-system cattle-cluster-agent-1234567890-abcde
[root@fgedu ~]# ping 192.168.1.100
[root@fgedu ~]# curl -k https://192.168.1.100/ping
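# Problem 2: The cluster state shows Unknown
# Symptom: the cluster state shows Unknown or Error
# Causes: the Agent is not running normally, no network connectivity, misconfiguration
# Resolution: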
[root@fgedu ~]# kubectl get pods -n cattle-system
[root@fgedu ~]# kubectl get nodes
[root@fgedu ~]# kubectl get events --sort-by=.metadata.creationTimestamp
[root@fgedu ~]# kubectl logs -n cattle-system cattle-cluster-agent-1234567890-abcde
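# Problem 3: Monitoring data is not displayed
# Symptom: the Grafana UI shows no data
# Causes: Prometheus is not running, misconfiguration, permission problems
# Resolution: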
[root@fgedu ~]# kubectl get pods -n cattle-monitoring-system
[root@fgedu ~]# kubectl logs -n cattle-monitoring-system rancher-monitoring-prometheus-0
[root@fgedu ~]# kubectl get prometheusrules -n cattle-monitoring-system
[root@fgedu ~]# kubectl get servicemonitors -n cattle-monitoring-system
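# Problem 4: Application deployment fails
# Symptom: the Pod is in ImagePullBackOff or ErrImagePull
# Causes: the image does not exist, the image registry is misconfigured, insufficient permissions
# Resolution:
[root@fgedu ~]# kubectl describe pod
[root@fgedu ~]# kubectl get secrets
# The secret only takes effect once the Pod spec (or its ServiceAccount) references it under imagePullSecrets
[root@fgedu ~]# kubectl create secret docker-registry harbor-secret \
  --docker-server=harbor.fgedu.net.cn \
  --docker-username=admin \
  --docker-password=Harbor@123456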
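Most of the checks above start by looking for Pods that are not healthy. The following is a minimal sketch that lists them for one namespace by filtering the STATUS column of `kubectl get pods`. The function name `list_unhealthy_pods` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the names of Pods in a namespace whose STATUS
# column is neither Running nor Completed.
list_unhealthy_pods() {
  local ns="$1"
  kubectl get pods -n "${ns}" --no-headers \
    | awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example: inspect every unhealthy Pod in the Agent namespace.
# for pod in $(list_unhealthy_pods cattle-system); do
#   kubectl describe pod -n cattle-system "${pod}"
# done
```

The same helper works for `cattle-monitoring-system` or an application namespace, so one loop can cover problems 1 through 4.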
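5.3 Rancher Cluster Maintenance
Maintenance tasks for clusters managed by Rancher:
# 1. Regular checks
- Check the cluster state
- Check the Agent state
- Check the node state
- Check the Pod state
# 2. Monitoring and alerting
- Configure monitoring metrics
- Configure alert rules
- Configure notification channels
- Review alerts regularly
# 3. Backup and recovery
- Back up the cluster configuration
- Back up application data
- Test restoring from backup
- Configure off-site backups
# 4. Updates and upgrades
- Update the Rancher Agent
- Update the Kubernetes version
- Update application versions
- Test the upgrade procedure
# 5. Security hardening
- Renew SSL certificates
- Configure access control
- Audit logs regularly
- Update security policies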
This article was compiled and published by 风哥教程 for study and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html
