Rancher Tutorial FG001-Rancher Architecture Principles and Official Core Features: A Hands-On Production Walkthrough

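In this document Fengge covers Rancher's architecture and official core features from a production point of view: Rancher concepts and career outlook, architecture, official core features, hardware requirements, operating system environment, runtime platform environment, environment checks and preparation, Docker installation, network configuration and tuning, unified multi-cluster management, RBAC configuration, and monitoring and alerting. The tutorial draws on the official Rancher documentation (quick start, installation and upgrade, cluster management). It is intended for operations engineers to use for learning and testing; verify every step yourself before applying it to production.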

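Part01-Basic Concepts and Theory

1.1 Rancher Concepts and Career Outlook

Rancher is an open-source Kubernetes management platform. It manages multiple Kubernetes clusters from one place and provides centralized authentication, RBAC access control, monitoring and alerting, and log collection. Built on Kubernetes, Rancher gives enterprises a full-stack container management solution. Learning Rancher builds core skills in container orchestration, cluster management, and microservice deployment, all of which are in steady demand in the cloud-native job market.

Why learn Rancher:

  • A must-have skill in the cloud-native stack
  • An enterprise-grade container management platform
  • Unified multi-cluster management
  • A core tool for DevOps operations
  • Steadily growing market demand

1.2 Rancher Architecture

Rancher uses a layered architecture built from the following core components:

  • Rancher Server: the core service, which provides the Web UI and the API
  • Cluster Agent: an agent deployed in every Kubernetes cluster that communicates with Rancher Server
  • Fleet: the multi-cluster application delivery system, which supports GitOps workflows
  • Rancher Manager: the cluster management engine, which supports Kubernetes distributions such as RKE, RKE2, and K3s
  • Authentication and authorization: supports local users, LDAP, OIDC, SAML, and other authentication methods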
# Rancher architecture layers
┌─────────────────────────────────────┐
│           Rancher Server            │
│     (Web UI + API + Management)     │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│            Cluster Agent            │
│   (deployed in every K8s cluster)   │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│          Managed Clusters           │
│     (RKE/RKE2/K3s/EKS/ACK/GKE)      │
└─────────────────────────────────────┘

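1.3 Official Core Features of Rancher

Rancher's official core features include:

  • Unified multi-cluster management: manage Kubernetes clusters from different cloud providers alongside self-hosted ones
  • Centralized authentication: one place for user authentication and RBAC access control across all clusters
  • App catalog integration: a built-in Helm chart catalog for one-click deployment of common applications
  • Fleet multi-cluster delivery: GitOps-style application rollout and upgrades across clusters
  • Monitoring and alerting: integrated Prometheus and Grafana for fine-grained monitoring and alerts
  • Log collection: ship logs to external logging systems
  • CI/CD integration: connect to external CI/CD systems for automated deployment
Fengge's tip: Rancher's core value lies in unified multi-cluster management and centralized access control, which makes it a good fit for building an enterprise container platform.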

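Part02-Production Environment Planning and Recommendations

2.1 Rancher Hardware Requirements

Hardware requirements depend on the size of the deployment:

# Single-node deployment
CPU: 4 cores or more
Memory: 8 GB or more
Disk: 100 GB SSD or more
Network: 1 GbE NIC

# High-availability deployment
CPU: 8 cores or more
Memory: 16 GB or more
Disk: 200 GB SSD or more
Network: 1 GbE NIC

# Enterprise deployment
CPU: 16 cores or more
Memory: 32 GB or more
Disk: 500 GB SSD or more
Network: 10 GbE NIC

# Storage plan
System disk: 100 GB SSD
Data disk: /Rancher/fgdata, 200 GB SSD
Log disk: /var/log, 50 GB SSD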
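
The tiers above can be turned into a quick pre-flight check. The following is a minimal sketch, assuming a Linux host that provides `nproc`, `awk`, and `/proc/meminfo`; the function names (`meets_minimum`, `host_cores`, `host_mem_gib`, `check_single_node_tier`) are illustrative and are not part of Rancher.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names, not a Rancher tool).

# meets_minimum CORES MEM_GIB MIN_CORES MIN_MEM_GIB
# Succeeds only when both values reach their minimums.
meets_minimum() {
    local cores=$1 mem_gib=$2 min_cores=$3 min_mem_gib=$4
    [ "$cores" -ge "$min_cores" ] && [ "$mem_gib" -ge "$min_mem_gib" ]
}

# Number of CPUs available on this host.
host_cores() {
    nproc
}

# Total memory in GiB, rounded up, because a nominal "8 GB" host
# reports slightly less than 8 GiB in /proc/meminfo (values are in KiB).
host_mem_gib() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }' /proc/meminfo
}

# Compare this host against the single-node tier (4 cores, 8 GB).
check_single_node_tier() {
    if meets_minimum "$(host_cores)" "$(host_mem_gib)" 4 8; then
        echo "OK: host meets the single-node tier"
    else
        echo "WARN: host is below the single-node tier" >&2
        return 1
    fi
}
```

Calling `check_single_node_tier` on the target host prints one line; swap the last two arguments of `meets_minimum` for `8 16` or `16 32` to test the larger tiers.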

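2.2 Rancher Operating System Environment

Operating systems on which Rancher can be deployed:

# Supported operating systems
Oracle Linux 9.3 / RHEL 9.3
Oracle Linux 8.x / RHEL 8.x
Oracle Linux 7.x / RHEL 7.x
Ubuntu 20.04 / 22.04
CentOS 7.x / 8.x
Kylin v10 SP3 (Chinese domestic OS)
openEuler 22.03 LTS

# Kernel version requirements
Minimum Linux kernel: 3.10
Recommended kernel: 4.18 or later

# Architecture requirements
x86_64
ARM64 (partial support)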
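
The kernel requirement can be checked with a version-aware comparison rather than by eye. This is a minimal sketch, assuming GNU `sort` (for the `-V` version sort); `version_ge` and `kernel_base_version` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on GNU sort -V to order dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Kernel release without the distribution suffix,
# for example 5.14.0 from 5.14.0-284.11.1.el9_3.x86_64.
kernel_base_version() {
    uname -r | cut -d- -f1
}

# Report how the running kernel compares with the 3.10 minimum
# and the 4.18 recommendation listed above.
check_kernel() {
    local kernel
    kernel=$(kernel_base_version)
    if version_ge "$kernel" 4.18; then
        echo "kernel $kernel: meets the recommended version"
    elif version_ge "$kernel" 3.10; then
        echo "kernel $kernel: meets the minimum only"
    else
        echo "kernel $kernel: too old" >&2
        return 1
    fi
}
```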

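2.3 Rancher Runtime Platform Environment

Runtime platform requirements for Rancher:

# Docker requirements
Docker version: 20.10.x or later
Docker CE or Docker EE
containerd version: 1.5.x or later

# Kubernetes cluster requirements
Kubernetes version: 1.23.x - 1.28.x
RKE, RKE2, and K3s are supported
Managed cloud services (EKS, ACK, GKE) are supported

# Network requirements
Network latency: < 10 ms (within the same data center)
Network bandwidth: > 1 Gbps
Firewall rules: open the required ports

Production recommendation: deploy Rancher in a high-availability architecture with at least 3 nodes, use SSD storage, and make sure the network is stable.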

Part03-Production Implementation Plan

3.1 Rancher Environment Checks and Preparation

3.1.1 Checking the System Environment

# Check system information
[root@fgedu ~]# hostnamectl
Static hostname: fgedu.net.cn
Icon name: computer-vm
Chassis: vm
Machine ID: 1234567890abcdef1234567890abcdef
Boot ID: abcdef1234567890abcdef1234567890
Virtualization: kvm
Operating System: Oracle Linux Server 9.3
CPE OS Name: cpe:/o:oracle:linux:9:3
Kernel: Linux 5.14.0-284.11.1.el9_3.x86_64
Architecture: x86-64

# Check CPU information
[root@fgedu ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2200.000
BogoMIPS: 4400.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 16896K
NUMA node0 CPU(s): 0-7

# Check memory information
[root@fgedu ~]# free -h
total used free shared buff/cache available
Mem: 15Gi 2.1Gi 11Gi 12Mi 2.2Gi 12Gi
Swap: 4.0Gi 0B 4.0Gi

# Check disk information
[root@fgedu ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.5G 0 7.5G 0% /dev
tmpfs 7.6G 0 7.6G 0% /dev/shm
tmpfs 3.0G 9.5M 3.0G 1% /run
/dev/sda1 200G 15G 185G 8% /
/dev/sdb1 200G 10G 190G 5% /Rancher/fgdata
tmpfs 1.5G 0 1.5G 0% /run/user/0

# Check network information
[root@fgedu ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:1a:4a:16:01:4a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.100/24 brd 192.168.1.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::21a:4aff:fe16:14a/64 scope link
       valid_lft forever preferred_lft forever

3.1.2 Tuning System Parameters

# Configure kernel parameters
[root@fgedu ~]# cat >> /etc/sysctl.conf << EOF
# Rancher kernel parameter tuning
fs.file-max = 6815744
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.conf.all.forwarding = 1
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 16384
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 16384
vm.max_map_count = 262144
vm.swappiness = 10
EOF

# Apply the kernel parameters
[root@fgedu ~]# sysctl -p
fs.file-max = 6815744
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.conf.all.forwarding = 1
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 16384
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 16384
vm.max_map_count = 262144
vm.swappiness = 10

# Configure file descriptor limits
[root@fgedu ~]# cat >> /etc/security/limits.conf << EOF
# Rancher file descriptor limits
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
EOF

# Configure systemd limits
[root@fgedu ~]# mkdir -p /etc/systemd/system.conf.d
[root@fgedu ~]# cat > /etc/systemd/system.conf.d/rancher.conf << EOF
[Manager]
DefaultLimitNOFILE=65536
DefaultLimitNPROC=65536
EOF

# Reload the systemd configuration
[root@fgedu ~]# systemctl daemon-reload
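
The two `net.bridge.*` keys exist only while the `br_netfilter` kernel module is loaded; on a host where it is not loaded, `sysctl -p` reports those keys as missing. The sketch below shows one way to check for this and load the module. It assumes a systemd-based host; the helper names and the file name `/etc/modules-load.d/br_netfilter.conf` are choices made for this example.

```shell
#!/usr/bin/env bash
# sysctl_key_path KEY: print the /proc/sys path for a dotted sysctl key,
# for example net.ipv4.ip_forward -> /proc/sys/net/ipv4/ip_forward.
sysctl_key_path() {
    printf '/proc/sys/%s\n' "${1//.//}"
}

# Load br_netfilter if the bridge sysctl keys are absent, and make the
# module load persist across reboots. Must be run as root.
ensure_bridge_sysctls() {
    local key=net.bridge.bridge-nf-call-iptables
    if [ ! -e "$(sysctl_key_path "$key")" ]; then
        modprobe br_netfilter
        # Hypothetical file name; any *.conf in this directory works.
        echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
    fi
}
```

Run `ensure_bridge_sysctls` before `sysctl -p` so that every key in the list can be applied.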

3.2 Installing Docker for Rancher

3.2.1 Installing Docker

# Remove old Docker versions
[root@fgedu ~]# yum remove -y docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine

# Install the packages Docker depends on
[root@fgedu ~]# yum install -y yum-utils device-mapper-persistent-data lvm2

# Add the Docker repository
[root@fgedu ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker CE
[root@fgedu ~]# yum install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Start the Docker service
[root@fgedu ~]# systemctl start docker
[root@fgedu ~]# systemctl enable docker

# Verify the Docker installation
[root@fgedu ~]# docker --version
Docker version 24.0.7, build afdd53b

# Show Docker information
[root@fgedu ~]# docker info
Client: Docker Engine - Community
Version: 24.0.7
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
compose: Docker Compose (Docker Inc.)
Version: v2.21.0

Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 24.0.7
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: 816ae42871e7f6c6e3d1b7e8b3c2d4e5f6a7b8c9d
runc version: v1.1.9-0-gccaecfb
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 5.14.0-284.11.1.el9_3.x86_64
Operating System: Oracle Linux Server 9.3
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.6GiB
Name: fgedu.net.cn
ID: ABCD1234EFGH5678
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

3.2.2 Tuning the Docker Configuration

# Configure the Docker daemon
[root@fgedu ~]# mkdir -p /etc/docker
[root@fgedu ~]# cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": [
    "https://docker.mirrors.ustc.edu.cn",
    "https://hub-mirror.c.163.com"
  ],
  "data-root": "/Rancher/fgdata/docker",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "max-concurrent-uploads": 10,
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65536,
      "Soft": 65536
    }
  }
}
EOF

# Restart the Docker service
[root@fgedu ~]# systemctl restart docker
[root@fgedu ~]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2026-04-10 10:00:00 CST; 5s ago
Docs: https://docs.docker.com
Main PID: 12345 (dockerd)
Tasks: 10
Memory: 45.2M
CGroup: /system.slice/docker.service
└─12345 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Apr 10 10:00:00 fgedu.net.cn dockerd[12345]: time="2026-04-10T10:00:00.123456789+08:00" level=info msg="API listen on /var/run/docker.sock"
Apr 10 10:00:00 fgedu.net.cn systemd[1]: Started Docker Application Container Engine.

# Verify the Docker configuration
[root@fgedu ~]# docker info | grep -A 5 "Storage Driver"
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
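
A syntax error in `/etc/docker/daemon.json` prevents the Docker daemon from starting, so it is worth validating the file before the restart. The sketch below is one possible approach and makes two assumptions that are not in the original steps: `python3` is installed on the host, and the new configuration is first written to a candidate file. `json_is_valid` and `apply_daemon_json` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# json_is_valid FILE: succeeds when FILE parses as JSON.
# Assumes python3 is available; json.tool is part of its standard library.
json_is_valid() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# apply_daemon_json CANDIDATE: install the candidate file and restart
# Docker only if it is valid JSON. Must be run as root.
apply_daemon_json() {
    local candidate=$1
    if json_is_valid "$candidate"; then
        install -m 0644 "$candidate" /etc/docker/daemon.json
        systemctl restart docker
    else
        echo "invalid JSON in $candidate, leaving Docker untouched" >&2
        return 1
    fi
}
```

For example, write the configuration to `/tmp/daemon.json` and run `apply_daemon_json /tmp/daemon.json`. This only checks JSON syntax; it does not check whether Docker accepts each option.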

3.3 Rancher Network Configuration and Tuning

3.3.1 Configuring the Firewall

# Add the firewall rules
[root@fgedu ~]# firewall-cmd --permanent --add-port=80/tcp
success
[root@fgedu ~]# firewall-cmd --permanent --add-port=443/tcp
success
[root@fgedu ~]# firewall-cmd --permanent --add-port=6443/tcp
success
[root@fgedu ~]# firewall-cmd --permanent --add-port=2376/tcp
success
[root@fgedu ~]# firewall-cmd --permanent --add-port=2379/tcp
success
[root@fgedu ~]# firewall-cmd --permanent --add-port=2380/tcp
success
[root@fgedu ~]# firewall-cmd --reload
success

# List the firewall rules
[root@fgedu ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: eth0
sources:
services: cockpit dhcpv6-client ssh
ports: 80/tcp 443/tcp 6443/tcp 2376/tcp 2379/tcp 2380/tcp
protocols:
forward: yes
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:

# Alternatively, turn the firewall off (test environments only)
[root@fgedu ~]# systemctl stop firewalld
[root@fgedu ~]# systemctl disable firewalld
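
The six `--add-port` calls differ only in the port number, so they can be generated from a list. The following sketch prints the commands instead of running them, which makes it safe to review first; `RANCHER_TCP_PORTS` and `firewall_commands` are names invented for this example, and the port list is the one used above.

```shell
#!/usr/bin/env bash
# TCP ports opened in the steps above.
RANCHER_TCP_PORTS="80 443 6443 2376 2379 2380"

# Print one firewall-cmd line per port, followed by the reload.
firewall_commands() {
    local port
    for port in $RANCHER_TCP_PORTS; do
        printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
    done
    printf 'firewall-cmd --reload\n'
}
```

Review the output of `firewall_commands`, then apply it as root with `firewall_commands | bash`.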

3.3.2 Configuring SELinux

# Check the SELinux status
[root@fgedu ~]# getenforce
Enforcing

# Switch SELinux to Permissive mode
[root@fgedu ~]# setenforce 0
[root@fgedu ~]# getenforce
Permissive

# Disable SELinux permanently
[root@fgedu ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@fgedu ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

Fengge's tip: disabling SELinux as shown above is a shortcut for test environments. In production, keep SELinux in Enforcing mode and configure suitable SELinux policies instead.

Part04-Production Cases and Hands-On Walkthrough

4.1 Unified Multi-Cluster Management with Rancher

4.1.1 Deploying Rancher for the First Cluster

# Pull the Rancher image
[root@fgedu ~]# docker pull rancher/rancher:v2.8.5
v2.8.5: Pulling from rancher/rancher
1234567890ab: Pull complete
2345678901bc: Pull complete
3456789012cd: Pull complete
4567890123de: Pull complete
5678901234ef: Pull complete
6789012345fa: Pull complete
7890123456ab: Pull complete
8901234567bc: Pull complete
9012345678cd: Pull complete
Digest: sha256:abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890
Status: Downloaded newer image for rancher/rancher:v2.8.5
docker.io/rancher/rancher:v2.8.5

# Start the Rancher container
[root@fgedu ~]# docker run -d --restart=unless-stopped \
--name rancher \
-p 80:80 -p 443:443 \
-v /Rancher/fgdata/rancher:/var/lib/rancher \
-v /Rancher/fgdata/rancher/log:/var/log/rancher \
-e CATTLE_SYSTEM_DEFAULT_REGISTRY=registry.cn-hangzhou.aliyuncs.com \
-e CATTLE_SYSTEM_CATALOG=bundled \
--privileged \
rancher/rancher:v2.8.5

1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef

# Check the Rancher container status
[root@fgedu ~]# docker ps -a | grep rancher
1234567890ab rancher/rancher:v2.8.5 "entrypoint.sh" 10 seconds ago Up 9 seconds 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp rancher

# Follow the Rancher container logs
[root@fgedu ~]# docker logs -f rancher
INFO: Starting Rancher
INFO: Rancher is starting
INFO: Waiting for Rancher to be ready…
INFO: Rancher is ready
INFO: Rancher is running

4.1.2 Accessing the Rancher Web UI

# Get the initial Rancher password
[root@fgedu ~]# docker logs rancher 2>&1 | grep "Bootstrap Password:"
INFO: Bootstrap Password: admin-password-1234567890

# Open the Rancher Web UI
# In a browser, go to https://192.168.1.100
# Log in with the initial password; Rancher then requires you to change it

# Check the resource usage of the Rancher container
[root@fgedu ~]# docker stats rancher --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
1234567890ab rancher 5.23% 1.2GiB / 15.6GiB 7.69% 12.3MB / 8.9MB 45.6MB / 23.4MB 156

# Inspect the Rancher container's mounts
[root@fgedu ~]# docker inspect rancher | grep -A 20 "Mounts"
"Mounts": [
    {
        "Type": "bind",
        "Source": "/Rancher/fgdata/rancher",
        "Destination": "/var/lib/rancher",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
    },
    {
        "Type": "bind",
        "Source": "/Rancher/fgdata/rancher/log",
        "Destination": "/var/log/rancher",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
    }
],
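
When the login step is scripted, the password has to be separated from the rest of the log line. A minimal sketch, assuming the log contains a line with the text `Bootstrap Password: ` followed by the password; `extract_bootstrap_password` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read log text on stdin and print only the bootstrap password.
# If the marker appears more than once, the last occurrence wins.
extract_bootstrap_password() {
    sed -n 's/.*Bootstrap Password: //p' | tail -n1
}
```

Typical use is `docker logs rancher 2>&1 | extract_bootstrap_password`, which prints nothing if the marker line is not in the log yet.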

4.2 Rancher RBAC Configuration

4.2.1 Creating a User

# Create a user in the Rancher Web UI
# Step 1: log in to the Rancher management UI
# Step 2: click the user icon in the upper-right corner and choose "User Management"
# Step 3: click the "Create User" button
# Step 4: fill in the user details:
# Username: fgedu
# Password: Fgedu@123456
# Description: Rancher administrator
# Step 5: click the "Create" button

# Create a user through the API
[root@fgedu ~]# curl -k -u "admin:admin-password-1234567890" \
-X POST \
-H "Content-Type: application/json" \
-d '{
"username": "fgedu",
"password": "Fgedu@123456",
"description": "Rancher administrator"
}' \
https://192.168.1.100/v3/users

{
    "id": "user-1234567890",
    "type": "user",
    "links": {
        "self": "https://192.168.1.100/v3/users/user-1234567890"
    },
    "username": "fgedu",
    "description": "Rancher administrator",
    "created": "2026-04-10T10:00:00Z",
    "createdBy": "admin"
}

# List users
[root@fgedu ~]# curl -k -u "admin:admin-password-1234567890" \
https://192.168.1.100/v3/users

{
    "data": [
        {
            "id": "user-admin",
            "type": "user",
            "username": "admin",
            "description": "System Administrator"
        },
        {
            "id": "user-1234567890",
            "type": "user",
            "username": "fgedu",
            "description": "Rancher administrator"
        }
    ],
    "pagination": {
        "limit": 100,
        "offset": 0,
        "total": 2
    }
}

4.2.2 Creating a Role

# Create a role in the Rancher Web UI
# Step 1: log in to the Rancher management UI
# Step 2: click the user icon in the upper-right corner and choose "Global Settings", then "Roles"
# Step 3: click the "Create Role" button
# Step 4: fill in the role details:
# Role name: fgedu-admin
# Role description: Rancher administrator role
# Permissions: select all management permissions
# Step 5: click the "Create" button

# Create a role through the API
[root@fgedu ~]# curl -k -u "admin:admin-password-1234567890" \
-X POST \
-H "Content-Type: application/json" \
-d '{
"name": "fgedu-admin",
"description": "Rancher administrator role",
"rules": [
{
"apiGroups": ["*"],
"resources": ["*"],
"verbs": ["*"]
}
]
}' \
https://192.168.1.100/v3/globalroles

{
    "id": "globalrole-1234567890",
    "type": "globalRole",
    "links": {
        "self": "https://192.168.1.100/v3/globalroles/globalrole-1234567890"
    },
    "name": "fgedu-admin",
    "description": "Rancher administrator role",
    "rules": [
        {
            "apiGroups": ["*"],
            "resources": ["*"],
            "verbs": ["*"]
        }
    ]
}

# Assign the role to the user
[root@fgedu ~]# curl -k -u "admin:admin-password-1234567890" \
-X POST \
-H "Content-Type: application/json" \
-d '{
"globalRoleId": "globalrole-1234567890",
"userId": "user-1234567890"
}' \
https://192.168.1.100/v3/globalrolebindings

{
    "id": "globalrolebinding-1234567890",
    "type": "globalRoleBinding",
    "links": {
        "self": "https://192.168.1.100/v3/globalrolebindings/globalrolebinding-1234567890"
    },
    "globalRoleId": "globalrole-1234567890",
    "userId": "user-1234567890"
}

4.3 Rancher Monitoring and Alerting

4.3.1 Enabling Monitoring

# Enable monitoring in the Rancher Web UI
# Step 1: log in to the Rancher management UI
# Step 2: select the cluster, then click "Tools" and "Monitoring"
# Step 3: click the "Enable" button
# Step 4: choose the monitoring version: v0.45.0
# Step 5: click the "Install" button

# Enable monitoring from the CLI
[root@fgedu ~]# kubectl create namespace cattle-monitoring-system
namespace/cattle-monitoring-system created

[root@fgedu ~]# helm repo add rancher-monitoring https://charts.rancher.io
"rancher-monitoring" has been added to your repositories

[root@fgedu ~]# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "rancher-monitoring" chart repository
Update Complete. ⎈Happy Helming!⎈

[root@fgedu ~]# helm install rancher-monitoring rancher-monitoring/rancher-monitoring \
--namespace cattle-monitoring-system \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=local-path \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
--set grafana.persistence.enabled=true \
--set grafana.persistence.size=10Gi \
--set grafana.persistence.storageClassName=local-path

NAME: rancher-monitoring
LAST DEPLOYED: Fri Apr 10 10:00:00 2026
NAMESPACE: cattle-monitoring-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rancher Monitoring Stack has been installed successfully!

Access Grafana:
kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-grafana 3000:80

Access Prometheus:
kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090:9090

# Check the status of the monitoring components
[root@fgedu ~]# kubectl get pods -n cattle-monitoring-system
NAME READY STATUS RESTARTS AGE
rancher-monitoring-operator-1234567890-abcde 1/1 Running 0 2m
rancher-monitoring-prometheus-0 2/2 Running 0 2m
rancher-monitoring-grafana-1234567890-abcde 1/1 Running 0 2m
rancher-monitoring-alertmanager-0 1/1 Running 0 2m
rancher-monitoring-kube-state-metrics-12345678 1/1 Running 0 2m
rancher-monitoring-node-exporter-12345678 1/1 Running 0 2m

4.3.2 Configuring Alert Rules

# Create the alert rules file (the quoted 'EOF' keeps the shell from expanding $labels)
[root@fgedu ~]# cat > /tmp/fgedu-alert-rules.yaml << 'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: fgedu-alert-rules
  namespace: cattle-monitoring-system
  labels:
    release: rancher-monitoring
spec:
  groups:
  - name: fgedu-cluster-alerts
    rules:
    - alert: ClusterPodNotReady
      expr: kube_pod_status_ready{condition="true"} == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is not ready"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has been not ready for more than 5 minutes."

    - alert: ClusterNodeNotReady
      expr: kube_node_status_ready{condition="true"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.node }} is not ready"
        description: "Node {{ $labels.node }} has been not ready for more than 5 minutes."

    - alert: ClusterCPUUsageHigh
      expr: sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (node) / sum(kube_node_status_capacity{resource="cpu"}) by (node) > 0.8
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage on node {{ $labels.node }}"
        description: "CPU usage on node {{ $labels.node }} is above 80% for more than 10 minutes."

    - alert: ClusterMemoryUsageHigh
      expr: sum(container_memory_working_set_bytes{image!=""}) by (node) / sum(kube_node_status_capacity{resource="memory"}) by (node) > 0.8
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage on node {{ $labels.node }}"
        description: "Memory usage on node {{ $labels.node }} is above 80% for more than 10 minutes."

    - alert: ClusterDiskUsageHigh
      expr: sum(kubelet_volume_stats_used_bytes) by (node) / sum(kubelet_volume_stats_capacity_bytes) by (node) > 0.8
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High disk usage on node {{ $labels.node }}"
        description: "Disk usage on node {{ $labels.node }} is above 80% for more than 10 minutes."
EOF
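
The rule annotations contain `$labels`, which Prometheus fills in when an alert fires. Inside a heredoc whose delimiter is not quoted, the shell expands `$labels` itself while the file is being written, and the template is silently damaged. The sketch below demonstrates the difference with a one-line heredoc; `render_unquoted` and `render_quoted` are illustrative names.

```shell
#!/usr/bin/env bash
# Unquoted delimiter: the shell substitutes $labels (empty here),
# so the template placeholder is lost.
render_unquoted() {
    local labels=""
    cat << EOF
summary: "Pod {{ $labels.pod }} is not ready"
EOF
}

# Quoted delimiter: the text is written exactly as typed.
render_quoted() {
    cat << 'EOF'
summary: "Pod {{ $labels.pod }} is not ready"
EOF
}
```

`render_unquoted` prints `summary: "Pod {{ .pod }} is not ready"`, while `render_quoted` keeps `{{ $labels.pod }}` intact. Use the quoted form whenever a heredoc carries Prometheus or Helm templates.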

# Apply the alert rules
[root@fgedu ~]# kubectl apply -f /tmp/fgedu-alert-rules.yaml
prometheusrule.monitoring.coreos.com/fgedu-alert-rules created

# List the alert rules
[root@fgedu ~]# kubectl get prometheusrules -n cattle-monitoring-system
NAME AGE
fgedu-alert-rules 1m
rancher-monitoring 10m

# Show the alert rule details
[root@fgedu ~]# kubectl get prometheusrules fgedu-alert-rules -n cattle-monitoring-system -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"PrometheusRule","metadata":{"annotations":{},"labels":{"release":"rancher-monitoring"},"name":"fgedu-alert-rules","namespace":"cattle-monitoring-system"},"spec":{"groups":[{"name":"fgedu-cluster-alerts","rules":[{"alert":"ClusterPodNotReady","annotations":{"description":"Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has been not ready for more than 5 minutes.","summary":"Pod {{ $labels.pod }} is not ready"},"expr":"kube_pod_status_ready{condition=\"true\"} == 0","for":"5m","labels":{"severity":"warning"}},{"alert":"ClusterNodeNotReady","annotations":{"description":"Node {{ $labels.node }} has been not ready for more than 5 minutes.","summary":"Node {{ $labels.node }} is not ready"},"expr":"kube_node_status_ready{condition=\"true\"} == 0","for":"5m","labels":{"severity":"critical"}},{"alert":"ClusterCPUUsageHigh","annotations":{"description":"CPU usage on node {{ $labels.node }} is above 80% for more than 10 minutes.","summary":"High CPU usage on node {{ $labels.node }}"},"expr":"sum(rate(container_cpu_usage_seconds_total{image!=\"\"}[5m])) by (node) / sum(kube_node_status_capacity{resource=\"cpu\"}) by (node) > 0.8","for":"10m","labels":{"severity":"warning"}},{"alert":"ClusterMemoryUsageHigh","annotations":{"description":"Memory usage on node {{ $labels.node }} is above 80% for more than 10 minutes.","summary":"High memory usage on node {{ $labels.node }}"},"expr":"sum(container_memory_working_set_bytes{image!=\"\"}) by (node) / sum(kube_node_status_capacity{resource=\"memory\"}) by (node) > 0.8","for":"10m","labels":{"severity":"warning"}},{"alert":"ClusterDiskUsageHigh","annotations":{"description":"Disk usage on node {{ $labels.node }} is above 80% for more than 10 minutes.","summary":"High disk usage on node {{ $labels.node }}"},"expr":"sum(kubelet_volume_stats_used_bytes) by (node) / sum(kubelet_volume_stats_capacity_bytes) by (node) > 0.8","for":"10m","labels":{"severity":"warning"}}]}]}}
  creationTimestamp: "2026-04-10T10:00:00Z"
  generation: 1
  labels:
    release: rancher-monitoring
  name: fgedu-alert-rules
  namespace: cattle-monitoring-system
  resourceVersion: "12345678"
  uid: abcdef12-3456-7890-abcd-ef1234567890
spec:
  groups:
  - name: fgedu-cluster-alerts
    rules:
    - alert: ClusterPodNotReady
      annotations:
        description: Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has
          been not ready for more than 5 minutes.
        summary: Pod {{ $labels.pod }} is not ready
      expr: kube_pod_status_ready{condition="true"} == 0
      for: 5m
      labels:
        severity: warning
    - alert: ClusterNodeNotReady
      annotations:
        description: Node {{ $labels.node }} has been not ready for more than 5
          minutes.
        summary: Node {{ $labels.node }} is not ready
      expr: kube_node_status_ready{condition="true"} == 0
      for: 5m
      labels:
        severity: critical
    - alert: ClusterCPUUsageHigh
      annotations:
        description: CPU usage on node {{ $labels.node }} is above 80% for more
          than 10 minutes.
        summary: High CPU usage on node {{ $labels.node }}
      expr: sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (node)
        / sum(kube_node_status_capacity{resource="cpu"}) by (node) > 0.8
      for: 10m
      labels:
        severity: warning
    - alert: ClusterMemoryUsageHigh
      annotations:
        description: Memory usage on node {{ $labels.node }} is above 80% for more
          than 10 minutes.
        summary: High memory usage on node {{ $labels.node }}
      expr: sum(container_memory_working_set_bytes{image!=""}) by (node) /
        sum(kube_node_status_capacity{resource="memory"}) by (node) > 0.8
      for: 10m
      labels:
        severity: warning
    - alert: ClusterDiskUsageHigh
      annotations:
        description: Disk usage on node {{ $labels.node }} is above 80% for more
          than 10 minutes.
        summary: High disk usage on node {{ $labels.node }}
      expr: sum(kubelet_volume_stats_used_bytes) by (node) / sum(kubelet_volume_stats_capacity_bytes)
        by (node) > 0.8
      for: 10m
      labels:
        severity: warning

Production recommendation: adjust the alert thresholds and notification channels to match your actual workloads.

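Part05-Fengge's Lessons Learned

5.1 Rancher Production Best Practices

Best practices for running Rancher in production:

  • High availability: use a 3-node high-availability deployment in production to avoid a single point of failure
  • Backups: back up Rancher's data and etcd regularly so that state can be restored
  • Monitoring and alerting: set up thorough monitoring and alerting so problems are found and handled early
  • Access control: apply the principle of least privilege and audit user permissions regularly
  • Network planning: design the network layout carefully to keep it stable and secure
  • Version management: upgrade Rancher regularly to stay secure and stable
  • Log management: collect and manage logs centrally to make troubleshooting and analysis easier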
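5.2 Rancher Performance Tuning Recommendations

Performance tuning recommendations for Rancher:

# 1. Hardware
- Use SSD storage for better I/O performance
- Add memory to reduce swap usage
- Use 10 GbE NICs for better network performance
- Provision multi-core CPUs for more compute capacity

# 2. Operating system
- Disable Transparent Huge Pages (THP)
- Configure Huge Pages
- Tune kernel parameters
- Raise file descriptor limits

# 3. Docker
- Use the overlay2 storage driver
- Configure Docker log rotation
- Tune the Docker network configuration
- Configure Docker resource limits

# 4. Rancher
- Enable Rancher caching
- Configure resource limits for Rancher
- Tune the etcd datastore that backs Rancher
- Set an appropriate Rancher log level

# 5. Clusters
- Plan cluster size sensibly
- Balance resource allocation across nodes
- Set Pod resource limits
- Tune network policies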

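5.3 Troubleshooting Common Rancher Problems

Common problems and how to investigate them:

# Problem 1: the Rancher container fails to start
# Symptom: docker ps does not show the rancher container
# Causes: port already in use, insufficient resources, configuration error
# Checks:
[root@fgedu ~]# docker logs rancher
[root@fgedu ~]# docker inspect rancher
[root@fgedu ~]# netstat -tlnp | grep -E ':(80|443) '
[root@fgedu ~]# free -h
[root@fgedu ~]# df -h

# Problem 2: the Rancher Web UI is unreachable
# Symptom: the browser cannot open https://192.168.1.100
# Causes: blocked by the firewall, no network connectivity, certificate problem
# Checks:
[root@fgedu ~]# firewall-cmd --list-all
[root@fgedu ~]# ping 192.168.1.100
[root@fgedu ~]# curl -k https://192.168.1.100

# Problem 3: Rancher cannot connect to a cluster
# Symptom: the cluster status shows Unknown or Error
# Causes: no network connectivity, expired certificate, configuration error
# Checks:
[root@fgedu ~]# kubectl get nodes
[root@fgedu ~]# kubectl get pods -A
[root@fgedu ~]# kubectl logs -n cattle-system rancher-1234567890-abcde

# Problem 4: Rancher monitoring shows no data
# Symptom: the Grafana UI has no data
# Causes: Prometheus not running, configuration error, permission problem
# Checks:
[root@fgedu ~]# kubectl get pods -n cattle-monitoring-system
[root@fgedu ~]# kubectl logs -n cattle-monitoring-system rancher-monitoring-prometheus-0
[root@fgedu ~]# kubectl get prometheusrules -n cattle-monitoring-system

Fengge's tip: a production Rancher deployment has to account for high availability, data backups, and monitoring and alerting. Validate everything thoroughly in a test environment before deploying to production.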
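
`ping` only proves that the host answers ICMP; it does not show whether port 443 accepts connections. The sketch below tests a TCP port directly. It assumes bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command; `port_open` is a hypothetical helper name and the 3-second limit is an arbitrary choice.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeeds when a TCP connection can be opened
# within 3 seconds. Uses bash's built-in /dev/tcp redirection.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Report the state of the ports that the Rancher UI listens on.
check_rancher_ports() {
    local host=$1 port
    for port in 80 443; do
        if port_open "$host" "$port"; then
            echo "$host:$port is reachable"
        else
            echo "$host:$port is NOT reachable" >&2
        fi
    done
}
```

Run `check_rancher_ports 192.168.1.100` from the client machine; a port that is unreachable from the client but open locally on the server points at the firewall.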

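This article was compiled and published by Fengge Tutorials (风哥教程) for learning and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html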
