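In this document, Fengge introduces the commands for pre-configuration checks of a Kubernetes node's base environment, including system resource checks, network configuration checks and kernel parameter checks. The tutorial is based on the official Kubernetes documentation and the official Red Hat Enterprise Linux 10 documentation. It is intended for Linux administrators in learning and test environments; verify everything yourself before applying it in production.
Reference: the System administration chapter of the official Red Hat Enterprise Linux 10 documentation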
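Part01-Basic Concepts and Theory
1.1 K8s Node Environment Requirements
A Kubernetes node must meet certain hardware and software requirements to run properly. They cover CPU, memory, disk space, operating system version, kernel version and network configuration.
- CPU: at least 2 cores on control plane nodes, at least 1 core on worker nodes
- Memory: at least 2GB on control plane nodes, at least 1GB on worker nodes
- Disk: at least 20GB of free space
- Operating system: RHEL 7/8/9/10, Ubuntu 18.04/20.04, CentOS 7/8, etc.
- Kernel version: 3.10 or later; 4.x or 5.x recommended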
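The kernel floor in the list above can be checked in a script. The sketch below is one way to do it, assuming GNU coreutils `sort -V` (version sort) is available, as it is on RHEL; the function name `kernel_at_least` is hypothetical and the 3.10 threshold comes from the list above.

```bash
#!/bin/bash
# kernel_at_least (hypothetical helper): succeed when kernel version $1 >= minimum $2.
# Relies on GNU sort -V, which compares each numeric component separately.
kernel_at_least() {
    local current=${1%%-*}   # drop the release suffix: "5.14.0-123.el10.x86_64" -> "5.14.0"
    local minimum=$2
    # The lower of the two versions sorts first; if that is the minimum, current >= minimum.
    [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Example: kernel_at_least "$(uname -r)" 3.10 && echo "kernel OK"
```

Because `sort -V` compares components numerically, 4.10 correctly sorts after 4.9, which a plain decimal comparison would get wrong.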
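1.2 K8s Pre-configuration Check Items
Kubernetes node pre-configuration checks cover the following items:
1. System resource checks
– Number of CPU cores
– Memory capacity
– Disk space
– System load
2. Operating system checks
– OS version
– Kernel version
– Required software packages
3. Network configuration checks
– Hostname resolution
– IP address configuration
– Port availability
– Network connectivity
4. Kernel parameter checks
– bridge-nf-call-iptables
– ip_forward
– Swap disabled
5. Service configuration checks
– Firewall configuration
– SELinux configuration
– Time synchronization
6. Container runtime checks
– Docker/Podman status (kubelet itself needs a CRI runtime: containerd, CRI-O, or Docker Engine through cri-dockerd)
– containerd status
– Container network configuration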
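1.3 Why Pre-configuration Checks Matter
Pre-configuration checks are essential for deploying a Kubernetes cluster:
- Avoid deployment failures: catch configuration problems before they break the deployment
- Raise the success rate: make sure every node meets the requirements
- Reduce incidents: prevent runtime failures caused by misconfiguration
- Standardize deployment: establish a standard deployment process and checklist
- Save time: finding problems up front is more efficient than troubleshooting afterwards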
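Part02-Production Planning and Recommendations
2.1 K8s Node Planning Strategy
In production, plan Kubernetes nodes around the following factors:
1. Node roles
– Control plane nodes (Master)
* Run etcd, kube-apiserver, kube-controller-manager and kube-scheduler
* Use an odd number of nodes (3, 5, 7) for high availability, since etcd needs a majority quorum
* Higher resource requirements
– Worker nodes (Worker)
* Run kubelet and kube-proxy
* Run user workloads
* Scale out according to business needs
2. Resource planning
– Control plane nodes
* CPU: 4 cores or more
* Memory: 8GB or more
* Disk: 50GB or more (SSD recommended)
– Worker nodes
* CPU: sized to the workload
* Memory: sized to the workload
* Disk: sized to the container images and data volume
3. Network planning
– Pod network CIDR: 10.244.0.0/16
– Service network CIDR: 10.96.0.0/12
– Node network: planned according to the actual network environment; it must not overlap the Pod or Service CIDR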
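The Pod CIDR, Service CIDR and node network should not overlap; overlapping ranges typically cause routing conflicts between Pods, Services and nodes. A minimal bash sketch for checking two IPv4 CIDR blocks follows. `ip_to_int` and `cidrs_overlap` are hypothetical helper names, IPv6 is not handled, and addresses must be written without leading zeros (bash arithmetic would read `010` as octal).

```bash
#!/bin/bash
# ip_to_int: convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidrs_overlap: succeed if IPv4 CIDR blocks $1 and $2 share at least one address.
cidrs_overlap() {
    local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
    # Two blocks overlap exactly when they agree on the shorter (wider) prefix.
    local len=$(( len1 < len2 ? len1 : len2 ))
    local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$net1") & mask) == ($(ip_to_int "$net2") & mask) ))
}

# Example: cidrs_overlap 10.244.0.0/16 192.168.1.0/24 && echo "Pod CIDR overlaps the node network"
```

For the defaults above, `cidrs_overlap 10.244.0.0/16 10.96.0.0/12` returns non-zero, while `cidrs_overlap 10.96.0.0/12 10.100.0.0/16` succeeds because 10.96.0.0/12 spans 10.96.0.0 to 10.111.255.255.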
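2.2 Network Configuration Planning
Pay particular attention to the following ports when planning Kubernetes networking:
Control plane node ports:
6443 – Kubernetes API Server
2379-2380 – etcd server client API
10250 – Kubelet API
10251 – kube-scheduler (legacy insecure port, removed in current Kubernetes releases)
10252 – kube-controller-manager (legacy insecure port, removed in current Kubernetes releases)
10257 – kube-controller-manager (secure)
10259 – kube-scheduler (secure)
Worker node ports:
10250 – Kubelet API
10256 – kube-proxy
30000-32767 – NodePort Services
# Check port usage (ss is part of iproute; netstat needs the separate net-tools package)
# ss -tuln | grep -E ":(6443|2379|2380|10250|10257|10259)\b"
# No output means none of these ports is in use yet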
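A grep over `ss` output only shows matching lines. To get a plain list of which required ports are already taken, a small sketch can parse `ss -Htln` (TCP listeners, no header line). The function name `busy_ports` is hypothetical, and the parsing assumes the local address is the fourth column, as in current iproute2 output.

```bash
#!/bin/bash
# busy_ports (hypothetical helper): read `ss -Htln` output on stdin and print
# which of the port numbers given as arguments already have a listener.
busy_ports() {
    local wanted=" $* "
    local state recvq sendq local_addr rest port
    while read -r state recvq sendq local_addr rest; do
        port=${local_addr##*:}   # "0.0.0.0:22", "[::]:22" and "*:22" all end in ":port"
        if [[ $wanted == *" $port "* ]]; then
            echo "$port"
        fi
    done | sort -un              # one line per busy port, in numeric order
}

# Example: ss -Htln | busy_ports 6443 2379 2380 10250 10257 10259
```

On a clean control plane node the example prints nothing; any number it prints is a port that kubeadm's preflight checks would likely flag as well.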
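2.3 Storage Planning
Kubernetes storage planning:
1. Container runtime storage
– /var/lib/docker (Docker)
– /var/lib/containers (Podman, CRI-O)
– /var/lib/containerd (containerd)
– Use a separate partition with enough headroom
2. etcd data storage
– /var/lib/etcd
– SSD storage recommended (etcd is sensitive to disk write latency)
– Back up regularly
3. Log storage
– /var/log
– Separate partition recommended
– Configure log rotation
4. Ephemeral storage
– /var/lib/kubelet
– /tmp
– Reserve enough space
# Check disk space (each path must already exist, otherwise df reports an error for it)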
# df -h /var/lib/docker /var/lib/etcd /var/log
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 50G 5.0G 46G 10% /
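Directories such as /var/lib/etcd usually do not exist before installation, so `df` on them fails. The sketch below walks up to the nearest existing parent directory (the filesystem where the directory will be created) and prints its free space in GiB as reported by GNU `df -BG`; `avail_gb` is a hypothetical name.

```bash
#!/bin/bash
# avail_gb (hypothetical helper): print free space in GiB for the filesystem
# that holds, or will hold, path $1.
avail_gb() {
    local p=$1
    # Climb to the nearest existing ancestor; "/" always exists, so this terminates.
    while [ ! -e "$p" ]; do
        p=$(dirname "$p")
    done
    df -BG --output=avail "$p" | tail -n 1 | tr -d ' G'
}

# Example:
#   for d in /var/lib/containerd /var/lib/etcd /var/log; do echo "$d: $(avail_gb "$d")G"; done
```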
Part03-Production Implementation Plan
3.1 Basic System Checks
Check whether the basic system configuration meets Kubernetes requirements:
# Check the OS release
# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="10.0 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="10.0"
PLATFORM_ID="platform:el10"
PRETTY_NAME="Red Hat Enterprise Linux Server 10.0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:10::server"
# Check the kernel version
# uname -r
5.14.0-123.el10.x86_64
# Check the number of CPU cores
# nproc
4
# Check CPU details
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 45 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
# Check the memory capacity
# free -h
total used free shared buff/cache available
Mem: 7.6Gi 1.2Gi 5.8Gi 128Mi 640Mi 6.0Gi
Swap: 0B 0B 0B
# Check disk space
# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 1.6G 9.5M 1.6G 1% /run
/dev/mapper/rhel-root 50G 5.0G 46G 10% /
/dev/sda1 1014M 256M 759M 26% /boot
/dev/mapper/rhel-home 50G 512M 50G 1% /home
# Check the system load
# uptime
14:00:01 up 4 days, 4:00, 2 users, load average: 0.00, 0.01, 0.05
# Check the hostname
# hostname
rhel10-node1
# Show the static hostname and system details
# hostnamectl
Static hostname: rhel10-node1
Icon name: computer-vm
Chassis: vm
Machine ID: 1234567890abcdef1234567890abcdef
Boot ID: abcdef1234567890abcdef1234567890
Virtualization: kvm
Operating System: Red Hat Enterprise Linux Server 10.0 (Plow)
CPE OS Name: cpe:/o:redhat:enterprise_linux:10::server
Kernel: Linux 5.14.0-123.el10.x86_64
Architecture: x86-64
# Check the /etc/hosts configuration
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.10 rhel10-node1 rhel10-node1.fgedu.net.cn
192.168.1.11 rhel10-node2 rhel10-node2.fgedu.net.cn
192.168.1.12 rhel10-node3 rhel10-node3.fgedu.net.cn
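Besides listing /etc/hosts, it helps to confirm that the node's hostname maps to its real node IP rather than to a loopback address. A minimal sketch that reads a hosts-format file on stdin and prints the first address for a given name; `hosts_lookup` is a hypothetical helper, and it ignores DNS (use `getent hosts` for full name resolution).

```bash
#!/bin/bash
# hosts_lookup (hypothetical helper): read hosts-format text on stdin and print the
# first address mapped to name $1. Prints nothing if the name is absent.
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                  # strip comments (fields are re-split)
        NF >= 2 {
            for (i = 2; i <= NF; i++)       # field 1 is the address, the rest are names
                if ($i == name) { print $1; exit }
        }'
}

# Example:
#   ip=$(hosts_lookup "$(hostname)" < /etc/hosts)
#   case $ip in 127.*|::1|"") echo "hostname does not map to a node IP" ;; esac
```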
# Check time synchronization
# timedatectl status
Local time: Fri 2026-04-02 14:00:01 CST
Universal time: Fri 2026-04-02 06:00:01 UTC
RTC time: Fri 2026-04-02 06:00:01
Time zone: Asia/Shanghai (CST, +0800)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
# Check the chronyd service status
# systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: man:chronyd(8)
man:chrony.conf(5)
Main PID: 1234 (chronyd)
Tasks: 1 (limit: 23456)
Memory: 1.5M
CPU: 234ms
CGroup: /system.slice/chronyd.service
└─1234 /usr/sbin/chronyd
Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting NTP client/server…
Apr 02 10:00:00 rhel10-node1 chronyd[1234]: chronyd version 4.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 +DEBUG)
Apr 02 10:00:00 rhel10-node1 chronyd[1234]: Frequency -5.234 +/- 0.123 ppm read from /var/lib/chrony/drift
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started NTP client/server.
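`timedatectl` only reports whether the clock is synchronized, not by how much it is off. The sketch below reads `chronyc tracking` output and checks the `System time` offset against a threshold. The name `clock_offset_ok` and the 0.1 s default are assumptions for illustration, not values from the Kubernetes documentation.

```bash
#!/bin/bash
# clock_offset_ok (hypothetical helper): read `chronyc tracking` output on stdin.
# Exit 0 if the "System time" offset is below $1 seconds (default 0.1, an assumed
# threshold), 1 if it is not, and 2 if no "System time" line was found.
clock_offset_ok() {
    local max=${1:-0.1}
    awk -v max="$max" -F': *' '
        /^System time/ { split($2, f, " "); off = f[1] + 0; found = 1 }
        END {
            if (!found) exit 2
            if (off < 0) off = -off
            exit (off < max ? 0 : 1)
        }'
}

# Example: chronyc tracking | clock_offset_ok 0.05 && echo "clock offset OK"
```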
3.2 Network Configuration Checks
Check whether the network configuration meets Kubernetes requirements:
# Check the IP addresses
# ip addr show
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3:
link/ether 08:00:27:12:34:56 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.10/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe12:3456/64 scope link
valid_lft forever preferred_lft forever
# Check the routing table
# ip route show
default via 192.168.1.1 dev enp0s3 proto static metric 100
192.168.1.0/24 dev enp0s3 proto kernel scope link src 192.168.1.10 metric 100
# Check the DNS configuration
# cat /etc/resolv.conf
nameserver 192.168.1.1
search fgedu.net.cn
# Check network connectivity (ping the gateway)
# ping -c 3 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.234 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.123 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.234 ms
--- 192.168.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2045ms
rtt min/avg/max/mdev = 0.123/0.197/0.234/0.051 ms
# Check the firewall status
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: man:firewalld(1)
Main PID: 5678 (firewalld)
Tasks: 2 (limit: 23456)
Memory: 32.5M
CPU: 1.234s
CGroup: /system.slice/firewalld.service
└─5678 /usr/bin/python3 -Es /usr/sbin/firewalld --nofork --nopid
Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting firewalld - dynamic firewall daemon...
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started firewalld - dynamic firewall daemon.
# Check the firewall rules
# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: enp0s3
sources:
services: cockpit dhcpv6-client ssh
ports:
protocols:
forward: no
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
# Check the iptables rules
# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
# Check port usage (no output means the ports are free)
# ss -tuln | grep -E ":(6443|2379|2380|10250|10251|10252|10255|10256)\b"
# Check the SELinux status
# getenforce
Enforcing
# Check the SELinux configuration
# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
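`getenforce` shows the current runtime mode, while `SELINUX=` in /etc/selinux/config sets the mode used at the next boot; the two can differ after a `setenforce` call. A small sketch to read the configured value; `selinux_config_mode` is a hypothetical name.

```bash
#!/bin/bash
# selinux_config_mode (hypothetical helper): print the SELINUX= value from the given
# file (default /etc/selinux/config). Comment lines and SELINUXTYPE= are not matched.
selinux_config_mode() {
    local file=${1:-/etc/selinux/config}
    sed -n 's/^[[:space:]]*SELINUX=\([a-z]*\).*/\1/p' "$file" | tail -n 1
}

# Example: flag a difference between the runtime and the boot-time mode
#   [ "$(getenforce | tr '[:upper:]' '[:lower:]')" = "$(selinux_config_mode)" ] || echo "runtime and config differ"
```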
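3.3 Kubernetes-Specific Checks
Check the Kubernetes-specific configuration requirements:
# Check swap (by default kubelet refuses to start while swap is enabled)
# free -h | grep Swap
Swap: 0B 0B 0B
# Check the swap entries in /etc/fstab
# grep swap /etc/fstab
# Expect no output, or only commented-out lines
# Check the kernel modules
# lsmod | grep -E "br_netfilter|overlay|nf_conntrack"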
br_netfilter 24576 0
bridge 192512 1 br_netfilter
overlay 114688 0
nf_conntrack 147456 1 nf_nat
# Load the required kernel modules
# modprobe br_netfilter
# modprobe overlay
# Load the kernel modules automatically at boot
# cat > /etc/modules-load.d/k8s.conf << EOF
br_netfilter
overlay
EOF
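To confirm that every module listed in a modules-load.d file is actually loaded, a sketch can compare the names against `lsmod` output; `missing_modules` is a hypothetical name. Modules built into the kernel do not appear in `lsmod`, so treat a reported name as something to verify rather than a definite failure.

```bash
#!/bin/bash
# missing_modules (hypothetical helper): read `lsmod` output on stdin and print
# each module name given as an argument that is not in the list.
missing_modules() {
    local loaded mod
    loaded=" $(awk 'NR > 1 { printf "%s ", $1 }') "   # NR > 1 skips the lsmod header
    for mod in "$@"; do
        [[ $loaded == *" $mod "* ]] || echo "$mod"
    done
}

# Example: lsmod | missing_modules $(grep -v '^#' /etc/modules-load.d/k8s.conf)
```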
# Check the kernel parameters
# sysctl -a | grep -E "net.bridge.bridge-nf-call|net.ipv4.ip_forward"
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0
# Configure the kernel parameters
# cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply the kernel parameters
# sysctl --system
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
* Applying /usr/lib/sysctl.d/50-coredump.conf ...
* Applying /usr/lib/sysctl.d/50-default.conf ...
* Applying /usr/lib/sysctl.d/50-libkcapi-optmem_max.conf ...
* Applying /usr/lib/sysctl.d/50-pid_max.conf ...
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/k8s.conf ...
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
* Applying /etc/sysctl.conf ...
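With several files applied, a missing key (for example the `net.bridge.*` keys before br_netfilter is loaded) is easy to overlook in the `sysctl --system` output. The sketch below reads each `key = value` line of a sysctl.d file and compares it with the running value under /proc/sys. `check_sysctl_conf` and `sysctl_key_to_path` are hypothetical names, and the optional second argument (a root prefix) exists only to make the function testable.

```bash
#!/bin/bash
# sysctl_key_to_path: net.ipv4.ip_forward -> /proc/sys/net/ipv4/ip_forward
# (keys that contain dotted interface names such as eth0.100 are not handled).
sysctl_key_to_path() {
    echo "/proc/sys/${1//.//}"
}

# check_sysctl_conf: compare every "key = value" line of file $1 with the running
# value (under optional root prefix $2) and print MISSING/MISMATCH lines.
check_sysctl_conf() {
    local file=$1 root=${2:-} key value current path rc=0
    while IFS='=' read -r key value; do
        key=$(echo "$key" | tr -d '[:space:]')
        value=$(echo "$value" | tr -d '[:space:]')
        [[ -z $key || $key == \#* || $key == \;* ]] && continue   # blanks and comments
        path="$root$(sysctl_key_to_path "$key")"
        if [ ! -r "$path" ]; then
            echo "MISSING $key (is the module that provides it loaded?)"
            rc=1
            continue
        fi
        current=$(tr -d '[:space:]' < "$path")
        if [ "$current" != "$value" ]; then
            echo "MISMATCH $key: running=$current expected=$value"
            rc=1
        fi
    done < "$file"
    return $rc
}

# Example: check_sysctl_conf /etc/sysctl.d/k8s.conf && echo "all parameters applied"
```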
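# Check the container runtime (since Kubernetes v1.24 removed dockershim, kubelet can use Docker Engine only through cri-dockerd)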
# which docker
/usr/bin/docker
# Check the Docker version
# docker --version
Docker version 24.0.0, build 1234567
# Check the Docker service status
# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: https://docs.docker.com
Main PID: 9012 (dockerd)
Tasks: 12
Memory: 128.5M
CPU: 5.678s
CGroup: /system.slice/docker.service
├─9012 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
└─9013 containerd
Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting Docker Application Container Engine...
Apr 02 10:00:00 rhel10-node1 dockerd[9012]: time="2026-04-02T10:00:00.123456789+08:00" level=info msg="Starting up"
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started Docker Application Container Engine.
# Check the containerd status
# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: https://containerd.io
Main PID: 9013 (containerd)
Tasks: 10
Memory: 64.5M
CPU: 3.456s
CGroup: /system.slice/containerd.service
└─9013 /usr/bin/containerd
Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting containerd container runtime...
Apr 02 10:00:00 rhel10-node1 containerd[9013]: time="2026-04-02T10:00:00.123456789+08:00" level=info msg="starting containerd" revision=1234567890abcdef version=v1.6.0
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started containerd container runtime.
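kubelet talks to the runtime over a CRI Unix socket, not through the docker CLI. The Kubernetes documentation lists these default endpoints: containerd at /var/run/containerd/containerd.sock, CRI-O at /var/run/crio/crio.sock, and Docker Engine via cri-dockerd at /var/run/cri-dockerd.sock. A sketch that reports the first one present; `find_cri_socket` is a hypothetical name and the optional root prefix is only there for testing.

```bash
#!/bin/bash
# find_cri_socket (hypothetical helper): print the first well-known CRI endpoint that
# exists as a Unix socket, or return 1 if none does. $1 is an optional root prefix.
find_cri_socket() {
    local root=${1:-} sock
    for sock in /var/run/containerd/containerd.sock \
                /var/run/crio/crio.sock \
                /var/run/cri-dockerd.sock; do
        if [ -S "$root$sock" ]; then      # -S: exists and is a socket
            echo "unix://$sock"
            return 0
        fi
    done
    return 1
}

# Example: find_cri_socket || echo "no CRI socket found; kubelet has no runtime to talk to"
```

If more than one of these sockets exists, kubeadm has to be told which one to use, for example with `--cri-socket`.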
# Check whether kubelet is installed
# which kubelet
/usr/bin/kubelet
# Check the kubelet version
# kubelet --version
Kubernetes v1.28.0
# Check whether kubeadm is installed
# which kubeadm
/usr/bin/kubeadm
# Check the kubeadm version
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"1234567890abcdef", GitTreeState:"clean", BuildDate:"2026-01-01T00:00:00Z", GoVersion:"go1.20", Compiler:"gc", Platform:"linux/amd64"}
# Check whether kubectl is installed
# which kubectl
/usr/bin/kubectl
# Check the kubectl version
# kubectl version --client
Client Version: version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"1234567890abcdef", GitTreeState:"clean", BuildDate:"2026-01-01T00:00:00Z", GoVersion:"go1.20", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.0
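kubelet, kubeadm and kubectl are normally installed at the same version. The Kubernetes version skew policy allows some difference between components, so the sketch below is deliberately stricter: it flags any mismatch in the minor version. `k8s_minor` and `same_minor` are hypothetical names.

```bash
#!/bin/bash
# k8s_minor: extract the minor version from strings such as "Kubernetes v1.28.0"
# (kubelet --version) or "v1.28.0" (kubeadm version -o short).
k8s_minor() {
    local v
    v=$(grep -oE 'v1\.[0-9]+' <<< "$1" | head -n 1)
    echo "${v#v1.}"
}

# same_minor: succeed when every argument carries the same minor version.
same_minor() {
    local first m s
    first=$(k8s_minor "$1")
    [ -n "$first" ] || return 1          # no recognizable version at all
    for s in "$@"; do
        m=$(k8s_minor "$s")
        [ "$m" = "$first" ] || return 1
    done
}

# Example:
#   same_minor "$(kubelet --version)" "$(kubeadm version -o short)" \
#              "$(kubectl version --client -o json | grep gitVersion)" || echo "version mismatch"
```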
Part04-Production Cases and Walkthroughs
4.1 Complete Node Pre-check Case
Case: run a complete pre-configuration check on a Kubernetes node.
# mkdir -p /fgedu/shell
# cat > /fgedu/shell/k8s-node-check.sh << 'EOF'
#!/bin/bash
# Kubernetes node pre-configuration check script
echo "========================================="
echo "Kubernetes node pre-configuration check"
echo "Time: $(date)"
echo "Hostname: $(hostname)"
echo "========================================="
echo ""
PASS=0
FAIL=0
# Helper functions: print the result and update the counters
# (PASS=$((PASS + 1)) avoids ((PASS++)), which returns status 1 when PASS is 0)
check_pass() {
    echo "[PASS] $1"
    PASS=$((PASS + 1))
}
check_fail() {
    echo "[FAIL] $1"
    FAIL=$((FAIL + 1))
}
# 1. Check the operating system version
echo "1. Checking OS version"
if [ -f /etc/redhat-release ]; then
    OS_VERSION=$(cat /etc/redhat-release)
    check_pass "Operating system: $OS_VERSION"
else
    check_fail "Unable to determine the OS version"
fi
echo ""
# 2. Check the kernel version (integer comparison of the major version; bc is not needed)
echo "2. Checking kernel version"
KERNEL_MAJOR=$(uname -r | cut -d. -f1)
if [ "$KERNEL_MAJOR" -ge 4 ]; then
    check_pass "Kernel version: $(uname -r)"
else
    check_fail "Kernel version too old: $(uname -r) (4.0 or later required)"
fi
echo ""
# 3. Check the number of CPU cores
echo "3. Checking CPU cores"
CPU_CORES=$(nproc)
if [ "$CPU_CORES" -ge 2 ]; then
    check_pass "CPU cores: $CPU_CORES"
else
    check_fail "Not enough CPU cores: $CPU_CORES (at least 2 required)"
fi
echo ""
# 4. Check the memory capacity
echo "4. Checking memory"
MEM_TOTAL=$(free -g | awk '/^Mem:/{print $2}')
if [ "$MEM_TOTAL" -ge 2 ]; then
    check_pass "Memory: ${MEM_TOTAL}GB"
else
    check_fail "Not enough memory: ${MEM_TOTAL}GB (at least 2GB required)"
fi
echo ""
# 5. Check disk space
echo "5. Checking disk space"
DISK_AVAIL=$(df -BG / | awk 'NR==2 {print $4}' | tr -d 'G')
if [ "$DISK_AVAIL" -ge 20 ]; then
    check_pass "Free space on /: ${DISK_AVAIL}GB"
else
    check_fail "Not enough free space on /: ${DISK_AVAIL}GB (at least 20GB required)"
fi
echo ""
# 6. Check swap
echo "6. Checking swap"
SWAP_TOTAL=$(free -m | awk '/^Swap:/{print $2}')
if [ "$SWAP_TOTAL" -eq 0 ]; then
    check_pass "Swap is disabled"
else
    check_fail "Swap is enabled: ${SWAP_TOTAL}MB"
fi
echo ""
# 7. Check hostname resolution (-w avoids matching rhel10-node1 inside rhel10-node10)
echo "7. Checking hostname resolution"
NODE_NAME=$(hostname)
if grep -qw "$NODE_NAME" /etc/hosts; then
    check_pass "Hostname is configured in /etc/hosts"
else
    check_fail "Hostname is not configured in /etc/hosts"
fi
echo ""
# 8. Check time synchronization
echo "8. Checking time synchronization"
if systemctl is-active --quiet chronyd; then
    check_pass "Time synchronization service (chronyd) is running"
else
    check_fail "Time synchronization service (chronyd) is not running"
fi
echo ""
# 9. Check the firewall
echo "9. Checking firewall"
if systemctl is-active --quiet firewalld; then
    check_pass "Firewall service is running"
else
    check_fail "Firewall service is not running"
fi
echo ""
# 10. Check SELinux
echo "10. Checking SELinux"
SELINUX_STATUS=$(getenforce 2>/dev/null || echo "Unknown")
if [ "$SELINUX_STATUS" = "Enforcing" ] || [ "$SELINUX_STATUS" = "Permissive" ]; then
    check_pass "SELinux status: $SELINUX_STATUS"
else
    check_fail "SELinux status: $SELINUX_STATUS"
fi
echo ""
# 11. Check kernel parameters
echo "11. Checking kernel parameters"
IP_FORWARD=$(sysctl -n net.ipv4.ip_forward)
BRIDGE_IPTABLES=$(sysctl -n net.bridge.bridge-nf-call-iptables 2>/dev/null || echo "0")
if [ "$IP_FORWARD" = "1" ]; then
    check_pass "net.ipv4.ip_forward = 1"
else
    check_fail "net.ipv4.ip_forward = $IP_FORWARD (should be 1)"
fi
if [ "$BRIDGE_IPTABLES" = "1" ]; then
    check_pass "net.bridge.bridge-nf-call-iptables = 1"
else
    check_fail "net.bridge.bridge-nf-call-iptables = $BRIDGE_IPTABLES (should be 1)"
fi
echo ""
# 12. Check the container runtime: kubelet needs a CRI runtime, i.e. containerd,
#     CRI-O, or Docker Engine through cri-dockerd (Podman is not a CRI runtime)
echo "12. Checking container runtime"
if systemctl is-active --quiet containerd; then
    check_pass "Container runtime: containerd is running"
elif systemctl is-active --quiet crio; then
    check_pass "Container runtime: CRI-O is running"
elif systemctl is-active --quiet cri-docker; then
    check_pass "Container runtime: Docker Engine via cri-dockerd is running"
else
    check_fail "No CRI container runtime (containerd, CRI-O or cri-dockerd) is running"
fi
echo ""
# 13. Check the Kubernetes components
echo "13. Checking Kubernetes components"
if command -v kubelet &>/dev/null; then
    check_pass "kubelet installed: $(kubelet --version)"
else
    check_fail "kubelet is not installed"
fi
if command -v kubeadm &>/dev/null; then
    check_pass "kubeadm installed: $(kubeadm version -o short)"
else
    check_fail "kubeadm is not installed"
fi
if command -v kubectl &>/dev/null; then
    KUBECTL_VERSION=$(kubectl version --client -o json 2>/dev/null | grep gitVersion | cut -d'"' -f4)
    check_pass "kubectl installed: $KUBECTL_VERSION"
else
    check_fail "kubectl is not installed"
fi
echo ""
# Summary
echo "========================================="
echo "Summary"
echo "========================================="
echo "Passed: $PASS"
echo "Failed: $FAIL"
echo "Total: $((PASS + FAIL))"
echo ""
if [ "$FAIL" -eq 0 ]; then
    echo "All checks passed; the node meets the Kubernetes deployment requirements."
    exit 0
else
    echo "$FAIL check(s) failed; fix them before deploying Kubernetes."
    exit 1
fi
EOF
# Run the check script
# chmod +x /fgedu/shell/k8s-node-check.sh
# /fgedu/shell/k8s-node-check.sh
=========================================
Kubernetes node pre-configuration check
Time: Fri Apr 2 14:00:00 CST 2026
Hostname: rhel10-node1
=========================================
1. Checking OS version
[PASS] Operating system: Red Hat Enterprise Linux release 10.0 (Plow)
2. Checking kernel version
[PASS] Kernel version: 5.14.0-123.el10.x86_64
3. Checking CPU cores
[PASS] CPU cores: 4
4. Checking memory
[PASS] Memory: 7GB
5. Checking disk space
[PASS] Free space on /: 46GB
6. Checking swap
[PASS] Swap is disabled
7. Checking hostname resolution
[PASS] Hostname is configured in /etc/hosts
8. Checking time synchronization
[PASS] Time synchronization service (chronyd) is running
9. Checking firewall
[PASS] Firewall service is running
10. Checking SELinux
[PASS] SELinux status: Enforcing
11. Checking kernel parameters
[PASS] net.ipv4.ip_forward = 1
[PASS] net.bridge.bridge-nf-call-iptables = 1
12. Checking container runtime
[PASS] Container runtime: containerd is running
13. Checking Kubernetes components
[PASS] kubelet installed: Kubernetes v1.28.0
[PASS] kubeadm installed: v1.28.0
[PASS] kubectl installed: v1.28.0
=========================================
Summary
=========================================
Passed: 16
Failed: 0
Total: 16
All checks passed; the node meets the Kubernetes deployment requirements.
4.2 Troubleshooting and Fix Case
Case: diagnose and fix node pre-configuration problems.
# Problem 1: swap is enabled
# Check the swap status
# free -h | grep Swap
Swap: 2.0Gi 0B 2.0Gi
# Turn swap off immediately
# swapoff -a
# Disable swap permanently by commenting out the swap entries in /etc/fstab
# sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab
# Verify that swap is off
# free -h | grep Swap
Swap: 0B 0B 0B
# Problem 2: kernel parameters are not configured
# Check the kernel parameters
# sysctl -n net.ipv4.ip_forward
0
# Configure the kernel parameters
# echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/k8s.conf
# echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.d/k8s.conf
# echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.d/k8s.conf
# Load the br_netfilter module (the net.bridge.* parameters exist only after it is loaded)
# modprobe br_netfilter
# Apply the kernel parameters
# sysctl --system
# Verify the kernel parameters
# sysctl -n net.ipv4.ip_forward
1
# Problem 3: the hostname does not resolve
# Check the hostname
# hostname
rhel10-node1
# Check /etc/hosts
# grep "$(hostname)" /etc/hosts
# (no output)
# Add the hostname mapping
# echo "192.168.1.10 rhel10-node1 rhel10-node1.fgedu.net.cn" >> /etc/hosts
# Verify the hostname mapping
# grep "$(hostname)" /etc/hosts
192.168.1.10 rhel10-node1 rhel10-node1.fgedu.net.cn
# Problem 4: the clock is not synchronized
# Check the time synchronization status
# timedatectl status
System clock synchronized: no
# Install and start chronyd
# dnf install -y chrony
# systemctl enable --now chronyd
# Check the chronyd status
# systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2026-04-02 14:00:00 CST; 1min ago
# Step the clock immediately
# chronyc makestep
200 OK
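The sed command above edits /etc/fstab in place without a backup. A slightly more defensive sketch, using the hypothetical name `disable_swap_in_fstab`, keeps a `.bak` copy of the previous contents, only touches entries whose third field (filesystem type) is `swap`, and can safely be run more than once.

```bash
#!/bin/bash
# disable_swap_in_fstab (hypothetical helper): comment out every active swap entry
# in an fstab-format file ($1, default /etc/fstab), saving the previous file as .bak.
disable_swap_in_fstab() {
    local fstab=${1:-/etc/fstab}
    cp -p "$fstab" "$fstab.bak"
    # Skip lines that are already comments; prefix "#" when field 3 is "swap".
    # Writing into the existing file keeps its owner, mode and SELinux context.
    awk '!/^[[:space:]]*#/ && $3 == "swap" { $0 = "#" $0 } { print }' "$fstab.bak" > "$fstab"
}

# Example: swapoff -a && disable_swap_in_fstab && free -h | grep Swap
```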
4.3 Batch Node Check Case
Case: use Ansible to check multiple Kubernetes nodes in one run.
# Configure the Ansible inventory
# cat > /etc/ansible/hosts << 'EOF'
[k8s-master]
rhel10-master ansible_host=192.168.1.10

[k8s-workers]
rhel10-node1 ansible_host=192.168.1.11
rhel10-node2 ansible_host=192.168.1.12
rhel10-node3 ansible_host=192.168.1.13

[k8s-cluster:children]
k8s-master
k8s-workers

[k8s-cluster:vars]
ansible_user=root
ansible_ssh_private_key_file=/root/.ssh/id_rsa
EOF
# Create the Ansible check playbook
# cat > /tmp/k8s-node-check.yml << 'EOF'
---
- name: Kubernetes Node Pre-check
  hosts: k8s-cluster
  gather_facts: yes
  tasks:
    - name: Check OS version
      debug:
        msg: "OS: {{ ansible_distribution }} {{ ansible_distribution_version }}"

    - name: Check kernel version
      debug:
        msg: "Kernel: {{ ansible_kernel }}"

    - name: Check CPU cores
      debug:
        msg: "CPU cores: {{ ansible_processor_vcpus }}"
      failed_when: ansible_processor_vcpus < 2

    - name: Check memory
      debug:
        msg: "Memory: {{ (ansible_memtotal_mb / 1024) | round(1) }}GB"
      failed_when: ansible_memtotal_mb < 2048

    - name: Check disk space
      debug:
        msg: "Disk available: {{ (item.size_available / 1024 / 1024 / 1024) | round(1) }}GB"
      with_items: "{{ ansible_mounts }}"
      when: item.mount == '/'
      failed_when: (item.size_available / 1024 / 1024 / 1024) < 20

    - name: Check swap
      debug:
        msg: "Swap: {{ ansible_swaptotal_mb }}MB"
      failed_when: ansible_swaptotal_mb > 0

    - name: Check time sync
      stat:
        path: /run/chrony/chronyd.pid
      register: chronyd_pid

    - name: Report time sync status
      debug:
        msg: "Time sync: {{ 'OK' if chronyd_pid.stat.exists else 'FAIL' }}"

    # Informational only: kubelet itself still needs a CRI runtime such as containerd
    - name: Check container runtime
      shell: docker --version || podman --version
      register: container_runtime
      ignore_errors: yes

    - name: Report container runtime
      debug:
        msg: "Container runtime: {{ container_runtime.stdout }}"
      when: container_runtime.rc == 0
EOF
# Run the Ansible check
# ansible-playbook /tmp/k8s-node-check.yml
PLAY [Kubernetes Node Pre-check] ***********************************************
TASK [Gathering Facts] *********************************************************
ok: [rhel10-master]
ok: [rhel10-node1]
ok: [rhel10-node2]
ok: [rhel10-node3]
TASK [Check OS version] ********************************************************
ok: [rhel10-master] => {
    "msg": "OS: RedHat 10.0"
}
ok: [rhel10-node1] => {
    "msg": "OS: RedHat 10.0"
}
ok: [rhel10-node2] => {
    "msg": "OS: RedHat 10.0"
}
ok: [rhel10-node3] => {
    "msg": "OS: RedHat 10.0"
}
TASK [Check kernel version] ****************************************************
ok: [rhel10-master] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}
ok: [rhel10-node1] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}
ok: [rhel10-node2] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}
ok: [rhel10-node3] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}
TASK [Check CPU cores] *********************************************************
ok: [rhel10-master] => {
    "msg": "CPU cores: 4"
}
ok: [rhel10-node1] => {
    "msg": "CPU cores: 4"
}
ok: [rhel10-node2] => {
    "msg": "CPU cores: 4"
}
ok: [rhel10-node3] => {
    "msg": "CPU cores: 4"
}
TASK [Check memory] ************************************************************
ok: [rhel10-master] => {
    "msg": "Memory: 7.6GB"
}
ok: [rhel10-node1] => {
    "msg": "Memory: 7.6GB"
}
ok: [rhel10-node2] => {
    "msg": "Memory: 7.6GB"
}
ok: [rhel10-node3] => {
    "msg": "Memory: 7.6GB"
}
TASK [Check disk space] ********************************************************
ok: [rhel10-master] => {
    "msg": "Disk available: 46.0GB"
}
ok: [rhel10-node1] => {
    "msg": "Disk available: 46.0GB"
}
ok: [rhel10-node2] => {
    "msg": "Disk available: 46.0GB"
}
ok: [rhel10-node3] => {
    "msg": "Disk available: 46.0GB"
}
TASK [Check swap] **************************************************************
ok: [rhel10-master] => {
    "msg": "Swap: 0MB"
}
ok: [rhel10-node1] => {
    "msg": "Swap: 0MB"
}
ok: [rhel10-node2] => {
    "msg": "Swap: 0MB"
}
ok: [rhel10-node3] => {
    "msg": "Swap: 0MB"
}
TASK [Check time sync] *********************************************************
ok: [rhel10-master]
ok: [rhel10-node1]
ok: [rhel10-node2]
ok: [rhel10-node3]
TASK [Report time sync status] *************************************************
ok: [rhel10-master] => {
    "msg": "Time sync: OK"
}
ok: [rhel10-node1] => {
    "msg": "Time sync: OK"
}
ok: [rhel10-node2] => {
    "msg": "Time sync: OK"
}
ok: [rhel10-node3] => {
    "msg": "Time sync: OK"
}
TASK [Check container runtime] *************************************************
changed: [rhel10-master]
changed: [rhel10-node1]
changed: [rhel10-node2]
changed: [rhel10-node3]
TASK [Report container runtime] ************************************************
ok: [rhel10-master] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}
ok: [rhel10-node1] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}
ok: [rhel10-node2] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}
ok: [rhel10-node3] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}
PLAY RECAP *********************************************************************
rhel10-master : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rhel10-node1 : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rhel10-node2 : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rhel10-node3 : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
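With many nodes, reading the PLAY RECAP by eye is error-prone. The sketch below prints every host whose recap line shows `failed` or `unreachable` above zero; `recap_problem_hosts` is a hypothetical name. Tasks marked `ignore_errors: yes`, such as the container runtime check, are counted under `ignored` rather than `failed`, so they are not reported here.

```bash
#!/bin/bash
# recap_problem_hosts (hypothetical helper): read ansible-playbook output on stdin and
# print hosts whose PLAY RECAP line has failed>0 or unreachable>0.
recap_problem_hosts() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && NF > 2 && $2 == ":" {          # "host : ok=.. changed=.. ..."
            bad = 0
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) bad = 1
            }
            if (bad) print $1
        }'
}

# Example: ansible-playbook /tmp/k8s-node-check.yml | tee check.log | recap_problem_hosts
```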
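Part05-Fengge's Experience and Takeaways
5.1 Node Pre-check Best Practices
Best practices for node pre-checks, drawn from years of Kubernetes operations experience:
1. Standardize the check process
– Define a single, shared checklist
– Run the checks with automated scripts
– Record the check results
2. When to check
– A full check before deployment
– A compatibility check before upgrades
– Regular routine inspections
3. Handling problems
– Fix problems as soon as they are found
– Record problems and their solutions
– Update the check scripts accordingly
4. Documentation
– Maintain the checklist document
– Record node configuration information
– Keep the operations manual up to date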
# Common check commands
# Quick system check
echo "=== System information ===" && \
cat /etc/redhat-release && \
uname -r && \
nproc && \
free -h && \
df -h /
# Quick network check
echo "=== Network information ===" && \
hostname && \
ip addr show | grep inet && \
tail -n 5 /etc/hosts
# Quick K8s check
echo "=== K8s information ===" && \
kubelet --version && \
kubeadm version -o short && \
kubectl version --client -o json | grep gitVersion
5.2 Pre-check Checklist
A complete Kubernetes node pre-check checklist:
□ 1. System version
cat /etc/redhat-release
uname -r
□ 2. Resources
nproc (CPU cores >= 2)
free -h (memory >= 2GB)
df -h / (disk >= 20GB)
□ 3. Network
hostname
cat /etc/hosts
ip addr show
ping <gateway IP>
□ 4. Time synchronization
timedatectl status
systemctl status chronyd
□ 5. Swap
free -h | grep Swap
grep swap /etc/fstab
□ 6. Kernel parameters
sysctl -n net.ipv4.ip_forward
sysctl -n net.bridge.bridge-nf-call-iptables
□ 7. Kernel modules
lsmod | grep br_netfilter
lsmod | grep overlay
□ 8. Firewall
systemctl status firewalld
firewall-cmd --list-all
□ 9. SELinux
getenforce
cat /etc/selinux/config
□ 10. Container runtime
docker --version
systemctl status docker
systemctl status containerd (or whichever CRI runtime kubelet uses)
□ 11. Kubernetes components
kubelet --version
kubeadm version
kubectl version --client
□ 12. Ports
ss -tuln | grep -E ":(6443|2379|2380|10250)\b"
□ 13. Services
systemctl list-unit-files --type=service --state=enabled
□ 14. User privileges
id
groups
□ 15. Logs
journalctl -u kubelet -u docker --no-pager -n 50
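5.3 Recommended Check Tools
The following tools are recommended for checking Kubernetes nodes:
1. kubeadm
– Official Kubernetes tool
– Cluster initialization and management
– Built-in preflight checks
2. kubescape
– Security scanning
– Configuration compliance checks
– Best-practice validation
3. kube-bench
– CIS Benchmark checks
– Security configuration auditing
– Compliance verification
4. node-problem-detector
– Node problem detection
– Runtime monitoring
– Problem reporting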
# Run the kubeadm preflight checks
# kubeadm init phase preflight
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
# Install kube-bench (it is not in the default RHEL repositories; first download the RPM from the project's GitHub releases page)
# dnf install -y ./kube-bench_<version>_linux_amd64.rpm
# Run kube-bench against the control plane
# kube-bench run --targets master
[PASS] 1.1.1 Ensure that the API server pod specification file permissions are set to 644 or more restrictive
[PASS] 1.1.2 Ensure that the API server pod specification file ownership is set to root:root
[PASS] 1.1.3 Ensure that the controller manager pod specification file permissions are set to 644 or more restrictive
[PASS] 1.1.4 Ensure that the controller manager pod specification file ownership is set to root:root
[PASS] 1.1.5 Ensure that the scheduler pod specification file permissions are set to 644 or more restrictive
== Summary ==
41 checks PASS
13 checks FAIL
12 checks WARN
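This article was compiled and published by Fengge Tutorials for learning and testing only. When reposting, please credit the source: http://www.fgedu.net.cn/10327.html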
