
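Linux Tutorial FG042: Pre-configuration Check Commands for the Base Environment of Kubernetes Nodes

In this document, Fengge introduces the commands for checking the base environment of Kubernetes nodes before deployment, covering system resources, network configuration, kernel parameters, and more. The material follows the official Kubernetes documentation and the official Red Hat Enterprise Linux 10 documentation. It is intended for Linux administrators to use in study and testing; verify it yourself before applying it to a production environment.

Reference: the System administration chapter of the official Red Hat Enterprise Linux 10 documentation

Part01 - Basic Concepts and Theory

1.1 K8s Node Environment Requirements

Kubernetes nodes must meet certain hardware and software requirements to run properly. These requirements cover CPU, memory, disk space, operating system version, kernel version, network configuration, and more.

Minimum requirements for K8s nodes:

  • CPU: at least 2 cores for control-plane nodes, at least 1 core for worker nodes
  • Memory: at least 2GB for control-plane nodes, at least 1GB for worker nodes
  • Disk: at least 20GB of free space
  • Operating system: RHEL 7/8/9/10, Ubuntu 18.04/20.04, CentOS 7/8, etc.
  • Kernel version: 3.10 or later; 4.x or 5.x recommended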
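The minimums above can be turned into a quick pass/fail check. The sketch below is illustrative only: the function names `check_min_resources`, `version_ge`, and `collect_local_resources` are hypothetical, the thresholds are copied from the list above, and `version_ge` assumes GNU `sort -V`. A version sort is used because comparing kernel versions as decimal numbers misorders releases such as 4.9 and 4.19.

```shell
#!/usr/bin/env bash
# Sketch: compare node resources with the minimums listed above.
# Thresholds are copied from that list; adjust them to your own standard.

# version_ge A B: succeed if dotted version A >= B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_min_resources ROLE CPUS MEM_MB DISK_GB [KERNEL_RELEASE]
# ROLE is control-plane or worker. Prints PASS, or "FAIL:" followed by reasons.
check_min_resources() {
  local role=$1 cpus=$2 mem_mb=$3 disk_gb=$4 kernel=${5:-$(uname -r)}
  local min_cpu min_mem_mb min_disk_gb=20 min_kernel=3.10 reasons=""
  case "$role" in
    control-plane) min_cpu=2; min_mem_mb=2048 ;;
    worker)        min_cpu=1; min_mem_mb=1024 ;;
    *) echo "FAIL: unknown role '$role'"; return 2 ;;
  esac
  (( cpus >= min_cpu ))        || reasons+=" cpu=${cpus}<${min_cpu}"
  (( mem_mb >= min_mem_mb ))   || reasons+=" mem=${mem_mb}MB<${min_mem_mb}MB"
  (( disk_gb >= min_disk_gb )) || reasons+=" disk=${disk_gb}GB<${min_disk_gb}GB"
  # Compare only the numeric part, e.g. 5.14.0 from 5.14.0-123.el10.x86_64.
  version_ge "${kernel%%-*}" "$min_kernel" || reasons+=" kernel=${kernel}<${min_kernel}"
  if [ -z "$reasons" ]; then
    echo "PASS"
  else
    echo "FAIL:${reasons}"
    return 1
  fi
}

# collect_local_resources: print "CPUS MEM_MB DISK_GB" for this host.
collect_local_resources() {
  local cpus mem_mb disk_gb
  cpus=$(nproc)
  mem_mb=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 1024 ))
  disk_gb=$(df -P -BG / | awk 'NR==2 { sub(/G$/, "", $4); print $4 }')
  echo "$cpus $mem_mb $disk_gb"
}
```

On a node, `check_min_resources worker $(collect_local_resources)` checks the local host (the substitution is left unquoted on purpose so it splits into three arguments). `MemTotal` in /proc/meminfo is usually a little below the installed RAM, so a VM with exactly 2 GB may need a slightly lower memory threshold.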

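1.2 K8s Pre-configuration Check Items

Pre-configuration checks for Kubernetes nodes mainly cover the following items:

# K8s pre-configuration check items
1. System resource checks
- Number of CPU cores
- Memory capacity
- Disk space
- System load

2. Operating system checks
- Operating system version
- Kernel version
- Required software packages

3. Network configuration checks
- Hostname resolution
- IP address configuration
- Port availability
- Network connectivity

4. Kernel parameter checks
- bridge-nf-call-iptables
- ip_forward
- Swap disabled

5. Service configuration checks
- Firewall configuration
- SELinux configuration
- Time synchronization

6. Container runtime checks
- Docker/Podman status
- containerd status
- Container network configuration

1.3 Why Pre-configuration Checks Matter

Pre-configuration checks are critical for deploying a Kubernetes cluster:

  • Avoid deployment failures: find configuration problems before they break the deployment
  • Higher success rate: make sure every node meets the requirements
  • Fewer faults: prevent runtime failures caused by misconfiguration
  • Standardized deployment: establish a standard deployment process and checklist
  • Time savings: finding problems in advance is more efficient than troubleshooting afterward
Fengge's tip: Before deploying a Kubernetes cluster, run a full pre-configuration check on every node. A single node that does not meet the requirements can make the whole cluster deployment fail or leave the cluster unstable. Use automated scripts to check nodes in batch.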

Part02 - Production Environment Planning and Recommendations

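2.1 K8s Node Planning Strategy

In production, plan Kubernetes nodes with the following factors in mind:

# Key points of K8s node planning
1. Node role planning
- Control-plane nodes (master)
* Run etcd, kube-apiserver, kube-controller-manager, and kube-scheduler
* Use an odd number of nodes (3, 5, 7) for high availability, because etcd needs a majority (quorum) of members to keep working
* Higher resource requirements

- Worker nodes
* Run kubelet and kube-proxy
* Run user workloads
* Scale out according to business needs

2. Resource planning
- Control-plane nodes
* CPU: 4 cores or more
* Memory: 8GB or more
* Disk: 50GB or more (SSD recommended)

- Worker nodes
* CPU: sized by workload
* Memory: sized by workload
* Disk: sized by container images and data volume

3. Network planning
- Pod network CIDR: 10.244.0.0/16
- Service network CIDR: 10.96.0.0/12
- Node network: plan according to the actual network environment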
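Before deployment it is worth confirming that the Pod CIDR, the Service CIDR, and the node network do not overlap; otherwise traffic meant for Pods or Services can be routed to the wrong place. Below is a minimal IPv4-only sketch in pure Bash arithmetic. The function names are hypothetical, and the sample values in the usage note are the ones planned above.

```shell
#!/usr/bin/env bash
# Sketch: detect overlaps between the Pod, Service, and node CIDRs (IPv4 only).

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_range A.B.C.D/N -> "first last" as integers
cidr_range() {
  local ip=${1%/*} prefix=${1#*/} base mask first
  base=$(ip_to_int "$ip")
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( base & mask ))
  echo "$first $(( first | (~mask & 0xFFFFFFFF) ))"
}

# cidrs_overlap CIDR1 CIDR2: succeed if the two ranges share any address
cidrs_overlap() {
  local a1 a2 b1 b2
  read -r a1 a2 <<< "$(cidr_range "$1")"
  read -r b1 b2 <<< "$(cidr_range "$2")"
  (( a1 <= b2 && b1 <= a2 ))
}

# check_cidr_plan POD_CIDR SERVICE_CIDR NODE_CIDR: print PASS or one FAIL per overlap
check_cidr_plan() {
  local pod=$1 svc=$2 node=$3 rc=0
  if cidrs_overlap "$pod" "$svc";  then echo "FAIL: Pod $pod overlaps Service $svc"; rc=1; fi
  if cidrs_overlap "$pod" "$node"; then echo "FAIL: Pod $pod overlaps node network $node"; rc=1; fi
  if cidrs_overlap "$svc" "$node"; then echo "FAIL: Service $svc overlaps node network $node"; rc=1; fi
  if [ "$rc" -eq 0 ]; then echo "PASS"; fi
  return "$rc"
}
```

With the plan above, `check_cidr_plan 10.244.0.0/16 10.96.0.0/12 192.168.1.0/24` should print PASS. Note that 10.96.0.0/12 spans 10.96.0.0 to 10.111.255.255, which is wider than it looks.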

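2.2 Network Configuration Planning

Kubernetes network configuration needs particular attention:

# K8s network port planning
Control-plane node ports:
6443 - Kubernetes API server
2379-2380 - etcd server client API
10250 - Kubelet API
10251 - kube-scheduler (legacy insecure port, removed in newer Kubernetes releases)
10252 - kube-controller-manager (legacy insecure port, removed in newer Kubernetes releases)
10257 - kube-controller-manager (secure)
10259 - kube-scheduler (secure)

Worker node ports:
10250 - Kubelet API
10256 - kube-proxy
30000-32767 - NodePort Services

# Check port usage (ss replaces netstat; the net-tools package may not be installed)
# ss -tuln | grep -E ":(6443|2379|2380|10250|10251|10252)\b"
# No output means none of these ports is in use yet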
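Reading port usage by eye is error-prone when many ports are involved. The sketch below (hypothetical function `ports_in_use`) parses the listening TCP sockets printed by `ss -Htln` and prints each required port that is already taken. It assumes the local address is the fourth column, as in `ss -tln` output; if a header line is present it is ignored, because its fourth column is not a port number.

```shell
#!/usr/bin/env bash
# ports_in_use PORT...: read `ss -Htln` output on stdin and print each of the
# given ports that already has a listening socket (once, in input order).
ports_in_use() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    {
      port = $4                 # local address, e.g. 0.0.0.0:6443 or [::]:6443
      sub(/.*:/, "", port)      # keep what follows the last colon
      if ((port in want) && !(port in seen)) { seen[port] = 1; print port }
    }'
}
```

For example, `ss -Htln | ports_in_use 6443 2379 2380 10250 10257 10259` should print nothing on a clean control-plane node.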

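2.3 Storage Configuration Planning

Kubernetes storage planning:

# K8s storage planning
1. Container runtime storage
- /var/lib/docker (Docker)
- /var/lib/containers (Podman)
- /var/lib/containerd (containerd)
- A separate partition with enough reserved space is recommended

2. etcd data storage
- /var/lib/etcd
- SSD storage is recommended
- Back it up regularly

3. Log storage
- /var/log
- A separate partition is recommended
- Configure log rotation

4. Temporary storage
- /var/lib/kubelet
- /tmp
- Reserve enough space

# Check disk space (df reports an error for directories that do not exist yet)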
# df -h /var/lib/docker /var/lib/etcd /var/log
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 50G 5.0G 46G 10% /
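Running `df` on a directory that does not exist yet, such as /var/lib/etcd before the cluster is initialized, fails with "No such file or directory". The sketch below (hypothetical functions `avail_gb_for` and `check_paths_space`, GNU `df` assumed) checks the nearest existing parent directory instead, which is the filesystem the directory will later be created on.

```shell
#!/usr/bin/env bash
# avail_gb_for PATH: available space (GB) on the filesystem that will hold PATH.
# If PATH does not exist yet, walk up to the nearest existing parent directory.
avail_gb_for() {
  local path=$1
  while [ ! -e "$path" ] && [ "$path" != "/" ]; do
    path=$(dirname "$path")
  done
  # -P keeps each filesystem on one line; -BG reports sizes in whole GB.
  df -P -BG "$path" | awk 'NR==2 { sub(/G$/, "", $4); print $4 }'
}

# check_paths_space MIN_GB PATH...: print PASS/FAIL per path; non-zero if any fail.
check_paths_space() {
  local min_gb=$1 p gb rc=0; shift
  for p in "$@"; do
    gb=$(avail_gb_for "$p")
    if [ "$gb" -ge "$min_gb" ]; then
      echo "PASS $p ${gb}GB"
    else
      echo "FAIL $p ${gb}GB < ${min_gb}GB"; rc=1
    fi
  done
  return "$rc"
}
```

For example: `check_paths_space 20 /var/lib/containerd /var/lib/etcd /var/log`. Paths on the same filesystem share the same free space, so the per-path results are not additive.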

Part03 - Production Implementation Plan

3.1 Basic System Checks

Check whether the basic system configuration meets Kubernetes requirements:

# Check the operating system version
# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="10.0 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="10.0"
PLATFORM_ID="platform:el10"
PRETTY_NAME="Red Hat Enterprise Linux Server 10.0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:10::server"

# Check the kernel version
# uname -r
5.14.0-123.el10.x86_64

# Check the number of CPU cores
# nproc
4

# Check CPU information
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 45 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1

# Check memory capacity
# free -h
total used free shared buff/cache available
Mem: 7.6Gi 1.2Gi 5.8Gi 128Mi 640Mi 6.0Gi
Swap: 0B 0B 0B

# Check disk space
# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 1.6G 9.5M 1.6G 1% /run
/dev/mapper/rhel-root 50G 5.0G 46G 10% /
/dev/sda1 1014M 256M 759M 26% /boot
/dev/mapper/rhel-home 50G 512M 50G 1% /home

# Check the system load
# uptime
14:00:01 up 4 days, 4:00, 2 users, load average: 0.00, 0.01, 0.05

# Check the hostname
# hostname
rhel10-node1

# Check hostname settings (hostnamectl shows host identity; it does not test name resolution)
# hostnamectl
Static hostname: rhel10-node1
Icon name: computer-vm
Chassis: vm
Machine ID: 1234567890abcdef1234567890abcdef
Boot ID: abcdef1234567890abcdef1234567890
Virtualization: kvm
Operating System: Red Hat Enterprise Linux Server 10.0 (Plow)
CPE OS Name: cpe:/o:redhat:enterprise_linux:10::server
Kernel: Linux 5.14.0-123.el10.x86_64
Architecture: x86-64

# Check the /etc/hosts configuration
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.10 rhel10-node1 rhel10-node1.fgedu.net.cn
192.168.1.11 rhel10-node2 rhel10-node2.fgedu.net.cn
192.168.1.12 rhel10-node3 rhel10-node3.fgedu.net.cn

# Check time synchronization
# timedatectl status
Local time: Fri 2026-04-02 14:00:01 CST
Universal time: Fri 2026-04-02 06:00:01 UTC
RTC time: Fri 2026-04-02 06:00:01
Time zone: Asia/Shanghai (CST, +0800)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

# Check the chronyd service status
# systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: man:chronyd(8)
man:chrony.conf(5)
Main PID: 1234 (chronyd)
Tasks: 1 (limit: 23456)
Memory: 1.5M
CPU: 234ms
CGroup: /system.slice/chronyd.service
└─1234 /usr/sbin/chronyd

Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting NTP client/server…
Apr 02 10:00:00 rhel10-node1 chronyd[1234]: chronyd version 4.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 +DEBUG)
Apr 02 10:00:00 rhel10-node1 chronyd[1234]: Frequency -5.234 +/- 0.123 ppm read from /var/lib/chrony/drift
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started NTP client/server.
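A plain `grep "$(hostname)" /etc/hosts` also matches commented-out lines and longer names (searching for rhel10-node1 also matches rhel10-node10). The sketch below (hypothetical functions `hosts_has_name` and `check_cluster_hosts`) only accepts an exact hostname field on an uncommented line, and can check every cluster node at once. `getent hosts "$(hostname)"` is a useful complement, because it resolves the name the same way system services do, including through DNS.

```shell
#!/usr/bin/env bash
# hosts_has_name NAME [FILE]: succeed if NAME is one of the hostname fields on
# an uncommented line of FILE (default /etc/hosts). Whole-field match only.
hosts_has_name() {
  awk -v n="$1" '
    { sub(/#.*/, "") }                                   # strip comments
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # fields 2..NF are names
    END { exit !found }' "${2:-/etc/hosts}"
}

# check_cluster_hosts FILE NAME...: print MISSING for each name not found in FILE.
check_cluster_hosts() {
  local file=$1 name rc=0; shift
  for name in "$@"; do
    hosts_has_name "$name" "$file" || { echo "MISSING $name"; rc=1; }
  done
  return "$rc"
}
```

For the three-node layout above: `check_cluster_hosts /etc/hosts rhel10-node1 rhel10-node2 rhel10-node3` prints nothing when every node is present.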

3.2 Network Configuration Checks

Check whether the network configuration meets Kubernetes requirements:

# Check network interfaces
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:12:34:56 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.10/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe12:3456/64 scope link
valid_lft forever preferred_lft forever

# Check the routing table
# ip route show
default via 192.168.1.1 dev enp0s3 proto static metric 100
192.168.1.0/24 dev enp0s3 proto kernel scope link src 192.168.1.10 metric 100

# Check the DNS configuration
# cat /etc/resolv.conf
nameserver 192.168.1.1
search fgedu.net.cn

# Check network connectivity
# ping -c 3 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.234 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.123 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.234 ms
--- 192.168.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2045ms
rtt min/avg/max/mdev = 0.123/0.197/0.234/0.051 ms

# Check the firewall status
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: man:firewalld(1)
Main PID: 5678 (firewalld)
Tasks: 2 (limit: 23456)
Memory: 32.5M
CPU: 1.234s
CGroup: /system.slice/firewalld.service
└─5678 /usr/bin/python3 -Es /usr/sbin/firewalld --nofork --nopid

Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting firewalld - dynamic firewall daemon...
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started firewalld - dynamic firewall daemon.

# Check the firewall rules
# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: enp0s3
sources:
services: cockpit dhcpv6-client ssh
ports:
protocols:
forward: no
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:

# Check the iptables rules
# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

# Check port usage (no output means none of these ports is in use)
# ss -tuln | grep -E ":(6443|2379|2380|10250|10251|10252|10255|10256)\b"

# Check the SELinux status (the kubeadm installation guide sets SELinux to permissive on Red Hat-based systems)
# getenforce
Enforcing

# Check the SELinux configuration
# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
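Because firewalld is active on these nodes, the Kubernetes ports planned in section 2.2 have to be opened before deployment. The sketch below (hypothetical function `k8s_firewall_cmds`) only prints the `firewall-cmd` commands for a role, so they can be reviewed before being run. The port lists follow the current upstream port table without the legacy 10251/10252 ports. Any ports your CNI plugin needs are not included.

```shell
#!/usr/bin/env bash
# k8s_firewall_cmds ROLE: print (not run) firewall-cmd commands that open the
# Kubernetes ports for ROLE. CNI-specific ports are deliberately not included.
k8s_firewall_cmds() {
  local -a ports
  case "$1" in
    control-plane) ports=(6443/tcp 2379-2380/tcp 10250/tcp 10257/tcp 10259/tcp) ;;
    worker)        ports=(10250/tcp 10256/tcp 30000-32767/tcp) ;;
    *) echo "unknown role: $1 (use control-plane or worker)" >&2; return 1 ;;
  esac
  local p
  for p in "${ports[@]}"; do
    echo "firewall-cmd --permanent --add-port=${p}"
  done
  # Permanent rules only take effect after a reload.
  echo "firewall-cmd --reload"
}
```

After reviewing the output, `k8s_firewall_cmds control-plane | sh` applies it on a control-plane node.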

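3.3 K8s-Specific Checks

Check the configuration requirements specific to Kubernetes:

# Check whether swap is disabled
# free -h | grep Swap
Swap: 0B 0B 0B

# Check swap entries in /etc/fstab
# grep swap /etc/fstab
# There should be no output, or every swap line should be commented out

# Check kernel modules
# lsmod | grep -E "br_netfilter|overlay|nf_conntrack"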
br_netfilter 24576 0
bridge 192512 1 br_netfilter
overlay 114688 0
nf_conntrack 147456 1 nf_nat

# Load the required kernel modules
# modprobe br_netfilter
# modprobe overlay

# Load the kernel modules automatically at boot
# cat > /etc/modules-load.d/k8s.conf << EOF
br_netfilter
overlay
EOF

# Check kernel parameters
# sysctl -a | grep -E "net.bridge.bridge-nf-call|net.ipv4.ip_forward"
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0

# Configure kernel parameters
# cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

# Apply kernel parameters
# sysctl --system
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
* Applying /usr/lib/sysctl.d/50-coredump.conf ...
* Applying /usr/lib/sysctl.d/50-default.conf ...
* Applying /usr/lib/sysctl.d/50-libkcapi-optmem_max.conf ...
* Applying /usr/lib/sysctl.d/50-pid_max.conf ...
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/k8s.conf ...
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
* Applying /etc/sysctl.conf ...

# Check the container runtime
# which docker
/usr/bin/docker

# Check the Docker version
# docker --version
Docker version 24.0.0, build 1234567

# Check the Docker service status
# (since Kubernetes 1.24 the kubelet cannot use Docker Engine directly; it needs
# cri-dockerd, or use containerd as the runtime instead)
# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: https://docs.docker.com
Main PID: 9012 (dockerd)
Tasks: 12
Memory: 128.5M
CPU: 5.678s
CGroup: /system.slice/docker.service
├─9012 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
└─9013 containerd

Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting Docker Application Container Engine...
Apr 02 10:00:00 rhel10-node1 dockerd[9012]: time="2026-04-02T10:00:00.123456789+08:00" level=info msg="Starting up"
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started Docker Application Container Engine.

# Check the containerd status
# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2026-04-02 10:00:00 CST; 4h ago
Docs: https://containerd.io
Main PID: 9013 (containerd)
Tasks: 10
Memory: 64.5M
CPU: 3.456s
CGroup: /system.slice/containerd.service
└─9013 /usr/bin/containerd

Apr 02 10:00:00 rhel10-node1 systemd[1]: Starting containerd container runtime...
Apr 02 10:00:00 rhel10-node1 containerd[9013]: time="2026-04-02T10:00:00.123456789+08:00" level=info msg="starting containerd" revision=1234567890abcdef version=v1.6.0
Apr 02 10:00:00 rhel10-node1 systemd[1]: Started containerd container runtime.

# Check whether kubelet is installed
# which kubelet
/usr/bin/kubelet

# Check the kubelet version
# kubelet --version
Kubernetes v1.28.0

# Check whether kubeadm is installed
# which kubeadm
/usr/bin/kubeadm

# Check the kubeadm version
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"1234567890abcdef", GitTreeState:"clean", BuildDate:"2026-01-01T00:00:00Z", GoVersion:"go1.20", Compiler:"gc", Platform:"linux/amd64"}

# Check whether kubectl is installed
# which kubectl
/usr/bin/kubectl

# Check the kubectl version
# kubectl version --client
Client Version: v1.28.0
Kustomize Version: v5.0.0
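The swap and kernel-parameter checks above can be scripted so they give a clear answer instead of a list of values. The sketch below uses two hypothetical functions. `fstab_active_swap` prints swap entries in /etc/fstab that are not commented out, because those would re-enable swap after a reboot. `verify_sysctls` reads `sysctl` output and reports each required parameter that is missing or not set to 1. A missing bridge parameter usually means `br_netfilter` is not loaded.

```shell
#!/usr/bin/env bash
# Sketch: verify swap and the Kubernetes sysctl settings.

# fstab_active_swap [FILE]: print uncommented fstab entries whose type is swap.
fstab_active_swap() {
  awk '$1 !~ /^#/ && NF >= 3 && $3 == "swap"' "${1:-/etc/fstab}"
}

# verify_sysctls: read "key = value" lines (as printed by sysctl) on stdin and
# print MISSING/FAIL for each required parameter that is absent or not 1.
verify_sysctls() {
  awk -F' *= *' '
    BEGIN {
      req["net.ipv4.ip_forward"] = 1
      req["net.bridge.bridge-nf-call-iptables"] = 1
      req["net.bridge.bridge-nf-call-ip6tables"] = 1
    }
    ($1 in req) { got[$1] = $2 }
    END {
      rc = 0
      for (k in req) {
        if (!(k in got)) { print "MISSING " k; rc = 1 }
        else if (got[k] + 0 != req[k]) { print "FAIL " k " = " got[k]; rc = 1 }
      }
      exit rc
    }'
}

# Usage on a real node:
#   fstab_active_swap; swapon --show
#   sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables \
#     net.bridge.bridge-nf-call-ip6tables 2>/dev/null | verify_sysctls
```

Both functions print nothing when the node is configured correctly. The order of MISSING/FAIL lines from `verify_sysctls` is not guaranteed.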

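Part04 - Production Cases and Hands-on Walkthroughs

4.1 Complete Node Pre-check Case

Case: run a complete pre-configuration check on a Kubernetes node.

# Create the K8s node pre-check script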
# mkdir -p /fgedu/shell
# cat > /fgedu/shell/k8s-node-check.sh << 'EOF'
#!/bin/bash
# Kubernetes node pre-configuration check script

echo "========================================="
echo "Kubernetes node pre-configuration check"
echo "Time: $(date)"
echo "Hostname: $(hostname)"
echo "========================================="
echo ""

PASS=0
FAIL=0

# Result helpers
check_pass() {
    echo "[PASS] $1"
    ((PASS++))
}

check_fail() {
    echo "[FAIL] $1"
    ((FAIL++))
}

# 1. Check the operating system version
echo "1. Check the operating system version"
if [ -f /etc/redhat-release ]; then
    OS_VERSION=$(cat /etc/redhat-release)
    check_pass "Operating system: $OS_VERSION"
else
    check_fail "Unable to determine the operating system version"
fi
echo ""

# 2. Check the kernel version (compare the major version as an integer,
#    so the script does not depend on bc being installed)
echo "2. Check the kernel version"
KERNEL_MAJOR=$(uname -r | cut -d. -f1)
if [ "$KERNEL_MAJOR" -ge 4 ]; then
    check_pass "Kernel version: $(uname -r)"
else
    check_fail "Kernel version too old: $(uname -r)"
fi
echo ""

# 3. Check the number of CPU cores
echo "3. Check the number of CPU cores"
CPU_CORES=$(nproc)
if [ "$CPU_CORES" -ge 2 ]; then
    check_pass "CPU cores: $CPU_CORES"
else
    check_fail "Not enough CPU cores: $CPU_CORES (at least 2 required)"
fi
echo ""

# 4. Check memory capacity
echo "4. Check memory capacity"
MEM_TOTAL=$(free -g | awk '/^Mem:/{print $2}')
if [ "$MEM_TOTAL" -ge 2 ]; then
    check_pass "Memory: ${MEM_TOTAL}GB"
else
    check_fail "Not enough memory: ${MEM_TOTAL}GB (at least 2GB required)"
fi
echo ""

# 5. Check disk space (-P keeps the df output on one line per filesystem)
echo "5. Check disk space"
DISK_AVAIL=$(df -P -BG / | awk 'NR==2 {print $4}' | tr -d 'G')
if [ "$DISK_AVAIL" -ge 20 ]; then
    check_pass "Free space on the root partition: ${DISK_AVAIL}GB"
else
    check_fail "Not enough free space on the root partition: ${DISK_AVAIL}GB (at least 20GB required)"
fi
echo ""

# 6. Check swap
echo "6. Check swap"
SWAP_TOTAL=$(free -m | awk '/^Swap:/{print $2}')
if [ "$SWAP_TOTAL" -eq 0 ]; then
    check_pass "Swap is disabled"
else
    check_fail "Swap is not disabled: ${SWAP_TOTAL}MB"
fi
echo ""

# 7. Check hostname resolution (-w avoids matching longer names such as node10)
echo "7. Check hostname resolution"
HOSTNAME=$(hostname)
if grep -qw "$HOSTNAME" /etc/hosts; then
    check_pass "Hostname is configured in /etc/hosts"
else
    check_fail "Hostname is not configured in /etc/hosts"
fi
echo ""

# 8. Check time synchronization
echo "8. Check time synchronization"
if systemctl is-active chronyd &>/dev/null; then
    check_pass "Time synchronization service (chronyd) is running"
else
    check_fail "Time synchronization service (chronyd) is not running"
fi
echo ""

# 9. Check the firewall
echo "9. Check the firewall"
if systemctl is-active firewalld &>/dev/null; then
    check_pass "Firewall service is running"
else
    check_fail "Firewall service is not running"
fi
echo ""

# 10. Check SELinux
echo "10. Check SELinux"
SELINUX_STATUS=$(getenforce 2>/dev/null)
if [ "$SELINUX_STATUS" == "Enforcing" ] || [ "$SELINUX_STATUS" == "Permissive" ]; then
    check_pass "SELinux status: $SELINUX_STATUS"
else
    check_fail "SELinux status: $SELINUX_STATUS"
fi
echo ""

# 11. Check kernel parameters
echo "11. Check kernel parameters"
IP_FORWARD=$(sysctl -n net.ipv4.ip_forward)
BRIDGE_IPTABLES=$(sysctl -n net.bridge.bridge-nf-call-iptables 2>/dev/null || echo "0")

if [ "$IP_FORWARD" == "1" ]; then
    check_pass "net.ipv4.ip_forward = 1"
else
    check_fail "net.ipv4.ip_forward = $IP_FORWARD (should be 1)"
fi

if [ "$BRIDGE_IPTABLES" == "1" ]; then
    check_pass "net.bridge.bridge-nf-call-iptables = 1"
else
    check_fail "net.bridge.bridge-nf-call-iptables = $BRIDGE_IPTABLES (should be 1)"
fi
echo ""

# 12. Check the container runtime
#     Podman is not a CRI runtime, so containerd is checked instead; note that
#     Docker Engine also needs cri-dockerd on Kubernetes 1.24 and later
echo "12. Check the container runtime"
if command -v docker &>/dev/null; then
    DOCKER_VERSION=$(docker --version)
    check_pass "Docker installed: $DOCKER_VERSION"
elif command -v containerd &>/dev/null; then
    CONTAINERD_VERSION=$(containerd --version)
    check_pass "containerd installed: $CONTAINERD_VERSION"
else
    check_fail "No container runtime installed (Docker or containerd)"
fi
echo ""

# 13. Check Kubernetes components
echo "13. Check Kubernetes components"
if command -v kubelet &>/dev/null; then
    KUBELET_VERSION=$(kubelet --version)
    check_pass "kubelet installed: $KUBELET_VERSION"
else
    check_fail "kubelet is not installed"
fi

if command -v kubeadm &>/dev/null; then
    KUBEADM_VERSION=$(kubeadm version -o short)
    check_pass "kubeadm installed: $KUBEADM_VERSION"
else
    check_fail "kubeadm is not installed"
fi

if command -v kubectl &>/dev/null; then
    KUBECTL_VERSION=$(kubectl version --client -o json 2>/dev/null | grep gitVersion | cut -d'"' -f4)
    check_pass "kubectl installed: $KUBECTL_VERSION"
else
    check_fail "kubectl is not installed"
fi
echo ""

# Summary
echo "========================================="
echo "Check summary"
echo "========================================="
echo "Passed: $PASS"
echo "Failed: $FAIL"
echo "Total: $((PASS + FAIL))"
echo ""

if [ $FAIL -eq 0 ]; then
    echo "All checks passed; the node meets the Kubernetes deployment requirements."
    exit 0
else
    echo "$FAIL check(s) failed; fix them before deploying Kubernetes."
    exit 1
fi
EOF

# Run the check script
# chmod +x /fgedu/shell/k8s-node-check.sh
# /fgedu/shell/k8s-node-check.sh
=========================================
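Kubernetes node pre-configuration check
Time: Fri Apr 2 14:00:00 CST 2026
Hostname: rhel10-node1
=========================================

1. Check the operating system version
[PASS] Operating system: Red Hat Enterprise Linux release 10.0 (Plow)

2. Check the kernel version
[PASS] Kernel version: 5.14.0-123.el10.x86_64

3. Check the number of CPU cores
[PASS] CPU cores: 4

4. Check memory capacity
[PASS] Memory: 7GB

5. Check disk space
[PASS] Free space on the root partition: 46GB

6. Check swap
[PASS] Swap is disabled

7. Check hostname resolution
[PASS] Hostname is configured in /etc/hosts

8. Check time synchronization
[PASS] Time synchronization service (chronyd) is running

9. Check the firewall
[PASS] Firewall service is running

10. Check SELinux
[PASS] SELinux status: Enforcing

11. Check kernel parameters
[PASS] net.ipv4.ip_forward = 1
[PASS] net.bridge.bridge-nf-call-iptables = 1

12. Check the container runtime
[PASS] Docker installed: Docker version 24.0.0, build 1234567

13. Check Kubernetes components
[PASS] kubelet installed: Kubernetes v1.28.0
[PASS] kubeadm installed: v1.28.0
[PASS] kubectl installed: v1.28.0

=========================================
Check summary
=========================================
Passed: 16
Failed: 0
Total: 16

All checks passed; the node meets the Kubernetes deployment requirements.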
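For a handful of nodes, a lighter alternative to Ansible is to stream the pre-check script above to each node over SSH. The sketch below assumes key-based root SSH access (as in the Ansible inventory later in this article). It uses a hypothetical function name, `run_check_on_hosts`, and writes one log per node to `/tmp/k8s-check-<host>.log`. Because the script exits with status 1 when any check fails, `ssh` passes that status back.

```shell
#!/usr/bin/env bash
# run_check_on_hosts SCRIPT HOST...: run a local check script on each host over
# SSH (hypothetical helper). Prints PASS/FAIL per host; non-zero if any failed.
run_check_on_hosts() {
  local script=$1; shift
  local host log failed=0
  for host in "$@"; do
    log="/tmp/k8s-check-${host}.log"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes "root@${host}" 'bash -s' < "$script" > "$log" 2>&1; then
      echo "PASS ${host}"
    else
      echo "FAIL ${host} (see ${log})"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example: `run_check_on_hosts /fgedu/shell/k8s-node-check.sh 192.168.1.10 192.168.1.11 192.168.1.12`. An unreachable host also shows up as FAIL, and its log contains the ssh error.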

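4.2 Problem Diagnosis and Fix Cases

Case: diagnose and fix node pre-configuration problems.

# Problem 1: swap is not disabled
# Check the swap status
# free -h | grep Swap
Swap: 2.0Gi 0B 2.0Gi

# Disable swap immediately (lasts until the next reboot)
# swapoff -a

# Disable swap permanently by commenting out the swap entries in /etc/fstab
# sed -i '/^[^#].*\sswap\s/s/^/#/' /etc/fstab

# Verify that swap is disabled
# free -h | grep Swap
Swap: 0B 0B 0B

# Problem 2: kernel parameters are not configured
# Check the kernel parameter
# sysctl -n net.ipv4.ip_forward
0

# Configure kernel parameters
# echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/k8s.conf
# echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.d/k8s.conf
# echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.d/k8s.conf

# Load the br_netfilter module (the net.bridge.* parameters only exist once it is
# loaded; add it to /etc/modules-load.d/k8s.conf so it is also loaded at boot)
# modprobe br_netfilter

# Apply kernel parameters
# sysctl --system

# Verify the kernel parameter
# sysctl -n net.ipv4.ip_forward
1

# Problem 3: the hostname does not resolve
# Check the hostname
# hostname
rhel10-node1

# Check /etc/hosts
# grep $(hostname) /etc/hosts
# (no output)

# Add a hostname entry
# echo "192.168.1.10 rhel10-node1 rhel10-node1.fgedu.net.cn" >> /etc/hosts

# Verify hostname resolution
# grep $(hostname) /etc/hosts
192.168.1.10 rhel10-node1 rhel10-node1.fgedu.net.cn

# Problem 4: time is not synchronized
# Check the time synchronization status
# timedatectl status
System clock synchronized: no

# Install and start chronyd
# dnf install -y chrony
# systemctl enable --now chronyd

# Check the chronyd status
# systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2026-04-02 14:00:00 CST; 1min ago

# Step the clock immediately
# chronyc makestep
200 OK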

4.3 Batch Node Check Case

Case: use Ansible to check multiple Kubernetes nodes in batch.

# Create the Ansible inventory file
# cat > /etc/ansible/hosts << 'EOF'
[k8s-master]
rhel10-master ansible_host=192.168.1.10

[k8s-workers]
rhel10-node1 ansible_host=192.168.1.11
rhel10-node2 ansible_host=192.168.1.12
rhel10-node3 ansible_host=192.168.1.13

[k8s-cluster:children]
k8s-master
k8s-workers

[k8s-cluster:vars]
ansible_user=root
ansible_ssh_private_key_file=/root/.ssh/id_rsa
EOF

# Create the Ansible check playbook
# cat > /tmp/k8s-node-check.yml << 'EOF'
---
- name: Kubernetes Node Pre-check
  hosts: k8s-cluster
  gather_facts: yes

  tasks:
    - name: Check OS version
      debug:
        msg: "OS: {{ ansible_distribution }} {{ ansible_distribution_version }}"

    - name: Check kernel version
      debug:
        msg: "Kernel: {{ ansible_kernel }}"

    - name: Check CPU cores
      debug:
        msg: "CPU cores: {{ ansible_processor_vcpus }}"
      failed_when: ansible_processor_vcpus < 2

    - name: Check memory
      debug:
        msg: "Memory: {{ (ansible_memtotal_mb / 1024) | round(1) }}GB"
      failed_when: ansible_memtotal_mb < 2048

    - name: Check disk space
      debug:
        msg: "Disk available: {{ (item.size_available / 1024 / 1024 / 1024) | round(1) }}GB"
      with_items: "{{ ansible_mounts }}"
      when: item.mount == '/'
      failed_when: (item.size_available / 1024 / 1024 / 1024) < 20

    - name: Check swap
      debug:
        msg: "Swap: {{ ansible_swaptotal_mb }}MB"
      failed_when: ansible_swaptotal_mb > 0

    - name: Check time sync
      stat:
        path: /run/chrony/chronyd.pid
      register: chronyd_pid

    - name: Report time sync status
      debug:
        msg: "Time sync: {{ 'OK' if chronyd_pid.stat.exists else 'FAIL' }}"

    - name: Check container runtime
      shell: docker --version || podman --version
      register: container_runtime
      ignore_errors: yes

    - name: Report container runtime
      debug:
        msg: "Container runtime: {{ container_runtime.stdout }}"
      when: container_runtime.rc == 0
EOF

# Run the Ansible check
# ansible-playbook /tmp/k8s-node-check.yml

PLAY [Kubernetes Node Pre-check] ***********************************************

TASK [Gathering Facts] *********************************************************
ok: [rhel10-master]
ok: [rhel10-node1]
ok: [rhel10-node2]
ok: [rhel10-node3]

TASK [Check OS version] ********************************************************
ok: [rhel10-master] => {
    "msg": "OS: RedHat 10.0"
}
ok: [rhel10-node1] => {
    "msg": "OS: RedHat 10.0"
}
ok: [rhel10-node2] => {
    "msg": "OS: RedHat 10.0"
}
ok: [rhel10-node3] => {
    "msg": "OS: RedHat 10.0"
}

TASK [Check kernel version] ****************************************************
ok: [rhel10-master] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}
ok: [rhel10-node1] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}
ok: [rhel10-node2] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}
ok: [rhel10-node3] => {
    "msg": "Kernel: 5.14.0-123.el10.x86_64"
}

TASK [Check CPU cores] *********************************************************
ok: [rhel10-master] => {
    "msg": "CPU cores: 4"
}
ok: [rhel10-node1] => {
    "msg": "CPU cores: 4"
}
ok: [rhel10-node2] => {
    "msg": "CPU cores: 4"
}
ok: [rhel10-node3] => {
    "msg": "CPU cores: 4"
}

TASK [Check memory] ************************************************************
ok: [rhel10-master] => {
    "msg": "Memory: 7.6GB"
}
ok: [rhel10-node1] => {
    "msg": "Memory: 7.6GB"
}
ok: [rhel10-node2] => {
    "msg": "Memory: 7.6GB"
}
ok: [rhel10-node3] => {
    "msg": "Memory: 7.6GB"
}

TASK [Check disk space] ********************************************************
ok: [rhel10-master] => {
    "msg": "Disk available: 46.0GB"
}
ok: [rhel10-node1] => {
    "msg": "Disk available: 46.0GB"
}
ok: [rhel10-node2] => {
    "msg": "Disk available: 46.0GB"
}
ok: [rhel10-node3] => {
    "msg": "Disk available: 46.0GB"
}

TASK [Check swap] **************************************************************
ok: [rhel10-master] => {
    "msg": "Swap: 0MB"
}
ok: [rhel10-node1] => {
    "msg": "Swap: 0MB"
}
ok: [rhel10-node2] => {
    "msg": "Swap: 0MB"
}
ok: [rhel10-node3] => {
    "msg": "Swap: 0MB"
}

TASK [Check time sync] *********************************************************
ok: [rhel10-master]
ok: [rhel10-node1]
ok: [rhel10-node2]
ok: [rhel10-node3]

TASK [Report time sync status] *************************************************
ok: [rhel10-master] => {
    "msg": "Time sync: OK"
}
ok: [rhel10-node1] => {
    "msg": "Time sync: OK"
}
ok: [rhel10-node2] => {
    "msg": "Time sync: OK"
}
ok: [rhel10-node3] => {
    "msg": "Time sync: OK"
}

TASK [Check container runtime] *************************************************
changed: [rhel10-master]
changed: [rhel10-node1]
changed: [rhel10-node2]
changed: [rhel10-node3]

TASK [Report container runtime] ************************************************
ok: [rhel10-master] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}
ok: [rhel10-node1] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}
ok: [rhel10-node2] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}
ok: [rhel10-node3] => {
    "msg": "Container runtime: Docker version 24.0.0, build 1234567"
}

PLAY RECAP *********************************************************************
rhel10-master : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rhel10-node1 : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rhel10-node2 : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
rhel10-node3 : ok=11 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
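`ansible-playbook` already exits with a non-zero status when a host fails, but with many nodes it helps to print exactly which ones need attention. The sketch below (hypothetical function `recap_problems`) reads playbook output and lists every host whose PLAY RECAP line shows a failed or unreachable count above zero. It assumes the default recap format shown above, where the host name is followed by a standalone colon.

```shell
#!/usr/bin/env bash
# recap_problems: read ansible-playbook output on stdin and print each host whose
# PLAY RECAP line has failed>0 or unreachable>0. Returns 1 if any host is listed.
recap_problems() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && NF >= 2 && $2 == ":" {
      bad = 0
      for (i = 3; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) bad = 1
      }
      if (bad) { print $1; found = 1 }
    }
    END { exit (found ? 1 : 0) }'
}
```

For example: `ansible-playbook /tmp/k8s-node-check.yml | tee /tmp/k8s-precheck.log | recap_problems`. This keeps the full log while printing only the problem hosts.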

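Part05 - Fengge's Experience and Tips

5.1 Best Practices for Node Pre-checks

Based on years of Kubernetes operations experience, these are the best practices for node pre-checks:

# Node pre-check best practices
1. Standardize the check process
- Define a single, shared checklist
- Run the checks with automated scripts
- Record the check results

2. When to check
- A full check before deployment
- A compatibility check before upgrades
- Regular inspections

3. Handling problems
- Fix problems as soon as they are found
- Record problems and their solutions
- Update the check scripts

4. Documentation
- Maintain the checklist document
- Record node configuration information
- Update the operations manual

# Common check commands
# Quick system check
echo "=== System information ===" && \
cat /etc/redhat-release && \
uname -r && \
nproc && \
free -h && \
df -h /

# Quick network check
echo "=== Network information ===" && \
hostname && \
ip addr show | grep inet && \
tail -5 /etc/hosts

# Quick K8s check
echo "=== K8s information ===" && \
kubelet --version && \
kubeadm version -o short && \
kubectl version --client -o json | grep gitVersion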

5.2 Pre-check Checklist

A complete pre-check checklist for Kubernetes nodes:

# Kubernetes node pre-check checklist
□ 1. System version check
cat /etc/redhat-release
uname -r

□ 2. Resource check
nproc (CPU cores >= 2)
free -h (memory >= 2GB)
df -h / (disk >= 20GB)

□ 3. Network check
hostname
cat /etc/hosts
ip addr show
ping <gateway IP>

□ 4. Time synchronization check
timedatectl status
systemctl status chronyd

□ 5. Swap check
free -h | grep Swap
grep swap /etc/fstab

□ 6. Kernel parameter check
sysctl -n net.ipv4.ip_forward
sysctl -n net.bridge.bridge-nf-call-iptables

□ 7. Kernel module check
lsmod | grep br_netfilter
lsmod | grep overlay

□ 8. Firewall check
systemctl status firewalld
firewall-cmd --list-all

□ 9. SELinux check
getenforce
cat /etc/selinux/config

□ 10. Container runtime check
docker --version
systemctl status docker

□ 11. Kubernetes component check
kubelet --version
kubeadm version
kubectl version --client

□ 12. Port check
ss -tuln | grep -E "6443|2379|2380|10250"

□ 13. Service check
systemctl list-unit-files --type=service --state=enabled

□ 14. User permission check
id
groups

□ 15. Log check
journalctl -u kubelet -u docker --no-pager -n 50

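5.3 Recommended Check Tools

The following tools are recommended for checking Kubernetes nodes:

# Recommended check tools
1. kubeadm
- Official Kubernetes tool
- Cluster initialization and management
- Built-in preflight checks

2. kubescape
- Security scanning tool
- Configuration compliance checks
- Best-practice validation

3. kube-bench
- CIS benchmark checks
- Security configuration auditing
- Compliance validation

4. node-problem-detector
- Node problem detection
- Runtime monitoring
- Problem reporting

# Run the kubeadm preflight checks
# kubeadm init phase preflight
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'

# Install kube-bench (it is not in the standard RHEL repositories; download the RPM
# from the kube-bench GitHub releases page first; <version> is a placeholder)
# dnf install -y ./kube-bench_<version>_linux_amd64.rpm

# Run the kube-bench checks
# kube-bench run --targets master
[PASS] 1.1.1 Ensure that the API server pod specification file permissions are set to 644 or more restrictive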
[PASS] 1.1.2 Ensure that the API server pod specification file ownership is set to root:root
[PASS] 1.1.3 Ensure that the controller manager pod specification file permissions are set to 644 or more restrictive
[PASS] 1.1.4 Ensure that the controller manager pod specification file ownership is set to root:root
[PASS] 1.1.5 Ensure that the scheduler pod specification file permissions are set to 644 or more restrictive

== Summary ==
41 checks PASS
13 checks FAIL
12 checks WARN

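Fengge's summary: Pre-configuration checks on Kubernetes nodes are key to a successful cluster deployment. A single node that does not meet the requirements can make the whole deployment fail or leave the cluster unstable. Run a full pre-configuration check on every node before deployment, and use automated tools for regular inspections. Remember that prevention beats cure: finding problems in advance is more efficient than troubleshooting afterward.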

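This article was compiled and published by Fengge Tutorials for learning and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html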
