
Linux Tutorial FG444 - Kubernetes Troubleshooting

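Summary: This Fengge tutorial draws on the official documentation for Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman, and describes in detail how to configure and use the relevant technologies.

This document describes how to troubleshoot a Kubernetes cluster.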

Part01 - Troubleshooting Overview

1.1 Troubleshooting Workflow

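# Kubernetes troubleshooting workflow
[root@k8s-master ~]# cat > /root/k8s-troubleshooting.txt << 'EOF'
Kubernetes Troubleshooting Workflow
===================================
1. Locate the problem
   - Check resource status
   - Review the event log
   - Analyze error messages
2. Common problems
   - Pod fails to start
   - Service is unreachable
   - Storage mount fails
   - Network connectivity problems
3. Troubleshooting tools
   - kubectl describe
   - kubectl logs
   - kubectl events
   - kubectl exec
4. Log analysis
   - Pod logs
   - Node logs
   - Control plane logs
   - Audit logs
5. Common commands
   - kubectl get
   - kubectl describe
   - kubectl logs
   - kubectl exec
EOF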
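
The workflow above can be sketched as a small lookup from a Pod's STATUS value to the first command worth running. The function name `suggest_next_step` and the mapping itself are illustrative assumptions, not something prescribed by this tutorial or by Kubernetes; adapt it to your own runbook.

```shell
# Hypothetical helper: print a suggested first command for a Pod STATUS.
# Usage: suggest_next_step <status> <pod-name>
# The status-to-command mapping is an assumption made for illustration.
suggest_next_step() {
  status="$1"
  pod="$2"
  case "$status" in
    ImagePullBackOff|ErrImagePull|Pending)
      # Image pull and scheduling problems show up in the Pod's events.
      echo "kubectl describe pod $pod" ;;
    CrashLoopBackOff|Error)
      # The container has exited, so read the previous container's logs.
      echo "kubectl logs $pod --previous" ;;
    Running|Completed)
      echo "kubectl get pod $pod -o wide" ;;
    *)
      # Anything else: fall back to the events recorded for the Pod.
      echo "kubectl events --for pod/$pod" ;;
  esac
}
```

For example, `suggest_next_step CrashLoopBackOff fgedu-web-abc12-abc12` prints `kubectl logs fgedu-web-abc12-abc12 --previous`.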

Part02 - Pod Troubleshooting

2.1 Pod Status Analysis

# Check Pod status
[root@k8s-master ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default fgedu-web-abc12-xyz789 0/1 ImagePullBackOff 0 5m
default fgedu-web-abc12-abc12 0/1 CrashLoopBackOff 5 10m
default fgedu-web-abc12-def34 0/1 Pending 0 2m

# View Pod details
[root@k8s-master ~]# kubectl describe pod fgedu-web-abc12-xyz789
Name: fgedu-web-abc12-xyz789
Namespace: default
Node: k8s-node1/192.168.1.101
Start Time: Sat, 04 Apr 2026 12:00:00 +0800
Labels: app=fgedu-web
Status: Pending
IP:
Containers:
nginx:
Container ID:
Image: nginx:invalid-tag
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 5m (x4 over 5m) kubelet Pulling image "nginx:invalid-tag"
Warning Failed 5m (x4 over 5m) kubelet Failed to pull image "nginx:invalid-tag": rpc error: code = Unknown desc = Error response from daemon: manifest for nginx:invalid-tag not found
Warning Failed 5m (x4 over 5m) kubelet Error: ErrImagePull
Normal BackOff 5m (x6 over 5m) kubelet Back-off pulling image "nginx:invalid-tag"
Warning Failed 5m (x6 over 5m) kubelet Error: ImagePullBackOff

# View Pod logs
[root@k8s-master ~]# kubectl logs fgedu-web-abc12-abc12
Error from server (BadRequest): container "nginx" in pod "fgedu-web-abc12-abc12" is waiting to start

[root@k8s-master ~]# kubectl logs fgedu-web-abc12-abc12 --previous
2026/04/04 12:00:00 [emerg] 1#1: host not found in upstream "backend" in /etc/nginx/nginx.conf:10
nginx: [emerg] host not found in upstream "backend" in /etc/nginx/nginx.conf:10

# Find out why a Pod is Pending
[root@k8s-master ~]# kubectl describe pod fgedu-web-abc12-def34 | grep -A 10 Events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m default-scheduler 0/3 nodes are available: 3 Insufficient cpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
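
When a cluster runs many Pods, it can help to narrow the listing down to the ones that need the checks shown above. The following is a minimal sketch, assuming the default column layout of `kubectl get pods -A` (STATUS in the fourth column); `unhealthy_pods` is a hypothetical helper name, not a kubectl feature.

```shell
# Hypothetical helper: read `kubectl get pods -A` output on stdin and print
# "namespace/name status" for every Pod that is neither Running nor Completed.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

A typical invocation would be `kubectl get pods -A | unhealthy_pods`. Each reported Pod can then be inspected with `kubectl describe pod` or `kubectl logs --previous` as shown in this section.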

Part03 - Node Troubleshooting

3.1 Node Status Analysis

# Check node status
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 10d v1.28.0
k8s-node1 NotReady <none> 10d v1.28.0
k8s-node2 Ready <none> 10d v1.28.0

# View node details
[root@k8s-master ~]# kubectl describe node k8s-node1
Name: k8s-node1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8s-node1
kubernetes.io/os=linux
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sat, 04 Apr 2026 12:00:00 +0800 Sat, 04 Apr 2026 10:00:00 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 04 Apr 2026 12:00:00 +0800 Sat, 04 Apr 2026 10:00:00 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 04 Apr 2026 12:00:00 +0800 Sat, 04 Apr 2026 10:00:00 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Sat, 04 Apr 2026 12:00:00 +0800 Sat, 04 Apr 2026 11:00:00 +0800 KubeletNotReady container runtime not ready: RuntimeReady=false

# Log in to the node and check the kubelet status
[root@k8s-node1 ~]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: disabled)
Active: inactive (dead) since Sat 2026-04-04 11:00:00 CST; 1h ago
Main PID: 12345 (code=exited, status=0/SUCCESS)

# Restart kubelet
[root@k8s-node1 ~]# systemctl restart kubelet
[root@k8s-node1 ~]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: disabled)
Active: active (running) since Sat 2026-04-04 12:00:00 CST; 5s ago
Main PID: 23456 (kubelet)

# Check the container runtime
[root@k8s-node1 ~]# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; preset: disabled)
Active: active (running) since Sat 2026-04-04 12:00:00 CST; 10s ago
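
To decide which nodes need the kubelet and container runtime checks above, the node listing can be filtered for anything that is not Ready. This is a sketch under the assumption that `kubectl get nodes` prints its default columns (STATUS in the second column); `not_ready_nodes` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# name of every node whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" still counts as Ready; "NotReady" does not.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}
```

A typical invocation would be `kubectl get nodes | not_ready_nodes`. On each node it reports, `systemctl status kubelet` and `journalctl -u kubelet` are reasonable places to look for the reason the kubelet stopped.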

Part04 - Network Troubleshooting

4.1 Troubleshooting Network Problems

# Check Service status
[root@k8s-master ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fgedu-web ClusterIP 10.96.100.100 <none> 80/TCP 10m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 10d

# Check Endpoints
[root@k8s-master ~]# kubectl get endpoints fgedu-web
NAME ENDPOINTS AGE
fgedu-web 10.244.1.10:80,10.244.2.10:80 10m
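
A Service with no backing Pods shows `<none>` in the ENDPOINTS column, which is a common cause of "Service unreachable" reports. The sketch below assumes the default `kubectl get endpoints` column layout; `services_without_endpoints` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get endpoints` output on stdin and print
# the name of every entry whose ENDPOINTS column is "<none>".
# Assumes the default columns: NAME ENDPOINTS AGE.
services_without_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}
```

A typical invocation would be `kubectl get endpoints | services_without_endpoints`. For any Service it reports, comparing the Service selector with the Pod labels (for example `app=fgedu-web`) is usually the next step.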

# Test DNS resolution
[root@k8s-master ~]# kubectl run dns-test --image=busybox --rm -it -- nslookup kubernetes
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
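
The output above shows the short name `kubernetes` resolving to `kubernetes.default.svc.cluster.local`. A short name is resolved relative to the namespace of the Pod that asks, so testing with the fully qualified name removes one variable. The sketch below builds that name; `service_fqdn` is a hypothetical helper, and the `cluster.local` default is an assumption that holds only if the cluster domain has not been changed.

```shell
# Hypothetical helper: build the in-cluster DNS name of a Service.
# Usage: service_fqdn <service> [namespace] [cluster-domain]
# Defaults (assumptions): namespace "default", cluster domain "cluster.local".
service_fqdn() {
  svc="$1"
  ns="${2:-default}"
  domain="${3:-cluster.local}"
  echo "${svc}.${ns}.svc.${domain}"
}
```

For example, `service_fqdn fgedu-web` prints `fgedu-web.default.svc.cluster.local`, which can be passed to the same `nslookup` test used above.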

# Test service connectivity
[root@k8s-master ~]# kubectl run curl-test --image=curlimages/curl --rm -it -- curl -s http://fgedu-web

Welcome to nginx!

# Check network policies
[root@k8s-master ~]# kubectl get networkpolicy -A
NAMESPACE NAME POD-SELECTOR AGE
default default-deny-ingress <none> 5m

# Check CNI plugin status
[root@k8s-master ~]# kubectl get pods -n kube-system -l k8s-app=calico-node
NAME READY STATUS RESTARTS AGE
calico-node-abc12 1/1 Running 0 10d
calico-node-def34 1/1 Running 0 10d
calico-node-ghi56 1/1 Running 0 10d

# Check CoreDNS status
[root@k8s-master ~]# kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-5dd5756b68-abc12 1/1 Running 0 10d
coredns-5dd5756b68-def34 1/1 Running 0 10d

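Fengge's troubleshooting recommendations:

  • Use describe to view detailed information
  • Check the event log to locate the problem
  • Read the Pod logs to analyze errors
  • Check node and Service status
  • Verify network connectivity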

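This article was compiled and published by Fengge Tutorials and is intended for learning and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html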
