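In this article, 风哥 covers advanced Podman container troubleshooting: what troubleshooting means, the common fault types, the troubleshooting workflow, and hands-on handling of container, Pod, and network faults. The material follows the Troubleshooting section of the official Podman documentation and is intended for container administrators and developers to use for study and testing. Verify every step yourself before applying it in a production environment.
Part01-Basic Concepts and Theory
1.1 What Troubleshooting Means
Troubleshooting is the process of identifying, analyzing, and resolving problems in a system or application so that normal operation is restored. Podman container troubleshooting applies the same process to Podman containers. Its goals are:
- Fast recovery: identify and resolve the problem quickly and restore normal operation
- Damage reduction: limit the impact of the fault on the business
- Recurrence prevention: analyze the cause so that similar faults do not happen again
- Experience building: accumulate troubleshooting experience and improve system reliability
- Continuous improvement: use each incident to improve systems and processes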
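1.2 Fault Types
The main types of Podman container faults are:
- Container fails to start: the container cannot start normally
- Container runs abnormally: errors occur while the container is running
- Network connectivity problems: the container cannot reach the network
- Storage problems: container storage or volumes fail, for example because of a full disk or wrong permissions
- Insufficient resources: the container runs short of memory, CPU, or other resources
- Image problems: pulling or using an image fails
- Configuration problems: the container is misconfigured
- Security problems: container permissions or security policy cause trouble
1.3 Troubleshooting Workflow
The workflow for troubleshooting a Podman container is:
- Identify: recognize the symptoms of the fault
- Analyze: determine the cause and the scope of impact
- Resolve: take action to fix the fault
- Verify: confirm that the fault is actually fixed
- Record: document the troubleshooting process and its outcome
- Prevent: take measures so that similar faults do not recur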
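Part02-Production Planning and Recommendations
2.1 Troubleshooting Strategy
A troubleshooting strategy for Podman in production covers four areas:
## Prevention
- Regular checks: check container status on a regular schedule
- Monitoring and alerting: set up monitoring and alerts so problems are found early
- Backups: back up container data and configuration regularly
- Testing: test container functionality and performance regularly
- Documentation: record container configuration and runtime state
## Response
- Fast response: respond to faults quickly to limit the impact
- Severity levels: handle faults according to their severity
- Teamwork: work as a team on complex faults
- Communication: establish a reliable channel for passing on fault information promptly
- Contingency plans: prepare plans for major incidents
## Resolution
- Root-cause analysis: dig into the underlying cause of the fault
- Solution selection: choose an appropriate fix
- Repair: apply the fix
- Verification: test that the fix worked
- Documentation update: update the troubleshooting documentation
## Improvement
- Lessons learned: summarize what each incident taught you
- Process optimization: refine the troubleshooting process
- System improvement: improve the systems and applications involved
- Training: raise the team's troubleshooting skills
- Continuous monitoring: keep monitoring the system's runtime state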
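2.2 Troubleshooting Tools
The main tools for troubleshooting Podman in production are:
- podman logs: view a container's logs
- podman inspect: view a container's detailed configuration and state
- podman ps: list containers and their status
- podman stats: view container resource usage
- podman events: view container events
- journalctl: view the system journal
- netstat: view network connections and listening ports (ss replaces it on newer distributions)
- top: view system resource usage
- df: view disk usage
- dmesg: view kernel messages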
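When a fault occurs, it often helps to capture the output of several of these tools in one pass before changing anything. The following is a minimal sketch of that idea, not a definitive script: the helper names `run_if_available` and `collect_diagnostics` and the output file names are assumptions made for illustration, and tools that are not installed are skipped.

```shell
#!/bin/sh
# Sketch: save the output of common diagnostic tools into one directory.
# run_if_available and collect_diagnostics are hypothetical helper names.

# Run a command and save its output to a file; skip tools that are missing.
run_if_available() {
    outfile="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$outfile" 2>&1 || echo "warning: '$*' exited with an error" >&2
    else
        echo "skipped: $1 is not installed" >&2
    fi
}

# $1: container name, $2: directory that receives the output files.
collect_diagnostics() {
    container="$1"
    outdir="$2"
    mkdir -p "$outdir"
    run_if_available "$outdir/ps.txt"      podman ps -a
    run_if_available "$outdir/logs.txt"    podman logs --tail 200 "$container"
    run_if_available "$outdir/inspect.txt" podman inspect "$container"
    run_if_available "$outdir/stats.txt"   podman stats --no-stream "$container"
    run_if_available "$outdir/df.txt"      df -h
    run_if_available "$outdir/journal.txt" journalctl -n 200 --no-pager
}
```

Calling `collect_diagnostics fgedu-nginx /tmp/fgedu-diag` would leave one text file per tool in that directory, which can be attached to the incident record.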
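2.3 Fault Prevention
Measures for preventing Podman faults in production:
- Regular updates: update Podman and container images regularly to pick up fixes for known vulnerabilities
- Resource management: set sensible resource limits for containers to avoid resource shortages
- Network configuration: configure container networking carefully to avoid network problems
- Storage management: configure container storage carefully to avoid storage problems
- Security configuration: configure container security carefully to avoid security problems
- Monitoring and alerting: set up monitoring and alerts so problems are found early
- Backups: back up container data and configuration regularly to keep data safe
- Testing: test container functionality and performance regularly to keep the system stable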
Part03-Production Implementation
3.1 Container Troubleshooting
3.1.1 Container Troubleshooting Procedure
# Check container status
$ podman ps -a
# Output
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7890123456ab docker.io/library/nginx nginx -g daemon 2 minutes ago Exited (1) 1 minute ago fgedu-nginx
1234567890ab docker.io/library/mysql mysqld 2 minutes ago Up 2 minutes ago 0.0.0.0:3306->3306/tcp fgedu-mysql
# View the container logs
$ podman logs fgedu-nginx
# Output
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2026/04/10 10:00:00 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
2026/04/10 10:00:00 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
2026/04/10 10:00:00 [emerg] 1#1: unable to bind listening sockets, shutting down
2026/04/10 10:00:00 [emerg] 1#1: exiting
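# Check which process is using the port (-p shows the owning process and needs root)
$ sudo netstat -tulnp | grep ':80 '
# Output
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1234/httpd
# Stop the process that is using the port
$ sudo systemctl stop httpd
# Start the container again
$ podman start fgedu-nginx
# Check container status
$ podman ps
# Output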
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7890123456ab docker.io/library/nginx nginx -g daemon 2 minutes ago Up 10 seconds 0.0.0.0:80->80/tcp fgedu-nginx
1234567890ab docker.io/library/mysql mysqld 2 minutes ago Up 2 minutes ago 0.0.0.0:3306->3306/tcp fgedu-mysql
# Test the container
$ curl http://localhost
# Output
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.
Thank you for using nginx.
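A plain `grep 80` also matches port 8080 or a PID that happens to contain 80, so the port check above can be made stricter. The sketch below is one way to do that, assuming the listing comes from `ss -tln` or `netstat -tln`, both of which print the local address in the fourth column. The function name `port_in_use` is a hypothetical helper, not part of Podman.

```shell
#!/bin/sh
# Sketch: exact-match check for a listening TCP port.
# port_in_use is a hypothetical helper name.
# $1: port number; stdin: output of `ss -tln` or `netstat -tln`.
port_in_use() {
    awk -v port="$1" '
        {
            # The local address looks like "0.0.0.0:80" or "[::]:80";
            # the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

For example, `ss -tln | port_in_use 80 && echo "port 80 is already in use"` could be run before starting the container.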
3.2 Pod Troubleshooting
3.2.1 Pod Troubleshooting Procedure
# Create the Pod
$ podman pod create --name fgedu-pod -p 80:80 -p 3306:3306
# Run containers in the Pod
$ podman run -d --pod fgedu-pod --name fgedu-nginx docker.io/library/nginx
$ podman run -d --pod fgedu-pod --name fgedu-mysql \
-e MYSQL_ROOT_PASSWORD=fgedu123 \
docker.io/library/mysql:8.0
# Check Pod status
$ podman pod ps
# Output
POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS
1234567890ab fgedu-pod Running 2 minutes ago 7890123456ab 3
# List the containers in the Pod
$ podman ps --filter pod=fgedu-pod
# Output
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7890123456ab k8s.gcr.io/pause:3.5 /pause 2 minutes ago Up 2 minutes ago 0.0.0.0:80->80/tcp fgedu-pod-infra
1234567890ab docker.io/library/nginx nginx -g daemon 2 minutes ago Up 2 minutes ago 0.0.0.0:80->80/tcp fgedu-nginx
5678901234ab docker.io/library/mysql mysqld 2 minutes ago Up 2 minutes ago 0.0.0.0:3306->3306/tcp fgedu-mysql
# View the Pod logs
$ podman pod logs fgedu-pod
# Output
2026-04-10 10:00:00.000000000 +0800 CST m=+0.000000000 container create 7890123456ab (image=k8s.gcr.io/pause:3.5, name=fgedu-pod-infra)
2026-04-10 10:00:00.000000000 +0800 CST m=+0.000000000 container start 7890123456ab (image=k8s.gcr.io/pause:3.5, name=fgedu-pod-infra)
2026-04-10 10:00:00.000000000 +0800 CST m=+0.000000000 container create 1234567890ab (image=docker.io/library/nginx:latest, name=fgedu-nginx)
2026-04-10 10:00:00.000000000 +0800 CST m=+0.000000000 container start 1234567890ab (image=docker.io/library/nginx:latest, name=fgedu-nginx)
2026-04-10 10:00:00.000000000 +0800 CST m=+0.000000000 container create 5678901234ab (image=docker.io/library/mysql:8.0, name=fgedu-mysql)
2026-04-10 10:00:00.000000000 +0800 CST m=+0.000000000 container start 5678901234ab (image=docker.io/library/mysql:8.0, name=fgedu-mysql)
# Stop the Pod
$ podman pod stop fgedu-pod
# Start the Pod
$ podman pod start fgedu-pod
# Remove the Pod (-f stops and removes its running containers first)
$ podman pod rm -f fgedu-pod
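When a Pod is reported as degraded, the first question is usually which of its containers is not running. The sketch below filters a "name state" listing for that. It assumes the listing is produced with `podman ps -a --filter pod=fgedu-pod --format '{{.Names}} {{.State}}'` and that your Podman version prints the lowercase state name (such as `running` or `exited`) for `{{.State}}`. The function name `not_running` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the names of containers whose state is not "running".
# not_running is a hypothetical helper name.
# stdin: one "name state" pair per line.
not_running() {
    awk '$2 != "running" { print $1 }'
}

# Possible usage (requires Podman and an existing Pod):
#   podman ps -a --filter pod=fgedu-pod \
#       --format '{{.Names}} {{.State}}' | not_running
```

Each name it prints is a candidate for `podman logs` and `podman inspect`.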
3.3 Network Troubleshooting
3.3.1 Network Troubleshooting Procedure
# List networks
$ podman network ls
# Output
NETWORK ID NAME VERSION PLUGINS
80b199c0a230 bridge 0.4.0 bridge,portmap,firewall,tuning
5f72d2f3d4a5 host 0.4.0 host
bdf7c25c754a none 0.4.0 null
# Inspect the network
$ podman network inspect bridge
# Output
[
  {
    "name": "bridge",
    "id": "80b199c0a230",
    "driver": "bridge",
    "network_interface": "cni0",
    "created": "2026-04-10T10:00:00+08:00",
    "subnets": [
      {
        "subnet": "10.88.0.0/16",
        "gateway": "10.88.0.1"
      }
    ],
    "ipv6_enabled": false,
    "internal": false,
    "dns_enabled": true,
    "ipam_options": {
      "driver": "host-local"
    }
  }
]
# Run a container on the bridge network
$ podman run -d --name fgedu-nginx \
--network bridge \
-p 80:80 \
docker.io/library/nginx
# Test network connectivity
$ podman exec -it fgedu-nginx ping www.baidu.com
# Output
PING www.baidu.com (110.242.68.4) 56(84) bytes of data.
64 bytes from 110.242.68.4: icmp_seq=1 ttl=56 time=10.123 ms
64 bytes from 110.242.68.4: icmp_seq=2 ttl=56 time=10.145 ms
64 bytes from 110.242.68.4: icmp_seq=3 ttl=56 time=10.132 ms
# Check the firewall rules
$ sudo firewall-cmd --list-all
# Output
public (active)
target: default
icmp-block-inversion: no
interfaces: eth0
sources:
services: ssh dhcpv6-client
ports: 80/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
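# Restart the host network service (the unit name varies by distribution; newer RHEL-family releases use NetworkManager instead of network)
$ sudo systemctl restart network
# Restart the Podman service (Podman is daemonless, so this only affects the optional podman.service API service)
$ sudo systemctl restart podman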
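Part04-Production Cases and Hands-On Walkthroughs
4.1 Container Fails to Start
4.1.1 Container Start Failure in Practice
# Run a container that maps a host port which is already in use
$ podman run -d --name fgedu-nginx \
-p 80:80 \
docker.io/library/nginx
# Check container status
$ podman ps -a
# Output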
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7890123456ab docker.io/library/nginx nginx -g daemon 2 minutes ago Exited (1) 1 minute ago fgedu-nginx
# View the container logs
$ podman logs fgedu-nginx
# Output
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2026/04/10 10:00:00 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
2026/04/10 10:00:00 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
2026/04/10 10:00:00 [emerg] 1#1: unable to bind listening sockets, shutting down
2026/04/10 10:00:00 [emerg] 1#1: exiting
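# Check which process is using the port (-p shows the owning process and needs root)
$ sudo netstat -tulnp | grep ':80 '
# Output
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1234/httpd
# Stop the process that is using the port
$ sudo systemctl stop httpd
# Start the container again
$ podman start fgedu-nginx
# Check container status
$ podman ps
# Output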
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7890123456ab docker.io/library/nginx nginx -g daemon 2 minutes ago Up 10 seconds 0.0.0.0:80->80/tcp fgedu-nginx
# Test the container
$ curl http://localhost
# Output
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.
Thank you for using nginx.
4.2 Network Connectivity Problems
4.2.1 Network Connectivity Problems in Practice
# Run the container
$ podman run -d --name fgedu-nginx \
-p 80:80 \
docker.io/library/nginx
# Test network connectivity
$ podman exec -it fgedu-nginx ping www.baidu.com
# Output
ping: bad address 'www.baidu.com'
# Check the container's DNS configuration
$ podman exec -it fgedu-nginx cat /etc/resolv.conf
# Output
nameserver 8.8.8.8
nameserver 8.8.4.4
# Check network connectivity from the host
$ ping www.baidu.com
# Output
PING www.baidu.com (110.242.68.4) 56(84) bytes of data.
64 bytes from 110.242.68.4: icmp_seq=1 ttl=64 time=0.123 ms
64 bytes from 110.242.68.4: icmp_seq=2 ttl=64 time=0.145 ms
64 bytes from 110.242.68.4: icmp_seq=3 ttl=64 time=0.132 ms
# Restart the container
$ podman restart fgedu-nginx
# Test network connectivity
$ podman exec -it fgedu-nginx ping www.baidu.com
# Output
PING www.baidu.com (110.242.68.4) 56(84) bytes of data.
64 bytes from 110.242.68.4: icmp_seq=1 ttl=56 time=10.123 ms
64 bytes from 110.242.68.4: icmp_seq=2 ttl=56 time=10.145 ms
64 bytes from 110.242.68.4: icmp_seq=3 ttl=56 time=10.132 ms
# Test access to the container
$ curl http://localhost
# Output
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.
Thank you for using nginx.
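In the walkthrough above the host resolves names while the container does not, so comparing the nameservers each side uses is a natural next step. The following sketch extracts the nameserver addresses from resolv.conf-style text. The function name `list_nameservers` is a hypothetical helper introduced for illustration.

```shell
#!/bin/sh
# Sketch: print the nameserver addresses found in resolv.conf-style text.
# list_nameservers is a hypothetical helper name.
# stdin: contents of a resolv.conf file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Possible usage (requires Podman and the fgedu-nginx container):
#   podman exec fgedu-nginx cat /etc/resolv.conf | list_nameservers
#   list_nameservers < /etc/resolv.conf
```

If the two lists differ, or the container's list contains servers that are unreachable from your network, the `--dns` option of `podman run` lets you set the server explicitly.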
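4.3 Insufficient Resources
4.3.1 Insufficient Resources in Practice
# Run the container with a memory limit that is too small
$ podman run -d --name fgedu-mysql \
--memory 128m \
-e MYSQL_ROOT_PASSWORD=fgedu123 \
-p 3306:3306 \
docker.io/library/mysql:8.0
# Check container status
$ podman ps -a
# Output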
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7890123456ab docker.io/library/mysql mysqld 2 minutes ago Exited (1) 1 minute ago fgedu-mysql
# View the container logs
$ podman logs fgedu-mysql
# Output
2026-04-10T10:00:00.000000Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
2026-04-10T10:00:00.000000Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.32) starting as process 1
2026-04-10T10:00:00.000000Z 0 [Warning] [MY-010091] [Server] Can't create test file /var/lib/mysql/7890123456ab.lower-test
2026-04-10T10:00:00.000000Z 0 [Warning] [MY-010159] [Server] Setting lower_case_table_names=2 because file system for /var/lib/mysql/ is case insensitive
2026-04-10T10:00:00.000000Z 0 [ERROR] [MY-010187] [Server] Could not open file '/var/lib/mysql/ibdata1' for reading: No such file or directory
2026-04-10T10:00:00.000000Z 0 [ERROR] [MY-010338] [Server] Can't find error-message file '/usr/share/mysql-8.0/errmsg.sys'. Check error-message file location and 'lc-messages-dir' configuration directive.
2026-04-10T10:00:00.000000Z 0 [ERROR] [MY-010349] [Server] Initialization of the server's SHA256-based password plugin failed. Password encryption of the MySQL user password failed. Check if the password algorithm is supported by the server.
2026-04-10T10:00:00.000000Z 0 [ERROR] [MY-010119] [Server] Aborting
2026-04-10T10:00:00.000000Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.32) MySQL Community Server - GPL.
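# Remove the failed container so that its name can be reused
$ podman rm fgedu-mysql
# Run the container again with a reasonable memory limit
$ podman run -d --name fgedu-mysql \
--memory 2g \
-e MYSQL_ROOT_PASSWORD=fgedu123 \
-p 3306:3306 \
docker.io/library/mysql:8.0
# Check container status
$ podman ps
# Output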
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7890123456ab docker.io/library/mysql mysqld 2 minutes ago Up 2 minutes ago 0.0.0.0:3306->3306/tcp fgedu-mysql
# Test the container
$ podman exec -it fgedu-mysql mysql -u root -p
# Output
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.32 MySQL Community Server - GPL
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> exit
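The exit code shown by `podman ps -a` is a quick hint about where to look. `podman run` reserves 125, 126, and 127 for its own failures, and a code above 128 conventionally means the process was killed by signal number code minus 128, so 137 (signal 9, SIGKILL) is what a kernel OOM kill typically looks like. The sketch below turns a code into a short hint. The function name `explain_exit_code` and the wording of the hints are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: map a container exit code to a troubleshooting hint.
# explain_exit_code is a hypothetical helper name.
explain_exit_code() {
    code="$1"
    case "$code" in
        0)   echo "exited normally" ;;
        125) echo "podman itself failed" ;;
        126) echo "container command could not be invoked" ;;
        127) echo "container command not found" ;;
        *)
            if [ "$code" -gt 128 ]; then
                # Shell convention: 128 + signal number.
                echo "terminated by signal $((code - 128))"
            else
                echo "application error (exit code $code)"
            fi
            ;;
    esac
}

# Possible usage (requires Podman and the fgedu-mysql container):
#   podman inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' fgedu-mysql
```

For a suspected memory problem, the `OOMKilled` field printed by the commented `podman inspect` command is the more direct evidence; the exit code only narrows the search.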
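Part05-Lessons Learned from 风哥
5.1 Troubleshooting Best Practices
Best practices for Podman container troubleshooting:
- Fast response: respond to faults quickly to limit the business impact
- Root-cause analysis: find the underlying cause instead of treating only the symptoms
- Systematic investigation: work through the fault methodically so nothing is missed
- Documentation: record the troubleshooting process in detail for later analysis and learning
- Prevention: take measures so that similar faults do not recur
- Teamwork: work as a team on complex faults to resolve them more efficiently
- Continuous improvement: use each incident to improve systems and processes
- Training: raise the team's troubleshooting skills and shorten resolution time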
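5.2 Troubleshooting Checklist
Checklist for Podman container troubleshooting:
## Container fails to start
- [ ] Check the container logs: podman logs
- [ ] Check port usage: netstat -tuln | grep
- [ ] Check for image problems: podman inspect
- [ ] Check for configuration problems: podman inspect
## Container runs abnormally
- [ ] Check the container logs: podman logs
- [ ] Check container status: podman ps -a
- [ ] Check resource usage: podman stats
- [ ] Check network connectivity: podman exec
- [ ] Check for storage problems: podman exec
## Network connectivity problems
- [ ] Check the network configuration: podman network inspect
- [ ] Check network connectivity: podman exec
- [ ] Check the firewall rules: sudo firewall-cmd --list-all
- [ ] Check the DNS configuration: podman exec
- [ ] Check the network devices: ip addr
## Storage problems
- [ ] Check storage usage: df -h
- [ ] Check volume status: podman volume inspect
- [ ] Check file permissions: ls -la
- [ ] Check the storage configuration: cat /etc/containers/storage.conf
## Insufficient resources
- [ ] Check resource usage: podman stats
- [ ] Check system resources: free -h, top
- [ ] Adjust resource limits: podman run --memory
- [ ] Tune the application: adjust the application's configuration parameters
- [ ] Add hardware resources: add memory, CPU, or other resources to the host
## Image problems
- [ ] Check image status: podman images
- [ ] Pull the latest image: podman pull
- [ ] Clean up dangling images: podman rmi $(podman images -q -f dangling=true)
- [ ] Check image signature policy: podman image trust show
- [ ] Build a custom image: podman build -t
## Security problems
- [ ] Check container privileges: podman inspect
- [ ] Check SELinux status: sudo sestatus
- [ ] Check the firewall rules: sudo firewall-cmd --list-all
- [ ] Scan the image for vulnerabilities: trivy image
- [ ] Configure security policy: adjust the container's security settings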
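5.3 Troubleshooting Case Studies
Case studies in Podman container troubleshooting:
## Case 1: Container fails to start (port in use)
### Symptoms
- The container fails to start and the logs show that the port is already in use
- The container status is Exited (1)
### Cause
- The httpd service on the host is using port 80
- The container cannot bind to port 80
### Solution
- Stop the httpd service that is using the port: sudo systemctl stop httpd
- Start the container again: podman start
### Prevention
- Check port usage before starting a container
- Map a different host port to avoid the conflict
- Watch for port conflicts when the container uses host network mode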
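## Case 2: Network connectivity problem (DNS resolution failure)
### Symptoms
- The container cannot resolve domain names
- ping www.baidu.com fails inside the container
### Cause
- The container's DNS configuration is wrong
- The host's network connection works, but names cannot be resolved inside the container
### Solution
- Restart the container: podman restart
- Check the container's DNS configuration: podman exec
- Set the container's DNS server: podman run --dns 8.8.8.8
### Prevention
- Make sure the host's DNS configuration is correct
- Specify a DNS server when starting the container
- Monitor the container's network connectivity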
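## Case 3: Insufficient resources (out of memory)
### Symptoms
- The container fails to start and the logs point to insufficient memory
- The container status is Exited (1)
### Cause
- The memory limit set for the container is too small
- MySQL needs a fair amount of memory to start
### Solution
- Raise the container's memory limit: podman run --memory 2g
- Remove the failed container and run it again with the new limit: podman rm, then podman run --memory 2g
### Prevention
- Set memory limits according to what the application needs
- Monitor the container's memory usage
- Check system resource usage regularly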
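## Case 4: Storage problem (disk full)
### Symptoms
- The container runs abnormally and the logs show that disk space has run out
- The container cannot write data
### Cause
- The host is out of disk space
- The container cannot write data to its storage
### Solution
- Find what is using the host's disk space, then clean it up: sudo du -sh /*
- Remove unused containers, images, and other data: podman system prune -a
- Extend the host's disk space
### Prevention
- Clean up the host's disk space regularly
- Monitor the host's disk usage
- Set sensible storage limits for containers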
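Monitoring disk usage can start with a simple threshold check on `df` output. The sketch below assumes the POSIX output format of `df -P`, where the fifth column is the usage percentage and the sixth is the mount point. The function name `mounts_over` is a hypothetical helper, and mount points that contain spaces are not handled.

```shell
#!/bin/sh
# Sketch: print mount points whose usage is at or above a threshold.
# mounts_over is a hypothetical helper name.
# $1: threshold in percent; stdin: output of `df -P`.
mounts_over() {
    awk -v limit="$1" '
        NR > 1 {                       # skip the header line
            used = $5
            sub(/%$/, "", used)        # "95%" -> "95"
            if (used + 0 >= limit + 0) print $6, $5
        }
    '
}

# Possible usage:
#   df -P | mounts_over 90
```

`podman system df` complements this check by showing how much of the space is taken by images, containers, and volumes.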
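## Case 5: Image problem (image pull failure)
### Symptoms
- The container fails to start and the logs show that the image pull failed
- The container status is Created
### Cause
- A network problem prevents the image from being pulled
- The image registry is unavailable
### Solution
- Check network connectivity: ping docker.io
- Pull the image again: podman pull
- Use a local image: podman run
### Prevention
- Pull the latest images regularly and cache them
- Configure a registry mirror
- Set up a local image registry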
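Transient network problems are a common reason for a failed pull, so retrying with an increasing delay is often enough. The following is a minimal sketch of such a retry loop. The function name `retry` and the `RETRY_DELAY` variable (initial delay in seconds, default 2) are assumptions made for illustration, not Podman features.

```shell
#!/bin/sh
# Sketch: retry a command with a delay that doubles after each failure.
# retry and RETRY_DELAY are hypothetical names.
# $1: maximum number of attempts; remaining arguments: the command to run.
retry() {
    attempts="$1"; shift
    delay="${RETRY_DELAY:-2}"
    i=1
    while :; do
        "$@" && return 0
        [ "$i" -ge "$attempts" ] && return 1
        echo "attempt $i failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        i=$((i + 1))
    done
}

# Possible usage (requires Podman and network access):
#   retry 3 podman pull docker.io/library/nginx
```

If every attempt fails, the function returns a non-zero status, at which point checking the registry's availability or falling back to a local image is the next step.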
This article was compiled and published by 风哥教程 for study and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html
