Summary: This Fengge tutorial draws on the official documentation for Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman, and explains in detail how to configure and use the related technologies.
This document describes troubleshooting methods for enterprise services and solutions to common problems.
Part01-Web Service Troubleshooting
1.1 Apache Troubleshooting
$ sudo systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Fri 2026-04-04 01:00:00 CST; 10s ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 12380 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 12379 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
Main PID: 12379 (code=exited, status=1/FAILURE)
Status: "Total requests: 0; Idle/Busy workers 0/0; Requests/sec: 0; Bytes served/sec: 0 B/sec"
Apr 04 01:00:00 rhel10 systemd[1]: Starting The Apache HTTP Server…
Apr 04 01:00:00 rhel10 httpd[12379]: AH00526: Syntax error on line 123 of /etc/httpd/conf/httpd.conf:
Apr 04 01:00:00 rhel10 httpd[12379]: Invalid command 'ServerNmae', perhaps misspelled or defined by a module not included in the server configuration
Apr 04 01:00:00 rhel10 systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Apr 04 01:00:00 rhel10 systemd[1]: httpd.service: Failed with result 'exit-code'.
Apr 04 01:00:00 rhel10 systemd[1]: Failed to start The Apache HTTP Server.
# Check the configuration syntax
$ sudo httpd -t
AH00526: Syntax error on line 123 of /etc/httpd/conf/httpd.conf:
Invalid command 'ServerNmae', perhaps misspelled or defined by a module not included in the server configuration
# Fix the misspelled directive (ServerNmae -> ServerName)
$ sudo sed -i 's/ServerNmae/ServerName/' /etc/httpd/conf/httpd.conf
# Check the configuration again
$ sudo httpd -t
Syntax OK
# Restart the service
$ sudo systemctl restart httpd
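An AH00526 message already names the file and line number. As a minimal sketch (`ah00526_location` and `show_config_error` are hypothetical helpers, not Apache tools), these functions extract both values and print the offending line so the typo can be confirmed before editing. They assume the config path contains no spaces:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: locate the config line named in an AH00526 error.

# Read httpd output on stdin and print "<file> <line>" for the first AH00526 message.
ah00526_location() {
  sed -n 's/.*AH00526: Syntax error on line \([0-9][0-9]*\) of \(.*\):$/\2 \1/p' | head -n 1
}

# Read httpd output on stdin and print the referenced line number and its content.
show_config_error() {
  local loc file line
  loc=$(ah00526_location)
  if [ -z "$loc" ]; then
    echo "no AH00526 message found" >&2
    return 1
  fi
  read -r file line <<< "$loc"
  sed -n "${line}{=;p;}" "$file"
}
```

Usage: `sudo httpd -t 2>&1 | show_config_error`. The `2>&1` matters because `httpd -t` reports on stderr.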
# View the error log
$ sudo tail -f /var/log/httpd/error_log
[Fri Apr 04 01:00:00.123456 2026] [core:error] [pid 12379] AH00526: Syntax error on line 123 of /etc/httpd/conf/httpd.conf:
Invalid command 'ServerNmae', perhaps misspelled or defined by a module not included in the server configuration
# View the access log
$ sudo tail -f /var/log/httpd/access_log
192.168.1.10 - - [04/Apr/2026:01:00:00 +0800] "GET / HTTP/1.1" 200 1234 "-" "Mozilla/5.0"
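Once the service is running again, tallying responses by status code is a quick way to read the access log. A minimal sketch, assuming the default common/combined log format shown above, where the status code is the ninth whitespace-separated field (`status_summary` is a hypothetical name; malformed request lines containing extra spaces would shift the fields):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count requests per HTTP status code in an Apache access log.
# Assumes the common/combined LogFormat, where the status is field 9.
# Reads the files given as arguments, or stdin when there are none.
status_summary() {
  awk '$9 ~ /^[0-9][0-9][0-9]$/ { count[$9]++ }
       END { for (code in count) print code, count[code] }' "$@" | sort
}
```

Usage: `sudo cat /var/log/httpd/access_log | status_summary`. A cluster of 403 responses often points to the permission and SELinux problems below.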
# Common problems
1. Port already in use
$ sudo ss -tulpn | grep :80
tcp LISTEN 0 128 *:80 *:* users:(("nginx",pid=12345,fd=6))
$ sudo systemctl stop nginx
$ sudo systemctl start httpd
2. Permission problems
$ ls -Z /var/www/html/
unconfined_u:object_r:httpd_sys_content_t:s0 index.html
$ sudo restorecon -Rv /var/www/html/
3. SELinux denials
$ sudo ausearch -m AVC -ts recent
----
time->Fri Apr 4 01:00:00 2026
type=AVC msg=audit(1234567890.123:123): avc: denied { read } for pid=12379 comm="httpd" name="index.html" dev="dm-0" ino=12345 scontext=system_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=0
# chcon relabels immediately but is not persistent: a file system relabel or restorecon resets it.
# restorecon -Rv /var/www/html/ (item 2) applies the policy default httpd_sys_content_t label, which a relabel keeps.
$ sudo chcon -R -t httpd_sys_content_t /var/www/html/
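`audit2why` (policycoreutils-python-utils) and `sealert` (setroubleshoot-server) are the usual tools for explaining AVC records. For a quick look at the fields discussed above, here is a minimal sed sketch (`avc_summary` is a hypothetical name) that prints the process, permission, file name, target type, and class of each denial. It assumes records with a `name=` field as in the example; denials with other field layouts (for example on ports) are skipped:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize SELinux AVC denials read on stdin.
# Prints: <comm> <permission(s)> <name> <target type> <class>
avc_summary() {
  sed -n 's/.*denied *{ *\([^}]*[^ }]\) *}.*comm="\([^"]*\)".*name="\([^"]*\)".*tcontext=[^:]*:[^:]*:\([^:]*\):.*tclass=\([^ ]*\).*/\2 \1 \3 \4 \5/p'
}
```

Usage: `sudo ausearch -m AVC -ts recent | avc_summary`. A target type such as `user_home_t` typically means the file was moved (not copied) from a home directory and kept its old label, which restorecon corrects.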
Part02-DNS Service Troubleshooting
2.1 BIND Troubleshooting
$ sudo systemctl status named
● named.service - Berkeley Internet Name Domain (DNS)
Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Fri 2026-04-04 01:05:00 CST; 10s ago
Process: 12381 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z "$NAMEDCONF"; else echo "Checking of zone files is disabled"; fi (code=exited, status=1/FAILURE)
Apr 04 01:05:00 rhel10 bash[12381]: zone fgedu.net.cn/IN: NS 'ns1.fgedu.net.cn' has no address records (A or AAAA)
Apr 04 01:05:00 rhel10 bash[12381]: zone fgedu.net.cn/IN: not loaded due to errors.
Apr 04 01:05:00 rhel10 systemd[1]: named.service: Control process exited, code=exited, status=1/FAILURE
Apr 04 01:05:00 rhel10 systemd[1]: named.service: Failed with result 'exit-code'.
Apr 04 01:05:00 rhel10 systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).
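# Check the configuration file
$ sudo named-checkconf
/etc/named.conf: OK
# Check the zone file
$ sudo named-checkzone fgedu.net.cn /var/named/fgedu.net.cn.zone
zone fgedu.net.cn/IN: NS 'ns1.fgedu.net.cn' has no address records (A or AAAA)
zone fgedu.net.cn/IN: not loaded due to errors.
# Fix the zone file: add an A record for ns1, the name server listed in the NS record.
# Quote 'EOF' so the shell does not expand $TTL inside the here-document.
$ sudo tee /var/named/fgedu.net.cn.zone << 'EOF'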
$TTL 86400
@ IN SOA ns1.fgedu.net.cn. admin.fgedu.net.cn. (
2026040401 ; Serial
3600 ; Refresh
1800 ; Retry
604800 ; Expire
86400 ; Minimum TTL
)
@ IN NS ns1.fgedu.net.cn.
@ IN A 192.168.1.100
ns1 IN A 192.168.1.100
www IN A 192.168.1.100
EOF
# Check the zone file again
$ sudo named-checkzone fgedu.net.cn /var/named/fgedu.net.cn.zone
zone fgedu.net.cn/IN: loaded serial 2026040401
OK
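Each time the zone file is edited, the SOA serial must increase, because secondary servers compare it with their copy to decide whether to transfer the zone. A small sketch for the YYYYMMDDnn convention used above (`next_serial` is a hypothetical helper; starting a new day at nn=00 is an arbitrary choice):

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the next SOA serial in YYYYMMDDnn form.
# Usage: next_serial CURRENT [TODAY as YYYYMMDD, default: today's date]
next_serial() {
  local current=$1
  local today=${2:-$(date +%Y%m%d)}
  local base=$(( 10#$today * 100 ))   # today's first serial, nn = 00
  if (( current >= base )); then
    echo $(( current + 1 ))           # already edited today (or ahead): just increment
  else
    echo "$base"                      # first edit today
  fi
}
```

Usage: `next_serial 2026040401 20260404` prints `2026040402`. After editing a single zone, `sudo rndc reload fgedu.net.cn` reloads it without restarting named.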
# Restart the service
$ sudo systemctl restart named
# Test DNS resolution
$ dig @192.168.1.100 www.fgedu.net.cn
; <<>> DiG 9.16.23-RH <<>> @192.168.1.100 www.fgedu.net.cn
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.fgedu.net.cn. IN A
;; ANSWER SECTION:
www.fgedu.net.cn. 86400 IN A 192.168.1.100
;; Query time: 0 msec
;; SERVER: 192.168.1.100#53(192.168.1.100)
;; WHEN: Fri Apr 04 01:05:30 CST 2026
;; MSG SIZE rcvd: 60
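In a script, the two things worth extracting from this output are the response status in the header and the records in the ANSWER SECTION (`dig +short` prints only the answer data). A minimal awk sketch (`dig_summary` is a hypothetical helper) that reports both from full `dig` output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize full `dig` output read on stdin.
# Prints "status=<RCODE>" and one "answer=<name> <type> <data>" line per answer record.
dig_summary() {
  awk '
    /->>HEADER<<-/ { line = $0; sub(/.*status: /, "", line); sub(/,.*/, "", line); print "status=" line }
    /^;; ANSWER SECTION:/ { in_answer = 1; next }
    in_answer && (/^$/ || /^;/) { in_answer = 0 }
    in_answer { print "answer=" $1, $4, $5 }
  '
}
```

Usage: `dig @192.168.1.100 www.fgedu.net.cn | dig_summary`. Anything other than `status=NOERROR`, or NOERROR with no `answer=` lines, means the record is not being served as expected.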
# View the log (journalctl -u named shows the same messages from the systemd journal)
$ sudo tail -f /var/log/messages | grep named
Apr 4 01:05:00 rhel10 named[12381]: zone fgedu.net.cn/IN: loaded serial 2026040401
Apr 4 01:05:00 rhel10 named[12381]: all zones loaded
Apr 4 01:05:00 rhel10 named[12381]: running
Part03-Database Troubleshooting
3.1 MySQL Troubleshooting
$ sudo systemctl status mysqld
● mysqld.service - MySQL 8.0 database server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Fri 2026-04-04 01:10:00 CST; 10s ago
Process: 12382 ExecStartPre=/usr/libexec/mysql-check-socket (code=exited, status=0/SUCCESS)
Process: 12383 ExecStartPre=/usr/libexec/mysql-prepare-db-dir mysqld.service (code=exited, status=0/SUCCESS)
Process: 12384 ExecStart=/usr/libexec/mysqld --basedir=/usr (code=exited, status=1/FAILURE)
Main PID: 12384 (code=exited, status=1/FAILURE)
Status: “Server shutdown complete”
Tasks: 0 (limit: 49152)
Memory: 0B
CPU: 500ms
Apr 04 01:10:00 rhel10 systemd[1]: Starting MySQL 8.0 database server…
Apr 04 01:10:00 rhel10 mysqld[12384]: 2026-04-03T17:10:00.123456Z 0 [ERROR] [MY-012960] [InnoDB] Cannot open datafile './ibdata1'
Apr 04 01:10:00 rhel10 mysqld[12384]: 2026-04-03T17:10:00.123456Z 0 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Cannot open a file.
Apr 04 01:10:00 rhel10 mysqld[12384]: 2026-04-03T17:10:00.123456Z 0 [ERROR] [MY-010334] [Server] Failed to initialize DD Storage Engine
Apr 04 01:10:00 rhel10 mysqld[12384]: 2026-04-03T17:10:00.123456Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
Apr 04 01:10:00 rhel10 mysqld[12384]: 2026-04-03T17:10:00.123456Z 0 [ERROR] [MY-010119] [Server] Aborting
Apr 04 01:10:00 rhel10 systemd[1]: mysqld.service: Main process exited, code=exited, status=1/FAILURE
Apr 04 01:10:00 rhel10 systemd[1]: mysqld.service: Failed with result 'exit-code'.
Apr 04 01:10:00 rhel10 systemd[1]: Failed to start MySQL 8.0 database server.
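# Check the data directory permissions and contents
$ ls -ld /var/lib/mysql
drwxr-xr-x. 5 mysql mysql 4096 Apr 4 01:10 /var/lib/mysql
$ ls -l /var/lib/mysql/ibdata1
ls: cannot access '/var/lib/mysql/ibdata1': No such file or directory
# ibdata1 is missing. If the data matters, restore the data directory from backup.
# --initialize does not recover data: it creates a new, empty instance, and it aborts
# if the data directory exists and is not empty, so move the existing contents aside first.
# On RHEL the server binary is /usr/libexec/mysqld, as the ExecStart line above shows.
$ sudo /usr/libexec/mysqld --initialize --user=mysql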
# Check disk space
$ df -h /var/lib/mysql
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 50G 45G 5.0G 90% /
# View the error log
$ sudo tail -f /var/log/mysqld.log
2026-04-03T17:10:00.123456Z 0 [ERROR] [MY-012960] [InnoDB] Cannot open datafile './ibdata1'
2026-04-03T17:10:00.123456Z 0 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Cannot open a file.
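In a failed start like this one, the later messages (`Failed to initialize DD Storage Engine`, `Data Dictionary initialization failed`, `Aborting`) follow from the first, so the first `[ERROR]` line is usually where to start. A minimal sketch (`first_mysql_error` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first [ERROR] line of a MySQL error log read on stdin,
# followed by the total number of [ERROR] lines. Returns 1 if there are none.
first_mysql_error() {
  awk '/\[ERROR\]/ { n++; if (n == 1) first = $0 }
       END { if (n == 0) exit 1; print first; print n " error line(s) in total" }'
}
```

Usage: `sudo tail -n 200 /var/log/mysqld.log | first_mysql_error`. Limiting the input to the recent tail keeps errors from earlier startups out of the result.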
# Common problems
1. Cannot connect to the database
$ mysql -u root -p
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
$ sudo systemctl start mysqld
2. Access denied
$ mysql -u root -p
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
$ sudo grep 'temporary password' /var/log/mysqld.log
2026-04-03T17:10:00.123456Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: Abc123!@#
3. Replication errors (MySQL 8.0.22 and later also accept SHOW REPLICA STATUS, STOP REPLICA and START REPLICA)
$ mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep Last_Error
Last_Error: Error 'Duplicate entry '123' for key 'PRIMARY'' on query.
# Skip the one failing event. Caution: skipping can leave the replica inconsistent with the source,
# and sql_slave_skip_counter cannot be used when GTID-based replication is enabled.
$ mysql -u root -p -e "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"
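Before and after skipping an event, check both replication threads, not only `Last_Error`. A minimal awk sketch (`replication_health` is a hypothetical helper) that reads `\G` status output, accepts both the older `Slave_*` and the newer `Replica_*` field names, and fails unless both threads report `Yes`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SHOW SLAVE STATUS\G / SHOW REPLICA STATUS\G output on stdin.
# Prints the IO/SQL thread states; on failure also prints Last_Error and returns 1.
replication_health() {
  awk '
    { value = $0; sub(/^[^:]*: */, "", value) }   # text after the first "field:" separator
    /_IO_Running:/  { io = value }
    /_SQL_Running:/ { sql = value }
    /Last_Error:/   { err = value }
    END {
      printf "io=%s sql=%s\n", io, sql
      if (io != "Yes" || sql != "Yes") {
        if (err != "") print "last_error=" err
        exit 1
      }
    }'
}
```

Usage: `mysql -u root -p -e "SHOW SLAVE STATUS\G" | replication_health`.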
Part04-Network Service Troubleshooting
4.1 Network Connectivity Troubleshooting
$ ip addr show
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens33:
link/ether 00:0c:29:12:34:56 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.100/24 brd 192.168.1.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
# Check the routing table
$ ip route show
default via 192.168.1.1 dev ens33 proto static metric 100
192.168.1.0/24 dev ens33 proto kernel scope link src 192.168.1.100 metric 100
# Test connectivity to the default gateway
$ ping -c 4 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.123 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.234 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.345 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=0.456 ms
--- 192.168.1.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.123/0.289/0.456/0.123 ms
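The route and ping checks can be chained: read the default gateway from the routing table, ping it, and extract the loss figure. A minimal sketch with hypothetical helpers `default_gateway`, `packet_loss`, and `check_gateway`, parsing the output formats shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the route and ping checks.

# Read `ip route show` output on stdin; print the first default gateway address.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Read ping output on stdin; print the packet-loss percentage (without the % sign).
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Needs a live network: ping the default gateway and report the loss.
check_gateway() {
  local gw
  gw=$(ip route show | default_gateway)
  [ -n "$gw" ] || { echo "no default route" >&2; return 1; }
  echo "gateway $gw loss $(ping -c 4 "$gw" | packet_loss)%"
}
```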
# Check DNS resolution
$ nslookup www.fgedu.net.cn
Server: 192.168.1.100
Address: 192.168.1.100#53
Name: www.fgedu.net.cn
Address: 192.168.1.100
# Check listening ports
$ sudo ss -tulpn | grep :80
tcp LISTEN 0 128 *:80 *:* users:(("httpd",pid=12379,fd=4))
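`grep :80` also matches ports such as 8080 or 8000. A minimal awk sketch (`port_owner` is a hypothetical helper) that compares the end of the local address with the port exactly and prints the first process name and PID from the `users:` column:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -tulpn` output on stdin and print "<process> <pid>"
# for sockets whose local address ends in :PORT (only the first process listed per socket).
port_owner() {
  awk -v port=":$1" '
    substr($5, length($5) - length(port) + 1) == port && match($0, /users:\(\("[^"]*",pid=[0-9]+/) {
      s = substr($0, RSTART + 9, RLENGTH - 9)   # skip the leading users:((" (9 characters)
      sub(/",pid=/, " ", s)
      print s
    }'
}
```

Usage: `sudo ss -tulpn | port_owner 80`.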
# Check the firewall rules
$ sudo firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: ens33
sources:
services: cockpit dhcpv6-client http ssh
ports:
protocols:
forward: no
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
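To check a single service, `firewall-cmd --query-service=http` answers `yes` or `no` directly, and `firewall-cmd --permanent --add-service=http` followed by `firewall-cmd --reload` opens it. When the `--list-all` output has already been captured, a minimal awk sketch (`fw_has_service` is a hypothetical helper) can test it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `firewall-cmd --list-all` output on stdin and
# return 0 if SERVICE appears on the "services:" line, 1 otherwise.
fw_has_service() {
  awk -v svc="$1" '$1 == "services:" { for (i = 2; i <= NF; i++) if ($i == svc) found = 1 }
                   END { exit (found ? 0 : 1) }'
}
```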
# Test port connectivity (after the Host header, send an empty line to complete the HTTP request)
$ telnet 192.168.1.100 80
Trying 192.168.1.100...
Connected to 192.168.1.100.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.fgedu.net.cn
HTTP/1.1 200 OK
Date: Fri, 04 Apr 2026 01:15:00 GMT
Server: Apache/2.4.53 (Red Hat Enterprise Linux)
Last-Modified: Fri, 04 Apr 2026 01:00:00 GMT
Content-Length: 1234
Content-Type: text/html; charset=UTF-8
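`telnet` is often absent on minimal installations. As a sketch, assuming bash was built with its `/dev/tcp` redirection support and coreutils `timeout` is available, `port_open` (a hypothetical helper) tests whether a TCP connection can be opened. For HTTP specifically, `curl -sS -o /dev/null -w '%{http_code}\n' http://192.168.1.100/` prints just the status code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within SECONDS (default 3).
# Uses bash's /dev/tcp pseudo-device in a child bash so the descriptor is closed on exit.
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage: `port_open 192.168.1.100 80 && echo open || echo "closed or filtered"`. A refused connection fails at once; a timeout (exit status 124 from `timeout`) usually means packets are being dropped on the way, for example by a firewall.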
# Capture and analyze packets
$ sudo tcpdump -i ens33 -nn port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), snapshot length 262144 bytes
01:15:00.123456 IP 192.168.1.10.54321 > 192.168.1.100.80: Flags [S], seq 1234567890, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 1234567890 ecr 0,sackOK,eol], length 0
01:15:00.123456 IP 192.168.1.100.80 > 192.168.1.10.54321: Flags [S.], seq 1234567891, ack 1234567891, win 65535, options [mss 1460,nop,nop,TS val 1234567891 ecr 1234567890,nop,wscale 6,sackOK,eol], length 0
01:15:00.123456 IP 192.168.1.10.54321 > 192.168.1.100.80: Flags [.], ack 1, win 65535, options [nop,nop,TS val 1234567891 ecr 1234567891], length 0
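The three packets above are a complete TCP handshake: `[S]` is a SYN, `[S.]` a SYN-ACK, and `[.]` the final ACK; `[R]` or `[R.]` would be a reset. On a busy interface, counting these is quicker than reading each line. Many SYNs without SYN-ACKs suggest the packets are dropped on the way, while SYNs answered by resets usually mean nothing is listening. A minimal awk sketch (`handshake_counts` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SYN, SYN-ACK and RST packets in `tcpdump -nn` text read on stdin.
handshake_counts() {
  awk '/Flags \[S\]/   { syn++ }
       /Flags \[S\.\]/ { synack++ }
       /Flags \[R\.?\]/ { rst++ }
       END { printf "syn=%d synack=%d rst=%d\n", syn, synack, rst }'
}
```

Usage: `sudo tcpdump -i ens33 -nn -c 200 port 80 | handshake_counts`, where `-c 200` stops the capture after 200 packets.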
Part05-System Resource Troubleshooting
5.1 Resource Exhaustion Troubleshooting
$ top
top - 01:20:00 up 1 day, 2:30, 2 users, load average: 4.50, 3.20, 2.10
Tasks: 150 total, 2 running, 148 sleeping, 0 stopped, 0 zombie
%Cpu(s): 80.0 us, 10.0 sy, 0.0 ni, 5.0 id, 0.0 wa, 0.0 hi, 5.0 si, 0.0 st
MiB Mem : 8000.0 total, 500.0 free, 7000.0 used, 500.0 buff/cache
MiB Swap: 2000.0 total, 1500.0 free, 500.0 used. 500.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12379 apache 20 0 500000 100000 50000 R 80.0 1.2 0:10.00 httpd
12380 mysql 20 0 2000000 500000 100000 S 20.0 6.2 0:05.00 mysqld
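A load average is easiest to judge relative to the CPU count. On Linux it also counts tasks in uninterruptible sleep (usually waiting on I/O), so a high load with idle CPUs points toward storage. A minimal sketch (`load_per_cpu` is a hypothetical helper) that divides the 1-minute figure by the number of CPUs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: LINE is any text containing "load average: a, b, c"
# (from uptime or top); CPUS is the CPU count. Prints the 1-minute load per CPU.
load_per_cpu() {
  printf '%s\n' "$1" | awk -v cpus="$2" '{
    sub(/.*load average: /, "")
    split($0, load, /, */)
    printf "%.2f\n", load[1] / cpus
  }'
}
```

Usage: `load_per_cpu "$(uptime)" "$(nproc)"`. For the output above (4.50 on the 4 CPUs that iostat reports) the result is a little over 1, so there are more runnable or blocked tasks than CPUs.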
# Check memory usage
$ free -h
total used free shared buff/cache available
Mem: 8.0Gi 7.0Gi 0.5Gi 0.5Gi 0.5Gi 0.5Gi
Swap: 2.0Gi 0.5Gi 1.5Gi
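To judge memory pressure, read the `available` column rather than `free`: `available` estimates how much memory new workloads can use without swapping, including reclaimable cache. A minimal sketch (`mem_available_pct` is a hypothetical helper) that reads `free -m` output, since the mixed unit suffixes of `free -h` are awkward to compute with:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `free -m` (or `free -k`) output on stdin and print
# available memory as a percentage of total. Column 7 of the Mem: line is "available".
mem_available_pct() {
  awk '$1 == "Mem:" { printf "%.1f\n", $7 * 100 / $2 }'
}
```

Usage: `free -m | mem_available_pct`. The output above (0.5Gi of 8.0Gi available) works out to about 6%.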
# Check disk usage
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 50G 45G 5.0G 90% /
devtmpfs 4.0G 0 4.0G 0% /dev
tmpfs 4.0G 0 4.0G 0% /dev/shm
tmpfs 4.0G 8.0M 4.0G 1% /run
tmpfs 4.0G 0 4.0G 0% /sys/fs/cgroup
/dev/sda1 1014M 150M 865M 15% /boot
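To flag nearly full file systems in a script, `df -P` is safer to parse than `df -h` because the POSIX format keeps each file system on one line. A minimal sketch (`full_filesystems` is a hypothetical helper; the 85% default is an arbitrary example threshold):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -P` output on stdin and print "<mount point> <use%>"
# for file systems at or above THRESHOLD percent (default 85).
full_filesystems() {
  awk -v limit="${1:-85}" 'NR > 1 {
    use = $5; sub(/%/, "", use)
    if (use + 0 >= limit + 0) print $6, $5
  }'
}
```

Usage: `df -P | full_filesystems 85`. For the output above it reports `/ 90%`, the same file system that holds /var/lib/mysql in Part03.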
# Check disk I/O
$ iostat -x 1 5
Linux 5.14.0-284.11.1.el9_2.x86_64 (rhel10) 04/04/2026 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
80.00 0.00 10.00 5.00 0.00 5.00
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 0.00 50.00 0.00 0.50 0.00 0.00 0.00 0.00 0.00 10.00 0.50 1024.00 1024.00 10.00 50.00
dm-0 0.00 50.00 0.00 0.50 0.00 0.00 0.00 0.00 0.00 10.00 0.50 1024.00 1024.00 10.00 50.00
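`%util` near 100% means the device was busy for almost the whole interval. For devices that serve requests in parallel, such as RAID arrays and SSDs, it overstates saturation, so read it together with `r_await`/`w_await`. The first `iostat` report covers the time since boot; `-y` omits it. A minimal sketch (`busy_devices` is a hypothetical helper) that locates the `%util` column from the header, since column sets differ between sysstat versions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `iostat -x` output on stdin and print "<device> <%util>"
# for devices at or above THRESHOLD (default 80). The %util column is found from
# each "Device" header line; a blank line ends a device block.
busy_devices() {
  awk -v limit="${1:-80}" '
    NF == 0 { col = 0; next }
    $1 == "Device" || $1 == "Device:" { col = 0; for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
  '
}
```

Usage: `iostat -x -y 1 5 | busy_devices 80`.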
# Check process status (sorted by CPU usage)
$ ps aux --sort=-%cpu | head -10
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
apache 12379 80.0 1.2 500000 100000 ? R 01:20 0:10 /usr/sbin/httpd
mysql 12380 20.0 6.2 2000000 500000 ? S 01:20 0:05 /usr/libexec/mysqld
root 12381 1.0 0.5 123456 78901 ? Ss 01:20 0:01 /usr/sbin/sshd
# Check error-priority messages in the system journal
$ sudo journalctl -p err --since "1 hour ago"
-- Logs begin at Thu 2026-04-03 00:00:00 CST, end at Fri 2026-04-04 01:20:00 CST. --
Apr 04 01:20:00 rhel10 kernel: Out of memory: Killed process 12379 (httpd) total-vm:500000kB, anon-rss:100000kB, file-rss:0kB, shmem-rss:0kB
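The kernel logs one `Killed process` line per OOM kill. A minimal sed sketch (`oom_kills` is a hypothetical helper) that lists the PID, command, and anonymous resident memory of each victim, so repeated kills of the same service stand out:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print "<pid> <command> <anon-rss>"
# for every "Killed process" (OOM killer) line.
oom_kills() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*kB\).*/\1 \2 \3/p'
}
```

Usage: `sudo journalctl -k --since "1 hour ago" | oom_kills`, where `-k` limits the journal to kernel messages.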
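# Find what is using the space (-x keeps du on the root file system and skips /proc, /sys and other mounts)
$ sudo du -xh / --max-depth=1 | sort -hr | head -10
45G /
20G /var
15G /usr
5G /home
# -a includes files as well as directories
$ sudo du -ah /var/log --max-depth=1 | sort -hr | head -10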
3G /var/log
2G /var/log/messages
500M /var/log/secure
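`du` shows which directories are large; to find the individual files, GNU `find` can filter by size. A minimal sketch (`large_files` is a hypothetical helper; the 500M default is an arbitrary example) that stays on one file system and lists the biggest files first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files under DIR larger than SIZE (find -size syntax,
# default 500M), staying on DIR's file system (-xdev), largest first.
large_files() {
  find "$1" -xdev -type f -size +"${2:-500M}" -printf '%s\t%p\n' 2>/dev/null |
    sort -rn | cut -f2-
}
```

Usage: run it from a root shell as `large_files /var 500M`, or use the `find` command directly with sudo.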
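# Clean up systemd journal files older than 7 days (this does not shrink rsyslog files such as /var/log/messages, which logrotate manages)
$ sudo journalctl --vacuum-time=7d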
Deleted archived journal /var/log/journal/1234567890abcdef1234567890abcdef/system@12345-1234567890.journal (100.0M).
Vacuuming done, freed 100M of archived journals on disk.
# Clean the DNF package cache
$ sudo dnf clean all
50 files removed
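General troubleshooting workflow:
1. Check the service status and logs
2. Validate configuration file syntax
3. Check network connectivity
4. Monitor system resource usage
5. Analyze the system and audit logs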
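This article was compiled and published by Fengge Tutorials for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html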
