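This article from 风哥 covers systemd boot analysis: what systemd-analyze is, why boot analysis is needed, the analysis types systemd-analyze offers, planning boot analysis for production, best practices, security recommendations, basic and advanced systemd-analyze usage, hands-on boot optimization, worked cases for boot-time and service-startup analysis, and troubleshooting. It follows the official Red Hat Enterprise Linux 10 documentation and is intended for Linux operations staff to use for learning and testing; verify everything yourself before applying it in production.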
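Part01-Basic Concepts and Theory
1.1 What systemd-analyze is
systemd-analyze is the boot analysis tool that ships with systemd. It reports system boot time, service startup order, and service dependencies, which helps administrators understand the boot process, find bottlenecks, and shorten boot time.
- Analyze system boot time
- Analyze service startup order
- Analyze service dependencies
- Generate boot charts
- Verify unit files
1.2 Why boot analysis is needed
Reasons to analyze the boot process:
- Performance optimization: find boot bottlenecks and shorten boot time
- Troubleshooting: diagnose boot failures
- Dependency analysis: understand how services depend on each other
- Capacity planning: assess how quickly a system returns to service after a reboot
- Monitoring and alerting: track boot time
1.3 Analysis types in systemd-analyze
The main systemd-analyze subcommands:
- time: overall system boot time
- blame: startup time of each service
- critical-chain: the time-critical chain of units
- plot: an SVG boot chart
- verify: unit file validation
- security: security settings of a service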
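Part02-Production Planning and Recommendations
2.1 Planning boot analysis for production
Key points when planning boot analysis for production:
- Run boot analysis regularly
- Record a boot-time baseline
- Monitor changes in boot time
- Optimize boot time
- Document each optimization
# Notes on boot analysis
- Analyze while the system is idle
- Measure several boots and average the results
- Record the results
- Compare results across runs
- Draw up an optimization plan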
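The advice to measure several boots and average them can be sketched as a small shell helper. This is only an illustration: the function name `avg_boot_total` is hypothetical, and it assumes each input line has the "Startup finished in ... = <total>" shape that systemd-analyze prints, with totals written in `min`, `s`, or `ms` units.

```shell
#!/bin/sh
# avg_boot_total (hypothetical helper): read one "Startup finished in ..." line
# per measured boot on stdin and print the mean total boot time in seconds.
avg_boot_total() {
    awk '
        /Startup finished in/ {
            # Keep only the text after the "=", for example "1min 2.257s"
            total = $0
            sub(/.*= */, "", total)
            secs = 0
            n = split(total, parts, " ")
            for (i = 1; i <= n; i++) {
                if (parts[i] ~ /min$/)        secs += 60 * (parts[i] + 0)
                else if (parts[i] ~ /ms$/)    secs += (parts[i] + 0) / 1000
                else if (parts[i] ~ /s\.?$/)  secs += parts[i] + 0
            }
            sum += secs
            count++
        }
        END {
            if (count == 0) exit 1      # no usable line was seen
            printf "%.3f\n", sum / count
        }
    '
}
```

One way to use it is to append the first line of `systemd-analyze` to a log file after every reboot and later run `avg_boot_total < boot-times.log` (the log file name is an assumption).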
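2.2 Boot analysis best practices
Boot analysis best practices:
- Analyze regularly: run boot analysis on a schedule
- Record a baseline: keep a reference boot time for each system
- Monitor changes: watch for boot-time drift against the baseline
- Optimize boot: shorten system boot time where it matters
- Document measures: write down every optimization applied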
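Comparing a new measurement against the recorded baseline can be sketched as below. The function name `boot_regressed` and the default tolerance of 10 percent are assumptions made for illustration, not values from the original text.

```shell
#!/bin/sh
# boot_regressed BASELINE CURRENT [TOLERANCE_PCT] (hypothetical helper)
# Exit status 0 when CURRENT exceeds BASELINE by more than TOLERANCE_PCT
# percent (default 10), non-zero otherwise. All times are in seconds.
boot_regressed() {
    awk -v base="$1" -v cur="$2" -v tol="${3:-10}" '
        BEGIN { exit !(cur > base * (1 + tol / 100)) }
    '
}
```

For example, `boot_regressed 22.368 52.257 && echo "boot time regressed"` prints the warning, because 52.257s is well beyond a 10 percent margin over the 22.368s baseline.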
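2.3 Security recommendations for boot analysis
Security recommendations for boot analysis:
- Access control: restrict who may run the analysis tools
- Log protection: protect the analysis logs
- Audit trail: record analysis operations
- Data security: protect the analysis data
- Compliance: meet audit and compliance requirements
Part03-Production Implementation
3.1 systemd-analyze basic operations in detail
3.1.1 Analyzing system boot time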
# systemd-analyze
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 15.789s (userspace) = 22.368s
# Show system boot time in detail
# systemd-analyze time
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 15.789s (userspace) = 22.368s
graphical.target reached after 15.234s in userspace
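# systemd-analyze time reports on the current boot only and does not take a --boot option.
# For the previous boot, read the summary that systemd (PID 1) logged to the journal (requires a persistent journal)
# journalctl -b -1 _PID=1 -o cat | grep "Startup finished"
Startup finished in 4.234s (kernel) + 2.567s (initrd) + 16.123s (userspace) = 22.924s.
# The same lookup for the current boot (-b 0) matches the systemd-analyze output above
# journalctl -b 0 _PID=1 -o cat | grep "Startup finished"
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 15.789s (userspace) = 22.368s.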
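To see how much each phase contributes, the "Startup finished" line can be broken down mechanically. The sketch below is an assumption-laden illustration: `boot_phase_share` is a hypothetical name, and the parser expects the "<time> (<phase>) + ... = <total>" layout shown above.

```shell
#!/bin/sh
# boot_phase_share (hypothetical helper): read a "Startup finished in ..." line
# on stdin and print "<phase> <seconds> <share>%" for every phase in it.
boot_phase_share() {
    awk '
        # Convert strings such as "4.123s", "195ms" or "1min 2.3s" to seconds
        function to_secs(str,    parts, n, i, s) {
            s = 0
            n = split(str, parts, " ")
            for (i = 1; i <= n; i++) {
                if (parts[i] ~ /min$/)      s += 60 * (parts[i] + 0)
                else if (parts[i] ~ /ms$/)  s += (parts[i] + 0) / 1000
                else if (parts[i] ~ /s$/)   s += parts[i] + 0
            }
            return s
        }
        /Startup finished in/ {
            line = $0
            sub(/^.*Startup finished in /, "", line)
            sub(/ *=.*$/, "", line)               # drop "= <total>"
            n = split(line, chunk, / *\+ */)      # one chunk per phase
            total = 0
            for (i = 1; i <= n; i++) {
                # chunk looks like "4.123s (kernel)"
                name[i] = chunk[i]
                sub(/^.*\(/, "", name[i])
                sub(/\).*$/, "", name[i])
                dur = chunk[i]
                sub(/ *\(.*$/, "", dur)
                phase_s[i] = to_secs(dur)
                total += phase_s[i]
            }
            for (i = 1; i <= n; i++)
                printf "%s %.3f %.1f%%\n", name[i], phase_s[i], 100 * phase_s[i] / total
        }
    '
}
```

Typical use would be `systemd-analyze time | boot_phase_share`; the second output line of `systemd-analyze time` is ignored because it does not contain "Startup finished in".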
3.1.2 Analyzing service startup time
# systemd-analyze blame
15.234s graphical.target
12.345s NetworkManager-wait-online.service
8.765s plymouth-quit-wait.service
5.432s firewalld.service
4.321s tuned.service
3.210s auditd.service
2.109s NetworkManager.service
1.987s sssd.service
1.876s rsyslog.service
1.765s crond.service
1.654s sshd.service
1.543s nginx.service
1.432s httpd.service
1.321s mysqld.service
1.210s php-fpm.service
# Show the startup time of a specific service
# systemd-analyze blame | grep nginx
1.543s nginx.service
# Show the startup time of several services
# systemd-analyze blame | grep -E "(nginx|httpd|mysql)"
1.543s nginx.service
1.432s httpd.service
1.321s mysqld.service
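Rather than grepping for known names, it can help to list every unit above a time threshold. The following is a minimal sketch under the assumption that each blame line is "<time> <unit>", with the time written in `min`, `s`, or `ms`; the function name `blame_over` is hypothetical.

```shell
#!/bin/sh
# blame_over SECONDS (hypothetical helper): read `systemd-analyze blame` output
# on stdin and print "<seconds> <unit>" for units that took at least SECONDS.
blame_over() {
    awk -v limit="$1" '
        NF >= 2 {
            unit = $NF                  # the unit name is the last field
            secs = 0
            for (i = 1; i < NF; i++) {  # every earlier field is part of the time
                if ($i ~ /min$/)      secs += 60 * ($i + 0)
                else if ($i ~ /ms$/)  secs += ($i + 0) / 1000
                else if ($i ~ /s$/)   secs += $i + 0
            }
            if (secs >= limit + 0) printf "%.3f %s\n", secs, unit
        }
    '
}
```

For example, `systemd-analyze blame | blame_over 5` would keep only the units that needed five seconds or more.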
3.1.3 Analyzing the critical chain
# systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
graphical.target @15.234s
└─multi-user.target @15.234s
  └─nginx.service @13.691s +1.543s
    └─network.target @13.690s
      └─NetworkManager.service @11.581s +2.109s
        └─network-pre.target @11.580s
          └─firewalld.service @6.148s +5.432s
            └─polkit.service @4.321s +1.827s
              └─basic.target @4.320s
                └─sockets.target @4.320s
                  └─dbus.socket @4.320s
                    └─sysinit.target @4.319s
                      └─systemd-update-utmp.service @4.123s +195ms
                        └─auditd.service @2.913s +1.210s
                          └─systemd-tmpfiles-setup.service @2.456s +456ms
                            └─local-fs.target @2.455s
                              └─boot.mount @2.234s +220ms
                                └─systemd-fsck@dev-disk-by\x2duuid-12345678\x2d1234\x2d1234\x2d1234\x2d123456789012.service @1.876s +357ms
                                  └─dev-disk-by\x2duuid-12345678\x2d1234\x2d1234\x2d1234\x2d123456789012.device @1.876s
# Show the critical chain of a specific service
# systemd-analyze critical-chain nginx.service
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
nginx.service @13.691s +1.543s
└─network.target @13.690s
  └─NetworkManager.service @11.581s +2.109s
    └─network-pre.target @11.580s
      └─firewalld.service @6.148s +5.432s
        └─polkit.service @4.321s +1.827s
          └─basic.target @4.320s
            └─sockets.target @4.320s
              └─dbus.socket @4.320s
                └─sysinit.target @4.319s
                  └─systemd-update-utmp.service @4.123s +195ms
                    └─auditd.service @2.913s +1.210s
                      └─systemd-tmpfiles-setup.service @2.456s +456ms
                        └─local-fs.target @2.455s
                          └─boot.mount @2.234s +220ms
                            └─systemd-fsck@dev-disk-by\x2duuid-12345678\x2d1234\x2d1234\x2d1234\x2d123456789012.service @1.876s +357ms
                              └─dev-disk-by\x2duuid-12345678\x2d1234\x2d1234\x2d1234\x2d123456789012.device @1.876s
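In a long chain, the unit with the largest "+" value is usually the first candidate for tuning. A rough sketch for picking it out follows; `slowest_in_chain` is a hypothetical name, and the parser only understands "+<n>s" and "+<n>ms" values, so longer durations written with a `min` part are skipped.

```shell
#!/bin/sh
# slowest_in_chain (hypothetical helper): read `systemd-analyze critical-chain`
# output on stdin and print "<seconds> <unit>" for the largest "+" time.
slowest_in_chain() {
    awk '
        {
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^\+[0-9.]+(ms|s)$/) continue
                secs = substr($i, 2) + 0
                if ($i ~ /ms$/) secs /= 1000
                unit = $1
                sub(/^[^A-Za-z0-9]+/, "", unit)   # strip the tree-drawing glyphs
                if (secs > max) { max = secs; name = unit }
            }
        }
        END { if (name != "") printf "%.3f %s\n", max, name }
    '
}
```

Run against the chain above (`systemd-analyze critical-chain | slowest_in_chain`), this would point at firewalld.service with its 5.432s start time.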
3.2 Advanced systemd-analyze techniques
3.2.1 Generating a boot chart
# systemd-analyze plot > /tmp/boot-chart.svg
# Check the generated chart
# ls -lh /tmp/boot-chart.svg
-rw-r--r--. 1 root root 125K Mar 31 10:00 /tmp/boot-chart.svg
# Copy the chart to the web server directory
# cp /tmp/boot-chart.svg /var/www/html/
# View the chart in a browser
# Open http://localhost/boot-chart.svg
# plot always charts the current boot; it does not take a --boot or --unit option.
# To graph a single service, export its dependency graph instead (requires Graphviz)
# systemd-analyze dot nginx.service | dot -Tsvg > /tmp/nginx-deps.svg
3.2.2 Verifying unit files
# systemd-analyze verify /etc/systemd/system/myapp.service
/etc/systemd/system/myapp.service:7: Unknown lvalue 'ExecStartt' in section 'Service', ignoring.
/etc/systemd/system/myapp.service:8: Failed to parse service type, ignoring: simplee
# Verify a specific unit file
# systemd-analyze verify /etc/systemd/system/nginx.service
# Verify several unit files
# systemd-analyze verify /etc/systemd/system/nginx.service /etc/systemd/system/httpd.service
# Verify a unit file without checking that the man pages it references exist
# systemd-analyze verify --man=no /etc/systemd/system/myapp.service
/etc/systemd/system/myapp.service:7: Unknown lvalue 'ExecStartt' in section 'Service', ignoring.
/etc/systemd/system/myapp.service:8: Failed to parse service type, ignoring: simplee
3.2.3 Security analysis
# systemd-analyze security nginx.service
NAME DESCRIPTION EXPOSURE
✗ PrivateNetwork= Service has access to the host's network 0.5
✗ User=/DynamicUser= Service runs as root user 0.4
✗ CapabilityBoundingSet=~CAP_NET_RAW Service may create raw sockets 0.3
✗ NoNewPrivileges= Service processes may acquire new privileges 0.2
✗ ProtectSystem= Service has full access to the OS file hierarchy 0.2
✗ ProtectHome= Service has full access to home directories 0.2
✗ PrivateDevices= Service potentially has access to hardware devices 0.2
✗ PrivateTmp= Service has access to other software's temporary files 0.1
✗ CapabilityBoundingSet=~CAP_SYS_ADMIN Service has administrator privileges 0.1
✗ RestrictAddressFamilies=~AF_PACKET Service may allocate packet sockets 0.1
→ Overall exposure level for nginx.service: 9.6 UNSAFE
# Analyze the security settings of all services (the overview prints one row per unit)
# systemd-analyze security --no-pager
UNIT                 EXPOSURE PREDICATE HAPPY
nginx.service             9.6 UNSAFE    😨
httpd.service             9.5 UNSAFE    😨
sshd.service              9.4 UNSAFE    😨
mysqld.service            9.3 UNSAFE    😨
php-fpm.service           9.2 UNSAFE    😨
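For routine checks, the overall score can be pulled out of the per-unit report and compared with a limit. The helpers below are a sketch only: `exposure_score` and `exposure_at_most` are hypothetical names, and they assume the report ends with an "Overall exposure level for <unit>: <score> <label>" line as in the sample output above. Recent systemd releases also document a `--threshold=` option for `systemd-analyze security`; check the man page on your system before relying on it.

```shell
#!/bin/sh
# exposure_score (hypothetical helper): read `systemd-analyze security UNIT`
# output on stdin and print the score from the "Overall exposure level" line.
exposure_score() {
    awk '
        /Overall exposure level for/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+(\.[0-9]+)?$/) { print $i; exit }
        }
    '
}

# exposure_at_most MAX (hypothetical helper): exit 0 when the score read from
# stdin is less than or equal to MAX, 1 when it is higher, 2 when none is found.
exposure_at_most() {
    score=$(exposure_score)
    [ -n "$score" ] || return 2
    awk -v s="$score" -v max="$1" 'BEGIN { exit !(s <= max) }'
}
```

A possible use is `systemd-analyze security nginx.service --no-pager | exposure_at_most 5 || echo "nginx.service needs hardening"`.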
3.3 Boot optimization in practice
3.3.1 Identifying boot bottlenecks
# 1. Analyze system boot time
# systemd-analyze
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 15.789s (userspace) = 22.368s
# 2. Analyze service startup time
# systemd-analyze blame | head -10
15.234s graphical.target
12.345s NetworkManager-wait-online.service
8.765s plymouth-quit-wait.service
5.432s firewalld.service
4.321s tuned.service
3.210s auditd.service
2.109s NetworkManager.service
1.987s sssd.service
1.876s rsyslog.service
1.765s crond.service
# 3. Analyze the critical chain
# systemd-analyze critical-chain | head -20
graphical.target @15.234s
└─multi-user.target @15.234s
  └─nginx.service @13.691s +1.543s
    └─network.target @13.690s
      └─NetworkManager.service @11.581s +2.109s
        └─network-pre.target @11.580s
          └─firewalld.service @6.148s +5.432s
# 4. Identify the bottlenecks
# NetworkManager-wait-online.service takes 12.345s and is the main bottleneck
# firewalld.service takes 5.432s and is the secondary bottleneck
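3.3.2 Optimizing boot time
# 1. Disable NetworkManager-wait-online.service (only if no service needs the network to be fully up at boot)
# systemctl disable NetworkManager-wait-online.service
Removed /etc/systemd/system/network-online.target.wants/NetworkManager-wait-online.service.
# 2. Tune firewalld startup with a drop-in; an empty ExecStartPre= clears any inherited pre-start commands, so it only saves time if the unit defines one
# systemctl edit firewalld.service
[Service]
ExecStartPre=
# 3. Disable the graphical interface (if it is not needed)
# systemctl set-default multi-user.target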
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.
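# 4. Reboot, then measure boot time again
# systemd-analyze
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 8.567s (userspace) = 15.146s
# 5. Compare the results
# Before: 22.368s
# After: 15.146s
# Improvement: boot time reduced by 7.222s (32.3%)
# 6. Analyze service startup time after the optimization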
# systemd-analyze blame | head -10
8.567s graphical.target
5.432s firewalld.service
4.321s tuned.service
3.210s auditd.service
2.109s NetworkManager.service
1.987s sssd.service
1.876s rsyslog.service
1.765s crond.service
1.654s sshd.service
1.543s nginx.service
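The before/after comparison is simple arithmetic: the saving is the difference of the two totals, and the percentage is that saving divided by the original total. A small sketch (the function name `boot_improvement` is hypothetical):

```shell
#!/bin/sh
# boot_improvement BEFORE AFTER (hypothetical helper): print the time saved in
# seconds and as a percentage of BEFORE. Both arguments are in seconds.
boot_improvement() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        saved = before - after
        printf "%.3fs (%.1f%%)\n", saved, 100 * saved / before
    }'
}
```

With the figures from this section, `boot_improvement 22.368 15.146` prints `7.222s (32.3%)`.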
Part04-Production Cases and Walkthroughs
4.1 Boot-time analysis case study
4.1.1 Complete analysis workflow
# 1. Analyze overall boot time
# systemd-analyze
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 15.789s (userspace) = 22.368s
# 2. Analyze the time spent in each phase
# systemd-analyze time
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 15.789s (userspace) = 22.368s
graphical.target reached after 15.234s in userspace
# 3. Check the kernel phase (the bracketed dmesg timestamps show seconds since the kernel started)
# dmesg | grep "Linux version"
[ 0.000000] Linux version 5.14.0-362.8.1.el9_3.x86_64 (mockbuild@x86-01-vm.build.eng.bos.redhat.com) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #1 SMP PREEMPT_DYNAMIC Wed Nov 8 11:1
# 4. Inspect the initrd (its size and contents influence the initrd phase)
# lsinitrd /boot/initramfs-$(uname -r).img | head -20
Image: /boot/initramfs-5.14.0-362.8.1.el9_3.x86_64.img: 21M
========================================================================
Version: dracut-057-21.git20230214.el9
# 5. Analyze the userspace phase
# systemd-analyze blame | head -10
15.234s graphical.target
12.345s NetworkManager-wait-online.service
8.765s plymouth-quit-wait.service
5.432s firewalld.service
4.321s tuned.service
3.210s auditd.service
2.109s NetworkManager.service
1.987s sssd.service
1.876s rsyslog.service
1.765s crond.service
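# 6. Generate a boot report (the EOF delimiter is left unquoted so that $(date) is expanded)
# cat > /tmp/boot-analysis-report.txt << EOF
System boot analysis report
===========================
Analysis time: $(date)
Total boot time: 22.368s
- Kernel: 4.123s (18.4%)
- initrd: 2.456s (11.0%)
- Userspace: 15.789s (70.6%)
Main bottlenecks:
1. NetworkManager-wait-online.service: 12.345s
2. plymouth-quit-wait.service: 8.765s
3. firewalld.service: 5.432s
Recommendations:
1. Disable NetworkManager-wait-online.service (if nothing depends on the network being up)
2. Disable plymouth (if the graphical boot screen is not needed)
3. Tune the firewalld startup configuration
EOF
# 7. View the report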
# cat /tmp/boot-analysis-report.txt
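The report can also be produced from measured values instead of typed-in numbers. The sketch below is an illustration with hypothetical function names (`pct`, `write_boot_report`); it takes the total and the three phase times in seconds as arguments. Because the heredoc delimiter is unquoted, the shell expands `$(date)` and the variables when the report is written.

```shell
#!/bin/sh
# pct PART WHOLE (hypothetical helper): print PART as a percentage of WHOLE.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f", 100 * part / whole }'
}

# write_boot_report TOTAL KERNEL INITRD USERSPACE (hypothetical helper):
# print a short report to stdout; all arguments are times in seconds.
write_boot_report() {
    total=$1 kernel=$2 initrd=$3 userspace=$4
    cat << EOF
System boot analysis report
===========================
Analysis time: $(date)
Total boot time: ${total}s
- Kernel: ${kernel}s ($(pct "$kernel" "$total")%)
- initrd: ${initrd}s ($(pct "$initrd" "$total")%)
- Userspace: ${userspace}s ($(pct "$userspace" "$total")%)
EOF
}
```

For example, `write_boot_report 22.368 4.123 2.456 15.789 > /tmp/boot-analysis-report.txt` reproduces the timing section of the report above.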
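4.2 Service startup-time analysis case study
4.2.1 Analyzing the startup time of a specific service
# 1. Analyze nginx startup time
# systemd-analyze blame | grep nginx
1.543s nginx.service
# 2. Analyze the critical chain of the nginx service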
# systemd-analyze critical-chain nginx.service
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
nginx.service @13.691s +1.543s
└─network.target @13.690s
  └─NetworkManager.service @11.581s +2.109s
    └─network-pre.target @11.580s
      └─firewalld.service @6.148s +5.432s
        └─polkit.service @4.321s +1.827s
          └─basic.target @4.320s
            └─sockets.target @4.320s
              └─dbus.socket @4.320s
                └─sysinit.target @4.319s
# 3. List the units ordered after nginx (they wait for nginx to start)
# systemctl list-dependencies nginx.service --before
nginx.service
● └─multi-user.target
# 4. List the units nginx is ordered after (they start before nginx)
# systemctl list-dependencies nginx.service --after
nginx.service
● ├─network.target
● └─sysinit.target
# 5. Show the nginx unit configuration
# systemctl cat nginx.service
# /usr/lib/systemd/system/nginx.service
[Unit]
Description=nginx - high performance web server
Documentation=http://nginx.org/en/docs/
After=network.target remote-fs.target nss-lookup.target
Wants=network.target
[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
[Install]
WantedBy=multi-user.target
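# 6. Optimize nginx startup (clearing ExecStartPre= skips the "nginx -t" configuration test, trading a safety check for speed)
# systemctl edit nginx.service
[Service]
ExecStartPre=
# 7. Measure nginx startup time again
# systemd-analyze blame | grep nginx
1.234s nginx.service
# 8. Result
# Before: 1.543s
# After: 1.234s
# Improvement: startup time reduced by 0.309s (20.0%)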
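`systemctl edit` stores the override as a drop-in file named `override.conf` in a `<unit>.d` directory under /etc/systemd/system. The sketch below writes the same kind of file by hand, which can be convenient in scripts. The function name `write_dropin` and the idea of passing the target root as an argument are assumptions for illustration; after writing into /etc/systemd/system, `systemctl daemon-reload` is needed for systemd to pick up the change.

```shell
#!/bin/sh
# write_dropin ROOT UNIT (hypothetical helper): create ROOT/UNIT.d/override.conf
# with a drop-in that clears the ExecStartPre= commands of UNIT.
write_dropin() {
    root=$1 unit=$2
    mkdir -p "$root/$unit.d" || return 1
    cat > "$root/$unit.d/override.conf" << 'EOF'
[Service]
# An empty assignment clears the ExecStartPre= list inherited from the unit file.
ExecStartPre=
EOF
}
```

Trying it against a scratch directory first, for example `write_dropin /tmp/dropin-test nginx.service`, shows the resulting layout without touching the live configuration.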
4.3 Troubleshooting with boot analysis
4.3.1 Boot takes too long
# Analysis steps:
# 1. Analyze system boot time
# systemd-analyze
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 45.678s (userspace) = 52.257s
# 2. Analyze service startup time
# systemd-analyze blame | head -10
45.678s graphical.target
42.345s NetworkManager-wait-online.service
35.678s plymouth-quit-wait.service
15.432s firewalld.service
12.345s tuned.service
8.765s auditd.service
5.432s NetworkManager.service
4.321s sssd.service
3.210s rsyslog.service
2.109s crond.service
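# 3. Analyze the critical chain
# systemd-analyze critical-chain | head -20
graphical.target @45.678s
└─multi-user.target @45.678s
  └─nginx.service @43.135s +2.543s
    └─network.target @43.134s
      └─NetworkManager.service @40.581s +2.553s
        └─network-pre.target @40.580s
          └─firewalld.service @25.148s +15.432s
# 4. Identify the problem
# NetworkManager-wait-online.service takes 42.345s and is the main problem
# A network configuration issue may be causing it to wait until it times out
# 5. Check the network configuration (RHEL 9 and later normally keep NetworkManager profiles as keyfiles under /etc/NetworkManager/system-connections/, so this legacy ifcfg file may not exist)
# cat /etc/sysconfig/network-scripts/ifcfg-eth0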
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
NAME=eth0
DEVICE=eth0
ONBOOT=yes
# 6. Disable NetworkManager-wait-online.service
# systemctl disable NetworkManager-wait-online.service
Removed /etc/systemd/system/network-online.target.wants/NetworkManager-wait-online.service.
# 7. Reboot, then measure boot time again
# systemd-analyze
Startup finished in 4.123s (kernel) + 2.456s (initrd) + 8.567s (userspace) = 15.146s
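# 8. Result
# Before: 52.257s
# After: 15.146s
# Improvement: boot time reduced by 37.111s (71.0%)
# 9. Preventive measures
# - Check the network configuration
# - Disable unnecessary services
# - Optimize service startup order
# - Analyze boot time regularly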
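Part05-风哥's Lessons Learned
5.1 Boot analysis lessons learned
Lessons learned from boot analysis:
- Analyze regularly: run boot analysis on a schedule
- Record a baseline: keep a reference boot time for each system
- Monitor changes: watch for boot-time drift against the baseline
- Optimize boot: shorten system boot time where it matters
- Document measures: write down every optimization applied
5.2 Boot analysis checklist
Boot analysis checklist:
- Before analysis: make sure the system is idle
- During analysis: measure several boots and average the results
- After analysis: record the results
- When optimizing: change one service at a time
- When verifying: compare results before and after
- When troubleshooting: analyze service startup times and disable unnecessary services
5.3 Recommended tools for boot analysis
Recommended tools for boot analysis:
- systemd-analyze: the systemd boot analysis tool
- dmesg: kernel log viewer
- journalctl: systemd journal viewer
- bootchart: boot chart generator
- pybootchartgui: boot chart viewer
Compiled and published by 风哥教程 for learning and testing only; please credit the source when reposting: http://www.fgedu.net.cn/10327.html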
