
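Linux Tutorial FG356: Storage Monitoring and Analysis

Summary: This Fengge tutorial draws on the official Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman documentation and describes in detail how to configure and use the relevant technologies.

Fengge's note:

This document describes methods and tools for storage monitoring and analysis.

Part01 - I/O Monitoring Tools

1.1 Using iostat

# Install sysstat
[root@server ~]# dnf install -y sysstat

# Show I/O statistics for all devices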
[root@server ~]# iostat
Linux 5.14.0-162.6.1.el9_1.x86_64 (server.fgedu.net.cn) 04/04/2026 _x86_64_ (4 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 0.50 0.10 0.00 98.40

Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
sda 10.00 200.00 100.00 0.00 200000 100000 0
sdb 5.00 100.00 50.00 0.00 100000 50000 0

# Show extended statistics
[root@server ~]# iostat -x
Linux 5.14.0-162.6.1.el9_1.x86_64 (server.fgedu.net.cn) 04/04/2026 _x86_64_ (4 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 0.50 0.10 0.00 98.40

Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 100.00 50.00 200.00 100.00 0.00 0.00 0.00 0.00 0.50 1.00 0.10 2.00 2.00 0.50 7.50
sdb 50.00 25.00 100.00 50.00 0.00 0.00 0.00 0.00 0.50 1.00 0.05 2.00 2.00 0.50 3.75

# Monitor continuously (one report every 2 seconds, 5 reports in total)
[root@server ~]# iostat -x 2 5
Linux 5.14.0-162.6.1.el9_1.x86_64 (server.fgedu.net.cn) 04/04/2026 _x86_64_ (4 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 0.50 0.10 0.00 98.40

Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 100.00 50.00 200.00 100.00 0.00 0.00 0.00 0.00 0.50 1.00 0.10 2.00 2.00 0.50 7.50
sdb 50.00 25.00 100.00 50.00 0.00 0.00 0.00 0.00 0.50 1.00 0.05 2.00 2.00 0.50 3.75

# Monitor a specific device only
[root@server ~]# iostat -x sda 2 5

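# Metric reference
[root@server ~]# cat > /root/iostat-metrics.txt << 'EOF'
iostat metrics
==============

Basic metrics:
- tps: transfers (I/O requests) issued to the device per second
- kB_read/s: kilobytes read per second
- kB_wrtn/s: kilobytes written per second

Extended metrics:
- r/s: read requests per second
- w/s: write requests per second
- rkB/s: kilobytes read per second
- wkB/s: kilobytes written per second
- rrqm/s: read requests merged per second
- wrqm/s: write requests merged per second
- %rrqm: percentage of read requests merged
- %wrqm: percentage of write requests merged
- r_await: average wait time of read requests, including time in the queue (ms)
- w_await: average wait time of write requests, including time in the queue (ms)
- aqu-sz: average queue length
- rareq-sz: average read request size (KB)
- wareq-sz: average write request size (KB)
- svctm: average service time (ms); unreliable, and dropped from recent sysstat releases
- %util: device utilization, the percentage of time the device was busy with I/O
EOF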
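
The %util metric described above lends itself to a simple threshold check. The following is a minimal sketch, not a definitive implementation: `high_util` is a hypothetical helper name and the default limit of 80 is an arbitrary example value. It finds the %util column by its header name, on the assumption that the column order can differ between sysstat versions.

```shell
#!/bin/sh
# Print "device %util" for every device whose %util exceeds a limit.
# Reads `iostat -x` output on stdin. high_util is a hypothetical helper
# name, and the default limit of 80 is only an example value.
high_util() {
    awk -v limit="${1:-80}" '
        $1 == "Device" || $1 == "Device:" {
            # Locate the %util column in this report by its header name.
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        NF == 0 { col = 0; next }   # a blank line ends the device table
        col && ($col + 0) > (limit + 0) { print $1, $col }
    '
}
```

A possible invocation is `iostat -x 2 5 | high_util 80`. Keep in mind that the first report covers the time since boot; `iostat -y` omits it.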

1.2 Using iotop

# Install iotop
[root@server ~]# dnf install -y iotop

# Interactive mode
[root@server ~]# iotop
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % systemd --system --deserialize 17
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_gp]
4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_par_gp]

# Show only processes that are actually doing I/O
[root@server ~]# iotop -o
Total DISK READ: 0.00 B/s | Total DISK WRITE: 100.00 K/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 50.00 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
12345 be/4 root 0.00 B/s 100.00 K/s 0.00 % 5.00 % dd if=/dev/zero of=/data/test.img bs=1M count=1000

# Non-interactive (batch) mode: 3 iterations, 2 seconds apart
[root@server ~]# iotop -b -n 3 -d 2
Total DISK READ: 0.00 B/s | Total DISK WRITE: 100.00 K/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 50.00 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
12345 be/4 root 0.00 B/s 100.00 K/s 0.00 % 5.00 % dd if=/dev/zero of=/data/test.img bs=1M count=1000

# Show only a specific user
[root@server ~]# iotop -u root

# Show only a specific process
[root@server ~]# iotop -p 12345
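
Batch mode output can be filtered in a script. The sketch below assumes the line layout shown in the sample output above (TID, PRIO, USER, read rate, write rate, SWAPIN, IO, COMMAND) and that iotop is run with `-k` so that all rates are in K/s. `heavy_writers` is a hypothetical helper name and the default limit is an arbitrary example; the layout may differ between iotop versions.

```shell
#!/bin/sh
# Print "TID write-rate command" for threads writing faster than a limit (K/s).
# Expects lines from `iotop -b -k -qqq` on stdin; the field positions are an
# assumption based on the sample output in this article.
# heavy_writers is a hypothetical helper name.
heavy_writers() {
    awk -v limit="${1:-1024}" '
        $7 == "K/s" && ($6 + 0) > (limit + 0) {
            # Fields 12..NF hold the command line; join them back together.
            cmd = $12
            for (i = 13; i <= NF; i++) cmd = cmd " " $i
            print $1, $6, cmd
        }
    '
}
```

A possible invocation is `iotop -b -k -qqq -o -n 3 -d 2 | heavy_writers 50`, where `-qqq` suppresses the header and summary lines.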

Part02 - Disk Health Monitoring

2.1 Using smartctl

# Install smartmontools
[root@server ~]# dnf install -y smartmontools

# Show disk information
[root@server ~]# smartctl -i /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.14.0-162.6.1.el9_1.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 2.5
Device Model: ST1000LM024 HN-M101MBB
Serial Number: S123456789
LU WWN Device Id: 5 0004cf 123456789
Firmware Version: 0001
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Fri Apr 4 22:00:00 2026 CST

# Check the overall health status
[root@server ~]# smartctl -H /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.14.0-162.6.1.el9_1.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

# Show SMART attributes
[root@server ~]# smartctl -A /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.14.0-162.6.1.el9_1.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0004 147 147 054 Old_age Offline - 80
3 Spin_Up_Time 0x0007 100 100 011 Pre-fail Always - 0
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 5
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 252 252 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1234
10 Spin_Retry_Count 0x0033 252 252 051 Pre-fail Always - 0
11 Calibration_Retry_Ct 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0012 100 100 000 Old_age Always - 5
191 G-Sense_Error_Rate 0x0012 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 4
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 10
194 Temperature_Celsius 0x0002 062 062 000 Old_age Always - 38 (Min/Max 20/45)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0012 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
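
Attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector), and 198 (Offline_Uncorrectable) are commonly treated as early warning signs when their raw values rise above zero. The sketch below reports such attributes from the ATA attribute table shown above. `smart_warnings` is a hypothetical helper name, and the choice of these three IDs is an assumption rather than an official rule; NVMe and SAS drives report health in a different format.

```shell
#!/bin/sh
# Print "NAME=RAW_VALUE" for attributes 5, 197 and 198 when the raw value is
# non-zero. Reads `smartctl -A` output (ATA attribute table) on stdin.
# smart_warnings is a hypothetical helper name; watching exactly these three
# attributes is an assumption, not a vendor rule.
smart_warnings() {
    awk '
        $1 ~ /^(5|197|198)$/ && NF >= 10 && ($10 + 0) > 0 {
            # Field 2 is the attribute name, field 10 the raw value.
            print $2 "=" $10
        }
    '
}
```

A possible invocation is `smartctl -A /dev/sda | smart_warnings`; empty output means none of the three attributes has a non-zero raw value.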

# Run a short self-test
[root@server ~]# smartctl -t short /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.14.0-162.6.1.el9_1.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Apr 4 22:05:00 2026

# View the self-test results
[root@server ~]# smartctl -l selftest /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.14.0-162.6.1.el9_1.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5 -
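
The self-test log can also be checked from a script. This minimal sketch assumes the log layout shown above, where the most recent test is the entry numbered `# 1`. `latest_selftest_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit 0) only if the most recent self-test entry ("# 1") reports
# "Completed without error". Reads `smartctl -l selftest` output on stdin.
# latest_selftest_ok is a hypothetical helper name.
latest_selftest_ok() {
    awk '
        $1 == "#" && $2 == "1" {
            seen = 1
            ok = ($0 ~ /Completed without error/)
        }
        END { exit !(seen && ok) }
    '
}
```

A possible invocation is `smartctl -l selftest /dev/sda | latest_selftest_ok || echo "latest self-test did not pass"`. The check also fails when the log contains no entries at all.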

2.2 Configuring Periodic Health Checks

# Enable the smartd service
[root@server ~]# systemctl enable --now smartd
Created symlink /etc/systemd/system/multi-user.target.wants/smartd.service → /usr/lib/systemd/system/smartd.service.

# Configure smartd
[root@server ~]# cat > /etc/smartd.conf << 'EOF'
# Option 1: monitor all disks and schedule self-tests
# (short test daily at 02:00, long test every Saturday at 03:00).
# smartd ignores every line that follows DEVICESCAN.
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost

# Option 2: monitor specific disks instead (remove the DEVICESCAN line first)
#/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost
#/dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost
EOF

# Restart the service
[root@server ~]# systemctl restart smartd

# Create the health check script
[root@server ~]# cat > /usr/local/bin/disk-health-check.sh << 'EOF'
#!/bin/bash
# Check the SMART health of every /dev/sd? disk, log the result,
# and mail an alert on failure (requires a working mail command).
LOG_FILE="/var/log/disk-health.log"
ALERT_EMAIL="root@localhost"

for disk in /dev/sd?; do
    [ -b "$disk" ] || continue    # skip if the glob matched nothing
    status=$(smartctl -H "$disk" | grep "SMART overall-health" | awk '{print $6}')
    if [ "$status" != "PASSED" ]; then
        echo "$(date): WARNING - $disk health check failed: $status" >> "$LOG_FILE"
        echo "$(date): $disk health check failed: $status" | mail -s "Disk Health Alert" "$ALERT_EMAIL"
    else
        echo "$(date): $disk health check passed" >> "$LOG_FILE"
    fi
done
EOF

[root@server ~]# chmod +x /usr/local/bin/disk-health-check.sh

# Schedule the check daily at 06:00 (note: this replaces root's existing crontab)
[root@server ~]# echo "0 6 * * * /usr/local/bin/disk-health-check.sh" | crontab -

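Fengge's recommendations for storage monitoring:

  • Check disk health regularly
  • Monitor I/O performance metrics
  • Set alert thresholds
  • Keep historical monitoring data
  • Run self-tests regularly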

Compiled and published by Fengge Tutorials for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html
