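Summary: This Fengge tutorial draws on the official Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman documentation and describes how to configure and use the technologies covered.
Fengge's note:
This document describes how to configure network monitoring and alerting on Linux.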
Part01-Network Monitoring Tools
1.1 Installing the monitoring tools
$ sudo dnf install -y net-snmp net-snmp-utils
$ sudo dnf install -y nagios-plugins
$ sudo dnf install -y monitoring-plugins
# Install Prometheus Node Exporter
$ sudo dnf install -y prometheus-node-exporter
$ sudo systemctl enable --now prometheus-node-exporter
# Check Node Exporter status
$ sudo systemctl status prometheus-node-exporter
● prometheus-node-exporter.service - Prometheus Node Exporter
Loaded: loaded (/usr/lib/systemd/system/prometheus-node-exporter.service; enabled; preset: disabled)
Active: active (running) since Thu 2026-04-03 21:25:00 CST; 10s ago
Main PID: 12345 (node_exporter)
Tasks: 4 (limit: 49152)
Memory: 5.2M
CPU: 15ms
CGroup: /system.slice/prometheus-node-exporter.service
└─12345 /usr/bin/node_exporter
# Query the Node Exporter metrics endpoint
$ curl http://localhost:9100/metrics | head -20
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000123
go_gc_duration_seconds{quantile="0.25"} 0.000234
go_gc_duration_seconds{quantile="0.5"} 0.000345
go_gc_duration_seconds{quantile="0.75"} 0.000456
go_gc_duration_seconds{quantile="1"} 0.001234
go_gc_duration_seconds_sum 0.012345
go_gc_duration_seconds_count 10
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
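The metrics endpoint returns plain text with one sample per line, so a single value can be pulled out with a small filter. The following is a minimal sketch, not part of Node Exporter itself: the function name metric_value is hypothetical, and it assumes the metric name (including any label set) contains no spaces.

```shell
#!/usr/bin/env bash
# Print the value of the first sample whose metric name (including any
# label set) exactly matches $1. Lines starting with "# HELP" or "# TYPE"
# never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

A possible use is `curl -s http://localhost:9100/metrics | metric_value go_goroutines`, which should print the current goroutine count.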
Part02-SNMP Monitoring Configuration
2.1 Configuring the SNMP service
$ sudo tee /etc/snmp/snmpd.conf << EOF
# Define community strings
rocommunity public 127.0.0.1
rocommunity public 192.168.1.0/24
# Define system information
sysLocation "Data Center, Room 1"
sysContact "admin@fgedu.net.cn"
sysName "rhel10.fgedu.net.cn"
# Define OID views
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.25.1.1
# Define access permissions
access notConfigGroup "" any noauth exact systemview none none
# Disk monitoring
disk / 10000
disk /var 5000
# Process monitoring
proc httpd 10 1
proc sshd 10 1
# CPU load monitoring
load 12 10 5
EOF
# Start the SNMP service
$ sudo systemctl enable --now snmpd
# Test SNMP
$ snmpwalk -v 2c -c public localhost system
SNMPv2-MIB::sysDescr.0 = STRING: Linux rhel10.fgedu.net.cn 5.14.0-284.11.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 3 21:25:00 CST 2026 x86_64
SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (123456) 0:20:34.56
SNMPv2-MIB::sysContact.0 = STRING: admin@fgedu.net.cn
SNMPv2-MIB::sysName.0 = STRING: rhel10.fgedu.net.cn
SNMPv2-MIB::sysLocation.0 = STRING: Data Center, Room 1
# View disk information
$ snmpwalk -v 2c -c public localhost dskTable
UCD-SNMP-MIB::dskIndex.1 = INTEGER: 1
UCD-SNMP-MIB::dskPath.1 = STRING: /
UCD-SNMP-MIB::dskDevice.1 = STRING: /dev/mapper/rhel-root
UCD-SNMP-MIB::dskTotal.1 = INTEGER: 52428800
UCD-SNMP-MIB::dskAvail.1 = INTEGER: 10485760
UCD-SNMP-MIB::dskUsed.1 = INTEGER: 41943040
UCD-SNMP-MIB::dskPercent.1 = INTEGER: 80
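The dskTable output can be turned into a simple threshold check. The sketch below is one possible approach and rests on assumptions: the function name disks_over_threshold is hypothetical, and it expects snmpwalk output in the textual form shown above, with mount paths that contain no spaces.

```shell
#!/usr/bin/env bash
# Read `snmpwalk ... dskTable` output on stdin and print "path percent"
# for every disk whose dskPercent is at or above the limit given in $1.
disks_over_threshold() {
    awk -v limit="$1" '
        {
            # Row index is the part after the last dot in the OID name,
            # e.g. "UCD-SNMP-MIB::dskPath.1" -> "1".
            idx = $1
            sub(/.*\./, "", idx)
        }
        $1 ~ /::dskPath\./    { path[idx] = $NF }
        $1 ~ /::dskPercent\./ { pct[idx] = $NF }
        END {
            for (i in pct)
                if (pct[i] + 0 >= limit + 0)
                    print path[i], pct[i]
        }'
}
```

For example, `snmpwalk -v 2c -c public localhost dskTable | disks_over_threshold 75` would list the root filesystem from the sample output, since it is 80% full.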
Part03-Network Traffic Monitoring
3.1 Configuring traffic monitoring
$ sudo dnf install -y vnstat
# Initialize the database
$ sudo vnstat --add -i eth0
# Start the vnstat service
$ sudo systemctl enable --now vnstat
# View traffic statistics
$ vnstat
Database updated: 2026-04-03 21:30:00
eth0 since 2026-04-01
rx: 10.00 GiB tx: 5.00 GiB total: 15.00 GiB
monthly
                  rx      |     tx      |    total    |   avg. rate
  ------------------------+-------------+-------------+---------------
  2026-04       10.00 GiB |    5.00 GiB |   15.00 GiB |    5.00 Mbit/s
  ------------------------+-------------+-------------+---------------
  estimated     15.00 GiB |    7.50 GiB |   22.50 GiB |
daily
                  rx      |     tx      |    total    |   avg. rate
  ------------------------+-------------+-------------+---------------
  yesterday      5.00 GiB |    2.50 GiB |    7.50 GiB |   10.00 Mbit/s
  today          5.00 GiB |    2.50 GiB |    7.50 GiB |   10.00 Mbit/s
  ------------------------+-------------+-------------+---------------
  estimated      7.50 GiB |    3.75 GiB |   11.25 GiB |
# Live traffic monitoring
$ vnstat -l
Monitoring eth0... (press CTRL-C to stop)
rx: 1.50 Mbit/s 250 p/s tx: 0.75 Mbit/s 125 p/s
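For scripted checks, vnstat also has a parsable mode: `vnstat --oneline b` prints one semicolon-separated record with values in bytes. The sketch below checks a monthly quota against that record. It assumes the 15-field layout described in the vnstat man page, where field 11 is the current month's total; the function names and the quota value are hypothetical.

```shell
#!/usr/bin/env bash
# Print field N (1-based) of a `vnstat --oneline` record read from stdin.
oneline_field() {
    cut -d ';' -f "$1"
}

# Succeed (exit 0) when the monthly total in bytes exceeds the quota in $1.
# Expects the output of `vnstat --oneline b` on stdin; field 11 is assumed
# to be the current month's total traffic.
month_over_quota() {
    local total
    total=$(oneline_field 11)
    [ "$total" -gt "$1" ]
}
```

A possible use is `vnstat -i eth0 --oneline b | month_over_quota 1000000000000 && echo "monthly quota exceeded"`, which warns once more than roughly 1 TB has been transferred in the current month.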
# Install iftop
$ sudo dnf install -y iftop
# Live per-connection traffic monitoring
$ sudo iftop -i eth0
interface: eth0
IP address is: 192.168.1.100
MAC address is: 08:00:27:12:34:56
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
# Install nload
$ sudo dnf install -y nload
# View network traffic
$ nload eth0
Part04-Alert Configuration
4.1 Configuring email alerts
$ sudo dnf install -y postfix mailx
# Configure Postfix
$ sudo tee /etc/postfix/main.cf << EOF
myhostname = rhel10.fgedu.net.cn
mydomain = fgedu.net.cn
myorigin = \$mydomain
inet_interfaces = localhost
inet_protocols = ipv4
mydestination = \$myhostname, localhost.\$mydomain, localhost, \$mydomain
relayhost = [smtp.fgedu.net.cn]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
EOF
# Configure SMTP authentication
$ sudo tee /etc/postfix/sasl_passwd << EOF
[smtp.fgedu.net.cn]:587 user@fgedu.net.cn:password
EOF
$ sudo postmap /etc/postfix/sasl_passwd
$ sudo chmod 600 /etc/postfix/sasl_passwd*
# Start Postfix
$ sudo systemctl enable --now postfix
# Send a test email
$ echo "Test email" | mail -s "Test Subject" admin@fgedu.net.cn
# Create the network monitoring script
$ sudo tee /usr/local/bin/network-monitor.sh > /dev/null << 'EOF'
#!/bin/bash
ALERT_EMAIL="admin@fgedu.net.cn"
PING_TARGET="8.8.8.8"
LOG_FILE="/var/log/network-monitor.log"
# Append a timestamped message to the log file
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
# Mail an alert: $1 is the message body, $2 the subject suffix
send_alert() {
    echo "$1" | mail -s "Network Alert: $2" "$ALERT_EMAIL"
}
check_ping() {
    if ! ping -c 3 "$PING_TARGET" > /dev/null 2>&1; then
        log "ERROR: Cannot ping $PING_TARGET"
        send_alert "Cannot ping $PING_TARGET" "Ping Failed"
    fi
}
check_dns() {
    if ! nslookup www.google.com > /dev/null 2>&1; then
        log "ERROR: DNS resolution failed"
        send_alert "DNS resolution failed" "DNS Failed"
    fi
}
check_bandwidth() {
    # Fields 2 and 10 of /proc/net/dev are cumulative RX/TX byte counters
    # since boot, not per-second rates, so this compares total bytes.
    RX_BYTES=$(awk '$1 == "eth0:" {print $2}' /proc/net/dev)
    TX_BYTES=$(awk '$1 == "eth0:" {print $10}' /proc/net/dev)
    if [ "${RX_BYTES:-0}" -gt 1000000000 ] || [ "${TX_BYTES:-0}" -gt 1000000000 ]; then
        log "WARNING: High network traffic detected"
        send_alert "High network traffic: RX=$RX_BYTES, TX=$TX_BYTES" "High Traffic"
    fi
}
log "Starting network monitor"
check_ping
check_dns
check_bandwidth
log "Network monitor completed"
EOF
$ sudo chmod +x /usr/local/bin/network-monitor.sh
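The counters in /proc/net/dev only grow, so a bandwidth check usually needs two readings and the time between them. The following is a minimal sketch of that idea, not a drop-in replacement for check_bandwidth: the function names are hypothetical, and it assumes the kernel prints a space after the interface name and colon, as current kernels do.

```shell
#!/usr/bin/env bash
# Print the cumulative "rx_bytes tx_bytes" counters for interface $1.
# $2 is the stats file to read; it defaults to /proc/net/dev.
iface_bytes() {
    awk -v ifname="$1:" '$1 == ifname { print $2, $10 }' "${2:-/proc/net/dev}"
}

# Convert two counter readings ($1 earlier, $2 later) taken $3 seconds
# apart into bytes per second (integer division).
bytes_per_second() {
    echo $(( ($2 - $1) / $3 ))
}

# Sample interface $1 twice, $2 seconds apart (default 1), and print
# "rx_rate tx_rate" in bytes per second.
sample_rate() {
    local iface=$1 interval=${2:-1}
    local rx1 tx1 rx2 tx2
    read -r rx1 tx1 < <(iface_bytes "$iface")
    sleep "$interval"
    read -r rx2 tx2 < <(iface_bytes "$iface")
    echo "$(bytes_per_second "$rx1" "$rx2" "$interval") $(bytes_per_second "$tx1" "$tx2" "$interval")"
}
```

Calling `sample_rate eth0 5` would print the average receive and transmit rates over a five-second window, which can then be compared against a threshold expressed in bytes per second.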
# Schedule the script with cron (every 5 minutes)
$ sudo tee /etc/cron.d/network-monitor << EOF
*/5 * * * * root /usr/local/bin/network-monitor.sh
EOF
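Because the job runs every five minutes, a long outage would send a new email on every run. One way to limit this is a per-check cooldown, sketched below. This is an illustrative addition rather than part of the original script: the function name should_alert, the state directory /var/tmp/network-monitor, and the cooldown value are all assumptions.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if no alert with key $1 was recorded within the last
# $2 seconds, and record the current time for that key. Otherwise fail
# and leave the recorded time unchanged.
# $3 is the state directory (hypothetical default below).
should_alert() {
    local key=$1 cooldown=$2 dir=${3:-/var/tmp/network-monitor}
    local stamp="$dir/$key.last" now last=0
    mkdir -p "$dir"
    now=$(date +%s)
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp")
    fi
    if [ $(( now - last )) -ge "$cooldown" ]; then
        echo "$now" > "$stamp"
        return 0
    fi
    return 1
}
```

Inside the monitoring script, a call such as `should_alert ping 3600 && send_alert "Cannot ping $PING_TARGET" "Ping Failed"` would send at most one ping alert per hour while still logging every failure.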
Part05-Monitoring Dashboard
5.1 Configuring the monitoring dashboard
$ sudo tee /etc/yum.repos.d/grafana.repo << EOF
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
$ sudo dnf install -y grafana
# Start Grafana
$ sudo systemctl enable --now grafana-server
# Check Grafana status
$ sudo systemctl status grafana-server
● grafana-server.service - Grafana instance
Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: disabled)
Active: active (running) since Thu 2026-04-03 21:35:00 CST; 10s ago
Main PID: 12346 (grafana-server)
Tasks: 10 (limit: 49152)
Memory: 50.5M
CPU: 500ms
CGroup: /system.slice/grafana-server.service
└─12346 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini
# Access Grafana
# Open http://localhost:3000 in a browser
# Default username: admin
# Default password: admin
# Create the statistics script
$ sudo tee /usr/local/bin/network-stats.sh > /dev/null << 'EOF'
#!/bin/bash
echo "=== Network Statistics ==="
echo "Date: $(date)"
echo ""
echo "=== Interface Statistics ==="
ip -s link show eth0
echo ""
echo "=== Connection Statistics ==="
ss -s
echo ""
echo "=== Traffic Statistics ==="
vnstat --oneline
echo ""
echo "=== Active Connections ==="
ss -tunap | head -20
echo ""
echo "=== Network Errors ==="
cat /proc/net/dev | grep eth0
echo ""
EOF
$ sudo chmod +x /usr/local/bin/network-stats.sh
# Run the statistics script
$ sudo /usr/local/bin/network-stats.sh
=== Network Statistics ===
Date: Thu Apr 3 21:35:30 CST 2026

=== Interface Statistics ===
2: eth0:
link/ether 08:00:27:12:34:56 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
10.0M 10000 0 0 0 0
TX: bytes packets errors dropped carrier collsns
5.0M 5000 0 0 0 0
…
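1. Deploy a complete monitoring system
2. Configure alerts for key metrics
3. Review monitoring data regularly
4. Establish a monitoring baseline
5. Define an incident response process
Compiled and published by Fengge Tutorials for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html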
