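1. Monitoring and Alerting Overview
Monitoring and alerting are an essential part of managing an NBU (NetBackup) backup system. Effective monitoring lets administrators detect and resolve problems early and keeps the backup system running normally. This chapter describes NBU's monitoring tools, monitoring methods, and alert configuration.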
# /usr/openv/netbackup/bin/admincmd/bpgetconfig | grep -i monitor
MONITOR_ENABLED = YES
ALERT_ENABLED = YES
REPORTING_ENABLED = YES
2. Monitoring Tools
NBU provides several monitoring tools: command-line tools, graphical tools, and an API.
2.1 Command-Line Tools
# /usr/openv/netbackup/bin/admincmd/bpdbjobs
Job ID   Type         State   Status      Client   Policy       Schedule
-------  -----------  ------  ----------  -------  -----------  ------------
12345    Backup       Done    Successful  client1  FULL_BACKUP  Full
12346    Backup       Done    Failed      client2  INCR_BACKUP  Differential
12347    Restore      Done    Successful  client3  FULL_BACKUP  Full
12348    Duplication  Done    Successful  client1  FULL_BACKUP  Full
# Check server status
# /usr/openv/netbackup/bin/bpps
NB Processes
------------
root 12345 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bprd
root 12346 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpcd
root 12347 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/nbfsd
root 12348 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/vnetd
root 12349 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpjava-msvc
root 12350 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpdbm
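2.2 Graphical Tools
- NetBackup Administration Console: NBU's graphical management interface, providing comprehensive monitoring and management functions
- NetBackup OpsCenter: an enterprise-grade monitoring and reporting tool with more advanced monitoring and analysis features
2.3 API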
# curl -k -u admin:password https://master_server:1556/api/v1/jobs
{
  "data": [
    {
      "id": "12345",
      "type": "Backup",
      "state": "Done",
      "status": "Successful",
      "clientName": "client1",
      "policyName": "FULL_BACKUP",
      "scheduleName": "Full",
      "startTime": "2026-04-02T20:00:00Z",
      "endTime": "2026-04-02T20:30:00Z"
    },
    {
      "id": "12346",
      "type": "Backup",
      "state": "Done",
      "status": "Failed",
      "clientName": "client2",
      "policyName": "INCR_BACKUP",
      "scheduleName": "Differential",
      "startTime": "2026-04-02T21:00:00Z",
      "endTime": "2026-04-02T21:15:00Z"
    }
  ],
  "links": {
    "self": "https://master_server:1556/api/v1/jobs",
    "next": "https://master_server:1556/api/v1/jobs?page=2"
  },
  "meta": {
    "page": 1,
    "pageSize": 10,
    "totalCount": 100
  }
}
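3. Job Monitoring
Job monitoring is the core of NBU monitoring: tracking the state and progress of backup jobs lets problems be found and fixed promptly.
3.1 Job Status Monitoring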
# /usr/openv/netbackup/bin/admincmd/bpdbjobs -active
Job ID   Type    State   Status       Client   Policy       Schedule
-------  ------  ------  -----------  -------  -----------  ------------
12345    Backup  Active  In Progress  client1  FULL_BACKUP  Full
12346    Backup  Active  In Progress  client2  INCR_BACKUP  Differential
# View failed jobs
# /usr/openv/netbackup/bin/admincmd/bpdbjobs -failed -hours 24
Job ID   Type    State  Status  Client   Policy       Schedule
-------  ------  -----  ------  -------  -----------  --------
12347    Backup  Done   Failed  client3  FULL_BACKUP  Full
# View job details
# /usr/openv/netbackup/bin/admincmd/bpjobinfo -jobid 12347 -details
Job ID: 12347
Job Type: Backup
State: Done
Status: Failed
Client: client3
Policy: FULL_BACKUP
Schedule: Full
Start Time: 04/02/2026 22:00:00
End Time: 04/02/2026 22:05:00
Status Code: 2
Status Message: none of the requested files were backed up
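Failed jobs can also be picked out of machine-readable job data with a short filter. The sketch below is illustrative only: `failed_jobs` is a hypothetical helper name, and it assumes comma-separated input in the layout of `bpdbjobs -report -most_columns`, with the job ID in field 1, the job state in field 3 (3 meaning done), the status code in field 4, the policy in field 5, and the client in field 7. Check that layout against your NetBackup version before relying on it.

```shell
# failed_jobs: hypothetical helper, reads comma-separated job records on stdin.
# Assumed field layout (verify on your NetBackup version):
#   1 = job ID, 3 = state (3 = done), 4 = status code, 5 = policy, 7 = client
# Prints "jobid status client policy" for finished jobs with a non-zero status.
failed_jobs() {
    awk -F',' '$3 == 3 && $4 != "" && $4 != 0 { print $1, $4, $7, $5 }'
}
```

A possible invocation is `/usr/openv/netbackup/bin/admincmd/bpdbjobs -report -most_columns | failed_jobs`. Jobs that are still running have an empty status field and are skipped.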
3.2 Job Progress Monitoring
# /usr/openv/netbackup/bin/admincmd/bpjobinfo -jobid 12345 -progress
Job ID: 12345
Progress: 50%
Bytes Transferred: 512000000
Rate: 5702222.22 bytes/sec
# View detailed job progress
# /usr/openv/netbackup/bin/admincmd/bpjobinfo -jobid 12345 -detail
Job ID: 12345
Job Type: Backup
State: Active
Status: In Progress
Client: client1
Policy: FULL_BACKUP
Schedule: Full
Start Time: 04/02/2026 20:00:00
Current File: /data/file100.txt
Files Processed: 100 of 200
Bytes Processed: 512000000 of 1024000000
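The percentage reported for a job can be cross-checked against the byte counters in the detailed output. The following is a minimal sketch; `progress_pct` is a hypothetical helper, not an NBU command, and it takes the processed and total byte counts as plain integers.

```shell
# progress_pct DONE TOTAL
# Hypothetical helper: prints DONE/TOTAL as an integer percentage (rounded down).
# Returns 1 without printing when TOTAL is missing or zero, to avoid a division by zero.
progress_pct() {
    done_bytes=$1
    total_bytes=$2
    if [ -z "$total_bytes" ] || [ "$total_bytes" -eq 0 ]; then
        return 1
    fi
    echo $(( $done_bytes * 100 / $total_bytes ))
}
```

With the sample values above, `progress_pct 512000000 1024000000` prints 50, which agrees with the progress figure shown for job 12345.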
4. Server Monitoring
Server monitoring covers the status of the master server, media servers, and clients.
4.1 Master Server Monitoring
# /usr/openv/netbackup/bin/bpps
NB Processes
------------
root 12345 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bprd
root 12346 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpcd
root 12347 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/nbfsd
root 12348 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/vnetd
root 12349 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpjava-msvc
root 12350 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpdbm
# Check database status
# /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server master_server.
Database [BMRDB] is alive and well on server master_server.
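A process listing like the one above can be checked automatically for missing daemons. The sketch below assumes the listing is piped in on stdin and that each daemon appears as the last path component of a command, as in the sample output. `missing_daemons` is a hypothetical helper, and the set of daemons to require depends on the server role and NBU version.

```shell
# missing_daemons NAME...
# Hypothetical helper: reads a bpps-style process listing on stdin and prints
# each requested daemon name that does not appear in it.
missing_daemons() {
    # Read the whole listing once so it can be searched for every name.
    listing=$(cat)
    for name in "$@"; do
        # Match the name as the final path component of a command.
        if ! printf '%s\n' "$listing" | grep -Eq "/${name}( |\$)"; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `/usr/openv/netbackup/bin/bpps | missing_daemons bprd bpdbm` prints nothing when both daemons are running, and prints the name of any daemon that is absent.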
4.2 Media Server Monitoring
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -gethost -machinename media1
Host Name: media1
Host Type: MEDIA_SERVER
State: UP
# Check media server processes
# ssh media1 /usr/openv/netbackup/bin/bpps
NB Processes
------------
root 12345 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpcd
root 12346 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/vnetd
root 12347 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/nbrmms
4.3 Client Monitoring
# /usr/openv/netbackup/bin/bpclntcmd -pn -client client1
Expecting response from server master_server
client1.fgedu.net.cn client1 192.168.1.50:1556
# Check client service status
# ssh client1 /usr/openv/netbackup/bin/bpps
NB Processes
------------
root 12345 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/bpcd
root 12346 1 0 10:00 ? 00:00:00 /usr/openv/netbackup/bin/vnetd
5. Storage Monitoring
Storage monitoring covers the status of storage units, disk pools, and storage devices.
5.1 Storage Unit Monitoring
# /usr/openv/netbackup/bin/admincmd/nbdevquery -liststs -U
Storage Server Name: storage1
Storage Type: PureDisk
Media Server Name: media1
State: UP
Storage Server Name: cloud_storage
Storage Type: Cloud
Media Server Name: media1
State: UP
# Check storage unit details
# /usr/openv/netbackup/bin/admincmd/nbdevconfig -getconfig -storage_server storage1 -stype PureDisk
Storage Server: storage1
Storage Type: PureDisk
Media Server: media1
Connection String: storage1:9090
User Name: admin
Password: ******
Disk Pool: DP1
5.2 Disk Pool Monitoring
# /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -U
Disk Pool Name: DP1
Storage Server: storage1
Storage Type: PureDisk
State: UP
Capacity: 10000 GB
Free Space: 8000 GB
Used Space: 2000 GB
Disk Pool Name: DP_CLOUD
Storage Server: cloud_storage
Storage Type: Cloud
State: UP
Capacity: 50000 GB
Free Space: 45000 GB
Used Space: 5000 GB
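Disk pool usage can be turned into a simple capacity check. The sketch below parses output in the simplified `Key: value` layout shown above, where `Capacity` precedes `Used Space` for each pool. Real `nbdevquery -listdp -U` output may use different field names, so the labels matched here are assumptions to adjust. `pools_over_threshold` is a hypothetical helper.

```shell
# pools_over_threshold PERCENT
# Hypothetical helper: reads disk pool records on stdin in the layout shown
# above and prints "pool used%" for every pool at or above PERCENT used.
# Assumes "Capacity" appears before "Used Space" within each pool record.
pools_over_threshold() {
    threshold=$1
    awk -F': *' -v limit="$threshold" '
        $1 == "Disk Pool Name" { pool = $2 }
        $1 == "Capacity"       { split($2, c, " "); cap = c[1] }
        $1 == "Used Space"     {
            split($2, u, " "); used = u[1]
            if (cap > 0) {
                pct = used * 100 / cap
                if (pct >= limit) printf "%s %.1f\n", pool, pct
            }
        }
    '
}
```

With the sample data above and a threshold of 80, nothing is printed, because DP1 is 20% used and DP_CLOUD is 10% used.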
5.3 Storage Device Monitoring
# /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -U
Disk Volume Name: DV1
Disk Pool Name: DP1
Status: UP
Capacity: 10000 GB
Free Space: 8000 GB
Used Space: 2000 GB
# Check storage device details
# /usr/openv/netbackup/bin/admincmd/nbdevconfig -getconfig -diskvolume DV1 -diskpool DP1
Disk Volume: DV1
Disk Pool: DP1
Status: UP
Path: /storage/disk1
Capacity: 10000 GB
Free Space: 8000 GB
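6. Alert Configuration
Alerting notifies administrators of abnormal conditions in the backup system so that problems are handled promptly.
6.1 Email Alert Configuration
# /usr/openv/netbackup/bin/admincmd/nbmailcmd -add -recipient admin@fgedu.net.cn -subject "NBU Backup Alert"
# Verify the email alert configuration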
# /usr/openv/netbackup/bin/admincmd/nbmailcmd -list
Recipient: admin@fgedu.net.cn
Subject: NBU Backup Alert
# Test the email alert
# /usr/openv/netbackup/bin/admincmd/nbmailcmd -send -recipient admin@fgedu.net.cn -subject "Test Alert" -message "This is a test alert"
6.2 SNMP Alert Configuration
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -setsnmp -snmp enabled -trapdest 192.168.1.100 -community public
# Verify the SNMP alert configuration
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -getsnmp
SNMP: enabled
Trap Destination: 192.168.1.100
Community: public
6.3 Alert Threshold Configuration
# /usr/openv/netbackup/bin/admincmd/nbdevconfig -setconfig -diskpool DP1 -property "alert_threshold=80"
# Verify the alert threshold configuration
# /usr/openv/netbackup/bin/admincmd/nbdevconfig -getconfig -diskpool DP1 -property "alert_threshold"
alert_threshold=80
# Configure job failure alerts
# /usr/openv/netbackup/bin/admincmd/bpsetconfig "ALERT_ON_JOB_FAILURE = YES"
# Verify the job failure alert configuration
# /usr/openv/netbackup/bin/admincmd/bpgetconfig | grep -i alert
ALERT_ON_JOB_FAILURE = YES
7. Report Generation
Reports show the backup system's operating status and trends, giving administrators an overall view of the system.
7.1 Job Reports
# /usr/openv/netbackup/bin/admincmd/bpreport -summary -hours 168
Summary of backup jobs for the last 168 hours:
Total jobs: 100
Successful: 95
Failed: 5
Partially successful: 0
In progress: 0
Average throughput: 12.5 MB/sec
Average backup time: 25 minutes
# Generate a detailed job report
# /usr/openv/netbackup/bin/admincmd/bpreport -detail -hours 24
Job ID   Type    Client   Policy       Schedule      Start Time           End Time             MB    KB/sec  Status
-------  ------  -------  -----------  ------------  -------------------  -------------------  ----  ------  ----------
12345    Backup  client1  FULL_BACKUP  Full          04/02/2026 20:00:00  04/02/2026 20:30:00  2500  14000   Successful
12346    Backup  client2  INCR_BACKUP  Differential  04/02/2026 21:00:00  04/02/2026 21:15:00  800   8900    Successful
12347    Backup  client3  FULL_BACKUP  Full          04/02/2026 22:00:00  04/02/2026 22:05:00  0     0       Failed
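The summary counters in this section can be reduced to a single success rate for trend tracking. The sketch below assumes summary text in the `Total jobs: N` / `Successful: N` layout shown in the summary report. `success_rate` is a hypothetical helper, not an NBU command.

```shell
# success_rate
# Hypothetical helper: reads a job summary on stdin (lines such as
# "Total jobs: 100" and "Successful: 95") and prints the success rate
# as a percentage with one decimal place. Prints nothing if no total is found.
success_rate() {
    awk -F': *' '
        $1 == "Total jobs" { total = $2 }
        $1 == "Successful" { ok = $2 }
        END { if (total > 0) printf "%.1f\n", ok * 100 / total }
    '
}
```

For the sample summary (100 jobs, 95 successful) this prints 95.0.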
7.2 Storage Reports
# /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -U
Disk Pool Name: DP1
Storage Server: storage1
Storage Type: PureDisk
State: UP
Capacity: 10000 GB
Free Space: 8000 GB
Used Space: 2000 GB
Deduplication Ratio: 3.5:1
Compression Ratio: 1.2:1
Disk Pool Name: DP_CLOUD
Storage Server: cloud_storage
Storage Type: Cloud
State: UP
Capacity: 50000 GB
Free Space: 45000 GB
Used Space: 5000 GB
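7.3 Custom Reports
# Log in to the OpsCenter web interface
# Navigate to Reports > Create Report
# Select the report type: Backup Job Summary
# Configure the report parameters: time range, clients, policies, and so on
# Generate the report and export it as PDF or CSV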
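8. Monitoring and Alerting Best Practices
Following the best practices below helps keep NBU monitoring effective and alerts timely.
8.1 Monitoring Best Practices
- Build a comprehensive monitoring system that covers jobs, servers, and storage
- Set a sensible monitoring frequency so that problems are detected promptly
- Use several monitoring tools that complement one another
- Establish a monitoring baseline so that the system's normal state is known
- Analyze monitoring data regularly to identify potential problems
8.2 Alerting Best Practices
- Set sensible alert thresholds to avoid excessive false alarms
- Configure several alert channels so that alerts are delivered promptly
- Establish alert severity levels to distinguish how serious each alert is
- Define an alert response process so that problems are handled promptly
- Test the alerting system regularly to confirm that it works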
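Severity levels can be derived from the NetBackup job status code. NetBackup reports status 0 for a successful job and status 1 for a partially successful one. Treating every other code as critical is an assumption made for this sketch, and `severity_for_status` is a hypothetical helper whose labels should be adapted to the local response process.

```shell
# severity_for_status CODE
# Hypothetical helper: maps a NetBackup job status code to an alert level.
# 0 (success) -> OK, 1 (partially successful) -> WARNING,
# anything else -> CRITICAL (an assumption; refine per site policy).
severity_for_status() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        *) echo "CRITICAL" ;;
    esac
}
```

A site might, for instance, send WARNING results to email only and reserve SNMP traps or paging for CRITICAL results.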
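8.3 Reporting Best Practices
- Generate reports regularly to understand how the system is running
- Analyze report data to identify trends and problems
- Use charts and graphical presentation to make reports easier to read
- Relate reports to business requirements to assess the effectiveness of the backup system
- Establish a report archiving mechanism to retain historical data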
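One simple way to retain history is to write each report to a dated file instead of overwriting a single file. The sketch below is only an illustration: `report_path` is a hypothetical helper, and the `/var/log/nbu_<kind>_report_<date>.txt` naming scheme is an assumption.

```shell
# report_path KIND [YYYYMMDD]
# Hypothetical helper: prints a dated report file name such as
# /var/log/nbu_daily_report_20260402.txt. The date defaults to today.
report_path() {
    kind=$1
    day=${2:-$(date +%Y%m%d)}
    printf '/var/log/nbu_%s_report_%s.txt\n' "$kind" "$day"
}
```

A report command's output could then be redirected to `"$(report_path daily)"`, so that each day's report is kept as a separate file.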
# 1. Configure email alerts
# /usr/openv/netbackup/bin/admincmd/nbmailcmd -add -recipient admin@fgedu.net.cn -subject "NBU Backup Alert"
# 2. Configure SNMP alerts
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -setsnmp -snmp enabled -trapdest 192.168.1.100 -community public
# 3. Configure the storage alert threshold
# /usr/openv/netbackup/bin/admincmd/nbdevconfig -setconfig -diskpool DP1 -property "alert_threshold=80"
# 4. Configure job failure alerts
# /usr/openv/netbackup/bin/admincmd/bpsetconfig "ALERT_ON_JOB_FAILURE = YES"
# 5. Generate the daily job report
# /usr/openv/netbackup/bin/admincmd/bpreport -summary -hours 24 > /var/log/nbu_daily_report.txt
# 6. Generate the weekly storage report
# /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -U > /var/log/nbu_weekly_storage_report.txt
Compiled and published by 风哥教程 for study and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html
