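WebSphere Tutorial FG015: Using WebSphere Performance Monitoring Tools in Practice

This document covers the performance monitoring tools of WebSphere Application Server 9.0.5: administrative console monitoring, PMI performance monitoring, Tivoli Performance Viewer, and custom monitoring scripts. It follows the performance monitoring chapter of the official WebSphere documentation and is intended for WebSphere administrators working in learning and test environments. Validate every setting yourself before applying it in production.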

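Part01 - Basic Concepts and Theory

1.1 Overview of WebSphere Performance Monitoring

Performance monitoring is how you keep an application server running stably: it surfaces performance problems early enough to tune them before users notice.

Goals of performance monitoring:

Performance baseline: establish a reference for normal behavior
Problem detection: find performance problems promptly
Capacity planning: support capacity planning decisions
Tuning evidence: provide the data that justifies tuning changes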

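1.1.1 Monitoring Layers

# WebSphere performance monitoring layers

1. System layer
- CPU utilization
- Memory utilization
- Disk I/O
- Network I/O

2. JVM layer
- Heap usage
- GC frequency and duration
- Thread count
- Loaded class count

3. Application server layer
- Request processing time
- Request throughput
- Session count
- Connection pool status

4. Application layer
- Application response time
- Application error rate
- Business metrics

# Monitoring architecture

┌─────────────────────────────────────────────────────────┐
│ Presentation layer │
│ Administrative console / Tivoli Performance Viewer │
└─────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────┐
│ Monitoring data layer │
│ PMI / JMX / SNMP │
└─────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────┐
│ Data collection layer │
│ System monitoring / JVM monitoring / Application monitoring │
└─────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────┐
│ Monitored targets │
│ WebSphere server / JVM / Application │
└─────────────────────────────────────────────────────────┘

# Metric data types

Type         Description
──────────────────────────────────────────────────────
Counter      Cumulative value, e.g. total number of requests
Gauge        Current value, e.g. current number of connections
Histogram    Distribution statistics, e.g. response time distribution
Timer        Timing statistics, e.g. request processing time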
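
The difference between a counter and a gauge matters when charting: a gauge can be plotted directly, while a counter has to be converted into a rate first. The following is a minimal sketch of that conversion, assuming samples arrive as (timestamp in seconds, counter value) pairs; the function name and the sample numbers are illustrative and not part of any WebSphere API.

```python
def counter_rate(samples):
    """Convert cumulative counter samples into per-second rates.

    samples: list of (timestamp_seconds, counter_value) in ascending time order.
    Returns one rate per consecutive pair of samples.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        delta = v1 - v0
        if delta < 0:
            # The counter went backwards, e.g. after a server restart:
            # treat the new value as the increase since the reset.
            delta = v1
        rates.append(delta / elapsed)
    return rates


# Hypothetical request-count samples taken every 5 seconds; the last one follows a restart.
request_total = [(0, 1000), (5, 1250), (10, 1600), (15, 100)]
print(counter_rate(request_total))
```

Handling the reset explicitly avoids the large negative spike that a naive subtraction would produce after a restart.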

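1.2 WebSphere Monitoring Tools

WebSphere ships with the following monitoring tools.

1.2.1 Built-in Monitoring Tools

# WebSphere built-in monitoring tools

1. Administrative console
- Real-time monitoring
- Performance monitor
- Request metrics

Access path:
https://fgedu.net.cn:9043/ibm/console
> Monitoring and Tuning > Performance monitor

2. PMI (Performance Monitoring Infrastructure)
- Collects performance data
- Multiple monitoring levels
- Extensible architecture

Configuration path:
Servers > server1 > Performance Monitoring Infrastructure

3. Tivoli Performance Viewer (TPV)
- Graphical monitoring
- Real-time data display
- Historical data analysis

How to start:
Administrative console > Monitoring and Tuning > Performance Viewer

4. Request metrics
- Request-level monitoring
- Response time analysis
- Component call tracing

Configuration path:
Servers > server1 > Request Metrics

# Tool comparison

Tool                      Strengths                          Weaknesses
──────────────────────────────────────────────────────
Administrative console    Well integrated, easy to use       Limited functionality
PMI                       Rich data, extensible              Complex to configure
TPV                       Graphical, real-time monitoring    Consumes resources
Request metrics           Fine-grained monitoring            Performance impact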

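1.3 WebSphere Performance Metrics

The core WebSphere performance metrics are listed below.

1.3.1 Core Performance Metrics

# WebSphere core performance metrics

1. JVM metrics
Metric                Description                Alert threshold
──────────────────────────────────────────────────────
HeapMemoryUsage       Heap usage ratio           > 80%
UsedHeapSize          Used heap size             -
FreeHeapSize          Free heap size             -
GCTime                Cumulative GC time         -
GCCount               Number of GCs              -
ThreadCount           Thread count               > 500
LoadedClassCount      Loaded class count         -

2. Thread pool metrics
Metric                Description                Alert threshold
──────────────────────────────────────────────────────
PoolSize              Thread pool size           -
ActiveThreads         Active threads             > 80% of maximum
MaxThreads            Maximum threads            -
CreateThreadCount     Threads created            -
DestroyThreadCount    Threads destroyed          -

3. Connection pool metrics
Metric                Description                Alert threshold
──────────────────────────────────────────────────────
PoolSize              Connection pool size       -
FreeConnections       Free connections           < 10%
InUseConnections      Connections in use         > 80%
WaitTime              Wait time                  > 1000ms
CreateCount           Connections created        -
DestroyCount          Connections destroyed      -

4. Web container metrics
Metric                Description                Alert threshold
──────────────────────────────────────────────────────
RequestCount          Request count              -
ResponseTime          Response time              > 2000ms
ErrorCount            Error count                > 1% of requests
ActiveRequests        Active requests            -
Sessions              Session count              -

5. EJB container metrics
Metric                Description                Alert threshold
──────────────────────────────────────────────────────
BeanCount             Number of beans            -
ActiveMethodCount     Active method calls        -
MethodResponseTime    Method response time       > 1000ms
MethodErrorCount      Method error count         -

# Metric units

Type      Unit
──────────────────────────────────────────────────────
Memory    Bytes / MB / GB
Time      Milliseconds / Seconds
Count     Count
Ratio     Percentage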

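1.4 WebSphere Monitoring Architecture

WebSphere monitoring is built on PMI.

1.4.1 PMI Architecture

# WebSphere PMI architecture

┌─────────────────────────────────────────────────────────┐
│ PMI clients │
│ TPV / Administrative console / Custom clients │
└─────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────┐
│ PMI data interfaces │
│ JMX / RMI / REST │
└─────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────┐
│ PMI service │
│ Data aggregation / Data filtering / Data storage │
└─────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────┐
│ PMI data collectors │
│ JVM / Thread pools / Connection pools / Web container / EJB container │
└─────────────────────────────────────────────────────────┘

# PMI monitoring levels

Level      Description         Performance impact
──────────────────────────────────────────────────────
None       Nothing collected   None
Low        Basic metrics       Low
Medium     Standard metrics    Medium
High       Detailed metrics    High
Maximum    All metrics         Very high

Note: Low/Medium/High/Maximum are the legacy per-module instrumentation levels. Since WebSphere 6.0 the administrative console presents PMI statistic sets named None, Basic, Extended, All and Custom, so expect those names in a 9.0.5 console; this document keeps the level names above as shorthand for "less" versus "more" data.

# Recommended settings
Production: Low or Medium
Test: Medium or High
Diagnostics: High or Maximum

Fengge's tip: WebSphere performance monitoring has to balance monitoring granularity against overhead. Use Low or Medium in production so that monitoring itself does not degrade application performance.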

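Part02 - Production Planning and Recommendations

2.1 Performance Monitoring Planning

A monitoring plan has to settle the following points.

2.1.1 Planning Elements

# Performance monitoring planning elements

1. Monitoring targets
- System resources
- JVM
- Application server
- Application

2. Monitoring frequency
- Real-time monitoring: key metrics
- Scheduled monitoring: routine metrics
- Historical analysis: trend metrics

3. Data retention
- Real-time data: 1 day
- Short-term data: 7 days
- Medium-term data: 30 days
- Long-term data: 1 year

4. Alerting policy
- Alert thresholds
- Notification channels
- Alert escalation
- Alert handling

# Monitoring plan template

Project: WebSphere production monitoring
Scope:
- Servers: fgeduNode01, fgeduNode02
- Application: fgedu-app

Metrics:
- JVM: heap, GC, threads
- Thread pools: WebContainer, ORB
- Connection pool: jdbc/fgedudb
- Web container: requests, response time

Frequency:
- Real-time: every 5 seconds
- Historical: every 1 minute

Alert configuration:
- Heap > 80%: warning
- Heap > 90%: critical
- Response time > 2 seconds: warning
- Error rate > 1%: critical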

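2.2 Alerting Plan

The alerting plan defines levels, thresholds, channels and escalation.

2.2.1 Alert Configuration

# Alerting plan

1. Alert levels
Level       Description        Handling time
──────────────────────────────────────────────────────
Critical    Severe failure     Immediately
Major       Major problem      Within 1 hour
Minor       Minor problem      Within 4 hours
Warning     Warning            Within 24 hours
Info        Informational      No action

2. Alert thresholds
Metric                       Warning    Major    Critical
──────────────────────────────────────────────────────
CPU utilization              70%        85%      95%
Memory utilization           75%        85%      95%
Heap utilization             80%        90%      95%
Response time (ms)           1000       2000     5000
Error rate                   0.5%       1%       5%
Connection pool utilization  70%        85%      95%
Thread pool utilization      70%        85%      95%

3. Notification channels
- Email
- SMS
- WeCom / DingTalk
- SNMP Trap

4. Escalation policy
- Unhandled after 15 minutes: escalate to the direct supervisor
- Unhandled after 30 minutes: escalate to the manager
- Unhandled after 1 hour: escalate to the director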
# Sample alert script
#!/bin/bash
# alert.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

ALERT_LEVEL=$1
ALERT_MSG=$2
ALERT_EMAIL="admin@fgedu.net.cn"

send_email() {
    echo "$ALERT_MSG" | mail -s "WebSphere alert [$ALERT_LEVEL]" "$ALERT_EMAIL"
}

send_sms() {
    # Send an SMS notification; --data-urlencode keeps spaces and '&' in the message intact
    curl -X POST "http://sms.fgedu.net.cn/send" \
        --data-urlencode "phone=13800138000" \
        --data-urlencode "msg=$ALERT_MSG"
}

case "$ALERT_LEVEL" in
    Critical)
        send_email
        send_sms
        ;;
    Major)
        send_email
        ;;
    Minor|Warning)
        send_email
        ;;
esac

2.3 Monitoring Data Storage

Monitoring data can be stored in three tiers.

2.3.1 Storage Options

# Monitoring data storage options

1. Local storage
Path: /WebSphere/app/profiles/Dmgr01/pmi
Format: binary files
Retention: 7 days

2. Relational database
Database: fgedudb
Tables: WAS_METRICS, WAS_ALERTS
Retention: 30 days

3. Time-series database
Database: InfluxDB / Prometheus
Retention: 1 year
Compression: automatic

# Database table design

CREATE TABLE WAS_METRICS (
    ID NUMBER PRIMARY KEY,
    SERVER_NAME VARCHAR2(100),
    METRIC_NAME VARCHAR2(200),
    METRIC_VALUE NUMBER,
    COLLECT_TIME TIMESTAMP,
    CREATE_TIME TIMESTAMP DEFAULT SYSTIMESTAMP
);

CREATE TABLE WAS_ALERTS (
    ID NUMBER PRIMARY KEY,
    SERVER_NAME VARCHAR2(100),
    ALERT_LEVEL VARCHAR2(20),
    ALERT_MSG VARCHAR2(1000),
    ALERT_TIME TIMESTAMP,
    STATUS VARCHAR2(20),
    HANDLE_TIME TIMESTAMP,
    HANDLE_BY VARCHAR2(100)
);

# Data archiving policy

Raw data retention: 7 days
Aggregated data retention: 30 days
Trend data retention: 1 year

Archive script:
#!/bin/bash
# archive_metrics.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

# Archive data older than 30 days.
# Assumes WAS_METRICS_ARCHIVE already exists with the same columns as WAS_METRICS.
sqlplus fgedu/fgedu123@fgedudb << EOF
INSERT INTO WAS_METRICS_ARCHIVE
SELECT * FROM WAS_METRICS WHERE COLLECT_TIME < SYSDATE - 30;
DELETE FROM WAS_METRICS WHERE COLLECT_TIME < SYSDATE - 30;
COMMIT;
EOF
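
The archiving policy distinguishes raw data from aggregated data, which implies a downsampling step between the two tiers. A minimal sketch of that step follows, assuming raw samples are (epoch seconds, value) pairs and that one (min, average, max) row per minute is a sufficient aggregate; the function name and the bucket size are choices made for this illustration.

```python
from collections import defaultdict


def aggregate_per_minute(samples):
    """Downsample raw samples into one (min, avg, max) tuple per minute.

    samples: iterable of (epoch_seconds, value).
    Returns a dict keyed by the start second of each minute, in ascending order.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % 60].append(value)  # floor the timestamp to the minute
    return {
        minute: (min(values), sum(values) / len(values), max(values))
        for minute, values in sorted(buckets.items())
    }


# Hypothetical heap-usage percentages collected every few seconds.
raw = [(0, 10), (5, 20), (55, 30), (60, 40), (65, 60)]
print(aggregate_per_minute(raw))
```

Storing min and max next to the average preserves short spikes that an average alone would hide in the 30-day tier.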

2.4 Integration with Monitoring Systems

WebSphere monitoring is usually integrated with existing enterprise tooling.

2.4.1 Integration Options

# Integration options

1. Enterprise monitoring systems
Systems: Zabbix / Nagios / Prometheus
Method: JMX / SNMP / REST API

2. Log systems
Systems: ELK / Splunk
Method: log file collection

3. APM systems
Systems: Dynatrace / AppDynamics
Method: agent injection

# Zabbix integration

1. WebSphere JMX configuration
Administrative console > Servers > server1 > Performance Monitoring Infrastructure
> Enable the PMI service
> Monitoring level: Medium

2. Zabbix configuration
# zabbix_agentd.conf
UserParameter=was.heap.used,/WebSphere/scripts/check_heap.sh used
UserParameter=was.heap.max,/WebSphere/scripts/check_heap.sh max
UserParameter=was.thread.count,/WebSphere/scripts/check_thread.sh

3. Monitoring script
#!/bin/bash
# check_heap.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

# Which heap figure to print: "used" or "max"
TYPE=$1

/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -c "jvm = AdminControl.queryNames('type=JVM,process=server1,*')
heap = AdminControl.getAttribute(jvm, 'heapMemoryUsage')
print heap.get('$TYPE')" 2>/dev/null

# Prometheus integration

1. Download the JMX Exporter
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.17.0/jmx_prometheus_javaagent-0.17.0.jar

2. WebSphere startup arguments
Administrative console > Servers > server1 > Process definition > Java Virtual Machine
> Generic JVM arguments:
-javaagent:/WebSphere/tools/jmx_prometheus_javaagent-0.17.0.jar=9404:/WebSphere/config/jmx_exporter.yaml

The agent path must match the name of the downloaded jar. 9404 is only an example port: 9080 is the default HTTP port of server1, so the exporter cannot listen there.

3. jmx_exporter.yaml
rules:
  - pattern: "WebSphere<(.+)->(.+)>::(.+)"
    name: websphere_$1_$2_$3
    type: GAUGE

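Part03 - Production Implementation

3.1 Monitoring with the Administrative Console

This section walks through monitoring from the administrative console.

3.1.1 Enabling PMI

# Enable PMI monitoring

1. Log in to the administrative console
https://fgedu.net.cn:9043/ibm/console
User name: fgeduadmin
Password: fgedu123

2. Enable PMI
Navigate: Servers > Server Types > WebSphere Application Server
> server1 > Performance Monitoring Infrastructure (PMI)

Settings:
[x] Enable Performance Monitoring Infrastructure
Monitoring level: Medium
Statistic sets:
[x] JVM runtime
[x] Thread pools
[x] Database connection pools
[x] Web container
[x] Session management

3. Save the configuration
Click "OK" > "Save"

4. Restart the server
/WebSphere/app/profiles/AppSrv01/bin/stopServer.sh server1 \
    -username fgeduadmin -password fgedu123

/WebSphere/app/profiles/AppSrv01/bin/startServer.sh server1

# Verify PMI status
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123

# Check whether PMI is enabled (run at the wsadmin prompt)
perf = AdminControl.queryNames('type=Perf,process=server1,*')
print AdminControl.getAttribute(perf, 'statSet')

Medium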

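3.1.2 Viewing the Performance Monitor

# View the performance monitor

1. Open the performance monitor
Administrative console > Monitoring and Tuning > Performance monitor

2. Select the monitoring target
Server: server1
Node: fgeduNode01

3. Review the monitoring data
┌─────────────────────────────────────────────────────────┐
│ Performance monitor │
├─────────────────────────────────────────────────────────┤
│ JVM runtime │
│ Heap usage: 1024 MB / 2048 MB (50%) │
│ Used heap: 1024 MB │
│ Free heap: 1024 MB │
│ GC count: 1250 │
│ GC time: 3500 ms │
├─────────────────────────────────────────────────────────┤
│ Thread pools │
│ WebContainer │
│ Active threads: 25 / 50 (50%) │
│ Hung threads: 0 │
│ ORB.thread.pool │
│ Active threads: 10 / 20 (50%) │
├─────────────────────────────────────────────────────────┤
│ Database connection pools │
│ jdbc/fgedudb │
│ Connections in use: 15 / 50 (30%) │
│ Free connections: 35 │
│ Wait time: 0 ms │
└─────────────────────────────────────────────────────────┘

4. Set the refresh interval
Refresh interval: 5 seconds / 10 seconds / 30 seconds / 1 minute

5. Export monitoring data
Click "Export" > choose a format (CSV / XML)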

3.2 PMI Monitoring in Practice

This section configures and reads PMI through wsadmin.

3.2.1 PMI Configuration

# PMI configuration

1. Set the PMI level
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123

# Get the PMI (Perf) MBean
perf = AdminControl.queryNames('type=Perf,process=server1,*')

# Set the monitoring level
AdminControl.invoke(perf, 'setStatisticSet', 'Medium')

2. Configure individual modules
# JVM monitoring
AdminControl.invoke(perf, 'setInstrumentationLevel',
    '["name=JVMRuntime,level=high"]')

# Thread pool monitoring
AdminControl.invoke(perf, 'setInstrumentationLevel',
    '["name=ThreadPool,level=medium"]')

# Connection pool monitoring
AdminControl.invoke(perf, 'setInstrumentationLevel',
    '["name=ConnectionPool,level=medium"]')

# Web container monitoring
AdminControl.invoke(perf, 'setInstrumentationLevel',
    '["name=WebContainer,level=medium"]')

3. Read monitoring data
# Get JVM monitoring data
jvm = AdminControl.queryNames('type=JVM,process=server1,*')
heap = AdminControl.getAttribute(jvm, 'heapMemoryUsage')
print "Heap usage:"
print "  Used: " + str(heap.get('used') / 1024 / 1024) + " MB"
print "  Max: " + str(heap.get('max') / 1024 / 1024) + " MB"
print "  Utilization: " + str(heap.get('used') * 100 / heap.get('max')) + "%"

Heap usage:
  Used: 512 MB
  Max: 2048 MB
  Utilization: 25%

# Get thread pool data
tp = AdminControl.queryNames('type=ThreadPool,name=WebContainer,*')
print "WebContainer thread pool:"
print "  Active threads: " + AdminControl.getAttribute(tp, 'activeThreads')
print "  Max threads: " + AdminControl.getAttribute(tp, 'maxThreads')
print "  Pool size: " + AdminControl.getAttribute(tp, 'poolSize')

WebContainer thread pool:
  Active threads: 25
  Max threads: 50
  Pool size: 50

# Get connection pool data
# (an ObjectName pattern needs key=value pairs, so the data source is matched by name)
ds = AdminControl.queryNames('type=DataSource,name=fgedudb,*')
print "Data source connection pool:"
print AdminControl.getAttribute(ds, 'stats')

Data source connection pool:
  Connections in use: 15
  Free connections: 35
  Wait time: 0

3.3 Tivoli Performance Viewer in Practice

This section covers day-to-day use of Tivoli Performance Viewer.

3.3.1 Using TPV

# Using Tivoli Performance Viewer

1. Start TPV
Administrative console > Monitoring and Tuning > Performance Viewer

Or open it directly:
https://fgedu.net.cn:9043/ibm/console/perfViewer.jsp

2. TPV interface
┌─────────────────────────────────────────────────────────┐
│ Tivoli Performance Viewer │
├─────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────────────────────────────┐ │
│ │ Server list │ │ Performance charts │ │
│ │ │ │ │ │
│ │ ○ server1 │ │ ┌─────────────────────────────┐ │ │
│ │ ○ server2 │ │ │ Heap usage │ │ │
│ │ │ │ │ ▓▓▓▓▓▓░░░░░░░░░░░░ 50% │ │ │
│ │ Metrics │ │ └─────────────────────────────┘ │ │
│ │ ☑ JVM │ │ │ │
│ │ ☑ Thread pools │ │ ┌─────────────────────────────┐ │ │
│ │ ☑ Connection pools │ │ │ Thread pool usage │ │ │
│ │ ☑ Web container │ │ │ ▓▓▓▓▓▓▓▓░░░░░░░░░ 40% │ │ │
│ │ │ │ └─────────────────────────────┘ │ │
│ └─────────────┘ └─────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ Time range: [1 hour] [6 hours] [24 hours] [Custom] │
│ Refresh interval: [5 s] [10 s] [30 s] [1 min] │
└─────────────────────────────────────────────────────────┘

3. Configure the monitoring view
# Add metrics
Click "Add" > select metrics:
- JVM > Heap usage
- Thread pools > WebContainer > Active threads
- Connection pools > jdbc/fgedudb > Connections in use
- Web container > Request count

4. Set alert thresholds
Click "Alerts" > set thresholds:
- Heap usage > 80%: warning
- Heap usage > 90%: critical
- Thread pool usage > 80%: warning

5. Export data
Click "Export" > choose a format:
- CSV
- XML
- Chart image
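# TPV from the command line
Since WebSphere 6.0, TPV runs inside the administrative console and has no standalone launcher script.
For command-line access to the same PMI data, query the Perf MBean through wsadmin as in section 3.2.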

3.4 Custom Monitoring Scripts in Practice

The scripts below wrap wsadmin so they can be scheduled or called from other tools.

3.4.1 Developing Monitoring Scripts

# Custom monitoring scripts

1. JVM monitoring script
#!/bin/bash
# monitor_jvm.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

WAS_HOME=/WebSphere/app
PROFILE=Dmgr01
SERVER=server1
# Named WAS_USER/WAS_PASS so the shell's own $USER is not overwritten
WAS_USER=fgeduadmin
WAS_PASS=fgedu123

echo "=== WebSphere JVM monitoring ==="
echo "Time: $(date)"
echo ""

# Query JVM information
"$WAS_HOME/profiles/$PROFILE/bin/wsadmin.sh" -lang jython \
    -username "$WAS_USER" -password "$WAS_PASS" -c "
jvm = AdminControl.queryNames('type=JVM,process=$SERVER,*')
heap = AdminControl.getAttribute(jvm, 'heapMemoryUsage')
used = heap.get('used') / 1024.0 / 1024.0
max = heap.get('max') / 1024.0 / 1024.0
usage = used * 100 / max
print 'Heap usage: %.2f MB / %.2f MB (%.2f%%)' % (used, max, usage)

runtime = AdminControl.queryNames('type=JVMRuntime,process=$SERVER,*')
print 'GC count: ' + AdminControl.getAttribute(runtime, 'gcCount')
print 'GC time: ' + AdminControl.getAttribute(runtime, 'gcTime') + ' ms'
print 'Thread count: ' + AdminControl.getAttribute(runtime, 'totalThreads')
" 2>/dev/null

echo ""
echo "=== Monitoring complete ==="

Output:
=== WebSphere JVM monitoring ===
Time: 2026-04-10 10:00:00

Heap usage: 512.00 MB / 2048.00 MB (25.00%)
GC count: 1250
GC time: 3500 ms
Thread count: 125

=== Monitoring complete ===

2. Thread pool monitoring script
#!/bin/bash
# monitor_threadpool.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

echo "=== WebSphere thread pool monitoring ==="
echo "Time: $(date)"
echo ""

/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
tpList = AdminControl.queryNames('type=ThreadPool,*').splitlines()
for tp in tpList:
    name = AdminControl.getAttribute(tp, 'name')
    active = AdminControl.getAttribute(tp, 'activeThreads')
    poolSize = AdminControl.getAttribute(tp, 'poolSize')
    maxThreads = AdminControl.getAttribute(tp, 'maxThreads')
    usage = float(active) * 100 / float(maxThreads)
    print '%s: active=%s, pool size=%s, max=%s, utilization=%.2f%%' % (name, active, poolSize, maxThreads, usage)
" 2>/dev/null

echo ""
echo "=== Monitoring complete ==="

Output:
=== WebSphere thread pool monitoring ===
Time: 2026-04-10 10:00:00

WebContainer: active=25, pool size=50, max=50, utilization=50.00%
ORB.thread.pool: active=10, pool size=20, max=20, utilization=50.00%
Default: active=5, pool size=10, max=10, utilization=50.00%

=== Monitoring complete ===

3. Connection pool monitoring script
#!/bin/bash
# monitor_connectionpool.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

echo "=== WebSphere connection pool monitoring ==="
echo "Time: $(date)"
echo ""

/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
dsList = AdminControl.queryNames('type=DataSource,*').splitlines()
for ds in dsList:
    name = AdminControl.getAttribute(ds, 'name')
    jndi = AdminControl.getAttribute(ds, 'jndiName')
    stats = AdminControl.getAttribute(ds, 'stats')
    print 'Data source: %s (%s)' % (name, jndi)
    print '  Statistics: %s' % stats
    print ''
" 2>/dev/null

echo "=== Monitoring complete ==="

Output:
=== WebSphere connection pool monitoring ===
Time: 2026-04-10 10:00:00

Data source: fgedudb (jdbc/fgedudb)
  Statistics: in use=15, free=35, wait time=0

=== Monitoring complete ===

4. Combined monitoring script
#!/bin/bash
# was_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

LOG_FILE=/WebSphere/logs/monitor_$(date +%Y%m%d).log
ALERT_THRESHOLD_HEAP=80
ALERT_THRESHOLD_THREAD=80

echo "========================================" | tee -a "$LOG_FILE"
echo "WebSphere combined monitoring report" | tee -a "$LOG_FILE"
echo "Time: $(date)" | tee -a "$LOG_FILE"
echo "========================================" | tee -a "$LOG_FILE"

# JVM monitoring
echo "" | tee -a "$LOG_FILE"
echo "[JVM monitoring]" | tee -a "$LOG_FILE"
# 100.0 forces floating-point division so the result keeps its decimals
heap_usage=$(/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
jvm = AdminControl.queryNames('type=JVM,process=server1,*')
heap = AdminControl.getAttribute(jvm, 'heapMemoryUsage')
print '%.2f' % (heap.get('used') * 100.0 / heap.get('max'))
" 2>/dev/null)

echo "Heap utilization: ${heap_usage}%" | tee -a "$LOG_FILE"

# Alert check (bc handles the decimal comparison)
if [ "$(echo "$heap_usage > $ALERT_THRESHOLD_HEAP" | bc)" -eq 1 ]; then
    echo "[ALERT] Heap utilization exceeds ${ALERT_THRESHOLD_HEAP}%!" | tee -a "$LOG_FILE"
    # Send the alert
    echo "WebSphere alert: heap utilization ${heap_usage}%" | mail -s "WebSphere alert" admin@fgedu.net.cn
fi

# Thread pool monitoring
echo "" | tee -a "$LOG_FILE"
echo "[Thread pool monitoring]" | tee -a "$LOG_FILE"
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
tpList = AdminControl.queryNames('type=ThreadPool,*').splitlines()
for tp in tpList:
    name = AdminControl.getAttribute(tp, 'name')
    active = int(AdminControl.getAttribute(tp, 'activeThreads'))
    maxThreads = int(AdminControl.getAttribute(tp, 'maxThreads'))
    usage = active * 100.0 / maxThreads
    print '%s: %d/%d (%.2f%%)' % (name, active, maxThreads, usage)
" 2>/dev/null | tee -a "$LOG_FILE"

echo "" | tee -a "$LOG_FILE"
echo "========================================" | tee -a "$LOG_FILE"

Output:
========================================
WebSphere combined monitoring report
Time: 2026-04-10 10:00:00
========================================

[JVM monitoring]
Heap utilization: 25.00%

[Thread pool monitoring]
WebContainer: 25/50 (50.00%)
ORB.thread.pool: 10/20 (50.00%)

========================================
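
The combined script defines ALERT_THRESHOLD_THREAD but only alerts on heap. One way to close that gap is to post-process the report lines of the form `WebContainer: 25/50 (50.00%)`. The sketch below assumes that line format and recomputes the utilization from the active/max pair; the function name and the 80% default are choices made for this example.

```python
import re

# Matches report lines such as "WebContainer: 25/50 (50.00%)".
POOL_LINE = re.compile(r"^(?P<name>[\w.]+):\s*(?P<active>\d+)/(?P<max>\d+)")


def busy_pools(report_text, threshold_pct=80.0):
    """Return (pool name, utilization %) for every pool above the threshold."""
    busy = []
    for line in report_text.splitlines():
        match = POOL_LINE.match(line.strip())
        if not match:
            continue  # headers, separators and blank lines
        max_threads = int(match.group("max"))
        if max_threads == 0:
            continue  # avoid dividing by zero for an unconfigured pool
        pct = int(match.group("active")) * 100.0 / max_threads
        if pct > threshold_pct:
            busy.append((match.group("name"), pct))
    return busy


# Hypothetical report excerpt in which WebContainer is nearly exhausted.
sample = """WebContainer: 45/50 (90.00%)
ORB.thread.pool: 10/20 (50.00%)
"""
print(busy_pools(sample))
```

Recomputing the percentage from the raw counts keeps the check independent of how the report rounds its own figure.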

Fengge's tip: custom monitoring scripts can be tailored freely to actual needs. Store the collected data in a database so it can be used for historical analysis and trend prediction.

Part04 - Production Cases

4.1 Case: Diagnosing a Performance Problem

4.1.1 Background

# Performance problem diagnosis

Symptom:
The application responds slowly and users complain about long page load times.

Diagnosis steps:
1. Check system resources
top - 10:00:00 up 30 days
%Cpu(s): 85.0 us, 10.0 sy, 5.0 id
KiB Mem : 16384000 total, 1000000 free, 14000000 used

2. Check the JVM state
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
jvm = AdminControl.queryNames('type=JVM,process=server1,*')
heap = AdminControl.getAttribute(jvm, 'heapMemoryUsage')
print 'Heap usage: %.2f%%' % (heap.get('used') * 100.0 / heap.get('max'))
" 2>/dev/null

Heap usage: 95.00%

3. Analyze the GC log
grep "Full GC" /WebSphere/app/profiles/AppSrv01/logs/server1/gc.log | tail -5

[Full GC (Allocation Failure) 1900M->1800M(2048M), 5.2345678 secs]
[Full GC (Allocation Failure) 1900M->1800M(2048M), 5.1234567 secs]
[Full GC (Allocation Failure) 1900M->1800M(2048M), 5.3456789 secs]

Findings:
- Heap utilization is as high as 95%
- Full GCs are frequent, each taking more than 5 seconds and reclaiming only about 100 MB
- CPU utilization is high, with most of the time spent in GC
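
The findings above can be quantified rather than read off by eye. The following sketch parses Full GC lines in exactly the format shown in the excerpt and reports the average pause, the average memory reclaimed, and the heap occupancy after the last collection. It assumes that log format only; the function name and the returned keys are invented for this example.

```python
import re

# Matches lines like "[Full GC (Allocation Failure) 1900M->1800M(2048M), 5.2345678 secs]".
FULL_GC = re.compile(
    r"\[Full GC \((?P<cause>[^)]*)\) "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<total>\d+)M\), "
    r"(?P<secs>[\d.]+) secs\]"
)


def summarize_full_gc(lines):
    """Summarize Full GC events; returns None when no line matches."""
    events = [m for m in (FULL_GC.search(line) for line in lines) if m]
    if not events:
        return None
    pauses = [float(m.group("secs")) for m in events]
    reclaimed = [int(m.group("before")) - int(m.group("after")) for m in events]
    last = events[-1]
    return {
        "count": len(events),
        "avg_pause_s": sum(pauses) / len(pauses),
        "avg_reclaimed_mb": sum(reclaimed) / len(reclaimed),
        # Occupancy right after the most recent Full GC, as a percentage of the heap
        "post_gc_occupancy_pct": 100.0 * int(last.group("after")) / int(last.group("total")),
    }


gc_log = [
    "[Full GC (Allocation Failure) 1900M->1800M(2048M), 5.2345678 secs]",
    "[Full GC (Allocation Failure) 1900M->1800M(2048M), 5.1234567 secs]",
    "[Full GC (Allocation Failure) 1900M->1800M(2048M), 5.3456789 secs]",
]
print(summarize_full_gc(gc_log))
```

A post-GC occupancy that stays close to the heap limit while each collection frees very little is the pattern that points toward an undersized heap or a leak rather than a transient load spike.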

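Solution:
1. Increase the heap
Administrative console > Servers > server1 > Java Virtual Machine
> Initial heap size: 4096
> Maximum heap size: 4096

2. Tune the GC options
> Generic JVM arguments:
-Xms4096m -Xmx4096m -Xmn1536m
-Xgcpolicy:gencon

WebSphere 9.0.5 runs on the IBM J9 JVM on most platforms, which does not support HotSpot-only options such as -XX:+UseG1GC and -XX:MaxGCPauseMillis; on J9 the collector is selected with -Xgcpolicy.

3. Restart the server
/WebSphere/app/profiles/AppSrv01/bin/stopServer.sh server1
/WebSphere/app/profiles/AppSrv01/bin/startServer.sh server1

Result:
- Heap utilization dropped to 60%
- Full GCs became less frequent
- Application response times returned to normal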

4.2 Case: Analyzing a Memory Leak

4.2.1 Background

# Memory leak analysis

Symptom:
Memory keeps growing while the application runs and the server eventually fails with an OutOfMemoryError.

Diagnosis steps:
1. Monitor the memory trend
# Record heap usage every hour
while true; do
    /WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
        -username fgeduadmin -password fgedu123 -c "
jvm = AdminControl.queryNames('type=JVM,process=server1,*')
heap = AdminControl.getAttribute(jvm, 'heapMemoryUsage')
print '%s %.2f MB' % ('$(date +%H:%M)', heap.get('used') / 1024.0 / 1024.0)
" 2>/dev/null
    sleep 3600
done

The shell expands $(date +%H:%M) before wsadmin runs, so it must not be escaped and has to be quoted as a Jython string.

Memory trend:
10:00 512.00 MB
11:00 612.00 MB
12:00 712.00 MB
13:00 812.00 MB
14:00 912.00 MB
15:00 1012.00 MB
16:00 1112.00 MB

2. Generate a heap dump
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
jvm = AdminControl.queryNames('type=JVM,process=server1,*')
AdminControl.invoke(jvm, 'generateHeapDump')
" 2>/dev/null

Heap dump file:
/WebSphere/app/profiles/AppSrv01/heapdump.20260410.100000.12345.phd

3. Analyze the heap dump
Analyze it with Memory Analyzer Tool (MAT).

Analysis result:
- Largest object: FGeduCache (about 500 MB)
- Number of objects: 100000
- Root cause: cached objects are never released
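
The hourly trend recorded in step 1 is enough to estimate how fast the heap is growing and when it would hit the limit. The sketch below fits a least-squares line through the seven samples and extrapolates to the 2048 MB maximum heap used elsewhere in this document. It is a rough estimate under the assumption that growth stays linear; the function names are illustrative.

```python
def linear_trend(points):
    """Least-squares fit y = slope * x + intercept over (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until(limit_mb, points):
    """Hours after the last sample until the fitted line reaches limit_mb.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_trend(points)
    if slope <= 0:
        return None
    return (limit_mb - intercept) / slope - points[-1][0]


# (hour of day, used heap in MB) taken from the memory trend above.
trend = [(10, 512), (11, 612), (12, 712), (13, 812), (14, 912), (15, 1012), (16, 1112)]
slope, _ = linear_trend(trend)
print("growth: %.1f MB/hour" % slope)
print("hours until 2048 MB: %.2f" % hours_until(2048, trend))
```

A steady positive slope that persists across GC cycles is what separates a leak from normal sawtooth heap behavior, which is why the trend is worth fitting before taking a heap dump.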

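Solution:
1. Change the cache configuration
@Stateless
public class FGeduCache {
    // WeakHashMap lets the GC reclaim entries whose keys are no longer strongly referenced
    private Map<String, Object> cache = new WeakHashMap<>();
}

2. Add a cache cleanup mechanism
// Runs every 4 hours and empties the cache
@Schedule(hour = "*/4")
public void cleanCache() {
    cache.clear();
}

3. Limit the cache size
private static final int MAX_SIZE = 10000;

public void put(String key, Object value) {
    // Crude bound: drop everything once the limit is reached
    if (cache.size() >= MAX_SIZE) {
        cache.clear();
    }
    cache.put(key, value);
}

Result:
- Memory usage is stable
- No memory leak
- The application runs normally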

4.3 Case: Analyzing Blocked Threads

4.3.1 Background

# Blocked thread analysis

Symptom:
The application occasionally hangs and requests time out.

Diagnosis steps:
1. Check the thread pool state
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
tp = AdminControl.queryNames('type=ThreadPool,name=WebContainer,*')
print 'Active threads: ' + AdminControl.getAttribute(tp, 'activeThreads')
print 'Max threads: ' + AdminControl.getAttribute(tp, 'maxThreads')
" 2>/dev/null

Active threads: 50
Max threads: 50

2. Generate a thread dump
kill -3 <PID of the server1 Java process>

Or through wsadmin:
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -username fgeduadmin -password fgedu123 -c "
jvm = AdminControl.queryNames('type=JVM,process=server1,*')
AdminControl.invoke(jvm, 'dumpThreads')
" 2>/dev/null

Thread dump file:
/WebSphere/app/profiles/AppSrv01/logs/server1/javacore.20260410.100000.12345.txt

3. Analyze the thread dump
grep -A 20 "BLOCKED" javacore.*.txt

Thread analysis:
"http-9080-Processor25" - Thread t00025
    java.lang.Thread.State: BLOCKED
    at com.fgedu.service.FGeduService.getData(FGeduService.java:100)
    - waiting to lock <0x0000000012345678> (a java.lang.Object)
    at com.fgedu.servlet.FGeduServlet.doGet(FGeduServlet.java:50)

"http-9080-Processor26" - Thread t00026
    java.lang.Thread.State: BLOCKED
    at com.fgedu.service.FGeduService.getData(FGeduService.java:100)
    - waiting to lock <0x0000000012345678> (a java.lang.Object)

Findings:
- Several threads are waiting for the same lock
- FGeduService.getData suffers from lock contention
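
With all 50 WebContainer threads busy, reading the dump by hand does not scale. The sketch below groups waiting threads by the lock address they are blocked on, assuming the dump uses the layout shown in the excerpt above (a quoted thread name on its own line, followed by a "waiting to lock <address>" line). The function name is hypothetical, and a real dump may need a different pattern.

```python
import re
from collections import defaultdict

# Matches "waiting to lock <0x0000000012345678>" as printed in the excerpt above.
WAITING = re.compile(r"waiting to lock <(0x[0-9a-fA-F]+)>")


def waiters_by_lock(dump_text):
    """Map each contended lock address to the names of the threads waiting on it."""
    waiters = defaultdict(list)
    current_thread = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith('"'):
            # Thread header line: the name is the text between the first pair of quotes
            current_thread = line.split('"')[1]
        match = WAITING.search(line)
        if match and current_thread:
            waiters[match.group(1)].append(current_thread)
    return dict(waiters)


sample_dump = '''
"http-9080-Processor25" - Thread t00025
    java.lang.Thread.State: BLOCKED
    at com.fgedu.service.FGeduService.getData(FGeduService.java:100)
    - waiting to lock <0x0000000012345678> (a java.lang.Object)

"http-9080-Processor26" - Thread t00026
    java.lang.Thread.State: BLOCKED
    at com.fgedu.service.FGeduService.getData(FGeduService.java:100)
    - waiting to lock <0x0000000012345678> (a java.lang.Object)
'''
for lock, threads in waiters_by_lock(sample_dump).items():
    print(lock, len(threads), threads)
```

The lock with the longest waiter list is usually the first place to look, since the thread holding it is what the rest of the pool is queued behind.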

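Solution:
1. Refine the locking strategy
@Stateless
public class FGeduService {
    // Before: method-level lock, every caller serializes here
    // public synchronized FGeduData getData(String key) { ... }

    // After: lock only on a cache miss
    private final Object lock = new Object();

    public FGeduData getData(String key) {
        // The unsynchronized read is only safe with a concurrent map (see item 2)
        FGeduData data = cache.get(key);
        if (data == null) {
            synchronized (lock) {
                // Re-check: another thread may have loaded the entry meanwhile
                data = cache.get(key);
                if (data == null) {
                    data = loadData(key);
                    cache.put(key, data);
                }
            }
        }
        return data;
    }
}

2. Use a concurrent collection
private ConcurrentHashMap<String, FGeduData> cache =
    new ConcurrentHashMap<>();

3. Increase the thread pool size
Administrative console > Servers > server1 > Thread pools
> WebContainer > Maximum size: 100

Result:
- Thread blocking eliminated
- Requests are processed normally
- Application performance improved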

Part05 - Fengge's Lessons Learned

5.1 Performance Monitoring Checklist

Use the following checklist to review a monitoring setup.

# Performance monitoring checklist

Configuration:
□ PMI service is enabled
□ Monitoring level is set correctly
□ Alert thresholds are set correctly
□ Monitoring data storage is configured

Monitoring:
□ JVM monitoring works
□ Thread pool monitoring works
□ Connection pool monitoring works
□ Application monitoring works

Alerting:
□ Alert notifications are configured correctly
□ Alert escalation policy is configured
□ Alert handling process is defined

# Performance monitoring check script
#!/bin/bash
# monitor_check.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn

echo "=== WebSphere performance monitoring check ==="

# Check PMI status
echo "1. PMI status:"
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -c "perf = AdminControl.queryNames('type=Perf,*')
print AdminControl.getAttribute(perf, 'statSet')" 2>/dev/null

# Check JVM status
echo ""
echo "2. JVM status:"
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -c "jvm = AdminControl.queryNames('type=JVM,*')
heap = AdminControl.getAttribute(jvm, 'heapMemoryUsage')
print 'Heap utilization: %.2f%%' % (heap.get('used') * 100.0 / heap.get('max'))" 2>/dev/null

# Check thread pool status
echo ""
echo "3. Thread pool status:"
/WebSphere/app/profiles/Dmgr01/bin/wsadmin.sh -lang jython \
    -c "tpList = AdminControl.queryNames('type=ThreadPool,*').splitlines()
for tp in tpList:
    name = AdminControl.getAttribute(tp, 'name')
    active = int(AdminControl.getAttribute(tp, 'activeThreads'))
    maxThreads = int(AdminControl.getAttribute(tp, 'maxThreads'))
    print '%s: %d/%d' % (name, active, maxThreads)" 2>/dev/null

echo ""
echo "=== Check complete ==="

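5.2 Common Performance Monitoring Problems

Common problems and their fixes:

5.2.1 Summary of Common Problems

# Common performance monitoring problems

Problem 1: PMI data is inaccurate or incomplete
Cause: the monitoring level is set too low
Fix: raise the monitoring level

Problem 2: Monitoring degrades performance
Cause: the monitoring level is set too high
Fix: lower the monitoring level and monitor only key metrics

Problem 3: Alert storms
Cause: alert thresholds are set unreasonably
Fix: adjust the thresholds and configure a quiet period between repeated alerts

Problem 4: Monitoring data is lost
Cause: insufficient storage space
Fix: add storage and configure data archiving

Problem 5: No monitoring data is available
Cause: the PMI service is not enabled
Fix: enable the PMI service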
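
The quiet period mentioned for problem 3 can be implemented with a small amount of state: remember when each alert was last sent and drop repeats inside the window. The sketch below is one possible shape for that logic; the class name, the alert keys, and the 600-second default are assumptions made for this example, and timestamps are passed in explicitly so the behavior is easy to test.

```python
class AlertSuppressor:
    """Suppress repeats of the same alert inside a quiet period."""

    def __init__(self, quiet_seconds=600):
        self.quiet_seconds = quiet_seconds
        self._last_sent = {}  # alert key -> timestamp of the last alert that was sent

    def should_send(self, key, now):
        """Return True and record the send time unless the key is still in its quiet period."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.quiet_seconds:
            return False
        self._last_sent[key] = now
        return True


suppressor = AlertSuppressor(quiet_seconds=600)
# The same heap alert fires every 5 minutes; only the first and third are delivered.
for timestamp in (0, 300, 600):
    print(timestamp, suppressor.should_send("server1/heap", timestamp))
```

Keying the state per alert (here server plus metric) keeps a noisy heap alert from hiding an unrelated connection pool alert that fires in the same window.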

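5.3 Performance Monitoring Best Practices

The following best practices are drawn from years of WebSphere operations experience.

5.3.1 Configuration Best Practices

Set a sensible monitoring level: use the Medium level in production
Focus on key metrics: heap, thread pools, connection pools
Establish a performance baseline: record the metrics observed under normal conditions
Review alerts regularly: adjust thresholds to match actual behavior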
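
A baseline is only useful if new readings are compared against it. The sketch below records the mean and standard deviation of a metric under normal load and flags readings that deviate by more than a chosen number of standard deviations. The three-sigma default, the function names, and the sample response times are assumptions made for this illustration, not recommendations from the WebSphere documentation.

```python
from statistics import mean, stdev


def build_baseline(values):
    """Return (mean, sample standard deviation) of readings taken under normal load."""
    return mean(values), stdev(values)


def is_anomalous(value, baseline, k=3.0):
    """True when value lies more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


# Hypothetical response times in milliseconds recorded during a normal period.
normal_response_ms = [200, 210, 190, 205, 195]
baseline = build_baseline(normal_response_ms)
print(baseline)
print(is_anomalous(260, baseline))
print(is_anomalous(215, baseline))
```

Thresholds derived from a baseline adapt to each server's normal behavior, which complements the fixed thresholds used for hard limits such as heap exhaustion.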

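5.3.2 Operations Best Practices

Automate monitoring: collect monitoring data with scripts
Analyze history: review historical data regularly to spot trends
Handle alerts promptly: establish an alert handling process
Optimize continuously: keep tuning the system based on monitoring data

Production recommendation: performance monitoring is essential to keeping WebSphere stable. Build a complete monitoring system so that performance problems are found and handled promptly, and store monitoring data in a database to support historical analysis and trend prediction.

This document described the performance monitoring tools of WebSphere 9.0.5: administrative console monitoring, PMI performance monitoring, Tivoli Performance Viewer, and custom monitoring scripts. After working through it, readers should be able to apply these monitoring methods and best practices.

Compiled and published by Fengge Tutorials for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html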
