opengauss教程FG072-openGauss CPU高负载优化处理生产实战解析
目录大纲
- Part01-基础概念与理论知识
- 1.1 CPU高负载的定义与影响
- 1.2 CPU高负载的常见原因
- 1.3 openGauss的CPU使用特点
- Part02-生产环境规划与建议
- 2.1 CPU资源规划
- 2.2 CPU监控体系设计
- Part03-生产环境项目实施方案
- 3.1 系统级CPU优化
- 3.2 数据库级CPU优化
- Part04-生产案例与实战讲解
- 4.1 CPU高负载诊断实战
- 4.2 CPU优化实战
- 4.3 CPU监控脚本
- 4.4 性能优化案例
- Part05-风哥经验总结与分享
- 5.1 CPU高负载处理经验
- 5.2 性能优化建议
Part01-基础概念与理论知识
1.1 CPU高负载的定义与影响
CPU高负载是指系统CPU使用率持续处于较高水平,影响系统性能和响应时间。
– 单核系统:CPU使用率超过80%
– 多核系统:load average超过CPU核心数的70%
– 持续时间:超过5分钟的高负载
1.2 CPU高负载的常见原因
openGauss数据库CPU高负载的常见原因:
1. SQL语句问题:复杂查询、全表扫描、缺少索引
2. 并发过高:大量并发连接和查询
3. 数据库参数不合理:work_mem、shared_buffers等参数设置不当
4. 系统资源竞争:其他进程占用CPU资源
风哥提示:
5. 硬件瓶颈:CPU核心数不足或性能不足
6. 数据库版本问题:存在性能bug
1.3 openGauss的CPU使用特点
openGauss数据库的CPU使用特点:
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 75.0 us, 10.0 sy, 0.0 ni, 10.0 id, 5.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32768000 total, 5120000 free, 22528000 used, 5120000 buff/cache
KiB Swap: 8388608 total, 8388608 free, 0 used. 8192000 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12345 omm 20 0 20.5g 5.2g 4.8g S 350.0 16.5 12:30.45 gaussdb
Part02-生产环境规划与建议
2.1 CPU资源规划
CPU资源规划是避免高负载的基础:
学习交流加群风哥微信: itpux-com
1. 根据业务量估算:根据并发用户数和查询复杂度估算CPU需求
2. 预留冗余:预留30%的CPU资源作为缓冲
3. 考虑扩展性:选择支持横向扩展的架构
4. NUMA优化:合理配置NUMA,避免跨NUMA访问
2.2 CPU监控体系设计
建立完善的CPU监控体系:
1. 实时监控:实时监控CPU使用率和负载
2. 历史分析:记录CPU使用历史,用于趋势分析
3. 告警机制:设置合理的告警阈值
4. 关联分析:关联CPU使用与数据库活动
Part03-生产环境项目实施方案
3.1 系统级CPU优化
系统级CPU优化配置:
kernel.sched_autogroup_enabled=0 # 禁用自动分组调度
kernel.sched_migration_cost_ns=5000000 # 增加迁移成本
kernel.sched_min_granularity_ns=10000000 # 增加调度粒度
kernel.sched_wakeup_granularity_ns=15000000 # 增加唤醒粒度
kernel.sched_migration_cost_ns = 5000000
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
3.2 数据库级CPU优化
openGauss数据库CPU优化参数配置:
— 查看当前CPU相关参数
SHOW max_worker_processes;
SHOW max_parallel_workers_per_gather;
SHOW parallel_leader_participation;
SHOW work_mem;
———————-
8
(1 row)
max_parallel_workers_per_gather
———————————
2
(1 row)
parallel_leader_participation
——————————
on
(1 row)
work_mem
———-
16MB
(1 row)
— 优化CPU相关参数
ALTER SYSTEM SET max_worker_processes = 16;
# 根据CPU核心数调整
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
# 并行度
ALTER SYSTEM SET work_mem = ’64MB’;
# 增加工作内存
ALTER SYSTEM SET maintenance_work_mem = ‘512MB’;
# 维护工作内存
ALTER SYSTEM SET random_page_cost = 1.1;
# 调整成本估算
— 重新加载配置
SELECT pg_reload_conf();
—————-
t
(1 row)
Part04-生产案例与实战讲解
4.1 CPU高负载诊断实战
实战演示CPU高负载诊断:
Tasks: 285 total, 5 running, 280 sleeping, 0 stopped, 0 zombie
%Cpu(s): 75.0 us, 10.0 sy, 0.0 ni, 10.0 id, 5.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32768000 total, 5120000 free, 22528000 used, 5120000 buff/cache
KiB Swap: 8388608 total, 8388608 free, 0 used. 8192000 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12345 omm 20 0 20.5g 5.2g 4.8g S 350.0 16.5 12:30.45 gaussdb
6789 root 20 0 120m 20m 15m S 5.0 0.1 0:30.12 sshd
1234 root 20 0 500m 50m 40m S 2.5 0.2 1:15.34 httpd
# pidstat -p 12345 1 5
14:30:00 UID PID %usr %system %guest %CPU CPU Command
14:30:01 1000 12345 70.00 10.00 0.00 80.00 0 gaussdb
14:30:02 1000 12345 72.00 12.00 0.00 84.00 0 gaussdb
14:30:03 1000 12345 68.00 8.00 0.00 76.00 0 gaussdb
14:30:04 1000 12345 75.00 10.00 0.00 85.00 0 gaussdb
14:30:05 1000 12345 71.00 11.00 0.00 82.00 0 gaussdb
Average: 1000 12345 71.20 10.20 0.00 81.40 – gaussdb
— 查看当前活跃SQL
SELECT
pid,
usename,
application_name,
now() – query_start AS duration,
query
FROM pg_stat_activity
WHERE state = ‘active’
ORDER BY duration DESC
LIMIT 10;from DB视频:www.itpux.com
——-+———+——————+———-+——-
15234 | fgedu01 | app_server | 00:05:30 | SELECT * FROM fgedu_orders WHERE customer_id = 10001;
15289 | fgedu02 | batch_job | 00:03:15 | INSERT INTO fgedu_sales (order_id, amount) SELECT order_id, total_amount FROM fgedu_order_details WHERE create_date = current_date;
15345 | fgedu03 | report_service | 00:02:45 | SELECT o.order_id, o.customer_id, o.total_amount, c.customer_name FROM fgedu_orders o JOIN fgedu_customers c ON o.customer_id = c.customer_id WHERE o.create_date BETWEEN ‘2024-01-01’ AND ‘2024-01-15’;
(3 rows)
4.2 CPU优化实战
实战演示CPU优化:
EXPLAIN ANALYZE SELECT * FROM fgedu_orders WHERE customer_id = 10001;
————————————————————————————————————————-
Seq Scan on fgedu_orders (cost=0.00..10000.00 rows=1000 width=100) (actual time=0.010..500.000 rows=1000 loops=1)
Filter: (customer_id = 10001)
Rows Removed by Filter: 99000
Planning Time: 0.100 ms
Execution Time: 500.050 ms
(5 rows)
CREATE INDEX idx_fgedu_orders_customer_id ON fgedu_orders(customer_id);
EXPLAIN ANALYZE SELECT * FROM fgedu_orders WHERE customer_id = 10001;
—————————————————————————————————————————
Index Scan using idx_fgedu_orders_customer_id on fgedu_orders (cost=0.25..100.00 rows=1000 width=100) (actual time=0.010..5.000 rows=1000 loops=1)
Index Cond: (customer_id = 10001)
Planning Time: 0.100 ms
Execution Time: 5.050 ms
(4 rows)
— 原SQL
INSERT INTO fgedu_sales (order_id, amount)
SELECT order_id, total_amount
FROM fgedu_order_details
WHERE create_date = current_date;
— 优化后
ALTER TABLE fgedu_sales SET (parallel_workers = 4);
INSERT INTO fgedu_sales (order_id, amount)
SELECT order_id, total_amount
FROM fgedu_order_details
WHERE create_date = current_date;
INSERT 0 50000
4.3 CPU监控脚本
编写CPU监控脚本:
# cpu_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
# openGauss CPU监控脚本
LOG_DIR=”/opengauss/log/cpu”
mkdir -p $LOG_DIR
LOG_FILE=”$LOG_DIR/cpu_monitor_$(date ‘+%Y%m%d’).log”
log() {
echo “[$(date ‘+%Y-%m-%d %H:%M:%S’)] $1” | tee -a $LOG_FILE
}
check_cpu_usage() {
log “=== CPU使用情况检查 ===”
# 系统负载
loadavg=$(uptime | awk ‘{print $10 $11 $12}’)
log “系统负载: $loadavg”
# CPU使用率
cpu_usage=$(top -bn1 | grep “Cpu(s)” | sed “s/.*, *\([0-9.]*\)%* id.*/\1/” | awk ‘{print 100 – $1}’)
log “CPU使用率: ${cpu_usage}%”
# 进程CPU使用
log “进程CPU使用排行:”
ps aux –sort=-%cpu | head -10 | tee -a $LOG_FILE
# gaussdb进程CPU使用
log “gaussdb进程CPU使用:”
pid=$(pgrep gaussdb)
if [ -n “$pid” ]; then
pidstat -p $pid 1 3 | tee -a $LOG_FILE
else
log “gaussdb进程未找到”
fi
}
check_database_activity() {
log “=== 数据库活动检查 ===”
# 活跃连接数
active_connections=$(gsql -h 192.168.1.10 -d fgedudb -U fgedu -t -c ”
SELECT count(*) FROM pg_stat_activity WHERE state = ‘active’;
” | tr -d ‘ ‘)
log “活跃连接数: $active_connections”
# 慢SQL
log “慢SQL:
$(gsql -h 192.168.1.10 -d fgedudb -U fgedu -t -c ”
SELECT
pid,
usename,
now() – query_start AS duration,
query
FROM pg_stat_activity
WHERE state = ‘active’
AND now() – query_start > interval ‘1 second’
ORDER BY duration DESC
LIMIT 5;
“)”
}
main() {
log “开始CPU监控检查…”
check_cpu_usage
check_database_activity
log “CPU监控检查完成”
}
main
# /opengauss/scripts/cpu_monitor.sh
[2024-01-15 14:30:00] === CPU使用情况检查 ===
[2024-01-15 14:30:00] 系统负载: 4.5,4.2,3.8
[2024-01-15 14:30:00] CPU使用率: 90.0%
[2024-01-15 14:30:00] 进程CPU使用排行:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
omm 12345 85.0 16.5 21474840 5452800 ? Ssl Jan14 12:30 gaussdb
root 6789 5.0 0.1 122880 20480 ? Ss Jan14 0:30 sshd
root 1234 2.5 0.2 512000 51200 ? Ss Jan14 1:15 httpd
[2024-01-15 14:30:01] gaussdb进程CPU使用:
Linux 5.4.0-100-generic (fgedu.net.cn) 01/15/2024 _x86_64_ (8 CPU)
14:30:01 UID PID %usr %system %guest %CPU CPU Command
14:30:02 1000 12345 70.00 10.00 0.00 80.00 0 gaussdb
14:30:03 1000 12345 72.00 12.00 0.00 84.00 0 gaussdb
14:30:04 1000 12345 68.00 8.00 0.00 76.00 0 gaussdb
Average: 1000 12345 70.00 10.00 0.00 80.00 – gaussdb
[2024-01-15 14:30:04] === 数据库活动检查 ===
[2024-01-15 14:30:04] 活跃连接数: 15
[2024-01-15 14:30:04] 慢SQL:
15234 | fgedu01 | 00:05:30 | SELECT * FROM fgedu_orders WHERE customer_id = 10001;
15289 | fgedu02 | 00:03:15 | INSERT INTO fgedu_sales (order_id, amount) SELECT order_id, total_amount FROM fgedu_order_details WHERE create_date = current_date;
15345 | fgedu03 | 00:02:45 | SELECT o.order_id, o.customer_id, o.total_amount, c.customer_name FROM fgedu_orders o JOIN fgedu_customers c ON o.customer_id = c.customer_id WHERE o.create_date BETWEEN ‘2024-01-01’ AND ‘2024-01-15’;
[2024-01-15 14:30:04] CPU监控检查完成
*/5 * * * * /opengauss/scripts/cpu_monitor.sh > /dev/null 2>&1
4.4 性能优化案例
案例:复杂查询优化
SELECT
o.order_id,
o.customer_id,
o.total_amount,
c.customer_name,
SUM(od.quantity * od.unit_price) AS calculated_amount
FROM fgedu_orders o
JOIN fgedu_customers c ON o.customer_id = c.customer_id
JOIN fgedu_order_details od ON o.order_id = od.order_id
WHERE o.create_date BETWEEN ‘2024-01-01’ AND ‘2024-01-15’
GROUP BY o.order_id, o.customer_id, o.total_amount, c.customer_name
ORDER BY o.total_amount DESC;
CREATE INDEX idx_fgedu_orders_create_date ON fgedu_orders(create_date);
CREATE INDEX idx_fgedu_order_details_order_id ON fgedu_order_details(order_id);
— 优化步骤2:分析表统计信息
ANALYZE fgedu_orders;
ANALYZE fgedu_customers;
ANALYZE fgedu_order_details;
— 优化步骤3:调整查询
SELECT
o.order_id,
o.customer_id,
o.total_amount,
c.customer_name,
SUM(od.quantity * od.unit_price) AS calculated_amount
FROM fgedu_orders o
JOIN fgedu_customers c ON o.customer_id = c.customer_id
JOIN fgedu_order_details od ON o.order_id = od.order_id
WHERE o.create_date BETWEEN ‘2024-01-01’ AND ‘2024-01-15’
GROUP BY o.order_id, o.customer_id, o.total_amount, c.customer_name
ORDER BY o.total_amount DESC;
CREATE INDEX
ANALYZE
ANALYZE
ANALYZE
— 执行时间:3秒
Part05-风哥经验总结与分享
5.1 CPU高负载处理经验
风哥提示:CPU高负载是数据库性能问题的常见表现,需要系统分析和综合优化。
1. 快速定位:使用top、pidstat等工具快速定位CPU使用高的进程
2. SQL分析:识别消耗CPU的SQL语句,进行优化
3. 索引优化:为频繁查询的字段创建合适的索引
4. 参数调整:根据系统情况调整数据库参数
5. 资源隔离:将数据库与其他高CPU进程隔离
1. 不要盲目增加并行度,可能会导致CPU竞争加剧
2. 索引不是越多越好,过多的索引会影响写入性能
3. 定期收集统计信息,确保执行计划准确
4. 监控CPU使用趋势,及时发现潜在问题
5. 结合业务场景进行优化,不同业务的优化策略不同
5.2 性能优化建议
1. 硬件层面:
– 选择高性能CPU,核心数适中
– 启用超线程技术(如果适用)
– 合理配置NUMA
2. 系统层面:
– 优化操作系统调度参数
– 关闭不必要的服务和进程
– 使用实时内核(对延迟敏感的场景)
3. 数据库层面:
– 优化查询计划
– 合理设置并行度
– 调整内存参数,减少CPU计算开销
4. 应用层面:
– 优化SQL语句
– 实现查询缓存
– 减少数据库访问次数
1. 建立CPU使用基线,了解正常运行时的CPU使用情况
2. 定期监控CPU使用趋势,及时发现异常
3. 对消耗CPU的SQL进行重点优化
4. 合理配置数据库参数,充分利用CPU资源
5. 结合业务特点,制定个性化的优化策略
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
