GreenPlum教程FG012-GreenPlum执行计划调优实战
本文档风哥主要介绍GreenPlum执行计划调优,包括执行计划概念、执行计划操作符、执行计划分析方法、执行计划实战、执行计划优化等内容,风哥教程参考GreenPlum官方文档Query Guide、Performance Tuning等内容编写,适合DBA人员在学习和测试中使用。
Part01-基础概念与理论知识
1.1 GreenPlum执行计划概念
执行计划是数据库优化器为SQL语句生成的执行步骤和策略。通过分析执行计划,可以了解查询的执行过程,找出性能瓶颈。更多视频教程www.fgedu.net.cn
1.1.1 执行计划生成
1. 查询解析
– 解析SQL语法
– 检查语义正确性
– 生成解析树
2. 查询重写
– 规则优化
– 视图展开
– 子查询优化
3. 查询优化
– 生成多个候选计划
– 估算各计划成本
– 选择最优执行计划
4. GreenPlum优化器
– ORCA优化器(默认)
– Legacy优化器
– 可通过参数切换
5. 查看执行计划
EXPLAIN [ANALYZE] query;
1.2 GreenPlum执行计划操作符
执行计划由多个操作符组成,每个操作符代表一个执行步骤。学习交流加群风哥微信: itpux-com
1.2.1 常见操作符
1. 扫描操作符
– Seq Scan:顺序扫描
– Index Scan:索引扫描
– Bitmap Index Scan:位图索引扫描
– Function Scan:函数扫描
2. Join操作符
– Hash Join:哈希Join
– Nested Loop:嵌套循环
– Merge Join:合并Join
3. 聚合操作符
– HashAggregate:哈希聚合
– GroupAggregate:分组聚合
– Sort:排序
4. Motion操作符(GreenPlum特有)
– Gather Motion:收集到Master
– Redistribute Motion:重新分布
– Broadcast Motion:广播到所有节点
5. 其他操作符
– Limit:限制行数
– Sort:排序
– Unique:去重
– Append:追加
Part02-生产环境规划与建议
2.1 GreenPlum执行计划分析方法
- 关注cost(成本)和rows(行数)
- 注意Motion操作(数据重分布)
- 检查扫描方式(Seq Scan vs Index Scan)
- 分析Join类型和顺序
- 查看实际执行时间
Part03-生产环境项目实施方案
3.1 GreenPlum执行计划实战
3.1.1 查看执行计划
$ psql -d fgedudb -U fgedu
psql (9.4.26)
Type “help” for help.
fgedudb=>
# 创建测试表
fgedudb=> CREATE TABLE fgedu.fgedu_order_explain (
fgedudb(> order_id BIGSERIAL,
fgedudb(> customer_id INT,
fgedudb(> order_date DATE,
fgedudb(> amount NUMERIC(18,2)
fgedudb(> ) DISTRIBUTED BY (order_id);
CREATE TABLE
fgedudb=> INSERT INTO fgedu.fgedu_order_explain
fgedudb-> SELECT i, (random()*1000)::INT, ‘2024-01-01’::DATE + (random()*180)::INT, (random()*10000)::NUMERIC(18,2)
fgedudb-> FROM generate_series(1, 100000) i;
INSERT 0 100000
# 查看基本执行计划
fgedudb=> EXPLAIN SELECT * FROM fgedu.fgedu_order_explain WHERE customer_id = 100;
QUERY PLAN
———————————————————————-
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=20)
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=1 width=20)
Filter: (customer_id = 100)
(3 rows)
学习交流加群风哥QQ113257174
3.1.2 执行计划详细分析
fgedudb=> EXPLAIN ANALYZE SELECT * FROM fgedu.fgedu_order_explain WHERE customer_id = 100;
QUERY PLAN
————————————————————————————————-
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=20)
Rows out: 98 rows at destination with 3.567 ms to first row, 4.123 ms to end, start offset by 0.234 ms.
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=1 width=20)
Filter: (customer_id = 100)
Rows out: Avg 32.7 rows x 3 workers. Max 35 rows (seg0) with 0.987 ms to first row, 1.234 ms to end, start offset by 0.345 ms.
Slice statistics:
(slice0) Executor memory: 123K bytes.
(slice1) Executor memory: 456K bytes avg x 3 workers, 567K bytes max (seg0).
Total runtime: 5.678 ms
(8 rows)
# 查看更详细的执行计划
fgedudb=> EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS)
fgedudb-> SELECT customer_id, COUNT(*), SUM(amount)
fgedudb-> FROM fgedu.fgedu_order_explain
fgedudb-> GROUP BY customer_id
fgedudb-> ORDER BY COUNT(*) DESC
fgedudb-> LIMIT 10;
QUERY PLAN
——————————————————————————————————
Limit (cost=523.45..523.48 rows=10 width=20)
Rows out: 10 rows with 0.012 ms to first row, 0.045 ms to end, start offset by 1.234 ms.
-> Sort (cost=523.45..525.95 rows=1001 width=20)
Sort Key: (count(*))
Sort Method: top-N heapsort Memory: 25kB
Rows out: 10 rows with 0.023 ms to first row, 0.034 ms to end.
-> Gather Motion 3:1 (slice1; segments: 3) (cost=431.00..513.45 rows=1001 width=20)
Rows out: 1001 rows at destination with 2.345 ms to first row, 2.567 ms to end.
-> HashAggregate (cost=431.00..431.00 rows=334 width=20)
Group By: customer_id
Rows out: Avg 334.0 rows x 3 workers. Max 335 rows (seg0).
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=33334 width=12)
Rows out: Avg 33333.3 rows x 3 workers. Max 33334 rows (seg0).
Slice statistics:
(slice0) Executor memory: 234K bytes.
(slice1) Executor memory: 567K bytes avg x 3 workers, 678K bytes max (seg0).
Total runtime: 3.456 ms
(14 rows)
更多学习教程公众号风哥教程itpux_com
3.2 GreenPlum执行计划优化
3.2.1 优化数据重分布
# 查看Join执行计划
fgedudb=> CREATE TABLE fgedu.fgedu_customer_explain (
fgedudb(> customer_id INT PRIMARY KEY,
fgedudb(> customer_name VARCHAR(100)
fgedudb(> ) DISTRIBUTED BY (customer_id);
CREATE TABLE
fgedudb=> INSERT INTO fgedu.fgedu_customer_explain
fgedudb-> SELECT i, ‘客户’ || i FROM generate_series(1, 1000) i;
INSERT 0 1000
# 查看Join执行计划(有Redistribute Motion)
fgedudb=> EXPLAIN
fgedudb-> SELECT c.customer_name, COUNT(o.order_id)
fgedudb-> FROM fgedu.fgedu_customer_explain c
fgedudb-> JOIN fgedu.fgedu_order_explain o ON c.customer_id = o.customer_id
fgedudb-> GROUP BY c.customer_name;
QUERY PLAN
———————————————————————————————-
Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..862.00 rows=1001 width=20)
-> HashAggregate (cost=0.00..862.00 rows=334 width=20)
Group By: c.customer_name
-> Hash Join (cost=0.00..862.00 rows=100000 width=20)
Hash Cond: o.customer_id = c.customer_id
-> Redistribute Motion 3:3 (slice1; segments: 3)
Hash Key: o.customer_id
-> Seq Scan on fgedu_order_explain o (cost=0.00..431.00 rows=33334 width=8)
-> Hash (cost=431.00..431.00 rows=334 width=16)
-> Broadcast Motion 3:3 (slice1)
-> Seq Scan on fgedu_customer_explain c (cost=0.00..431.00 rows=334 width=16)
Optimizer: Pivotal Optimizer (GPORCA)
(12 rows)
# 优化:使用相同分布键,避免数据重分布
fgedudb=> CREATE TABLE fgedu.fgedu_order_optimized (
fgedudb(> order_id BIGSERIAL,
fgedudb(> customer_id INT,
fgedudb(> order_date DATE,
fgedudb(> amount NUMERIC(18,2)
fgedudb(> ) DISTRIBUTED BY (customer_id);
CREATE TABLE
from GreenPlum视频:www.itpux.com
Part04-生产案例与实战讲解
4.1 GreenPlum执行计划优化案例
4.1.1 慢查询优化案例
# 原始查询
fgedudb=> EXPLAIN ANALYZE
fgedudb-> SELECT * FROM fgedu.fgedu_order_explain
fgedudb-> WHERE DATE_TRUNC(‘month’, order_date) = ‘2024-06-01’::DATE;
QUERY PLAN
————————————————————————————————-
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=20)
Rows out: 16667 rows at destination with 5.678 ms to first row, 8.901 ms to end.
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=1 width=20)
Filter: (date_trunc(‘month’::text, order_date) = ‘2024-06-01’::date)
Rows out: Avg 5555.7 rows x 3 workers. Max 5556 rows (seg0).
Slice statistics:
(slice0) Executor memory: 123K bytes.
(slice1) Executor memory: 456K bytes avg x 3 workers, 567K bytes max (seg0).
Total runtime: 9.123 ms
(8 rows)
# 优化:避免在字段上使用函数
fgedudb=> EXPLAIN ANALYZE
fgedudb-> SELECT * FROM fgedu.fgedu_order_explain
fgedudb-> WHERE order_date >= ‘2024-06-01’::DATE
fgedudb-> AND order_date < '2024-07-01'::DATE;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=5556 width=20)
Rows out: 16667 rows at destination with 2.345 ms to first row, 3.456 ms to end.
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=5556 width=20)
Filter: ((order_date >= ‘2024-06-01’::date) AND (order_date < '2024-07-01'::date))
Rows out: Avg 5555.7 rows x 3 workers. Max 5556 rows (seg0).
Slice statistics:
(slice0) Executor memory: 123K bytes.
(slice1) Executor memory: 456K bytes avg x 3 workers, 567K bytes max (seg0).
Total runtime: 4.567 ms
(8 rows)
# 性能提升:9.123ms -> 4.567ms
Part05-风哥经验总结与分享
5.1 GreenPlum执行计划解读技巧
1. 从内向外阅读
– 最内层是扫描操作
– 向外是Join、聚合等
– 最外层是结果返回
2. 关注关键指标
– cost:预估成本
– rows:预估行数
– actual time:实际执行时间
– actual rows:实际行数
3. 识别性能瓶颈
– Seq Scan:考虑添加索引
– Motion:考虑优化分布键
– Sort:考虑添加索引或优化排序
– HashAggregate:考虑优化分组
4. Motion类型分析
– Gather Motion:收集结果,正常
– Redistribute Motion:数据重分布,需关注
– Broadcast Motion:广播数据,需关注
5. 优化建议
– 减少数据扫描量
– 避免不必要的数据重分布
– 使用合适的Join类型
– 优化统计信息
本文档介绍了GreenPlum执行计划调优的核心内容,包括执行计划操作符、执行计划分析、执行计划优化等,希望对大家有所帮助。
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
