GreenPlum教程FG012-GreenPlum执行计划调优实战

本文档风哥主要介绍GreenPlum执行计划调优，包括执行计划概念、执行计划操作符、执行计划分析方法、执行计划实战、执行计划优化等内容，风哥教程参考GreenPlum官方文档Query Guide、Performance Tuning等内容编写，适合DBA人员在学习和测试中使用。

Part01-基础概念与理论知识

1.1 GreenPlum执行计划概念

执行计划是数据库优化器为SQL语句生成的执行步骤和策略。通过分析执行计划，可以了解查询的执行过程，找出性能瓶颈。更多视频教程www.fgedu.net.cn

1.1.1 执行计划生成

GreenPlum执行计划生成：

1. 查询解析
– 解析SQL语法
– 检查语义正确性
– 生成解析树

2. 查询重写
– 规则优化
– 视图展开
– 子查询优化

3. 查询优化
– 生成多个候选计划
– 估算各计划成本
– 选择最优执行计划

4. GreenPlum优化器
– ORCA优化器（默认）
– Legacy优化器
– 可通过参数切换

5. 查看执行计划
EXPLAIN [ANALYZE] query;

1.2 GreenPlum执行计划操作符

执行计划由多个操作符组成，每个操作符代表一个执行步骤。学习交流加群风哥微信: itpux-com

1.2.1 常见操作符

常见执行计划操作符：

1. 扫描操作符
– Seq Scan：顺序扫描
– Index Scan：索引扫描
– Bitmap Index Scan：位图索引扫描
– Function Scan：函数扫描

2. Join操作符
– Hash Join：哈希Join
– Nested Loop：嵌套循环
– Merge Join：合并Join

3. 聚合操作符
– HashAggregate：哈希聚合
– GroupAggregate：分组聚合
– Sort：排序

4. Motion操作符（GreenPlum特有）
– Gather Motion：收集到Master
– Redistribute Motion：重新分布
– Broadcast Motion：广播到所有节点

5. 其他操作符
– Limit：限制行数
– Sort：排序
– Unique：去重
– Append：追加

Part02-生产环境规划与建议

2.1 GreenPlum执行计划分析方法

风哥提示：执行计划分析要点：

关注cost（成本）和rows（行数）
注意Motion操作（数据重分布）
检查扫描方式（Seq Scan vs Index Scan）
分析Join类型和顺序
查看实际执行时间

Part03-生产环境项目实施方案

3.1 GreenPlum执行计划实战

3.1.1 查看执行计划

# 连接数据库
$ psql -d fgedudb -U fgedu
psql (9.4.26)
Type “help” for help.

fgedudb=>

# 创建测试表
fgedudb=> CREATE TABLE fgedu.fgedu_order_explain (
fgedudb(> order_id BIGSERIAL,
fgedudb(> customer_id INT,
fgedudb(> order_date DATE,
fgedudb(> amount NUMERIC(18,2)
fgedudb(> ) DISTRIBUTED BY (order_id);
CREATE TABLE

fgedudb=> INSERT INTO fgedu.fgedu_order_explain
fgedudb-> SELECT i, (random()*1000)::INT, ‘2024-01-01’::DATE + (random()*180)::INT, (random()*10000)::NUMERIC(18,2)
fgedudb-> FROM generate_series(1, 100000) i;
INSERT 0 100000

# 查看基本执行计划
fgedudb=> EXPLAIN SELECT * FROM fgedu.fgedu_order_explain WHERE customer_id = 100;
QUERY PLAN
———————————————————————-
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=20)
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=1 width=20)
Filter: (customer_id = 100)
(3 rows)

学习交流加群风哥QQ113257174

3.1.2 执行计划详细分析

# 查看详细执行计划（包含实际执行信息）
fgedudb=> EXPLAIN ANALYZE SELECT * FROM fgedu.fgedu_order_explain WHERE customer_id = 100;
QUERY PLAN
————————————————————————————————-
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=20)
Rows out: 98 rows at destination with 3.567 ms to first row, 4.123 ms to end, start offset by 0.234 ms.
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=1 width=20)
Filter: (customer_id = 100)
Rows out: Avg 32.7 rows x 3 workers. Max 35 rows (seg0) with 0.987 ms to first row, 1.234 ms to end, start offset by 0.345 ms.
Slice statistics:
(slice0) Executor memory: 123K bytes.
(slice1) Executor memory: 456K bytes avg x 3 workers, 567K bytes max (seg0).
Total runtime: 5.678 ms
(8 rows)

# 查看更详细的执行计划
fgedudb=> EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS)
fgedudb-> SELECT customer_id, COUNT(*), SUM(amount)
fgedudb-> FROM fgedu.fgedu_order_explain
fgedudb-> GROUP BY customer_id
fgedudb-> ORDER BY COUNT(*) DESC
fgedudb-> LIMIT 10;
QUERY PLAN
——————————————————————————————————
Limit (cost=523.45..523.48 rows=10 width=20)
Rows out: 10 rows with 0.012 ms to first row, 0.045 ms to end, start offset by 1.234 ms.
-> Sort (cost=523.45..525.95 rows=1001 width=20)
Sort Key: (count(*))
Sort Method: top-N heapsort Memory: 25kB
Rows out: 10 rows with 0.023 ms to first row, 0.034 ms to end.
-> Gather Motion 3:1 (slice1; segments: 3) (cost=431.00..513.45 rows=1001 width=20)
Rows out: 1001 rows at destination with 2.345 ms to first row, 2.567 ms to end.
-> HashAggregate (cost=431.00..431.00 rows=334 width=20)
Group By: customer_id
Rows out: Avg 334.0 rows x 3 workers. Max 335 rows (seg0).
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=33334 width=12)
Rows out: Avg 33333.3 rows x 3 workers. Max 33334 rows (seg0).
Slice statistics:
(slice0) Executor memory: 234K bytes.
(slice1) Executor memory: 567K bytes avg x 3 workers, 678K bytes max (seg0).
Total runtime: 3.456 ms
(14 rows)

更多学习教程公众号风哥教程itpux_com

3.2 GreenPlum执行计划优化

3.2.1 优化数据重分布

# 场景：优化数据重分布

# 查看Join执行计划
fgedudb=> CREATE TABLE fgedu.fgedu_customer_explain (
fgedudb(> customer_id INT PRIMARY KEY,
fgedudb(> customer_name VARCHAR(100)
fgedudb(> ) DISTRIBUTED BY (customer_id);
CREATE TABLE

fgedudb=> INSERT INTO fgedu.fgedu_customer_explain
fgedudb-> SELECT i, ‘客户’ || i FROM generate_series(1, 1000) i;
INSERT 0 1000

# 查看Join执行计划（有Redistribute Motion）
fgedudb=> EXPLAIN
fgedudb-> SELECT c.customer_name, COUNT(o.order_id)
fgedudb-> FROM fgedu.fgedu_customer_explain c
fgedudb-> JOIN fgedu.fgedu_order_explain o ON c.customer_id = o.customer_id
fgedudb-> GROUP BY c.customer_name;
QUERY PLAN
———————————————————————————————-
Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..862.00 rows=1001 width=20)
-> HashAggregate (cost=0.00..862.00 rows=334 width=20)
Group By: c.customer_name
-> Hash Join (cost=0.00..862.00 rows=100000 width=20)
Hash Cond: o.customer_id = c.customer_id
-> Redistribute Motion 3:3 (slice1; segments: 3)
Hash Key: o.customer_id
-> Seq Scan on fgedu_order_explain o (cost=0.00..431.00 rows=33334 width=8)
-> Hash (cost=431.00..431.00 rows=334 width=16)
-> Broadcast Motion 3:3 (slice1)
-> Seq Scan on fgedu_customer_explain c (cost=0.00..431.00 rows=334 width=16)
Optimizer: Pivotal Optimizer (GPORCA)
(12 rows)

# 优化：使用相同分布键，避免数据重分布
fgedudb=> CREATE TABLE fgedu.fgedu_order_optimized (
fgedudb(> order_id BIGSERIAL,
fgedudb(> customer_id INT,
fgedudb(> order_date DATE,
fgedudb(> amount NUMERIC(18,2)
fgedudb(> ) DISTRIBUTED BY (customer_id);
CREATE TABLE

from GreenPlum视频:www.itpux.com

Part04-生产案例与实战讲解

4.1 GreenPlum执行计划优化案例

4.1.1 慢查询优化案例

# 场景：优化慢查询

# 原始查询
fgedudb=> EXPLAIN ANALYZE
fgedudb-> SELECT * FROM fgedu.fgedu_order_explain
fgedudb-> WHERE DATE_TRUNC(‘month’, order_date) = ‘2024-06-01’::DATE;
QUERY PLAN
————————————————————————————————-
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=20)
Rows out: 16667 rows at destination with 5.678 ms to first row, 8.901 ms to end.
-> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=1 width=20)
Filter: (date_trunc(‘month’::text, order_date) = ‘2024-06-01’::date)
Rows out: Avg 5555.7 rows x 3 workers. Max 5556 rows (seg0).
Slice statistics:
(slice0) Executor memory: 123K bytes.
(slice1) Executor memory: 456K bytes avg x 3 workers, 567K bytes max (seg0).
Total runtime: 9.123 ms
(8 rows)

# 优化：避免在字段上使用函数
fgedudb=> EXPLAIN ANALYZE
fgedudb-> SELECT * FROM fgedu.fgedu_order_explain
fgedudb-> WHERE order_date >= ‘2024-06-01’::DATE
fgedudb-> AND order_date < '2024-07-01'::DATE; QUERY PLAN ------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=5556 width=20) Rows out: 16667 rows at destination with 2.345 ms to first row, 3.456 ms to end. -> Seq Scan on fgedu_order_explain (cost=0.00..431.00 rows=5556 width=20)
Filter: ((order_date >= ‘2024-06-01’::date) AND (order_date < '2024-07-01'::date)) Rows out: Avg 5555.7 rows x 3 workers. Max 5556 rows (seg0). Slice statistics: (slice0) Executor memory: 123K bytes. (slice1) Executor memory: 456K bytes avg x 3 workers, 567K bytes max (seg0). Total runtime: 4.567 ms (8 rows) # 性能提升：9.123ms -> 4.567ms

Part05-风哥经验总结与分享

5.1 GreenPlum执行计划解读技巧

执行计划解读技巧：

1. 从内向外阅读
– 最内层是扫描操作
– 向外是Join、聚合等
– 最外层是结果返回

2. 关注关键指标
– cost：预估成本
– rows：预估行数
– actual time：实际执行时间
– actual rows：实际行数

3. 识别性能瓶颈
– Seq Scan：考虑添加索引
– Motion：考虑优化分布键
– Sort：考虑添加索引或优化排序
– HashAggregate：考虑优化分组

4. Motion类型分析
– Gather Motion：收集结果，正常
– Redistribute Motion：数据重分布，需关注
– Broadcast Motion：广播数据，需关注

5. 优化建议
– 减少数据扫描量
– 避免不必要的数据重分布
– 使用合适的Join类型
– 优化统计信息

本文档介绍了GreenPlum执行计划调优的核心内容，包括执行计划操作符、执行计划分析、执行计划优化等，希望对大家有所帮助。

本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html