GreenPlum Tutorial FG009: GreenPlum Complex Queries and Statistics in Practice

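This document, by Fengge (风哥), covers complex queries and statistics in GreenPlum: Join types, subquery concepts, Join best practices, Join operations, aggregate statistics, and a sales report case study. It is based on the Query Guide and Performance Tuning sections of the official GreenPlum documentation and is intended for DBAs to use in study and test environments.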

Part01-Basic Concepts and Theory

1.1 GreenPlum Join Types

A Join combines rows from two or more tables according to a join condition. GreenPlum supports several Join types.

1.1.1 Join Types

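GreenPlum Join types:

1. Inner join (INNER JOIN)
- Returns only the rows that match on both sides
- The most commonly used Join type

2. Left outer join (LEFT JOIN)
- Returns every row from the left table
- Right-table columns are filled in where a match exists and are NULL otherwise

3. Right outer join (RIGHT JOIN)
- Returns every row from the right table
- Left-table columns are filled in where a match exists and are NULL otherwise

4. Full outer join (FULL JOIN)
- Returns every row from both tables
- The unmatched side is NULL

5. Cross join (CROSS JOIN)
- Returns the Cartesian product
- Use with care: the result has (left rows x right rows) rows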
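
The differences between these Join types are easiest to see on a tiny data set. The following is a minimal sketch that defines two hypothetical row sets inline with VALUES (the names c and o and their contents are assumptions made up for illustration), so it needs no existing tables.

```sql
-- Hypothetical sample data, defined inline so no tables are required.
-- Customer 2 has no order; order 11 points at a customer that does not exist.
WITH c(customer_id, customer_name) AS (
    VALUES (1, 'A'), (2, 'B')
),
o(order_id, customer_id) AS (
    VALUES (10, 1), (11, 3)
)
SELECT c.customer_id,
       c.customer_name,
       o.order_id,
       o.customer_id AS order_customer_id
FROM c
FULL JOIN o ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_id;
-- FULL JOIN returns 3 rows: the match (1, 10), customer 2 with NULL order
-- columns, and order 11 with NULL customer columns.
```

Replacing FULL JOIN with INNER JOIN leaves only the one matching row. LEFT JOIN keeps the two customer rows, RIGHT JOIN keeps the two order rows, and CROSS JOIN (written without the ON clause) returns all 2 x 2 = 4 combinations.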

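1.1.2 Join Algorithms

GreenPlum Join algorithms:

1. Nested Loop Join
- Scans the outer table row by row
- Looks up matching rows in the inner table for each outer row
- Suits joins where one side is very small; cost grows quickly when both sides are large

2. Hash Join
- Builds a hash table on one input
- Probes the hash table with rows from the other input
- Suits joins between large tables on equality conditions
- The algorithm GreenPlum uses most often

3. Merge Join
- Requires both inputs to be sorted on the join key
- Scans both inputs in order to find matches
- Suits inputs that are already sorted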
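
To see which algorithm the optimizer picked for a given query, EXPLAIN can be used. The sketch below assumes the fgedu.fgedu_customer and fgedu.fgedu_order tables used in Part03 of this document already exist. The plan text varies with version, optimizer, and statistics, so no output is shown.

```sql
-- Print the execution plan without running the query.
-- In the output, look for a join node named "Hash Join",
-- "Nested Loop", or "Merge Join".
EXPLAIN
SELECT c.customer_id, o.order_id
FROM fgedu.fgedu_customer c
JOIN fgedu.fgedu_order o
  ON c.customer_id = o.customer_id;
```

EXPLAIN ANALYZE runs the query as well and adds actual row counts and timings, which helps when the estimated plan looks suspicious.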

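1.2 GreenPlum Subquery Concepts

A subquery is a query nested inside another SQL statement. GreenPlum supports several kinds of subquery.

1.2.1 Subquery Types

GreenPlum subquery types:

1. Scalar subquery
- Returns a single value
- Can appear in SELECT, WHERE, and other clauses

2. Table subquery
- Returns multiple rows and columns
- Can appear in the FROM clause

3. IN subquery
- Tests whether a value is in the set returned by the subquery
- Can often be optimized into a Join

4. EXISTS subquery
- Tests whether at least one matching row exists
- Usually performs well

5. Correlated subquery
- References columns of the outer query
- Is logically evaluated once per outer row, so watch its performance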
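
The five subquery types above can be sketched as follows. The queries assume the fgedu.fgedu_customer and fgedu.fgedu_order tables and the columns used in Part03 of this document. The threshold 100000 is an arbitrary value chosen only for illustration.

```sql
-- 1. Scalar subquery: orders above the overall average amount.
SELECT order_id, total_amount
FROM fgedu.fgedu_order
WHERE total_amount > (SELECT AVG(total_amount) FROM fgedu.fgedu_order);

-- 2. Table subquery in FROM: per-customer totals, filtered afterwards.
SELECT t.customer_id, t.order_total
FROM (
    SELECT customer_id, SUM(total_amount) AS order_total
    FROM fgedu.fgedu_order
    GROUP BY customer_id
) t
WHERE t.order_total > 100000;   -- illustrative threshold

-- 3. IN subquery: customers that appear in the order table.
SELECT customer_id, customer_name
FROM fgedu.fgedu_customer
WHERE customer_id IN (SELECT customer_id FROM fgedu.fgedu_order);

-- 4. EXISTS subquery: same question, expressed as an existence test.
SELECT c.customer_id, c.customer_name
FROM fgedu.fgedu_customer c
WHERE EXISTS (
    SELECT 1
    FROM fgedu.fgedu_order o
    WHERE o.customer_id = c.customer_id
);

-- 5. Correlated scalar subquery: the inner query references c.customer_id.
SELECT c.customer_id,
       (SELECT COUNT(*)
        FROM fgedu.fgedu_order o
        WHERE o.customer_id = c.customer_id) AS order_count
FROM fgedu.fgedu_customer c;
```

Query 5 returns the same counts as the LEFT JOIN with GROUP BY shown in section 3.1.2, which is the rewrite usually preferred on large tables.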

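Part02-Production Planning and Recommendations

2.1 GreenPlum Join Best Practices

Fengge's tip: Join best practices:

Give tables that are joined together the same distribution key
Use Replicated distribution (DISTRIBUTED REPLICATED) for small tables
Keep Join conditions as simple as possible
Avoid functions in Join conditions
Filter first, then Join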
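
A minimal sketch of these practices follows. The demo_customer, demo_order, and demo_region tables and their columns are hypothetical names invented for illustration, and DISTRIBUTED REPLICATED assumes GreenPlum 6 or later.

```sql
-- Hypothetical demo tables; names and columns are illustrative only.
-- Both large tables are distributed on the join key customer_id.
CREATE TABLE demo_customer (
    customer_id   int,
    customer_name text
) DISTRIBUTED BY (customer_id);

CREATE TABLE demo_order (
    order_id     int,
    customer_id  int,
    order_date   date,
    total_amount numeric(12,2)
) DISTRIBUTED BY (customer_id);   -- same key as demo_customer

-- Small lookup table: every segment keeps a full copy (GreenPlum 6+).
CREATE TABLE demo_region (
    region_id   int,
    region_name text
) DISTRIBUTED REPLICATED;

-- Filter before joining, and keep the join condition a plain
-- column equality with no function calls.
EXPLAIN
SELECT c.customer_id, SUM(o.total_amount) AS total_amount
FROM demo_customer c
JOIN (
    SELECT customer_id, total_amount
    FROM demo_order
    WHERE order_date >= DATE '2024-06-01'
) o ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```

When both sides are distributed on the join key, the plan can usually join rows locally on each segment. A Redistribute Motion or Broadcast Motion node below the join in the EXPLAIN output is a hint that the distribution keys do not line up.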

Part03-Production Implementation

3.1 GreenPlum Join Operations in Practice

3.1.1 Inner Join (INNER JOIN)

# Connect to the database
$ psql -d fgedudb -U fgedu
psql (9.4.26)
Type "help" for help.

fgedudb=>

# Inner join: list customers that have orders
fgedudb=> SELECT DISTINCT
fgedudb-> c.customer_id,
fgedudb-> c.customer_name
fgedudb-> FROM fgedu.fgedu_customer c
fgedudb-> INNER JOIN fgedu.fgedu_order o
fgedudb-> ON c.customer_id = o.customer_id
fgedudb-> ORDER BY c.customer_id
fgedudb-> LIMIT 5;
 customer_id | customer_name
-------------+---------------
           1 | 客户1
           2 | 客户2
           3 | 客户3
           4 | 客户4
           5 | 客户5
(5 rows)


3.1.2 Left Outer Join (LEFT JOIN)

# Left outer join: list all customers with their order counts
fgedudb=> SELECT
fgedudb-> c.customer_id,
fgedudb-> c.customer_name,
fgedudb-> COUNT(o.order_id) AS order_count
fgedudb-> FROM fgedu.fgedu_customer c
fgedudb-> LEFT JOIN fgedu.fgedu_order o
fgedudb-> ON c.customer_id = o.customer_id
fgedudb-> GROUP BY c.customer_id, c.customer_name
fgedudb-> ORDER BY order_count DESC
fgedudb-> LIMIT 5;
 customer_id | customer_name | order_count
-------------+---------------+-------------
        1234 | 客户1234      |          25
        5678 | 客户5678      |          23
        9012 | 客户9012      |          22
        3456 | 客户3456      |          21
        7890 | 客户7890      |          20
(5 rows)
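
The query above counts o.order_id rather than using COUNT(*), and the difference matters for customers with no orders. The following sketch shows this on hypothetical inline data (the row sets c and o are made up for illustration).

```sql
-- Hypothetical data: customer 1 has two orders, customer 2 has none.
WITH c(customer_id) AS (
    VALUES (1), (2)
),
o(order_id, customer_id) AS (
    VALUES (10, 1), (11, 1)
)
SELECT c.customer_id,
       COUNT(o.order_id) AS order_count,  -- ignores the NULL order_id
       COUNT(*)          AS row_count     -- counts the NULL-extended row too
FROM c
LEFT JOIN o ON c.customer_id = o.customer_id
GROUP BY c.customer_id
ORDER BY c.customer_id;
-- customer 1: order_count = 2, row_count = 2
-- customer 2: order_count = 0, row_count = 1
```

The LEFT JOIN produces one NULL-extended row for customer 2. COUNT(*) counts that row, while COUNT(o.order_id) skips it because order_id is NULL, which gives the correct order count of 0.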

3.2 GreenPlum Aggregate Statistics in Practice

3.2.1 Basic Aggregate Functions

# Basic aggregate functions
fgedudb=> SELECT
fgedudb-> COUNT(*) AS total_orders,
fgedudb-> COUNT(DISTINCT customer_id) AS total_customers,
fgedudb-> SUM(total_amount) AS total_amount,
fgedudb-> AVG(total_amount) AS avg_amount,
fgedudb-> MAX(total_amount) AS max_amount,
fgedudb-> MIN(total_amount) AS min_amount
fgedudb-> FROM fgedu.fgedu_order;
 total_orders | total_customers | total_amount |    avg_amount     | max_amount | min_amount
--------------+-----------------+--------------+-------------------+------------+------------
       100000 |           10000 | 500000000.00 | 5000.000000000000 |    9999.99 |    1000.00
(1 row)


3.2.2 Grouped Statistics

# Statistics grouped by region
fgedudb=> SELECT
fgedudb-> c.region,
fgedudb-> COUNT(DISTINCT c.customer_id) AS customer_count,
fgedudb-> COUNT(o.order_id) AS order_count,
fgedudb-> SUM(o.total_amount) AS total_amount
fgedudb-> FROM fgedu.fgedu_customer c
fgedudb-> LEFT JOIN fgedu.fgedu_order o
fgedudb-> ON c.customer_id = o.customer_id
fgedudb-> GROUP BY c.region
fgedudb-> ORDER BY total_amount DESC;
 region | customer_count | order_count | total_amount
--------+----------------+-------------+--------------
 华东   |           2500 |       25000 | 125000000.00
 华北   |           2500 |       25000 | 125000000.00
 华南   |           2500 |       25000 | 125000000.00
 西南   |           2500 |       25000 | 125000000.00
(4 rows)
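
Every region in the sample data has orders. With a LEFT JOIN, a region without any orders would get a NULL total, and NULLs sort first under ORDER BY ... DESC by default. The sketch below, on hypothetical inline data (the region names and amounts are made up), wraps the sum in COALESCE so such a region shows 0 and sorts last.

```sql
-- Hypothetical data: region 'west' has one customer and no orders.
WITH c(customer_id, region) AS (
    VALUES (1, 'east'), (2, 'east'), (3, 'west')
),
o(order_id, customer_id, total_amount) AS (
    VALUES (10, 1, 100.00), (11, 1, 50.00)
)
SELECT c.region,
       COUNT(DISTINCT c.customer_id)    AS customer_count,
       COUNT(o.order_id)                AS order_count,
       COALESCE(SUM(o.total_amount), 0) AS total_amount  -- NULL becomes 0
FROM c
LEFT JOIN o ON c.customer_id = o.customer_id
GROUP BY c.region
ORDER BY total_amount DESC;
-- east: 2 customers, 2 orders, total 150.00
-- west: 1 customer,  0 orders, total 0
```

Without COALESCE the west row would carry a NULL total_amount and appear at the top of the descending sort.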


Part04-Production Case Study

4.1 GreenPlum Sales Report Case

4.1.1 Daily Sales Report

# Scenario: generate a daily sales report

# Sales statistics by date
fgedudb=> SELECT
fgedudb-> o.order_date AS report_date,
fgedudb-> COUNT(o.order_id) AS order_count,
fgedudb-> SUM(o.total_amount) AS total_amount,
fgedudb-> AVG(o.total_amount) AS avg_amount
fgedudb-> FROM fgedu.fgedu_order o
fgedudb-> WHERE o.order_date BETWEEN '2024-06-01' AND '2024-06-30'
fgedudb-> GROUP BY o.order_date
fgedudb-> ORDER BY o.order_date
fgedudb-> LIMIT 5;
 report_date | order_count | total_amount | avg_amount
-------------+-------------+--------------+------------
 2024-06-01  |         500 |   2500000.00 |    5000.00
 2024-06-02  |         500 |   2500000.00 |    5000.00
 2024-06-03  |         500 |   2500000.00 |    5000.00
 2024-06-04  |         500 |   2500000.00 |    5000.00
 2024-06-05  |         500 |   2500000.00 |    5000.00
(5 rows)
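
A daily report is often extended with a month-to-date total. The following is one possible sketch, not part of the original case: it reuses the daily aggregation above as a common table expression and adds a running sum with a window function. The column name month_to_date_amount is an assumption chosen for illustration.

```sql
-- Daily totals first, then a running sum over the ordered days.
WITH daily AS (
    SELECT o.order_date        AS report_date,
           COUNT(o.order_id)   AS order_count,
           SUM(o.total_amount) AS total_amount
    FROM fgedu.fgedu_order o
    WHERE o.order_date BETWEEN '2024-06-01' AND '2024-06-30'
    GROUP BY o.order_date
)
SELECT report_date,
       order_count,
       total_amount,
       -- Sum of this day and all earlier days in the selected range.
       SUM(total_amount) OVER (ORDER BY report_date) AS month_to_date_amount
FROM daily
ORDER BY report_date;
```

Aggregating per day before applying the window function keeps the window input small: at most one row per day instead of one row per order.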

Part05-Fengge's Lessons Learned

5.1 GreenPlum Query Best Practices

5.1.1 Query Writing Best Practices

Query writing best practices:

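1. SELECT clause
- Select only the columns you need; avoid SELECT *
- Use COALESCE to handle NULL values
- Use aliases sensibly

2. WHERE clause
- Filter on partition columns where possible
- Filter on indexed columns where indexes exist
- Avoid applying functions to the filtered column
- Avoid NOT IN, which returns no rows if the subquery yields a NULL; prefer NOT EXISTS
- Make conditions as precise as possible

3. JOIN optimization
- List small tables first and large tables after (the optimizer still chooses the actual join order)
- Use the same distribution key on joined tables
- Use Replicated distribution for small tables
- Keep JOIN conditions simple and explicit
- Filter first, then JOIN

4. Subquery optimization
- Prefer EXISTS
- Consider rewriting as a JOIN
- Avoid correlated subqueries where a JOIN expresses the same logic

5. Aggregation optimization
- Filter first, then aggregate
- Group sensibly
- Use HAVING to filter grouped results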
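
The advice to avoid NOT IN deserves a concrete case, because the problem is about correctness as well as speed. The sketch below uses hypothetical inline data (row sets c and o, invented for illustration) in which the subquery side contains a NULL.

```sql
-- Hypothetical data: three customers; the order list contains a NULL.

-- NOT IN: any comparison with the NULL is unknown, so no row qualifies.
WITH c(customer_id) AS (VALUES (1), (2), (3)),
     o(customer_id) AS (VALUES (1), (NULL))
SELECT COUNT(*) AS not_in_rows
FROM c
WHERE c.customer_id NOT IN (SELECT customer_id FROM o);
-- not_in_rows = 0

-- NOT EXISTS: the NULL row simply never matches, so customers 2 and 3 remain.
WITH c(customer_id) AS (VALUES (1), (2), (3)),
     o(customer_id) AS (VALUES (1), (NULL))
SELECT COUNT(*) AS not_exists_rows
FROM c
WHERE NOT EXISTS (
    SELECT 1 FROM o WHERE o.customer_id = c.customer_id
);
-- not_exists_rows = 2
```

If NOT IN must be kept, adding WHERE customer_id IS NOT NULL inside the subquery gives the same result as the NOT EXISTS form.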

This document covered the core of complex queries and statistics in GreenPlum, including Join types, subqueries, and aggregate statistics. I hope it helps.

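This article was compiled and published by Fengge Tutorials (风哥教程) for study and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html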
