1. 首页 > PostgreSQL教程 > 正文

PostgreSQL教程FG032-PG聚合查询:GROUP BY/HAVING与常用聚合函数

目录大纲

Part01-基础概念与理论知识

1.1 聚合查询的概念与作用

聚合查询是一种将多行数据汇总为单个结果的查询方式。它通过聚合函数对数据进行计算,常见的聚合操作包括求和、计数、平均值、最大值、最小值等。

聚合查询的主要作用:

  • 数据汇总:将大量数据汇总为有意义的统计信息
  • 数据分析:对数据进行分析,提取关键信息
  • 报表生成:生成各种统计报表
  • 性能优化:减少数据传输量,提高查询效率

学习交流加群风哥微信: itpux-com

1.2 常用聚合函数

PostgreSQL提供了丰富的聚合函数,常用的包括:

  • COUNT():计数,返回结果集中的行数
  • SUM():求和,返回数值列的总和
  • AVG():平均值,返回数值列的平均值
  • MAX():最大值,返回列中的最大值
  • MIN():最小值,返回列中的最小值
  • STRING_AGG():字符串聚合,将多个字符串连接为一个字符串
  • ARRAY_AGG():数组聚合,将多个值聚合为一个数组
  • JSON_AGG():JSON聚合,将多个值聚合为一个JSON数组

更多视频教程www.fgedu.net.cn

Part02-生产环境规划与建议

2.1 GROUP BY的使用场景

GROUP BY用于将数据按照指定的列进行分组,然后对每组数据应用聚合函数。适合以下场景:

  • 按类别统计:如按部门统计员工数量、按产品类别统计销售额
  • 按时间统计:如按天、周、月统计销售数据
  • 按地理位置统计:如按地区统计用户数量
  • 多维度统计:如同时按部门和职位统计员工薪资

使用建议:

  • 合理选择分组列:选择具有实际业务意义的列进行分组
  • 注意分组列的数据类型:确保分组列的数据类型适合分组操作
  • 考虑索引:为分组列创建索引可以提高查询性能
  • 避免过度分组:过度分组会导致结果集过大,影响性能

风哥提示:在使用GROUP BY时,SELECT列表中的列要么是分组列,要么是聚合函数,否则会导致错误。

2.2 HAVING的使用场景

HAVING用于过滤聚合后的结果,类似于WHERE子句,但HAVING是在聚合后进行过滤,而WHERE是在聚合前进行过滤。适合以下场景:

  • 过滤聚合结果:如只显示销售额大于10000的产品类别
  • 复杂条件过滤:如显示平均薪资大于8000的部门
  • 多条件过滤:如显示员工数量大于5且平均薪资大于8000的部门

使用建议:

  • 合理使用HAVING:只在需要过滤聚合结果时使用HAVING
  • 结合WHERE使用:先用WHERE过滤掉不需要的行,再用HAVING过滤聚合结果
  • 注意性能:HAVING会增加查询的计算量,应尽量减少HAVING中的复杂条件

更多学习教程公众号风哥教程itpux_com

Part03-生产环境项目实施方案

3.1 GROUP BY的使用

3.1.1 基本GROUP BY

— 创建测试表
CREATE TABLE fgedu_sales (
id SERIAL PRIMARY KEY,
product_id INTEGER,
product_name VARCHAR(100),
category VARCHAR(50),
sale_date DATE,
quantity INTEGER,
unit_price DECIMAL(10, 2),
total_amount DECIMAL(10, 2)
);

— 插入测试数据
INSERT INTO fgedu_sales (product_id, product_name, category, sale_date, quantity, unit_price, total_amount) VALUES
(1, ‘手机’, ‘电子产品’, ‘2026-04-01’, 10, 5999.00, 59990.00),
(2, ‘电脑’, ‘电子产品’, ‘2026-04-01’, 5, 9999.00, 49995.00),
(3, ‘平板’, ‘电子产品’, ‘2026-04-02’, 8, 3999.00, 31992.00),
(4, ‘耳机’, ‘电子产品’, ‘2026-04-02’, 15, 999.00, 14985.00),
(5, ‘键盘’, ‘电脑配件’, ‘2026-04-03’, 12, 1999.00, 23988.00),
(6, ‘鼠标’, ‘电脑配件’, ‘2026-04-03’, 20, 599.00, 11980.00),
(7, ‘手机’, ‘电子产品’, ‘2026-04-04’, 12, 5999.00, 71988.00),
(8, ‘电脑’, ‘电子产品’, ‘2026-04-04’, 6, 9999.00, 59994.00);

— 按类别分组统计
SELECT
category,
COUNT(*) AS product_count,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales,
AVG(unit_price) AS avg_unit_price
FROM fgedu_sales
GROUP BY category
ORDER BY total_sales DESC;

— 按日期分组统计
SELECT
sale_date,
COUNT(*) AS order_count,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales
FROM fgedu_sales
GROUP BY sale_date
ORDER BY sale_date;

— 按产品和类别分组统计
SELECT
category,
product_name,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales
FROM fgedu_sales
GROUP BY category, product_name
ORDER BY category, total_sales DESC;

from PostgreSQL视频:www.itpux.com

3.1.2 多列GROUP BY

— 创建员工表
CREATE TABLE fgedu_employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department VARCHAR(100),
position VARCHAR(100),
salary DECIMAL(10, 2),
hire_date DATE
);

— 插入测试数据
INSERT INTO fgedu_employees (name, department, position, salary, hire_date) VALUES
(‘张三’, ‘技术部’, ‘开发工程师’, 10000.00, ‘2024-01-01’),
(‘李四’, ‘技术部’, ‘开发工程师’, 12000.00, ‘2024-02-01’),
(‘王五’, ‘技术部’, ‘测试工程师’, 8000.00, ‘2024-03-01’),
(‘赵六’, ‘市场部’, ‘销售经理’, 15000.00, ‘2024-04-01’),
(‘钱七’, ‘市场部’, ‘销售人员’, 9000.00, ‘2024-05-01’),
(‘孙八’, ‘市场部’, ‘销售人员’, 8500.00, ‘2024-06-01’),
(‘周九’, ‘财务部’, ‘会计’, 9500.00, ‘2024-07-01’),
(‘吴十’, ‘财务部’, ‘出纳’, 8000.00, ‘2024-08-01’);

— 按部门和职位分组统计
SELECT
department,
position,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary
FROM fgedu_employees
GROUP BY department, position
ORDER BY department, position;

3.2 HAVING的使用

3.2.1 基本HAVING

— 使用HAVING过滤聚合结果
SELECT
category,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales
FROM fgedu_sales
GROUP BY category
HAVING SUM(total_amount) > 50000
ORDER BY total_sales DESC;

— 使用HAVING过滤多个条件
SELECT
department,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM fgedu_employees
GROUP BY department
HAVING COUNT(*) > 2 AND AVG(salary) > 9000
ORDER BY avg_salary DESC;

— 结合WHERE和HAVING
SELECT
department,
position,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM fgedu_employees
WHERE hire_date >= ‘2024-03-01’
GROUP BY department, position
HAVING AVG(salary) > 8500
ORDER BY department, position;

学习交流加群风哥QQ113257174

3.3 常用聚合函数的使用

3.3.1 COUNT函数

— COUNT(*):统计所有行
SELECT COUNT(*) AS total_employees FROM fgedu_employees;

— COUNT(column):统计非NULL值的行数
SELECT COUNT(salary) AS employees_with_salary FROM fgedu_employees;

— COUNT(DISTINCT column):统计不同值的数量
SELECT COUNT(DISTINCT department) AS distinct_departments FROM fgedu_employees;

— 按部门统计员工数量
SELECT
department,
COUNT(*) AS employee_count
FROM fgedu_employees
GROUP BY department
ORDER BY employee_count DESC;

3.3.2 SUM和AVG函数

— SUM函数:计算总和
SELECT
category,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales
FROM fgedu_sales
GROUP BY category
ORDER BY total_sales DESC;

— AVG函数:计算平均值
SELECT
department,
AVG(salary) AS avg_salary
FROM fgedu_employees
GROUP BY department
ORDER BY avg_salary DESC;

— 结合SUM和AVG
SELECT
department,
COUNT(*) AS employee_count,
SUM(salary) AS total_salary,
AVG(salary) AS avg_salary
FROM fgedu_employees
GROUP BY department
ORDER BY total_salary DESC;

3.3.3 MAX和MIN函数

— MAX函数:获取最大值
SELECT
department,
MAX(salary) AS max_salary
FROM fgedu_employees
GROUP BY department
ORDER BY max_salary DESC;

— MIN函数:获取最小值
SELECT
department,
MIN(salary) AS min_salary
FROM fgedu_employees
GROUP BY department
ORDER BY min_salary ASC;

— 结合MAX和MIN
SELECT
department,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary,
MAX(salary) – MIN(salary) AS salary_range
FROM fgedu_employees
GROUP BY department
ORDER BY salary_range DESC;

3.3.4 字符串和数组聚合函数

— STRING_AGG:字符串聚合
SELECT
department,
STRING_AGG(name, ‘, ‘) AS employee_names
FROM fgedu_employees
GROUP BY department;

— STRING_AGG带排序
SELECT
department,
STRING_AGG(name, ‘, ‘ ORDER BY name) AS employee_names
FROM fgedu_employees
GROUP BY department;

— ARRAY_AGG:数组聚合
SELECT
department,
ARRAY_AGG(name) AS employee_names
FROM fgedu_employees
GROUP BY department;

— JSON_AGG:JSON聚合
SELECT
department,
JSON_AGG(name) AS employee_names
FROM fgedu_employees
GROUP BY department;

Part04-生产案例与实战讲解

4.1 销售数据分析

4.1.1 销售趋势分析

— 按日期统计销售数据
SELECT
sale_date,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales,
AVG(total_amount) AS avg_order_amount
FROM fgedu_sales
GROUP BY sale_date
ORDER BY sale_date;

— 按类别统计销售数据
SELECT
category,
COUNT(*) AS order_count,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales,
ROUND((SUM(total_amount) / (SELECT SUM(total_amount) FROM fgedu_sales)) * 100, 2) AS percentage
FROM fgedu_sales
GROUP BY category
ORDER BY total_sales DESC;

— 按产品统计销售数据
SELECT
product_name,
category,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales,
AVG(unit_price) AS avg_unit_price
FROM fgedu_sales
GROUP BY product_name, category
ORDER BY total_sales DESC
LIMIT 5;

4.1.2 销售业绩分析

— 创建销售业绩表
CREATE TABLE fgedu_sales_performance (
id SERIAL PRIMARY KEY,
salesperson_id INTEGER,
salesperson_name VARCHAR(100),
sale_date DATE,
product_id INTEGER,
product_name VARCHAR(100),
quantity INTEGER,
unit_price DECIMAL(10, 2),
total_amount DECIMAL(10, 2)
);

— 插入测试数据
INSERT INTO fgedu_sales_performance (salesperson_id, salesperson_name, sale_date, product_id, product_name, quantity, unit_price, total_amount) VALUES
(1, ‘张三’, ‘2026-04-01’, 1, ‘手机’, 5, 5999.00, 29995.00),
(1, ‘张三’, ‘2026-04-02’, 3, ‘平板’, 4, 3999.00, 15996.00),
(1, ‘张三’, ‘2026-04-03’, 5, ‘键盘’, 6, 1999.00, 11994.00),
(2, ‘李四’, ‘2026-04-01’, 2, ‘电脑’, 3, 9999.00, 29997.00),
(2, ‘李四’, ‘2026-04-02’, 4, ‘耳机’, 8, 999.00, 7992.00),
(2, ‘李四’, ‘2026-04-03’, 6, ‘鼠标’, 10, 599.00, 5990.00),
(3, ‘王五’, ‘2026-04-01’, 1, ‘手机’, 6, 5999.00, 35994.00),
(3, ‘王五’, ‘2026-04-02’, 2, ‘电脑’, 4, 9999.00, 39996.00),
(3, ‘王五’, ‘2026-04-03’, 3, ‘平板’, 5, 3999.00, 19995.00);

— 销售业绩排名
SELECT
salesperson_name,
COUNT(*) AS order_count,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_sales,
RANK() OVER (ORDER BY SUM(total_amount) DESC) AS rank
FROM fgedu_sales_performance
GROUP BY salesperson_name
ORDER BY total_sales DESC;

— 按日期和销售人员统计
SELECT
sale_date,
salesperson_name,
SUM(total_amount) AS daily_sales
FROM fgedu_sales_performance
GROUP BY sale_date, salesperson_name
ORDER BY sale_date, daily_sales DESC;

风哥提示:在进行销售数据分析时,结合使用GROUP BY和聚合函数可以快速获取有价值的统计信息。

4.2 员工数据统计

4.2.1 部门人员统计

— 部门人员统计
SELECT
department,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary,
SUM(salary) AS total_salary
FROM fgedu_employees
GROUP BY department
ORDER BY employee_count DESC;

— 职位人员统计
SELECT
position,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM fgedu_employees
GROUP BY position
ORDER BY avg_salary DESC;

— 部门和职位交叉统计
SELECT
department,
position,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM fgedu_employees
GROUP BY department, position
ORDER BY department, avg_salary DESC;

4.2.2 入职时间统计

— 按月份统计入职人数
SELECT
TO_CHAR(hire_date, ‘YYYY-MM’) AS hire_month,
COUNT(*) AS hire_count
FROM fgedu_employees
GROUP BY hire_month
ORDER BY hire_month;

— 按季度统计入职人数
SELECT
EXTRACT(QUARTER FROM hire_date) AS quarter,
COUNT(*) AS hire_count
FROM fgedu_employees
GROUP BY quarter
ORDER BY quarter;

— 按年份统计入职人数
SELECT
EXTRACT(YEAR FROM hire_date) AS hire_year,
COUNT(*) AS hire_count
FROM fgedu_employees
GROUP BY hire_year
ORDER BY hire_year;

更多视频教程www.fgedu.net.cn

Part05-风哥经验总结与分享

5.1 聚合查询的性能优化

5.1.1 索引优化

  • 为GROUP BY列创建索引:提高分组操作的性能
  • 为过滤条件列创建索引:加速WHERE子句的执行
  • 考虑使用复合索引:对于多列GROUP BY,复合索引可能更有效
  • 定期更新统计信息:帮助查询优化器做出更好的决策

5.1.2 查询优化

  • 减少分组列:只选择必要的分组列
  • 合理使用WHERE子句:在聚合前过滤掉不需要的行
  • 避免在GROUP BY中使用函数:这会导致索引失效
  • 使用适当的聚合函数:根据实际需求选择合适的聚合函数
  • 考虑使用物化视图:对于频繁的聚合查询,物化视图可以提高性能

5.1.3 数据结构优化

  • 合理设计表结构:减少冗余数据
  • 使用适当的分区表:对于大型表,分区表可以提高聚合查询性能
  • 考虑数据仓库设计:对于复杂的分析查询,数据仓库设计可能更合适

学习交流加群风哥微信: itpux-com

5.2 常见问题与解决方案

5.2.1 聚合查询常见问题

问题 解决方案
GROUP BY性能慢 为分组列创建索引,减少分组列的数量
聚合函数计算错误 检查数据类型,确保聚合函数适用于该数据类型
结果集过大 添加适当的HAVING条件,限制结果集大小
内存不足 增加服务器内存,或优化查询以减少内存使用

5.2.2 聚合查询最佳实践

  • 使用适当的分组列:选择具有实际业务意义的列
  • 合理使用HAVING:只在需要过滤聚合结果时使用
  • 监控执行计划:使用EXPLAIN分析聚合查询的执行情况
  • 定期维护数据库:运行VACUUM和ANALYZE
  • 考虑使用并行查询:对于大型聚合查询,并行查询可以提高性能

5.2.3 实战经验总结

  • 根据实际需求选择合适的聚合函数
  • 合理使用GROUP BY和HAVING
  • 为分组列和过滤条件列创建索引
  • 优化查询结构,减少不必要的计算
  • 定期分析查询性能,优化慢查询

from PostgreSQL视频:www.itpux.com

本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html

联系我们

在线咨询:点击这里给我发消息

微信号:itpux-com

工作日:9:30-18:30,节假日休息