风哥教程参考DB2官方文档Query Optimization、Performance Tuning等内容,详细介绍海量数据查询优化、分区裁剪、并行查询等。更多视频教程www.fgedu.net.cn
目录大纲
Part01-海量数据查询挑战
1.1 海量数据查询特点
海量数据查询面临的挑战:
- 查询响应时间长
- 资源消耗大
- 并发性能差
- 维护成本高
1.2 优化策略
- 数据分区
- 索引优化
- 并行查询
- 查询重写
- 物化查询表
Part02-分区优化
2.1 分区表设计
CREATE TABLE BIG_TABLE (
ID BIGINT NOT NULL,
DATA_DATE DATE NOT NULL,
DATA_VALUE DECIMAL(20, 6),
CREATE_TIME TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP,
CONSTRAINT PK_BIG_TABLE PRIMARY KEY (ID, DATA_DATE)
) PARTITION BY RANGE (DATA_DATE)
(
PARTITION P202601 STARTING ‘2026-01-01’ ENDING ‘2026-01-31’,
PARTITION P202602 STARTING ‘2026-02-01’ ENDING ‘2026-02-28’,
PARTITION P202603 STARTING ‘2026-03-01’ ENDING ‘2026-03-31’,
PARTITION P202604 STARTING ‘2026-04-01’ ENDING ‘2026-04-30’
);
# 分区裁剪查询
SELECT * FROM BIG_TABLE
WHERE DATA_DATE BETWEEN ‘2026-04-01’ AND ‘2026-04-30’;
# 查看分区信息
SELECT
TABNAME,
DATAPARTITIONNAME,
LOWVALUE,
HIGHVALUE
FROM SYSCAT.DATAPARTITIONS
WHERE TABNAME = ‘BIG_TABLE’;
# 添加新分区
ALTER TABLE BIG_TABLE
ADD PARTITION P202605 STARTING ‘2026-05-01’ ENDING ‘2026-05-31’;
# 分离旧分区
ALTER TABLE BIG_TABLE
DETACH PARTITION P202601 INTO TABLE BIG_TABLE_OLD;
# 删除旧分区数据
DROP TABLE BIG_TABLE_OLD;
2.2 MDC表优化
CREATE TABLE MDC_SALES (
SALE_ID BIGINT NOT NULL,
SALE_DATE DATE NOT NULL,
REGION VARCHAR(50) NOT NULL,
PRODUCT_ID VARCHAR(50) NOT NULL,
QUANTITY INTEGER,
AMOUNT DECIMAL(18, 2),
CONSTRAINT PK_MDC_SALES PRIMARY KEY (SALE_ID)
) ORGANIZE BY (
SALE_DATE,
REGION,
PRODUCT_ID
);
# MDC查询
SELECT * FROM MDC_SALES
WHERE SALE_DATE = ‘2026-04-08’
AND REGION = ‘NORTH’;
# 查看MDC块信息
SELECT
TABNAME,
BLOCK_SIZE,
BLOCK_SIZE_UNITS
FROM SYSCAT.TABLES
WHERE TABNAME = ‘MDC_SALES’;
Part03-索引优化
3.1 索引设计
CREATE INDEX IDX_BIG_TABLE_DATE_VALUE ON BIG_TABLE(DATA_DATE, DATA_VALUE);
# 创建包含列索引
CREATE INDEX IDX_BIG_TABLE_COVER ON BIG_TABLE(DATA_DATE)
INCLUDE (DATA_VALUE, CREATE_TIME);
# 创建分区索引
CREATE INDEX IDX_BIG_TABLE_PART ON BIG_TABLE(DATA_DATE, ID)
PARTITIONED;
# 查看索引使用情况
SELECT
SUBSTR(INDNAME, 1, 30) AS INDNAME,
TABNAME,
COLNAMES,
NUMRIDS,
NLEAF,
NLEVELS,
CLUSTERRATIO
FROM SYSCAT.INDEXES
WHERE TABNAME = ‘BIG_TABLE’;
# 查看索引统计信息
SELECT
INDNAME,
TABNAME,
CARD,
FULLKEYCARD,
CLUSTERFACTOR
FROM SYSCAT.INDEXES
WHERE TABNAME = ‘BIG_TABLE’;
3.2 索引优化技巧
SELECT * FROM BIG_TABLE
WHERE DATA_DATE = ‘2026-04-08’
OPTIMIZE FOR 100 ROWS;
# 使用FETCH FIRST限制结果集
SELECT * FROM BIG_TABLE
WHERE DATA_DATE = ‘2026-04-08’
FETCH FIRST 100 ROWS ONLY;
# 避免全表扫描
# 确保WHERE条件使用索引列
# 使用索引列排序
SELECT * FROM BIG_TABLE
WHERE DATA_DATE = ‘2026-04-08’
ORDER BY DATA_VALUE;
# 使用索引列分组
SELECT DATA_DATE, COUNT(*)
FROM BIG_TABLE
WHERE DATA_DATE >= ‘2026-04-01’
GROUP BY DATA_DATE;
Part04-并行查询
4.1 并行查询配置
UPDATE DATABASE CONFIGURATION USING DFT_DEGREE 4;
# 设置并行度
SET CURRENT DEGREE = 4;
# 查询时指定并行度
SELECT * FROM BIG_TABLE
WHERE DATA_DATE >= ‘2026-04-01’
DEGREE 4;
# 查看并行执行计划
EXPLAIN PLAN FOR
SELECT * FROM BIG_TABLE
WHERE DATA_DATE >= ‘2026-04-01’;
SELECT * FROM EXPLAIN_OPERATOR
WHERE OPERATOR_TYPE = ‘BTQ’;
# 配置并行参数
UPDATE DATABASE CONFIGURATION USING INTRA_PARALLEL YES;
UPDATE DATABASE CONFIGURATION USING MAX_QUERYDEGREE 8;
# 监控并行查询
SELECT
AGENT_ID,
DEGREE_OF_PARALLELISM,
TOTAL_CPU_TIME
FROM SYSIBMADM.APPLICATIONS
WHERE DEGREE_OF_PARALLELISM > 1;
4.2 查询重写优化
CREATE TABLE MQT_BIG_TABLE_SUMMARY AS (
SELECT
DATA_DATE,
COUNT(*) AS RECORD_COUNT,
SUM(DATA_VALUE) AS TOTAL_VALUE,
AVG(DATA_VALUE) AS AVG_VALUE
FROM BIG_TABLE
GROUP BY DATA_DATE
) DATA INITIALLY DEFERRED REFRESH DEFERRED;
REFRESH TABLE MQT_BIG_TABLE_SUMMARY;
# 使用CTE优化
WITH DAILY_SUMMARY AS (
SELECT
DATA_DATE,
COUNT(*) AS RECORD_COUNT,
SUM(DATA_VALUE) AS TOTAL_VALUE
FROM BIG_TABLE
WHERE DATA_DATE >= ‘2026-04-01’
GROUP BY DATA_DATE
)
SELECT * FROM DAILY_SUMMARY
WHERE RECORD_COUNT > 1000;
# 使用临时表优化
DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMP_DATA AS (
SELECT * FROM BIG_TABLE
WHERE DATA_DATE = ‘2026-04-08’
) DEFINITION ONLY
ON COMMIT PRESERVE ROWS
NOT LOGGED;
INSERT INTO SESSION.TEMP_DATA
SELECT * FROM BIG_TABLE
WHERE DATA_DATE = ‘2026-04-08’;
SELECT * FROM SESSION.TEMP_DATA;
Part05-风哥经验总结与分享
5.1 海量数据查询优化要点
- 合理使用分区表
- 创建合适的索引
- 使用并行查询
- 使用MQT预计算
- 优化SQL语句
- 定期维护统计信息
5.2 性能优化建议
| 场景 | 优化方案 |
|---|---|
| 大表查询 | 分区裁剪,索引优化 |
| 聚合查询 | MQT预计算,并行查询 |
| 复杂查询 | 查询重写,临时表 |
| 历史数据 | 分区归档,定期清理 |
5.3 运维要点
- 定期RUNSTATS更新统计信息
- 定期REORG重组表
- 监控慢查询
- 定期归档历史数据
- 监控表空间使用
- 建立查询性能基线
学习交流加群风哥微信: itpux-com
风哥Oracle/MySQL/PostgreSQL/Greenplum/DB2/Redis等数据库培训课程,10年一线实战经验,企业级培训,真正掌握数据库核心技术!
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
