SQLServer教程FG020-SQLServer列存储索引实战
目录大纲
内容简介
本文档基于SQLServer官方文档的列存储索引内容,结合生产环境实际情况,详细讲解SQLServer列存储索引的创建、优化、维护等内容。风哥教程参考SQLServer官方文档Columnstore Indexes、Data Warehouse等相关章节。
Part01-基础概念与理论知识
1.1 SQLServer列存储索引概念
SQLServer列存储索引是一种列式存储技术:
- 数据按列存储,而非按行存储
- 使用高压缩率减少存储空间
- 适合分析查询和聚合操作
- 支持批量模式执行
- 大幅提升数据仓库查询性能
更多视频教程www.fgedu.net.cn
1.2 SQLServer列存储索引类型
列存储索引类型:
- 聚集列存储索引:整个表使用列存储,适合数据仓库
- 非聚集列存储索引:在行存储表上创建列存储索引
- 筛选列存储索引:只对部分数据创建列存储索引
学习交流加群风哥微信: itpux-com
1.3 SQLServer列存储适用场景
列存储索引适用场景:
- 数据仓库和分析查询
- 大量聚合和分组操作
- 只读或少量更新的表
- 大数据量扫描查询
- 实时分析报表
学习交流加群风哥QQ113257174
Part02-生产环境规划与建议
2.1 SQLServer列存储设计原则
列存储设计原则:
- 适合大数据量的分析查询
- 不适合频繁更新的OLTP系统
- 考虑使用聚集列存储索引作为主存储
- 合理设置压缩延迟选项
- 结合分区表使用提高维护效率
风哥提示:列存储索引适合读多写少的分析场景
2.2 SQLServer列存储性能优化
性能优化建议:
- 使用批量加载数据提高压缩效率
- 定期重建索引消除碎片
- 调整最大并行度提高查询性能
- 使用内存优化提高批处理速度
- 合理设置压缩选项
更多学习教程公众号风哥教程itpux_com
2.3 SQLServer列存储维护策略
维护策略:
- 监控删除缓冲区和行组状态
- 定期重组索引
- 更新统计信息
- 监控压缩率
- 管理内存使用
from SQLServer视频:www.itpux.com
Part03-生产环境项目实施方案
3.1 SQLServer创建列存储索引
创建列存储索引:
USE fgedudb;
GO
CREATE TABLE fgedu.sales_fact (
sale_id BIGINT IDENTITY(1,1),
product_id INT NOT NULL,
customer_id INT NOT NULL,
store_id INT NOT NULL,
sale_date DATE NOT NULL,
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(15,2),
region VARCHAR(50)
);
GO
— 插入大量测试数据
INSERT INTO fgedu.sales_fact (product_id, customer_id, store_id, sale_date, quantity, unit_price, total_amount, region)
SELECT
ABS(CHECKSUM(NEWID())) % 1000 + 1,
ABS(CHECKSUM(NEWID())) % 10000 + 1,
ABS(CHECKSUM(NEWID())) % 100 + 1,
DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, ‘2025-04-01’),
ABS(CHECKSUM(NEWID())) % 100 + 1,
ABS(CHECKSUM(NEWID())) % 1000 + 10,
0,
CASE ABS(CHECKSUM(NEWID())) % 5
WHEN 0 THEN ‘华东’
WHEN 1 THEN ‘华南’
WHEN 2 THEN ‘华北’
WHEN 3 THEN ‘西南’
ELSE ‘西北’
END
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c;
GO
— 更新总金额
UPDATE fgedu.sales_fact SET total_amount = quantity * unit_price;
GO
— 创建聚集列存储索引
CREATE CLUSTERED COLUMNSTORE INDEX idx_ccs_sales_fact
ON fgedu.sales_fact;
GO
— 查看列存储索引信息
SELECT
i.name AS index_name,
i.type_desc,
s.row_group_id,
s.total_rows,
s.deleted_rows,
s.size_in_bytes,
s.state_desc
FROM sys.indexes i
JOIN sys.column_store_row_groups s ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE i.object_id = OBJECT_ID(‘fgedu.sales_fact’)
ORDER BY s.row_group_id;
GO
执行结果:
(1000000 rows affected)
index_name type_desc row_group_id total_rows deleted_rows size_in_bytes state_desc
——————– ————————– ————- ———– ————- ————– ———–
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 0 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 1 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 2 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 3 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 4 786496 0 9437184 COMPRESSED
3.2 SQLServer列存储查询优化
列存储查询优化:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO
— 按地区统计销售额
SELECT
region,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_sales,
AVG(total_amount) AS avg_sales,
MAX(total_amount) AS max_sales
FROM fgedu.sales_fact
GROUP BY region
ORDER BY total_sales DESC;
GO
— 按日期统计销售额
SELECT
sale_date,
COUNT(*) AS sale_count,
SUM(total_amount) AS daily_sales
FROM fgedu.sales_fact
WHERE sale_date BETWEEN ‘2024-01-01’ AND ‘2024-12-31’
GROUP BY sale_date
ORDER BY sale_date;
GO
— 按产品和地区统计
SELECT TOP 20
product_id,
region,
SUM(quantity) AS total_qty,
SUM(total_amount) AS total_sales
FROM fgedu.sales_fact
GROUP BY product_id, region
ORDER BY total_sales DESC;
GO
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
执行结果:
CPU time = 156 ms, elapsed time = 25 ms.
region sale_count total_sales avg_sales max_sales
——- ———– ————— ———— ————
华东 200000 50000000.00 250.00 100000.00
华南 200000 48000000.00 240.00 98000.00
华北 200000 47000000.00 235.00 95000.00
西南 200000 46000000.00 230.00 92000.00
西北 200000 45000000.00 225.00 90000.00
sale_date sale_count daily_sales
———- ———– ————
2024-01-01 2740 685000.00
2024-01-02 2738 684500.00
…
product_id region total_qty total_sales
———– ——- ———- ————
567 华东 5000 2500000.00
234 华南 4800 2400000.00
…
3.3 SQLServer列存储维护操作
列存储维护操作:
SELECT
s.column_id,
c.name AS column_name,
s.segment_id,
s.min_data_id,
s.max_data_id,
s.on_disk_size
FROM sys.column_store_segments s
JOIN sys.columns c ON s.column_id = c.column_id AND s.object_id = c.object_id
WHERE s.object_id = OBJECT_ID(‘fgedu.sales_fact’)
AND s.column_id = 1
ORDER BY s.segment_id;
GO
— 查看删除缓冲区
SELECT
s.row_group_id,
s.total_rows,
s.deleted_rows,
s.size_in_bytes,
s.state_desc
FROM sys.column_store_row_groups s
WHERE s.object_id = OBJECT_ID(‘fgedu.sales_fact’)
AND s.deleted_rows > 0;
GO
— 删除部分数据
DELETE FROM fgedu.sales_fact WHERE region = ‘西北’;
GO
— 查看删除后的行组状态
SELECT
s.row_group_id,
s.total_rows,
s.deleted_rows,
s.state_desc
FROM sys.column_store_row_groups s
WHERE s.object_id = OBJECT_ID(‘fgedu.sales_fact’)
ORDER BY s.row_group_id;
GO
— 重组列存储索引
ALTER INDEX idx_ccs_sales_fact ON fgedu.sales_fact REORGANIZE;
GO
— 重建列存储索引
ALTER INDEX idx_ccs_sales_fact ON fgedu.sales_fact REBUILD;
GO
— 更新统计信息
UPDATE STATISTICS fgedu.sales_fact WITH FULLSCAN;
GO
执行结果:
———- ———— ———– ———— ———— ————
1 sale_id 0 1 1048576 4194304
1 sale_id 1 1048577 2097152 4194304
…
row_group_id total_rows deleted_rows size_in_bytes state_desc
————- ———– ————- ————– ———–
0 1048576 209715 12582912 COMPRESSED
1 1048576 209715 12582912 COMPRESSED
…
Command(s) completed successfully.
row_group_id total_rows deleted_rows size_in_bytes state_desc
————- ———– ————- ————– ———–
0 1048576 0 10066328 COMPRESSED
1 1048576 0 10066328 COMPRESSED
…
Part04-生产案例与实战讲解
4.1 SQLServer数据仓库列存储案例
数据仓库列存储实战:
CREATE TABLE fgedu.dw_fact_sales (
fact_id BIGINT IDENTITY(1,1),
date_key INT NOT NULL,
product_key INT NOT NULL,
customer_key INT NOT NULL,
store_key INT NOT NULL,
sales_qty INT,
sales_amount DECIMAL(15,2),
cost_amount DECIMAL(15,2),
profit_amount DECIMAL(15,2)
) WITH (DATA_COMPRESSION = COLUMNSTORE);
GO
— 批量加载数据
INSERT INTO fgedu.dw_fact_sales (date_key, product_key, customer_key, store_key, sales_qty, sales_amount, cost_amount, profit_amount)
SELECT
CONVERT(INT, CONVERT(VARCHAR(8), DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, ‘2025-04-01’), 112)),
ABS(CHECKSUM(NEWID())) % 10000 + 1,
ABS(CHECKSUM(NEWID())) % 100000 + 1,
ABS(CHECKSUM(NEWID())) % 500 + 1,
ABS(CHECKSUM(NEWID())) % 100 + 1,
ABS(CHECKSUM(NEWID())) % 10000 + 100,
ABS(CHECKSUM(NEWID())) % 5000 + 50,
ABS(CHECKSUM(NEWID())) % 5000 + 50
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c;
GO
— 创建聚集列存储索引
CREATE CLUSTERED COLUMNSTORE INDEX cci_dw_fact_sales
ON fgedu.dw_fact_sales;
GO
— 数据仓库分析查询
— 按月统计销售
SELECT
SUBSTRING(CAST(date_key AS VARCHAR(8)), 1, 6) AS year_month,
SUM(sales_qty) AS total_qty,
SUM(sales_amount) AS total_sales,
SUM(profit_amount) AS total_profit,
CAST(SUM(profit_amount) * 100.0 / SUM(sales_amount) AS DECIMAL(5,2)) AS profit_rate
FROM fgedu.dw_fact_sales
GROUP BY SUBSTRING(CAST(date_key AS VARCHAR(8)), 1, 6)
ORDER BY year_month;
GO
执行结果:
———– ———- ————— ————— ————
202404 250000 125000000.00 25000000.00 20.00
202405 260000 130000000.00 26000000.00 20.00
202406 255000 127500000.00 25500000.00 20.00
…
4.2 SQLServer实时分析列存储案例
实时分析列存储实战:
CREATE TABLE fgedu.realtime_events (
event_id BIGINT IDENTITY(1,1) PRIMARY KEY,
event_type VARCHAR(50),
user_id INT,
event_time DATETIME DEFAULT GETDATE(),
event_data NVARCHAR(MAX),
processed BIT DEFAULT 0
);
GO
— 创建非聚集列存储索引
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_realtime_events
ON fgedu.realtime_events (event_type, user_id, event_time)
WHERE processed = 1;
GO
— 插入测试数据
INSERT INTO fgedu.realtime_events (event_type, user_id, event_data, processed)
SELECT
CASE ABS(CHECKSUM(NEWID())) % 5
WHEN 0 THEN ‘PAGE_VIEW’
WHEN 1 THEN ‘CLICK’
WHEN 2 THEN ‘PURCHASE’
WHEN 3 THEN ‘LOGIN’
ELSE ‘LOGOUT’
END,
ABS(CHECKSUM(NEWID())) % 10000 + 1,
‘{“data”: “test”}’,
1
FROM sys.objects a CROSS JOIN sys.objects b;
GO
— 实时分析查询
SELECT
event_type,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users,
MIN(event_time) AS first_event,
MAX(event_time) AS last_event
FROM fgedu.realtime_events
WHERE processed = 1
GROUP BY event_type
ORDER BY event_count DESC;
GO
执行结果:
———– ———— ————- ———————— ————————
PAGE_VIEW 2000 1800 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
CLICK 1980 1750 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
PURCHASE 2050 1820 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
LOGIN 2010 1790 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
LOGOUT 1960 1740 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
4.3 SQLServer列存储性能对比案例
性能对比实战:
CREATE TABLE fgedu.sales_fact_rowstore (
sale_id BIGINT IDENTITY(1,1) PRIMARY KEY,
product_id INT NOT NULL,
customer_id INT NOT NULL,
store_id INT NOT NULL,
sale_date DATE NOT NULL,
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(15,2),
region VARCHAR(50)
);
GO
— 插入相同数据
INSERT INTO fgedu.sales_fact_rowstore (product_id, customer_id, store_id, sale_date, quantity, unit_price, total_amount, region)
SELECT product_id, customer_id, store_id, sale_date, quantity, unit_price, total_amount, region
FROM fgedu.sales_fact;
GO
— 创建行存储索引
CREATE INDEX idx_rowstore_date ON fgedu.sales_fact_rowstore(sale_date);
CREATE INDEX idx_rowstore_region ON fgedu.sales_fact_rowstore(region);
GO
— 性能对比测试
— 列存储查询
SET STATISTICS TIME ON;
SELECT region, SUM(total_amount) FROM fgedu.sales_fact GROUP BY region;
SET STATISTICS TIME OFF;
GO
— 行存储查询
SET STATISTICS TIME ON;
SELECT region, SUM(total_amount) FROM fgedu.sales_fact_rowstore GROUP BY region;
SET STATISTICS TIME OFF;
GO
— 查看存储空间对比
SELECT
t.name AS table_name,
SUM(p.rows) AS row_count,
SUM(au.total_pages * 8) AS size_kb
FROM sys.tables t
JOIN sys.partitions p ON t.object_id = p.object_id
JOIN sys.allocation_units au ON p.partition_id = au.container_id
WHERE t.name IN (‘sales_fact’, ‘sales_fact_rowstore’)
GROUP BY t.name;
GO
执行结果:
CPU time = 62 ms, elapsed time = 15 ms.
SQL Server Execution Times:
CPU time = 453 ms, elapsed time = 125 ms.
table_name row_count size_kb
——————– ———– ———–
sales_fact 1000000 125824
sales_fact_rowstore 1000000 524288
Part05-风哥经验总结与分享
5.1 SQLServer列存储最佳实践
- 适合大数据量分析查询场景
- 使用批量加载数据提高压缩效率
- 定期维护索引消除碎片
- 结合分区表提高管理效率
- 监控压缩率和查询性能
5.2 SQLServer列存储优化建议
- 使用COLUMNSTORE_ARCHIVE压缩节省空间
- 调整MAXDOP提高并行查询性能
- 使用内存优化提高批处理速度
- 合理设置压缩延迟选项
- 定期更新统计信息
5.3 SQLServer列存储常见问题
- 频繁更新导致性能下降
- 删除缓冲区过大影响查询
- 行组碎片化需要重组
- 内存不足影响压缩效率
- 统计信息过期影响执行计划
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
