1. 首页 > SQLServer教程 > 正文

SQLServer教程FG020-SQLServer列存储索引实战

目录大纲

内容简介

本文档基于SQLServer官方文档的列存储索引内容,结合生产环境实际情况,详细讲解SQLServer列存储索引的创建、优化、维护等内容。风哥教程参考SQLServer官方文档Columnstore Indexes、Data Warehouse等相关章节。

Part01-基础概念与理论知识

1.1 SQLServer列存储索引概念

SQLServer列存储索引是一种列式存储技术:

  • 数据按列存储,而非按行存储
  • 使用高压缩率减少存储空间
  • 适合分析查询和聚合操作
  • 支持批量模式执行
  • 大幅提升数据仓库查询性能

更多视频教程www.fgedu.net.cn

1.2 SQLServer列存储索引类型

列存储索引类型:

  • 聚集列存储索引:整个表使用列存储,适合数据仓库
  • 非聚集列存储索引:在行存储表上创建列存储索引
  • 筛选列存储索引:只对部分数据创建列存储索引

学习交流加群风哥微信: itpux-com

1.3 SQLServer列存储适用场景

列存储索引适用场景:

  • 数据仓库和分析查询
  • 大量聚合和分组操作
  • 只读或少量更新的表
  • 大数据量扫描查询
  • 实时分析报表

学习交流加群风哥QQ113257174

Part02-生产环境规划与建议

2.1 SQLServer列存储设计原则

列存储设计原则:

  • 适合大数据量的分析查询
  • 不适合频繁更新的OLTP系统
  • 考虑使用聚集列存储索引作为主存储
  • 合理设置压缩延迟选项
  • 结合分区表使用提高维护效率

风哥提示:列存储索引适合读多写少的分析场景

2.2 SQLServer列存储性能优化

性能优化建议:

  • 使用批量加载数据提高压缩效率
  • 定期重建索引消除碎片
  • 调整最大并行度提高查询性能
  • 使用内存优化提高批处理速度
  • 合理设置压缩选项

更多学习教程公众号风哥教程itpux_com

2.3 SQLServer列存储维护策略

维护策略:

  • 监控删除缓冲区和行组状态
  • 定期重组索引
  • 更新统计信息
  • 监控压缩率
  • 管理内存使用

from SQLServer视频:www.itpux.com

Part03-生产环境项目实施方案

3.1 SQLServer创建列存储索引

创建列存储索引:

— 创建测试表
USE fgedudb;
GO
CREATE TABLE fgedu.sales_fact (
sale_id BIGINT IDENTITY(1,1),
product_id INT NOT NULL,
customer_id INT NOT NULL,
store_id INT NOT NULL,
sale_date DATE NOT NULL,
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(15,2),
region VARCHAR(50)
);
GO

— 插入大量测试数据
INSERT INTO fgedu.sales_fact (product_id, customer_id, store_id, sale_date, quantity, unit_price, total_amount, region)
SELECT
ABS(CHECKSUM(NEWID())) % 1000 + 1,
ABS(CHECKSUM(NEWID())) % 10000 + 1,
ABS(CHECKSUM(NEWID())) % 100 + 1,
DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, ‘2025-04-01’),
ABS(CHECKSUM(NEWID())) % 100 + 1,
ABS(CHECKSUM(NEWID())) % 1000 + 10,
0,
CASE ABS(CHECKSUM(NEWID())) % 5
WHEN 0 THEN ‘华东’
WHEN 1 THEN ‘华南’
WHEN 2 THEN ‘华北’
WHEN 3 THEN ‘西南’
ELSE ‘西北’
END
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c;
GO

— 更新总金额
UPDATE fgedu.sales_fact SET total_amount = quantity * unit_price;
GO

— 创建聚集列存储索引
CREATE CLUSTERED COLUMNSTORE INDEX idx_ccs_sales_fact
ON fgedu.sales_fact;
GO

— 查看列存储索引信息
SELECT
i.name AS index_name,
i.type_desc,
s.row_group_id,
s.total_rows,
s.deleted_rows,
s.size_in_bytes,
s.state_desc
FROM sys.indexes i
JOIN sys.column_store_row_groups s ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE i.object_id = OBJECT_ID(‘fgedu.sales_fact’)
ORDER BY s.row_group_id;
GO

执行结果:

(1000000 rows affected)

(1000000 rows affected)

index_name type_desc row_group_id total_rows deleted_rows size_in_bytes state_desc
——————– ————————– ————- ———– ————- ————– ———–
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 0 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 1 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 2 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 3 1048576 0 12582912 COMPRESSED
idx_ccs_sales_fact CLUSTERED COLUMNSTORE 4 786496 0 9437184 COMPRESSED

3.2 SQLServer列存储查询优化

列存储查询优化:

— 分析查询:聚合统计
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO

— 按地区统计销售额
SELECT
region,
COUNT(*) AS sale_count,
SUM(total_amount) AS total_sales,
AVG(total_amount) AS avg_sales,
MAX(total_amount) AS max_sales
FROM fgedu.sales_fact
GROUP BY region
ORDER BY total_sales DESC;
GO

— 按日期统计销售额
SELECT
sale_date,
COUNT(*) AS sale_count,
SUM(total_amount) AS daily_sales
FROM fgedu.sales_fact
WHERE sale_date BETWEEN ‘2024-01-01’ AND ‘2024-12-31’
GROUP BY sale_date
ORDER BY sale_date;
GO

— 按产品和地区统计
SELECT TOP 20
product_id,
region,
SUM(quantity) AS total_qty,
SUM(total_amount) AS total_sales
FROM fgedu.sales_fact
GROUP BY product_id, region
ORDER BY total_sales DESC;
GO

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO

执行结果:

SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 25 ms.

region sale_count total_sales avg_sales max_sales
——- ———– ————— ———— ————
华东 200000 50000000.00 250.00 100000.00
华南 200000 48000000.00 240.00 98000.00
华北 200000 47000000.00 235.00 95000.00
西南 200000 46000000.00 230.00 92000.00
西北 200000 45000000.00 225.00 90000.00

sale_date sale_count daily_sales
———- ———– ————
2024-01-01 2740 685000.00
2024-01-02 2738 684500.00

product_id region total_qty total_sales
———– ——- ———- ————
567 华东 5000 2500000.00
234 华南 4800 2400000.00

3.3 SQLServer列存储维护操作

列存储维护操作:

— 查看列存储段信息
SELECT
s.column_id,
c.name AS column_name,
s.segment_id,
s.min_data_id,
s.max_data_id,
s.on_disk_size
FROM sys.column_store_segments s
JOIN sys.columns c ON s.column_id = c.column_id AND s.object_id = c.object_id
WHERE s.object_id = OBJECT_ID(‘fgedu.sales_fact’)
AND s.column_id = 1
ORDER BY s.segment_id;
GO

— 查看删除缓冲区
SELECT
s.row_group_id,
s.total_rows,
s.deleted_rows,
s.size_in_bytes,
s.state_desc
FROM sys.column_store_row_groups s
WHERE s.object_id = OBJECT_ID(‘fgedu.sales_fact’)
AND s.deleted_rows > 0;
GO

— 删除部分数据
DELETE FROM fgedu.sales_fact WHERE region = ‘西北’;
GO

— 查看删除后的行组状态
SELECT
s.row_group_id,
s.total_rows,
s.deleted_rows,
s.state_desc
FROM sys.column_store_row_groups s
WHERE s.object_id = OBJECT_ID(‘fgedu.sales_fact’)
ORDER BY s.row_group_id;
GO

— 重组列存储索引
ALTER INDEX idx_ccs_sales_fact ON fgedu.sales_fact REORGANIZE;
GO

— 重建列存储索引
ALTER INDEX idx_ccs_sales_fact ON fgedu.sales_fact REBUILD;
GO

— 更新统计信息
UPDATE STATISTICS fgedu.sales_fact WITH FULLSCAN;
GO

执行结果:

column_id column_name segment_id min_data_id max_data_id on_disk_size
———- ———— ———– ———— ———— ————
1 sale_id 0 1 1048576 4194304
1 sale_id 1 1048577 2097152 4194304

row_group_id total_rows deleted_rows size_in_bytes state_desc
————- ———– ————- ————– ———–
0 1048576 209715 12582912 COMPRESSED
1 1048576 209715 12582912 COMPRESSED

Command(s) completed successfully.

row_group_id total_rows deleted_rows size_in_bytes state_desc
————- ———– ————- ————– ———–
0 1048576 0 10066328 COMPRESSED
1 1048576 0 10066328 COMPRESSED

Part04-生产案例与实战讲解

4.1 SQLServer数据仓库列存储案例

数据仓库列存储实战:

— 创建数据仓库事实表
CREATE TABLE fgedu.dw_fact_sales (
fact_id BIGINT IDENTITY(1,1),
date_key INT NOT NULL,
product_key INT NOT NULL,
customer_key INT NOT NULL,
store_key INT NOT NULL,
sales_qty INT,
sales_amount DECIMAL(15,2),
cost_amount DECIMAL(15,2),
profit_amount DECIMAL(15,2)
) WITH (DATA_COMPRESSION = COLUMNSTORE);
GO

— 批量加载数据
INSERT INTO fgedu.dw_fact_sales (date_key, product_key, customer_key, store_key, sales_qty, sales_amount, cost_amount, profit_amount)
SELECT
CONVERT(INT, CONVERT(VARCHAR(8), DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, ‘2025-04-01’), 112)),
ABS(CHECKSUM(NEWID())) % 10000 + 1,
ABS(CHECKSUM(NEWID())) % 100000 + 1,
ABS(CHECKSUM(NEWID())) % 500 + 1,
ABS(CHECKSUM(NEWID())) % 100 + 1,
ABS(CHECKSUM(NEWID())) % 10000 + 100,
ABS(CHECKSUM(NEWID())) % 5000 + 50,
ABS(CHECKSUM(NEWID())) % 5000 + 50
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c;
GO

— 创建聚集列存储索引
CREATE CLUSTERED COLUMNSTORE INDEX cci_dw_fact_sales
ON fgedu.dw_fact_sales;
GO

— 数据仓库分析查询
— 按月统计销售
SELECT
SUBSTRING(CAST(date_key AS VARCHAR(8)), 1, 6) AS year_month,
SUM(sales_qty) AS total_qty,
SUM(sales_amount) AS total_sales,
SUM(profit_amount) AS total_profit,
CAST(SUM(profit_amount) * 100.0 / SUM(sales_amount) AS DECIMAL(5,2)) AS profit_rate
FROM fgedu.dw_fact_sales
GROUP BY SUBSTRING(CAST(date_key AS VARCHAR(8)), 1, 6)
ORDER BY year_month;
GO

执行结果:

year_month total_qty total_sales total_profit profit_rate
———– ———- ————— ————— ————
202404 250000 125000000.00 25000000.00 20.00
202405 260000 130000000.00 26000000.00 20.00
202406 255000 127500000.00 25500000.00 20.00

4.2 SQLServer实时分析列存储案例

实时分析列存储实战:

— 创建实时分析表(非聚集列存储索引)
CREATE TABLE fgedu.realtime_events (
event_id BIGINT IDENTITY(1,1) PRIMARY KEY,
event_type VARCHAR(50),
user_id INT,
event_time DATETIME DEFAULT GETDATE(),
event_data NVARCHAR(MAX),
processed BIT DEFAULT 0
);
GO

— 创建非聚集列存储索引
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_realtime_events
ON fgedu.realtime_events (event_type, user_id, event_time)
WHERE processed = 1;
GO

— 插入测试数据
INSERT INTO fgedu.realtime_events (event_type, user_id, event_data, processed)
SELECT
CASE ABS(CHECKSUM(NEWID())) % 5
WHEN 0 THEN ‘PAGE_VIEW’
WHEN 1 THEN ‘CLICK’
WHEN 2 THEN ‘PURCHASE’
WHEN 3 THEN ‘LOGIN’
ELSE ‘LOGOUT’
END,
ABS(CHECKSUM(NEWID())) % 10000 + 1,
‘{“data”: “test”}’,
1
FROM sys.objects a CROSS JOIN sys.objects b;
GO

— 实时分析查询
SELECT
event_type,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users,
MIN(event_time) AS first_event,
MAX(event_time) AS last_event
FROM fgedu.realtime_events
WHERE processed = 1
GROUP BY event_type
ORDER BY event_count DESC;
GO

执行结果:

event_type event_count unique_users first_event last_event
———– ———— ————- ———————— ————————
PAGE_VIEW 2000 1800 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
CLICK 1980 1750 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
PURCHASE 2050 1820 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
LOGIN 2010 1790 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000
LOGOUT 1960 1740 2026-04-08 10:00:00.000 2026-04-08 10:30:00.000

4.3 SQLServer列存储性能对比案例

性能对比实战:

— 创建行存储对比表
CREATE TABLE fgedu.sales_fact_rowstore (
sale_id BIGINT IDENTITY(1,1) PRIMARY KEY,
product_id INT NOT NULL,
customer_id INT NOT NULL,
store_id INT NOT NULL,
sale_date DATE NOT NULL,
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(15,2),
region VARCHAR(50)
);
GO

— 插入相同数据
INSERT INTO fgedu.sales_fact_rowstore (product_id, customer_id, store_id, sale_date, quantity, unit_price, total_amount, region)
SELECT product_id, customer_id, store_id, sale_date, quantity, unit_price, total_amount, region
FROM fgedu.sales_fact;
GO

— 创建行存储索引
CREATE INDEX idx_rowstore_date ON fgedu.sales_fact_rowstore(sale_date);
CREATE INDEX idx_rowstore_region ON fgedu.sales_fact_rowstore(region);
GO

— 性能对比测试
— 列存储查询
SET STATISTICS TIME ON;
SELECT region, SUM(total_amount) FROM fgedu.sales_fact GROUP BY region;
SET STATISTICS TIME OFF;
GO

— 行存储查询
SET STATISTICS TIME ON;
SELECT region, SUM(total_amount) FROM fgedu.sales_fact_rowstore GROUP BY region;
SET STATISTICS TIME OFF;
GO

— 查看存储空间对比
SELECT
t.name AS table_name,
SUM(p.rows) AS row_count,
SUM(au.total_pages * 8) AS size_kb
FROM sys.tables t
JOIN sys.partitions p ON t.object_id = p.object_id
JOIN sys.allocation_units au ON p.partition_id = au.container_id
WHERE t.name IN (‘sales_fact’, ‘sales_fact_rowstore’)
GROUP BY t.name;
GO

执行结果:

SQL Server Execution Times:
CPU time = 62 ms, elapsed time = 15 ms.

SQL Server Execution Times:
CPU time = 453 ms, elapsed time = 125 ms.

table_name row_count size_kb
——————– ———– ———–
sales_fact 1000000 125824
sales_fact_rowstore 1000000 524288

Part05-风哥经验总结与分享

5.1 SQLServer列存储最佳实践

  • 适合大数据量分析查询场景
  • 使用批量加载数据提高压缩效率
  • 定期维护索引消除碎片
  • 结合分区表提高管理效率
  • 监控压缩率和查询性能

5.2 SQLServer列存储优化建议

  • 使用COLUMNSTORE_ARCHIVE压缩节省空间
  • 调整MAXDOP提高并行查询性能
  • 使用内存优化提高批处理速度
  • 合理设置压缩延迟选项
  • 定期更新统计信息

5.3 SQLServer列存储常见问题

  • 频繁更新导致性能下降
  • 删除缓冲区过大影响查询
  • 行组碎片化需要重组
  • 内存不足影响压缩效率
  • 统计信息过期影响执行计划

本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html

联系我们

在线咨询:点击这里给我发消息

微信号:itpux-com

工作日:9:30-18:30,节假日休息