SQL Server 2008 索引优化 - 集群查找与非集群包含

发布于 2024-12-03 12:17:49 字数 1320 浏览 1 评论 0原文

这是一个关于索引优化理论的又长又复杂的问题。这不是家庭作业，尽管我是在 Microsoft 70-432 的样本考试中第一次接触到这个问题的。最初的问题是关于一般查询优化，但后来我发现这种奇怪的行为我无法解释。

首先，表：

CREATE TABLE Invoice_details (
Invoice_id int NOT NULL,
Customer_id int NOT NULL,
Invoice_date datetime DEFAULT GETDATE() NULL,
Amount_total int NULL,
Serial_num int IDENTITY (1,1) NOT NULL)

现在，一个聚集索引和两个用于测试的索引：

CREATE UNIQUE CLUSTERED INDEX [ix_serial] ON [dbo].[Invoice_details] ([Serial_num] ASC)
/* Below is the "original" index */
CREATE NONCLUSTERED INDEX [ix_invoice_customer] ON [dbo].[Invoice_details] 
    ([Invoice_id] ASC,[Customer_id] ASC)
/* Below is the "optimized" index (adds one included field) */
CREATE NONCLUSTERED INDEX [ix_invoice_customer_inc] ON [dbo].[Invoice_details] 
    ([Invoice_id] ASC,[Customer_id] ASC) INCLUDE ([Invoice_date])

我还在表中添加了一些随机测试数据 - 100000 行。 Invoice_id、Customer_id 和 Amount_total 各自收到自己的随机值（范围 1000-9999），Invoice_date 收到 GETDATE() 加上随机秒数（范围 1000-9999）。我可以提供我使用的实际例程，但认为具体细节不相关。

最后是查询：

SELECT Invoice_id,Customer_id,Invoice_date FROM Invoice_details WHERE Customer_id=1234;

显然，查询的第一步将是非聚集索引扫描。无论使用哪个索引，第一步都将返回相同数量的索引行。使用“原始”索引，下一步将通过聚集索引进行查找以检索 Invoice_date，然后在两个集合之间进行内部 JOIN。使用“优化”索引，该字段包含在索引叶中，因此规划器直接返回结果。

哪个索引可以加快执行速度，为什么？

原文

This is a long, involved question about index optimization theory. This is not homework, though I was first exposed to this question in a sample exam for Microsoft's 70-432. The original question was about general query optimization, but then I found this peculiar behavior I could not explain.

First, the table:

CREATE TABLE Invoice_details (
Invoice_id int NOT NULL,
Customer_id int NOT NULL,
Invoice_date datetime DEFAULT GETDATE() NULL,
Amount_total int NULL,
Serial_num int IDENTITY (1,1) NOT NULL)

Now, a clustered index, and the two indexes for testing:

CREATE UNIQUE CLUSTERED INDEX [ix_serial] ON [dbo].[Invoice_details] ([Serial_num] ASC)
/* Below is the "original" index */
CREATE NONCLUSTERED INDEX [ix_invoice_customer] ON [dbo].[Invoice_details] 
    ([Invoice_id] ASC,[Customer_id] ASC)
/* Below is the "optimized" index (adds one included field) */
CREATE NONCLUSTERED INDEX [ix_invoice_customer_inc] ON [dbo].[Invoice_details] 
    ([Invoice_id] ASC,[Customer_id] ASC) INCLUDE ([Invoice_date])

I also added some random test data to the table - 100000 rows. Invoice_id, Customer_id, and Amount_total each received their own random values (range 1000-9999), and Invoice_date received GETDATE() plus a random number of seconds (range 1000-9999). I can provide the actual routine I used, but did not think the specifics would be relevant.

And finally, the query:

SELECT Invoice_id,Customer_id,Invoice_date FROM Invoice_details WHERE Customer_id=1234;

Obviously, the query's first step will be a nonclustered index scan. Regardless of which index is used, that first step will return the same number of index rows. With the "original" index, the next step will be a lookup via the clustered index to retrieve Invoice_date, followed by an internal JOIN between the two sets. With the "optimized" index, that field is included in the index leaf, so the planner goes straight to returning the results.

Which index results in faster execution, and why?

分享到QQ

分享到微博