SQL Server中如何选择聚集索引？

发布于 2024-08-21 19:15:00 字数 466 浏览 6 评论 0原文

通常聚集索引是通过设置主键在 SQL Server Management Studio 中创建的，但是我最近关于 PK <-> 的问题是：聚集索引（Microsoft SQL Server 2008 主键的含义)表明没有必要将PK和聚集索引设置为相等。

那么我们应该如何选择聚集索引呢？让我们看以下示例：

create table Customers (ID int, ...)
create table Orders (ID int, CustomerID int)

我们通常会在两个 ID 列上创建 PK/CI，但我考虑为 CustomerID 中的订单创建它。这是最好的选择吗？

原文

Usually the clustered index is created in SQL Server Management Studio by setting the primary key, however my recent question about PK <-> clustered index (Meaning of Primary Key to Microsoft SQL Server 2008) has shown that it is not necessary to set PK and clustered index to be equal.

So how should we choose clustered indexes then? Let's have the following example:

create table Customers (ID int, ...)
create table Orders (ID int, CustomerID int)

We would usually create the PK/CI on both ID columns but I thought about creating it for Orders in CustomerID. Is that the best choice?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

萌化 2024-08-28 19:15:00

根据索引女王 - Kimberly Tripp 的说法 - 她在聚集索引中寻找的主要是：

唯一的
窄
静态

如果您还可以保证：

不断增加的模式

，那么您就非常接近拥有您理想的聚类键！

查看她的整个博客文章在这里，还有另一篇关于集群对表操作的关键影响的非常有趣的文章：聚集索引争论仍在继续。

任何像 INT（尤其是 INT IDENTITY）或可能的 INT 和 DATETIME 之类的东西都是理想的候选者。由于其他原因，GUID 根本不是好的候选者 - 因此您可能有一个 GUID 作为您的 PK，但不要将您的表聚集在它上面 - 它会变得支离破碎，无法识别，并且性能会受到影响。

回复收藏 0 原文

晚雾 2024-08-28 19:15:00

CLUSTERED 索引的最佳候选者是您最常用来引用记录的键。

通常，这是一个主键，因为它是在搜索和/或外键关系中使用的。

在您的情况下，Orders.ID 最有可能参与搜索和引用，因此它是作为聚类表达式的最佳候选。

如果您在 Orders.CustomerID 上创建 CLUSTERED 索引，则会发生以下情况：

CustomerID 不唯一。为了确保唯一性，将向每条记录添加一个特殊的隐藏 32 位 列，称为 uniquifier。
表中的记录将根据这对列(CustomerID, uniquifier)存储。
将创建 Order.ID 上的二级索引，以 (CustomerID, uniquifier) 作为记录指针。
像这样的查询：
<前><代码>选择*
来自订单
其中 ID = 1234567
必须执行外部操作，即Clustered Seek，因为并非所有列都存储在ID 上的索引中。要检索所有列，记录应首先位于聚集表中。

此附加操作需要 IndexDepth，因为读取的页面数量与简单的Clustered Seek 一样，IndexDepth 为 O(log(n))< /code> 表中的记录总数。

A best candidate for a CLUSTERED index is the key you use to refer to your records most often.

Usually, this is a PRIMARY KEY, since it's what used in searches and/or FOREIGN KEY relationships.

In your case, Orders.ID will most probably participate in the searches and references, so it is the best candidate for being a clustering expression.

If you create the CLUSTERED index on Orders.CustomerID, the following things will happen:

CustomerID is not unique. To ensure uniqueness, a special hidden 32-bit column known as uniquifier will be added to each record.
Records in the table will be stored according to this pair of columns (CustomerID, uniquifier).
A secondary index on Order.ID will be created, with (CustomerID, uniquifier) as the record pointers.
Queries like this:
```
SELECT  *
FROM    Orders
WHERE   ID = 1234567
```
will have to do an external operation, a Clustered Seek, since not all columns are stored in the index on ID. To retrieve all columns, the record should first be located in the clustered table.

This additional operation requires IndexDepth as many page reads as a simple Clustered Seek, the IndexDepth beign O(log(n)) of total number of the records in your table.

回复收藏 0 原文