SQL Server中如何选择聚集索引?

发布于 2024-08-21 19:15:00 字数 466 浏览 1 评论 0原文

通常聚集索引是通过设置主键在 SQL Server Management Studio 中创建的,但是我最近关于 PK <-> 的问题是:聚集索引(Microsoft SQL Server 2008 主键的含义)表明没有必要将PK和聚集索引设置为相等。

那么我们应该如何选择聚集索引呢?让我们看以下示例:

create table Customers (ID int, ...)
create table Orders (ID int, CustomerID int)

我们通常会在两个 ID 列上创建 PK/CI,但我考虑为 CustomerID 中的订单创建它。这是最好的选择吗?

Usually the clustered index is created in SQL Server Management Studio by setting the primary key, however my recent question about PK <-> clustered index (Meaning of Primary Key to Microsoft SQL Server 2008) has shown that it is not necessary to set PK and clustered index to be equal.

So how should we choose clustered indexes then? Let's have the following example:

create table Customers (ID int, ...)
create table Orders (ID int, CustomerID int)

We would usually create the PK/CI on both ID columns but I thought about creating it for Orders in CustomerID. Is that the best choice?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

萌化 2024-08-28 19:15:00

根据索引女王 - Kimberly Tripp 的说法 - 她在聚集索引中寻找的主要是:

  • 唯一的
  • 静态

如果您还可以保证:

  • 不断增加的模式

,那么您就非常接近拥有您理想的聚类键!

查看她的整个 博客文章在这里,还有另一篇关于集群对表操作的关键影响的非常有趣的文章:聚集索引争论仍在继续

任何像 INT(尤其是 INT IDENTITY)或可能的 INT 和 DATETIME 之类的东西都是理想的候选者。由于其他原因,GUID 根本不是好的候选者 - 因此您可能有一个 GUID 作为您的 PK,但不要将您的表聚集在它上面 - 它会变得支离破碎,无法识别,并且性能会受到影响。

According to The Queen Of Indexing - Kimberly Tripp - what she looks for in a clustered index is primarily:

  • Unique
  • Narrow
  • Static

And if you can also guarantee:

  • Ever-increasing pattern

then you're pretty close to having your ideal clustering key!

Check out her entire blog post here, and another really interesting one about clustering key impacts on table operations here: The Clustered Index Debate Continues.

Anything like an INT (esp. an INT IDENTITY) or possibly an INT and a DATETIME are ideal candiates. For other reasons, GUID's aren't good candidates at all - so you might have a GUID as your PK, but don't cluster your table on it - it'll be fragmented beyond recognition and performance will suffer.

晚雾 2024-08-28 19:15:00

CLUSTERED 索引的最佳候选者是您最常用来引用记录的键。

通常,这是一个主键,因为它是在搜索和/或外键关系中使用的。

在您的情况下,Orders.ID 最有可能参与搜索和引用,因此它是作为聚类表达式的最佳候选。

如果您在 Orders.CustomerID 上创建 CLUSTERED 索引,则会发生以下情况:

  1. CustomerID 不唯一。为了确保唯一性,将向每条记录添加一个特殊的隐藏 32 位 列,称为 uniquifier

  2. 表中的记录将根据这对列(CustomerID, uniquifier)存储。

  3. 将创建 Order.ID 上的二级索引,以 (CustomerID, uniquifier) 作为记录指针。

  4. 像这样的查询:

    <前><代码>选择*
    来自订单
    其中 ID = 1234567

    必须执行外部操作,即Clustered Seek,因为并非所有列都存储在ID 上的索引中。要检索所有列,记录应首先位于聚集表中。

此附加操作需要 IndexDepth,因为读取的页面数量与简单的Clustered Seek 一样,IndexDepthO(log(n))< /code> 表中的记录总数。

A best candidate for a CLUSTERED index is the key you use to refer to your records most often.

Usually, this is a PRIMARY KEY, since it's what used in searches and/or FOREIGN KEY relationships.

In your case, Orders.ID will most probably participate in the searches and references, so it is the best candidate for being a clustering expression.

If you create the CLUSTERED index on Orders.CustomerID, the following things will happen:

  1. CustomerID is not unique. To ensure uniqueness, a special hidden 32-bit column known as uniquifier will be added to each record.

  2. Records in the table will be stored according to this pair of columns (CustomerID, uniquifier).

  3. A secondary index on Order.ID will be created, with (CustomerID, uniquifier) as the record pointers.

  4. Queries like this:

    SELECT  *
    FROM    Orders
    WHERE   ID = 1234567
    

    will have to do an external operation, a Clustered Seek, since not all columns are stored in the index on ID. To retrieve all columns, the record should first be located in the clustered table.

This additional operation requires IndexDepth as many page reads as a simple Clustered Seek, the IndexDepth beign O(log(n)) of total number of the records in your table.

假装爱人 2024-08-28 19:15:00

如果您担心集群,它通常是为了帮助改进数据检索。在您的示例中,您可能需要立即获取给定客户的所有记录。 customerID 上的集群会将这些行保留在同一物理页上,而不是分散在文件中的多个页面中。

ROT:对您想要显示的集合进行聚类。采购订单中的行项目就是一个典型的例子。

If you're concerned about clustering it's usually to help improve data retrieval. In you example, you're probably going to want all records for a given customer at once. Clustering on customerID will keep those rows on the same physical page rather than scattered throughout multiple pages in your file.

ROT: Cluster on what you want to show a collection of. Line items in a purchase order is the classic example.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文