在 SQL Server 上将堆索引转换为聚集索引会产生什么后果?

发布于 2024-10-07 10:03:50 字数 743 浏览 6 评论 0原文

我最近收到建议,我应该将所有表从使用堆索引转换为每个表都有一个聚集索引。实行这一策略会产生什么后果?例如,定期重组数据库更重要吗?数据增长?插入速度太慢有危险吗?如果 PK 是 GUID,会有页面碎片整理的危险吗? 我的应用程序速度显着提高? 您的经验是什么?

为了提供良好答案的灵感,以下是我从其他人那里收集到的一些“事实” stackoverflow 上的线程

  1. 几乎肯定想要在数据库中的每个表上建立聚集索引。如果一张桌子没有。最常见查询的性能更好。
  2. 聚集索引对于 GUID 来说并不总是不好......这完全取决于应用程序的需求。 INSERT 速度会受到影响,但 SELECT 速度会提高。
  3. GUID 字段中的聚集索引的问题在于 GUID 是随机的,因此当插入新记录时,必须移动磁盘上的大部分数据才能将记录插入表的中间。
  4. GUID 上的聚集索引在 GUID 有意义的情况下是可以的,并且通过将相关数据彼此靠近放置来提高性能 http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html
  5. 集群不影响查找速度——独特的非集群索引应该可以完成这项工作。

I've recently got the advice, that I should convert all our tables from using heap indexes such that each table has a clustered index. What are the consequences of persuing this strategy? E.g. is it more important to regularly reorganize the database? datagrowth? danger of really slow inserts? Danger of page-defragmentation if the PK is a GUID? Noticable speed-increase of my application? What are your experiences?

To serve as inspiration for good answers, here are some of the "facts" I've picked up from other threads here on stackoverflow

  1. Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better.
  2. Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved.
  3. The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.
  4. Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html
  5. Clustering doesn't affect lookup speed - a unique non-clustered index should do the job.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

林空鹿饮溪 2024-10-14 10:03:50

如果您的键是 GUID,那么其上的非聚集索引可能与其上的聚集索引一样有效。这是因为在 GUID 上,您绝对无法对它们进行范围扫描(“b4e8e994-c315-49c5-bbc1-f0e1b000ad7c”和“3cd22676-dffe-4152-9aef-54a6a18d32ac”之间可能有什么可能)意思是??)。 GUID 聚集索引键的宽度为 16 字节,比从堆中获取的行 id 更宽,因此 PK guid 上的 NC 索引实际上是可以在讨论中辩护的策略。

但是,将主键设置为聚集索引键并不是在堆上构建聚集索引的唯一方法。您是否有其他频繁查询请求特定列的范围?典型的候选列是诸如 datestatedeleted 之类的列。如果这样做,那么您应该考虑将这些列设置为聚集索引键(它不必是唯一的),因为这样做可能有助于请求范围的查询,例如“昨天的所有记录”。

堆具有显着性能优势的唯一场景是插入,特别是批量插入。如果您的插入负载不重,那么您绝对应该选择聚集索引。请参阅聚集索引设计指南

回顾一下你的观点:

几乎肯定要在每个表上建立聚集索引
你的数据库。如果一个表没有
有一个。最常见的性能
查询更好。

能够满足大多数查询的范围要求的聚集索引将显着提高性能,这是事实。可以满足顺序要求的聚集索引也可能有帮助,但没有什么比可以满足范围的聚集索引更有用。

聚集索引并不总是对 GUID 不利...这完全取决于
您的应用程序的需求。这
INSERT 速度会受到影响,但是
SELECT 速度将得到提高。

仅探测 SELECT 会得到改进:SELECT ... WHERE key='someguid';。通过对象 ID 和外键查找进行的查询将从该聚集索引中受益。 NC 索引也可以达到相同的目的。

GUID 字段中的聚集索引的问题是 GUID 是
随机的,所以当有新记录时
插入,很大一部分
必须移动磁盘上的数据才能插入
将记录放入中间
表。

错误的。插入索引中的位置不需要移动数据。最糟糕的情况可能是页面拆分。页面分割(不知何故)成本高昂,但并不是世界末日。您的评论建议必须移动所有数据(或至少“重要”部分)以为新行腾出空间,这远非事实。

在 GUID 具有一个的情况下,GUID 上的聚集索引是可以的
意义并通过以下方式提高性能
将相关数据靠近放置
其他
http://randommadness.blogspot.com/2008/07/guids -and-clustered-indexes.html

我无法想象 GUID 可以拥有“相关数据”的场景。 GUID 是典型的随机结构,两个随机 GUID 如何以任何方式关联?唐纳德给出的场景有一个更好的解决方案: 解决高并发 INSERT 工作负载上的 PAGELATCH 争用,实现起来更便宜(所需的存储空间更少),并且也适用于唯一键(链接文章中的解决方案不适用于唯一键)键,仅适用于外键)。

聚集不会影响查找速度——唯一的非聚集索引
应该完成这项工作。

对于探针(查找特定的唯一键)是的。 NC 索引几乎与聚集索引一样快(NC 索引查找确实需要额外的键查找来获取其余列)。聚集索引的亮点在于范围扫描,因为聚集索引可以覆盖任何查询,而可能可能满足相同范围的NC索引可能会在上丢失覆盖范围并触发指数临界点

If your key is a GUID, then a non-clustered index on it is probably just as effective as a clustered index on it. This is because on GUIDs you absolutely never ever can have range scans on them (what could between 'b4e8e994-c315-49c5-bbc1-f0e1b000ad7c' and '3cd22676-dffe-4152-9aef-54a6a18d32ac' possibly mean??). With a width of 16 bytes, a GUID clustered index key is wider than a row id that you'd get from a heap, so a NC index on a PK guid is actually strategy that can be defended in a discussion.

But making the primary key the clustered index key is not the only way to build a clustered index over your heap. Do you have other frequent queries that request ranges over a certain column? Typical candidates are columns like date, state or deleted. If you do, then you should consider making those columns the clustered index key (it does not have to be unique) because doing so may help queries that request ranges, like 'all records from yesterday'.

The only scenario where heaps have significant performance benefit is inserts, specially bulk inserts. IF your load is not insert heavy, then you should definitely go for a clustered index. See Clustered Index Design Guidelines.

Going over over your points:

Almost certainly want to establish a clustered index on every table in
your database. If a table does not
have one. Performance of most common
queries is better.

A clustered index that can satisfy range requirements for most queries will dramatically improve performance, true. A clustered index that can satisfy order requirements can be helpful too, but nowhere as helpful as one that can satisfy a range.

Clustered indexes are not always bad on GUIDs... it all depends upon
the needs of your application. The
INSERT speed will suffer, but the
SELECT speed will be improved.

Only probe SELECTs will be improved: SELECT ... WHERE key='someguid';. Queries by object ID and Foreign key lookups will benefit from this clustered index. A NC index can server the same purpose just as well.

The problem with clustered indexes in a GUID field are that the GUIDs are
random, so when a new record is
inserted, a significant portion of the
data on disk has to be moved to insert
the records into the middle of the
table.

Wrong. Insert into position in an index does not have to move data. The worst it can happen is a page-split. A Page-split is (somehow) expensive, but is not the end of the world. You comment suggest that all data (or at least a 'significant' part) has to be moved to make room for the new row, this is nowhere near true.

Clustered index on GUID is ok in situations where the GUID has a
meaning and improves performance by
placing related data close to each
other
http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html

I can't possibly imagine a scenario where GUID can have 'related data'. A GUID is the quintessential random structure how could two random GUIDs relate in any way? The scenario Donald gives has a better solution: Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads, which is cheaper to implement (less storage required) and works for unique keys too (the solution in linked article would not work for unique keys, only for foreign keys).

Clustering doesn't affect lookup speed - a unique non-clustered index
should do the job.

For probes (lookup a specific unique key) yes. A NC index is almost as fast as the clustered index (the NC index lookup does require and additional key lookup to fetch in the rest of the columns). Where clustered index shines is range scans, as it the clustered index can cover any query, while a NC index that could potentially satisfy the same range may loose on the coverage and trigger the Index Tipping Point.

反目相谮 2024-10-14 10:03:50

我还建议您阅读 Kimberly Tripp 的 聚集索引争论仍在继续...,其中她非常清楚地详细描述了拥有*良好集群键相对于堆的所有好处。

几乎所有操作都更快 - 是的!甚至插入和更新!

但这需要一个良好的集群密钥,而 GUID 具有非常随机和不可预测的性质,因此不被认为是集群密钥的良好候选者。 GUID 作为集群键是不好的 - 无论它们是否具有应用程序意义 - 只是避免使用它们。

最好的选择是狭窄、稳定、唯一且不断增加的键 - INT IDENTITY 类型的列可以理想地满足所有这些要求。

有关 GUID 为什么不能成为良好的集群键以及它有多糟糕的更多背景信息,请参阅 Kim Tripp 的更多博客文章:

I would also recommend you read Kimberly Tripp's The Clustered Index Debate Continues... in which she details quite clearly all the benefits of having a *good clustering key over having a heap.

Pretty much all operations are faster - yes! even inserts and updates!

But this requires a good clustering key, and a GUID with its very random and unpredictable nature is not considered a good candidate for a clustering key. GUIDs as clustering key are bad - whether they have application meaning or not - just avoid those.

Your best bet is a key which is narrow, stable, unique and ever-increasing - a column of type INT IDENTITY fulfills all those requirements ideally.

For a lot more background on why a GUID doesn't make a good clustering key, and on just how bad it is, see more of Kim Tripp's blog posts:

倾`听者〃 2024-10-14 10:03:50

我可以推荐《SQL Performance Explaned》一书——这是一本关于索引的 200 页书。

它还提到聚集索引的性能何时比普通索引差。问题之一是聚集索引本身是一棵 B 树。因此,当同一个表上有其他索引时,它们不能指向特定行 - 相反,它们指向聚集索引中的“键”,因此数据的“路径”会变得更长。

I can recommend the book "SQL Performance Explained" - it is a 200 page book about indexes.

It also mentions when clustered indexes have worse performance than normal indexs. One of the problems is that the clustered index itself is a B-tree. So when you have other indexes on the same table, they can't point to a specifik row - instead they point to a "key" in the clustered index, so "the way" to the data gets longer.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文