What are the consequences of converting heaps to clustered indexes on SQL Server?

Posted 2024-10-07 10:03:50

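I was recently advised to convert all our tables from heaps so that every table has a clustered index. What are the consequences of pursuing this strategy? For example: does it become more important to reorganize the database regularly? What about data growth? Is there a danger of really slow inserts? A danger of page fragmentation if the PK is a GUID? A noticeable speed increase in my application? What are your experiences?

To serve as inspiration for good answers, here are some of the "facts" I have picked up from other threads here on stackoverflow:

Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better.
Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved.
The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.
Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html
Clustering doesn't affect lookup speed - a unique non-clustered index should do the job.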

林空鹿饮溪 2024-10-14 10:03:50

If your key is a GUID, then a non-clustered index on it is probably just as effective as a clustered index on it. This is because you can never have a meaningful range scan over GUIDs (what could "between 'b4e8e994-c315-49c5-bbc1-f0e1b000ad7c' and '3cd22676-dffe-4152-9aef-54a6a18d32ac'" possibly mean?). At 16 bytes, a GUID clustered index key is wider than the row ID you would get from a heap, so an NC index on a GUID PK is actually a strategy that can be defended in a discussion.
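To put a rough number on the width argument, here is a back-of-the-envelope sketch in Python. It assumes the usual sizes (an 8-byte heap RID made of file id, page id and slot, and a 16-byte uniqueidentifier) and counts only the row locator that each nonclustered index row carries. The table size and index count are made up for illustration, and row headers, compression and fill factor are ignored.

```python
RID_BYTES = 8    # heap row locator: 2-byte file id + 4-byte page id + 2-byte slot
GUID_BYTES = 16  # uniqueidentifier used as the clustering key


def locator_overhead_bytes(row_count: int, nonclustered_indexes: int, locator_bytes: int) -> int:
    """Bytes spent on row locators, one per row in each nonclustered index."""
    return row_count * nonclustered_indexes * locator_bytes


# Hypothetical table: 10 million rows with 3 nonclustered indexes.
rows, nc_indexes = 10_000_000, 3
on_heap = locator_overhead_bytes(rows, nc_indexes, RID_BYTES)
on_guid_cluster = locator_overhead_bytes(rows, nc_indexes, GUID_BYTES)
print(f"locators over a heap:            {on_heap / 2**20:.0f} MiB")
print(f"locators over a GUID clustering: {on_guid_cluster / 2**20:.0f} MiB")
```

The locator cost simply doubles with the GUID clustering key, which is why leaving the table as a heap with an NC primary key is at least arguable.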

But making the primary key the clustered index key is not the only way to build a clustered index over your heap. Do you have other frequent queries that request ranges over a certain column? Typical candidates are columns like date, state or deleted. If you do, then you should consider making those columns the clustered index key (it does not have to be unique) because doing so may help queries that request ranges, like 'all records from yesterday'.

The only scenario where heaps have a significant performance benefit is inserts, especially bulk inserts. If your load is not insert heavy, then you should definitely go for a clustered index. See Clustered Index Design Guidelines.

Going over your points:

"Almost certainly want to establish a clustered index on every table in your database. If a table does not have one. Performance of most common queries is better."

A clustered index that can satisfy range requirements for most queries will dramatically improve performance, true. A clustered index that can satisfy order requirements can be helpful too, but nowhere near as helpful as one that can satisfy a range.

"Clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. The INSERT speed will suffer, but the SELECT speed will be improved."

Only probe SELECTs will be improved, i.e. SELECT ... WHERE key='someguid'. Queries by object ID and foreign key lookups will benefit from this clustered index. An NC index can serve the same purpose just as well.

"The problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table."

Wrong. Inserting at a position in an index does not have to move data. The worst that can happen is a page split. A page split is (somewhat) expensive, but it is not the end of the world. Your comment suggests that all data (or at least a 'significant' part) has to be moved to make room for the new row, and this is nowhere near true.
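To make the page split point concrete, here is a toy model in Python. It is only a sketch, not how the storage engine is implemented: leaf pages are plain sorted lists, PAGE_CAPACITY is a made-up number, random integers stand in for GUIDs, and the upper index levels, logging and fragmentation are ignored. It counts how many existing rows get relocated when random keys are inserted.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of rows that fit on one leaf page


def insert_key(pages: list[list[int]], key: int) -> int:
    """Insert key into the sorted leaf pages; return how many existing rows moved."""
    # Locate the page whose key range should hold the new key.
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    moved = 0
    if len(page) >= PAGE_CAPACITY:
        # Page split: the upper half of this one page goes to a new page.
        half = len(page) // 2
        new_page = page[half:]
        del page[half:]
        pages.insert(idx + 1, new_page)
        moved = len(new_page)
        if key >= new_page[0]:
            page = new_page
    bisect.insort(page, key)
    return moved


def simulate(n_rows: int, seed: int = 0) -> tuple[int, int]:
    """Insert n_rows random keys; return (worst rows moved by one insert, total rows moved)."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**12), n_rows)  # distinct random keys standing in for GUIDs
    pages = [[keys[0]]]
    worst = total = 0
    for key in keys[1:]:
        moved = insert_key(pages, key)
        worst = max(worst, moved)
        total += moved
    return worst, total


worst, total = simulate(20_000)
print(f"worst single insert moved {worst} rows; {total} rows moved over 20000 inserts")
```

In this model a single insert never relocates more than half a page of rows, no matter how large the table has grown, and most inserts relocate nothing at all.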

"Clustered index on GUID is ok in situations where the GUID has a meaning and improves performance by placing related data close to each other http://randommadness.blogspot.com/2008/07/guids-and-clustered-indexes.html"

I can't possibly imagine a scenario where a GUID can have 'related data'. A GUID is the quintessential random structure, so how could two random GUIDs relate in any way? The scenario Donald gives has a better solution: Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads, which is cheaper to implement (less storage required) and works for unique keys too (the solution in the linked article would not work for unique keys, only for foreign keys).

"Clustering doesn't affect lookup speed - a unique non-clustered index should do the job."

For probes (looking up a specific unique key), yes. An NC index is almost as fast as the clustered index (the NC index lookup does require an additional key lookup to fetch the rest of the columns). Where the clustered index shines is range scans, because the clustered index can cover any query, while an NC index that could potentially satisfy the same range may lose on coverage and trigger the Index Tipping Point.
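A rough way to see why the range case is so different is to count the pages touched. The Python sketch below is a toy model under loud assumptions: ROWS_PER_PAGE and the table size are made up, the column is a generic range_key, and the heap stores rows in an arrival order unrelated to that key (if rows happen to arrive in key order, the heap looks much better than this). It compares one equality range served by a clustered index against the same range served by a non-covering NC index plus lookups.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical number of rows per data page


def pages_touched(n_rows: int = 100_000, n_values: int = 365,
                  wanted: int = 200, seed: int = 0) -> tuple[int, int, int]:
    """Return (matching rows, data pages hit via NC index + lookups, pages hit via clustered index)."""
    rng = random.Random(seed)
    # Heap: physical position is arrival order, assumed unrelated to range_key.
    range_keys = [rng.randrange(n_values) for _ in range(n_rows)]

    # NC index on range_key over the heap: one lookup per match, to whichever page holds the row.
    heap_pages = {pos // ROWS_PER_PAGE for pos, k in enumerate(range_keys) if k == wanted}
    lookups = sum(1 for k in range_keys if k == wanted)

    # Clustered on range_key: matching rows sit next to each other in key order.
    ordered = sorted(range_keys)
    clustered_pages = {pos // ROWS_PER_PAGE for pos, k in enumerate(ordered) if k == wanted}
    return lookups, len(heap_pages), len(clustered_pages)


lookups, via_nc, via_clustered = pages_touched()
print(f"{lookups} matching rows: {via_nc} data pages via NC index lookups, "
      f"{via_clustered} via the clustered index")
```

With scattered rows, nearly every lookup lands on a different page, so the cost grows with the number of matching rows rather than with the number of pages they would fill. Past some point a full scan becomes cheaper than doing all those lookups, which is the tipping point mentioned above.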
