当前位置：文江博客话题详情

在理解聚集索引时我错过了什么？

发布于 2024-09-29 21:26:52 字数 2156 浏览 3 评论 0原文

如果没有任何索引，则通过 IAM（（索引分配映射）访问表行。
我可以使用 IAM 以编程方式直接访问行吗？

没有索引是否意味着读取特定行的唯一方法是全表扫描读取所有表？
为什么 IAM 不能参与更具体的直接访问？

“如果表是堆（换句话说，它没有聚集索引），则书签是行标识符 (RID)，它是 File#:Page#:Slot# 形式的实际行定位符” [1a]

插槽没有进一步的定义。好吧，其他消息来源告诉我们 Slot# 实际上是行号。正确的？或者需要与 IAM 进一步并置以确定特定行？

现在，引入聚集索引意味着无法直接访问数据，只能通过最终的聚集索引查找或顺序遍历聚集叶节点。

我是否正确理解引入聚集索引仅对选择连续相邻（范围）行并且仅通过聚集索引键有益？
对表进行集群还有哪些好处？

我是否正确理解聚集索引的引入会恶化非聚集索引参与非精确匹配查询的性能优势？没有直接访问、顺序访问无法并行、非聚集索引由聚集索引键增加等等，对吗？

好吧，我发现对表进行聚类对于非常具体且易于理解的上下文是有意义的，而在对表进行聚类时总是默认创建主键。这是为什么呢？

我在理解聚集索引时错过了什么？

[1]
Microsoft® SQL Server™ 2005 内部：存储引擎
作者：Kalen Delaney -（扎实的优质学习）
................................................
出版商：微软出版社
发布日期：2006 年 10 月 11 日
打印 ISBN-10：0-7356-2105-5
打印 ISBN-13：978-0-7356-2105-3
页数：464

[1a] p.250 第 7 章中的索引组织。索引内部结构和管理

这里是有用的在线复制粘贴
http://sqlserverindexeorgnization.blogspot.com/
尽管没有任何来源

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

镜花水月 2024-10-06 21:26:52

这是很多问题。是的，IAM 用于查找堆上的页面。不同之处在于，如果没有索引，就无法知道对于任何给定的数据片段要检索哪些页面。 SQL/关系数据模型的一个重要特征是查询仅通过数据值访问数据，而不是直接使用指针或其他结构。

槽号仅标识页面内的行。行数据在页内没有逻辑排序，即使在聚集索引中也是如此。每个数据页都包含一个行偏移表，该表指向页内行的位置。

由于需要额外的书签查找，聚集索引可能会减慢非聚集索引的数据访问速度。这可以通过使用 INCLUDE 子句将列添加到 NC 索引来缓解。有时，表上没有聚集索引可能会更有效。

回复收藏 0 原文

月亮邮递员 2024-10-06 21:26:52

请首先阅读我在“无法直接访问聚集表中的数据行 - 为什么？”下的回答。。

如果有 IAM，那么就有索引。

但如果没有 CI，那么行就在堆中，是的，如果您想直接读取它（不使用 NCI，或者不存在索引），您只能对堆进行表扫描。

聚集索引总是比没有聚集索引好。对于异常或不合标准的情况，有一个例外和一个警告：

非唯一 CI 密钥。这会导致溢出页面。关系表需要具有唯一键，因此这不是关系表。通过超载列可以很容易地使 CI 变得唯一。非唯一 CI 仍然比没有 CI 更好（根据我的其他帖子）。
非唯一
单调键。通常是 IDENTITY 列。插入的键始终位于最后一页，而不是随机插入分布在整个数据存储结构中的行（这对于“良好”的自然关系键来说是正常的）。这会导致插入热点，并降低并发性。关系键应该是自然唯一的；代理始终是一个附加索引。仅代理根本就不是关系表（它是一组非标准化电子表格，行标识符将它们链接在一起；您将无法从中获得数据库的功能）。
.
因此，长期建议是，使用 NCI 作为单调密钥，并确保 CI 允许良好的数据分布。

CI 的优点是巨大的，没有充分的理由去拥有它（可能有上面提到的不好的理由）。

CI 允许范围查询； NCI 则不然。但这并不是唯一的原因。

另一个需要注意的是，您需要保持 CI 密钥的宽度较小，因为它是在 NCI 中携带的。现在通常这不是问题，因为宽 CI 键就可以了。但是，如果您有一堆伪装成数据库的非标准化电子表格，这会导致比标准化数据库多得多的索引，这确实成为一个考虑因素。因此，对 Empire 爱好者的长期建议是，保持 CI 键的宽度较小。 CI 不会“增加”NCI，这一点没有准确说明。如果您有 NCI，那么它将有一个指针或 CI 密钥；如果您有 CI（具有所有好处），那么 CI 键而不是 RowId 的成本可以忽略不计。所以准确的说法是，宽 CI 键会增加 NCI。

任何说 CI 的顺序访问不能并行化的人都是错误的（MS 可能会在一个版本中破坏它并在下一个版本中修复它，但这是暂时的）。

使用 ANSI SQL ...PRIMARY KEY ... 表示法默认为 UNIQUE CLUSTERED。因为数据库应该是关系型的。 Unique PK 应该是一个友好的关系键，而不是一个愚蠢的 IDENTITY 列。因此（不包括例外）主键总是聚类的最佳候选。

您始终可以通过避免 ANSI SQL ...PRIMARY KEY ... 表示法并使用 CREATE [UNIQUE] [CLUSTERED] INDEX 表示法来创建所需的任何索引。

不可能回答你的最后一个问题，你必须继续问问题，直到问完为止。但是请忘记您引用的 doco 并重新开始，否则我们将在这里花几天时间讨论清晰的知识和晦涩难懂的知识之间的区别。

Please read my answer under "No direct access to data row in clustered table - why?", first.

If there is an IAM, then there is an Index.

But if the is no CI, then the rows are in a Heap, and yes, if you want to read it directly (without using an NCI, or where no Indices exist), you can only table scan a Heap.

A Clustered Index is always better that not having one. There is one exception, and one caveat, both for abnormal or sub-standard conditions:

Non-Unique CI Key. This causes Overflow Pages. Relational tables are required to have unique keys, so this is not a Relational table. The CI can be made unique quite easily by overloading the columns. A Non-unique CI is still better (as per my other post) to have a Non-unique CI than no CI.
Monotonic Key. Typically an IDENTITY column. Instead of random Inserts which insert rows distributed throughout the data storage structure (as is normal with a "good" natural Relational key), the inserted Key is always on the last page. This causes an Insert hotspot, and reduces concurrency. Relational keys are supposed to naturally unique; the surrogate is always an additional index. A surrogate-only is simply not a relational table (it is group of Unnormalised spreadsheets with row identifiers linking them together; you will not get th epower of a databse from that).
.
So the standing advice is, use an NCI for monotonic keys, and ensure that the CI allows good data distribution.

The advantages of CIs are vast, there is no good reason to have one (there may be bad reasons as alluded to above).

CIs allow range queries; NCIs do not. But that is not the only reason.

Another caveat is you need to keep the width of the CI Key small, because it is carried in the NCIs. Now normally this is not a problem, as in, wide CI keys are fine. But where you have an Unormalise dbunch of spreadsheets masquerading as a database, which results in many more indices than a Normalised database, that does become a consideration. Therefore the standing advice for Empire devotees is, keep the width of the CI key down. CIs do not "increase" the NCIs, that is not stated accurately. If you have NCIs, well, it is going to have a pointer or a CI key; if you have a CI (with all the benefits) then the cost is, a CI key instead of a RowId, negligible. So the accurate statement is, fat wide CI keys increase the NCIs.

Whoever says sequential access of CIs cannot be parallelised is wrong (MS may break it in one version and fix it in the next, but that is transient).

Using the ANSI SQL ...PRIMARY KEY ... notation defaults to UNIQUE CLUSTERED. because the db is supposed to be Relational. And the Unique PK is supposed to be a nice friendly Relational key, not a idiotic IDENTITY column. Therefore invariably (not counting exceptions) the PRIMARY KEY is the best candidate for clustering.

You can always create whatever indices you want by avoiding the ANSI SQL ...PRIMARY KEY ... notation and using CREATE [UNIQUE] [CLUSTERED] INDEX notation instead.

It is not possible to answer that last question of yours, you will have to keep asking questions until you run out. But please, forget the doco you reference and start again, otherwise we will be here for days discussing the difference between clear knowledge and gobbledegook.

回复收藏 0 原文

~没有更多了~