为什么要向 NCI 上的所有（中间）节点添加聚集索引键？

发布于 2024-09-30 09:06:57 字数 764 浏览 2 评论 0原文

考虑聚簇表，
Quassnoi 写道（答案中的最后一句话）：

如果二级索引声明为UNIQUE，则聚集键仅附加到二级索引的叶级记录。

这听起来像是聚集键被添加到非唯一非聚集索引的（所有）中间节点。并且通过相同的逻辑，在非聚集表的情况下，RID 被添加到中间节点（？）

它的目的是什么？

更新：
目前这个问题有 9 票：-5，+4，从纯粹的匿名开始 -3), 正确答案与大多数 msdn 文档相矛盾。
它的价值实际上并不在于其本身，而在于如何解决此类与 SQL Server 内部结构相关的问题，这些问题要么是文档中描述的矛盾、不正确或不充分。

更新2：@Quassnoi，
感谢您的回答丰富了我调查自己而不问愚蠢问题的能力。

DBCC IND() 不输出 PageID。我不明白它的 PagePID（来自 DBCC IND 的输出）对应于 DBCC DBCC Page() 输出中的 PageID。
我对它们的使用（以及内部结构的研究/调查）或其他替代方案有更多疑问。我不确定为什么此类问题在这里被视为垃圾邮件。
您能为我推荐适合此类问题（关于 SQL Server 内部）的论坛/论坛吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

羁客 2024-10-07 09:06:57

这听起来像是聚集键被添加到非唯一非聚集索引的（所有）中间节点。并且通过相同的逻辑，在非聚集表的情况下，RID 将添加到中间节点（？）

是的，这是真的。

这样做是为了提高索引的可维护性。

假设您在 column 上有一个辅助（非聚集）索引，其中 1,000,000 记录的 column = 1 并且想要删除其中一条记录。

该记录也需要从索引中删除。

要找到要删除的记录，应在索引上执行B-Tree搜索。但如果分支节点没有存储行指针的值（无论是聚集键还是 RID），引擎将必须扫描所有 1M 记录以确定要删除的那个。

如果辅助键是UNIQUE，则column的值足以唯一地定位索引中的节点，因此不需要将行指针存储在分支节点中（这就是为什么它们不被存储）。

您可能也会对此讨论感兴趣：

http://www.sqlservercentral.com/ Forums/Topic714684-1545-6.aspx

更新：

要检查分支节点的内容，可以使用DBCC IND：

CREATE TABLE t_clustered (id INT NOT NULL PRIMARY KEY, nval INT, uval INT)
CREATE TABLE t_nonclustered (id INT NOT NULL PRIMARY KEY NONCLUSTERED, nval INT, uval INT)

CREATE INDEX ix_clustered_nval ON t_clustered (nval)
CREATE UNIQUE INDEX ux_clustered_uval ON t_clustered (uval)
CREATE INDEX ix_nonclustered_nval ON t_nonclustered (nval)
CREATE UNIQUE INDEX ux_nonclustered_nval ON t_nonclustered (uval)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_clustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_nonclustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)

-- Replace mydb with your database name

DBCC IND (mydb, t_clustered, -1)
DBCC IND (mydb, t_nonclustered, -1)

在这些命令的输出中您应该搜索 PageType = 2 （索引页）和 IndexLevel > 的记录0（非叶节点）并找到其 PageID。

就我而言，我得到以下 PageID：21074、21076、21105、21107。请注意，它们是特定于站点的：您将拥有其他值。

然后你应该使用DBCC PAGE来检查这些页面的内容：

DBCC PAGE (mydb, 1, 21074, 3)
DBCC PAGE (mydb, 1, 21076, 3)
DBCC PAGE (mydb, 1, 21105, 3)
DBCC PAGE (mydb, 1, 21107, 3)

FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  id (key)    KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  HEAP RID (key)     KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue

我们看到nval上的非唯一二级索引的非叶节点包含记录指针（< code>id (PRIMARY KEY CLUSTERED) 和 RID，相应地），而 uval 上的唯一索引不包含记录指针，仅包含索引列本身。

这又是因为对于唯一索引，索引列的值足以在索引中定位其节点，而对于非唯一索引则不然。

This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index. And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)

Yes, this is true.

This is done to improve the maintainability of an index.

Say, you have a secondary (non-clustered) index on column, 1,000,000 records with column = 1 and want to delete one of these records.

The record needs to be deleted from the index as well.

To locate the record to be deleted, a B-Tree search should be performed on the index. But if the branch nodes were not storing the value of the row pointer (be it a clustered key or a RID), the engine would have to scan all 1M records to determine the one to delete.

If the secondary key were UNIQUE, the value of the column would be enough to uniquely locate the node in the index, so storing the row pointer in the branch nodes is not required (and that's why they are not stored).

This discussion may be also interesting to you:

http://www.sqlservercentral.com/Forums/Topic714684-1545-6.aspx

Update:

To check the contents of the branch nodes, you can use DBCC IND:

CREATE TABLE t_clustered (id INT NOT NULL PRIMARY KEY, nval INT, uval INT)
CREATE TABLE t_nonclustered (id INT NOT NULL PRIMARY KEY NONCLUSTERED, nval INT, uval INT)

CREATE INDEX ix_clustered_nval ON t_clustered (nval)
CREATE UNIQUE INDEX ux_clustered_uval ON t_clustered (uval)
CREATE INDEX ix_nonclustered_nval ON t_nonclustered (nval)
CREATE UNIQUE INDEX ux_nonclustered_nval ON t_nonclustered (uval)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_clustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_nonclustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)

-- Replace mydb with your database name

DBCC IND (mydb, t_clustered, -1)
DBCC IND (mydb, t_nonclustered, -1)

In the output of these commands you should search for records with PageType = 2 (index page) and IndexLevel > 0 (non-leaf node) and find their PageID.

In my case, I got the following PageID: 21074, 21076, 21105, 21107. Note they are site specific: you will have the other values.

Then you should used DBCC PAGE to examine the contents of these pages:

DBCC PAGE (mydb, 1, 21074, 3)
DBCC PAGE (mydb, 1, 21076, 3)
DBCC PAGE (mydb, 1, 21105, 3)
DBCC PAGE (mydb, 1, 21107, 3)

FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  id (key)    KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  HEAP RID (key)     KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue

We see that the non-leaf nodes of the non-unique secondary index on nval contain record pointers (id (PRIMARY KEY CLUSTERED) and RID, appropriately), while those of the unique index on uval do not contain record pointers, only the values on the indexed column itself.

This is, again, because with a unique index the value of the column indexed is sufficient to locate its node in the index, while with a non-unique index it's not.

回复收藏 0 原文

命硬 2024-10-07 09:06:57

您在对其他人所说的内容（IT、B 树、索引结构）没有任何了解的情况下就他们所说的、所做的陈述提出问题。这是解答服务，而不是辅导服务。

“这听起来像是聚集键被添加到非唯一非聚集索引的（所有）中间节点”

不。Quassnoi 没有说过类似的话。你不能单独评估陈述（上下文中的答案；问题）。 CI 密钥仅适用于叶级别，不适用于“中间节点”。

“并且通过相同的逻辑，在非聚集表的情况下，RID 将添加到中间节点（？）”

逻辑？又不行了。确定大象的尾巴是由又厚又长的毛发制成的，并不意味着象鼻也是由毛发制成的。

问另一个关于非唯一、非聚集索引的非叶节点的问题。我对这个非问题感到有点困惑。

回答。对于您现在一致证明的理解水平，非聚集索引具有完整的聚集键值作为叶级别的条目。时期。故事结束。这没什么大不了的，因为 (a) 步骤数是相同的 (b) CI 索引（不是叶子）无论如何都会在缓存中，因此速度非常快，直到最后（叶子级别）才需要磁盘访问。

NCI 密钥查找，无 CI：
索引查找-> RID->数据行查找->数据行

NCI 键查找，使用 CI：
索引查找-> CI键->聚集索引查找->数据行

它的目的是什么？

性能。所有供应商都明白，由查询激活的功能链中最慢的组件是磁盘，它是唯一具有移动部件的组件。他们都尽力避免磁盘访问并提高性能。自 20 世纪 60 年代以来，索引本身就是避免磁盘访问的最基本结构。从那时起，基本的 B 树就没有改变过，它只是有上百万个微小的进步。

现在的问题是，虽然这是事实，但每个供应商都有（a）有自己的小特殊技术来增强（添加到，而不改变我的帖子中描述的基本操作）操作和（b）在 MicroShifty 中世界一直在变化，因为增强并不是真正的增强。重点是，这个极其低的水平与理解指数如何运作无关。或者 CI 或 NCI 是否适合您的特定用途；或各自的优点/缺点。

我已经确定，为了帮助你，不要涉足较低的层次，直到你了解基础知识，更高的层次......如果你这样做，你会迷失方向，这将成为你提出的意图的障碍的学习。正如这里所证明的。再次。

You are asking questions re what others have said without any understanding of the subject matter (IT; B-Trees; Index structures) re what they said, made statements about. This is an answer service, not a tutorial service.

"This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index"

No. Quassnoi said nothing of the sort. You cannot take statements (answers within a context; the question) and evaluate them in isolation. The CI key is only applicable to the leaf level, not the "intermediate nodes".

"And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)"

Logic ? No again. The determination that the elephants tail is made of thick, long hairs, does not imply that the trunk is made of hair also.

Ask another question re the non-leaf nodes of a non-unique, non-clustered index. I am getting a bit non-plussed about the non-issue.

Answer. For your now consistently evidenced level of understanding, the nonclustered index has the full clustered key value as the entry at the leaf level. Period. End of story. That is no big deal because (a) the number of steps are the same (b) the CI index (not leaf) will be in cache anyway, and thus very fast, without requiring disk access until the last (leaf level).

NCI key lookup, No CI:
Index lookup -> RID -> Data row lookup -> Data Row

NCI key lookup, with CI:
Index lookup -> CI key -> Clustered index lookup -> Data Row

What is the purpose of it?

Performance. All vendors understand that the slowest component in the chain of functions activated by a query, is the disk, the only component with moving parts. They all do their best to avoid disk access, and improve performance. The Index itself is the most basic structure for avoiding disk access, since the 1960's. The basic B-Tree has not changed since then, it has merely had a million tiny advancements.

Now the problem is, while that is true, each vendor has (a) has their own little special techniques that enhance (add to, without changing the basic operation as described in my posts to you) the operation and (b) in the MicroShifty world, it changes all the time, because the enhancements are, well, not really enhancements. Point being, that excruciatingly low level is not relevant to understanding how indices work; or whether a CI or NCI is good for your particular use; or the advantages/disadvantages of each.

I have already identified, in order to assist you, do not get involved in the lower levels, until you understand the basics, higher levels ... if you do, you will get lost, and it will be an obstacle to your presented intention of learning. As evidenced here. Again.

回复收藏 0 原文

~没有更多了~