为什么要向 NCI 上的所有(中间)节点添加聚集索引键?
考虑聚簇表,
Quassnoi 写道(答案中的最后一句话):
这听起来像是聚集键被添加到非唯一非聚集索引的(所有)中间节点。并且通过相同的逻辑,在非聚集表的情况下,RID 被添加到中间节点(?)
它的目的是什么?
更新:
目前这个问题有 9 票:-5,+4,从纯粹的匿名开始 -3), 正确答案与大多数 msdn 文档相矛盾。
它的价值实际上并不在于其本身,而在于如何解决此类与 SQL Server 内部结构相关的问题,这些问题要么是文档中描述的矛盾、不正确或不充分。
更新2:@Quassnoi,
感谢您的回答丰富了我调查自己而不问愚蠢问题的能力。
DBCC IND() 不输出 PageID。我不明白它的 PagePID(来自 DBCC IND 的输出)对应于 DBCC DBCC Page() 输出中的 PageID。
我对它们的使用(以及内部结构的研究/调查)或其他替代方案有更多疑问。我不确定为什么此类问题在这里被视为垃圾邮件。
您能为我推荐适合此类问题(关于 SQL Server 内部)的论坛/论坛吗?
Considering clustered table,
Quassnoi wrote (the last phrase in answer):
This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index. And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)
What is the purpose of it?
Update:
Currently this question has 9 votes: -5, +4, started from mere anonymous -3),
the correct answer contradicts to most msdn docs.
Its value is not in fact itself but in how to solve this kind of issues concerning of SQL Server internals either contradictory or incorrectly or insufficiently described in docs.
Update2: @Quassnoi,
thanks for your answer enriching my abilities to investigate myself without asking silly questions.
DBCC IND() does not output PageID. I undestood that its PagePID instead (from output of DBCC IND) corresponds to PageID in output of DBCC DBCC Page().
I have more questions on their use (and study/investigation of internals), or other alternatives. I am not sure why this type of questions is considered as spam here.
Can you advise me proper forums/board for this type of questions (on SQL Server internals)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
是的,这是真的。
这样做是为了提高索引的可维护性。
假设您在
column
上有一个辅助(非聚集)索引,其中1,000,000
记录的column = 1
并且想要删除其中一条记录。该记录也需要从索引中删除。
要找到要删除的记录,应在索引上执行
B-Tree
搜索。但如果分支节点没有存储行指针的值(无论是聚集键还是 RID),引擎将必须扫描所有 1M 记录以确定要删除的那个。如果辅助键是
UNIQUE
,则column
的值足以唯一地定位索引中的节点,因此不需要将行指针存储在分支节点中(这就是为什么它们不被存储)。您可能也会对此讨论感兴趣:
http://www.sqlservercentral.com/ Forums/Topic714684-1545-6.aspx
更新:
要检查分支节点的内容,可以使用
DBCC IND
:在这些命令的输出中您应该搜索
PageType = 2
(索引页)和IndexLevel > 的记录0
(非叶节点)并找到其PageID
。就我而言,我得到以下
PageID
:21074、21076、21105、21107。请注意,它们是特定于站点的:您将拥有其他值。然后你应该使用
DBCC PAGE
来检查这些页面的内容:我们看到
nval
上的非唯一二级索引的非叶节点包含记录指针(< code>id (PRIMARY KEY CLUSTERED) 和RID
,相应地),而uval
上的唯一索引不包含记录指针,仅包含索引列本身。这又是因为对于唯一索引,索引列的值足以在索引中定位其节点,而对于非唯一索引则不然。
Yes, this is true.
This is done to improve the maintainability of an index.
Say, you have a secondary (non-clustered) index on
column
,1,000,000
records withcolumn = 1
and want to delete one of these records.The record needs to be deleted from the index as well.
To locate the record to be deleted, a
B-Tree
search should be performed on the index. But if the branch nodes were not storing the value of the row pointer (be it a clustered key or aRID
), the engine would have to scan all1M
records to determine the one to delete.If the secondary key were
UNIQUE
, the value of thecolumn
would be enough to uniquely locate the node in the index, so storing the row pointer in the branch nodes is not required (and that's why they are not stored).This discussion may be also interesting to you:
http://www.sqlservercentral.com/Forums/Topic714684-1545-6.aspx
Update:
To check the contents of the branch nodes, you can use
DBCC IND
:In the output of these commands you should search for records with
PageType = 2
(index page) andIndexLevel > 0
(non-leaf node) and find theirPageID
.In my case, I got the following
PageID
: 21074, 21076, 21105, 21107. Note they are site specific: you will have the other values.Then you should used
DBCC PAGE
to examine the contents of these pages:We see that the non-leaf nodes of the non-unique secondary index on
nval
contain record pointers (id (PRIMARY KEY CLUSTERED)
andRID
, appropriately), while those of the unique index onuval
do not contain record pointers, only the values on the indexed column itself.This is, again, because with a unique index the value of the column indexed is sufficient to locate its node in the index, while with a non-unique index it's not.
您在对其他人所说的内容(IT、B 树、索引结构)没有任何了解的情况下就他们所说的、所做的陈述提出问题。这是解答服务,而不是辅导服务。
“这听起来像是聚集键被添加到非唯一非聚集索引的(所有)中间节点”
不。Quassnoi 没有说过类似的话。你不能单独评估陈述(上下文中的答案;问题)。 CI 密钥仅适用于叶级别,不适用于“中间节点”。
“并且通过相同的逻辑,在非聚集表的情况下,RID 将添加到中间节点(?)”
逻辑?又不行了。确定大象的尾巴是由又厚又长的毛发制成的,并不意味着象鼻也是由毛发制成的。
问另一个关于非唯一、非聚集索引的非叶节点的问题。我对这个非问题感到有点困惑。
回答。对于您现在一致证明的理解水平,非聚集索引具有完整的聚集键值作为叶级别的条目。时期。故事结束。这没什么大不了的,因为 (a) 步骤数是相同的 (b) CI 索引(不是叶子)无论如何都会在缓存中,因此速度非常快,直到最后(叶子级别)才需要磁盘访问。
NCI 密钥查找,无 CI:
索引查找-> RID->数据行查找->数据行
NCI 键查找,使用 CI:
索引查找-> CI键->聚集索引查找->数据行
它的目的是什么?
性能。所有供应商都明白,由查询激活的功能链中最慢的组件是磁盘,它是唯一具有移动部件的组件。他们都尽力避免磁盘访问并提高性能。自 20 世纪 60 年代以来,索引本身就是避免磁盘访问的最基本结构。从那时起,基本的 B 树就没有改变过,它只是有上百万个微小的进步。
现在的问题是,虽然这是事实,但每个供应商都有(a)有自己的小特殊技术来增强(添加到,而不改变我的帖子中描述的基本操作)操作和(b)在 MicroShifty 中世界一直在变化,因为增强并不是真正的增强。重点是,这个极其低的水平与理解指数如何运作无关。或者 CI 或 NCI 是否适合您的特定用途;或各自的优点/缺点。
我已经确定,为了帮助你,不要涉足较低的层次,直到你了解基础知识,更高的层次......如果你这样做,你会迷失方向,这将成为你提出的意图的障碍的学习。正如这里所证明的。再次。
You are asking questions re what others have said without any understanding of the subject matter (IT; B-Trees; Index structures) re what they said, made statements about. This is an answer service, not a tutorial service.
"This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index"
No. Quassnoi said nothing of the sort. You cannot take statements (answers within a context; the question) and evaluate them in isolation. The CI key is only applicable to the leaf level, not the "intermediate nodes".
"And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)"
Logic ? No again. The determination that the elephants tail is made of thick, long hairs, does not imply that the trunk is made of hair also.
Ask another question re the non-leaf nodes of a non-unique, non-clustered index. I am getting a bit non-plussed about the non-issue.
Answer. For your now consistently evidenced level of understanding, the nonclustered index has the full clustered key value as the entry at the leaf level. Period. End of story. That is no big deal because (a) the number of steps are the same (b) the CI index (not leaf) will be in cache anyway, and thus very fast, without requiring disk access until the last (leaf level).
NCI key lookup, No CI:
Index lookup -> RID -> Data row lookup -> Data Row
NCI key lookup, with CI:
Index lookup -> CI key -> Clustered index lookup -> Data Row
What is the purpose of it?
Performance. All vendors understand that the slowest component in the chain of functions activated by a query, is the disk, the only component with moving parts. They all do their best to avoid disk access, and improve performance. The Index itself is the most basic structure for avoiding disk access, since the 1960's. The basic B-Tree has not changed since then, it has merely had a million tiny advancements.
Now the problem is, while that is true, each vendor has (a) has their own little special techniques that enhance (add to, without changing the basic operation as described in my posts to you) the operation and (b) in the MicroShifty world, it changes all the time, because the enhancements are, well, not really enhancements. Point being, that excruciatingly low level is not relevant to understanding how indices work; or whether a CI or NCI is good for your particular use; or the advantages/disadvantages of each.
I have already identified, in order to assist you, do not get involved in the lower levels, until you understand the basics, higher levels ... if you do, you will get lost, and it will be an obstacle to your presented intention of learning. As evidenced here. Again.