在理解聚集索引时我错过了什么?
如果没有任何索引,则通过 IAM((索引分配映射)访问表行。
我可以使用 IAM 以编程方式直接访问行吗?
没有索引是否意味着读取特定行的唯一方法是全表扫描读取所有表?
为什么 IAM 不能参与更具体的直接访问?
“如果表是堆(换句话说,它没有聚集索引),则书签是行标识符 (RID),它是 File#:Page#:Slot# 形式的实际行定位符” [1a]
插槽没有进一步的定义。好吧,其他消息来源告诉我们 Slot# 实际上是行号。正确的?或者需要与 IAM 进一步并置以确定特定行?
现在,引入聚集索引意味着无法直接访问数据,只能通过最终的聚集索引查找或顺序遍历聚集叶节点。
我是否正确理解引入聚集索引仅对选择连续相邻(范围)行并且仅通过聚集索引键有益?
对表进行集群还有哪些好处?
我是否正确理解聚集索引的引入会恶化非聚集索引参与非精确匹配查询的性能优势?没有直接访问、顺序访问无法并行、非聚集索引由聚集索引键增加等等,对吗?
好吧,我发现对表进行聚类对于非常具体且易于理解的上下文是有意义的,而在对表进行聚类时总是默认创建主键。这是为什么呢?
我在理解聚集索引时错过了什么?
[1]
Microsoft® SQL Server™ 2005 内部:存储引擎
作者:Kalen Delaney -(扎实的优质学习)
................................................
出版商:微软出版社
发布日期:2006 年 10 月 11 日
打印 ISBN-10:0-7356-2105-5
打印 ISBN-13:978-0-7356-2105-3
页数:464
[1a] p.250 第 7 章中的索引组织。索引内部结构和管理
这里是有用的在线复制粘贴
http://sqlserverindexeorgnization.blogspot.com/
尽管没有任何来源
相关问题:
更新: @PerformanceDBA,
- “请忘记您引用的 doco,然后重新开始”
基于什么重新开始我?
任何参考,任何建议。技术如何重新开始?
- **“聚集索引总是更好”
你能回答我的问题为什么/何时/如何选择全聚集索引扫描而不是全表扫描? 疑问是全聚集索引扫描的含义是什么。不是比全表扫描读取的多吗?
- ““如果有 IAM,那么就有索引”
那么,如果根本没有索引,就没有 IAM 吗?
有CI就有IAM吗?
我该如何验证/研究它?
如果所有文档都写相反的内容:
- 非索引表上有 IAM
- 如果有聚集索引,则没有 IAM。
In absence of any index the table rows are accessed through IAM ((Index Allocation Map).
Can I directly access a row programmatically using IAM?
Does absence of index mean that the only way to read specific row is full table scan reading all table?
Why IAM cannot be engaged for more specific direct access?
"If the table is a heap (in other words, it has no clustered index), the bookmark is a row identifier (RID), which is an actual row locator in the form File#:Page#:Slot#" [1a]
There was no further definition of slot. Well, other sources tell that Slot# is really row number. Correct? or some further juxtaposing with IAM needed to determine specific row?
Now, introduction of clustered index means that no data can be directly accessed but only through eventually clustered index lookup or traversing clustered leaf nodes sequentially.
Do I understand correctly that introduction of clustered indexes is beneficial only for selecting continuous adjacent (ranges of) rows and only through clustered index keys?
Which else benefits are in clustering a table?
Do I understand correctly that clustered index introduction worsen the performance benefits of non-clustered indexes engagement for non-exact match queries? No direct access, sequential access cannot be parallelized, non-clustered indexes are increased by clustered index keys, etc., correct?
Well, I see that clustering a table makes sense for quite specific and well understood contexts while creation of primary keys always default in clustering a table. Why is it?
What do I miss in clustered indexes understanding?
[1]
Inside Microsoft® SQL Server™ 2005: The Storage Engine
By Kalen Delaney - (Solid Quality Learning)
...............................................
Publisher: Microsoft Press
Pub Date: October 11, 2006
Print ISBN-10: 0-7356-2105-5
Print ISBN-13: 978-0-7356-2105-3
Pages: 464
[1a] p.250 Section Index organization from Chapter 7. Index Internals and Management
Here is helpful online copypaste from it
http://sqlserverindexeorgnization.blogspot.com/
though without any credits to source
Related questions:
- No direct access to data row in clustered table - why?
- Why/when/how is whole clustered index scan chosen rather than full table scan?
- Reasons not to have a clustered index in SQL Server 2005
Update: @PerformanceDBA,
- "please, forget the doco you reference and start again"
Starting me again on the basis of what?
Any references, any advices. techniques how to start again?
- **"A Clustered Index is always better"
Can you answer my question Why/when/how is whole clustered index scan chosen rather than full table scan? The doubt is what is the meaning of Full Clustered Index Scan. Does not it read more than Full Table Scan?
- ""If there is an IAM, then there is an Index"
So, there is no IAM if there is no index at all?
There is IAM if there is CI?
How am I supposed to verify/study it?
if all docs write the opposite:
- there is IAM on non-indexed table
- there is no IAM if there is clustered index.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这是很多问题。是的,IAM 用于查找堆上的页面。不同之处在于,如果没有索引,就无法知道对于任何给定的数据片段要检索哪些页面。 SQL/关系数据模型的一个重要特征是查询仅通过数据值访问数据,而不是直接使用指针或其他结构。
槽号仅标识页面内的行。行数据在页内没有逻辑排序,即使在聚集索引中也是如此。每个数据页都包含一个行偏移表,该表指向页内行的位置。
由于需要额外的书签查找,聚集索引可能会减慢非聚集索引的数据访问速度。这可以通过使用 INCLUDE 子句将列添加到 NC 索引来缓解。有时,表上没有聚集索引可能会更有效。
That's a lot of questions. Yes the IAM is used to look up pages on a heap. The difference is that without an index there is no way to know what pages to retrieve for any given piece of data. An important feature of the SQL / relational model of data is that queries access data by data values only - never by using pointers or other structures directly.
A slot number just identifies a row within a page. Row data is not logically ordered within a page, even in a clustered index. Each data page contains a row offset table that points to the position of rows within a page.
A clustered index can slow down data access from nonclustered indexes because of the additional bookmark lookups required. This can be mitigated by using the INCLUDE clause to add columns to a NC index. Sometimes it may be more efficient not to have a clustered index on a table.
请首先阅读我在“无法直接访问聚集表中的数据行 - 为什么?”下的回答。。
如果有 IAM,那么就有索引。
但如果没有 CI,那么行就在堆中,是的,如果您想直接读取它(不使用 NCI,或者不存在索引),您只能对堆进行表扫描。
聚集索引总是比没有聚集索引好。对于异常或不合标准的情况,有一个例外和一个警告:
非唯一 CI 密钥。这会导致溢出页面。关系表需要具有唯一键,因此这不是关系表。通过超载列可以很容易地使 CI 变得唯一。非唯一 CI 仍然比没有 CI 更好(根据我的其他帖子)。
非唯一
单调键。通常是 IDENTITY 列。插入的键始终位于最后一页,而不是随机插入分布在整个数据存储结构中的行(这对于“良好”的自然关系键来说是正常的)。这会导致插入热点,并降低并发性。关系键应该是自然唯一的;代理始终是一个附加索引。仅代理根本就不是关系表(它是一组非标准化电子表格,行标识符将它们链接在一起;您将无法从中获得数据库的功能)。
.
因此,长期建议是,使用 NCI 作为单调密钥,并确保 CI 允许良好的数据分布。
CI 的优点是巨大的,没有充分的理由去拥有它(可能有上面提到的不好的理由)。
CI 允许范围查询; NCI 则不然。但这并不是唯一的原因。
另一个需要注意的是,您需要保持 CI 密钥的宽度较小,因为它是在 NCI 中携带的。现在通常这不是问题,因为宽 CI 键就可以了。但是,如果您有一堆伪装成数据库的非标准化电子表格,这会导致比标准化数据库多得多的索引,这确实成为一个考虑因素。因此,对 Empire 爱好者的长期建议是,保持 CI 键的宽度较小。 CI 不会“增加”NCI,这一点没有准确说明。如果您有 NCI,那么它将有一个指针或 CI 密钥;如果您有 CI(具有所有好处),那么 CI 键而不是 RowId 的成本可以忽略不计。所以准确的说法是,宽 CI 键会增加 NCI。
任何说 CI 的顺序访问不能并行化的人都是错误的(MS 可能会在一个版本中破坏它并在下一个版本中修复它,但这是暂时的)。
使用 ANSI SQL ...PRIMARY KEY ... 表示法默认为 UNIQUE CLUSTERED。因为数据库应该是关系型的。 Unique PK 应该是一个友好的关系键,而不是一个愚蠢的 IDENTITY 列。因此(不包括例外)主键总是聚类的最佳候选。
您始终可以通过避免 ANSI SQL ...PRIMARY KEY ... 表示法并使用 CREATE [UNIQUE] [CLUSTERED] INDEX 表示法来创建所需的任何索引。
不可能回答你的最后一个问题,你必须继续问问题,直到问完为止。但是请忘记您引用的 doco 并重新开始,否则我们将在这里花几天时间讨论清晰的知识和晦涩难懂的知识之间的区别。
Please read my answer under "No direct access to data row in clustered table - why?", first.
If there is an IAM, then there is an Index.
But if the is no CI, then the rows are in a Heap, and yes, if you want to read it directly (without using an NCI, or where no Indices exist), you can only table scan a Heap.
A Clustered Index is always better that not having one. There is one exception, and one caveat, both for abnormal or sub-standard conditions:
Non-Unique CI Key. This causes Overflow Pages. Relational tables are required to have unique keys, so this is not a Relational table. The CI can be made unique quite easily by overloading the columns. A Non-unique CI is still better (as per my other post) to have a Non-unique CI than no CI.
Monotonic Key. Typically an IDENTITY column. Instead of random Inserts which insert rows distributed throughout the data storage structure (as is normal with a "good" natural Relational key), the inserted Key is always on the last page. This causes an Insert hotspot, and reduces concurrency. Relational keys are supposed to naturally unique; the surrogate is always an additional index. A surrogate-only is simply not a relational table (it is group of Unnormalised spreadsheets with row identifiers linking them together; you will not get th epower of a databse from that).
.
So the standing advice is, use an NCI for monotonic keys, and ensure that the CI allows good data distribution.
The advantages of CIs are vast, there is no good reason to have one (there may be bad reasons as alluded to above).
CIs allow range queries; NCIs do not. But that is not the only reason.
Another caveat is you need to keep the width of the CI Key small, because it is carried in the NCIs. Now normally this is not a problem, as in, wide CI keys are fine. But where you have an Unormalise dbunch of spreadsheets masquerading as a database, which results in many more indices than a Normalised database, that does become a consideration. Therefore the standing advice for Empire devotees is, keep the width of the CI key down. CIs do not "increase" the NCIs, that is not stated accurately. If you have NCIs, well, it is going to have a pointer or a CI key; if you have a CI (with all the benefits) then the cost is, a CI key instead of a RowId, negligible. So the accurate statement is, fat wide CI keys increase the NCIs.
Whoever says sequential access of CIs cannot be parallelised is wrong (MS may break it in one version and fix it in the next, but that is transient).
Using the ANSI SQL ...PRIMARY KEY ... notation defaults to UNIQUE CLUSTERED. because the db is supposed to be Relational. And the Unique PK is supposed to be a nice friendly Relational key, not a idiotic IDENTITY column. Therefore invariably (not counting exceptions) the PRIMARY KEY is the best candidate for clustering.
You can always create whatever indices you want by avoiding the ANSI SQL ...PRIMARY KEY ... notation and using CREATE [UNIQUE] [CLUSTERED] INDEX notation instead.
It is not possible to answer that last question of yours, you will have to keep asking questions until you run out. But please, forget the doco you reference and start again, otherwise we will be here for days discussing the difference between clear knowledge and gobbledegook.