SQL Server 堆与聚集索引
我使用的是SQL Server 2008。我知道如果一个表没有聚集索引,那么它被称为堆,否则存储模型被称为聚集索引(B-Tree)。
我想更多地了解堆存储的确切含义、它的外观以及它是否组织为“堆”数据结构(例如最小堆、最大堆)。有什么推荐读物吗?我想要更多一点的内部结构,但不要太深。 :-)
提前致谢, 乔治
I am using SQL Server 2008. I know if a table has no clustered index, then it is called heap, or else the storage model is called clustered index (B-Tree).
I want to learn more about what exactly means heap storage, what it looks like and whether it is organized as "heap" data structure (e.g. minimal heap, maximum heap). Any recommended readings? I want to more a bit more internals, but not too deep. :-)
thanks in advance,
George
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
堆存储与这些堆无关。
堆只是意味着记录本身没有排序(即没有相互链接)。
当您插入一条记录时,它只会被插入到数据库找到的可用空间中。
更新基于堆的表中的行不会影响其他记录(尽管它会影响二级索引)
如果在
HEAP
表上创建二级索引,RID
(一种指向存储空间的物理指针)用作行指针。聚集索引意味着记录是
B-Tree
的一部分。当您插入记录时,B-Tree
需要重新链接。更新聚集表中的行会导致重新链接 B 树,即更新其他记录中的内部指针。
如果在聚集表上创建二级索引,则聚集索引键的值将用作行指针。
这意味着聚集索引应该是唯一的。如果聚集索引不唯一,则会将一个名为
uniquifier
的特殊隐藏列附加到索引键,以使其唯一(并且大小更大)。还值得注意的是,在列上创建二级索引会使值或聚集索引的键成为二级索引键的一部分。
通过在聚集表上创建索引,您实际上总是会得到一个复合索引。
索引
IX_mytable_5678
实际上是以下列上的索引:这还有一个副作用:
DESC 条件在
SQL Server
中有意义此索引:
可以在这样的查询中使用:
,而这个:
可以在这样的查询中使用:
Heap storage has nothing to do with these heaps.
Heap just means records themselves are not ordered (i. e. not linked to one another).
When you insert a record, it just gets inserted into the free space the database finds.
Updating a row in a heap based table does not affect other records (though it affects secondary indexes)
If you create a secondary index on a
HEAP
table, theRID
(a kind of a physical pointer to the storage space) is used as a row pointer.Clustered index means that the records are part of a
B-Tree
. When you insert a record, theB-Tree
needs to be relinked.Updating a row in a clustered table causes relinking of the B-Tree, i. e. updating internal pointers in other records.
If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.
This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called
uniquifier
is appended to the index key that makes if unique (and larger in size).It is also worth noting that creating a secondary index on a column makes the values or the clustered index's key to be the part of the secondayry index's key.
By creating an index on a clustered table, you in fact always get a composite index
Index
IX_mytable_5678
is in fact an index on the following columns:This has one more side effect:
A
DESC
condition in a single-column index on a clustered table makes sense inSQL Server
This index:
can be used in a query like this:
, while this one:
can be used in a query like this:
堆只是没有集群键的表 - 没有强制执行特定物理顺序的键。
我真的不建议在任何时候使用堆 - 除非您临时使用表来批量加载外部文件,然后将这些行分发到其他表。
在所有其他情况下,我强烈建议使用集群键。默认情况下,SQL Server 将使用主键作为集群键 - 在大多数情况下这是一个不错的选择。除非您使用 GUID (UNIQUEIDENTIFIER) 作为主键,在这种情况下使用它作为集群键是一个可怕的主意。
请参阅 Kimberly Tripp 的优秀博客文章 作为主键和/或集群键的 GUID 和 聚集索引争论仍在继续,提供了出色的解释,说明为什么您应该始终拥有聚集键,以及为什么 GUID 是一个糟糕的聚集键。
我的建议是:
INT IDENTITY
作为主键,并让 SQL Server 将其作为集群键/code> - 如果没有其他列可以使用,我什
Heaps are just tables without a clustering key - without a key that enforces a certain physical order.
I would not really recommend having heaps at any time - except maybe if you use a table temporarily to bulk-load an external file, and then distribute those rows to other tables.
In every other case, I would strongly recommend using a clustering key. SQL Server will use the Primary Key as the clustering key by default - which is a good choice, in most cases. UNLESS you use a GUID (UNIQUEIDENTIFIER) as your primary key, in which case using that as your clustering key is a horrible idea.
See Kimberly Tripp's excellent blog posts GUIDs as Primary and/or the clustering key and The Clustered Index Debate Continues for excellent explanations why you should always have a clustering key, and why a GUID is a horrible clustering key.
My recommendation would be:
INT IDENTITY
as your primary key and let SQL Server make that the clustering key as wellINT IDENTITY
- and I would even create a separate INT column just for that purpose, if no other column can be usedMarc
联机丛书是最好的来源!
整个数据库引擎 - 规划和架构 - 表和索引数据结构架构是非常好的内部介绍。
从此链接 您可以下载在线图书的本地副本(免费)。这是所有 Sql 2008 问题的最佳(也是官方)参考。
Books Online is the best source!
The whole Database Engine - Planning and Architecture - Tables and Index Data Structures Architecture is very good internal introduction.
From this link you can download a local copy of Books Online(it is free). It is the best (and official) reference to all Sql 2008 questions.