SQL Server 堆与聚集索引

发布于 2024-08-03 13:44:58 字数 178 浏览 11 评论 0原文

我使用的是SQL Server 2008。我知道如果一个表没有聚集索引,那么它被称为堆,否则存储模型被称为聚集索引(B-Tree)。

我想更多地了解堆存储的确切含义、它的外观以及它是否组织为“堆”数据结构(例如最小堆、最大堆)。有什么推荐读物吗?我想要更多一点的内部结构,但不要太深。 :-)

提前致谢, 乔治

I am using SQL Server 2008. I know if a table has no clustered index, then it is called heap, or else the storage model is called clustered index (B-Tree).

I want to learn more about what exactly means heap storage, what it looks like and whether it is organized as "heap" data structure (e.g. minimal heap, maximum heap). Any recommended readings? I want to more a bit more internals, but not too deep. :-)

thanks in advance,
George

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

不顾 2024-08-10 13:44:58

堆存储与这些堆无关。

堆只是意味着记录本身没有排序(即没有相互链接)。

当您插入一条记录时,它只会被插入到数据库找到的可用空间中。

更新基于堆的表中的行不会影响其他记录(尽管它会影响二级索引)

如果在 HEAP 表上创建二级索引,RID(一种指向存储空间的物理指针)用作行指针。

聚集索引意味着记录是B-Tree的一部分。当您插入记录时,B-Tree 需要重新链接。

更新聚集表中的行会导致重新链接 B 树,即更新其他记录中的内部指针。

如果在聚集表上创建二级索引,则聚集索引键的值将用作行指针。

这意味着聚集索引应该是唯一的。如果聚集索引不唯一,则会将一个名为 uniquifier 的特殊隐藏列附加到索引键,以使其唯一(并且大小更大)。

还值得注意的是,在列上创建二级索引会使值或聚集索引的键成为二级索引键的一部分。

通过在聚集表上创建索引,您实际上总是会得到一个复合索引。

CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 (col1, col2, col3, col4)

CREATE INDEX IX_mytable_5678 (col5, col6, col7, col8)

索引 IX_mytable_5678 实际上是以下列上的索引:

col5
col6
col7
col8
col1
col2
col3
col4

这还有一个副作用:

DESC 条件在 SQL Server 中有意义

此索引:

CREATE INDEX IX_mytable ON mytable (col1)

可以在这样的查询中使用:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id

,而这个:

CREATE INDEX IX_mytable ON mytable (col1 DESC)

可以在这样的查询中使用:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id DESC

Heap storage has nothing to do with these heaps.

Heap just means records themselves are not ordered (i. e. not linked to one another).

When you insert a record, it just gets inserted into the free space the database finds.

Updating a row in a heap based table does not affect other records (though it affects secondary indexes)

If you create a secondary index on a HEAP table, the RID (a kind of a physical pointer to the storage space) is used as a row pointer.

Clustered index means that the records are part of a B-Tree. When you insert a record, the B-Tree needs to be relinked.

Updating a row in a clustered table causes relinking of the B-Tree, i. e. updating internal pointers in other records.

If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.

This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called uniquifier is appended to the index key that makes if unique (and larger in size).

It is also worth noting that creating a secondary index on a column makes the values or the clustered index's key to be the part of the secondayry index's key.

By creating an index on a clustered table, you in fact always get a composite index

CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 (col1, col2, col3, col4)

CREATE INDEX IX_mytable_5678 (col5, col6, col7, col8)

Index IX_mytable_5678 is in fact an index on the following columns:

col5
col6
col7
col8
col1
col2
col3
col4

This has one more side effect:

A DESC condition in a single-column index on a clustered table makes sense in SQL Server

This index:

CREATE INDEX IX_mytable ON mytable (col1)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id

, while this one:

CREATE INDEX IX_mytable ON mytable (col1 DESC)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id DESC
青衫儰鉨ミ守葔 2024-08-10 13:44:58

堆只是没有集群键的表 - 没有强制执行特定物理顺序的键。

我真的不建议在任何时候使用堆 - 除非您临时使用表来批量加载外部文件,然后将这些行分发到其他表。

在所有其他情况下,我强烈建议使用集群键。默认情况下,SQL Server 将使用主键作为集群键 - 在大多数情况下这是一个不错的选择。除非您使用 GUID (UNIQUEIDENTIFIER) 作为主键,在这种情况下使用它作为集群键是一个可怕的主意。

请参阅 Kimberly Tripp 的优秀博客文章 作为主键和/或集群键的 GUID聚集索引争论仍在继续,提供了出色的解释,说明为什么您应该始终拥有聚集键,以及为什么 GUID 是一个糟糕的聚集键。

我的建议是:

  • 在 99% 的情况下,尝试使用 INT IDENTITY 作为主键,并让 SQL Server 将其作为集群键
  • 例外 #1:如果您批量加载巨大的数据数据量很大,临时表没有主键/集群键可能没问题。
  • 例外#2:如果必须使用 GUID 作为主键,则将集群键设置为不同的列 - 最好是 INT IDENTITY< 至会为此目的创建一个单独的 INT 列

/code> - 如果没有其他列可以使用,我什

Heaps are just tables without a clustering key - without a key that enforces a certain physical order.

I would not really recommend having heaps at any time - except maybe if you use a table temporarily to bulk-load an external file, and then distribute those rows to other tables.

In every other case, I would strongly recommend using a clustering key. SQL Server will use the Primary Key as the clustering key by default - which is a good choice, in most cases. UNLESS you use a GUID (UNIQUEIDENTIFIER) as your primary key, in which case using that as your clustering key is a horrible idea.

See Kimberly Tripp's excellent blog posts GUIDs as Primary and/or the clustering key and The Clustered Index Debate Continues for excellent explanations why you should always have a clustering key, and why a GUID is a horrible clustering key.

My recommendation would be:

  • in 99% of all cases try to use a INT IDENTITY as your primary key and let SQL Server make that the clustering key as well
  • exception #1: if you're bulk loading huge data amounts, you might be fine without a primary / clustering key for your temporary table
  • exception #2: if you must use a GUID as your primary key, then set your clustering key to a different column - preferably a INT IDENTITY - and I would even create a separate INT column just for that purpose, if no other column can be used

Marc

你列表最软的妹 2024-08-10 13:44:58

联机丛书是最好的来源!

整个数据库引擎 - 规划和架构 - 表和索引数据结构架构是非常好的内部介绍。

从此链接 您可以下载在线图书的本地副本(免费)。这是所有 Sql 2008 问题的最佳(也是官方)参考。

Books Online is the best source!

The whole Database Engine - Planning and Architecture - Tables and Index Data Structures Architecture is very good internal introduction.

From this link you can download a local copy of Books Online(it is free). It is the best (and official) reference to all Sql 2008 questions.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文