与非聚集索引相比,外键列上的聚集索引是否会提高连接性能?

发布于 2024-08-24 15:31:47 字数 158 浏览 8 评论 0原文

在许多地方,建议在使用 BETWEEN 语句选择行范围时更好地利用聚集索引。当我选择通过外键字段加入以使用此聚集索引的方式时,我想,聚集也应该有所帮助,因为正在选择行范围,即使它们都具有相同的聚集键值并且不使用 BETWEEN 。

考虑到我只关心连接的选择而不是其他,我的猜测是否错误?

In many places it's recommended that clustered indexes are better utilized when used to select range of rows using BETWEEN statement. When I select joining by foreign key field in such a way that this clustered index is used, I guess, that clusterization should help too because range of rows is being selected even though they all have same clustered key value and BETWEEN is not used.

Considering that I care only about that one select with join and nothing else, am I wrong with my guess ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

缺⑴份安定 2024-08-31 15:31:47

绝对地讨论此类问题并不是很有用。

这总是具体情况而定!

本质上,通过聚集索引进行访问可以节省一次间接访问

假设 JOIN 中使用的键是聚集索引的键,在一次读取中 [无论是来自索引查找、扫描还是部分扫描,都无关紧要],您将获得整行(记录)。

聚集索引的一个问题是每个表只能获得一个索引。因此,您需要明智地使用它。事实上,在某些情况下,由于 INSERT 开销和碎片(取决于键和新键的顺序等),根本不使用任何聚集索引甚至是更明智的做法。

有时,人们可以获得聚集索引的同等好处,只需使用覆盖索引,即具有所需键序列的索引,后跟我们感兴趣的列值。就像聚集索引一样,覆盖索引不需要间接访问基础表。事实上,覆盖索引可能比聚集索引稍微更有效,因为它更小。
然而,就像聚集索引一样,除了存储开销之外,在 INSERT(以及 DELETE 或 UPDATE)查询期间还会产生与任何额外索引相关的性能成本

而且,是的,正如其他答案所示,用于聚集索引的键的“外键性”对索引的性能绝对没有影响。 FK 是旨在简化数据库完整性维护的约束,但底层字段(列)在其他方面与表中的任何其他字段一样。

要对索引结构做出明智的决策,需要

  • 了解各种索引类型(和堆)的工作方式
    (而且,顺便说一句,这在 SQL 实现之间有所不同),
  • 以便更好地了解手头数据库的统计概况:
    哪些是大表,哪些是关系,关系的平均/最大基数是多少,数据库的典型增长率是多少等,
  • 以便深入了解数据库的未来发展方式使用/查询

然后,也只有这样,人们才能对拥有给定聚集索引的兴趣(或缺乏兴趣)做出有根据的猜测。

Discussing this type of issue in the absolute isn't very useful.

It is always a case-by-case situation !

Essentially, access by way of a clustered index saves one indirection, period.

Assuming the key used in the JOIN, is that of the clustered index, in a single read [whether from an index seek or from a scan or partial scan, doesn't matter], you get the whole row (record).

One problem with clustered indexes, is that you only get one per table. Therefore you need to use it wisely. Indeed in some cases, it is even wiser not to use any clustered index at all because of INSERT overhead and fragmentation (depending on the key and the order of new keys etc.)

Sometimes one gets the equivalent benefits of a clustered index, with a covering index, i.e. a index with the desired key(s) sequence, followed by the column values we are interested in. Just like a clustered index, a covering index doesn't require the indirection to the underlying table. Indeed the covering index may be slightly more efficient than the clustered index, because it is smaller.
However, and also, just like clustered indexes, and aside from the storage overhead, there is a performance cost associated with any extra index, during INSERT (and DELETE or UPDATE) queries.

And, yes, as indicated in other answers, the "foreign-key-ness" of the key used for the clustered index, has absolutely no bearing on the the performance of the index. FKs are constraints aimed at easing the maintenance of the integrity of the database but the underlying fields (columns) are otherwise just like any other field in the table.

To make wise decisions about index structure, one needs

  • to understands the way the various index types (and the heap) work
    (and, BTW, this varies somewhat between SQL implementations)
  • to have a good image of the statistical profile of the database(s) at hand:
    which are the big tables, which are the relations, what's the average/maximum cardinality of relation, what's the typical growth rate of the database etc.
  • to have good insight regarding the way the database(s) is (are) going to be be used/queried

Then and only then, can one can make educated guesses about the interest [or lack thereof] to have a given clustered index.

谢绝鈎搭 2024-08-31 15:31:47

我会问其他问题:将我的聚集索引放在外键列上只是为了加快单个 JOIN 的速度是否明智?它可能会有所帮助,但是......要付出代价!

聚集索引使表的每个操作都更快。是的!确实如此。请参阅 Kim Tripp 的精彩聚集索引争论仍在继续 了解背景信息。她还提到了聚集索引的主要标准:

  • 静态(永远不会改变)
  • 唯一
  • 尽可能

:不断增加的INT IDENTITY 完美地满足了这一点 - GUID 则不然。请参阅 GUID 作为主要键获取广泛的背景信息。

为什么要缩小?因为聚集键被添加到同一个表上每个非聚集索引的每个索引页中(以便能够在需要时实际查找数据行) )。您不想在集群键中包含 VARCHAR(200)...

为什么是唯一的? 参见上文 - 集群键是 SQL Server 用于唯一查找数据的项目和机制排。它必须是独一无二的。如果您选择非唯一的集群键,SQL Server 本身会向您的键添加一个 4 字节的唯一符。小心一点!

所以这些是我的标准 - 将你的聚类键放在一个狭窄的、稳定的、独特的、希望不断增加的列上。如果您的外键列与这些相匹配 - 完美!

但是,在任何情况下我都不会不会将集群键放在宽外键甚至复合外键上。请记住:聚集键的值将被添加到该表上的每个非聚集索引条目中!如果您的表中有 10 个非聚集索引,即 100,000 行 - 即一百万个条目。无论是 4 字节整数还是 200 字节 VARCHAR - HUGE,都会产生巨大差异。不仅在磁盘上,也在服务器内存中。非常非常仔细地考虑如何创建聚集索引!

SQL Server 可能需要添加一个唯一符 - 让事情变得更糟。如果这些值发生变化,SQL Server 就必须在各处进行大量的簿记和更新。

简而言之:

  • 在外键上放置索引绝对是个好主意 - 一直这样做!
  • 我会非常小心地将其设为聚集索引。首先,你只能得到一个聚集索引,那么你会选择哪一种FK关系呢?并且不要将聚类键放在宽且不断变化的列上

I would ask something else: would it be wise to put my clustered index on a foreign key column just to speed a single JOIN up? It probably helps, but..... at a price!

A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:

  • narrow
  • static (never changes)
  • unique
  • if ever possible: ever increasing

INT IDENTITY fulfills this perfectly - GUID's do not. See GUID's as Primary Key for extensive background info.

Why narrow? Because the clustering key is added to each and every index page of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....

Why unique?? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to your keys. Be careful of that!

So those are my criteria - put your clustering key on a narrow, stable, unique, hopefully ever-increasing column. If your foreign key column matches those - perfect!

However, I would not under any circumstances put my clustering key on a wide or even compound foreign key. Remember: the value(s) of the clustering key are being added to each and every non-clustered index entry on that table! If you have 10 non-clustered indices, 100'000 rows in your table - that's one million entries. It makes a huge difference whether that's a 4-byte integer, or a 200-byte VARCHAR - HUGE. And not just on disk - in server memory as well. Think very very carefully about what to make your clustered index!

SQL Server might need to add a uniquifier - making things even worse. If the values will ever change, SQL Server would have to do a lot of bookkeeping and updating all over the place.

So in short:

  • putting an index on your foreign keys is definitely a great idea - do it all the time!
  • I would be very very careful about making that a clustered index. First of all, you only get one clustered index, so which FK relationship are you going to pick? And don't put the clustering key on a wide and constantly changing column
审判长 2024-08-31 15:31:47

FK 列上的索引将有助于 JOIN,因为索引本身是有序的:集群仅意味着磁盘(叶)上的数据是有序的,而不是 B 树上的。

如果将其更改为覆盖索引,那么聚集与非聚集就无关紧要了。重要的是要有一个有用的索引。

An index on the FK column will help the JOIN because the index itself is ordered: clustered just means that the data on disk (leaf) is ordered rather then the B-tree.

If you change it to a covering index, then clustered vs non-clustered is irrelevant. What's important is to have a useful index.

醉态萌生 2024-08-31 15:31:47

这取决于数据库的实现。

对于 SQL Server,聚集索引是一种数据结构,其中数据存储为页,并且有 B 树并存储为单独的数据结构。您获得快速性能的原因是您可以快速到达链的开始,并且范围是一个易于遵循的链接列表。

非聚集索引是一种数据结构,其中包含指向实际记录的指针以及不同的关注点。

请参阅有关聚集索引结构的文档。

索引对于外键关系没有帮助,但由于“覆盖”索引的概念,它会有所帮助。如果您的 WHERE 子句包含基于索引的约束。它将能够更快地生成返回的数据集。这就是性能的来源。

It depends on the database implementation.

For SQL Server, a clustered index is a data structure where the data is stored as pages and there are B-Trees and are stored as a separate data structure. The reason you get fast performance, is that you can get to the start of the chain quickly and ranges are an easy linked list to follow.

Non-Clustered indexes is a data structure that contains pointers to the actual records and as such different concerns.

Refer to the documentation regarding Clustered Index Structures.

An index will not help in relation to a Foreign Key relationship, but it will help due to the concept of "covered" index. If your WHERE clause contains a constraint based upon the index. it will be able to generate the returned data set faster. That is where the performance comes from.

甜妞爱困 2024-08-31 15:31:47

如果您在集群中按顺序选择数据,通常会获得性能提升。此外,它完全取决于表(数据)的大小和您的 Between 语句中的条件。

The performance gains usually come if you are selecting data sequentially within the cluster. Also, it depends entirely on the size of the table (data) and the conditions in your between statement.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文