Will lookups on an IDENTITY column be faster if the address space is contiguous?

Posted 2024-11-14 06:51:51

If I have a very large table with an IDENTITY column (bigint) and this table is subject to deletes, will the fragmentation of the address space (the available IDs) result in slower SELECT statements?


Clarification:

By address space fragmentation I mean the gaps left in the values in the ID column, not the fragmentation on the disk that occurs when deleting rows from the table.

Comments (3)

非要怀念 2024-11-21 06:51:51

This can't be known for certain without running tests. However, I will speculate that lookups would run no faster or slower whether the IDs are contiguous or have gaps.

Assuming that your column is indexed: SQL Server stores indexes in a B-tree. The nodes of this tree are sized to fit the storage engine's pages. A search within a node identifies the correct child page regardless of gaps in the key values. Since the time it takes to load the pages is going to swamp the time it takes to search within the nodes, I don't think fragmentation will have any effect on lookup times.
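As a rough illustration of the point about searching within a node, here is a toy Python sketch. It is not SQL Server's implementation; `count_probes`, `worst_case_probes`, and the two key sets are made up for the example. It binary-searches a sorted key list, standing in for the keys on one index page, and counts comparisons. The count depends on how many keys there are, not on whether their values are contiguous.

```python
import random


def count_probes(sorted_keys, target):
    """Binary-search sorted_keys for target; return the number of comparisons made."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            break
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


def worst_case_probes(sorted_keys):
    """Largest number of comparisons needed to find any key that is present."""
    return max(count_probes(sorted_keys, key) for key in sorted_keys)


if __name__ == "__main__":
    n = 1000
    # Hypothetical key sets: same row count, different value distribution.
    contiguous = list(range(1, n + 1))                 # IDs 1..n, no gaps
    rng = random.Random(0)
    gapped = sorted(rng.sample(range(1, 10 * n), n))   # IDs with gaps between them
    print(worst_case_probes(contiguous), worst_case_probes(gapped))
```

Both calls report the same worst case, because binary search only looks at positions in the sorted list; the size of the gaps between neighbouring values never enters into it.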

If the column is not indexed, on the other hand, then SQL Server has to do a full table scan anyway, so the distribution of values is not going to affect the time at all.

The one place where it could have an effect is query plan optimization. SQL Server keeps histograms of column contents so that it can choose a query plan that will give adequate performance. It seems possible to me that gaps in the values could cause it to choose a plan that is not as good as another plan it might have considered. This would be the case if the column statistics are not up to date. If the statistics are up to date, the gaps would be reflected in them, and the optimizer could take that information into consideration.

纵性 2024-11-21 06:51:51

It depends on how the table is indexed. Assuming that you are also placing a clustered index on the column, any SELECT statements using that column should be very fast. In the extreme, it might require some extra pages being pulled into memory if you're doing range scans, but on an ID it's pretty rare to be doing something like that. Usually you're getting those rows by seeks.

You should still have a maintenance plan that cleans up any fragmentation, especially if you are doing a lot of deletes. I have to admit that this isn't my strong suit, though, so I don't know whether SQL Server 2008 and/or your storage medium might make this unnecessary.

心在旅行 2024-11-21 06:51:51

No.

If it is not an indexed column, the fragmentation certainly does not matter, because the engine does a table scan anyway. (Depending on the actual query or join, the engine does not necessarily do a table scan, but from the perspective of the column itself, the engine has nothing better to do than a table scan.)

If it is an indexed column, the index is stored in a tree structure that grows or shrinks as data is inserted or deleted. One thing you need to know about this tree structure is that it is "fragmented" even if you insert data sequentially. The fragmentation here is not in the sense of disk allocation units; rather, each node in the tree is not completely filled for the key range it covers. This slack is intentional, to avoid restructuring the tree too often. The engine uses an occupancy ratio when it rebuilds an index tree (it can be specified when the index is created; in SQL Server this is the fill factor). So, whether or not the IDs are contiguous, they are stored in a larger storage space with some "gaps" in it. Deleting IDs should not make any noticeable difference to performance.
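To make the occupancy point concrete, here is a back-of-the-envelope Python sketch. The function name `leaf_pages` and the figure of 100 rows per full page are assumptions for illustration. The fill percentage mirrors the idea of SQL Server's fill factor, but the arithmetic is a simplification, not the engine's actual allocation logic.

```python
import math


def leaf_pages(row_count, rows_per_full_page, fill_percent):
    """Estimate leaf pages needed when each page is filled to fill_percent (1-100)."""
    if not 1 <= fill_percent <= 100:
        raise ValueError("fill_percent must be between 1 and 100")
    # Rows actually placed on each page after leaving the requested free space.
    rows_per_page = max(1, rows_per_full_page * fill_percent // 100)
    return math.ceil(row_count / rows_per_page)


if __name__ == "__main__":
    # Hypothetical table: one million rows, 100 rows fit on a completely full page.
    for fill in (100, 80, 50):
        print(fill, leaf_pages(1_000_000, 100, fill))
```

Only the row count and the fill percentage enter the estimate. The ID values themselves, and any gaps between them, do not appear anywhere in the calculation.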

However, the database also allocates storage space in pages, which means there will be page-level fragmentation and related performance issues. But as for your question, this has nothing to do with gaps left by deleted ID values.
