When does MS-SQL maintain table indexes?
For argument's sake, let's say it's for SQL 2005/8. I understand that when you place indexes on a table to tune SELECT statements, these indexes need maintaining during INSERT / UPDATE / DELETE actions.
My main question is this:
When will SQL Server maintain a table's indexes?
I have many subsequent questions:
I naively assume that it will do so after a command has executed. Say you are inserting 20 rows, it will maintain the index after 20 rows have been inserted and committed.
What happens in the situation where a script features multiple statements against a table, but they are otherwise distinct statements? Does the server have the intelligence to maintain the index after all statements are executed, or does it do it per statement?
I've seen situations where indexes are dropped and recreated after large / many INSERT / UPDATE actions.
This presumably incurs rebuilding the entire table's indexes, even if you only change a handful of rows? Would there be a performance benefit in attempting to collate INSERT and UPDATE actions into a larger batch, say by collecting rows to insert in a temporary table, as opposed to doing many smaller inserts? How would collating the rows above stack up against dropping an index versus taking the maintenance hit?
Sorry for the proliferation of questions - it's something I've always known to be mindful of, but when trying to tune a script to get a balance, I find I don't actually know when index maintenance occurs.
Edit: I understand that performance questions largely depend on the amount of data during the insert/update and the number of indexes. Again for argument's sake, I'd have two situations:
- An index-heavy table tuned for selects.
- An index-light table (PK only).
Both situations would have a large insert/update batch, say, 10k+ rows.
Edit 2: I'm aware of being able to profile a given script on a data set. However, profiling doesn't tell me why a given approach is faster than another. I am more interested in the theory behind the indexes and where performance issues stem, not a definitive "this is faster than that" answer.
Thanks.
When your statement (not even transaction) is completed, all your indexes are up to date. When you commit, all the changes become permanent and all locks are released. Doing otherwise would not be "intelligence"; it would violate integrity and possibly cause errors.
Edit: by "integrity" I mean this: once committed, the data should be immediately available to anyone. If the indexes are not up-to-date at that moment, someone may get incorrect results.
As you increase the batch size, your performance initially improves, then it slows down. You need to run your own benchmarks and find your optimal batch size. Similarly, you need to benchmark to determine whether it is faster to drop/recreate indexes or not.
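One common way to set up the drop/recreate side of that benchmark, without losing the index definition, is ALTER INDEX ... DISABLE followed by REBUILD. A minimal sketch (the table and index names here, dbo.Demo and IX_Demo_val, are made up for illustration):

```sql
-- Disable the nonclustered index before the bulk load.
-- Disabling, rather than dropping, keeps the index definition
-- in the catalog so you don't need the original CREATE script.
ALTER INDEX IX_Demo_val ON dbo.Demo DISABLE;

-- ... perform the large INSERT / UPDATE batch here ...

-- Rebuild once afterwards. Compare the total elapsed time against
-- leaving the index enabled and paying per-statement maintenance.
ALTER INDEX IX_Demo_val ON dbo.Demo REBUILD;
```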
Edit: if you insert/update/delete batches of rows in one statement, your indexes are modified once per statement. The following script demonstrates that:
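The original answer's script is not reproduced on this page. A sketch in the same spirit, using sys.dm_db_index_operational_stats (available since SQL 2005) to show each index absorbing a multi-row insert as part of the one statement; the table and index names are invented for the example:

```sql
-- Example table with a clustered PK and one nonclustered index.
CREATE TABLE dbo.Demo (id INT PRIMARY KEY, val INT);
CREATE INDEX IX_Demo_val ON dbo.Demo (val);

-- Insert 1000 rows in a single set-based statement.
INSERT INTO dbo.Demo (id, val)
SELECT n, n * 2
FROM (SELECT TOP (1000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS nums;

-- leaf_insert_count shows the rows landed in every index during
-- that one statement -- there is no separate later "maintenance" pass.
SELECT i.name, s.leaf_insert_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.Demo'),
                                       NULL, NULL) AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id;
```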