SQL Server DELETE is slower with indexes
I have an SQL Server 2005 database, and I tried putting indexes on the appropriate fields in order to speed up the DELETE of records from a table with millions of rows (big_table has only 3 columns), but now the DELETE execution time is even longer! (1 hour versus 13 min, for example)
I have a relationship between two tables, and the column that I filter my DELETE by is in the other table. For example:
DELETE FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)
Btw, I've also tried:
DELETE FROM big_table
WHERE EXISTS
(SELECT 1 FROM small_table
WHERE small_table.id_product = big_table.id_product
AND small_table.id_category = 1)
and while it seems to run slightly faster than the first, it's still a lot slower with the indexes than without.
I created indexes on these fields:
big_table.id_product
small_table.id_product
small_table.id_category
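(A sketch of what such single-column index definitions typically look like; the index names and the NONCLUSTERED option here are assumptions:)
CREATE NONCLUSTERED INDEX IX_big_table_id_product ON big_table (id_product)        -- hypothetical name
CREATE NONCLUSTERED INDEX IX_small_table_id_product ON small_table (id_product)    -- hypothetical name
CREATE NONCLUSTERED INDEX IX_small_table_id_category ON small_table (id_category)  -- hypothetical name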
My .ldf file grows a lot during the DELETE.
Why are my DELETE queries slower when I have indexes on my tables? I thought they were supposed to run faster.
UPDATE
Okay, the consensus seems to be that indexes will slow down a huge DELETE because the index has to be updated. Still, I don't understand why it can't delete all the rows at once and just update the index once at the end.
I was under the impression from some of my reading that indexes would speed up a DELETE by making searches for fields in the WHERE clause faster.
"Indexes work just as well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements."
But later in the article, it says that too many indexes can hurt performance.
Answers to bobs' questions:
- 55 million rows in the table
- 42 million rows being deleted
- A similar SELECT statement would not run (Exception of type 'System.OutOfMemoryException' was thrown)
I tried the following 2 queries:
SELECT * FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)
SELECT * FROM big_table
INNER JOIN small_table
ON small_table.id_product = big_table.id_product
WHERE small_table.id_category = 1
Both failed after running for 25 min with this error message from SQL Server 2005:
An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown.
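(The "executing batch" wording suggests the exception most likely comes from the client tool buffering the full result set rather than from the server itself; a count-only version of the same check - a sketch - avoids returning the rows:)
SELECT COUNT(*) FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)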
The database server is an older dual-core Xeon machine with 7.5 GB RAM. It's my toy test database :) so it's not running anything else.
Do I need to do something special with my indexes after I CREATE them to make them work properly?
Comments (5)
Indexes make lookups faster - like the index at the back of a book.
Operations that change the data (like a DELETE) are slower, as they involve manipulating the indexes. Consider the same index at the back of the book. You have more work to do if you add, remove or change pages because you have to also update the index.
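(One way to get the "update the index once at the end" behaviour the question asks about - a sketch with a made-up index name, assuming a plain nonclustered index on big_table.id_product - is to drop the index before the mass DELETE and rebuild it once afterwards:)
-- IX_big_table_id_product is a hypothetical index name
DROP INDEX IX_big_table_id_product ON big_table

DELETE FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)

-- rebuild the index once, after all the rows are gone
CREATE NONCLUSTERED INDEX IX_big_table_id_product ON big_table (id_product)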
I agree with bobs' comment above - if you are deleting large volumes of data from large tables, deleting the index entries can take a while on top of deleting the data itself; it's the cost of doing business, though. As it deletes all that data out, you are causing reindexing events to happen.
With regard to the log file growth: if you aren't doing anything with your log files, you could switch to the simple recovery model; but I urge you to read up on the impact that might have on your IT department before you make the change.
If you need to do the delete in real time, it's often a good workaround to flag the data as inactive, either directly in the table or in another table, and exclude that data from queries; then come back later and delete the data when the users aren't staring at an hourglass. There is a second reason for covering this: if you are deleting lots of data out of the table (which is what I'm supposing based on your log file issue), then you will likely want to do an index defrag to reorganize the index; doing that out of hours is the way to go if you don't like users on the phone!
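(A sketch of those two pieces - switching to the simple recovery model and reorganizing the indexes after the big delete; MyTestDb is a placeholder database name:)
-- simple recovery keeps the log from growing as much, but gives up point-in-time restore
ALTER DATABASE MyTestDb SET RECOVERY SIMPLE

-- ... run the big DELETE here ...

-- reorganize (defragment) all indexes on the table afterwards
ALTER INDEX ALL ON big_table REORGANIZE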
JohnB is deleting about 75% of the data. I think the following would have been a possible solution and probably one of the faster ones. Instead of deleting the data, create a new table and insert the data that you need to keep. Create the indexes on that new table after inserting the data. Now drop the old table and rename the new one to the same name as the old one.
The above of course assumes that sufficient disk space is available to temporarily store the duplicated data.
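(A sketch of those steps, using NOT EXISTS to pick the rows to keep; big_table_new and the index name are placeholders:)
-- copy the rows to keep into a new table (SELECT ... INTO creates it)
SELECT *
INTO big_table_new
FROM big_table
WHERE NOT EXISTS
(SELECT 1 FROM small_table
WHERE small_table.id_product = big_table.id_product
AND small_table.id_category = 1)

-- index the new table after it is loaded
CREATE NONCLUSTERED INDEX IX_big_table_id_product ON big_table_new (id_product)

-- swap the tables
DROP TABLE big_table
EXEC sp_rename 'big_table_new', 'big_table'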
Try something like this to avoid a single bulk delete (and thereby avoid the log file growth):
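(A minimal sketch of a batched delete using DELETE TOP; the 10000 batch size is arbitrary, and the log space can be reused between batches once it is backed up or the database is in simple recovery:)
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM big_table
    WHERE big_table.id_product IN (
    SELECT small_table.id_product FROM small_table
    WHERE small_table.id_category = 1)

    IF @@ROWCOUNT = 0 BREAK
END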
You can also try the T-SQL extension to the DELETE syntax and check whether it improves performance:
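(Presumably this means the DELETE ... FROM ... JOIN form; a sketch:)
DELETE big_table
FROM big_table
INNER JOIN small_table
ON small_table.id_product = big_table.id_product
WHERE small_table.id_category = 1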