如何加快大型数据库表的删除速度?
这是我试图解决的问题:我最近完成了数据层重新设计,它允许我跨多个分片平衡数据库的负载。 为了保持分片平衡,我需要能够将数据从一个分片迁移到另一个分片,这涉及到从分片 A 复制到分片 B,然后从分片 A 中删除记录。但是我有几个非常大的表,并且有许多外键指向它们,因此从表中删除一条记录可能需要一秒钟以上的时间。
在某些情况下,我需要从表中删除数百万条记录,但实际需要的时间太长。
禁用外键不是一个选项。 删除大批量的行也不是一个选择,因为这是一个生产应用程序,大量删除会锁定太多资源,从而导致失败。 我正在使用Sql Server,并且我了解分区表,但是分区的限制(以及企业版的许可费用)是如此不现实,以至于它们是不可能的。
当我开始解决这个问题时,我认为最困难的部分是编写算法来计算如何删除从叶级到数据模型顶部的行,以便在此过程中不会违反外键约束。 但解决这个问题对我没有任何好处,因为删除需要在一夜之间消失的记录需要花费数周的时间。
我已经构建了一种将数据标记为虚拟删除的方式,因此就应用程序而言,数据已经消失了,但由于数据的庞大规模,我仍在处理大型数据文件、大型备份和较慢的查询桌子。
有任何想法吗? 我已经在这里阅读了较旧的相关帖子,但没有发现任何有帮助的内容。
Here's the problem I am trying to solve: I have recently completed a data layer re-design that allows me to load-balance my database across multiple shards. In order to keep shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B, and then deleting the records from shard A. But I have several tables that are very big, and have many foreign keys pointed to them, so deleting a single record from the table can take more than one second.
In some cases I need to delete millions of records from the tables, and it just takes too long to be practical.
Disabling foreign keys is not an option. Deleting large batches of rows is also not an option because this is a production application and large deletes lock too many resources, causing failures. I'm using Sql Server, and I know about partitioned tables, but the restrictions on partitioning (and the license fees for enterprise edition) are so unrealistic that they are not possible.
When I began working on this problem I thought the hard part would be writing the algorithm that figures out how to delete rows from the leaf level up to the top of the data model, so that no foreign key constraints get violated along the way. But solving that problem did me no good since it takes weeks to delete records that need to disappear overnight.
I already built in a way to mark data as virtually deleted, so as far as the application is concerned, the data is gone, but I'm still dealing with large data files, large backups, and slower queries because of the sheer size of the tables.
Any ideas? I have already read older related posts here and found nothing that would help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
请参阅:优化 SQL Server 上的删除
您可能会感兴趣这篇 MS 支持文章:< a href="http://support.microsoft.com/kb/323630" rel="noreferrer">如何解决 SQL Server 中锁升级引起的阻塞问题:
Please see: Optimizing Delete on SQL Server
This MS support article might be of interest: How to resolve blocking problems that are caused by lock escalation in SQL Server:
您可以按照 Mitch 的建议使用
SET ROWCOUNT
获得相同的结果,但 根据 MSDN,在 SQL Server 的未来版本中,将不支持DELETE
和其他一些操作:You could achieve the same result using
SET ROWCOUNT
as suggested by Mitch but according to MSDN it won't be supported forDELETE
and some other operations in future versions of SQL Server:您可以创建新文件,复制除“已删除”行之外的所有行,然后交换表上的名称。 最后,删除旧表。 如果您要删除大部分记录,那么这实际上可能会更快。
You could create new files, copy all but the "deleted" rows, then swap the names on the tables. Finally, drop the old tables. If you're deleting a large percentage of the records, then this may actually be faster.
另一个建议是重命名表并添加状态列。 当状态 = 1(已删除)时,您将不希望它显示。 因此,您然后创建一个与原始表同名的视图,当状态为 null 或 = 0 时(取决于您如何实现它),该视图会从表中进行选择。 用户会立即看到删除操作,并且后台作业可以每十五分钟运行一次,删除记录,除了数据库管理员之外,任何人都不会意识到这一点。
Another suggestion is to rename the table and add a status column. When status = 1 (deleted), then you won't want it to show. So you then create a view with the same name as the orginal table which selects from the table when status is null or = 0 (depending on how you implement it). The deletion appears immediate to the user and a background job can run every fifteen minutes deleting records that runs without anyone other than the dbas being aaware of it.
如果您使用的是 SQL 2005 或 2008,也许使用“快照隔离”会对您有所帮助。 它允许在进行底层数据更新操作处理时数据对用户保持可见,然后在提交后立即显示数据。 即使您的删除需要 30 分钟才能运行,您的应用程序也会在此期间保持在线状态。
以下是快照锁定的快速入门:
http://www.mssqltips.com/tip。 asp?tip=1081
虽然您仍然应该尝试加快删除速度,使其尽可能快,但这可能会减轻一些负担。
If you're using SQL 2005 or 2008, perhaps using "snapshot isolation" would help you. It allows the data to remain visible to users while there's an underlying data update operation processing, and then reveals the data as soon as it's committed. Even if you delete takes 30 minutes to run, your applications would stay online during this time.
Here's a quick primer of snapshot locking:
http://www.mssqltips.com/tip.asp?tip=1081
Though you should still try to speed up your delete so it's as quick as possible, this may alleviate some of the burden.
您可以使用 while 循环删除小批量,如下所示:
You can delete small batches using a while loop, something like this:
如果表的很大一部分将符合删除条件(接近或超过 50%),那么创建一个包含不会被删除的记录的临时表会“更便宜”(与 WHERE 条件相反),截断原始表,然后用要保留的记录重新填充它。
If a sizeable percentage of the table is going to match the deletion criteria (near or over 50%), then it is "cheaper" to create a temporary table with the records that are not going to be deleted (reverse the WHERE criteria), truncate the original table and then repopulate it with the records that were intended to be kept.
这是您问题的解决方案。
here is the solution to your problem.