如何加快大型数据库表的删除速度?

发布于 2024-07-27 15:13:48 字数 602 浏览 8 评论 0原文

这是我试图解决的问题:我最近完成了数据层重新设计,它允许我跨多个分片平衡数据库的负载。 为了保持分片平衡,我需要能够将数据从一个分片迁移到另一个分片,这涉及到从分片 A 复制到分片 B,然后从分片 A 中删除记录。但是我有几个非常大的表,并且有许多外键指向它们,因此从表中删除一条记录可能需要一秒钟以上的时间。

在某些情况下,我需要从表中删除数百万条记录,但实际需要的时间太长。

禁用外键不是一个选项。 删除大批量的行也不是一个选择,因为这是一个生产应用程序,大量删除会锁定太多资源,从而导致失败。 我正在使用Sql Server,并且我了解分区表,但是分区的限制(以及企业版的许可费用)是如此不现实,以至于它们是不可能的。

当我开始解决这个问题时,我认为最困难的部分是编写算法来计算如何删除从叶级到数据模型顶部的行,以便在此过程中不会违反外键约束。 但解决这个问题对我没有任何好处,因为删除需要在一夜之间消失的记录需要花费数周的时间。

我已经构建了一种将数据标记为虚拟删除的方式,因此就应用程序而言,数据已经消失了,但由于数据的庞大规模,我仍在处理大型数据文件、大型备份和较慢的查询桌子。

有任何想法吗? 我已经在这里阅读了较旧的相关帖子,但没有发现任何有帮助的内容。

Here's the problem I am trying to solve: I have recently completed a data layer re-design that allows me to load-balance my database across multiple shards. In order to keep shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B, and then deleting the records from shard A. But I have several tables that are very big, and have many foreign keys pointed to them, so deleting a single record from the table can take more than one second.

In some cases I need to delete millions of records from the tables, and it just takes too long to be practical.

Disabling foreign keys is not an option. Deleting large batches of rows is also not an option because this is a production application and large deletes lock too many resources, causing failures. I'm using Sql Server, and I know about partitioned tables, but the restrictions on partitioning (and the license fees for enterprise edition) are so unrealistic that they are not possible.

When I began working on this problem I thought the hard part would be writing the algorithm that figures out how to delete rows from the leaf level up to the top of the data model, so that no foreign key constraints get violated along the way. But solving that problem did me no good since it takes weeks to delete records that need to disappear overnight.

I already built in a way to mark data as virtually deleted, so as far as the application is concerned, the data is gone, but I'm still dealing with large data files, large backups, and slower queries because of the sheer size of the tables.

Any ideas? I have already read older related posts here and found nothing that would help.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

友谊不毕业 2024-08-03 15:13:48

请参阅:优化 SQL Server 上的删除

您可能会感兴趣这篇 MS 支持文章:< a href="http://support.microsoft.com/kb/323630" rel="noreferrer">如何解决 SQL Server 中锁升级引起的阻塞问题:

将大批量操作分解为几个较小的操作。 为了
例如,假设您运行了以下命令
查询删除数百个
审计中的数千条旧记录
表,然后你发现它
导致锁升级并阻塞
其他用户:

从 LogMessages 中删除,其中 LogDate <   “2002 年 2 月 1 日”     
  

通过删除这些记录
一次一百个,你可以
大大减少了
每个事务累积的锁
并防止锁升级。 为了
示例:

设置行数 500 
  删除更多: 
       从 LogMessages 中删除,其中 LogDate <   “2002 年 2 月 1 日” 
  如果@@ROWCOUNT >   0 转到删除_更多 
  设置行数 0 
  

通过提高查询的效率来减少查询的锁占用空间
可能。
大型扫描或大型
书签查找的数量可能
增加锁定机会
升级; 此外,它还增加了
发生死锁的可能性,一般情况下
对并发性产生不利影响
性能。

Please see: Optimizing Delete on SQL Server

This MS support article might be of interest: How to resolve blocking problems that are caused by lock escalation in SQL Server:

Break up large batch operations into several smaller operations. For
example, suppose you ran the following
query to remove several hundred
thousand old records from an audit
table, and then you found that it
caused a lock escalation that blocked
other users:

DELETE FROM LogMessages WHERE LogDate < '2/1/2002'    

By removing these records a few
hundred at a time, you can
dramatically reduce the number of
locks that accumulate per transaction
and prevent lock escalation. For
example:

SET ROWCOUNT 500
delete_more:
     DELETE FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more
SET ROWCOUNT 0

Reduce the query's lock footprint by making the query as efficient as
possible.
Large scans or large
numbers of Bookmark Lookups may
increase the chance of lock
escalation; additionally, it increases
the chance of deadlocks, and generally
adversely affects concurrency and
performance.

波浪屿的海角声 2024-08-03 15:13:48
delete_more:
     DELETE TOP(500) FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more

您可以按照 Mitch 的建议使用 SET ROWCOUNT 获得相同的结果,但 根据 MSDN,在 SQL Server 的未来版本中,将不支持 DELETE 和其他一些操作:

使用 SET ROWCOUNT 不会影响 DELETE、INSERT 和 UPDATE
SQL Server 未来版本中的语句。 避免使用 SET ROWCOUNT
在新的开发工作中使用 DELETE、INSERT 和 UPDATE 语句,
并计划修改当前使用它的应用程序。 对于类似的
行为,使用 TOP 语法。 欲了解更多信息,请参阅顶部
(Transact-SQL)。

delete_more:
     DELETE TOP(500) FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more

You could achieve the same result using SET ROWCOUNT as suggested by Mitch but according to MSDN it won't be supported for DELETE and some other operations in future versions of SQL Server:

Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE
statements in a future release of SQL Server. Avoid using SET ROWCOUNT
with DELETE, INSERT, and UPDATE statements in new development work,
and plan to modify applications that currently use it. For a similar
behavior, use the TOP syntax. For more information, see TOP
(Transact-SQL).

晨曦慕雪 2024-08-03 15:13:48

您可以创建新文件,复制除“已删除”行之外的所有行,然后交换表上的名称。 最后,删除旧表。 如果您要删除大部分记录,那么这实际上可能会更快。

You could create new files, copy all but the "deleted" rows, then swap the names on the tables. Finally, drop the old tables. If you're deleting a large percentage of the records, then this may actually be faster.

北渚 2024-08-03 15:13:48

另一个建议是重命名表并添加状态列。 当状态 = 1(已删除)时,您将不希望它显示。 因此,您然后创建一个与原始表同名的视图,当状态为 null 或 = 0 时(取决于您如何实现它),该视图会从表中进行选择。 用户会立即看到删除操作,并且后台作业可以每十五分钟运行一次,删除记录,除了数据库管理员之外,任何人都不会意识到这一点。

Another suggestion is to rename the table and add a status column. When status = 1 (deleted), then you won't want it to show. So you then create a view with the same name as the orginal table which selects from the table when status is null or = 0 (depending on how you implement it). The deletion appears immediate to the user and a background job can run every fifteen minutes deleting records that runs without anyone other than the dbas being aaware of it.

谎言 2024-08-03 15:13:48

如果您使用的是 SQL 2005 或 2008,也许使用“快照隔离”会对您有所帮助。 它允许在进行底层数据更新操作处理时数据对用户保持可见,然后在提交后立即显示数据。 即使您的删除需要 30 分钟才能运行,您的应用程序也会在此期间保持在线状态。

以下是快照锁定的快速入门:

http://www.mssqltips.com/tip。 asp?tip=1081

虽然您仍然应该尝试加快删除速度,使其尽可能快,但这可能会减轻一些负担。

If you're using SQL 2005 or 2008, perhaps using "snapshot isolation" would help you. It allows the data to remain visible to users while there's an underlying data update operation processing, and then reveals the data as soon as it's committed. Even if you delete takes 30 minutes to run, your applications would stay online during this time.

Here's a quick primer of snapshot locking:

http://www.mssqltips.com/tip.asp?tip=1081

Though you should still try to speed up your delete so it's as quick as possible, this may alleviate some of the burden.

最终幸福 2024-08-03 15:13:48

您可以使用 while 循环删除小批量,如下所示:

DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
END

You can delete small batches using a while loop, something like this:

DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
END
浸婚纱 2024-08-03 15:13:48

如果表的很大一部分将符合删除条件(接近或超过 50%),那么创建一个包含不会被删除的记录的临时表会“更便宜”(与 WHERE 条件相反),截断原始表,然后用要保留的记录重新填充它。

DELETE FROM TABLE WHERE ROW_TO_DELETE = 'OK';
GO

-->

INSERT INTO #TABLE WHERE NOT ROW_TO_DELETE = 'OK';
TRUNCATE TABLE;
INSERT INTO TABLE (SELECT * FROM #TABLE);
GO

If a sizeable percentage of the table is going to match the deletion criteria (near or over 50%), then it is "cheaper" to create a temporary table with the records that are not going to be deleted (reverse the WHERE criteria), truncate the original table and then repopulate it with the records that were intended to be kept.

DELETE FROM TABLE WHERE ROW_TO_DELETE = 'OK';
GO

-->

INSERT INTO #TABLE WHERE NOT ROW_TO_DELETE = 'OK';
TRUNCATE TABLE;
INSERT INTO TABLE (SELECT * FROM #TABLE);
GO
生寂 2024-08-03 15:13:48

这是您问题的解决方案。

DECLARE @RC AS INT
SET @RC = -1

WHILE @RC <> 0
BEGIN
    DELETE TOP(1000000) FROM [Archive_CBO_ODS].[CBO].[AckItem] WHERE [AckItemId] >= 300
    SET @RC = @@ROWCOUNT
    --SET @RC = 0
END

here is the solution to your problem.

DECLARE @RC AS INT
SET @RC = -1

WHILE @RC <> 0
BEGIN
    DELETE TOP(1000000) FROM [Archive_CBO_ODS].[CBO].[AckItem] WHERE [AckItemId] >= 300
    SET @RC = @@ROWCOUNT
    --SET @RC = 0
END
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文