SQL DELETE using a correlated subquery on a table with 42 million rows?



I have a table cats with 42,795,120 rows.

Apparently this is a lot of rows. So when I do:

/* owner_cats is a many-to-many join table */
DELETE FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)

the query times out :(

(edit: I need to increase my CommandTimeout value, default is only 30 seconds)

I can't use TRUNCATE TABLE cats because I don't want to blow away cats from other owners.

I'm using SQL Server 2005 with "Recovery model" set to "Simple."

So, I thought about doing something like this (executing this SQL from an application btw):

DELETE TOP (25) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)

DELETE TOP(50) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)

DELETE FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)

My question is: what is the threshold of the number of rows I can DELETE in SQL Server 2005?

Or, if my approach is not optimal, please suggest a better approach. Thanks.

This post didn't help me enough:

EDIT (8/6/2010):

Okay, I just realized after reading the above link again that I did not have indexes on these tables. Also, some of you have already pointed out that issue in the comments below. Keep in mind this is a fictitious schema, so even id_cat is not a PK, because in my real life schema, it's not a unique field.

I will put indexes on:

  1. cats.id_cat
  2. owner_cats.id_cat
  3. owner_cats.id_owner

I guess I'm still getting the hang of this data warehousing, and obviously I need indexes on all the JOIN fields right?

However, it takes hours for me to do this batch load process. I'm already doing it as a SqlBulkCopy (in chunks, not 42 mil all at once). I have some indexes and PKs. I read the following posts, which confirm my theory that indexes slow down even a bulk copy:

So I'm going to DROP my indexes before the copy and then re-CREATE them when it's done.
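Before dropping anything, something like this should list what's actually on those two tables (a quick sketch that only reads the catalog views, so nothing schema-specific is assumed):

SELECT o.name AS table_name, i.name AS index_name, i.type_desc
FROM sys.indexes i
JOIN sys.objects o ON o.object_id = i.object_id
WHERE o.name IN ('cats', 'owner_cats')
  AND i.name IS NOT NULL   -- heaps show up with a NULL index name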

Because of the long load times, it's going to take me awhile to test these suggestions. I'll report back with the results.

UPDATE (8/7/2010):

Tom suggested:

DELETE c
FROM cats c
WHERE EXISTS (SELECT 1
FROM owner_cats o
WHERE o.id_cat = c.id_cat
AND o.id_owner = 1)

And still with no indexes, for 42 million rows, it took 13:21 min:sec versus 22:08 with the way described above. However, for 13 million rows, it took 2:13 versus 2:10 with my old way. It's a neat idea, but I still need to use indexes!

Update (8/8/2010):

Something is terribly wrong! Now with the indexes on, my first delete query above took 1:09 hrs:min (yes, an hour!) for 42 mil rows versus 22:08 min:sec before, and 13:21 min:sec for 13 mil rows versus 2:10 min:sec before. I'm going to try Tom's query with the indexes now, but this is heading in the wrong direction. Please help.

Update (8/9/2010):

Tom's delete took 1:06 hrs:min for 42 mil rows and 10:50 min:sec for 13 mil rows with indexes, versus 13:21 min:sec and 2:13 min:sec without, respectively. Deletes on my database take an order of magnitude longer when I use indexes! I think I know why: my database .mdf and .ldf grew from 3.5 GB to 40.6 GB during the first (42 mil) delete! What am I doing wrong?
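(My guess at where the space went: even under the Simple recovery model, a single DELETE is one transaction, so all 42 million row deletions have to sit in the log until the statement commits. Something along these lines shows where the space is going; the log file name in the commented-out shrink is just a placeholder:)

-- How full the transaction log actually is, per database:
DBCC SQLPERF(LOGSPACE);

-- Data/log file sizes for the current database (size is in 8 KB pages):
SELECT name, type_desc, size * 8 / 1024 AS size_mb
FROM sys.database_files;

-- After the big delete commits, the log can be shrunk back down:
-- DBCC SHRINKFILE (MyDatabase_log, 1024);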

Update (8/10/2010):

For lack of any other options, I have come up with what I feel is a lackluster solution (hopefully temporary):

  1. Increase timeout for database connection to 1 hour (CommandTimeout=60000; default was 30 sec)
  2. Use Tom's query: DELETE FROM WHERE EXISTS (SELECT 1 ...) because it performed a little faster
  3. DROP all indexes and PKs before running delete statement (???)
  4. Run DELETE statement
  5. CREATE all indexes and PKs
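A rough sketch of steps 3-5 (the index names are placeholders for whatever my real schema uses):

-- 3. Drop the indexes (and any PK constraints) first -- hypothetical names:
DROP INDEX IX_cats_id_cat ON cats;
DROP INDEX IX_owner_cats_id_cat ON owner_cats;
DROP INDEX IX_owner_cats_id_owner ON owner_cats;

-- 4. Run the DELETE (Tom's EXISTS form):
DELETE c
FROM cats c
WHERE EXISTS (SELECT 1
              FROM owner_cats o
              WHERE o.id_cat = c.id_cat
                AND o.id_owner = 1);

-- 5. Recreate the indexes (and PKs) afterwards:
CREATE INDEX IX_cats_id_cat ON cats (id_cat);
CREATE INDEX IX_owner_cats_id_cat ON owner_cats (id_cat);
CREATE INDEX IX_owner_cats_id_owner ON owner_cats (id_owner);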

Seems crazy, but at least it's faster than using TRUNCATE and starting over my load from the beginning with the first owner_id, because one of my owner_id takes 2:30 hrs:min to load versus 17:22 min:sec for the delete process I just described with 42 mil rows. (Note: if my load process throws an exception, I start over for that owner_id, but I don't want to blow away previous owner_id, so I don't want to TRUNCATE the owner_cats table, which is why I'm trying to use DELETE.)

Any more help would still be appreciated :)

9 Answers

不即不离 2024-09-20 02:54:49


There is no practical threshold. It depends on what your command timeout is set to on your connection.

Keep in mind that the time it takes to delete all of these rows is contingent upon:

  • The time it takes to find the rows of interest
  • The time it takes to log the transaction in the transaction log
  • The time it takes to delete the index entries of interest
  • The time it takes to delete the actual rows of interest
  • The time it takes to wait for other processes to stop using the table so you can acquire what in this case will most likely be an exclusive table lock

The last point may often be the most significant. Do an sp_who2 command in another query window to make sure that there isn't lock contention going on, preventing your command from executing.
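For example (sp_who2 exists on SQL Server 2005, as does the DMV used in the second query; both are just ways to spot blocking):

-- Who is connected, and what each SPID is blocked by:
EXEC sp_who2;

-- Or, narrower: only the requests that are currently blocked, and by whom:
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;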

Improperly configured SQL Servers will do poorly at this type of query. Transaction logs which are too small and/or share the same disks as the data files will often incur severe performance penalties when working with large rows.

As for a solution, well, like all things, it depends. Is this something you intend to be doing often? Depending on how many rows you have left, the fastest way might be to rebuild the table as another name and then rename it and recreate its constraints, all inside a transaction. If this is just an ad-hoc thing, make sure your ADO CommandTimeout is set high enough and you can just bear the cost of this big delete.

檐上三寸雪 2024-09-20 02:54:49


If the delete will remove "a significant number" of rows from the table, this can be an alternative to a DELETE: put the records to keep somewhere else, truncate the original table, put back the 'keepers'. Something like:

SELECT *
INTO #cats_to_keep
FROM cats
WHERE cats.id_cat NOT IN (    -- note the NOT
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)

TRUNCATE TABLE cats

INSERT INTO cats
SELECT * FROM #cats_to_keep
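One thing to keep in mind with this pattern: TRUNCATE TABLE only works if no foreign key references cats, and it resets any IDENTITY seed, so if cats has an identity column the final INSERT ... SELECT needs SET IDENTITY_INSERT cats ON plus an explicit column list to put the original values back.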
满天都是小星星 2024-09-20 02:54:49


Have you tried dropping the subquery and using a join instead?

DELETE c
FROM
 cats c
 INNER JOIN owner_cats oc
 on c.id_cat = oc.id_cat
WHERE
   oc.id_owner = 1

And if you have, have you also tried different join hints? e.g.

DELETE c
FROM
 cats c
 INNER HASH JOIN owner_cats oc
 on c.id_cat = oc.id_cat
WHERE
   oc.id_owner = 1
邮友 2024-09-20 02:54:49


If you use an EXISTS rather than an IN, you should get much better performance. Try this:

DELETE c
  FROM cats c
 WHERE EXISTS (SELECT 1
                 FROM owner_cats o
                WHERE o.id_cat = c.id_cat
                  AND o.id_owner = 1)
油焖大侠 2024-09-20 02:54:49


There's no threshold as such - you can DELETE all the rows from any table given enough transaction log space - which is where your query is most likely falling over. If you're getting some results from your DELETE TOP (n) PERCENT FROM cats WHERE ... then you can wrap it in a loop as below:

SELECT 1
WHILE @@ROWCOUNT <> 0
BEGIN
 DELETE TOP (somevalue) PERCENT FROM cats
 WHERE cats.id_cat IN (
 SELECT owner_cats.id_cat FROM owner_cats
 WHERE owner_cats.id_owner = 1)
END
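A variant of the same loop, batching by a fixed row count instead of a percentage; under the Simple recovery model a CHECKPOINT between batches lets the log space from already-committed batches be reused, which should help with the log growth described in the question (the 50,000 batch size is just an arbitrary starting point):

DECLARE @batch int
SET @batch = 50000

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch) FROM cats
    WHERE cats.id_cat IN (
        SELECT owner_cats.id_cat FROM owner_cats
        WHERE owner_cats.id_owner = 1)

    IF @@ROWCOUNT = 0 BREAK

    CHECKPOINT   -- in SIMPLE recovery, frees up log space from the batch that just committed
END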
逆光飞翔i 2024-09-20 02:54:49


As others have mentioned, when you delete 42 million rows, the db has to log 42 million deletions against the database. Thus, the transaction log has to grow substantially. What you might try is to break up the delete into chunks. In the following query, I use the NTile ranking function to break up the rows into 100 buckets. If that is too slow, you can expand the number of buckets so that each delete is smaller. It will help tremendously if there is an index on owner_cats.id_owner, owner_cats.id_cat and cats.id_cat (which I assumed is the primary key and is numeric).

Declare @Cats Cursor
Declare @CatId int  --assuming an integer PK here
Declare @Start int
Declare @End int
Declare @GroupCount int

Set @GroupCount = 100

Set @Cats = Cursor Fast_Forward For
    With CatHerd As
        (
        Select cats.id_cat
            , NTile(@GroupCount) Over ( Order By cats.id_cat ) As Grp
        From cats
            Join owner_cats
                On owner_cats.id_cat = cats.id_cat
        Where owner_cats.id_owner = 1
        )
        Select Grp, Min(id_cat) As MinCat, Max(id_cat) As MaxCat
        From CatHerd
        Group By Grp
Open @Cats
Fetch Next From @Cats Into @CatId, @Start, @End

While @@Fetch_Status = 0
Begin
    Delete cats
    Where id_cat Between @Start And @End

    Fetch Next From @Cats Into @CatId, @Start, @End
End 

Close @Cats
Deallocate @Cats

The notable catch with the above approach is that it is not transactional. Thus, if it fails on the 40th chunk, you will have deleted 40% of the rows and the other 60% will still exist.
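One more catch worth flagging: the inner Delete filters only on id_cat Between @Start And @End, and the ownership condition lives only in the CTE, so any cats inside that id range that belong to other owners would be deleted as well. A guarded version of that statement (reusing the same cursor variables) could look like:

    Delete cats
    Where id_cat Between @Start And @End
      And Exists (Select 1
                  From owner_cats
                  Where owner_cats.id_cat = cats.id_cat
                    And owner_cats.id_owner = 1)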

瀟灑尐姊 2024-09-20 02:54:49


Might be worth trying MERGE e.g.

MERGE INTO cats 
   USING owner_cats
      ON cats.id_cat = owner_cats.id_cat
         AND owner_cats.id_owner = 1
WHEN MATCHED THEN DELETE;
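(Worth noting against the original question, though: MERGE was introduced in SQL Server 2008, so this option is not available on the SQL Server 2005 instance described above.)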
迷鸟归林 2024-09-20 02:54:49


<Edit> (9/28/2011)
My answer performs basically the same way as Thomas' solution (Aug 6 '10). I missed it when I posted my answer because it he uses an actual CURSOR so I thought to myself "bad" because of the # of records involved. However, when I reread his answer just now I realize that the WAY he uses the cursor is actually "good". Very clever. I just voted up his answer and will probably use his approach in the future. If you don't understand why, take a look at it again. If you still can't see it, post a comment on this answer and I will come back and try to explain in detail. I decided to leave my answer because someone may have a DBA who refuses to let them use an actual CURSOR regardless of how "good" it is. :-)
</Edit>

I realize that this question is a year old but I recently had a similar situation. I was trying to do "bulk" updates to a large table with a join to a different table, also fairly large. The problem was that the join was resulting in so many "joined records" that it took too long to process and could have led to contention problems. Since this was a one-time update I came up with the following "hack." I created a WHILE LOOP that went through the table to be updated and picked 50,000 records to update at a time. It looked something like this:

DECLARE @RecId bigint
DECLARE @NumRecs bigint
SET @NumRecs = (SELECT MAX(Id) FROM [TableToUpdate])
SET @RecId = 1
WHILE @RecId < @NumRecs
BEGIN
    UPDATE [TableToUpdate]
    SET UpdatedOn = GETDATE(),
        SomeColumn = t2.[ColumnInTable2]
    FROM    [TableToUpdate] t
    INNER JOIN [Table2] t2 ON t2.Name = t.DBAName 
        AND ISNULL(t.PhoneNumber,'') = t2.PhoneNumber 
        AND ISNULL(t.FaxNumber, '') = t2.FaxNumber
    LEFT JOIN [Address] d ON d.AddressId = t.DbaAddressId 
        AND ISNULL(d.Address1,'') = t2.DBAAddress1
        AND ISNULL(d.[State],'') = t2.DBAState
        AND ISNULL(d.PostalCode,'') = t2.DBAPostalCode
    WHERE t.Id BETWEEN @RecId AND (@RecId + 49999)
    SET @RecId = @RecId + 50000
END

Nothing fancy but it got the job done. Because it was only processing 50,000 records at a time, any locks that got created were short lived. Also, the optimizer realized that it did not have to do the entire table so it did a better job of picking an execution plan.

<Edit> (9/28/2011)
There is a HUGE caveat to the suggestion that has been mentioned here more than once and is posted all over the place around the web regarding copying the "good" records to a different table, doing a TRUNCATE (or DROP and reCREATE, or DROP and rename) and then repopulating the table.

You cannot do this if the table is the PK table in a PK-FK relationship (or other CONSTRAINT). Granted, you could DROP the relationship, do the clean up, and re-establish the relationship, but you would have to clean up the FK table, too. You can do that BEFORE re-establishing the relationship, which means more "down-time", or you can choose to not ENFORCE the CONSTRAINT on creation and clean up afterwards. I guess you could also clean up the FK table BEFORE you clean up the PK table. Bottom line is that you have to explicitly clean up the FK table, one way or the other.

My answer is a hybrid SET-based/quasi-CURSOR process. Another benefit of this method is that if the PK-FK relationship is setup to CASCADE DELETES you don't have to do the clean up I mention above because the server will take care of it for you. If your company/DBA discourage cascading deletes, you can ask that it be enabled only while this process is running and then disabled when it is finished. Depending on the permission levels of the account that runs the clean up, the ALTER statements to enable/disable cascading deletes can be tacked onto the beginning and the end of the SQL statement.
</Edit>
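To make that last point concrete: SQL Server has no standalone switch that toggles cascading on an existing foreign key, so "enabling it only while this process runs" generally means dropping the constraint and re-adding it with ON DELETE CASCADE, then reversing that afterwards. A sketch with hypothetical table and constraint names:

ALTER TABLE child_table DROP CONSTRAINT FK_child_parent;

ALTER TABLE child_table
    ADD CONSTRAINT FK_child_parent
    FOREIGN KEY (parent_id) REFERENCES parent_table (parent_id)
    ON DELETE CASCADE;

-- ... run the clean-up on the parent table here; matching child rows go with it ...

-- Afterwards, drop and re-add the constraint again without ON DELETE CASCADE
-- if cascading deletes are not wanted permanently.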

避讳 2024-09-20 02:54:49


Bill Karwin's answer to another question applies to my situation also:

"If your DELETE is intended to eliminate a great majority of the rows in that table, one thing that people often do is copy just the rows you want to keep to a duplicate table, and then use DROP TABLE or TRUNCATE to wipe out the original table much more quickly."

Matt in this answer says it this way:

"If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename."

ammoQ in this answer (from the same question) recommends (paraphrased):

  • issue a table lock when deleting a large amount of rows
  • put indexes on any foreign key columns
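A sketch of the "copy the keepers, drop, rename" idea against the schema in this question; index, PK and any foreign key re-creation is left as a comment because it depends on the real schema:

BEGIN TRAN

SELECT *
INTO cats_keep
FROM cats
WHERE cats.id_cat NOT IN (
    SELECT owner_cats.id_cat FROM owner_cats
    WHERE owner_cats.id_owner = 1)

DROP TABLE cats

EXEC sp_rename 'cats_keep', 'cats'

-- Recreate indexes, PKs and any foreign keys on the new cats table here

COMMIT TRAN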