Improving the performance of SQL deletes

Posted on 2024-07-14 19:08:57

We have a query to remove some rows from the table based on an id field (primary key). It is a pretty straightforward query:

delete all from OUR_TABLE where ID in (123, 345, ...)

The problem is that the number of IDs can be huge (e.g. 70k), so the query takes a long time. Is there any way to optimize this?
(We are using Sybase, if that matters.)

Comments (8)

じ违心 2024-07-21 19:08:57

Consider running this in batches. A loop that deletes 1000 records at a time may be much faster than one query that does everything, and it will not keep the table locked against other users for as long at a stretch.

If you have cascade deletes (with lots of foreign key tables affected) or triggers involved, you may need to run in even smaller batches. You'll have to experiment to see which number is best for your situation. I've had tables where I had to delete in batches of 100 and others where 50000 worked (fortunate in that case, as I was deleting a million records).

But in any event, I would put the key values that I intend to delete into a temp table and delete from there.
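To make the batching idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Sybase, so the SQL dialect is SQLite's and will need adapting. The function name `delete_in_batches` and the staging table `ids_to_delete` are hypothetical names chosen for illustration.

```python
import sqlite3


def delete_in_batches(conn, ids, batch_size=1000):
    """Delete rows of OUR_TABLE whose ID is in `ids`, one batch per transaction.

    Returns the number of rows actually deleted from OUR_TABLE.
    """
    cur = conn.cursor()
    # Stage the keys in a temp table instead of building a huge IN (...) list.
    # `ids_to_delete` is a hypothetical name.
    cur.execute("CREATE TEMP TABLE ids_to_delete (ID INTEGER PRIMARY KEY)")
    cur.executemany(
        "INSERT OR IGNORE INTO ids_to_delete (ID) VALUES (?)",
        ((i,) for i in ids),
    )
    conn.commit()

    deleted = 0
    while True:
        # Take the lowest `batch_size` keys that are still staged.
        batch = [
            row[0]
            for row in cur.execute(
                "SELECT ID FROM ids_to_delete ORDER BY ID LIMIT ?", (batch_size,)
            )
        ]
        if not batch:
            break
        lo, hi = batch[0], batch[-1]
        cur.execute(
            "DELETE FROM OUR_TABLE WHERE ID IN "
            "(SELECT ID FROM ids_to_delete WHERE ID BETWEEN ? AND ?)",
            (lo, hi),
        )
        deleted += cur.rowcount
        # Remove the processed keys from the staging table.
        cur.execute("DELETE FROM ids_to_delete WHERE ID BETWEEN ? AND ?", (lo, hi))
        conn.commit()  # one short transaction per batch

    cur.execute("DROP TABLE ids_to_delete")
    return deleted
```

The batch size is the knob to experiment with. Committing after every batch is what keeps each transaction, and any locks it holds, short.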

别理我 2024-07-21 19:08:57

There are two ways to make statements like this one perform:

  1. Create a new table and copy all but the rows to delete. Swap the tables afterwards (alter table name ...). I suggest giving it a try even if it sounds stupid. Some databases are much faster at copying than at deleting.

  2. Partition your tables. Create N tables and use a view to join them into one. Sort the rows into different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.
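The first option (copy and swap) can be sketched as follows. This uses Python's sqlite3 module as a stand-in for Sybase, so the rename syntax is SQLite's. The `PAYLOAD` column, the names `OUR_TABLE_NEW` and `ids_to_delete`, and the function `rebuild_without` are hypothetical. Any indexes, constraints, triggers, or permissions on the original table would have to be recreated on the copy, which the sketch does not do.

```python
import sqlite3


def rebuild_without(conn, ids):
    """Replace OUR_TABLE with a copy that omits the rows whose ID is in `ids`."""
    cur = conn.cursor()
    # Stage the keys to drop (hypothetical temp table name).
    cur.execute("CREATE TEMP TABLE ids_to_delete (ID INTEGER PRIMARY KEY)")
    cur.executemany(
        "INSERT OR IGNORE INTO ids_to_delete (ID) VALUES (?)",
        ((i,) for i in ids),
    )
    # Create the replacement table; the column list is an assumed schema.
    cur.execute("CREATE TABLE OUR_TABLE_NEW (ID INTEGER PRIMARY KEY, PAYLOAD TEXT)")
    # Copy only the surviving rows.
    cur.execute(
        "INSERT INTO OUR_TABLE_NEW (ID, PAYLOAD) "
        "SELECT t.ID, t.PAYLOAD FROM OUR_TABLE t "
        "WHERE NOT EXISTS (SELECT 1 FROM ids_to_delete d WHERE d.ID = t.ID)"
    )
    # Swap: drop the original and give the copy its name.
    cur.execute("DROP TABLE OUR_TABLE")
    cur.execute("ALTER TABLE OUR_TABLE_NEW RENAME TO OUR_TABLE")
    cur.execute("DROP TABLE ids_to_delete")
    conn.commit()
```

This tends to pay off only when a large fraction of the table is being removed, because the cost is proportional to the rows that are kept.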

梦里兽 2024-07-21 19:08:57

I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?

染火枫林 2024-07-21 19:08:57

Can Sybase handle 70K arguments in an IN clause? All the databases I have worked with have some limit on the number of arguments in an IN clause. For example, Oracle has a limit of around 1000.

Can you use a subselect instead of the IN list? That will shorten the SQL. Maybe that helps with such a large number of values in the IN clause. Something like this:

  DELETE FROM OUR_TABLE WHERE ID IN 
        (SELECT ID FROM somewhere WHERE some_condition)

Deleting a large number of records can be sped up with some interventions in the database, if the database model permits. Here are some strategies:

  1. You can speed things up by dropping indexes, deleting the records, and recreating the indexes. This eliminates rebalancing the index trees while the records are being deleted.

    • drop all indexes on the table
    • delete records
    • recreate indexes

  2. If you have lots of relations to this table, try disabling constraints, but only if you are absolutely sure that the delete command will not break any integrity constraint. The delete will go much faster because the database won't be checking integrity. Enable the constraints after the delete.

    • disable integrity constraints, disable check constraints
    • delete records
    • enable constraints

  3. Disable triggers on the table, if you have any and if your business rules allow it. Delete the records, then enable the triggers.

  4. Last, do as others suggested: make a copy of the table that contains the rows that are not to be deleted, then drop the original, rename the copy, and recreate integrity constraints, if there are any.

I would try a combination of 1, 2 and 3. If that does not work, then 4. If everything is slow, I would look for a bigger box: more memory, faster disks.
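Strategy 1 can be sketched like this, again with Python's sqlite3 module standing in for Sybase. The function name `delete_with_index_rebuild` is hypothetical, and reading index definitions from `sqlite_master` is SQLite-specific. The sketch deliberately leaves the primary key alone, on the assumption that the delete still needs it to locate rows by ID. It only drops explicitly created secondary indexes.

```python
import sqlite3


def delete_with_index_rebuild(conn, ids):
    """Drop secondary indexes on OUR_TABLE, delete by ID, then recreate them.

    Returns the names of the indexes that were dropped and recreated.
    """
    cur = conn.cursor()
    # Remember the CREATE INDEX statements so the indexes can be rebuilt.
    # Automatically created indexes (e.g. for the primary key) have sql = NULL
    # in SQLite and are skipped.
    index_defs = cur.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'OUR_TABLE' AND sql IS NOT NULL"
    ).fetchall()

    for name, _ in index_defs:
        cur.execute('DROP INDEX "%s"' % name)

    # With the secondary indexes gone, each delete only touches the table itself.
    cur.executemany("DELETE FROM OUR_TABLE WHERE ID = ?", ((i,) for i in ids))

    # Rebuild every index once, after the delete has finished.
    for _, sql in index_defs:
        cur.execute(sql)
    conn.commit()
    return [name for name, _ in index_defs]
```

While the indexes are missing, other queries against the table may slow down badly, so this is best suited to a maintenance window.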

苦行僧 2024-07-21 19:08:57

Find out what is using up the performance!

In many cases you might use one of the solutions provided. But there might be others (based on Oracle knowledge, so things will be different on other databases. Edit: just saw that you mentioned Sybase):

  • Do you have foreign keys on that table? Make sure the referring IDs are indexed.
  • Do you have indexes on that table? Dropping them before the delete and recreating them afterwards might be faster.
  • Check the execution plan. Is it using an index where a full table scan might be faster? Or the other way round? Hints might help.
  • Instead of a select into new_table as suggested above, a create table as select might be even faster.

But remember: Find out what is using up the performance first.

When you are using DDL statements make sure you understand and accept the consequences it might have on transactions and backups.
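As an illustration of checking the execution plan before changing anything, here is a small sketch that uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module. This is SQLite's facility, used only as a runnable stand-in. Sybase has its own way of showing plans, which is not covered here. The helper name `explain` is hypothetical.

```python
import sqlite3


def explain(conn, sql, params=()):
    """Return SQLite's plan description lines for `sql` without executing it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OUR_TABLE (ID INTEGER PRIMARY KEY)")
    for line in explain(conn, "DELETE FROM OUR_TABLE WHERE ID IN (?, ?, ?)", (1, 2, 3)):
        print(line)
```

The exact wording of the plan lines depends on the SQLite version. What matters is whether the delete is driven by an index lookup or by a scan of the whole table.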

总攻大人 2024-07-21 19:08:57

Try sorting the IDs you are passing into "in" in the same order as the table or index is stored in. You may then get more hits on the disk cache.

Putting the IDs to be deleted into a temp table that has them sorted in the same order as the main table may let the database do a simple scan over the main table.

You could try using more than one connection and splitting the work over the connections so as to use all the CPUs on the database server. However, think first about what locks will be taken out.
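A minimal sketch of the sorting and splitting step, in plain Python: it sorts the IDs and cuts them into contiguous ranges, one per connection, so that each worker touches its own region of the key space. The function name `split_sorted` is hypothetical. Opening the connections and running the deletes is left out, and whether parallel deletes help at all depends on the locking behaviour mentioned above.

```python
def split_sorted(ids, workers):
    """Sort the IDs and split them into at most `workers` contiguous chunks."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    ordered = sorted(set(ids))  # key order, duplicates removed
    size, extra = divmod(len(ordered), workers)
    chunks = []
    start = 0
    for w in range(workers):
        # The first `extra` chunks get one additional ID each.
        end = start + size + (1 if w < extra else 0)
        if end > start:
            chunks.append(ordered[start:end])
        start = end
    return chunks
```

Each chunk could then be handed to one connection, for example as the input of a batched delete.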

七分※倦醒 2024-07-21 19:08:57

I also think that the temp table is likely the best solution.

If you were to do a "delete from .. where ID in (select id from ...)" it can still be slow with large queries, though. I thus suggest that you delete using a join - many people don't know about that functionality.

So, given this example table:

    -- set up tables for this example
    if exists (select id from sysobjects where name = 'OurTable' and type = 'U')
        drop table OurTable
    go

    create table OurTable (ID integer primary key not null)
    go
    insert into OurTable (ID) values (1)
    insert into OurTable (ID) values (2)
    insert into OurTable (ID) values (3)
    insert into OurTable (ID) values (4)
    go

We can then write our delete code as follows:

    create table #IDsToDelete (ID integer not null)
    go
    insert into #IDsToDelete (ID) values (2)
    insert into #IDsToDelete (ID) values (3)
    go
    -- ... etc ...
    -- Now do the delete - notice that we aren't using 'from'
    -- in the usual place for this delete
    delete OurTable from #IDsToDelete
       where OurTable.ID = #IDsToDelete.ID
    go
    drop table #IDsToDelete
    go
    -- This returns only items 1 and 4
    select * from OurTable order by ID
    go

会傲 2024-07-21 19:08:57

Does OUR_TABLE have any foreign key references with ON DELETE CASCADE?
