Improving the performance of a SQL delete
We have a query that removes rows from a table based on an ID field (the primary key). It is a pretty straightforward query:

delete from OUR_TABLE where ID in (123, 345, ...)

The problem is that the number of IDs can be huge (e.g. 70k), so the query takes a long time. Is there any way to optimize this? (We are using Sybase, if that matters.)
Comments (8)
Consider running this in batches. A loop that deletes 1000 records at a time may be much faster than one query that does everything, and it will not keep the table locked against other users for as long at a stretch.

If cascading deletes (with many foreign-key tables affected) or triggers are involved, you may need to run in even smaller batches. You'll have to experiment to see which number works best for your situation. I've had tables where I had to delete in batches of 100, and others where 50,000 worked (fortunately, in that case I was deleting a million records).

But in any event, I would put the key values I intend to delete into a temp table and delete from there; a sketch follows below.
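For illustration, a minimal sketch of that approach in Sybase T-SQL, assuming the keys have been staged in a temp table (the name #ids_to_delete and the batch size are illustrative): set rowcount caps how many rows each DELETE touches, and the loop repeats until nothing is left.

create table #ids_to_delete (ID int not null)
-- load the 70k ids into #ids_to_delete here (bcp, inserts, or a select)

set rowcount 1000              -- each DELETE below removes at most 1000 rows
while 1 = 1
begin
    delete OUR_TABLE
    from OUR_TABLE, #ids_to_delete
    where OUR_TABLE.ID = #ids_to_delete.ID

    if @@rowcount = 0 break    -- nothing left to delete
end
set rowcount 0                 -- restore the default (no row limit)

If each iteration runs in its own transaction, batching also keeps locks short-lived and stops the transaction log from growing around one huge delete.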
There are two ways to make statements like this one perform:

1. Create a new table and copy all but the rows to delete, then swap the tables afterwards (alter table ... rename). I suggest giving it a try even though it sounds silly; some databases are much faster at copying than at deleting. A sketch follows this list.

2. Partition your tables. Create N tables and use a view to join them into one, sorting the rows into the different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.
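A minimal sketch of the first option in Sybase T-SQL (all names are illustrative, and select into typically requires the corresponding database option to be enabled):

select *
into OUR_TABLE_NEW
from OUR_TABLE
where ID not in (select ID from #ids_to_delete)   -- keep everything else

exec sp_rename 'OUR_TABLE', 'OUR_TABLE_OLD'       -- swap the tables
exec sp_rename 'OUR_TABLE_NEW', 'OUR_TABLE'
drop table OUR_TABLE_OLD
-- remember to recreate indexes, triggers and constraints on the new table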
I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?
Can Sybase handle 70K arguments in an IN clause? Every database I have worked with has some limit on the number of arguments an IN clause can take; Oracle, for example, has a limit of around 1000.

Can you use a subselect instead of the IN clause? That would shorten the SQL, and it might help with such a large number of values. Something like this:
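A sketch of what that could look like, assuming the IDs have been staged in a table (the name #ids_to_delete is illustrative), so the rows to delete are identified by a query rather than an enumerated list:

delete from OUR_TABLE
where ID in (select ID from #ids_to_delete)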
Deleting a large number of records can be sped up with some interventions in the database, if the data model permits. Here are some strategies:

1. You can speed things up by dropping indexes, deleting the records, and recreating the indexes again. This eliminates rebalancing the index trees while the records are deleted.

2. Disable the triggers on the table, if you have any and your business rules allow it. Delete the records, then re-enable the triggers.

3. Last, do as others have suggested: make a copy of the table containing the rows that are not to be deleted, drop the original, rename the copy, and recreate the integrity constraints, if there are any.

I would try a combination of 1 and 2; if that does not work, then 3 (a sketch of 1 and 2 follows below). If everything is slow, I would look for a bigger box: more memory, faster disks.
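As a sketch of strategies 1 and 2 in Sybase T-SQL (the index and temp-table names are illustrative, and the alter table ... disable trigger form assumes an ASE version that supports it):

drop index OUR_TABLE.idx_our_table_name     -- 1: no index rebalancing per row

alter table OUR_TABLE disable trigger       -- 2: skip trigger work, if allowed

delete OUR_TABLE
from OUR_TABLE, #ids_to_delete
where OUR_TABLE.ID = #ids_to_delete.ID

alter table OUR_TABLE enable trigger        -- restore everything afterwards
create index idx_our_table_name on OUR_TABLE (NAME)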
Find out what is using up the performance!

In many cases you might use one of the solutions already provided, but there might be others (this is based on Oracle knowledge, so things will be different on other databases; edit: I just saw that you mentioned Sybase).

But remember: find out what is using up the performance first.

When you use DDL statements, make sure you understand and accept the consequences they can have for transactions and backups.
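In Sybase ASE, for example, a trial run with the plan and cost output switched on shows where the time actually goes (a minimal sketch; the short IN list stands in for a small test batch):

set showplan on
set statistics io on
set statistics time on

delete from OUR_TABLE where ID in (123, 345)   -- small trial run

set statistics time off
set statistics io off
set showplan off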
Try sorting the IDs you pass into the IN clause into the same order in which the table, or its index, is stored. You may then get more hits in the disk cache.

Putting the IDs to be deleted into a temp table, sorted in the same order as the main table, may let the database do a simple sequential scan over the main table; a sketch follows below.

You could also try using more than one connection and splitting the work across the connections, so as to use all the CPUs on the database server; think first about which locks will be taken out, though.
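A minimal sketch of the sorted staging idea in Sybase T-SQL, assuming OUR_TABLE is clustered on ID (the temp-table and index names are illustrative). The clustered index stores the staged keys in key order, so the join can walk both tables sequentially:

create table #ids_sorted (ID int not null)
-- load the ids to delete, then store them in key order:
create clustered index ix_ids_sorted on #ids_sorted (ID)

delete OUR_TABLE
from OUR_TABLE, #ids_sorted
where OUR_TABLE.ID = #ids_sorted.ID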
I also think that the temp table is likely the best solution.

If you were to do a "delete from .. where ID in (select id from ...)", it can still be slow with large queries, though. I therefore suggest that you delete using a join; many people don't know about that functionality.
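So, given an example table and a table holding the IDs to remove (a hedged reconstruction; the definitions and names are illustrative):

create table OUR_TABLE  (ID int primary key, NAME varchar(50) null)
create table DELETE_IDS (ID int primary key)

we can then write the delete as a join, in Sybase/Transact-SQL syntax:

delete OUR_TABLE
from OUR_TABLE, DELETE_IDS
where OUR_TABLE.ID = DELETE_IDS.ID

This removes exactly the rows of OUR_TABLE whose ID appears in DELETE_IDS, without the server having to parse a 70k-item IN list.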
Does OUR_TABLE have a reference with ON DELETE CASCADE on it?