使用 SQL 更新 25 万行的更快方法
我需要更新表上大约 250k 行,并且要更新的每个字段将具有不同的值,具体取决于行本身(不是根据行 id 或键计算,而是在外部计算)。
我尝试使用参数化查询,但结果很慢(我仍然可以在 SQL Server 2008 中尝试使用表值参数 SqlDbType.Structured,但我希望有一种通用方法在多个数据库上执行此操作包括 MySql、Oracle 和 Firebird)。
进行大量的单独更新也很慢(但比使用参数化查询进行数千次单独调用(往返!)快大约 2 倍)
创建一个临时表并运行连接我的表和 tmp 表的更新怎么样?它会工作得更快吗?
I need to update about 250k rows on a table and each field to update will have a different value depending on the row itself (not calculated based on the row id or the key but externally).
I tried with a parametrized query but it turns out to be slow (I still can try with a table-value parameter, SqlDbType.Structured, in SQL Server 2008, but I'd like to have a general way to do it on several databases including MySql, Oracle and Firebird).
Making a huge concat of individual updates is also slow (BUT about 2 times faster than making thousands of individual calls (roundtrips!) using parametrized queries)
What about creating a temp table and running an update joining my table and the tmp one? Will it work faster?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
“慢”到底有多慢?
这样做的主要问题是,它会在数据库的日志文件中创建一个巨大的条目(如果更新中途发生电源故障,数据库需要记录每个操作,以便它可以失败时回滚)。这很可能是“缓慢”的根源,比其他任何事情都重要(尽管显然有如此大量的行,但还有其他方法可以使事情变得低效[例如,每次更新进行一次数据库往返将慢得难以忍受] ,我只是说一旦你消除了明显的东西,你仍然会发现它相当慢)。
有几种方法可以让您更有效地做到这一点。一种方法是分块进行更新,例如一次更新 1,000 行。这样,数据库就会写入大量小的日志条目,而不是一个非常大的日志条目。
另一种方法是在更新期间关闭或“关闭”数据库的日志记录。例如,在 SQL Server 中,您可以设置 恢复模型为“简单”或“批量更新”,这将大大加快速度(但需要注意的是,如果更新期间出现电源故障或其他问题,您将面临更大的风险)。
编辑 只是为了扩展一点,可能首先实际执行查询的最有效方法是将所有新行批量插入到临时表中,然后执行从中对现有表进行单个
UPDATE
(或者按照我上面所说的那样以 1,000 个块进行UPDATE
)。我的大部分答案都是在您像这样实现之后解决问题:您仍然会发现它非常慢......How slow is "slow"?
The main problem with this is that it would create an enormous entry in the database's log file (in case there's a power failure half-way through the update, the database needs to log each action so that it can rollback in the event of failure). This is most likely where the "slowness" is coming from, more than anything else (though obviously with such a large number of rows, there are other ways to make the thing inefficient [e.g. doing one DB roundtrip per update would be unbearably slow], I'm just saying once you eliminate the obvious things, you'll still find it's pretty slow).
There's a few ways you can do it more efficiently. One would be to do the update in chunks, 1,000 rows at a time, say. That way, the database writes lots of small log entries, rather than one really huge one.
Another way would be to turn off - or turn "down" - the database's logging for the duration of the update. In SQL Server, for example, you can set the Recovery Model to "simple" or "bulk update" which would speed it up considerably (with the caveat that you are more at risk if there's a power failure or something during the update).
Edit Just to expand a little more, probably the most efficient way to actually execute the queries in the first place would be to do a BULK INSERT of all the new rows into a temporary table, and then do a single
UPDATE
of the existing table from that (or to do theUPDATE
in chunks of 1,000 as I said above). Most of my answer was addressing the problem once you've implemented it like that: you'll still find it's pretty slow...如果可能的话调用存储过程
call a stored procedure if possible
如果更新的列是索引的一部分,您可以
如果您需要这些索引来检索数据,那么它没有帮助。
If the columns updated are part of indexes you could
If you need these indexes to retrieve the data, well, it doesn't help.
您应该使用设置了 KeepIdentities 标志的 SqlBulkCopy。
作为 SqlTransaction 的一部分,执行查询以选择所有需要更新的记录,然后删除它们,返回那些选定的(现在已删除)记录。将它们一次性读入 C#。既然您已经缩小了选择范围,则更新内存中 C# 端的记录,然后 SqlBulk 将这些更新的记录、键和所有内容复制回来。并且不要忘记提交交易。这是更多的工作,但速度非常快。
You should use the SqlBulkCopy with the KeepIdentities flag set.
As part of a SqlTransaction do a query to SELECT all the records that need updating and then DELETE THEM, returning those selected (and now removed) records. Read them into C# in a single batch. Update the records on the C# side in memory, now that you've narrowed the selection and then SqlBulkCopy those updated records back, keys and all. And don't forget to commit the transaction. It's more work, but it's very fast.
这就是我要做的:
优点:
缺点:
Here's what I would do:
Pros:
Cons:
以下是可能导致更新变慢的因素:
Here are things that can make your updates slow: