A faster way to update 250k rows with SQL

Posted 2024-08-30 03:26:49

I need to update about 250k rows on a table and each field to update will have a different value depending on the row itself (not calculated based on the row id or the key but externally).

I tried a parameterized query, but it turned out to be slow (I could still try a table-valued parameter, SqlDbType.Structured, in SQL Server 2008, but I'd like a general way to do this across several databases, including MySQL, Oracle and Firebird).

Concatenating the individual updates into one huge batch is also slow (but about 2 times faster than making thousands of individual calls (round trips!) with parameterized queries).

What about creating a temp table and running an UPDATE that joins my table and the tmp one? Would that be faster?
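For reference, the temp-table idea might look like the following in SQL Server syntax; my_table(id, value) and the staging column names are stand-ins for the real schema, and the UPDATE ... FROM join syntax varies slightly across the databases I mentioned (MySQL uses UPDATE ... JOIN, Oracle a MERGE or correlated subquery):

    -- Sketch: stage the externally computed values, then update with one join.
    CREATE TABLE #tmp_updates (
        id        INT PRIMARY KEY,   -- key of the row to change
        new_value VARCHAR(100)       -- value computed externally
    );

    -- Bulk-load the ~250k (id, new_value) pairs here, e.g. with SqlBulkCopy.

    UPDATE t
    SET    t.value = u.new_value
    FROM   my_table t
    JOIN   #tmp_updates u ON u.id = t.id;

    DROP TABLE #tmp_updates;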

Comments (6)

审判长 2024-09-06 03:26:49

How slow is "slow"?

The main problem with this is that it would create an enormous entry in the database's log file (in case there's a power failure halfway through the update, the database needs to log each action so that it can roll back in the event of failure). This is most likely where the "slowness" is coming from, more than anything else (though obviously with such a large number of rows there are other ways to make the thing inefficient [e.g. doing one DB round trip per update would be unbearably slow]; I'm just saying that once you eliminate the obvious things, you'll still find it's pretty slow).

There are a few ways you can do it more efficiently. One would be to do the update in chunks, say 1,000 rows at a time (sketched below). That way, the database writes lots of small log entries rather than one really huge one.

Another way would be to turn off, or turn "down", the database's logging for the duration of the update. In SQL Server, for example, you can set the recovery model to "Simple" or "Bulk-logged", which would speed it up considerably (with the caveat that you are more at risk if there's a power failure or something during the update).

Edit: Just to expand a little more, probably the most efficient way to actually execute the queries in the first place would be to BULK INSERT all the new rows into a temporary table, and then do a single UPDATE of the existing table from that (or do the UPDATE in chunks of 1,000 as I said above). Most of my answer was addressing the problem once you've implemented it like that: you'll still find it's pretty slow...
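To illustrate the chunking idea, here's roughly how it might look in T-SQL - a sketch only, assuming a staging table #stage(id, new_value) holding the externally computed values:

    -- Drain the staging table 1,000 rows at a time; in autocommit mode each
    -- batch is its own transaction, so the log gets many small entries.
    DECLARE @batch TABLE (id INT PRIMARY KEY, new_value VARCHAR(100));

    WHILE 1 = 1
    BEGIN
        DELETE FROM @batch;

        -- Take the next 1,000 staged rows, capturing them as they're removed.
        DELETE TOP (1000) FROM #stage
        OUTPUT deleted.id, deleted.new_value INTO @batch;

        IF @@ROWCOUNT = 0 BREAK;   -- staging table drained: done

        UPDATE t
        SET    t.value = b.new_value
        FROM   my_table t
        JOIN   @batch b ON b.id = t.id;
    END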

浮萍、无处依 2024-09-06 03:26:49

Call a stored procedure if possible.
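To connect this with the table-valued parameter idea from the question: on SQL Server 2008+ a stored procedure could take all the new values in one call. A minimal sketch, with RowChange and apply_changes as made-up names:

    -- One round trip carrying every (id, new_value) pair.
    CREATE TYPE RowChange AS TABLE (
        id        INT PRIMARY KEY,
        new_value VARCHAR(100)
    );
    GO

    CREATE PROCEDURE apply_changes
        @changes RowChange READONLY   -- filled client-side via SqlDbType.Structured
    AS
        UPDATE t
        SET    t.value = c.new_value
        FROM   my_table t
        JOIN   @changes c ON c.id = t.id;
    GO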

新雨望断虹 2024-09-06 03:26:49

If the columns updated are part of indexes, you could

  • drop these indexes,
  • do the update, and
  • re-create the indexes.

If you need these indexes to retrieve the data, well, it doesn't help.
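A sketch of that sequence in SQL Server syntax, assuming a single nonclustered index ix_my_table_value on the updated column (all names illustrative):

    -- Pay the index-maintenance cost once, at the end, instead of per row.
    DROP INDEX ix_my_table_value ON my_table;

    UPDATE t
    SET    t.value = u.new_value
    FROM   my_table t
    JOIN   #tmp_updates u ON u.id = t.id;

    CREATE INDEX ix_my_table_value ON my_table (value);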

不疑不惑不回忆 2024-09-06 03:26:49

You should use SqlBulkCopy with the SqlBulkCopyOptions.KeepIdentity flag set.

As part of a SqlTransaction, do a query to SELECT all the records that need updating and then DELETE them, returning those selected (and now removed) records. Read them into C# in a single batch. Update the records in memory on the C# side, now that you've narrowed the selection, and then SqlBulkCopy those updated records back, keys and all. And don't forget to commit the transaction. It's more work, but it's very fast.
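The server-side half of that can be a single statement: DELETE ... OUTPUT removes the rows and returns them to the client in one go. A sketch, assuming a staging table #rows_to_change(id) listing the target keys; the C# read/modify/bulk-copy steps appear only as comments:

    BEGIN TRANSACTION;

    -- Select-and-delete in one statement; deleted rows stream back as a result set.
    DELETE FROM my_table
    OUTPUT deleted.*
    WHERE  id IN (SELECT id FROM #rows_to_change);

    -- Client side, on the same connection/transaction: read the result set,
    -- update the rows in memory, SqlBulkCopy them back into my_table with
    -- SqlBulkCopyOptions.KeepIdentity, then:

    COMMIT TRANSACTION;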

一紙繁鸢 2024-09-06 03:26:49

Here's what I would do:

  1. Retrieve the entire table, that is, the columns you need in order to calculate/retrieve/find/produce the changes externally
  2. Calculate/produce those changes
  3. Run a bulk insert into a temporary table, uploading the information you need server-side in order to make the changes. This requires the key information plus the new values for all the rows you intend to change.
  4. Run SQL on the server to copy the new values from the temporary table into the production table (see the sketch after the pros/cons below).

Pros:

  • Running the final step server-side is faster than running tons and tons of individual SQL, so you're going to lock the table in question for a shorter time
  • Bulk insert like this is fast

Cons:

  • Requires extra space in your database for the temporary table
  • Produces more log data, logging both the bulk insert and the changes to the production table
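A sketch of step 4, using MERGE as the closest thing to a portable formulation among the databases named in the question (SQL Server 2008, Oracle and Firebird support it; MySQL would need UPDATE ... JOIN instead); table and column names are assumptions:

    -- Copy the new values from the staging table into the production table.
    MERGE INTO my_table t
    USING tmp_updates u ON (t.id = u.id)
    WHEN MATCHED THEN
        UPDATE SET t.value = u.new_value;
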
蓝眸 2024-09-06 03:26:49

Here are things that can make your updates slow:

  • executing updates one by one through a parameterized query
    • solution: do the update in one statement
  • a large transaction creates a big log entry
  • updating indexes (the RDBMS will update the index after each row; if you change an indexed column, this can be very costly on a large table)
    • if you can, drop the indexes before the update and recreate them after
  • updating a field that has a foreign key constraint - for each changed record the RDBMS will go and look for the appropriate key
    • if you can, disable foreign key constraints before the update and enable them after (see the sketch below)
  • triggers and row-level checks
    • if you can, disable triggers before the update and enable them after
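For the last two items, the disable/enable pair might look like this in SQL Server syntax (constraint and trigger names are made up; WITH CHECK makes the server re-validate existing rows when the constraint comes back):

    -- Suspend per-row FK lookups and trigger firing for the big update.
    ALTER TABLE my_table NOCHECK CONSTRAINT fk_my_table_parent;
    ALTER TABLE my_table DISABLE TRIGGER trg_my_table_audit;

    -- ... run the big UPDATE here ...

    ALTER TABLE my_table ENABLE TRIGGER trg_my_table_audit;
    ALTER TABLE my_table WITH CHECK CHECK CONSTRAINT fk_my_table_parent;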