Optimize SQL update

Published 2024-07-13 10:24:44


We have 2 tables, called TableToUpdate and Dates. We need to update TableToUpdate's EndTimeKey column by looking up values in the other table, Dates. We run the SQL below to do this, but it takes too long to finish.

Table TableToUpdate has 6M records.
Table Dates has 5000 records.

How can we optimize it?

Thanks for replies!

update TableToUpdate set
EndTimeKey = DATE_NO
from Dates where EndTime = DATE


4 Comments

酸甜透明夹心 2024-07-20 10:24:46


You are updating potentially 6 million records; this is not going to be terribly speedy in any event. However, look at your execution plan and see if it is using indexes.

Also, run this in batches; that is generally faster when updating large numbers of records. Do the update during off hours, when there is little load on the database; this will reduce potential locking issues. Make sure your data types are the same between the two tables so you aren't having to do any implicit conversions.
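
A batched version could be sketched roughly like this (a minimal T-SQL sketch, not a drop-in solution; it assumes EndTimeKey is not nullable, since the `<>` filter doubles as the loop's stopping condition):

```sql
-- Update in chunks of 100k rows until no rows remain to change.
-- TOP in an UPDATE limits the rows touched per batch (SQL Server).
WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) t
    SET EndTimeKey = d.DATE_NO
    FROM TableToUpdate t
    JOIN Dates d ON t.EndTime = d.DATE
    WHERE t.EndTimeKey <> d.DATE_NO;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to update
END
```

Smaller transactions keep the log from growing and release locks between batches.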

Look at the table you are updating, are there any triggers on it? Depending on how the trigger is written, this could seriously slow down an update of that many records (especially if someone who wasn't too bright decided to put a cursor or a loop in the trigger instead of writing set-based code).

Also, here is something I would add (I also changed it to explicitly show the join):

update t
set EndTimeKey = d.DATE_NO
from TableToUpdate t
join Dates d on t.EndTime = d.DATE
where t.EndTimeKey <> d.DATE_NO

No point in updating records that already match.

陪你到最终 2024-07-20 10:24:46


With this volume of data you might be best creating a SELECT query which produces a resultset, complete with updated values, as you would like to see the new table. Next, SELECT these into a new table (perhaps 'NewTableToUpdate'), either by creating the table and using INSERT INTO, or by changing your SELECT to add an INTO clause that creates the new table.

Next use sp_rename to rename 'TableToUpdate' to 'OLDTableToUpdate' and 'NEWTableToUpdate' to 'TableToUpdate' and then create the indexes as you had them on the original table.

In my experience I've found this to be the quickest means of achieving big changes like this. HTH.

Extra thought... if you have a clustered index on your table then add an ORDER BY to your SELECT statement to ensure it is inserted into your new table in the same sequence as the clustered index. That'll speed up the index creation in a significant way.
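
The rebuild-and-swap approach above could be sketched as follows (an illustrative sketch; the Id column and index names are assumptions about the schema, and any other columns of TableToUpdate would need to be listed too):

```sql
-- Build the new table with the updated values, ordered to match the
-- clustered index of the original table (assumed here to be on Id).
-- Note the inner join drops rows with no matching date; use a LEFT
-- JOIN if every row must be preserved.
SELECT t.Id, t.EndTime, d.DATE_NO AS EndTimeKey
INTO NewTableToUpdate
FROM TableToUpdate t
JOIN Dates d ON t.EndTime = d.DATE
ORDER BY t.Id;

-- Swap the tables, then recreate the indexes on the new copy.
EXEC sp_rename 'TableToUpdate', 'OLDTableToUpdate';
EXEC sp_rename 'NewTableToUpdate', 'TableToUpdate';
CREATE CLUSTERED INDEX IX_TableToUpdate_Id ON TableToUpdate (Id);
```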

走野 2024-07-20 10:24:46


You can set some indexes on the relevant fields (in the relevant order), i.e. EndTimeKey and EndTime, though do not expect too much from this. The other thing you could check is whether you have other constraints that limit the query results.

You could also create a view that returns the correct DATE_NO for every TableToUpdate.EndTimeKey.

Maybe you could write a stored procedure, if your DBMS supports such things - this could really speed up the update.
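
The lookup view mentioned above could look like this (a sketch only; the view name is an assumption, and whether a view actually speeds things up depends on the engine):

```sql
-- A view mapping each row's EndTime to the correct DATE_NO, so the
-- update can read from a single precomputed source.
CREATE VIEW vw_EndTimeKeyLookup AS
SELECT t.EndTime, d.DATE_NO
FROM TableToUpdate t
JOIN Dates d ON t.EndTime = d.DATE;
```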

我很坚强 2024-07-20 10:24:46


I note a couple of things here: is EndTimeKey really a key? If so, it may have an index on it, and the speed (or lack thereof) comes from updating the index while also doing the actual update of the data. The solution: drop the index, run the update, then re-apply the index.

Another issue could be the transactional nature of SQL - as you do this update, it will log every change so it can roll back in the event of a failure. This update looks to be pretty straightforward, so you could apply it in batches, i.e.

update TableToUpdate set
EndTimeKey = DATE_NO
from Dates
where EndTime = DATE
and TableToUpdateId between 1 and 100000

That will break your update into manageable size chunks - at the very least you'll get an idea how long each chunk will take.

Another option is putting an index on the EndTime column; without one, the update is potentially having to do a full table scan.

The real answer though is to look at the query plan being generated. As you can see there are many reasons why a query may run slow - these are just some quick ones to check.
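
The drop-index / re-index idea from the first point above can be sketched as (index names are assumptions; this only pays off when the update touches a large fraction of the table):

```sql
-- Drop the index on the updated column so the bulk update does not
-- maintain it row by row, then rebuild it once at the end.
DROP INDEX IX_TableToUpdate_EndTimeKey ON TableToUpdate;

UPDATE t
SET EndTimeKey = d.DATE_NO
FROM TableToUpdate t
JOIN Dates d ON t.EndTime = d.DATE;

CREATE INDEX IX_TableToUpdate_EndTimeKey ON TableToUpdate (EndTimeKey);

-- An index supporting the join predicate can also help the lookup:
CREATE INDEX IX_TableToUpdate_EndTime ON TableToUpdate (EndTime);
```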
