Update, delete-then-insert, or REPLACE: the fastest/most scalable MySQL (MyISAM) method

Posted on 2024-10-27 04:40:14

I have a table that gets updated very regularly throughout the day, so I'm looking for the most scalable method for updating rows. These updates happen in large batches, so each update may include around 1000 rows.

Currently, I'm looping through each of these 1000 rows and running a single UPDATE query for each one. While it doesn't take long to execute, it seems wasteful compared to one simple mass INSERT statement. So REPLACE INTO makes sense, since it's basically deleting the old rows and inserting new ones, but how does that compare to a manual "DELETE WHERE id IN (array)" followed by a mass insert? Exactly the same? Slightly different? Is there a better method?
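For comparison, here is a minimal sketch of the manual approach the question describes: one DELETE ... WHERE id IN (...) for the whole batch, followed by one multi-row INSERT. It assumes a hypothetical table whose first listed column is the primary key, and the %s placeholder style used by MySQL drivers such as PyMySQL and mysqlclient; delete_then_insert is an illustrative name, not a library function.

```python
from typing import Any, Sequence


def delete_then_insert(table: str, columns: Sequence[str],
                       rows: Sequence[Sequence[Any]]) -> list[tuple[str, list[Any]]]:
    """Return [(sql, params), ...]: one DELETE for the batch's ids, then one bulk INSERT.

    Assumes columns[0] is the primary key and that table/column names are
    trusted identifiers (they are not escaped here). Placeholders are %s.
    """
    if not rows:
        return []
    ids = [row[0] for row in rows]
    id_marks = ", ".join(["%s"] * len(ids))
    delete_sql = f"DELETE FROM {table} WHERE {columns[0]} IN ({id_marks})"

    # One "(%s, %s, ...)" group per row, all in a single INSERT statement.
    row_marks = "(" + ", ".join(["%s"] * len(columns)) + ")"
    insert_sql = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
                  + ", ".join([row_marks] * len(rows)))
    insert_params = [value for row in rows for value in row]
    return [(delete_sql, ids), (insert_sql, insert_params)]
```

Each (sql, params) pair would be passed to cursor.execute in order. MyISAM has no transactions, so other clients may briefly see the batch's rows missing between the two statements.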

The key here is that these aren't single row queries but mass row queries. So the question is, what is the most scalable way to run these updates. I say "scalable" and not "fastest" because these updates happen at regular intervals on a production server with active users, so speed is important but not at the cost of locking up the server.

Comments (2)

少跟Wǒ拽 2024-11-03 04:40:14

You want to use InnoDB for this instead of MyISAM. Why? Because when you're performing bulk inserts and deletes, wrapping the entire thing in a transaction can be a huge performance boost.

No matter what you end up doing to the data, that change alone could be huge.

With an appropriate transaction isolation level, your users can continue using the table while you change everything in it, only seeing the changes once you commit, without worrying about table locks.
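A minimal sketch of wrapping the whole batch in one transaction, assuming an InnoDB table and a PEP 249 (DB-API) connection with autocommit turned off. run_in_transaction is a hypothetical helper; on MyISAM tables the COMMIT and ROLLBACK would have no effect, because MyISAM is not transactional.

```python
from typing import Any, Iterable, Sequence


def run_in_transaction(conn, statements: Iterable[tuple[str, Sequence[Any]]]) -> None:
    """Execute every (sql, params) pair, then COMMIT once; ROLLBACK on any error.

    Assumes a DB-API connection with autocommit off (e.g. PyMySQL's default).
    """
    cur = conn.cursor()
    try:
        for sql, params in statements:
            cur.execute(sql, params)
        conn.commit()      # other sessions see the whole batch at once
    except Exception:
        conn.rollback()    # discard the partially applied batch
        raise
    finally:
        cur.close()
```

The helper does not care which statements it runs, so it works equally for per-row UPDATEs or a few multi-row statements.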

With regard to the actual data update, avoid deletes. Deletes are slow. Do updates, and delete only the rows that actually need deleting. Avoid the REPLACE INTO magic as well, as it deletes the conflicting row before inserting the new one.
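Doing updates and deleting only what needs deleting presupposes knowing which rows actually changed. A minimal sketch, assuming you can load the stored values for the batch keyed by id (for example with one SELECT ... WHERE id IN (...)); plan_batch is a hypothetical name:

```python
from typing import Any


def plan_batch(existing: dict[int, Any], incoming: dict[int, Any],
               delete_missing: bool = False) -> dict[str, list[int]]:
    """Split ids into update / insert / delete / unchanged buckets."""
    plan: dict[str, list[int]] = {"update": [], "insert": [], "delete": [], "unchanged": []}
    for key, value in incoming.items():
        if key not in existing:
            plan["insert"].append(key)
        elif existing[key] != value:
            plan["update"].append(key)      # row exists and really changed
        else:
            plan["unchanged"].append(key)   # identical: no write needed
    if delete_missing:
        # Only rows that disappeared from the incoming feed are deleted.
        plan["delete"] = [key for key in existing if key not in incoming]
    return plan
```

Only the "update", "insert", and (optionally) "delete" ids need statements; "unchanged" rows can be skipped entirely.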

岁吢 2024-11-03 04:40:14

I would NOT recommend InnoDB. It seems logical to use it, because the articles say it's built for this purpose and row-level locking is a good thing; but whenever we've tried it, we found it slower, much slower, than MyISAM, and what you gain from the different locking can never make up for that. It's simply not worth it.

Instead, I'm trying to answer the original question.

UPDATE is the fastest way.

Period.

REPLACE is almost never the proper way.

It's more like a patch to avoid transactions in certain cases, and it works well for that; but whenever you can manage, do a SELECT COUNT instead to find out whether a row with the same key exists, then choose UPDATE if it does and INSERT otherwise. Yes, I know this means 3 separate steps instead of one; that's why I say "whenever you can manage". So, again: COUNT plus UPDATE is faster; REPLACE, on the other hand, is "atomic".
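A minimal sketch of the SELECT COUNT, then UPDATE-or-INSERT sequence for a single row, assuming a hypothetical items(id, name) table and a DB-API connection. The placeholder defaults to %s for MySQL drivers; passing "?" lets you try it with Python's built-in sqlite3. The comment marks the race that makes this sequence non-atomic, which is the trade-off against REPLACE mentioned above.

```python
def save_row(conn, row_id: int, name: str, ph: str = "%s") -> str:
    """Insert or update one row of items(id, name); return which statement ran.

    Committing is left to the caller.
    """
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM items WHERE id = {ph}", (row_id,))
        (count,) = cur.fetchone()
        if count:
            cur.execute(f"UPDATE items SET name = {ph} WHERE id = {ph}", (name, row_id))
            return "update"
        # Not atomic: another client can insert the same id between the SELECT
        # above and this INSERT, which would then fail with a duplicate-key error.
        cur.execute(f"INSERT INTO items (id, name) VALUES ({ph}, {ph})", (row_id, name))
        return "insert"
    finally:
        cur.close()
```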

Ah yes, forgot to mention: as of MySQL 4.1 you have INSERT ... ON DUPLICATE KEY UPDATE, which does all of the above in one step.
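A sketch of sending a whole batch as one INSERT ... ON DUPLICATE KEY UPDATE statement, assuming key is the table's primary key (or another UNIQUE column) and %s placeholders; build_upsert is a hypothetical helper that only builds the SQL and its parameters:

```python
from typing import Any, Sequence


def build_upsert(table: str, columns: Sequence[str],
                 rows: Sequence[Sequence[Any]], key: str) -> tuple[str, list[Any]]:
    """Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement."""
    non_key = [col for col in columns if col != key]
    if not rows or not non_key:
        raise ValueError("need at least one row and one non-key column")
    row_marks = "(" + ", ".join(["%s"] * len(columns)) + ")"
    # On a duplicate key, copy each non-key column from the row being inserted.
    updates = ", ".join(f"{col} = VALUES({col})" for col in non_key)
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
           + ", ".join([row_marks] * len(rows))
           + f" ON DUPLICATE KEY UPDATE {updates}")
    params = [value for row in rows for value in row]
    return sql, params
```

VALUES(col) refers to the value the row would have inserted. MySQL 8.0.20 and later deprecate VALUES() in this clause in favor of a row alias (INSERT ... VALUES (...) AS new ON DUPLICATE KEY UPDATE name = new.name), so check which form your server version expects.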

Make your rows fixed size if possible - this makes UPDATEs a lot faster.

If your table shows "Row size: dynamic" in PMA (phpMyAdmin), your rows can grow and shrink, and that isn't free. VARCHAR, TEXT, and BLOB columns cause this. Now, if you have a small row and update it to something bigger, the UPDATE has to find a bigger space for the new row and remove the original one; in other words, MySQL does something like an insert plus a delete in this case. That's costly. If there's a way, use fixed-size rows for tables that need frequent updates. This will also reduce "table overhead".
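In MyISAM the fixed versus dynamic choice follows from the column types: the static (fixed-length) format is used only when the table has no VARCHAR, VARBINARY, BLOB, or TEXT columns. The hypothetical helper below only encodes that rule as an illustration; on a real server, the Row_format column of SHOW TABLE STATUS reports the actual format.

```python
# MyISAM uses the static (fixed-length) row format only when no column
# has one of these variable-length types.
VARIABLE_LENGTH_SUFFIXES = ("VARCHAR", "VARBINARY", "TEXT", "BLOB")


def myisam_row_format(column_types: list[str]) -> str:
    """Predict 'Fixed' or 'Dynamic' from type strings like 'VARCHAR(255)'."""
    for col_type in column_types:
        # Keep only the type name: "varchar(255) NOT NULL" -> "VARCHAR".
        base = col_type.replace("(", " ").split()[0].upper()
        # Suffix match also covers TINYTEXT, MEDIUMTEXT, LONGBLOB, etc.
        if base.endswith(VARIABLE_LENGTH_SUFFIXES):
            return "Dynamic"
    return "Fixed"
```

The trade-off is space: a CHAR(n) column always occupies its full width, so fixed rows usually take more disk than the dynamic equivalent.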

You can update many rows with a single query

In MySQL you have CASE, so you can do something like "update all my rows, setting name to Jennifer if id=1, George if id=2, ...". It gets complex if many fields need updating, but it can be done, and you can wrap it in a PHP function so that only the query length matters in the future. This way, your updates are atomic. You can call it a "poor man's transaction".
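A sketch of that CASE technique for one column, assuming a hypothetical users(id, name) table and %s placeholders; build_case_update only builds the statement and its parameters. The WHERE ... IN clause matters: without it, every row whose id matches no WHEN branch would be set to NULL, because a CASE with no ELSE yields NULL.

```python
from typing import Any


def build_case_update(table: str, key: str, column: str,
                      values: dict[Any, Any]) -> tuple[str, list[Any]]:
    """Set `column` for many rows in one UPDATE, using CASE on `key`."""
    if not values:
        raise ValueError("nothing to update")
    whens = " ".join("WHEN %s THEN %s" for _ in values)
    id_marks = ", ".join(["%s"] * len(values))
    sql = (f"UPDATE {table} SET {column} = CASE {key} {whens} END "
           # Restrict to the listed ids: CASE without ELSE yields NULL otherwise.
           f"WHERE {key} IN ({id_marks})")
    params: list[Any] = []
    for row_key, new_value in values.items():
        params.extend((row_key, new_value))   # pairs for WHEN ... THEN ...
    params.extend(values.keys())              # ids for WHERE ... IN (...)
    return sql, params
```

For example, build_case_update("users", "id", "name", {1: "Jennifer", 2: "George"}) produces the "Jennifer if id=1, George if id=2" update described above.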

Many updates are better than one

If you have many users, you obviously don't want to block their queries for more than a few milliseconds. In that case, running several smaller UPDATEs is a lot better than one big transaction, because other processes can do their work and access the same tables in between.
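A minimal sketch of splitting a large batch into several short writes. write is a hypothetical callback that would issue one statement (or one short transaction) per batch; the batch size and the optional pause are tuning knobs, not recommended values.

```python
import time
from typing import Any, Callable, Sequence


def write_in_chunks(rows: Sequence[Any], size: int,
                    write: Callable[[Sequence[Any]], None],
                    pause: float = 0.0) -> int:
    """Call write(batch) for consecutive batches of at most `size` rows.

    Each batch is meant to be one short statement, so locks are held briefly
    and other clients can run in between. Returns the number of batches.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    batches = 0
    for start in range(0, len(rows), size):
        write(rows[start:start + size])
        batches += 1
        if pause:
            time.sleep(pause)  # optional gap that lets other queries through
    return batches
```

Smaller batches hold locks for less time at the cost of more round trips to the server.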
