How bad is it to use SELECT MAX(id) in MySQL instead of mysql_insert_id() in PHP?

Posted 2024-09-14 12:34:44

Background: I'm working on a system where the developers seem to be using a function that executes a MySQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table has an auto_increment column).

I know this is a horrible practice (because concurrent requests will mess up the records), and I'm trying to communicate that to the non-tech / management team, whose response is...

"Oh okay, we'll only face this problem when we have 
(a) a lot of users, or 
(b) it'll only happen when two people try doing something
    at _exactly_ the same time"

I don't disagree with either point, and I think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or find a way to estimate) how many users the system can have before we start seeing messed-up links.

Any mathematical insights into that? Again, I KNOW it's a horrible practice; I just want to understand the variables in this situation...


Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!

Comments (5)

牛↙奶布丁 2024-09-21 12:34:45

I don't have the math for it, but I would point out that response (a) is a little silly. Doesn't the company want a lot of users? Isn't that a goal? That response implies that they'd rather solve the problem twice, possibly at great expense the second time, instead of solving it once, correctly, the first time.

寒尘 2024-09-21 12:34:45

This will happen when someone has added something to the table between one insert and that query running. So to answer your question, two people using the system has the potential for things to go wrong.

At least LAST_INSERT_ID() returns the last ID for a particular connection (resource), so it doesn't matter how many new rows other clients have added in between.
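
To make the interleaving concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for MySQL (an assumption: the question is about MySQL and PHP, and the two engines differ in many details). Two connections play the role of two concurrent requests. `cursor.lastrowid` is SQLite's per-connection counterpart of `mysql_insert_id()` / `LAST_INSERT_ID()`. The table `items` and the function `demo_race` are hypothetical names for illustration.

```python
import os
import sqlite3
import tempfile


def demo_race():
    """Interleave two 'requests' and compare MAX(id) with the per-connection id."""
    # A temporary on-disk database, so two independent connections share one table.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    # isolation_level=None puts both connections in autocommit mode, so each
    # INSERT is committed (and visible to the other connection) immediately.
    conn_a = sqlite3.connect(path, isolation_level=None)
    conn_b = sqlite3.connect(path, isolation_level=None)
    try:
        conn_a.execute(
            "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
        )

        # Request A inserts its row.
        cur_a = conn_a.execute("INSERT INTO items (name) VALUES ('from A')")
        # lastrowid is tracked per connection, like mysql_insert_id().
        id_from_connection = cur_a.lastrowid

        # Before A asks for "its" id, request B inserts a row of its own.
        conn_b.execute("INSERT INTO items (name) VALUES ('from B')")

        # The MAX(id) approach now picks up B's row instead of A's.
        (id_from_max,) = conn_a.execute("SELECT MAX(id) FROM items").fetchone()
        return id_from_connection, id_from_max
    finally:
        conn_a.close()
        conn_b.close()
        os.remove(path)


if __name__ == "__main__":
    own_id, max_id = demo_race()
    print(f"id via lastrowid: {own_id}, id via MAX(id): {max_id}")
```

The two values differ as soon as another insert lands between A's INSERT and A's SELECT, while the per-connection value is unaffected by B's insert.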

半﹌身腐败 2024-09-21 12:34:45

In addition to the risk of getting the wrong ID value returned, there's also the additional database query overhead of SELECT MAX(id), and it's more PHP code to actually execute than a simple mysql_insert_id(). Why deliberately code something to be slow?

メ斷腸人バ 2024-09-21 12:34:44

The point is not whether bad situations are likely. The point is whether they are possible. As long as there's a non-trivial probability of the issue occurring, and the issue is known, it should be avoided.

It's not like we're talking about changing a one-line function call into a 5000-line monster to deal with a remotely possible edge case. We're actually talking about shortening the call to a more readable and more correct usage.

I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, the LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.

And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction-safe DB engine), then you only need 20 requests per second (1 / 0.05 s) to start hitting this issue consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assume that the average person visits one page per minute, that's 20 × 60 = 1,200 page views per minute, so you're only talking about 1,200 users. And that's for it to happen regularly. It could happen once with only 2 users.
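
The arithmetic above can be reproduced in a few lines. This is a back-of-envelope sketch, not a capacity model: the 50 ms window and the one-page-per-minute rate are the answer's illustrative figures, and the function names are hypothetical. `collision_probability` adds an assumption the answer does not make, namely that inserts into the same table arrive as a Poisson process with rate λ, in which case the chance that at least one other insert starts inside a window of length w is 1 − e^(−λw).

```python
import math


def critical_rate(window_s: float) -> float:
    """Requests per second at which, on average, one other request falls in each window."""
    return 1.0 / window_s


def users_needed(rate_per_s: float, pages_per_user_per_min: float = 1.0) -> float:
    """Active users required to produce rate_per_s, given each user's page rate."""
    return rate_per_s * 60.0 / pages_per_user_per_min


def collision_probability(rate_per_s: float, window_s: float) -> float:
    """P(at least one other insert starts inside the window).

    Assumes inserts into the table arrive as a Poisson process at rate_per_s.
    """
    return 1.0 - math.exp(-rate_per_s * window_s)


if __name__ == "__main__":
    window = 0.050  # 50 ms between the start of the INSERT and the start of the SELECT
    rate = critical_rate(window)  # 20 requests per second
    print(f"critical rate: {rate:.0f} req/s")
    print(f"users at 1 page/min: {users_needed(rate):.0f}")
    for r in (1, 5, 20):
        p = collision_probability(r, window)
        print(f"{r:>2} req/s -> P(another insert in the window) = {p:.3f}")
```

Under the Poisson assumption the risk never drops to zero: even at 1 insert per second, roughly 5% of requests would see another insert start inside a 50 ms window, which matches the answer's point that it could happen with only two users.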

And right from the MySQL documentation on the subject:

You can generate sequences without calling LAST_INSERT_ID(), but the utility of using the function this way is that the ID value is maintained in the server as the last automatically generated value. It is multi-user safe because multiple clients can issue the UPDATE statement and get their own sequence value with the SELECT statement (or mysql_insert_id()), without affecting or being affected by other clients that generate their own sequence values.

GRAY°灰色天空 2024-09-21 12:34:44

Instead of using SELECT MAX(id), you should do as the documentation says:

Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query

Even so, neither SELECT MAX(id) nor mysql_insert_id() is "thread-safe", and you could still have a race condition. The best option you have is to lock the tables before your requests and unlock them afterwards. Or, even better, use transactions.
