Preventing lost updates with a higher transaction isolation level: a common misconception?

Posted on 2025-01-11


I noticed that my applications often write values to a database that depend on a former read operation. A common example is a bank account where a user could deposit money:

void deposit(amount) {
    balance = getAccountBalance()
    setAccountBalance(balance + amount)
}

I want to avoid a race condition if this method is called by two threads/clients/ATMs simultaneously like this where the account owner would lose money:

balance = getAccountBalance()       |
                                    | balance = getAccountBalance()
setAccountBalance(balance + amount) |
                                    | // balance2 = getAccountBalance() // theoretical
                                    | setAccountBalance(balance + amount)
                                    V
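The interleaving above can be reproduced deterministically without any database, which makes it clear that the lost write comes purely from the stale read. A minimal simulation (all names are illustrative, not from any real banking API):

```python
# Deterministic simulation of the lost-update interleaving shown above.
# Both "ATMs" read the balance before either writes, so one deposit is lost.

account = {"balance": 100}

def get_account_balance():
    return account["balance"]

def set_account_balance(value):
    account["balance"] = value

# Step 1: both clients read the same starting balance.
balance_left = get_account_balance()    # 100
balance_right = get_account_balance()   # 100

# Step 2: both clients write, each based on its own stale read.
set_account_balance(balance_left + 50)   # balance becomes 150
set_account_balance(balance_right + 50)  # overwrites: still 150, not 200

print(account["balance"])  # 150 -- one deposit of 50 was lost
```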

I often read that Repeatable Read or Serializable can solve this problem. Even the German Wikipedia article on Lost Updates states this. Translated to English:

The isolation level RR (Repeatable Read) is often mentioned as a solution to the lost update problem.

This SO answer suggests Serializable for a similar problem with INSERT after SELECT.

As far as I understand the idea: at the time the process on the right side tries to set the account balance, a (theoretical) read operation would no longer return the same balance. Therefore the write operation is not allowed. And yes - if you read this popular SO answer, it actually sounds perfectly fitting:

under REPEATABLE READ the second SELECT is guaranteed to display at least the rows that were returned from the first SELECT unchanged. New rows may be added by a concurrent transaction in that one minute, but the existing rows cannot be deleted nor changed.

But then I wondered what "they cannot be deleted nor changed" actually means. What happens if you try to delete/change them anyway? Will you get an error? Or will your transaction wait until the first transaction has finished and then perform its update after all? This makes all the difference. In the second case you will still lose money.

And if you read the comments below, it gets even worse, because there are other ways to meet the Repeatable Read conditions. For example, a snapshot technique: a snapshot could be taken before the left-side transaction writes its value, and the original value can then be served if a second read occurs later in the right-side transaction. See, for instance, the MySQL manual:

Consistent reads within the same transaction read the snapshot established by the first read

I came to the conclusion that tightening the transaction isolation level is probably the wrong tool to get rid of the race condition. If it solves the problem (for a specific DBMS), that's not due to the definition of Repeatable Read. Rather, it's because of a specific implementation that fulfils the Repeatable Read conditions - for instance, the use of locks.

So, to me it looks like this: what you actually need to solve this issue is a locking mechanism. The fact that some DBMSs happen to use locks to implement Repeatable Read is what is being exploited.
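One way to sidestep the whole question is to not do a read-modify-write in the application at all: if the increment is expressed as a single UPDATE, the engine's own write lock on the row serializes concurrent deposits. A minimal sketch using sqlite3 (the table and column names are illustrative; any engine with atomic UPDATE semantics behaves the same way here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

def deposit(conn, account_id, amount):
    # The read and the write happen inside one statement, so the engine's
    # write lock makes the increment atomic; no stale read is possible.
    conn.execute(
        "UPDATE accounts SET balance = balance + ? WHERE id = ?",
        (amount, account_id),
    )
    conn.commit()

deposit(conn, 1, 50)
deposit(conn, 1, 50)

print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0])  # 200
```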

Is this assumption correct? Or do I have a wrong understanding of transaction isolation levels?


You might be annoyed, because this must be the millionth question about the topic. The problem is: the example bank-account scenario is absolutely critical. Right there, where it should be absolutely clear what's going on, there seems to be so much misleading and contradictory information, and so many misconceptions.


Comments (3)

如梦 2025-01-18 07:58:51


The problem here is that you are asking which isolation level, as defined by the SQL standard, is needed to sort out a concurrency anomaly that is not part of that definition.

The SQL standard only defines how the isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) map to the Dirty Read, Non-Repeatable Read and Phantom Read anomalies. There is no mention of Lost Update, so this - as you rightly pointed out - depends on how the isolation levels are implemented by a specific DBMS.

Supposedly, REPEATABLE READ is enough to prevent lost updates on PostgreSQL, while SERIALIZABLE is needed to prevent them on MySQL and Oracle.

Here are some interesting posts about Oracle and PostgreSQL/MySQL.

幸福还没到 2025-01-18 07:58:51


Lost update is a transactional anomaly that occurs only if the transaction uses optimistic locking. It will never happen with pessimistic locking.

  • Some RDBMS offer optimistic locking only, which is the case for
    Oracle Database and PostgreSQL
  • Some other RDBMS offer only pessimistic locking, which is the case
    for IBM DB2
  • And finally, Microsoft SQL Server can use either optimistic or
    pessimistic locking depending on the user's choice, with a default behavior that is pessimistic

So the question comes down to which RDBMS you use and which type of locking you have...

Some more information...

Guaranteeing the successful completion of a transaction that performs writes is only possible if one starts by locking in exclusive mode and keeps the locks for the duration of the transaction - that is, if the lock mode is pessimistic rather than optimistic. Even so, this technique will not prevent deadlocks...

The mathematician Edsger Dijkstra addressed this last problem (the Banker's algorithm) by showing that it is necessary, before starting to update the data (INSERT, UPDATE, DELETE...), to set all the locks needed to protect the data being handled, which amounts to having exclusive access to all the data being processed... Dijkstra won the Turing Award for his contributions to computer science!

In other words, having only one user accessing the database at a time! ...

To summarize...

The following table shows the transactional anomalies and which of them each isolation level avoids:

(table: isolation levels and transactional anomalies)

您的好友蓝忘机已上羡 2025-01-18 07:58:51


In SQL Server both REPEATABLE READ and SERIALIZABLE will prevent the lost update by failing one transaction with a deadlock. In these isolation levels each session will take and hold a shared (S) lock on the target row during the initial SELECT. Then each session will try to get an exclusive (X) lock on the row to update it, causing a deadlock.

If you want to avoid the lost update without the deadlock - by having one session wait until the other has completed - you must create a more exclusive lock before or during the initial select. The normal pattern for this is to add an UPDLOCK hint to the initial select to indicate a "select for update". And with "select for update" there's no reason to raise the transaction isolation level.

Oracle and PostgreSQL also have "select for update" syntax you can use.
