How does two-phase commit prevent last-second failure?

Posted 2024-07-06 08:27:47 · 346 words · 12 views · 0 comments

I am studying how two-phase commit works across a distributed transaction. It is my understanding that in the first (voting) phase the transaction coordinator asks each node whether it is ready to commit. If every node agrees, the coordinator then tells them all to go ahead and commit.

What prevents the following failure?

  1. All nodes respond that they are ready to commit
  2. The transaction coordinator tells them to "go ahead and commit" but one of the nodes crashes before receiving this message
  3. All other nodes commit successfully, but now the distributed transaction is corrupt
  4. It is my understanding that when the crashed node comes back, its transaction will have been rolled back (since it never got the commit message)

I am assuming each node is running a normal database that doesn't know anything about distributed transactions. What did I miss?

Comments (5)

笙痞 2024-07-13 08:27:47

No, they are not instructed to roll back because in the original poster's scenario, some of the nodes have already committed. What happens is when the crashed node becomes available, the transaction coordinator tells it to commit again.

Because the node responded positively in the "prepare" phase, it is required to be able to "commit", even when it comes back from a crash.
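The promise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Participant` class, its `disk` dictionary (standing in for stable storage that survives a crash), and the `deliver_commit` helper are all hypothetical names, not part of any real database or transaction manager.

```python
class Participant:
    """Hypothetical participant; `disk` stands in for stable storage."""

    def __init__(self):
        self.disk = {}   # txid -> state; assumed to survive crashes
        self.up = True   # False while the node is crashed

    def prepare(self, txid):
        # Persist the yes-vote BEFORE replying: this is the promise
        # that the node can still commit after a crash.
        self.disk[txid] = "PREPARED"
        return True

    def commit(self, txid):
        if not self.up:
            raise ConnectionError("participant is down")
        # Idempotent: receiving the same commit twice is harmless.
        self.disk[txid] = "COMMITTED"


def deliver_commit(participants, txid):
    """Send commit to each participant; return the ones to retry later."""
    pending = []
    for participant in participants:
        try:
            participant.commit(txid)
        except ConnectionError:
            pending.append(participant)
    return pending


a, b = Participant(), Participant()
a.prepare("tx1")
b.prepare("tx1")
b.up = False                               # b crashes after voting yes
pending = deliver_commit([a, b], "tx1")    # a commits, b is still pending
b.up = True                                # b restarts, its disk is intact
pending = deliver_commit(pending, "tx1")   # the coordinator tells it again
```

The point of the sketch is that the coordinator keeps the commit pending rather than rolling anything back, and the participant's persisted `PREPARED` record is what makes the late commit possible.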

蓬勃野心 2024-07-13 08:27:47

Summarizing everyone's answers:

  1. One cannot use normal databases with distributed transactions. The database must explicitly support a transaction coordinator.

  2. The nodes are not instructed to roll back because some of the nodes have already committed. What happens is that when the crashed node comes back, the transaction coordinator tells it to finish the commit.

记忆で 2024-07-13 08:27:47

No. Point 4 is incorrect. Each node records in stable storage that it is able to either commit or roll back the transaction, so that it will be able to do as commanded even across crashes. When the crashed node comes back up, it must realize that it has a transaction in pre-commit state, reinstate any relevant locks or other controls, and then attempt to contact the coordinator site to collect the status of the transaction.

The problems only occur if the crashed node never comes back up (then everything else thinks the transaction was OK, or will be when the crashed node comes back).
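A rough sketch of that recovery step, assuming a hypothetical append-only log of JSON lines and a caller-supplied `ask_coordinator` function (both invented here for illustration), might look like this. It leaves out the part about reinstating locks, which depends entirely on the database engine.

```python
import json
import os
import tempfile


def append_record(log_path, txid, state):
    """Append one state record and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"txid": txid, "state": state}) + "\n")
        log.flush()
        os.fsync(log.fileno())


def in_doubt_transactions(log_path):
    """Return the txids whose most recent logged state is PREPARED."""
    last_state = {}
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                last_state[record["txid"]] = record["state"]
    return sorted(t for t, s in last_state.items() if s == "PREPARED")


def recover(log_path, ask_coordinator):
    """On restart, resolve every in-doubt transaction via the coordinator."""
    for txid in in_doubt_transactions(log_path):
        outcome = ask_coordinator(txid)   # "COMMITTED" or "ABORTED"
        append_record(log_path, txid, outcome)


# Example: the node crashed after preparing tx1 and before hearing the outcome.
log_path = os.path.join(tempfile.mkdtemp(), "participant.log")
append_record(log_path, "tx1", "PREPARED")
recover(log_path, lambda txid: "COMMITTED")
```

Because the `PREPARED` record is forced to disk before the node votes yes, a restart finds the transaction in doubt instead of silently rolling it back.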

掌心的温暖 2024-07-13 08:27:47

Two-phase commit isn't foolproof; it is just designed to work in roughly 99% of cases.

"The protocol assumes that there is stable storage at each node with a write-ahead log, that no node crashes forever, that the data in the write-ahead log is never lost or corrupted in a crash, and that any two nodes can communicate with each other."

http://en.wikipedia.org/wiki/Two-phase_commit_protocol

多情癖 2024-07-13 08:27:47

There are many ways to attack the problems with two-phase commit. Almost all of them wind up as some variant of the Paxos three-phase commit algorithm. Mike Burrows, who designed the Paxos-based Chubby lock service at Google, said in a lecture I saw that there are two types of distributed commit algorithms - "Paxos, and incorrect ones".

One thing the crashed node could do when it reawakens is ask the coordinator, "I never heard the outcome of this transaction; should it have been committed?" The coordinator will then tell it what the vote was.
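The coordinator's side of that exchange could be sketched as below. The `Coordinator` class and its method names are hypothetical, and answering "ABORTED" for a transaction with no recorded decision is an assumption borrowed from the presumed-abort convention, not something stated in this thread.

```python
class Coordinator:
    """Hypothetical coordinator that remembers each transaction's outcome."""

    def __init__(self):
        self.decisions = {}   # txid -> "COMMITTED" or "ABORTED"

    def decide(self, txid, votes):
        # Commit only if there is at least one vote and every vote is yes.
        outcome = "COMMITTED" if votes and all(votes) else "ABORTED"
        # Record the decision before telling anyone, so that it can be
        # repeated to a node that missed the original message.
        self.decisions[txid] = outcome
        return outcome

    def outcome(self, txid):
        # Assumption: no recorded decision means the transaction never
        # committed, so answering "ABORTED" is treated as safe here.
        return self.decisions.get(txid, "ABORTED")


coordinator = Coordinator()
coordinator.decide("tx1", [True, True, True])
answer = coordinator.outcome("tx1")   # what a recovering node would be told
```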

Bear in mind that this is an example of a more general problem: the crashed node could miss many transactions before it recovers. Therefore it's terribly important that upon recovery it should talk either to the coordinator or another replica before making itself available. If the node itself can't tell whether or not it has crashed, then things get more involved but still tractable.

If you use a quorum system for database reads, the inconsistency will be masked (and made known to the database itself).
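As a minimal sketch of how a quorum read could mask a replica that missed the commit, assume each replica returns a `(version, value)` pair; the versioning scheme and the `quorum_read` function are assumptions made for illustration only.

```python
def quorum_read(replies, read_quorum):
    """Pick the newest value among replies and report stale replicas.

    `replies` is a list of (version, value) pairs, one per replica that
    answered. Returns the newest value and the indexes of stale replies.
    """
    if len(replies) < read_quorum:
        raise RuntimeError("not enough replicas answered")
    newest_version, newest_value = max(replies, key=lambda reply: reply[0])
    stale = [i for i, (version, _) in enumerate(replies)
             if version < newest_version]
    return newest_value, stale


# The replica at index 1 missed the commit and still holds the old version.
value, stale = quorum_read([(2, "new"), (1, "old"), (2, "new")], read_quorum=2)
```

The reader sees only the newest value, and the list of stale replicas is what lets the system learn about the inconsistency and repair it.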
