Atomic operations across multiple non-transactional external systems

Posted on 2024-09-05 14:51:28

Say you have an application connecting to 3 different external systems. You need to update something in all 3. In case of a failure, you need to roll back the operations.
This is not a hard thing to implement, but say operation 3 fails, and when rolling back, the rollback for operation 1 fails! Now the first external system is in an invalid state...

I'm thinking a possible solution is to shut down the application and force a manual fix of the external system, but then again... It might already have used this information (and perhaps that's why it failed), or we might not have sufficient access. Or it might not even be a good way to roll back the action!

Are there some good ways of handling such cases?

EDIT: Some application details..

It's a multi-user web application. Most of the work is done with scheduled jobs (through Quartz.Net), so most operations run in their own thread. Some user actions should trigger jobs that update several systems though. The external systems are somewhat unstable.

I was thinking of changing the application to use the Command and Unit of Work patterns.
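
To make the scenario concrete, here is a minimal sketch of what a Command plus Unit of Work arrangement might look like, written in Python for brevity rather than in the application's own language. The names `Command`, `UnitOfWork`, `Result`, and the three fake "systems" are all hypothetical. The sketch does not solve the problem; it only shows where a failed rollback surfaces so that it can be escalated.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Command:
    """One update to one external system, paired with its compensating action."""
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]


@dataclass
class Result:
    committed: bool                                  # True if every command succeeded
    stuck: List[str] = field(default_factory=list)   # commands whose undo failed


@dataclass
class UnitOfWork:
    commands: List[Command]

    def run(self) -> Result:
        done: List[Command] = []
        try:
            for cmd in self.commands:
                cmd.do()
                done.append(cmd)
            return Result(committed=True)
        except Exception:
            # Compensate in reverse order and remember every undo that fails.
            stuck = []
            for cmd in reversed(done):
                try:
                    cmd.undo()
                except Exception:
                    stuck.append(cmd.name)
            return Result(committed=False, stuck=stuck)


# Hypothetical stand-ins for the three external systems.
state = {"sys1": "old", "sys2": "old", "sys3": "old"}


def set_value(key, value):
    def action():
        state[key] = value
    return action


def fail():
    raise RuntimeError("external system unavailable")


uow = UnitOfWork([
    Command("sys1", set_value("sys1", "new"), fail),  # its rollback will fail
    Command("sys2", set_value("sys2", "new"), set_value("sys2", "old")),
    Command("sys3", fail, set_value("sys3", "old")),  # operation 3 fails
])
result = uow.run()
```

Here `result.stuck` names the first system, which is left holding the new value. What to do with that list (retry later, flag the record, ask for manual repair) is exactly the open question.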

Comments (4)

挽容 2024-09-12 14:51:28

Two-Phase Commit (2PC) might be suitable here.

The first phase is getting the various databases to agree that they are willing to go ahead with the commit. In your example, database 1 won't proceed with the write until it is sure that all three databases have reported that the transaction will be possible.

This contrasts with the process you are describing, which is an "optimistic" approach: Database 1 assumes the transaction should go through until it learns otherwise, and is then forced to roll back.
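
The two phases can be sketched roughly as follows. This is an illustration under a big assumption: each external system is wrapped in a hypothetical `Participant` that can stage a change in `prepare` without applying it. Non-transactional systems may not offer anything like that, and the sketch ignores coordinator crashes and lost messages.

```python
class Participant:
    """Hypothetical wrapper around one external system that can stage a change."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.value = "old"
        self.staged = None

    def prepare(self, value) -> bool:
        # Phase 1: vote. Stage the change but do not apply it yet.
        if not self.can_commit:
            return False
        self.staged = value
        return True

    def commit(self):
        # Phase 2: apply the staged change.
        self.value = self.staged
        self.staged = None

    def abort(self):
        # Discard the staged change; the visible value was never touched.
        self.staged = None


def two_phase_commit(participants, value) -> bool:
    prepared = []
    for p in participants:
        if p.prepare(value):
            prepared.append(p)
        else:
            # One "no" vote aborts everyone who already voted "yes".
            for q in prepared:
                q.abort()
            return False
    for p in participants:
        p.commit()
    return True


# All three agree, so the write goes through everywhere.
a, b, c = Participant("a"), Participant("b"), Participant("c")
all_ok = two_phase_commit([a, b, c], "new")

# The second participant refuses, so nobody writes anything.
d, e, f = Participant("d"), Participant("e", can_commit=False), Participant("f")
one_refuses = two_phase_commit([d, e, f], "new")
```

In the second run no participant's `value` changes, so there is nothing to roll back. That is the difference from the optimistic approach.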

左岸枫 2024-09-12 14:51:28

Would you like to explain further how the rollback of operation 1 could fail?

The state it is aiming to get to is one that it has been in before, so it should be logically consistent. There might be transient issues like network failure, but it might be the case that the best way to deal with that is to retry until the problem goes away.
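
A retry loop for transient failures might look like the sketch below. The helper name `retry`, its parameters, and the fake `flaky_rollback` are hypothetical, and the approach assumes the rollback is idempotent, meaning it is safe to run more than once.

```python
import time


def retry(action, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call action() until it succeeds, backing off exponentially between tries.

    Re-raises the last exception once the attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Hypothetical rollback that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_rollback():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network blip")
    return "rolled back"


delays = []  # record the requested delays instead of actually sleeping
outcome = retry(flaky_rollback, attempts=5, base_delay=1.0, sleep=delays.append)
```

In a scheduled-job setting, the same idea could be expressed by rescheduling the rollback job instead of sleeping in a thread.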

If the problem is that subsequent transactions have locked or changed the data in the meantime, then you have a much larger problem - your transactions are not atomic, and rolling them back may cause the output of other transactions to become invalid.

说不完的你爱 2024-09-12 14:51:28

Depending on the size of the application (single user vs. enterprise), shutting down the application might be a bad idea.

First of all, I'd suggest saving the initial state of the information being changed in the 3 external apps to storage local to your own app. That means you can at least determine what the rollback state is supposed to be should your app crash/the rollback fail/etc. Once the transaction has successfully committed you can then delete this data.

What to do when one of the operations fails depends on the functionality of the 3 external systems. Let's assume that one of these systems holds employee data. Shutting down the application simply because one employee's address is wrong due to a failed transaction is overkill. It's much better to simply check the failed transaction log (i.e. the local storage to which you saved the initial states of the 3 external apps) whenever an employee's data is accessed. If that employee data is flagged as invalid, throw an error indicating that the record is in an invalid state and cannot be retrieved.
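
The local log described above could be sketched like this, using SQLite as the local storage. The `RollbackJournal` class, its table layout, `InvalidRecordError`, and the `employees` dictionary standing in for the external system are all hypothetical.

```python
import json
import sqlite3


class InvalidRecordError(Exception):
    """Raised when a record is flagged as being in an invalid state."""


class RollbackJournal:
    """Local store of the 'before' state for each record being changed."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS journal ("
            "record_id TEXT PRIMARY KEY, "
            "before_state TEXT NOT NULL, "
            "failed INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def save_before(self, record_id, state):
        self.db.execute(
            "INSERT OR REPLACE INTO journal (record_id, before_state, failed) "
            "VALUES (?, ?, 0)",
            (record_id, json.dumps(state)),
        )
        self.db.commit()

    def mark_failed(self, record_id):
        self.db.execute(
            "UPDATE journal SET failed = 1 WHERE record_id = ?", (record_id,)
        )
        self.db.commit()

    def clear(self, record_id):
        # Call once the transaction has committed or been repaired.
        self.db.execute("DELETE FROM journal WHERE record_id = ?", (record_id,))
        self.db.commit()

    def is_invalid(self, record_id) -> bool:
        row = self.db.execute(
            "SELECT failed FROM journal WHERE record_id = ?", (record_id,)
        ).fetchone()
        return row is not None and row[0] == 1

    def before_state(self, record_id):
        row = self.db.execute(
            "SELECT before_state FROM journal WHERE record_id = ?", (record_id,)
        ).fetchone()
        return None if row is None else json.loads(row[0])


def read_employee(journal, employees, employee_id):
    # Refuse to serve a record whose last transaction could not be undone.
    if journal.is_invalid(employee_id):
        raise InvalidRecordError(f"record {employee_id} is in an invalid state")
    return employees[employee_id]


journal = RollbackJournal()
employees = {"e42": {"address": "1 Old Road"}}    # stand-in external system

journal.save_before("e42", employees["e42"])       # save state before updating
employees["e42"] = {"address": "2 New Street"}     # update reaches the system
journal.mark_failed("e42")                         # later step and rollback failed

try:
    read_employee(journal, employees, "e42")
    blocked = False
except InvalidRecordError:
    blocked = True

restored = journal.before_state("e42")             # what a repair should restore
```

Only the affected record is blocked, so the rest of the application keeps working while the saved state tells a later repair what to restore.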

However, if the entire external system will be thrown into disarray by a failed transaction, then yes - there's nothing you can do here but shut down your app until the problem is fixed.

落墨 2024-09-12 14:51:28

Oddthinking's answer is a good one, but it is limited, because doing 2PC reliably is very difficult. This has been known in the distributed computing community for quite a while, though lots of people try their best to just ignore it.

If you're interested in delving deeper into this area, the Paxos consensus algorithm is a good place to start. And be aware that this is a surprisingly difficult problem, precisely because of both the problems you allude to and the fact that it's actually impossible to build a truly reliable messaging system that can deliver a message in a bounded amount of time. (To understand why that's true, consider that someone with a backhoe might wipe out all the network links between the various communicating parties…)

I suspect the real fix is to design the architecture of the overall system, and how you roll out changes across it, so that a loss of communications in one area is not catastrophic. This might or might not be easy to do, depending on the exact details.
