Maintaining state between two machines

Posted on 2024-07-22 18:03:16

We have two industrial controllers that are used to control critical systems. The idea is that when one controller fails, the other takes over automatically. To make the switchover seamless, the standby controller must mirror the state of the online controller at all times.

We have a solution, but it is poorly coded and poorly documented. Is there a common design pattern for implementing such a system, or open source software that achieves something similar, that could be used to build a generic solution that works on controllers or PCs and can be extended to allow any number of controllers to act as standbys?

Comments (6)

§对你不离不弃 2024-07-29 18:03:17

One approach is "cache coherence". Commercial products, such as Tangosol, do this.

Another approach is a light-weight version of an Enterprise Service Bus (ESB) or Service Oriented Architecture (SOA). Almost all the SOA vendors have products for this. I'd start with Tibco, which has a lightweight component set that you can use for this.

Since SOA isn't that hard, you can roll your own using the HTTP protocol, so that one controller can POST its status to its shadow controllers.
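To make the roll-your-own HTTP idea concrete, here is a minimal sketch using only the Python standard library. The `/state` path, the JSON snapshot format, and the names `ShadowHandler`, `start_shadow` and `push_state` are all assumptions for illustration, not part of any product mentioned above. It also leaves out authentication, retries and ordering, which a real controller link would need.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ShadowHandler(BaseHTTPRequestHandler):
    """Runs on the shadow controller: keeps the last snapshot it was sent."""

    latest_state = {}  # hypothetical in-memory mirror of the primary's state

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # Store the snapshot before replying, so a reply means "mirrored".
        ShadowHandler.latest_state = json.loads(self.rfile.read(length))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_shadow(port=0):
    """Start the shadow's receiver in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), ShadowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def push_state(host, port, state, timeout=1.0):
    """Runs on the online controller: POST one state snapshot to one shadow."""
    body = json.dumps(state).encode("utf-8")
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("POST", "/state", body=body,
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    finally:
        conn.close()
```

With more than one shadow, the online controller would call `push_state` once per shadow and decide for itself what to do when one of them does not answer within the timeout.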

谁把谁当真 2024-07-29 18:03:17

There is a difference between failover and transparent failover. Do you really have requirements for transparent failover? If so, you're going to end up paying for it (in both cost and complexity).

That being said, take a look at this post on Buddy Replication for an elegant solution to the problem.

纸短情长 2024-07-29 18:03:17

There is the standard Master-Slave pattern, used by almost all DBMSs that support clustering, distributed architectures and replication (http://en.wikipedia.org/wiki/Database_replication).

So, very basically in your situation you could have the Master machine maintaining state, and the slave sitting there doing nothing except updating its own state from that of the master. If the master goes down, the slave sees the master is no longer there, and can take over the control of state, with the master only being used again once it has updated its own state from that of the slave (which has maintained state while the master has not been active).
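As a rough sketch of the slave side of that pattern: the class below treats every state update from the master as a heartbeat and promotes itself when the updates stop. The `Standby` class, its method names and the one-second timeout are assumptions for illustration. The clock is injectable so the logic can be exercised without waiting.

```python
import time


class Standby:
    """Slave that mirrors the master's state and takes over on silence."""

    def __init__(self, timeout_s=1.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed silence threshold
        self.clock = clock
        self.state = {}
        self.active = False
        self.last_heartbeat = clock()

    def on_update(self, state):
        """Called for each snapshot from the master; doubles as a heartbeat."""
        self.state = dict(state)
        self.last_heartbeat = self.clock()

    def poll(self):
        """Promote to active if the master has been silent for too long."""
        silent_for = self.clock() - self.last_heartbeat
        if not self.active and silent_for > self.timeout_s:
            self.active = True
        return self.active

    def hand_back(self):
        """Give the recovered master the current state, then step down."""
        self.active = False
        self.last_heartbeat = self.clock()
        return dict(self.state)
```

Note that a timeout alone cannot tell a dead master from a broken link between the two machines, so both could end up active at once. A real system needs some extra arbitration for that case.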

国产ˉ祖宗 2024-07-29 18:03:17

The traditional approach taken in controlling realtime critical systems is to run the two units in lockstep. Tandem have been building some very impressive fault-tolerant machines using this technique for years.

However, lockstep is very much a hardware-level solution; I don't think you could implement classic lockstep purely at the software level. Or at least, not straightforwardly. Maybe using state machines synchronised by exchange of vector clocks or something equally propeller-headed?
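For anyone unfamiliar with vector clocks, here is a bare-bones sketch of the bookkeeping involved. It is not lockstep, and the function names are made up for illustration. Each clock is a dict from node name to a counter, and comparing two clocks tells you whether one state update is known to precede another or whether they are concurrent.

```python
def tick(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Combine two clocks by taking the per-node maximum (done on receive)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happened_before(a, b):
    """True if a is <= b on every node and strictly less on at least one."""
    nodes = set(a) | set(b)
    not_after = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return not_after and strictly


def concurrent(a, b):
    """True if neither clock precedes the other and they are not equal."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

Two updates that come out as concurrent are exactly the ones where the controllers could disagree about ordering, so they are the cases a replication scheme has to resolve explicitly.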

彡翼 2024-07-29 18:03:17

There is an analogous situation with the space shuttle computers. In that situation, they used 5 computers and if one machine was late or different from the others, it was (in essence) voted off of the island.
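The voting idea can be sketched as a plain majority vote over each unit's output for one control cycle. This is only an illustration of the general technique, not a description of how the shuttle computers actually did it. The `vote` function and the convention that `None` means "answered late" are assumptions.

```python
from collections import Counter


def vote(outputs):
    """Majority vote over one cycle's outputs.

    outputs maps unit name -> output value, with None meaning the unit
    was late. Returns (agreed_value, names_of_outvoted_units).
    """
    answered = [v for v in outputs.values() if v is not None]
    if not answered:
        raise RuntimeError("no unit answered in time")
    value, count = Counter(answered).most_common(1)[0]
    # Require a strict majority of ALL units, not just of those that answered.
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority")
    outvoted = {name for name, v in outputs.items() if v != value}
    return value, outvoted
```

With only two controllers there is no majority to appeal to when they disagree, which is why the question of who decides matters.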

In your situation, how do you determine which controller has gone bad? Is the machine that makes that determination itself a single point of failure?

What level of communication is available between the two controllers? Shared memory, Ethernet, or something even slower?

How fast does state information change between the two?

Is it possible to feed identical information to both controllers, and would both controllers then calculate the same state transitions?
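If the answer to that last question is yes, the standby does not need to copy state at all; it only needs the same inputs in the same order. A toy sketch of that idea follows, with made-up event kinds (`set_level`, `add`) and a made-up alarm threshold. The important property is that `step` depends on nothing except its arguments.

```python
def step(state, event):
    """Pure, deterministic transition: same state + same event -> same result."""
    kind, value = event
    new = dict(state)
    if kind == "set_level":
        new["level"] = value
    elif kind == "add":
        new["level"] = new.get("level", 0) + value
    else:
        raise ValueError(f"unknown event kind: {kind!r}")
    new["alarm"] = new["level"] > 100          # hypothetical threshold
    new["applied"] = state.get("applied", 0) + 1  # how many events were applied
    return new


def replay(events, state=None):
    """Apply an ordered list of events, starting from the given state."""
    state = dict(state or {})
    for event in events:
        state = step(state, event)
    return state
```

The `applied` counter lets a controller that fell behind ask for just the events it missed and replay them to catch up. Anything non-deterministic (wall-clock reads, random numbers, unordered inputs) breaks the guarantee.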

北城孤痞 2024-07-29 18:03:17

Maybe a shared SQLite database or something similar?
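A minimal sketch of what that could look like with Python's built-in `sqlite3` module, assuming both controllers can open the same database file. The `controller_state` table and the function names are made up for illustration. Be aware that SQLite relies on file locking, which is often unreliable on network filesystems, and that the shared file becomes a single point of failure of its own.

```python
import json
import sqlite3


def open_store(path):
    """Open (and if needed create) the shared key/value state table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS controller_state ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save_state(conn, state):
    """Online controller: write the whole snapshot in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO controller_state (key, value) VALUES (?, ?)",
            [(key, json.dumps(value)) for key, value in state.items()],
        )


def load_state(conn):
    """Standby controller: read back the latest committed snapshot."""
    rows = conn.execute("SELECT key, value FROM controller_state").fetchall()
    return {key: json.loads(value) for key, value in rows}
```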
