How does a supervisor process monitor processes? Can the same be done on the JVM?
Erlang fault tolerance (as I understand it) includes the use of supervisor processes to keep an eye on worker processes, so if a worker dies the supervisor can start up a new one.
How does Erlang do this monitoring, especially in a distributed scenario? How can it be sure the process has really died? Does it use heartbeats? Is something built into the runtime environment? What if a network cable is unplugged - does it assume the other processes have died if it cannot communicate with them?
I was thinking about how to achieve the same fault tolerance claimed by Erlang on the JVM (in, say, Java or Scala), but I was not sure whether it would require support built into the JVM to do it as well as Erlang does. I have not yet come across a description of how Erlang does it to use as a point of comparison.
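As a starting point, the restart behaviour the question describes can at least be approximated on the JVM without any special runtime support. The following is a minimal, hypothetical sketch (not any real library's API): the "link" between worker and supervisor is a `Thread.UncaughtExceptionHandler`, and the restart policy is simply "start a fresh worker, up to a fixed number of restarts".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of supervisor-style restarts on the JVM.
// The uncaught-exception handler plays the role of the Erlang link:
// an abnormal exit notifies the supervisor, which starts a new worker.
public class SupervisorSketch {
    static final AtomicInteger starts = new AtomicInteger();

    static void startWorker(int restartsLeft, CountDownLatch done) {
        Thread worker = new Thread(() -> {
            starts.incrementAndGet();
            // Simulate crashes: every incarnation but the last dies abnormally.
            if (restartsLeft > 0) throw new RuntimeException("worker crashed");
            done.countDown(); // final incarnation completes normally
        });
        // The "supervisor": notified of the abnormal exit, restarts the worker.
        worker.setUncaughtExceptionHandler((t, e) -> {
            if (restartsLeft > 0) startWorker(restartsLeft - 1, done);
        });
        worker.start();
    }

    public static int runDemo() {
        CountDownLatch done = new CountDownLatch(1);
        try {
            startWorker(2, done); // allow up to 2 restarts
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return starts.get(); // total number of worker incarnations
    }

    public static void main(String[] args) {
        System.out.println("worker started " + runDemo() + " times");
    }
}
```

This only covers the local case, of course - it says nothing about detecting failures across a network, which is the harder part of the question.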
Erlang OTP Supervision is typically not done between processes on different nodes. It would work, but best practice is to do it differently.
The common approach is to write the entire application so it runs on each machine, but the application is aware that it is not alone. Some part of the application has a node monitor, so it is aware of node-downs (this is done with simple network pings). These node-downs can be used to change load-balancing rules, fail over to another master, etc.
This ping means that there is latency in detecting node-downs. It can take quite a few seconds to detect a dead peer node (or dead link to it).
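The ping-based detection described above can be sketched roughly as follows. This is an illustrative toy, not Erlang's actual implementation: each peer's last heartbeat time is recorded, and a peer that has been silent longer than a timeout is presumed dead. Note that a pulled network cable looks exactly like a crashed peer - the detector cannot tell the difference, which is part of why detection is both delayed and conservative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of heartbeat-based node-down detection.
// A node is presumed dead once its last heartbeat is older than the timeout.
public class HeartbeatDetector {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long timeoutMillis;

    public HeartbeatDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat from a peer at the given clock time (millis).
    public void heartbeat(String node, long now) {
        lastSeen.put(node, now);
    }

    // A node we have never heard from, or not heard from recently, is "down".
    public boolean isDown(String node, long now) {
        Long seen = lastSeen.get(node);
        return seen == null || now - seen > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatDetector d = new HeartbeatDetector(5_000); // 5 s timeout
        d.heartbeat("node_a", 0);
        System.out.println(d.isDown("node_a", 3_000)); // recent ping: alive
        System.out.println(d.isDown("node_a", 9_000)); // silent too long: down
    }
}
```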
If the supervisor and process run locally, the crash and the signal to the supervisor are pretty much instantaneous. This relies on the feature that an abnormal crash propagates to linked processes, which crash as well unless they trap exits.
It appears that someone has implemented a similar strategy in Scala. My expectation would be that a supervisor would treat a network failure as a failed subprocess, and the documentation on the Scala process seems to bear this out.
I think by supervisor process you mean the portmapper.

You could use the Erlang portmapper/infrastructure via JInterface - that way you avoid reinventing the wheel, and if you still want to build your own, you at least get all the interfaces described there.
Erlang is open source, which means you can download the source and get the definitive answer on how Erlang does it.
I believe it's done in the BEAM runtime. When a process dies a signal is sent to all processes linked to it. See Chapter 9 of Programming Erlang for a full discussion.
In Erlang, you can choose to monitor a node, and receive
{node_up, Node}
and{node_down, Node}
messages. I assume these will also be sent if you can no longer talk to a node. How you handle them is up to you.