Erlang takeover fails after successful failover
I have an application distributed over 2 nodes. When I halt() the first node, the failover works perfectly, but (sometimes?) when I restart the first node, the takeover fails and the application crashes because start_link returns already_started.
SUPERVISOR REPORT <0.60.0>  2009-05-20 12:12:01
===============================================================================
Reporting supervisor  {local,twitter_server_supervisor}
Child process
    errorContext    start_error
    reason          {already_started,<2415.62.0>}
    pid             undefined
    name            tag1
    start_function  {twitter_server,start_link,[]}
    restart_type    permanent
    shutdown        10000
    child_type      worker
ok
My app:
    start(_Type, Args) ->
        twitter_server_supervisor:start_link(Args).

    stop(_State) ->
        ok.
My supervisor:
    start_link(Args) ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, Args).
Both nodes are using the same sys.config file.
What am I misunderstanding about this process? Why doesn't the above work?
Comments (1)
It seems your problem stems from twitter_server_supervisor trying to start one of its children: the error report complains about the child with start_function {twitter_server,start_link,[]}.
Since you are not showing that code, I can only guess that it is trying to register a name for itself, but a process is already registered under that name.
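To make that guess concrete, here is a minimal sketch of what twitter_server:start_link/0 might look like if it registers a global name. The real module was not posted, so the registration tuple, the empty state, and the callback bodies below are all assumptions for illustration only.

```erlang
-module(twitter_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical: the real start_link/0 was not shown in the question.
%% Registering as {global, Name} makes the name unique across all
%% connected nodes. If a process on any connected node already holds
%% the name, this returns {error, {already_started, Pid}}, where Pid
%% is the existing (possibly remote) process.
start_link() ->
    gen_server:start_link({global, ?MODULE}, ?MODULE, [], []).

%% Placeholder state; the real server presumably keeps its own.
init([]) ->
    {ok, []}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

If the server is started this way, the supervisor on the restarted node would get the already_started error whenever the instance on the other node is still alive and holding the name at the moment the child is started.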
Guessing further, the reason contains a Pid, which belongs to the process that already holds the name we tried to grab for ourselves: {already_started,<2415.62.0>}
The first integer of that Pid is non-zero; if it were zero, the Pid would refer to a local process. From this I deduce that you are trying to register a global name, and that you are connected to another node where a process is already globally registered under that name.
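One way to test this deduction is to ask the global name registry who holds the name and on which node that process runs. The sketch below uses global:whereis_name/1 and node/1; the module name name_check and the assumption that the server registers under the name twitter_server are hypothetical, since the registration code was not shown.

```erlang
-module(name_check).
-export([where/1]).

%% Reports where a globally registered name lives relative to the
%% node this function is called on.
where(Name) ->
    case global:whereis_name(Name) of
        undefined ->
            %% Nobody holds the name in the global registry.
            not_registered;
        Pid when node(Pid) =:= node() ->
            %% The holder runs on this node.
            {local_node, Pid};
        Pid ->
            %% The holder runs on another connected node.
            {remote_node, node(Pid), Pid}
    end.
```

Calling name_check:where(twitter_server) on the restarted node just before the takeover would show whether the other node still holds the name, which is the situation the already_started error points to.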