Erlang OTP supervisor crashing
I'm working through the Erlang documentation, trying to understand the basics of setting up an OTP gen_server and supervisor. Whenever my gen_server crashes, my supervisor crashes as well. In fact, whenever I have an error on the command line, my supervisor crashes.
I expect the gen_server to be restarted when it crashes. I expect command line errors to have no bearing whatsoever on my server components. My supervisor shouldn't be crashing at all.
The code I'm working with is a basic "echo server" that replies with whatever you send in, and a supervisor that will restart the echo_server 5 times per minute at most (one_for_one). My code:
echo_server.erl
-module(echo_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, echo_server}, echo_server, [], []).

%% public api
echo(Text) ->
    gen_server:call(echo_server, {echo, Text}).

crash() ->
    gen_server:call(echo_server, crash).

%% behaviours
init(_Args) ->
    {ok, none}.

handle_call(crash, _From, State) ->
    %% deliberate badmatch: X is already bound to 1
    X=1,
    {reply, X=2, State};
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.

handle_cast(_, State) ->
    {noreply, State}.
echo_sup.erl
-module(echo_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(echo_sup, []).

init(_Args) ->
    {ok, {{one_for_one, 5, 60},
          [{echo_server, {echo_server, start_link, []},
            permanent, brutal_kill, worker, [echo_server]}]}}.
Compiled using erlc *.erl, and here's a sample run:
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
Eshell V5.7.2 (abort with ^G)
1> echo_sup:start_link().
{ok,<0.37.0>}
2> echo_server:echo("hi").
"hi"
3> echo_server:crash().
=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server echo_server terminating
** Last message in was crash
** When Server state == none
** Reason for termination ==
** {'function not exported',
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}
=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server <0.37.0> terminating
** Last message in was {'EXIT',<0.35.0>,
{{{undef,
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}},
[{gen_server,call,2},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_exprs,6},
{shell,eval_loop,3}]}}
** When Server state == {state,
{<0.37.0>,echo_sup},
one_for_one,
[{child,<0.41.0>,echo_server,
{echo_server,start_link,[]},
permanent,brutal_kill,worker,
[echo_server]}],
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]}}},
5,60,
[{1273,79154,701110}],
echo_sup,[]}
** Reason for termination ==
** {{{undef,[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}},
[{gen_server,call,2},
{erl_eval,do_apply,5},
{shell,exprs,6},
{shell,eval_exprs,6},
{shell,eval_loop,3}]}
** exception exit: {{undef,
[{echo_server,terminate,
[{{badmatch,2},
[{echo_server,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
none]},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,[echo_server,crash]}}
in function gen_server:call/2
4> echo_server:echo("hi").
** exception exit: {noproc,{gen_server,call,[echo_server,{echo,"hi"}]}}
in function gen_server:call/2
5>
The problem with testing supervisors from the shell is that the supervisor process is linked to the shell process. When the gen_server process crashes, the failed gen_server:call makes the shell process crash and get restarted, and the exit signal from the shell takes the linked supervisor down with it.
To avoid the problem, add something like this to the supervisor:
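A minimal sketch of what such an addition could look like, assuming the echo_sup module from the question. The function name start_in_shell_for_testing/0 is an illustrative choice, not an OTP API: it starts the supervisor and then unlinks it from the calling process, so a crash of the shell no longer reaches the supervisor.

```erlang
-module(echo_sup).
-behaviour(supervisor).

-export([start_link/0, start_in_shell_for_testing/0]).
-export([init/1]).

%% Normal entry point: the supervisor stays linked to its caller.
start_link() ->
    supervisor:start_link(echo_sup, []).

%% Hypothetical helper for shell experiments only: start the supervisor,
%% then remove the link between it and the calling (shell) process.
start_in_shell_for_testing() ->
    {ok, Pid} = supervisor:start_link(echo_sup, []),
    unlink(Pid),
    {ok, Pid}.

init(_Args) ->
    {ok, {{one_for_one, 5, 60},
          [{echo_server, {echo_server, start_link, []},
            permanent, brutal_kill, worker, [echo_server]}]}}.
```

In the shell, call echo_sup:start_in_shell_for_testing() instead of echo_sup:start_link(). In a real system the supervisor would normally sit in an application's supervision tree and keep its link.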
I would suggest that you debug/trace your application to check what's going on. It's very helpful for understanding how things work in OTP.
In your case, you might want to do the following.
Start the tracer:
Trace all function calls for your supervisor and your gen_server:
Check which messages the processes are passing:
See what's happening to your processes (crash, etc):
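The four steps above can be sketched with standard dbg calls as follows, assuming the module names echo_server and echo_sup from the question. The code:ensure_loaded/1 calls are an addition of this sketch, since trace patterns are only set on code that is already loaded. Enter the expressions in the Erlang shell before starting the supervisor.

```erlang
%% 1. Start the tracer; by default trace output is printed in the shell.
dbg:tracer().

%% Trace patterns only apply to loaded code, so load both modules first.
code:ensure_loaded(echo_sup).
code:ensure_loaded(echo_server).

%% 2. Trace all function calls in the supervisor and the gen_server.
%%    'c' turns on call tracing for all processes; 'x' is dbg's built-in
%%    alias for a match spec that also shows return values and exceptions.
dbg:p(all, c).
dbg:tpl(echo_server, x).
dbg:tpl(echo_sup, x).

%% 3. Trace messages sent and received by processes created from now on.
dbg:p(new, m).

%% 4. Trace process events (spawn, exit, link, register, ...) for
%%    processes created from now on.
dbg:p(new, p).
```

After this, start the supervisor and call echo_server:crash() to see the trace. dbg:stop() switches tracing off again.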
For more information about tracing:
http://www.erlang.org/doc/man/dbg.html
http://aloiroberto.wordpress.com/2009/02/23/tracing-erlang-functions/
Hope this helps in this and future situations.
HINT: The gen_server behaviour is expecting the callback terminate/2 to be defined and exported ;)
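As a sketch of what the hint implies, here is the question's echo_server with terminate/2 defined and exported. The handle_info/2 and code_change/3 callbacks are added as an assumption, to complete the usual gen_server callback set, and all three new bodies are placeholders. As far as I know, recent OTP releases treat terminate/2 as optional, but the R13B01 release in the question calls it unconditionally, which is where the undef in the error report comes from.

```erlang
-module(echo_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, echo_server}, echo_server, [], []).

%% public api
echo(Text) ->
    gen_server:call(echo_server, {echo, Text}).

crash() ->
    gen_server:call(echo_server, crash).

%% gen_server callbacks
init(_Args) ->
    {ok, none}.

handle_call(crash, _From, State) ->
    %% Deliberate badmatch, as in the question: X is already bound to 1.
    X = 1,
    {reply, X = 2, State};
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Placeholder: ignore any other message.
handle_info(_Info, State) ->
    {noreply, State}.

%% Called when the server is about to stop; nothing to clean up here.
terminate(_Reason, _State) ->
    ok.

%% Placeholder: keep the state unchanged on a code upgrade.
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With terminate/2 in place, the first error report shows the badmatch from handle_call/3 directly instead of wrapping it in an undef for the missing callback.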
UPDATE: Once terminate/2 is defined, the reason for the crash is evident from the trace. This is how it looks:
We (75) call the crash/0 function. The call is received by the gen_server (78).
Uh, there is a problem in handle_call: we get a badmatch...
The terminate function is called. The server exits and gets unregistered.
The supervisor (77) receives the exit signal from the gen_server and does its job:
Well, it tries... and then what Filippo described happens.
On the other hand, if the restart strategy has to be tested from within the console, start the supervisor from the console and use pman to kill the worker process.
You will see pman refresh with the same supervisor Pid but different worker Pids, depending on the MaxR and MaxT you have set in the restart strategy.
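The same check can be done without a GUI. The shell session below is a sketch that assumes the echo_sup and echo_server modules from the question are compiled and that the worker registers itself as echo_server; the 100 ms sleep is an arbitrary allowance for the restart. To my knowledge pman is no longer shipped with later OTP releases (observer is the usual replacement), so this may be the easier route there.

```erlang
%% Start the supervisor from the shell and remember the worker's Pid.
{ok, Sup} = echo_sup:start_link().
OldWorker = whereis(echo_server).

%% Kill only the worker. The shell is linked to the supervisor, not to
%% the worker, so the shell process itself is not affected.
exit(OldWorker, kill).
timer:sleep(100).

%% The supervisor Pid is unchanged, while the worker has a new Pid.
true = is_process_alive(Sup).
NewWorker = whereis(echo_server).
true = is_pid(NewWorker) andalso NewWorker =/= OldWorker.

%% List the supervisor's children with their current Pids.
supervisor:which_children(Sup).
```

Killing the worker more than MaxR (5) times within MaxT (60) seconds makes the supervisor give up and terminate, which also brings down the shell process it is linked to.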