Does a supervisor block calls while it is restarting a child process?

Posted 2024-10-27 16:29:51

I'm trying to understand what's happening here:

I have a supervisor that keeps restarting one client in a cycle without ever triggering the MaxR/MaxT mechanism. The client crashes slowly enough that the restart rate limit is never reached.
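For context, the restart limit is set in the value returned from the supervisor's init/1. The sketch below is a guess at the shape of such a supervisor, not the actual code: the module name gateway_sup_sketch and the values MaxR = 3 and MaxT = 10 are assumptions, while the child spec fields mirror the supervisor report in the addendum. The supervisor gives up only when more than MaxR restarts happen within MaxT seconds, so a child whose every failed start takes several seconds can stay under the limit indefinitely.

```erlang
-module(gateway_sup_sketch).
-behaviour(supervisor).

-export([start_link/0, child_spec/3]).
-export([init/1]).

%% Assumed values, not taken from the original post.
-define(MAX_R, 3).    % allowed restarts ...
-define(MAX_T, 10).   % ... within this many seconds

start_link() ->
    supervisor:start_link({local, gateway_sup_sketch}, ?MODULE, []).

%% Children are added later with supervisor:start_child/2,
%% so the initial child list is empty.
init([]) ->
    {ok, {{one_for_one, ?MAX_R, ?MAX_T}, []}}.

%% Tuple-style child spec using the fields visible in the report:
%% transient restart, 10000 ms shutdown, worker.
child_spec(Id, Opts, Pos) ->
    {Id, {channel, start_link, [Opts, Pos]}, transient, 10000, worker, [channel]}.
```

With these assumed numbers, a start attempt that needs about five seconds to time out yields far fewer than four restarts per ten seconds, so the supervisor never shuts itself down.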

There is also a second mechanism that uses supervisor:which_children/1, delete_child/2 and start_child/2 to adapt the set of children to reality: it scans for USB devices and tries to keep one supervisor child per device found.
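A reconciliation step of this kind could look roughly like the following. This is only a sketch under assumptions: the module child_reconcile_sketch, the helper plan/2, and the use of tuple-style child specs (id in the first element) are all made up for illustration. Note that every supervisor function used here is a synchronous call into the supervisor process, so none of them can complete while the supervisor is not serving requests.

```erlang
-module(child_reconcile_sketch).
-export([reconcile/2, plan/2]).

%% Pure helper: {ids to start, ids to stop}.
plan(WantedIds, RunningIds) ->
    {WantedIds -- RunningIds, RunningIds -- WantedIds}.

%% WantedSpecs: tuple-style child specs, one per device found by the scan
%% (assumption; map-style specs would need maps:get(id, Spec) instead).
reconcile(Sup, WantedSpecs) ->
    RunningIds = [Id || {Id, _Pid, _Type, _Mods} <- supervisor:which_children(Sup)],
    WantedIds = [element(1, Spec) || Spec <- WantedSpecs],
    {ToStart, ToStop} = plan(WantedIds, RunningIds),
    lists:foreach(
      fun(Id) ->
              %% delete_child/2 refuses a running child, so stop it first.
              _ = supervisor:terminate_child(Sup, Id),
              _ = supervisor:delete_child(Sup, Id)
      end, ToStop),
    lists:foreach(
      fun(Spec) ->
              case lists:member(element(1, Spec), ToStart) of
                  true -> _ = supervisor:start_child(Sup, Spec), ok;
                  false -> ok
              end
      end, WantedSpecs),
    {ToStart, ToStop}.
```

For example, plan([b, c], [a, b]) says that c has to be started and a has to be removed.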

This would normally act as a safety net alongside the rate limit, but strangely it looks like the code that deletes and starts children is never called at all.

To find out what is going on, I called supervisor:which_children/1 from the shell, and the call just blocks and never returns.

Could it be that calls to the supervisor are blocked while it is busy trying to restart a child?
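One way to inspect a supervisor that does not answer is erlang:process_info/2, which reads information about the process from outside and does not depend on the supervisor's own loop handling a request. A minimal sketch (the module name sup_probe_sketch is made up for illustration):

```erlang
-module(sup_probe_sketch).
-export([probe/1]).

%% Look up a locally registered process and report what it is doing.
probe(Name) when is_atom(Name) ->
    case whereis(Name) of
        undefined ->
            not_registered;
        Pid ->
            %% current_function: where the process is executing right now
            %% message_queue_len: requests waiting to be handled
            %% status: waiting, running, runnable, ...
            erlang:process_info(Pid, [current_function, message_queue_len, status])
    end.
```

Calling sup_probe_sketch:probe(gateway_sup) from the shell a few times in a row should show whether the message queue keeps growing and which function the supervisor is currently in.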

Addendum:

It looks like the crash happens during child start:

=SUPERVISOR REPORT==== 29-Mar-2011::21:36:20 ===
     Supervisor: {local,gateway_sup}
     Context:    start_error
     Reason:     {'EXIT',{timeout,{gen_server,call,[<0.155.0>,late_init]}}}
     Offender:   [{pid,<0.76.0>},
              {name,gw_3_5},
              {mfa,{channel,start_link,
                            [[{gateways,[{left,108},{right,103}]}],
                             {3,5}]}},
              {restart_type,transient},
              {shutdown,10000},
              {child_type,worker}]

烟花易冷人易散 2024-11-03 16:29:51

Setting the discussion aside, the answer to the question is:

When restarting a child that fails during startup, the supervisor loops inside its own process (internally it is a gen_server) and does not handle any API calls made to it in the meantime.

So it is especially bad if the supervisor's rate limit is configured in such a way that startup errors of the children never trigger it. In my example, startup is slow (especially when it fails).

So if the supervisor loops forever trying to restart a child, it cannot be reached by any calls ... which is usually bad.
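One possible way to reduce the time the supervisor spends inside a child start (not something stated in the answer above) is to let the child's init/1 return immediately and run the slow part afterwards. The sketch below assumes OTP 21 or later for handle_continue/2; the module channel_sketch, the function ready/1 and the placeholder setup step are hypothetical and only stand in for the real channel module.

```erlang
-module(channel_sketch).
-behaviour(gen_server).

-export([start_link/2, ready/1]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2, handle_info/2]).

start_link(Opts, Pos) ->
    gen_server:start_link(?MODULE, {Opts, Pos}, []).

%% Hypothetical helper: has the deferred setup finished?
ready(Pid) ->
    gen_server:call(Pid, ready).

%% Return at once, so the supervisor's start call is not held up.
init({Opts, Pos}) ->
    {ok, #{opts => Opts, pos => Pos, ready => false}, {continue, late_init}}.

%% Runs right after init/1, before any other message is handled.
%% The slow or failure-prone setup would go here (placeholder only).
handle_continue(late_init, State) ->
    {noreply, State#{ready := true}}.

handle_call(ready, _From, State) ->
    {reply, maps:get(ready, State), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.
```

With this layout, a failure in the deferred step reaches the supervisor as an ordinary exit of an already started child instead of a start error, so the supervisor should be free to serve calls such as which_children/1 while the slow step runs.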
