Elixir Supervisor not restarting timed-out Poolboy GenServer workers after a DNS timeout
I'm trying to use Poolboy for a worker pool to make a large number of DNS requests. On some of these DNS requests, the DNS query times out, which throws an error and terminates the GenServer worker:
07:44:29.585 [error] GenServer #PID<0.382.0> terminating
** (Socket.Error) timeout
    (socket 0.3.13) lib/socket/datagram.ex:46: Socket.Datagram.recv!/2
    (dns 2.3.0) lib/dns.ex:76: DNS.query/4
    (dmarc_hijack 0.1.0) lib/dmarc.ex:5: Dmarc.fetch_dmarc_record/1
    (dmarc_hijack 0.1.0) lib/dmarc_hijack/worker.ex:16: DmarcHijack.Worker.handle_call/3
    (stdlib 3.17.1) gen_server.erl:721: :gen_server.try_handle_call/4
    (stdlib 3.17.1) gen_server.erl:750: :gen_server.handle_msg/6
    (stdlib 3.17.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.717.0>): {:fetch_process_dmarc, "12580.tv"}
State: nil
Client #PID<0.717.0> is dead
Eventually, this leads to all of my Poolboy workers getting killed, and the Supervisor does not appear to restart the Worker GenServers. Application functionality then ceases as there are no more workers, but execution does not halt.
I'm try/catch-ing errors in the Poolboy task as well as the DNS client:
Poolboy task:
defp setup_task(domain) do
  Task.async(fn ->
    :poolboy.transaction(
      :worker,
      fn pid ->
        try do
          GenServer.call(pid, {:fetch_process_dmarc, domain})
        catch
          :exit, reason ->
            # Handle timeout
            Logger.warning("Probably just got a timeout on #{domain}. Real reason follows:")
            Logger.warning(inspect(reason))
            {domain, {:error, :timeout}}
        end
      end,
      @timeout
    )
  end)
end
DNS query code:
defmodule Dmarc do
  def fetch_dmarc_record(domain) do
    try do
      DNS.query("_dmarc.#{domain}", :txt, {select_random_dns_server(), 53})
      |> extract_dmarc_record_from_txt()
    catch
      error ->
        Logger.error(error)
        {:error, :timeout}
    end
  end
It makes the most sense to me that I should be handling the DNS query timeout at the point of making that DNS query, but it's not getting handled by the try/catch block. I think this is happening because the recv! call raises (panics) on a timeout, bypassing my try/catch block, but I could be wrong here.
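To illustrate the suspicion above, here is a minimal sketch of how `catch` and `rescue` differ for a raised exception. The module `CatchVsRescue` and the function `failing_query/0` are hypothetical stand-ins (not part of the `dns` or `socket` libraries); `failing_query/0` only simulates a call that raises on timeout.

```elixir
defmodule CatchVsRescue do
  # Hypothetical stand-in for a call that raises an exception on timeout.
  def failing_query, do: raise(RuntimeError, "timeout")

  # A single-pattern `catch` clause only matches values sent with throw/1,
  # so the raised exception is not handled here and propagates to the caller.
  def with_catch do
    try do
      failing_query()
    catch
      value -> {:caught, value}
    end
  end

  # `rescue` matches raised exceptions, so the error becomes a tagged tuple.
  def with_rescue do
    try do
      failing_query()
    rescue
      error -> {:error, Exception.message(error)}
    end
  end
end
```

Calling `CatchVsRescue.with_rescue()` returns an error tuple, while `CatchVsRescue.with_catch()` still raises, which inside a `handle_call/3` would terminate the GenServer.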
Based on my understanding, the supervisor should restart the terminated GenServers, but for whatever reason, once they terminate from the timeout they are never restarted.
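As a baseline for that expectation, the following sketch shows a plain `:one_for_one` Supervisor restarting a GenServer that crashes inside `handle_call/3`. It uses only the standard library; `RestartDemo` and `RestartDemo.Worker` are hypothetical names, and the sketch does not model Poolboy, which starts its workers under its own internal supervisor, so their restart behaviour may differ from what is shown here.

```elixir
defmodule RestartDemo.Worker do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  # Simulates an unhandled exception such as a DNS timeout.
  def handle_call(:crash, _from, _state), do: raise("simulated DNS timeout")
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

defmodule RestartDemo do
  def run do
    {:ok, _sup} = Supervisor.start_link([RestartDemo.Worker], strategy: :one_for_one)
    old_pid = Process.whereis(RestartDemo.Worker)
    ref = Process.monitor(old_pid)

    # The caller exits as well when the server dies mid-call, so catch the exit.
    try do
      GenServer.call(RestartDemo.Worker, :crash)
    catch
      :exit, _reason -> :ok
    end

    # Wait until the old worker is confirmed dead.
    receive do
      {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
    end

    new_pid = wait_for_restart(old_pid)
    {old_pid, new_pid, GenServer.call(new_pid, :ping)}
  end

  # Polls until the supervisor has registered a replacement worker.
  defp wait_for_restart(old_pid) do
    case Process.whereis(RestartDemo.Worker) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        wait_for_restart(old_pid)
    end
  end
end
```

`RestartDemo.run()` returns the old pid, the replacement pid, and the reply from the restarted worker, which makes it easy to compare against what the pool is doing.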
Application config with Supervisor details:
defmodule DmarcHijack.Application do
  use Application

  defp poolboy_config do
    [
      name: {:local, :worker},
      worker_module: DmarcHijack.Worker,
      size: 5,
      max_overflow: 5
    ]
  end

  @impl true
  def start(_type, _args) do
    children = [
      DmarcHijack.ResultsBucket,
      :poolboy.child_spec(:worker, poolboy_config())
    ]

    opts = [strategy: :one_for_one, name: DmarcHijack.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
I'd really appreciate any help available to debug this issue. Thanks!
Comments (1)
For anyone who's dealing with the same issue that I am, I resolved this issue by doing the following:
- Replacing catch with rescue
- Using :infinity as the timeout for the DNS query, since the timeout is already being handled by DNS.
I'm pretty sure this isn't the best solution, but it worked for me.
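A rough sketch of how those two changes might fit together is shown below. `SafeWorker`, `fetch/2`, and the injected `query_fun` are hypothetical names used only for illustration; `query_fun` stands in for the real DNS lookup, and the sketch assumes that lookup always returns or raises within its own timeout, which is what makes `:infinity` on the call reasonable.

```elixir
defmodule SafeWorker do
  use GenServer

  # query_fun is a hypothetical stand-in for the real DNS lookup function.
  def start_link(query_fun), do: GenServer.start_link(__MODULE__, query_fun)

  # :infinity disables the client-side call timeout; the lookup is assumed
  # to enforce its own timeout and raise when it expires.
  def fetch(pid, domain), do: GenServer.call(pid, {:fetch, domain}, :infinity)

  @impl true
  def init(query_fun), do: {:ok, query_fun}

  @impl true
  def handle_call({:fetch, domain}, _from, query_fun) do
    # rescue turns a raised exception into a reply, so the worker survives.
    result =
      try do
        {:ok, query_fun.(domain)}
      rescue
        error -> {:error, Exception.message(error)}
      end

    {:reply, {domain, result}, query_fun}
  end
end
```

For example, starting the worker with `fn _domain -> raise "timeout" end` and calling `SafeWorker.fetch(pid, "example.com")` gives back an error tuple for that domain while the worker process stays alive and remains available to the pool.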