Elixir supervisor not restarting timed-out Poolboy GenServer after DNS timeout

Posted 2025-01-23 09:13:16


I'm trying to use Poolboy for a worker pool to make a large number of DNS requests. On some of these DNS requests, the DNS query times out, which throws an error and terminates the GenServer worker:

07:44:29.585 [error] GenServer #PID<0.382.0> terminating
** (Socket.Error) timeout
    (socket 0.3.13) lib/socket/datagram.ex:46: Socket.Datagram.recv!/2
    (dns 2.3.0) lib/dns.ex:76: DNS.query/4
    (dmarc_hijack 0.1.0) lib/dmarc.ex:5: Dmarc.fetch_dmarc_record/1
    (dmarc_hijack 0.1.0) lib/dmarc_hijack/worker.ex:16: DmarcHijack.Worker.handle_call/3
    (stdlib 3.17.1) gen_server.erl:721: :gen_server.try_handle_call/4
    (stdlib 3.17.1) gen_server.erl:750: :gen_server.handle_msg/6
    (stdlib 3.17.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.717.0>): {:fetch_process_dmarc, "12580.tv"}
State: nil
Client #PID<0.717.0> is dead

Eventually, this leads to all of my Poolboy workers getting killed, and the Supervisor does not appear to restart the Worker GenServers. Application functionality then ceases as there are no more workers, but execution does not halt.

I'm try/catch-ing errors in the Poolboy task as well as the DNS client:

Poolboy task:

  defp setup_task(domain) do
    Task.async(fn ->
      :poolboy.transaction(
        :worker,
        fn pid ->
          try do
            GenServer.call(pid, {:fetch_process_dmarc, domain})
          catch :exit, reason ->
            # Handle timeout
            Logger.warning("Probably just got a timeout on #{domain}. Real reason follows:")
            Logger.warning(inspect(reason))
            {domain, {:error, :timeout}}
          end
        end,
        @timeout
      )
    end)
  end

DNS query code:

defmodule Dmarc do
  def fetch_dmarc_record(domain) do
    try do
      DNS.query("_dmarc.#{domain}", :txt, {select_random_dns_server(), 53})
      |> extract_dmarc_record_from_txt()
    catch
      error ->
        Logger.error(error)
        {:error, :timeout}
    end
  end

It makes the most sense to me to handle the DNS query timeout at the point where the query is made, but it's not getting handled by the try/catch block. I think this is because recv! raises an exception on a timeout, which slips past my try/catch block, but I could be wrong here.
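
For context, here's a minimal sketch of the distinction as I currently understand it (runnable in IEx, nothing app-specific): a bare catch clause only matches values thrown with throw/1, while exceptions raised with raise (which is what Socket.Error is) need a rescue clause or a catch :error, reason clause.

# A bare `catch` only sees thrown values:
try do
  throw(:oops)
catch
  value -> {:caught_throw, value}   # => {:caught_throw, :oops}
end

# Raised exceptions need `rescue` (or `catch :error, reason`):
try do
  raise RuntimeError, "timeout"
rescue
  e -> {:rescued, e}                # => {:rescued, %RuntimeError{message: "timeout"}}
end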

Based on my understanding, the supervisor should restart the terminated GenServers, but for whatever reason, once they terminate from the timeout, they are never restarted.
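
To check whether the workers are really gone, one thing I can do from IEx is inspect the supervision tree and ask the pool directly (a sketch; :worker and DmarcHijack.Supervisor are the names from the config below, and the exact return shapes may vary between poolboy versions):

# Top-level children; the poolboy pool appears under the :worker child id.
Supervisor.which_children(DmarcHijack.Supervisor)

# Ask the pool how many workers it currently has available.
# Returns something like {:ready, 5, 0, 0} (state, available, overflow, checked out).
:poolboy.status(:worker)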

Application config with Supervisor details:

defmodule DmarcHijack.Application do
  use Application

  defp poolboy_config do
    [
      name: {:local, :worker},
      worker_module: DmarcHijack.Worker,
      size: 5,
      max_overflow: 5
    ]
  end

  @impl true
  def start(_type, _args) do
    children = [
      DmarcHijack.ResultsBucket,
      :poolboy.child_spec(:worker, poolboy_config())
    ]

    opts = [strategy: :one_for_one, name: DmarcHijack.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
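
I haven't included the worker module itself; here's a trimmed-down sketch of roughly what it looks like (simplified to the shape implied by the stack trace and the GenServer.call in setup_task, so details differ from the real module):

defmodule DmarcHijack.Worker do
  use GenServer

  def start_link(_args) do
    GenServer.start_link(__MODULE__, nil)
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:fetch_process_dmarc, domain}, _from, state) do
    # Dmarc.fetch_dmarc_record/1 is the function shown earlier; when it raises,
    # this whole worker process terminates, which is the crash in the log above.
    {:reply, {domain, Dmarc.fetch_dmarc_record(domain)}, state}
  end
end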

I'd really appreciate any help available to debug this issue. Thanks!

Comments (1)

香橙ぽ 2025-01-30 09:13:16


For anyone who's dealing with the same issue that I am, I resolved this issue by doing the following:

  1. Replaced the catch with rescue for the DNS query.
  2. Set the timeout value for the Poolboy transaction to :infinity, since the timeout is already being handled by the DNS query.

I'm pretty sure this isn't the best solution, but it worked for me.
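
Roughly what that looks like, adapted from the code in the question (a sketch, so names and details may differ from my actual modules):

# 1. rescue instead of catch, so the raised Socket.Error actually gets handled
#    (Exception.message/1 turns the exception struct into a loggable string):
def fetch_dmarc_record(domain) do
  try do
    DNS.query("_dmarc.#{domain}", :txt, {select_random_dns_server(), 53})
    |> extract_dmarc_record_from_txt()
  rescue
    error ->
      Logger.error(Exception.message(error))
      {:error, :timeout}
  end
end

# 2. Pass :infinity as the poolboy transaction timeout (instead of @timeout),
#    since the DNS library's own timeout now bounds how long a worker can block:
:poolboy.transaction(
  :worker,
  fn pid -> GenServer.call(pid, {:fetch_process_dmarc, domain}) end,
  :infinity
)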
