Avoiding a fork()/SIGCHLD race condition

Posted on 2024-07-10 20:37:13

Please consider the following fork()/SIGCHLD pseudo-code.

  // main program excerpt
    for (;;) {
      if ( is_time_to_make_babies ) {

        pid = fork();
        if (pid == -1) {
          /* fail */
        } else if (pid == 0) {
          /* child stuff */
          print "child started"
          exit
        } else {
          /* parent stuff */
          print "parent forked new child ", pid
          children.add(pid);
        }

      }
    }

  // SIGCHLD handler
  sigchld_handler(signo) {
    while ( (pid = waitpid(-1, &status, WNOHANG)) > 0 ) {
      print "parent caught SIGCHLD from ", pid
      children.remove(pid);
    }
  }

In the above example there's a race condition. It's possible for "/* child stuff */" to finish before "/* parent stuff */" starts. In that case the SIGCHLD handler can run first, reap the child and try to remove a pid that isn't in the list yet. The parent then adds the pid of a child that has already exited, and that pid is never removed. When the time comes for the app to shut down, the parent will wait endlessly for the already-finished child to finish.

One solution I can think of to counter this is to have two lists: started_children and finished_children. I'd add to started_children in the same place I'm adding to children now. But in the signal handler, instead of removing from children I'd add to finished_children. When the app closes down, the parent can simply wait until the difference between started_children and finished_children is zero.

Another possible solution I can think of is using shared memory, e.g. sharing the parent's list of children and letting the children .add and .remove themselves? But I don't know much about this.

EDIT: Another possible solution, which was the first thing that came to mind, is to simply add a sleep(1) at the start of /* child stuff */ but that smells funny to me, which is why I left it out. I'm also not even sure it's a 100% fix.

So, how would you correct this race condition? And if there's a well-established recommended pattern for this, please let me know!

Thanks.

Comments (4)

对你而言 2024-07-17 20:37:13

The simplest solution is to block the SIGCHLD signal with sigprocmask() before calling fork(), and to unblock it in the parent once you have recorded the pid.

If the child has already exited by then, the SIGCHLD handler runs as soon as you unblock the signal. This is the critical-section concept: in your case the critical section starts before fork() and ends after children.add().

却一份温柔 2024-07-17 20:37:13

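If you can't use a critical section, maybe a simple counter can do the job: +1 on add, -1 on remove. No matter which one happens first, you eventually get zero when everything is done, as long as the handler cannot interrupt the parent's +1 halfway through (an ordinary increment is a separate read, add and write).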

汹涌人海 2024-07-17 20:37:13

In addition to the existing children list, add a new data structure, earlyDeaths. This keeps the contents of children clean.

  // main program excerpt
    for (;;) {
      if ( is_time_to_make_babies ) {

        pid = fork();
        if (pid == -1) {
          /* fail */
        } else if (pid == 0) {
          /* child stuff */
          print "child started"
          exit
        } else {
          /* parent stuff */
          print "parent forked new child ", pid
          if (!earlyDeaths.contains(pid)) {
              children.add(pid);
          } else {
              earlyDeaths.remove(pid);
          }
        }

      }
    }

  // SIGCHLD handler
  sigchld_handler(signo) {
    while ( (pid = waitpid(-1, &status, WNOHANG)) > 0 ) {
      print "parent caught SIGCHLD from ", pid
      if (children.contains(pid)) {
          children.remove(pid);
      } else {
          earlyDeaths.add(pid);
      }
    }
  }

EDIT: This can be simplified if your process is single-threaded: earlyDeaths doesn't have to be a container, it only has to hold one pid.

茶色山野 2024-07-17 20:37:13

Maybe an optimistic algorithm? Try children.remove(pid), and if it fails, move on with life.

Or check that pid is in children before trying to remove it?
