Avoiding a fork()/SIGCHLD race condition
Please consider the following fork()/SIGCHLD pseudo-code.
    // main program excerpt
    for (;;) {
        if ( is_time_to_make_babies ) {
            pid = fork();
            if (pid == -1) {
                /* fail */
            } else if (pid == 0) {
                /* child stuff */
                print "child started"
                exit
            } else {
                /* parent stuff */
                print "parent forked new child ", pid
                children.add(pid);
            }
        }
    }

    // SIGCHLD handler: reap every child that has exited so far
    sigchld_handler(signo) {
        while ( (pid = waitpid(-1, &status, WNOHANG)) > 0 ) {
            print "parent caught SIGCHLD from ", pid
            children.remove(pid);
        }
    }
In the above example there's a race condition. It's possible for "/* child stuff */" to finish before "/* parent stuff */" starts, which can result in a child's pid being added to the list of children after it has exited, and never being removed. When the time comes for the app to close down, the parent will wait endlessly for the already-finished child to finish.
One solution I can think of to counter this is to have two lists: started_children and finished_children. I'd add to started_children in the same place I'm adding to children now. But in the signal handler, instead of removing from children I'd add to finished_children. When the app closes down, the parent can simply wait until the difference between started_children and finished_children is zero.
Another possible solution I can think of is using shared memory, e.g. share the parent's list of children and let the children .add and .remove themselves? But I don't know too much about this.
EDIT: Another possible solution, which was the first thing that came to mind, is to simply add a sleep(1) at the start of /* child stuff */, but that smells funny to me, which is why I left it out. I'm also not even sure it's a 100% fix.
So, how would you correct this race condition? And if there's a well-established recommended pattern for this, please let me know!
Thanks.
Comments (4)
The simplest solution would be to block the SIGCHLD signal with sigprocmask() before fork() and unblock it in the parent code after you have processed the pid.
If the child dies in the meantime, the signal handler for SIGCHLD will be called after you unblock the signal. This is the critical-section concept: in your case the critical section starts before fork() and ends after children.add().
If you can't use a critical section, maybe a simple counter can do the job: +1 on add, -1 on remove. No matter which one happens first, you eventually get zero when everything is done.
In addition to the existing "children", add a new data structure, "early deaths". This will keep the contents of children clean.
EDIT: this can be simplified if your process is single threaded -- earlyDeaths doesn't have to be a container, it just has to hold one pid.
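One way to read this suggestion, shown as bookkeeping only: the C sketch below uses no real signals, and on_child_exit and on_fork are hypothetical names for what the SIGCHLD handler and the parent would each do. It assumes the single-threaded simplification, so early_death holds a single pid. The parent checks early_death after adding the pid, so both orderings of "child exits" and "parent records pid" leave the table clean. In a real program the table accesses would also need to be safe against being interrupted by the handler, which this sketch does not address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_CHILDREN 64

static pid_t children[MAX_CHILDREN];  /* hypothetical table; 0 marks a free slot */
static pid_t early_death;             /* 0 means "none recorded" */

static bool children_add(pid_t pid) {
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        if (children[i] == 0) { children[i] = pid; return true; }
    return false;
}

static bool children_remove(pid_t pid) {
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        if (children[i] == pid) { children[i] = 0; return true; }
    return false;
}

static size_t children_count(void) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        if (children[i] != 0) n++;
    return n;
}

/* What the SIGCHLD handler would do for each pid returned by waitpid(). */
static void on_child_exit(pid_t pid) {
    if (!children_remove(pid))
        early_death = pid;   /* exited before the parent recorded it */
}

/* What the parent would do right after a successful fork(). */
static void on_fork(pid_t pid) {
    children_add(pid);
    if (early_death == pid) {   /* the child was already reaped */
        children_remove(pid);
        early_death = 0;
    }
}
```

Calling on_child_exit(pid) before on_fork(pid) models the race from the question, and the table ends up empty in that order as well as in the normal one.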
Maybe an optimistic algorithm? Try children.remove(pid), and if it fails, move on with life.
Or check that pid is in children before trying to remove it?