Should I worry about the order in which processes in a process group receive signals?
I want to terminate a process group by sending SIGTERM to the processes within it. This can be accomplished via the kill() function, but the manuals I found provide few details about how exactly it works:
int kill(pid_t pid, int sig);
...
If pid is less than -1, then sig is sent to every process in
the process group whose ID is -pid.
However, in which order will the signal be sent to the processes that form the group? Imagine the following situation: a pipe is set up between the master and slave processes in the group. If the slave is killed while kill(-pid) is being processed and the master is still alive, the master might report this as an internal failure (upon receiving notification that the child is dead). However, I want all processes to understand that such termination was caused by something external to their process group.
How can I avoid this confusion? Should I be doing something more than a mere kill(-pid, SIGTERM)? Or is it resolved by underlying properties of the OS of which I'm not aware?
Note that I can't modify the code of the processes in the group!
Comments (4)
Try doing it as a three-step process: send SIGSTOP, then SIGTERM, then SIGCONT to the process group.
The first SIGSTOP should put all the processes into a stopped state. They cannot catch this signal, so this should stop the entire process group.
The SIGTERM will be queued for each process, but I don't believe it will be delivered, since the processes are stopped (this is from memory, and I can't currently find a reference, but I believe it is true).
The SIGCONT will start the processes again, allowing the SIGTERM to be delivered. If the slave gets the SIGCONT first, the master may still be stopped, so it will not notice the slave going away. When the master gets the SIGCONT, it will be followed by the SIGTERM, terminating it.
I don't know if this will actually work, and the outcome may depend on the implementation and on when all the signals are actually delivered (including the SIGCHLD to the master process), but it may be worth a try.
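The three steps above can be sketched in C as follows. This is a minimal sketch, not a tested solution to the ordering problem: the helper names stop_term_cont and demo_stop_term_cont are hypothetical, and the demo uses a single child in its own process group. As far as I know, POSIX does specify that signals other than SIGKILL are not delivered to a stopped process until it is continued, which is what the sequence relies on; the scheduler can still run the slave before the master after SIGCONT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stop, terminate, then resume every process in the
 * group pgid. Returns 0 on success, -1 if any kill() call fails. */
int stop_term_cont(pid_t pgid)
{
    assert(pgid > 1);                      /* -pgid must be less than -1 */
    if (kill(-pgid, SIGSTOP) == -1)
        return -1;
    int rc = kill(-pgid, SIGTERM);         /* stays pending while stopped */
    if (kill(-pgid, SIGCONT) == -1)        /* always try to resume the group */
        rc = -1;
    return rc;
}

/* Demo: put one child into its own process group, run the sequence, and
 * return the signal that terminated the child (or -1 on error). */
int demo_stop_term_cont(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        setpgid(0, 0);                     /* child becomes group leader */
        for (;;)
            pause();
    }
    setpgid(child, child);                 /* also set in the parent to avoid a race */
    if (stop_term_cont(child) == -1)
        return -1;
    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling demo_stop_term_cont() from a main() should report SIGTERM as the terminating signal; it does not prove anything about the order in which several processes in one group die.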
My understanding is that you cannot rely on any specific order of signal delivery.
You could avoid the issue if you send the TERM signal to the master process only, and then have the master kill its children.
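A rough sketch of what "have the master kill its children" could look like is given below. It assumes the master's code can be changed (or wrapped), which the question rules out, so it only illustrates the idea. All names (run_master, demo_term_master_only, ready_fd) are hypothetical; the pipe is only there so the demo does not send SIGTERM before the handler is installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_term = 0;

static void on_term(int sig) { (void)sig; got_term = 1; }

/* Hypothetical master: starts one worker, waits for SIGTERM, forwards it to
 * the worker, reaps the worker, and only then exits. Never returns. */
static void run_master(int ready_fd)
{
    assert(ready_fd >= 0);
    pid_t worker = fork();                 /* fork before blocking SIGTERM */
    if (worker == -1)
        _exit(2);
    if (worker == 0) {
        close(ready_fd);
        for (;;)
            pause();                       /* worker keeps default SIGTERM action */
    }

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    sigprocmask(SIG_BLOCK, &block, &old);  /* avoid the check-then-wait race */

    struct sigaction sa;
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGTERM, &sa, NULL);

    if (write(ready_fd, "r", 1) != 1)      /* tell the demo we are ready */
        _exit(2);
    while (!got_term)
        sigsuspend(&old);                  /* atomically unblock and wait */

    kill(worker, SIGTERM);                 /* the master knows the cause is external */
    int status;
    if (waitpid(worker, &status, 0) != worker)
        _exit(2);
    _exit(WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM ? 0 : 1);
}

/* Demo: send SIGTERM to the master only. Returns the master's exit status
 * (0 if the worker was terminated by the forwarded SIGTERM), -1 on error. */
int demo_term_master_only(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t master = fork();
    if (master == -1)
        return -1;
    if (master == 0) {
        close(fds[0]);
        run_master(fds[1]);
        _exit(2);                          /* not reached */
    }
    close(fds[1]);
    char c;
    if (read(fds[0], &c, 1) != 1)
        return -1;
    close(fds[0]);
    if (kill(master, SIGTERM) == -1)
        return -1;
    int status;
    if (waitpid(master, &status, 0) != master)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the master itself sends the signal to the worker, it never sees the worker's death as an unexplained failure.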
Even if every variety of UNIX promised to deliver the signals in a particular order, the scheduler might still decide to run the critical child process code before the parent code.
Even your STOP/TERM/CONT sequence will be vulnerable to this.
I'm afraid you may need something more complicated. Perhaps the child process could catch the SIGTERM and then loop until its parent exits before it exits itself? Be sure to add a timeout if you do this.
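The "catch SIGTERM, then wait for the parent to exit, with a timeout" idea might look roughly like this. It is only a sketch and assumes the child's code can be modified; install_term_handler and wait_for_parent_exit are hypothetical names, and the 10 ms polling step is an arbitrary choice. The child is assumed to record its parent with getppid() at startup; a change in getppid() later means the parent has exited and the child was reparented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t term_requested = 0;

static void on_term(int sig) { (void)sig; term_requested = 1; }

/* Install a SIGTERM handler that only records the request.
 * Returns 0 on success, -1 on failure. */
int install_term_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Poll until the process is no longer a child of `parent` (the value of
 * getppid() recorded at startup) or until roughly timeout_ms has passed.
 * Returns 0 if the parent is gone, -1 on timeout. */
int wait_for_parent_exit(pid_t parent, int timeout_ms)
{
    assert(timeout_ms >= 0);
    const int step_ms = 10;                /* arbitrary polling interval */
    for (int waited = 0; ; waited += step_ms) {
        if (getppid() != parent)
            return 0;                      /* reparented: original parent exited */
        if (waited >= timeout_ms)
            return -1;                     /* give up and exit anyway */
        struct timespec ts = { 0, step_ms * 1000000L };
        nanosleep(&ts, NULL);
    }
}
```

In the child's main loop, once term_requested is set, it would call wait_for_parent_exit(parent, some_timeout) and then exit regardless of the result, so a stuck parent cannot keep the child alive forever.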
Untested: use shared memory and put in some kind of "we're dying" semaphore, which can be checked before I/O errors are treated as real errors. Use mmap() with MAP_ANONYMOUS|MAP_SHARED and make sure the mapping survives your way of fork()ing processes. Oh, and be sure to use the volatile keyword, or your semaphore will be optimized away.
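A minimal sketch of such a shared flag is shown below, assuming the processes can be modified to check it. The names create_dying_flag, io_error_is_real and demo_dying_flag are hypothetical. Note that a MAP_SHARED mapping is inherited across fork() but not across exec, and that volatile only prevents the compiler from caching the value; it is not a full synchronization primitive, so C11 atomics would be the stricter choice.

```c
#define _DEFAULT_SOURCE                    /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate a zero-initialised flag in anonymous shared memory. Call this
 * before fork() so parent and children share it. Returns NULL on failure. */
volatile sig_atomic_t *create_dying_flag(void)
{
    void *p = mmap(NULL, sizeof(sig_atomic_t), PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    return p == MAP_FAILED ? NULL : (volatile sig_atomic_t *)p;
}

/* Returns 1 if an I/O error should be treated as a real failure, 0 if the
 * group is already shutting down and the error is expected. */
int io_error_is_real(const volatile sig_atomic_t *dying)
{
    assert(dying != NULL);
    return *dying == 0;
}

/* Demo: a child sets the flag, the parent sees it after reaping the child.
 * Returns the flag value seen by the parent, or -1 on error. */
int demo_dying_flag(void)
{
    volatile sig_atomic_t *dying = create_dying_flag();
    if (dying == NULL)
        return -1;
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        *dying = 1;                        /* e.g. set from a SIGTERM handler */
        _exit(0);
    }
    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    int seen = (int)*dying;
    munmap((void *)dying, sizeof(sig_atomic_t));
    return seen;
}
```

In the pipe scenario from the question, the master would call io_error_is_real() before reporting a broken pipe or a dead child as an internal failure.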