Solaris 10: fast detection of SIGCHLD / process exit
On Solaris 10, I have a parent and child process. I kill the child process with kill -KILL. I want the fastest possible detection of this in the parent process (this is a master/slave system and the goal is for the parent to request its backup to take over as fast as possible). The parent process needs to know that the child has started to exit (it doesn't need to wait until the child has exited).
In the system I'm working with I see a delay of about 200ms between sending the SIGKILL and the parent process receiving the SIGCHLD. I don't think I can reduce this time, simply because of the size of the child process and the time it takes to exit - correct me if I am wrong.
I think my options are:
-- Don't send SIGKILL to the child. Send a signal to the parent instead, so that it can kill the child (and therefore knows instantly that the child process is being terminated). This is not ideal because some of the "kill -KILL" commands are out of my control so I can't replace them with a different signal to the parent.
-- Hook into the termination processing on the child (I don't think this is possible because SIGKILL can't be caught).
-- Any other ideas?
Thanks for any advice.
NickB
Comments (4)
I'm not sure you're going to get much faster than the delivery of SIGCHLD. You may want to think about re-architecting the application to be a master/multi-slave one, if possible.
If you're running with one master and five slaves, then the loss of one slave will result in a 20% drop in capacity rather than total loss. And hopefully the master can get another slave up quickly enough before it's noticed.
Another possible advantage to this is to have spare slaves waiting in the wings, already started but waiting on a semaphore or other signal to start doing the real work. It's possible that this may help even if you can't run multiple slaves side-by-side since it will remove at least part of the delay (waiting for the process to load up). Simply signal a spare child to start as soon as the SIGCHLD appears.
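To make the "spare slave waiting in the wings" idea concrete, here is a minimal sketch. It assumes a pipe as the "other signal" that parks the standby (a semaphore would do the same job), and the names `run_standby_demo` and `do_real_work` are made up for illustration. The parent kills the active child itself only so the example is self-contained.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real job: report "running" to the parent. */
static void do_real_work(int report_fd)
{
    const char running = 'R';
    if (write(report_fd, &running, 1) != 1)
        _exit(1);
}

/* Returns 1 if the standby child took over after the active child was killed. */
int run_standby_demo(void)
{
    int go[2], report[2], status = 0, rc;
    char go_token = 'G', reply = 0;
    pid_t standby, active;

    rc = pipe(go);
    assert(rc == 0);
    rc = pipe(report);
    assert(rc == 0);

    standby = fork();
    assert(standby >= 0);
    if (standby == 0) {
        char token;
        close(go[1]);
        close(report[0]);
        /* Already loaded and initialised; park here until told to start. */
        if (read(go[0], &token, 1) == 1)
            do_real_work(report[1]);
        _exit(0);
    }

    active = fork();
    assert(active >= 0);
    if (active == 0) {
        close(go[0]); close(go[1]);
        close(report[0]); close(report[1]);
        for (;;)
            pause();              /* pretend to be the working slave */
    }
    close(go[0]);
    close(report[1]);

    /* Simulate the external "kill -KILL" and wait for the kernel to report it. */
    kill(active, SIGKILL);
    waitpid(active, &status, 0);
    assert(WIFSIGNALED(status));

    /* Wake the standby: one byte on the pipe is the "start working" signal. */
    if (write(go[1], &go_token, 1) != 1)
        return -1;
    if (read(report[0], &reply, 1) != 1)
        reply = 0;

    waitpid(standby, NULL, 0);
    close(go[1]);
    close(report[0]);
    return reply == 'R';
}
```

The point of the design is that the standby has already paid the cost of starting up, so the only work left after the death is detected is a one-byte write.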
This is a guess, but how is the parent process detecting the SIGCHLD? If you're using a signal handler, you might be able to gain some speed by using a dedicated signal thread.
Basically, you start a separate thread to process the signal. All threads (including the signal thread) should call pthread_sigmask() to block receipt of SIGCHLD. The signal thread then calls sigwait() with a mask including SIGCHLD. sigwait() blocks until a SIGCHLD is received and then returns.
The main advantage of using a signal thread is that you can process the signals in a main loop of some kind, without the limitations of a signal handler or having the signal interrupt something else the process may be doing. My wild guess is that it might also be cheaper for the kernel to deliver a signal to a thread using this method.
You can use a not so widely known feature of Solaris: doors. In your parent process, create a door with door_create and the DOOR_UNREF attribute. Then fork, so that you have two references to the door's descriptor. When your child process dies, the door function is called in the parent process, because the number of references to the door's descriptor drops to one.
Solaris doors are meant to be super fast, but honestly, I have never measured the delivery time in this particular case. Let me know if it works for you.
Rather than using signals to catch the child being killed, you could use waitpid() or waitid() to detect the change of state of the child process. You should be calling one of these in any case to reap the child's pid...
You can then ignore SIGCHLD, and have the added bonus of avoiding asynchronous coding.
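A minimal sketch of the blocking waitpid() approach; `wait_for_child` and `run_waitpid_demo` are names made up for this example, and the parent kills the child itself only to keep it self-contained. One assumption worth stating: "ignore SIGCHLD" is taken to mean leaving it at its default disposition, because explicitly setting it to SIG_IGN makes the system reap children on its own and waitpid() then fails with ECHILD.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Block until `pid` terminates. Returns the terminating signal number,
 * 0 if the child exited normally, or -1 on error.
 */
int wait_for_child(pid_t pid)
{
    int status = 0;
    pid_t r;

    assert(pid > 0);
    do {
        r = waitpid(pid, &status, 0);   /* synchronous: no handler needed */
    } while (r == -1 && errno == EINTR);

    if (r != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Fork a child, kill it with SIGKILL, and report what waitpid() observed. */
int run_waitpid_demo(void)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        for (;;)
            pause();            /* child idles until it is killed */

    kill(child, SIGKILL);       /* stands in for the external kill -KILL */
    return wait_for_child(child);
}
```

In a real parent the waitpid() call would typically sit in its own thread so the rest of the process is not blocked. Note that waitpid() reports the child at the same point the kernel would raise SIGCHLD, so this changes how the event is consumed rather than when it becomes visible.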
paxdiablo's suggestion of using semaphores may also actually be what you want: On startup, a child locks a semaphore. If you run two child processes, then one will run and one will be waiting on the semaphore. Once the first is killed, the second starts running.
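Here is a rough sketch of that "second child takes over when the first dies" arrangement. Note the swap: it uses an fcntl() record lock instead of a semaphore, because the kernel releases record locks when the holder terminates (including on SIGKILL), whereas a plain POSIX semaphore held by a killed process is not posted automatically. The lock file path and the names `take_lock`, `worker` and `run_lock_demo` are assumptions for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block until this process holds an exclusive lock on the whole file. */
static int take_lock(int fd)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    return fcntl(fd, F_SETLKW, &fl);
}

/* Worker: wait for the lock, announce itself on report_fd, then "work". */
static void worker(int lock_fd, int report_fd, char name)
{
    if (take_lock(lock_fd) != 0)
        _exit(1);
    if (write(report_fd, &name, 1) != 1)
        _exit(1);
    for (;;)
        pause();
}

/* Returns 1 if worker B took over the lock after worker A was killed. */
int run_lock_demo(void)
{
    char path[] = "/tmp/worker-lock-XXXXXX";   /* hypothetical lock file */
    char first = 0, second = 0;
    int report[2], lock_fd, rc;
    pid_t a, b;

    lock_fd = mkstemp(path);
    assert(lock_fd >= 0);
    unlink(path);                 /* the open descriptor keeps the file usable */
    rc = pipe(report);
    assert(rc == 0);

    a = fork();
    assert(a >= 0);
    if (a == 0)
        worker(lock_fd, report[1], 'A');
    if (read(report[0], &first, 1) != 1)    /* A now holds the lock */
        first = 0;

    b = fork();
    assert(b >= 0);
    if (b == 0)
        worker(lock_fd, report[1], 'B');    /* blocks while A is alive */

    kill(a, SIGKILL);             /* stands in for the external kill -KILL */
    waitpid(a, NULL, 0);
    if (read(report[0], &second, 1) != 1)   /* B reports once it owns the lock */
        second = 0;

    kill(b, SIGKILL);             /* clean up the demo */
    waitpid(b, NULL, 0);
    close(report[0]);
    close(report[1]);
    close(lock_fd);
    return first == 'A' && second == 'B';
}
```

The lock is dropped as part of the dying process's exit processing, so this hands over work without the parent being involved, but it should not be expected to fire earlier than the exit itself.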