Designing a monitor process to monitor and restart processes
I am designing a monitor process. Its job is to monitor a configured set of processes. When the monitor detects that one of them has gone down, it needs to restart that process.
I am developing the code for my Linux system. Here is how I developed a small prototype:
- Fed in the details (path, arguments) of the various processes that need to be monitored.
- The monitor process did the following:
1. Installed a signal handler for SIGCHLD
2. Called fork and execv to start each process to be monitored, and stored the pid of each child process.
3. When a child went down, the parent received a SIGCHLD.
4. The signal handler will now be called. The handler will run a for loop on the list of pids stored earlier. For each pid, it will check the /proc filesystem for existence of a directory corresponding to the pid. If the directory doesn't exist, the process is restarted.
Now, my questions are these:
- Is the above method (checking the /proc filesystem) a standard or recommended mechanism for checking whether a process is running, or should I do something like creating a pipe for the ps command and looping through the output of ps?
- Is there a better way of achieving my requirement?
Regards.
Comments (4)
You should not be checking /proc to determine which process has exited: it's possible for another, unrelated process to start in the meantime and be coincidentally assigned the same PID. Instead, within your SIGCHLD handler you should call the waitpid() system call in a loop. (The loop is needed because multiple child processes may exit within a short period of time, but only one SIGCHLD may result.)
Let's see if I've understood you. You have a list of children, and in your SIGCHLD handler you loop over /proc to see which children are still alive, right?
That's not very usual, and it's a bit ugly.
What you usually do is run a
while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
loop in your SIGCHLD handler, and use the returned pid and the Wxxx macros (WIFEXITED, WIFSIGNALED and so on) to keep your list of children up to date.
Notice that wait() and waitpid() are async-signal-safe. The functions you are calling to examine /proc are probably not.
Look into supervisord. It works great.
You can easily tell if a process is alive by issuing a kill() system call to its pid with signal 0, which checks for the process without actually sending a signal. If the child no longer exists, kill() will fail (with errno set to ESRCH).
Also, calling waitpid() with the WNOHANG option will return zero immediately if the process is still alive.
IMHO, reading proc files or piping to ps is a nasty way to do it.
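Both checks can be sketched as small helpers. The names `pid_exists` and `child_is_running` are hypothetical. One caveat worth keeping in mind: kill() with signal 0 still succeeds for a child that has exited but has not yet been reaped (a zombie), so for your own children the waitpid() check is the more precise of the two.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Returns 1 if some process with this pid exists, 0 otherwise.
 * Signal 0 sends nothing; kill() only performs its usual checks.
 */
int pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    /* EPERM: the process exists but we may not signal it. */
    return errno == EPERM;
}

/*
 * Returns 1 if our child is still running, 0 if it has terminated
 * (this call reaps it), or -1 on error (e.g. pid is not our child).
 */
int child_is_running(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if (r == 0)
        return 1;
    if (r == pid)
        return 0;
    return -1;
}
```

Note that `child_is_running` only works for direct children of the calling process, and that once it has returned 0 the pid must not be checked again, since the kernel is free to reuse it.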