Maintaining a long-running task on Linux

Posted on 2024-10-31 16:23:50

My system includes a task which opens a network socket, receives pushed data from the network, processes it, and writes it out to disk or pings other machines depending on the messages. This task is intended to run forever, and the service is designed to have this task always running. But sometimes it crashes.

What's the best practice for keeping a task like this alive? Assume it's okay for the task to be dead for up to 30 seconds before we restart it.

Some obvious ideas include having a watchdog process that checks that the task is still running. The watchdog could be triggered by cron. But how does it know whether the process is alive? Write a pidfile? Touch a heartbeat file? An ideal solution wouldn't keep spinning up more processes if the machine gets so bogged down that the watchdog runs faster than the heartbeat.

Are there standard Linux tools for this? I can imagine a solution that uses a message queue, but I'm not sure whether that's a good idea.

Comments (4)

向日葵 2024-11-07 16:23:50

Depending on the nature of the task that you wish to monitor, one method is to write a simple wrapper to start up your task in a fork().

The wrapper task can then do a waitpid() on the child and restart it if it is terminated.

This does depend on modifying the source for the task that you wish to run.
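A minimal sketch of such a wrapper, written in Python rather than C. It assumes the task can be started as a separate program (so the child calls exec instead of running code from the same source), and the names `run_once`, `supervise`, `restart_delay` and `max_restarts` are made up for illustration. `os.waitstatus_to_exitcode` needs Python 3.9 or later.

```python
import os
import sys
import time


def run_once(argv):
    """Fork, exec argv in the child, and return the child's raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the task.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; exit without running parent cleanup.
            os._exit(127)
    # Parent: block until the child terminates.
    _, status = os.waitpid(pid, 0)
    return status


def supervise(argv, restart_delay=1.0, max_restarts=None):
    """Run argv again each time it exits; return the number of runs performed."""
    runs = 0
    while max_restarts is None or runs < max_restarts:
        status = run_once(argv)
        runs += 1
        code = os.waitstatus_to_exitcode(status)
        print(f"child exited with {code}, restarting", file=sys.stderr)
        # A fixed pause keeps a task that dies immediately from spinning the CPU.
        time.sleep(restart_delay)
    return runs
```

Calling `supervise(["/bin/my-network-task"])` (a placeholder path) would loop forever, which stays well inside the 30-second tolerance from the question as long as `restart_delay` is small. Note that a wrapper like this only notices a child that exits, not one that hangs.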

瑕疵 2024-11-07 16:23:50

sysvinit will restart processes that die, if added to inittab.

If you're worried about the process freezing without crashing and ending the process, you can use a heartbeat and hard kill the active instance, letting init restart it.
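The heartbeat check could look roughly like the sketch below. It assumes the task touches a heartbeat file on every loop iteration (for example with `os.utime(path)`) and that the watchdog already knows the task's pid, say from a pidfile. The function names and the 30-second default are illustrative; the default simply mirrors the tolerance stated in the question.

```python
import os
import signal
import time


def heartbeat_age(path, now=None):
    """Return the number of seconds since the heartbeat file was last touched."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime


def kill_if_stale(heartbeat_path, pid, max_age=30.0, now=None):
    """Send SIGKILL to pid if its heartbeat is older than max_age.

    Returns True only if a signal was actually sent.
    """
    try:
        age = heartbeat_age(heartbeat_path, now)
    except FileNotFoundError:
        # No heartbeat written yet: do not punish a task that is still starting.
        return False
    if age <= max_age:
        return False
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        # The process is already gone; init will respawn it on its own.
        return False
    return True
```

Because the watchdog only kills and never starts anything, restarting stays the job of init, so a slow machine cannot end up with several copies of the task.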

最冷一天 2024-11-07 16:23:50

You could use monit along with daemonize. There are lots of tools for this in the *nix world.

沒落の蓅哖 2024-11-07 16:23:50

Supervisor was designed precisely for this task. From the project website:

Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.

It runs as a daemon (supervisord) controlled by a command line tool, supervisorctl. The configuration file contains a list of programs it is supposed to monitor, among other settings.

The number of options is quite extensive; have a look at the docs for a complete list. In your case, the relevant configuration section might be something like this:

[program:my-network-task]
command=/bin/my-network-task   ; where your binary lives
autostart=true                 ; start when supervisor starts?
autorestart=true               ; restart automatically when stopped?
startsecs=10                   ; consider start successful after how many secs?
startretries=3                 ; try starting how many times?

I have used Supervisor myself and it worked really well once everything was set up. It requires Python, which should not be a big deal in most environments but might be.
