System architecture: a simple approach to setting up background tasks behind a web application - will it work?

Posted on 2024-08-29 10:52:33

I have a Django web application and I have some tasks that should operate (or actually: be initiated) in the background.

The application is deployed as follows:

apache2-mpm-worker;
mod_wsgi in daemon mode (1 process, 15 threads).

The background tasks have the following characteristics:

they need to operate in a regular interval (every 5 minutes or so);
they require the application context (i.e. the application packages need to be available in memory);
they do not need any input other than database access, in order to perform some not-so-heavy tasks such as sending out e-mail and updating the state of the database.

Now I was thinking that the simplest approach to this problem would be to piggyback on the existing application process (as spawned by mod_wsgi). By implementing the task as part of the application and providing an HTTP interface for it, I would avoid the overhead of another process that holds the entire application in memory. A simple cronjob can be set up that sends a request to this HTTP interface every 5 minutes, and that would be it. Since the application process provides 15 threads and the tasks are quite lightweight and only run every 5 minutes, I figure they would not hinder the performance of the web application's user-facing operations.

Yet... I have done some online research and I have seen nobody advocating this approach. Many articles suggest a significantly more complex approach based on a full-blown messaging component (such as Celery, which uses RabbitMQ). Although that's sexy, it sounds like overkill to me. Some articles suggest setting up a cronjob that executes a script which performs the tasks. But that doesn't feel very attractive either, as it means creating a new process that loads the entire application into memory, performs some tiny task, and is then destroyed again. And this is repeated every 5 minutes. That does not sound like an elegant solution.

So, I'm looking for some feedback on my suggested approach, as described two paragraphs above. Is my reasoning correct? Am I overlooking (potential) problems? What about my assumption that the application's performance will not be impeded?
