System architecture: a simple approach to background tasks behind a web app - will it work?
I have a Django web application and some tasks that should run (or rather: be initiated) in the background.
The application is deployed as follows:
- apache2-mpm-worker;
- mod_wsgi in daemon mode (1 process, 15 threads).
The background tasks have the following characteristics:
- they need to operate in a regular interval (every 5 minutes or so);
- they require the application context (i.e. the application packages need to be available in memory);
- they do not need any input other than database access, in order to perform some not-so-heavy tasks such as sending out e-mail and updating the state of the database.
Now I was thinking that the simplest approach to this problem would be to piggyback on the existing application process (as spawned by mod_wsgi). By implementing the tasks as part of the application and providing an HTTP interface for them, I would avoid the overhead of another process that holds the entire application in memory. A simple cronjob can be set up that sends a request to this HTTP interface every 5 minutes, and that would be it. Since the application process provides 15 threads and the tasks are quite lightweight and only run every 5 minutes, I figure they would not hinder the performance of the web application's user-facing operations.
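To make the piggyback idea concrete, here is a minimal sketch of the HTTP trigger using only the standard library's WSGI machinery (the real app would be a Django view; `run_periodic_tasks` and the URL path are assumptions for illustration). A non-blocking lock guards against overlapping cron hits:

```python
import threading
from wsgiref.util import setup_testing_defaults  # used only for local testing

# Hypothetical task runner; in the real app this would send e-mails
# and update database state via the ORM.
def run_periodic_tasks():
    return ["sent reminder e-mails", "expired stale sessions"]

_task_lock = threading.Lock()  # guard against overlapping cron requests

def application(environ, start_response):
    """Minimal WSGI app: cron requests /run-tasks/ every 5 minutes."""
    if environ.get("PATH_INFO") == "/run-tasks/":
        # Non-blocking acquire: skip this run if the previous one is still going.
        if _task_lock.acquire(blocking=False):
            try:
                done = run_periodic_tasks()
            finally:
                _task_lock.release()
            start_response("200 OK", [("Content-Type", "text/plain")])
            return ["\n".join(done).encode()]
        start_response("409 Conflict", [("Content-Type", "text/plain")])
        return [b"previous run still in progress"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

The matching crontab entry would be something like `*/5 * * * * curl -s http://localhost/run-tasks/ > /dev/null`. In a real deployment you would also want to restrict the endpoint to localhost or require a shared secret, since otherwise anyone can trigger the tasks.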
Yet... I have done some online research and I have seen nobody advocating this approach. Many articles suggest a significantly more complex approach based on a full-blown messaging component (such as Celery, which uses RabbitMQ). Although that's sexy, it sounds like overkill to me. Some articles suggest setting up a cronjob that executes a script which performs the tasks. But that doesn't feel very attractive either, as it means creating a new process that loads the entire application into memory, performs some tiny task, and then destroys the process again, repeated every 5 minutes. It does not sound like an elegant solution.
So, I'm looking for some feedback on the approach I suggested above. Is my reasoning correct? Am I overlooking (potential) problems? And what about my assumption that the application's performance will not be impeded?
2 Answers
Note that Celery works without RabbitMQ as well. It can use a "ghetto queue" (SQLite, MySQL, Postgres, etc., as well as Redis and MongoDB), which is useful in testing or for simple setups where RabbitMQ seems like overkill.
See http://ask.github.com/celery/tutorials/otherqueues.html
(Using Celery with Redis/Database as the messaging queue.)
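As a sketch of what that looks like: in current Celery versions the broker is selected with a single URL setting, so swapping RabbitMQ for Redis or a database is a one-line change (the setting names below follow the modern lowercase Celery config; the SQLAlchemy transport URL is an assumption based on the kombu transport naming):

```python
# Celery configuration sketch: pick a broker without RabbitMQ.

# Redis as the message broker:
broker_url = "redis://localhost:6379/0"

# ...or the database-backed ("ghetto queue") transport via SQLAlchemy,
# e.g. a local SQLite file (fine for testing, not for heavy production use):
# broker_url = "sqla+sqlite:///tasks.db"

# Store task results in the same Redis instance (optional).
result_backend = "redis://localhost:6379/1"
```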
All are reasonable approaches depending on your specific requirements.
Another is to fire up a background thread within the process when the WSGI script is loaded. This background thread could simply sleep and wake up occasionally to perform required work and then go back to sleep.
This method does require, though, that you have at most one Django process in which the background thread runs, to avoid different processes doing the same work on the database etc.
Using daemon mode with a single process, as you are, would satisfy that criterion. There are potentially other ways you could achieve that even in a multi-process configuration.
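The background-thread approach can be sketched like this (a minimal illustration; the function names are made up, and the real `do_work` would open its own database connection):

```python
import threading
import time

def _worker(interval_seconds, stop_event, do_work):
    """Sleep, wake up every interval to do the work, then sleep again."""
    # Event.wait() returns False on timeout (time to work) and True once
    # the event is set (time to shut down), so it doubles as the sleep.
    while not stop_event.wait(interval_seconds):
        do_work()

def start_background_worker(do_work, interval_seconds=300):
    """Call this once from the WSGI script file when it is loaded."""
    stop = threading.Event()
    thread = threading.Thread(
        target=_worker,
        args=(interval_seconds, stop, do_work),
        daemon=True,  # never block process shutdown
    )
    thread.start()
    return stop  # set() this event to stop the loop cleanly
```

One caveat with this design: mod_wsgi may not load the WSGI script until the first request arrives, so with this approach you typically also want the daemon process configured to preload the application on startup; otherwise the thread only starts after the first visitor hits the site.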