Task queue processing in Python
The task: I have a task queue stored in a database, and it keeps growing. I need to work through the tasks with a Python script whenever I have the resources for it. I see two ways:

1. A Python script that runs all the time. But I don't like that (a possible source of memory leaks).
2. A Python script called by cron that works off a small batch of tasks per run. But then I need to make sure only one active script is running at a time (to keep the count of active scripts from growing).

What is the best way to implement this in Python? Any ideas on how to solve this problem?
Comments (3)
You can use a lockfile to prevent multiple copies of the script from running out of cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is just good practice for anything that you need to make sure won't have multiple instances running, so you should look into it even if you do keep the script running constantly, which is what I'd suggest.
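As a concrete illustration, here is a minimal lockfile sketch, assuming a Unix host (the fcntl module is POSIX-only) and a hypothetical /tmp/task_worker.lock path:

```python
# Minimal sketch of a lockfile guard; the lock path is an assumption.
import fcntl
import os
import sys

def acquire_lock(path="/tmp/task_worker.lock"):
    """Return an open, exclusively locked file, or exit if another
    instance already holds the lock."""
    fd = open(path, "w")
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit(0)  # another instance is running; let it finish
    fd.write(str(os.getpid()))  # record the PID, purely for debugging
    fd.flush()
    return fd  # keep a reference so the lock lives as long as the process

if __name__ == "__main__":
    lock = acquire_lock()
    # ... process a batch of tasks from the queue here ...
```

The lock is tied to the open file descriptor, so it is released automatically when the process exits, even if it crashes.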
For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.
When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.
This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
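Here is a minimal sketch of that fork-per-task loop; fetch_next_task() and run_task() are hypothetical placeholders for your own DB access and task logic:

```python
# Sketch of a small main loop that forks a child process per task.
import time
from multiprocessing import Process

def fetch_next_task():
    """Pop the next pending task from the database, or return None."""
    ...  # hypothetical: e.g. SELECT the next row and mark it as taken

def run_task(task):
    """Executed in a child process; all its memory is freed on exit."""
    ...  # do the actual work here

def main_loop(max_workers=4, poll_interval=5):
    workers = []
    while True:
        workers = [p for p in workers if p.is_alive()]  # reap finished children
        task = fetch_next_task() if len(workers) < max_workers else None
        if task is None:
            time.sleep(poll_interval)  # nothing to do, or all slots busy
            continue
        p = Process(target=run_task, args=(task,))
        p.start()
        workers.append(p)

if __name__ == "__main__":
    main_loop()
```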
This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python, because of the automatic garbage collection. cron-ing a Python script to handle the queue isn't very nice, although it would work fine. I would use method 1; if you need more power, you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.
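One way to sketch that monitor-plus-workers idea is with subprocess, so each task gets a fresh interpreter; worker.py and pending_task_ids() below are hypothetical stand-ins for your own task runner and DB query:

```python
# Minimal sketch of a monitor process, assuming a hypothetical
# worker.py that takes a task id as its only command-line argument.
import subprocess
import sys
import time

def pending_task_ids():
    """Query the DB for ids of tasks that still need processing."""
    ...  # hypothetical: return a list such as [17, 18]

def monitor(poll_interval=10):
    while True:
        for task_id in pending_task_ids() or []:
            # Each task runs in a fresh interpreter, so whatever memory
            # it uses is returned to the OS when the subprocess exits.
            subprocess.run([sys.executable, "worker.py", str(task_id)])
        time.sleep(poll_interval)

if __name__ == "__main__":
    monitor()
```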
I'd suggest using Celery, an asynchronous task queuing system which I use myself.
It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.
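For a sense of scale, here is a minimal Celery sketch, assuming a Redis broker at a placeholder URL (Celery supports several brokers):

```python
# tasks.py -- a minimal Celery setup; the broker URL is an assumption,
# substitute whatever broker you actually run.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def process_task(task_id):
    """Workers pull this off the broker whenever they have capacity."""
    ...  # do the actual work for task_id
```

You enqueue work with process_task.delay(42) and start workers with `celery -A tasks worker --concurrency=4`; the --max-tasks-per-child option even recycles worker processes after a set number of tasks, which directly addresses the memory-leak worry.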