Multiprocessing in Python
My question might be quite vague. I am looking for a reasonable approach to perform my task.
I have developed a webpage where the user uploads a file that is used as the input file for a program I developed in Python.
When the input file is submitted through the webpage, it is saved in a temporary folder, from where a daemon copies it to another location. What I want is to check that folder for files regularly (I can write a daemon for this). If it finds more than one file, it should run the code as separate jobs, one per input file found in the directory, limited to a maximum of 5 processes running at the same time; when one process finishes, the next one should start if there are files left in the folder, in chronological order.
I am aware of multiprocessing in Python, but I don't know how to implement it to achieve what I want, or whether I should go for something like XGrid to manage my jobs. The code usually takes a few hours to a few days to finish one job, but the jobs are independent of each other.
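For reference, a minimal sketch of such a polling daemon, assuming a hypothetical run_job function standing in for the actual code and a hypothetical watched folder ./incoming; multiprocessing.Pool caps concurrency at 5 and queues the remaining files in the order they are submitted:

```python
import os
import time
from multiprocessing import Pool

MAX_WORKERS = 5           # at most 5 jobs running at the same time
WATCH_DIR = "./incoming"  # hypothetical folder the daemon copies files into
POLL_SECONDS = 60         # how often to rescan the folder

def run_job(path):
    # placeholder for the actual code, which takes hours to days per job
    print("processing", path)

def main():
    seen = set()
    with Pool(processes=MAX_WORKERS) as pool:
        while True:
            # list the files oldest first, so jobs start in chronological order
            files = sorted(
                (os.path.join(WATCH_DIR, f) for f in os.listdir(WATCH_DIR)),
                key=os.path.getmtime,
            )
            for path in files:
                if path not in seen:
                    seen.add(path)
                    # the pool runs at most MAX_WORKERS jobs at once and
                    # queues the rest until a process frees up
                    pool.apply_async(run_job, (path,))
            time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```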
Comments (1)
I use a SQL table to perform such a thing, because my users may launch dozens of tasks all at once if I do not limit them.
When a new file shows up, a daemon writes its name into the table (along with all sorts of other information such as size, date, time, user, ...).
Another daemon then reads the table, gets the first task not yet executed, runs it, and marks it as executed. When it finds nothing to do, it just waits for another minute or so.
The table is also a log of the jobs performed and may carry results too, and you can get averages from it.
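Not the answerer's actual code, but a minimal sketch of the scheme described above using SQLite; the jobs.db file name, the table columns, and the run_job placeholder are illustrative assumptions:

```python
import sqlite3
import time

DB = "jobs.db"  # hypothetical SQLite file; any SQL database works the same way

def run_job(filename):
    # placeholder for the actual long-running code; returns something to log
    return "ok"

def enqueue(filename, size, user):
    # daemon 1: when a new file shows up, write its name and other
    # information (size, date/time, user, ...) into the table
    with sqlite3.connect(DB) as con:
        con.execute("""CREATE TABLE IF NOT EXISTS jobs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            filename TEXT, size INTEGER, user TEXT,
            submitted TEXT DEFAULT (datetime('now')),
            status TEXT DEFAULT 'pending',
            result TEXT)""")
        con.execute("INSERT INTO jobs (filename, size, user) VALUES (?, ?, ?)",
                    (filename, size, user))

def worker_loop():
    # daemon 2: take the oldest not-yet-executed job, run it, mark it done
    while True:
        with sqlite3.connect(DB) as con:
            row = con.execute("SELECT id, filename FROM jobs "
                              "WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
            if row is None:
                time.sleep(60)  # nothing to do: wait a minute or so
                continue
            job_id, filename = row
            con.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
        result = run_job(filename)
        with sqlite3.connect(DB) as con:
            con.execute("UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
                        (result, job_id))
```

A single worker loop runs one job at a time, which is itself a form of limiting; to allow up to five concurrent jobs you could start five such workers, provided each one claims a row (marks it 'running') inside a transaction so that two workers never pick the same job.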