Task scheduling with complex dependencies
I'm looking for a way of scheduling tasks where a task starts once several previous tasks have completed.
I have several hundred "collector" processes which collect data from a variety of sources and dump it to a database. Once these have finished collecting (anywhere from 1 second to a few minutes) I want to immediately kick off a bunch of "data-processing" processes to analyse and make sense of the data in the database. When all of these have finished I want a final task to start and send me an email of the summary data.
I'm currently using a Gearman queue and starting the data-processing tasks on timers set for when I expect the "collector" processes to have completed, but this means the processing step starts after 10 minutes even if the collector processes finished after 3 (or worse, have not finished yet).
Ideally I'd be able to specify rules like "start process X when process A and (B or C) complete", or "start process Y when 95% of the specified processes have completed or 10 minutes have elapsed".
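To make the two example rules concrete, here is a minimal sketch in Python's asyncio rather than Gearman. The names `wait_for_quorum`, `a_and_b_or_c`, `job` and `demo` are hypothetical, and the short sleeps stand in for real collector work; it illustrates the rule logic only, not a scheduler.

```python
import asyncio
import math


async def wait_for_quorum(tasks, fraction, timeout):
    """Return the finished tasks once `fraction` of them are done or `timeout` seconds pass."""
    needed = math.ceil(len(tasks) * fraction)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    done, pending = set(), set(tasks)
    while len(done) < needed and pending:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # time limit reached before the quorum was met
        finished, pending = await asyncio.wait(
            pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
        )
        done |= finished
    return done


async def a_and_b_or_c(a, b, c):
    """Return once task `a` and at least one of tasks `b`, `c` have finished."""
    b_or_c = asyncio.wait({b, c}, return_when=asyncio.FIRST_COMPLETED)
    await asyncio.gather(a, b_or_c)


async def job(name, seconds):
    # Hypothetical stand-in for a collector process.
    await asyncio.sleep(seconds)
    return name


async def demo():
    a = asyncio.create_task(job("A", 0.01))
    b = asyncio.create_task(job("B", 0.02))
    c = asyncio.create_task(job("C", 5))
    await a_and_b_or_c(a, b, c)  # rule: A and (B or C)
    started_without_c = not c.done()  # X may start although C is still running
    c.cancel()

    # Three fast collectors and one slow one; half of them is the quorum here.
    collectors = [
        asyncio.create_task(job(f"collector-{i}", 0.01 if i < 3 else 5))
        for i in range(4)
    ]
    done = await wait_for_quorum(collectors, fraction=0.5, timeout=1.0)
    for task in collectors:
        task.cancel()  # no effect on tasks that already finished
    return started_without_c, len(done)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

For the rule in the question, the call would be `wait_for_quorum(tasks, fraction=0.95, timeout=600)`; the caller can compare the size of the returned set with the task count to tell whether the quorum or the time limit ended the wait.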
The processes and dependencies need to be created automatically, because each run uses different parameters (i.e. I'm not doing an identical calculation each time).
I could write some kind of dependency-graph framework myself using queues and monitors, but it seems like the sort of thing that must have already been solved, and I'm looking for anyone who has used something like what I describe.
Comments (2)
Why not let worker X launch subworkers A, B and C and wait for them to complete before proceeding? You can have a process X that is both a Gearman worker and a client at the same time.
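The shape of that suggestion can be sketched as follows. This is not Gearman code: a thread pool from Python's standard library stands in for the job server, and `subworker` and `worker_x` are hypothetical names. The point is only that X submits A, B and C, blocks until all three are done, and then carries on.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def subworker(name):
    # Hypothetical stand-in for a job that X would submit as a client.
    return f"{name} done"


def worker_x(pool):
    """Act as a client first (submit A, B, C), then continue as a worker."""
    futures = [pool.submit(subworker, name) for name in ("A", "B", "C")]
    wait(futures)  # blocks until every subworker has finished
    results = [future.result() for future in futures]
    return "X processed: " + ", ".join(results)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=3) as pool:
        print(worker_x(pool))
```

One thing to watch with this pattern: X occupies a worker slot while it waits, so there must be enough free workers left to run A, B and C, otherwise everything stalls.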
You have some very peculiar conditions:
At first I thought your processes were simply asynchronous. In that case you could use something called deferreds and promises. I use these a lot in JavaScript when dealing with Ajax calls for data. With them you're basically configuring a dependency graph.
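As a rough illustration of "configuring a dependency graph" with promise-like objects, here is a sketch using Python asyncio tasks in place of JavaScript promises. `run_graph`, `demo` and the node names are hypothetical, and the graph is assumed to be acyclic (a cycle would wait forever).

```python
import asyncio


async def run_graph(graph, action):
    """Run `action(name)` for every node once all of its dependencies have finished.

    `graph` maps a node name to the list of node names it depends on.
    Returns the order in which nodes completed.
    """
    tasks = {}
    order = []

    async def run(name):
        # Each task awaits the tasks of its dependencies, like chained promises.
        await asyncio.gather(*(tasks[dep] for dep in graph[name]))
        await action(name)
        order.append(name)

    # Create every task before any of them runs, so lookups in `tasks` succeed.
    for name in graph:
        tasks[name] = asyncio.create_task(run(name))
    await asyncio.gather(*tasks.values())
    return order


async def demo():
    graph = {
        "collect_a": [],
        "collect_b": [],
        "process": ["collect_a", "collect_b"],
        "email": ["process"],
    }

    async def action(name):
        await asyncio.sleep(0.01)  # placeholder for the real work

    return await run_graph(graph, action)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the graph is an ordinary dictionary, it can be built at run time from whatever parameters a given run uses.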
But your case is even more complex. Apparently you need an 'or', progress monitoring and timers.
This is all very un-PHP-like. PHP has very poor cron job support, no support for asynchronous tasks, and no timers. Why are you doing this in PHP?