Database-backed work queue
My situation ...
I have a set of workers that are scheduled to run periodically, each at different intervals, and would like to find a good implementation to manage their execution.
Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and its configuration in a MySQL table. But it seems like a really bad idea to poll the table (every second?) to see which jobs are ready to be put into the execution pipeline.
All of my workers are written in JavaScript, so I'm using node.js for execution and beanstalkd as a pipeline.
If new jobs (i.e. scheduling a worker to run at a given time) are being created asynchronously, and I need to store the job result and configuration persistently, how do I avoid polling a table?
Thanks!
Comments (2)
I agree that it seems inelegant, but given the way that computers work, something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:
Poll the database table. This isn't a bad idea at all - it's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing - give it a try and you'll notice that your system doesn't even feel it.
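For a sense of scale, a minimal once-a-second poller might look like the sketch below; the `jobs` table columns and the `enqueue()` helper that hands work to beanstalkd are illustrative assumptions, not something from the question:

```js
// Once-a-second poller (sketch). Assumes a `jobs` table with id,
// run_at, and finished columns, and an enqueue() helper that puts the
// job id into the beanstalkd pipeline - both are illustrative.
var mysql = require('mysql');

var db = mysql.createConnection({
  host: 'localhost',
  user: 'worker',
  database: 'scheduler'
});

setInterval(function pollJobs() {
  db.query(
    'SELECT id FROM jobs WHERE run_at <= NOW() AND finished = 0',
    function (err, rows) {
      if (err) return console.error('poll failed:', err);
      rows.forEach(function (row) {
        // A real version would also mark the row as claimed here so it
        // isn't re-enqueued on the next tick.
        enqueue(row.id);
      });
    }
  );
}, 1000);
```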
Some ideas to help you scale this to possibly hundreds of queries per second, or just to keep system resource requirements down: keep a second, smaller table that holds only the jobs due to run in the near future, so the frequent poll scans just that small table rather than the full jobs table.
If you have to scale even further, keep the main jobs table in the database, and take that second, smaller table and put it in RAM: either as a memory table in the DB engine, or as a queue of some kind in your program. Query the queue at extremely short intervals if you have to - it'll take some extreme use cases to cause any performance issues here.
The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash - more coding for you...
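A sketch of that in-program variant: `loadJobsDueWithin()` and `enqueue()` are assumed helpers, and the loader is assumed to return only jobs not yet handed off (e.g. via a claimed flag) - which is exactly the bookkeeping a crash would force you to replay:

```js
// In-RAM queue variant (sketch): refill a small in-process queue from
// the main jobs table once a minute, then drain it on a much shorter
// interval. loadJobsDueWithin() and enqueue() are assumed helpers;
// the loader is assumed to skip jobs already handed off.
var upcoming = []; // [{ id: ..., runAt: <ms timestamp> }]

setInterval(function refill() {
  loadJobsDueWithin(60 * 1000, function (err, jobs) {
    if (err) return console.error(err);
    upcoming = jobs.sort(function (a, b) { return a.runAt - b.runAt; });
  });
}, 60 * 1000);

setInterval(function drain() {
  while (upcoming.length && upcoming[0].runAt <= Date.now()) {
    enqueue(upcoming.shift().id); // push due job ids into beanstalkd
  }
}, 100); // scanning an in-memory array this often costs next to nothing
```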
Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever, I'm not that familiar with node.js).
This option has the same problem as option 2 - you have to keep track of job execution for crash recovery. It's also the most wasteful imo - every sleeping job thread still takes system resources.
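In node.js terms that would be a `setTimeout` per job rather than a sleeping thread - roughly like the sketch below, where `fetchJobsDueWithin()` and `runJob()` are assumed helpers:

```js
// Per-job timers (sketch): once a minute, fetch the jobs due in the
// next minute and arm a setTimeout for each. fetchJobsDueWithin()
// and runJob() are assumed helpers, and nothing here survives a
// crash - hence the recovery bookkeeping mentioned above.
setInterval(function scheduleBatch() {
  fetchJobsDueWithin(60 * 1000, function (err, jobs) {
    if (err) return console.error(err);
    jobs.forEach(function (job) {
      var delay = Math.max(0, job.runAt - Date.now());
      setTimeout(function () { runJob(job); }, delay);
    });
  });
}, 60 * 1000);
```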
There may be additional options of course - I hope that others answer with more ideas.
Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues so avoid premature optimizations.
Why not have a `Job` object in node.js that's saved to the database. I would suggest you only store the id in RAM and leave all the other `Job` data in the database. When your timeout function finally runs it only needs to know the `.id` to get the other data. If the server ever crashes, all you have to do is query all jobs that have `finished = false`, load them into RAM, and start the setTimeouts again. If anything goes wrong you should be able to restart cleanly.
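Something like this minimal sketch, assuming a `jobs` table with `id`, `run_at`, and `finished` columns, plus illustrative `query()` and `runJob()` helpers:

```js
// Crash recovery (sketch): reload every unfinished job on startup and
// re-arm its timer. query() and runJob() are assumed helpers; the
// worker sets finished = 1 once a job completes.
function recoverJobs() {
  query('SELECT id, run_at FROM jobs WHERE finished = 0', function (err, rows) {
    if (err) throw err;
    rows.forEach(function (row) {
      var delay = Math.max(0, new Date(row.run_at).getTime() - Date.now());
      setTimeout(function () {
        runJob(row.id); // only the id is kept in RAM; fetch the rest by id
      }, delay);
    });
  });
}

recoverJobs();
```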