Database-backed work queue
My situation ...
I have a set of workers that are scheduled to run periodically, each at different intervals, and would like to find a good implementation to manage their execution.
Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and its configuration in a MySQL table. But it seems like a really bad idea to poll the table (every second?) to see which jobs are ready to be put into the execution pipeline.
All of my workers are written in JavaScript, so I'm using node.js for execution and beanstalkd as a pipeline.
If new jobs (i.e. scheduling a worker to run at a given time) are being created asynchronously, and I need to store the job result and configuration persistently, how do I avoid polling a table?
Thanks!
Comments (2)
I agree that it seems inelegant, but given the way that computers work, something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:
Poll the database table. This isn't a bad idea at all - it's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing - give it a try and you'll notice that your system doesn't even feel it.
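For a sense of scale, a minimal once-a-second poller might look like the sketch below; the `jobs` table columns and the `enqueue()` helper that hands work to beanstalkd are illustrative assumptions, not something from the question:

```js
// Once-a-second poller (sketch). Assumes a `jobs` table with id,
// run_at, and finished columns, and an enqueue() helper that puts the
// job id into the beanstalkd pipeline - both are illustrative.
var mysql = require('mysql');

var db = mysql.createConnection({
  host: 'localhost',
  user: 'worker',
  database: 'scheduler'
});

setInterval(function pollJobs() {
  db.query(
    'SELECT id FROM jobs WHERE run_at <= NOW() AND finished = 0',
    function (err, rows) {
      if (err) return console.error('poll failed:', err);
      rows.forEach(function (row) {
        // A real version would also mark the row as claimed here so it
        // isn't re-enqueued on the next tick.
        enqueue(row.id);
      });
    }
  );
}, 1000);
```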
Some ideas to help you scale this to possibly hundreds of queries per second, or just to keep system resource requirements down: keep a second, smaller table that holds only the jobs due to run in the near future, so the frequent poll scans just that small table rather than the full jobs table.
If you have to scale even further, keep the main jobs table in the database, and take that second, smaller table and put it in RAM: either as a memory table in the DB engine, or as a queue of some kind in your program. Query the queue at extremely short intervals if you have to - it'll take some extreme use cases to cause any performance issues here.
The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash - more coding for you...
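A sketch of that in-program variant: `loadJobsDueWithin()` and `enqueue()` are assumed helpers, and the loader is assumed to return only jobs not yet handed off (e.g. via a claimed flag) - which is exactly the bookkeeping a crash would force you to replay:

```js
// In-RAM queue variant (sketch): refill a small in-process queue from
// the main jobs table once a minute, then drain it on a much shorter
// interval. loadJobsDueWithin() and enqueue() are assumed helpers;
// the loader is assumed to skip jobs already handed off.
var upcoming = []; // [{ id: ..., runAt: <ms timestamp> }]

setInterval(function refill() {
  loadJobsDueWithin(60 * 1000, function (err, jobs) {
    if (err) return console.error(err);
    upcoming = jobs.sort(function (a, b) { return a.runAt - b.runAt; });
  });
}, 60 * 1000);

setInterval(function drain() {
  while (upcoming.length && upcoming[0].runAt <= Date.now()) {
    enqueue(upcoming.shift().id); // push due job ids into beanstalkd
  }
}, 100); // scanning an in-memory array this often costs next to nothing
```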
Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever, I'm not that familiar with node.js).
This option has the same problem as option 2 - you have to keep track of job execution for crash recovery. It's also the most wasteful imo - every sleeping job thread still takes system resources.
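In node.js terms that would be a `setTimeout` per job rather than a sleeping thread - roughly like the sketch below, where `fetchJobsDueWithin()` and `runJob()` are assumed helpers:

```js
// Per-job timers (sketch): once a minute, fetch the jobs due in the
// next minute and arm a setTimeout for each. fetchJobsDueWithin()
// and runJob() are assumed helpers, and nothing here survives a
// crash - hence the recovery bookkeeping mentioned above.
setInterval(function scheduleBatch() {
  fetchJobsDueWithin(60 * 1000, function (err, jobs) {
    if (err) return console.error(err);
    jobs.forEach(function (job) {
      var delay = Math.max(0, job.runAt - Date.now());
      setTimeout(function () { runJob(job); }, delay);
    });
  });
}, 60 * 1000);
```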
There may be additional options of course - I hope that others answer with more ideas.
Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues so avoid premature optimizations.
Why not have a `Job` object in node.js that's saved to the database. I would suggest you only store the id in RAM and leave all the other `Job` data in the database. When your timeout function finally runs it only needs to know the `.id` to get the other data. If the server ever crashes, all you have to do is query all jobs that have `finished = false`, load them into RAM, and start the setTimeouts again. If anything goes wrong you should be able to restart cleanly.
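Something like this minimal sketch, assuming a `jobs` table with `id`, `run_at`, and `finished` columns, plus illustrative `query()` and `runJob()` helpers:

```js
// Crash recovery (sketch): reload every unfinished job on startup and
// re-arm its timer. query() and runJob() are assumed helpers; the
// worker sets finished = 1 once a job completes.
function recoverJobs() {
  query('SELECT id, run_at FROM jobs WHERE finished = 0', function (err, rows) {
    if (err) throw err;
    rows.forEach(function (row) {
      var delay = Math.max(0, new Date(row.run_at).getTime() - Date.now());
      setTimeout(function () {
        runJob(row.id); // only the id is kept in RAM; fetch the rest by id
      }, delay);
    });
  });
}

recoverJobs();
```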