How to implement a custom cloud worker

Posted 2024-12-19 03:29:28

I am designing a cloud app and need a worker process which scours my database looking for work, and then performs it.

Most of the info I seem to find on the subject of background tasks in the cloud involves some kind of scheduler and/or queuing system.

What I have doesn't quite fit into the "run this task every 5 minutes" or "add this to the queue to be executed later" models. I think the main difference to my problem is that the workers themselves find work to do, rather than being assigned it by a periodic scheduler or an external process that generates work.

What I have is basically a giant table where each entry has three fields:

  1. job: a small task to be performed; let's say it fetches the latest message from a Twitter account and stores it in the database
  2. the interval at which to perform that job: say every 5 minutes, N.B. the interval is arbitrary and different for each entry in the table
  3. the last date when the job was performed
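To make the three fields and the "is this row due?" test concrete, here is a minimal sketch of one table row as a Python dataclass. The class and field names (`Job`, `job_id`, `task`, `interval`, `last_run`) are illustrative assumptions, not part of the question; only the due condition (date + interval < currentTime) comes from it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """One row of the job table (hypothetical field names)."""
    job_id: int
    task: str              # e.g. "fetch latest message from a Twitter account"
    interval: timedelta    # arbitrary and different for each row
    last_run: datetime     # the last date when the job was performed

    def next_due(self) -> datetime:
        return self.last_run + self.interval

    def is_due(self, now: datetime) -> bool:
        # The question's condition: date + interval < currentTime
        return self.next_due() < now


if __name__ == "__main__":
    job = Job(1, "fetch latest message", timedelta(minutes=5), datetime(2024, 1, 1, 12, 0))
    print(job.is_due(datetime(2024, 1, 1, 12, 4)))  # False
    print(job.is_due(datetime(2024, 1, 1, 12, 6)))  # True
```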

The way I would implement this is to have a worker that runs an infinite loop. On each pass through the loop it scours the database: a) it looks for items whose date + interval < currentTime; b) when it finds one, it sets date = currentTime; and c) it then executes the job. If there is no work at the moment, it sleeps for a few seconds and then tries again.
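The loop described above might look roughly like this in Python. This is a sketch under assumptions: `claim_due_job` is a hypothetical callable standing in for steps a) and b) (however the database access is actually done) and returns the job's action or `None`; `max_iterations` exists only so the loop can be stopped in tests and would be `None` in production.

```python
import time
from typing import Callable, Optional


def worker_loop(
    claim_due_job: Callable[[], Optional[Callable[[], None]]],
    poll_seconds: float = 5.0,
    max_iterations: Optional[int] = None,
) -> int:
    """Poll for due work; run it if found, otherwise sleep and retry.

    claim_due_job (hypothetical) performs steps a) and b) atomically and
    returns the job's action, or None if nothing is due right now.
    Returns the number of jobs that ran without raising.
    """
    executed = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        action = claim_due_job()        # steps a) + b)
        if action is None:
            time.sleep(poll_seconds)    # no work at the moment: wait, then retry
            continue
        try:
            action()                    # step c)
            executed += 1
        except Exception as exc:
            # The failed run is simply dropped; because the date was already
            # set in step b), the job comes due again at its next interval.
            print(f"job failed: {exc!r}")
    return executed
```

Catching exceptions inside the loop keeps one bad job from killing the whole worker process, which matches the question's "skip this interval, try again next time" policy.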

I will have many parallel workers scouring the database simultaneously, which is why I do b) before c) in the paragraph above. Because the workers run in parallel, actions a) and b) are performed as a single atomic operation on the database, so that no two workers can claim the same job. If a worker crashes after a) and b) but before it finishes the job, it's no big deal: the job will simply be picked up at the next interval. The reason is that the work is time-sensitive rather than time-invariant, so replaying a backlog of failed jobs has no value. The tasks have to be performed at their exact intervals, and it's better to skip one interval than to have unevenly spaced executions.

My question is: is this a reasonable implementation strategy? If so, how do I bring this process to life on the cloud (I am using Heroku, but may switch to EC2 in the future)? I haven't written any code yet, so I would welcome other suggestions (maybe I have misunderstood the use cases for queue systems).


Comments (1)

稀香 2024-12-26 03:29:28

This sounds so close to using something like a scheduled job that you might as well tread the well-beaten path and do it the more conventional way. There's no reason why you can't schedule a job to run once every few seconds.
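As one illustration of the "conventional" route, a single dispatcher could be woken every few seconds (by cron, a platform scheduler, or a plain sleep loop) and decide which jobs are due. The sketch below swaps in a technique the thread does not mention: a min-heap keyed on each job's next due time, so a tick only touches jobs that are actually due. The class name `TickScheduler` and its methods are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class TickScheduler:
    """Hypothetical single dispatcher driven by a periodic tick."""

    def __init__(self, intervals: Dict[int, float], start: float) -> None:
        if any(iv <= 0 for iv in intervals.values()):
            raise ValueError("intervals must be positive")
        self.intervals = intervals  # job id -> interval in seconds
        self.heap: List[Tuple[float, int]] = [
            (start + iv, job_id) for job_id, iv in intervals.items()
        ]
        heapq.heapify(self.heap)    # smallest next-due time on top

    def tick(self, now: float) -> List[int]:
        """Return the ids of jobs due at `now` and reschedule them."""
        due: List[int] = []
        while self.heap and self.heap[0][0] < now:
            _, job_id = heapq.heappop(self.heap)
            due.append(job_id)
            # Reschedule from *now* so a late tick skips missed intervals
            # instead of firing the same job several times to catch up.
            heapq.heappush(self.heap, (now + self.intervals[job_id], job_id))
        return due
```

The due ids would then be handed to workers (for example through a queue), which keeps the "who runs what" decision in one place and sidesteps two workers picking the same task.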

However, this idea of looking for work sounds dodgy. What happens, for instance, if two workers find the same task to run at the same time? Also, are there not triggers in the application that can indicate that work needs doing? It seems strange that you have code 'looking for work'.

You can go a very long way with simple periodic background tasks, so I would exhaust all possibilities in that area before rolling your own.
