Designing a cron-like scheduler for recurring tasks
Say you want to schedule recurring tasks, such as:
- Send email every Wednesday at 10am
- Create summary on the first day of every month
And you want to do this for a reasonable number of users in a web app, i.e. 100k users, where each user can decide what they want scheduled and when.
And you want to ensure that the scheduled items run even if they were missed originally. E.g. if for some reason the email didn't get sent on Wednesday at 10am, it should get sent at the next checking interval, say Wednesday at 11am.
How would you design that?
If you use cron to trigger your scheduling app every x minutes, what's a good way to implement the part that decides what should run at each point in time?
The cron-like implementations I've seen compare the current time to the trigger time for all specified items, but I'd like to deal with missed items as well.
I have a feeling there's a more clever design than the one I'm cooking up, so please enlighten me.
Comments (3)
There are basically two designs.
One runs regularly and compares the current time to the scheduling spec (i.e. "Does this run now?"), and executes the items that qualify.
The other technique takes the current scheduling spec and finds the NEXT time that the item should fire. Then it looks for all items whose "next time" is earlier than the current time, and fires those. When an item completes, it is rescheduled for its new "next time".
The first technique cannot handle "missed" items; the second can only catch up on items that were previously scheduled.
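To make the second technique concrete, here is a minimal sketch. The names `Job` and `run_due_jobs` are made up for illustration, and it assumes a fixed interval in place of a real cron-style spec. It also treats a job as due when its "next time" is at or before the current time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Job:
    name: str
    interval: timedelta   # assumption: fixed interval instead of a full cron spec
    next_run: datetime    # the stored "next time" for this job


def run_due_jobs(jobs: List[Job], now: datetime,
                 fire: Callable[[Job], None]) -> List[str]:
    """Fire every job whose next_run has passed, then reschedule it."""
    fired = []
    for job in jobs:
        if job.next_run <= now:
            fire(job)
            fired.append(job.name)
            # Move next_run to the first occurrence after `now`, so a long
            # outage results in one catch-up run rather than one per missed slot.
            while job.next_run <= now:
                job.next_run += job.interval
    return fired


# Example: an hourly job that was due at 2pm, checked at 3:20pm.
jobs = [Job("hourly-report", timedelta(hours=1), datetime(2024, 1, 3, 14, 0))]
fired = run_due_jobs(jobs, datetime(2024, 1, 3, 15, 20), fire=lambda job: None)
```

With 100k users you would probably keep `next_run` in an indexed database column and query for due rows rather than loop over everything in memory, but the comparison is the same.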
Specifically, consider a schedule that runs once every hour, at the top of the hour.
So, say, 1pm, 2pm, 3pm, 4pm.
At 1:30pm, the scheduler goes down and stops executing anything. It does not start again until 3:20pm.
Using the first technique, the scheduler will have fired the 1pm task, but not the 2pm and 3pm tasks, as it was not running when those times passed. The next job to run will be the 4pm job, at, well, 4pm.
Using the second technique, the scheduler will have fired the 1pm task and scheduled the next run for 2pm. Since the system was down, the 2pm task did not run, nor did the 3pm task. But when the system restarted at 3:20, it saw that it had "missed" the 2pm task, fired it off at 3:20, and then scheduled it again for 4pm. The 3pm run is not replayed separately, because only one "next time" was on record.
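The outage scenario above can be replayed with a small simulation. This is only a sketch: the date is arbitrary, the 10-minute checking interval is an assumption, and the function names are made up.

```python
from datetime import datetime, timedelta

DAY = datetime(2024, 1, 3)                    # arbitrary date
DOWN_FROM = DAY.replace(hour=13, minute=30)   # scheduler goes down
UP_AGAIN = DAY.replace(hour=15, minute=20)    # scheduler comes back


def scheduler_ticks():
    """Yield the times at which the scheduler actually gets to check (every 10 min)."""
    t = DAY.replace(hour=13)
    end = DAY.replace(hour=16)
    while t <= end:
        if not (DOWN_FROM <= t < UP_AGAIN):
            yield t
        t += timedelta(minutes=10)


def match_now():
    """Technique 1: fire only if the current tick is at the top of the hour."""
    return [t.strftime("%H:%M") for t in scheduler_ticks() if t.minute == 0]


def next_time():
    """Technique 2: fire whenever the stored next_run has passed, then reschedule."""
    fired = []
    next_run = DAY.replace(hour=13)
    for t in scheduler_ticks():
        if next_run <= t:
            fired.append(t.strftime("%H:%M"))
            next_run = t.replace(minute=0) + timedelta(hours=1)  # next top of the hour
    return fired
```

Here `match_now()` returns `['13:00', '16:00']`, while `next_time()` returns `['13:00', '15:20', '16:00']`: one late catch-up run at 3:20pm, then back on schedule.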
Each technique has its ups and downs. With the first technique, you miss jobs. With the second technique you can still miss jobs, but the scheduler can "catch up" (to a point). The price is that it may run a job "at the wrong time" (maybe it's supposed to run at the top of the hour for a reason).
A benefit of the second technique is that if you reschedule at the END of the executing job, you don't have to worry about a cascading job problem.
Consider that you have a job that runs every minute. With the first technique, the job gets fired each minute. However, if the job is not FINISHED within its minute, you can end up with 2 jobs running (one late in its run, the other starting up). This can be a problem if the job is not designed to run more than once simultaneously. And it can compound: if there's a real problem, after 10 minutes you have 10 jobs all fighting each other.
With the second technique, if you schedule at the end of the job, then if a job happens to run just over a minute, you'll "skip" a minute and start up the following minute rather than have the job run on top of itself. So a job scheduled for every minute may actually run at 1:01pm, 1:03pm, 1:05pm, etc.
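The "skipped minute" effect is easy to reproduce. Below is a rough sketch, assuming the job always takes the same amount of time and the next run is placed on the first whole minute after the job finishes. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def next_minute_after(t: datetime) -> datetime:
    """First whole-minute boundary strictly after t."""
    return t.replace(second=0, microsecond=0) + timedelta(minutes=1)


def simulate_runs(start: datetime, duration: timedelta, count: int) -> List[datetime]:
    """Start times of an every-minute job that is rescheduled only when it finishes."""
    starts = []
    next_run = start
    for _ in range(count):
        starts.append(next_run)
        finished = next_run + duration           # the job ends here
        next_run = next_minute_after(finished)   # reschedule at the END of the job
    return starts


# A job that takes 70 seconds, first started at 1:01pm.
starts = simulate_runs(datetime(2024, 1, 3, 13, 1), timedelta(seconds=70), 3)
```

With a 70-second job, `starts` comes out as 1:01pm, 1:03pm, 1:05pm, and there is never more than one copy running. With a 30-second job the same code runs every minute as intended.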
Depending on your job design, either of these can be "good" or "bad". There's no right answer here.
Finally, implementing the first technique is really quite trivial compared to implementing the second. The code to determine whether a cron string (say) matches a given time is simple compared to deriving the NEXT time at which a cron string will be valid. I know, and I have a couple hundred lines of code to prove it. It's not pretty.
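To give a feel for the asymmetry, here is a sketch using a deliberately simplified spec: five fields (minute, hour, day of month, month, day of week with Sunday = 0), each either `*` or a single integer. That is an assumption for illustration; real cron also has lists, ranges and steps, and treats day-of-month and day-of-week differently when both are restricted. The matcher is a few lines. The `next_match` shown is not the couple-hundred-line approach mentioned above but a brute-force stand-in that steps forward one minute at a time, which is simple but slow.

```python
from datetime import datetime, timedelta
from typing import Optional


def matches(spec: str, t: datetime) -> bool:
    """True if t satisfies a simplified 5-field spec: minute hour dom month dow."""
    fields = spec.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields")
    # isoweekday() is 1..7 with Sunday = 7; modulo 7 gives cron-style Sunday = 0.
    actual = (t.minute, t.hour, t.day, t.month, t.isoweekday() % 7)
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


def next_match(spec: str, after: datetime, limit_days: int = 366) -> Optional[datetime]:
    """Brute force: try every minute after `after` until the spec matches."""
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    end = after + timedelta(days=limit_days)
    while t <= end:
        if matches(spec, t):
            return t
        t += timedelta(minutes=1)
    return None  # nothing matched within the search window


# "Every Wednesday at 10am" and "first day of every month at midnight".
weekly = next_match("0 10 * * 3", datetime(2024, 1, 3, 10, 0))
monthly = next_match("0 0 1 * *", datetime(2024, 1, 3, 10, 0))
```

Doing this efficiently, field by field with correct rollover into the next hour, day, month and year, is where the real work (and the ugliness) goes.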
In case you want to skip designing and start using, have a look at Celery. Its scheduler is called celerybeat.
Edit:
Also relevant: How to send 100,000 emails weekly?
Using a backing Java process with the Quartz scheduler is a likely solution. I believe Quartz should scale to this level reasonably well. See this related SO question: "How to scale the Quartz Scheduler"...
If you take a careful look at the Quartz documentation, I think you'll find that your concerns regarding triggering and missed executions are dealt with cleanly, and that it offers a number of suitable policies to choose from. In terms of scalability, I believe you can store jobs in a JDBC backing store.
Struck out, since the questioner was specifically looking for a design discussion...
If you had framed your initial StackOverflow search, before asking the question, in terms of "task schedulers for Python", you would have turned this up: "An enterprise scheduler for python...". I strongly suggest looking for an existing implementation rather than attempting a NIH development for something like this, despite the great observations in the other answer about how you might do it. Given your stated scalability goals, you're biting off a fairly challenging task, and you should eliminate all other options before going down the from-scratch road on a topic as heavily developed as this one. One possible avenue would be to adapt the well-regarded Quartz via Jython and determine whether your use cases can be handled in that context with minimal dipping into the Java world (presumably not your first choice).