Quartz.Net jobs not always running - can't find any reason why
We're using Quartz.Net to schedule about two hundred repeating jobs. Each job uses the same IJob implementing class, but they can have different schedules. In practice, they end up having the same schedule, so we have about two hundred job details, each with their own (identical) repeating/simple trigger, scheduled. The interval is one hour.
The task this job performs is to download an rss feed, and then download all of the media files linked to in the rss feed. Prior to downloading, it wipes the directory where it is going to place the files. A single run of a job takes anywhere from a couple seconds to a dozen seconds (occasionally more).
Our method of scheduling is to call GetScheduler() on a new StdSchedulerFactory (all jobs are scheduled at once into the same IScheduler instance). We follow the scheduling with an immediate Start().
The jobs appear to run fine, but upon closer inspection we are seeing that a minority of the jobs occasionally - or almost never - run.
So, for example, all two hundred jobs should have run at 6:40 pm this evening. Most of them did. But a handful did not. I determine this by looking at the file timestamps, which should certainly be updated if the job runs (because it deletes and redownloads the file).
I've enabled Quartz.Net logging, and added quite a few logging statements to our code as well.
I get log messages that indicate Quartz is creating and executing jobs for roughly one minute after the round of jobs starts.
After that, all activity stops. No jobs run, no log messages are created. Zero.
And then, at the next firing interval, Quartz starts up again and my log files update, and various files start downloading. But - it certainly appears like some JobDetail instances never make it to the head of the line (so to speak) or do so very infrequently. Over the entire weekend, some jobs appeared to update quite frequently, and recently, and others had not updated a single time since starting the process on Friday (it runs in a Windows Service shell, btw).
So ... I'm hoping someone can help me understand this behavior of Quartz.
I need to be certain that every job runs. If its trigger is missed, I need Quartz to run it as soon as possible. From reading the documentation, I thought this would be the default behavior: for a SimpleTrigger with an indefinite repeat count, it would reschedule the job for immediate execution if the trigger window was missed. This doesn't seem to be the case. Is there any way I can determine why Quartz is not firing these jobs? I am logging at the trace level and the entries simply aren't there. It creates and executes an awful lot of jobs, but if I notice one missing, all I can find is the last time it ran (sometimes hours or days earlier). There is nothing about why it was skipped (I expected Quartz to log something if it skips a job for any reason).
Any help would really, really be appreciated - I've spent my entire day trying to figure this out.
Comments (1)
After reading your post, it sounds like the handful of jobs that are not executing are very likely misfiring. Here is why I believe this:
In Quartz.NET the default misfire threshold is 1 minute. Chances are, you need to examine your logging configuration to determine why those misfire events are not being logged. I bet that if you throw open the floodgates on your logging (i.e. set everything to debug, and make sure that you definitely have a logging directive for the Quartz scheduler class) and then rerun your jobs, the misfires will show up. I'm almost positive that the misfire events are missing from your logs because the logging configuration is lacking something. This is understandable, because logging configuration can get very confusing, very quickly.
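To make the one-minute threshold concrete, here is a rough Python model of a single firing round. It is a sketch of the arithmetic only, not Quartz.NET code. The numbers are assumptions: 200 jobs all due at the same instant, a pool of 10 worker threads (which I believe is the Quartz.NET default, but check your configuration), a fixed 6 seconds per job, and the 60-second misfire threshold mentioned above. The name `simulate_round` and the rule "a job that has waited longer than the threshold is skipped for this round" are simplifications made up for this illustration.

```python
import heapq


def simulate_round(job_count, worker_count, job_seconds, misfire_threshold_seconds):
    """Model one firing round in which every job is due at t = 0.

    Returns (on_time, late): the job indexes that started within the
    threshold and those that had waited longer than it. Late jobs are
    treated as skipped (an assumed simplification of a misfire).
    """
    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0.0] * worker_count
    heapq.heapify(free_at)

    on_time, late = [], []
    for job in range(job_count):
        start = heapq.heappop(free_at)  # earliest moment any worker is free
        if start > misfire_threshold_seconds:
            late.append(job)
            heapq.heappush(free_at, start)  # skipped job uses no worker time
        else:
            on_time.append(job)
            heapq.heappush(free_at, start + job_seconds)
    return on_time, late


on_time, late = simulate_round(200, 10, 6.0, 60.0)
print(len(on_time), len(late))  # 110 90
print(late[0])                  # 110
```

With these assumed numbers, 110 jobs start within the first minute and the remaining 90 are skipped, which would look like a burst of activity for about a minute followed by silence. If triggers that are due at the same moment are always picked up in the same order, the same tail of jobs would lose out every round.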
Also, in the future, you might want to consult the Quartz.NET forum on Google Groups, since that is where some of the thornier issues are discussed.
http://groups.google.com/group/quartznet?pli=1
As for your other question, about setting the policy for what the scheduler should do on a misfire, I can't help you specifically, but if you read the API docs closely and consult the Google discussion group, you should be able to set the misfire policy that suits your needs. I believe that triggers have a MisfireInstruction property which you can configure.
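As an illustration of what such a policy decides, the Python sketch below contrasts two broad behaviours for a misfired repeating trigger: run it immediately, or skip ahead to the next regular slot. The policy labels `fire_now` and `skip_to_next` and the function `resolve_misfire` are hypothetical names for this example, not Quartz.NET identifiers, so check the API docs for the actual MisfireInstruction values. The date is arbitrary; the 6:40 pm slot and the one-hour interval come from the question.

```python
from datetime import datetime, timedelta


def resolve_misfire(scheduled, now, interval, policy):
    """Return when a misfired repeating trigger should fire next.

    policy is a hypothetical label:
      "fire_now"     -> run immediately
      "skip_to_next" -> drop the missed run, wait for the next regular slot
    """
    if policy == "fire_now":
        return now
    if policy == "skip_to_next":
        # Whole intervals elapsed since the missed slot, plus one more,
        # gives the first regular slot strictly after `now`.
        slots_ahead = (now - scheduled) // interval + 1
        return scheduled + slots_ahead * interval
    raise ValueError(f"unknown policy: {policy!r}")


scheduled = datetime(2024, 1, 1, 18, 40)  # the 6:40 pm slot (date is arbitrary)
now = datetime(2024, 1, 1, 18, 42)        # noticed two minutes late
hour = timedelta(hours=1)

print(resolve_misfire(scheduled, now, hour, "fire_now"))      # 2024-01-01 18:42:00
print(resolve_misfire(scheduled, now, hour, "skip_to_next"))  # 2024-01-01 19:40:00
```

If the trigger's effective policy behaves like `skip_to_next`, a job that misses its slot simply waits for the next hour, and if it misses again it waits again, which would match jobs that go hours or days without running. A `fire_now`-style instruction is closer to what the question asks for.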
Also, I would argue that misfires introduce a lot of "noise" and should be avoided; perhaps bumping up the thread count on your scheduler would be a way to avoid misfires? The other option would be to stagger your job execution into separate/multiple batches.
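Both suggestions can be sized with a little arithmetic. The Python sketch below is a back-of-the-envelope model, not Quartz.NET code: `min_workers` estimates how many threads would let every job start within the misfire threshold, and `stagger_offsets` spreads jobs across batches. Both function names are made up for this example. The worst case of 12 seconds per job is taken from "a dozen seconds" in the question; the batch size of 50 and the 15-minute gap are invented values.

```python
import math


def min_workers(job_count, job_seconds, threshold_seconds):
    """Smallest pool size that lets every job start within the threshold.

    Assumes all jobs are due at once and each takes job_seconds. Waves of
    jobs start at 0, job_seconds, 2 * job_seconds, ... and a wave is still
    in time if it starts no later than threshold_seconds.
    """
    waves_allowed = threshold_seconds // job_seconds + 1
    return math.ceil(job_count / waves_allowed)


def stagger_offsets(job_count, batch_size, gap_seconds):
    """Start offset in seconds for each job when run in spaced batches."""
    return [(job // batch_size) * gap_seconds for job in range(job_count)]


print(min_workers(200, 12, 60))  # 34
print(min_workers(50, 12, 60))   # 9

offsets = stagger_offsets(200, 50, 15 * 60)
print(sorted(set(offsets)))      # [0, 900, 1800, 2700]
```

Under these assumptions, roughly 34 threads would be needed for all 200 jobs to start inside one minute, whereas a batch of 50 fits within the threshold on 9 threads. Four batches spaced 15 minutes apart would therefore stay inside the hourly interval without raising the thread count beyond 10.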
Good luck!