gearman and retrying workers with unreliable external dependencies
I'm using gearman to queue a variety of different jobs, some of which can always be serviced immediately, and some of which can "fail" because they require an unreliable external service. (For example, sending email might require an SMTP server that's frequently unavailable.)
If an external service goes down, I'd like to keep all jobs which require that service on the queue, and retry one job occasionally (every few minutes, say) until the service becomes available again. (Perhaps optionally sending email if the service has not been available for hours.)
However, I'd like jobs that don't require a failed service to be passed on to workers as soon as possible. How can this be achieved? (I'm happy to put some of the logic in the workers if necessary, although it seems to be a bit "late" to throttle on the worker side.)
Comments (1)
Gearman should already be able to handle this, as long as you have some workers that specialise in jobs with unreliable dependencies and don't handle other jobs, along with some workers that either do all jobs or only jobs without unreliable dependencies.
All you would need to do is add some code to the unreliable-dependency workers so that they only accept jobs once they have checked that the service they depend on is running. If the service is down, just have them wait a bit and retest the service (and continue ad infinitum); once the service is up, have them join the gearmand server, do a job, return the work, retest the service, and so on.
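A rough sketch of that check-then-work loop, in Python. It deliberately does not use any gearman client library: `work_one_job` is a hypothetical stand-in for whatever call makes your worker grab and process a single job, `smtp_reachable` is only one assumed way to test the service (a plain TCP connect), and `max_iterations` exists only so the loop can be exercised in a test.

```python
import socket
import time
from typing import Callable, Optional


def smtp_reachable(host: str, port: int = 25, timeout: float = 3.0) -> bool:
    """Assumed health check: True if a TCP connection to host:port opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_guarded_worker(
    service_up: Callable[[], bool],
    work_one_job: Callable[[], None],  # hypothetical: grab and run one job
    retry_interval: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    max_iterations: Optional[int] = None,
) -> int:
    """Only take a job after the service check passes; otherwise wait and retest.

    Returns the number of jobs handled (useful when max_iterations is set).
    """
    handled = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        if service_up():
            # Service looks healthy: take exactly one job, then retest.
            work_one_job()
            handled += 1
        else:
            # Service is down: leave the jobs queued and try again later.
            sleep(retry_interval)
    return handled
```

Because the worker never asks for a job while the check fails, the jobs simply stay queued on the server, and workers for other job types are unaffected.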
While the dependent service is down, the workers that don't handle jobs that need the service will keep on trundling through the job queue for the other jobs. Gearmand won't block an entire job queue (or worker) on one job type if there are workers available to handle other job types.
The key is to be sensible about how you define your job types and workers.
EDIT--
Ah-ha, I knew my thinking was a little out (I wrote my gearman system about a year ago and haven't really touched it since). My solution to this type of issue was to have all the workers that normally handle the dependent job unregister their ability to handle it with the gearmand server once a failure was detected in the service it depends on (and any workers currently trying to complete that job should return a failure). Once the service is back up, get those same workers to reregister their ability to handle that job. Do note that this requires another channel of communication so the workers can be notified of the status of the services they depend on.
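Here is a minimal sketch of the unregister/reregister idea, again in Python and again without a real gearman client. The `register` and `unregister` callables are hypothetical hooks; in the Gearman protocol they would correspond to the worker sending `CAN_DO` and `CANT_DO` for the function name, through whatever calls your client library provides. How `update` gets told the service state (the "other channel") is left open.

```python
from typing import Callable


class CapabilityGate:
    """Keep a worker's registration for one job type in sync with service health."""

    def __init__(self, register: Callable[[], None], unregister: Callable[[], None]):
        self._register = register      # hypothetical hook: announce CAN_DO
        self._unregister = unregister  # hypothetical hook: announce CANT_DO
        self._registered = False

    def update(self, service_up: bool) -> bool:
        """Apply the latest service state; return True if registration changed."""
        if service_up and not self._registered:
            self._register()
            self._registered = True
            return True
        if not service_up and self._registered:
            self._unregister()
            self._registered = False
            return True
        # State unchanged: avoid sending redundant packets to the server.
        return False
```

A worker would call `gate.update(...)` each time it learns the service state, for example between jobs or when a status message arrives.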
Hope this helps