Available implementations of a distributed REST queue
I have the following (common, I guess) scenario: a master node holding a list of items (urls, jobs, whatever) to be processed, and a set of N worker nodes.
Each worker pops an item from the queue, does something, then notifies the master node that the job has successfully finished.
A worker may also push new items to the master node, to be queued for processing.
Requirements are quite simple:
- no job gets executed twice
- no job gets picked by two nodes (i.e. "pop" is atomic)
- if a job fails, after a fixed timeout, the job is again available for processing
- the number of concurrent workers is potentially large
- several workers may live on a single node
- master and workers are not assumed to be in the same network
- pop and push must be exposed as a REST API (i.e. the queue is language-agnostic)
- items must be stored persistently on the master node (i.e. no in-memory solutions)
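As a rough illustration of the requirements above, here is a minimal sketch of the storage side of such a queue, assuming SQLite as the persistent store on the master node. The `JobQueue` class, its method names, the `jobs` table layout and the `lease_seconds` parameter are all hypothetical, chosen only for this example. "Pop" is made atomic by running the select and the update inside one `BEGIN IMMEDIATE` transaction, and a popped job becomes available again once its lease expires without an acknowledgement. Note that redelivery after a timeout means a slow worker could still end up running a job that was handed to someone else, so this gives at-least-once rather than strict exactly-once execution.

```python
import sqlite3
import time

# Hypothetical table layout: one row per job, with a lease deadline and a done flag.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    leased_until REAL NOT NULL DEFAULT 0,
    done INTEGER NOT NULL DEFAULT 0
)
"""


class JobQueue:
    """Illustrative persistent queue with atomic pop and a fixed lease timeout."""

    def __init__(self, path, lease_seconds=60.0):
        # isolation_level=None puts the connection in autocommit mode,
        # so transactions are opened explicitly with BEGIN IMMEDIATE below.
        self.db = sqlite3.connect(path, isolation_level=None)
        self.lease_seconds = lease_seconds
        self.db.execute(SCHEMA)

    def push(self, payload):
        """Store a new job and return its id."""
        cur = self.db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
        return cur.lastrowid

    def pop(self, now=None):
        """Lease the oldest available job; return (id, payload) or None."""
        now = time.time() if now is None else now
        # BEGIN IMMEDIATE takes the write lock up front, so two concurrent
        # pops cannot both select the same row.
        self.db.execute("BEGIN IMMEDIATE")
        try:
            row = self.db.execute(
                "SELECT id, payload FROM jobs "
                "WHERE done = 0 AND leased_until <= ? ORDER BY id LIMIT 1",
                (now,),
            ).fetchone()
            if row is not None:
                self.db.execute(
                    "UPDATE jobs SET leased_until = ? WHERE id = ?",
                    (now + self.lease_seconds, row[0]),
                )
            self.db.execute("COMMIT")
        except Exception:
            self.db.execute("ROLLBACK")
            raise
        return row

    def ack(self, job_id):
        """Mark a job as finished; return True if it was not already done."""
        cur = self.db.execute(
            "UPDATE jobs SET done = 1 WHERE id = ? AND done = 0", (job_id,)
        )
        return cur.rowcount == 1
```

A thin HTTP layer could then map, for example, a POST to push, another POST to pop and a DELETE to ack. Those routes are just one possible design and are not part of the sketch.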
I have not been able to find a simple, lightweight REST implementation: I have looked at RabbitMQ, Celery, Google App Engine and a bunch of other less mature projects, but all of them seem quite complex to manage, and a bit like overkill for what I need.
Is there any solution that I might be overlooking?
Comments (2)
Amazon SQS might be what you want: http://aws.amazon.com/sqs/
Jan
I went through the same search. I found Celery was close (I also looked at others like octobot), but none seemed as simple as I wanted, and each was missing a few things. Celery was fairly simple, but it introduces a fair number of dependencies which I didn't already have in the mix, so I went with something bespoke instead (Erlang based).