How do I guarantee message delivery with Celery?
I have a Python application where I want to start doing more work in the background so that it will scale better as it gets busier. In the past I have used Celery for normal background tasks, and this has worked well.
The only difference between this application and the others I have done in the past is that I need to guarantee that these messages are processed; they can't be lost.
For this application I'm not too concerned about the speed of my message queue; I need reliability and durability first and foremost. To be safe I want to have two queue servers in different data centers, one a backup of the other, in case something goes wrong.
Looking at Celery, it looks like it supports a bunch of different backends, some with more features than others. The two most popular look to be Redis and RabbitMQ, so I took some time to examine them further.
RabbitMQ:
Supports durable queues and clustering, but the problem with the way clustering works today is that if you lose a node in the cluster, all messages on that node are unavailable until you bring it back online. It doesn't replicate messages between the different nodes in the cluster; it only replicates the metadata about each message and then goes back to the originating node to fetch it. If that node isn't running, you are S.O.L. Not ideal.
The way they recommend to get around this is to set up a second server, replicate the file system using DRBD, and then run something like Pacemaker to switch clients to the backup server when needed. This seems pretty complicated; I'm not sure if there is a better way. Does anyone know of one?
Redis:
Supports a read slave, which would allow me to have a backup in case of emergency, but it doesn't support a master-master setup, and I'm not sure whether it handles active failover between master and slave. It doesn't have the same features as RabbitMQ, but it looks much easier to set up and maintain.
Questions:
What is the best way to set up Celery so that it will guarantee message processing?
Has anyone done this before? If so, would you mind sharing what you did?
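For reference, here is roughly the baseline I'd start from: a minimal sketch of the durability-related Celery settings I know about (late acknowledgement, persistent delivery, durable queues). The broker URL, queue name, and task are placeholders, and the lowercase setting names assume a recent Celery.

    # Minimal sketch of durability-oriented Celery settings; the broker
    # URL, queue, and task below are placeholders, not a real deployment.
    from celery import Celery
    from kombu import Exchange, Queue

    app = Celery('tasks', broker='amqp://guest@localhost//')

    app.conf.update(
        # Acknowledge only after the task finishes, so a worker crash
        # lets the broker redeliver the message instead of losing it.
        task_acks_late=True,
        # Mark messages persistent so a durable broker writes them to disk.
        task_default_delivery_mode='persistent',
        # A durable queue and exchange survive a broker restart.
        task_queues=[
            Queue('tasks', Exchange('tasks', durable=True),
                  routing_key='tasks', durable=True),
        ],
        task_default_queue='tasks',
        # Prefetch one message at a time so fewer unacknowledged
        # messages are at risk if a worker dies.
        worker_prefetch_multiplier=1,
    )

    @app.task
    def process(payload):
        # With late acks the same message can be delivered more than
        # once, so the task has to be idempotent.
        print('processing', payload)

Note that late acknowledgement trades at-most-once for at-least-once delivery: a crashed worker's message goes back on the queue, so tasks need to tolerate being run twice.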
Answers (5):
A lot has changed since the OP! There is now a high-availability option, aka "mirrored" queues. This goes pretty far toward solving the problem you described. See http://www.rabbitmq.com/ha.html.
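If you go this route, the queue itself has to be declared as mirrored. A minimal sketch using Kombu's queue_arguments with the older per-queue 'x-ha-policy' argument; on RabbitMQ 3.x this is done instead with a broker-side policy (rabbitmqctl set_policy). Queue and exchange names are placeholders.

    # Sketch: declare a durable queue mirrored across all cluster nodes.
    # 'x-ha-policy' is the older per-queue mechanism; on RabbitMQ 3.x
    # you would set a broker-side policy with `rabbitmqctl set_policy`
    # instead. Names here are placeholders.
    from kombu import Exchange, Queue

    mirrored_queue = Queue(
        'tasks',
        Exchange('tasks', durable=True),
        routing_key='tasks',
        durable=True,
        queue_arguments={'x-ha-policy': 'all'},
    )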
You might want to check out IronMQ; it covers your requirements (durable, highly available, etc.) and is a cloud-native solution, so zero maintenance. And there's a Celery broker for it: https://github.com/iron-io/iron_celery, so you can start using it just by changing your Celery config.
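For what it's worth, a sketch of what that config change might look like. This assumes the ironmq:// broker scheme registered by the iron_celery package; the project ID and token are placeholders.

    # Sketch: point Celery at IronMQ via the iron_celery broker.
    # The ironmq:// scheme and import below assume the iron_celery
    # package; PROJECT_ID and TOKEN are placeholders.
    import iron_celery  # registers the ironmq:// transport
    from celery import Celery

    app = Celery('tasks', broker='ironmq://PROJECT_ID:TOKEN@')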
I suspect that Celery bound to existing backends is the wrong solution for the reliability guarantees you need.
Given that you want a distributed queueing system with strong durability and reliability guarantees, I'd start by looking for such a system (they do exist) and then figuring out the best way to bind to it in Python. That may be via Celery & a new backend, or not.
I've used Amazon SQS for this purpose and got good results. A message keeps being redelivered until you explicitly delete it from the queue, and it lets your app grow as much as you need.
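The guarantee comes from SQS's receive/process/delete cycle: a message is only removed by an explicit delete, so a crashed worker just means redelivery after the visibility timeout. A minimal sketch using boto3; the queue URL, region, and task logic are placeholders.

    # Sketch of the SQS receive -> process -> delete cycle that
    # provides the guarantee; queue URL and region are placeholders.
    import boto3

    sqs = boto3.client('sqs', region_name='us-east-1')
    queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/tasks'

    def process(body):
        # Placeholder task logic; it must be idempotent, since SQS can
        # deliver the same message more than once.
        print('processing', body)

    resp = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=1,
                               WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        process(msg['Body'])
        # The message is removed only by this explicit delete; if the
        # worker crashes before reaching it, SQS redelivers the message
        # after the visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg['ReceiptHandle'])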
Is using a distributed rendering system an option? Normally these are reserved for HPC, but a lot of the concepts are the same. Check out Qube or Deadline Render; there are other, open-source solutions as well. All are designed with failover in mind, given the high degree of complexity and risk of failure in renders that can take hours per frame of an image sequence.