Database queues and queue processing
I am currently in the process of putting together a reference architecture for a distributed event-based system where events are stored in a SQL Server Azure database using plain old tables (no SQL Server Service Broker).
Events will be processed using Worker Roles that will poll the queue for new event messages.
In my research, I see a number of solutions that allow multiple processors to process messages off the queue. The problem I have with a lot of the patterns I'm seeing is the added complexity of managing locking, etc., when multiple processes are trying to access the single message queue.
I understand that the traditional queue pattern is to have multiple processors pulling from a single queue. However, assuming that event messages can be processed in any order, is there any reason not to simply create a one-to-one relationship between each queue and its processor, and load-balance across the different queues?
queue_1 => processor_1
queue_2 => processor_2
This implementation avoids all of the plumbing necessary to manage concurrent access to the queue across multiple processors. The event publisher can use any load-balancing algorithm to decide which queue to publish messages to.
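A minimal sketch of the one-processor-per-queue design described above, assuming an in-memory `deque` stands in for each SQL Azure queue table and round-robin is the publisher's load-balancing algorithm. The class name `RoundRobinPublisher` and the queue names are hypothetical.

```python
from collections import deque
from itertools import cycle


class RoundRobinPublisher:
    """Hypothetical publisher that spreads messages across per-processor queues.

    Each queue is owned by exactly one processor, so no queue is ever read
    concurrently; only the publisher decides where a message goes.
    """

    def __init__(self, queue_names):
        self.queues = {name: deque() for name in queue_names}
        # Round-robin is an assumption; any load-balancing algorithm could go here.
        self._next = cycle(queue_names)

    def publish(self, message):
        name = next(self._next)
        self.queues[name].append(message)
        return name


publisher = RoundRobinPublisher(["queue_1", "queue_2"])
for i in range(5):
    publisher.publish(f"event-{i}")
# queue_1 holds event-0, event-2, event-4; queue_2 holds event-1, event-3
```

Note that the publisher only knows how many messages it has sent to each queue, not how quickly each processor is actually draining its queue.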
The fact that I don't see this sort of implementation in any of my searches makes me think I'm overlooking a major flaw in this design.
Edit
This post has triggered a debate over using database tables as queues vs. MSMQ, Azure Queues, etc. I understand that there are a number of native queuing options available to me, including Durable Message Buffers in Azure AppFabric. I've evaluated my options and determined that SQL Azure tables will be sufficient. The intention of my question was to discuss the use of multiple processors against a single queue vs. one processor per queue.
Comments (4)
See Using tables as Queues for a more detailed discussion of this topic. The issue is not only how you access the 'queue' but also how you index it: the clustered index must allow a direct seek to the next row to dequeue, otherwise you'll deadlock constantly.
You want your processors to race to the same queue; load balancing by spreading out to different queues is an anti-pattern. It leads to convoys and artificial latency, where items are queued up behind a slow processor while other processors sit idle because their own queues are empty.
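The convoy effect can be illustrated with a toy model, assuming every job is independent and `speeds[i]` is the (hypothetical) time processor `i` needs per job. It compares when the last job finishes if jobs are round-robined into one queue per processor versus when all processors race to a single shared queue.

```python
import heapq


def makespan_partitioned(num_jobs, speeds):
    """One queue per processor, filled round-robin; each drains only its own queue."""
    counts = [0] * len(speeds)
    for job in range(num_jobs):
        counts[job % len(speeds)] += 1
    return max(c * s for c, s in zip(counts, speeds))


def makespan_shared(num_jobs, speeds):
    """All processors race to one queue; whichever is free next takes the next job."""
    # Heap of (time the processor becomes free, processor index).
    free_at = [(0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0
    for _ in range(num_jobs):
        t, i = heapq.heappop(free_at)
        t += speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


# Third processor is four times slower than the other two.
speeds = [1, 1, 4]
print(makespan_partitioned(12, speeds))  # 16
print(makespan_shared(12, speeds))       # 8
```

In the partitioned case the two fast processors are idle from time 4 to time 16 while jobs still wait in the slow processor's queue. The greedy shared queue is not necessarily optimal, but no processor sits idle while work is waiting.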
Tables as queues are quite easy to do. Please see my SO answer here: SQL Server Process Queue Race Condition
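A minimal sketch of several workers racing to one table-backed queue, using SQLite as a stand-in for SQL Azure so it runs anywhere. The table `event_queue` and its columns are hypothetical. SQLite has no row-level locks, so `BEGIN IMMEDIATE` serializes dequeues with a database-wide write lock; in SQL Server the usual approach instead relies on the `ROWLOCK`, `READPAST` and `UPDLOCK` table hints so workers skip rows other workers have already locked.

```python
import os
import sqlite3
import tempfile
import threading


def create_queue(path):
    conn = sqlite3.connect(path)
    # INTEGER PRIMARY KEY is the rowid (the table's clustered key), so the
    # lowest id is found by seeking the key rather than scanning the table.
    conn.execute("CREATE TABLE IF NOT EXISTS event_queue ("
                 "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
    conn.commit()
    conn.close()


def dequeue(conn):
    """Atomically claim and remove the oldest message; return None when empty."""
    # Taking the write lock before reading means two workers can never
    # read the same row before one of them deletes it.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM event_queue ORDER BY id LIMIT 1").fetchone()
        if row is not None:
            conn.execute("DELETE FROM event_queue WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[1] if row else None


def worker(path, results):
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves; the timeout
    # makes a worker wait for the lock instead of failing immediately.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        while (msg := dequeue(conn)) is not None:
            results.append(msg)  # list.append is thread-safe in CPython
    finally:
        conn.close()


path = os.path.join(tempfile.mkdtemp(), "queue.db")
create_queue(path)
conn = sqlite3.connect(path)
conn.executemany("INSERT INTO event_queue (payload) VALUES (?)",
                 [(f"event-{i}",) for i in range(200)])
conn.commit()
conn.close()

results = []
threads = [threading.Thread(target=worker, args=(path, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every message ends up in `results` exactly once, regardless of which of the four workers dequeued it.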
As S.Lott mentioned, there are message queue mechanisms you can use. MSMQ won't really help in Windows Azure, but Windows Azure already has a durable queue mechanism. You can easily set up each worker role instance to read one (or more) queue items. Once a queue item is read, it's "invisible" for whatever length of time you specify (or 30 seconds if no time specified). Queue messages can be up to 8K, and they're considered "durable" - all Azure storage is replicated a minimum of 3 times (as is SQL Azure).
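The "invisible for a period, then visible again" behaviour can be modelled in a few lines. This is a toy in-memory simulation, not the Azure SDK: the class `VisibilityTimeoutQueue` is hypothetical, the clock is injected so the timeout can be tested without sleeping, and message ids stand in for Azure's pop receipts.

```python
import itertools


class VisibilityTimeoutQueue:
    """Toy model of a queue with visibility timeouts (not the Azure SDK).

    get() hides a message for `visibility_timeout` seconds instead of
    removing it; only delete() removes it. If a consumer dies before
    deleting, the message becomes visible again for another consumer.
    """

    def __init__(self, clock, visibility_timeout=30):
        self._clock = clock               # callable returning current time in seconds
        self._timeout = visibility_timeout
        self._messages = {}               # id -> [payload, visible_at]
        self._ids = itertools.count()

    def put(self, payload):
        self._messages[next(self._ids)] = [payload, self._clock()]

    def get(self, visibility_timeout=None):
        now = self._clock()
        timeout = self._timeout if visibility_timeout is None else visibility_timeout
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + timeout  # hide it from other consumers
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


now = [0.0]
q = VisibilityTimeoutQueue(clock=lambda: now[0])
q.put("order-42")
first = q.get()        # worker A takes it, then crashes without deleting
hidden = q.get()       # None: invisible to worker B for 30 seconds
now[0] = 31.0
second = q.get()       # visible again after the timeout
q.delete(second[0])    # worker B finishes and deletes it
```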
While you can implement something like what gbn describes, I really think you should consider the native Azure Queue service when working in Windows Azure. You'll easily be able to scale to multiple queue consumers and won't have to worry about concurrency or special load-balancing code - just increase (or decrease) instance count.
For more info about Windows Azure queues, check out the Azure Platform Training Kit - there are several simple labs that walk you through queue basics.
The point you're missing, to my mind, is that one of the important properties of a queue is that orders are saved: whatever happens, once an order is in the queue it won't be lost.
Poller processes can die or run into all sorts of other problems, and you don't care: the queue is the place where the orders are safe.
Pollers don't require the same level of robustness. Postfix, for example, is a very secure mail transport implementation that uses message queues at many levels (each subsystem of the application that requires a different security level communicates with the others through queues). You can switch off the power and you will not lose any mail: workers can die very badly, but mail can't.
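One way to get "workers can die, orders can't" with a table-backed queue is to delete the order in the same transaction as the processing, so a crash rolls the delete back. A minimal single-worker sketch using SQLite; the table `orders_queue` and the function `process_next` are hypothetical. It does not by itself stop two workers from reading the same row, and it holds the transaction open for the whole time the handler runs, which is the usual trade-off of this approach.

```python
import sqlite3


def process_next(conn, handler):
    """Remove the oldest order only if `handler` completes without raising."""
    with conn:  # commits on success, rolls back if the body raises
        row = conn.execute(
            "SELECT id, payload FROM orders_queue ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return False
        conn.execute("DELETE FROM orders_queue WHERE id = ?", (row[0],))
        handler(row[1])  # a crash here undoes the DELETE
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_queue (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO orders_queue (payload) VALUES ('order-1')")
conn.commit()


def crashing_handler(payload):
    raise RuntimeError("worker died")


try:
    process_next(conn, crashing_handler)
except RuntimeError:
    pass
remaining = conn.execute("SELECT payload FROM orders_queue").fetchall()
# remaining == [('order-1',)]: the order survived the crash

processed = []
process_next(conn, processed.append)  # a healthy worker picks it up
```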
Edit
That means the basic usage is storing an order and ignoring what the workers will do with it, how many workers are still alive, etc. So the only reason to have several queues is to manage several destinations for your orders (application logic), not to manage how the workers should work with them (decoupling).
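A small sketch of that distinction, assuming each order carries a `destination` field (a hypothetical name): the queue an order lands in is chosen from the order itself, never from which worker should handle it, and any number of workers can then compete for each queue.

```python
from collections import defaultdict, deque


class DestinationRouter:
    """Hypothetical router: one queue per destination, chosen by the order itself.

    Picking the queue is application logic (where the order goes), not a way
    of assigning work to a particular worker.
    """

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, order):
        self.queues[order["destination"]].append(order)


router = DestinationRouter()
router.publish({"id": 1, "destination": "billing"})
router.publish({"id": 2, "destination": "shipping"})
router.publish({"id": 3, "destination": "billing"})
```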