Message Queue vs. Database Table Queue via CRON
We have a large project coming up soon with quite a lot of media processing (images, video) as well as email output, etc. This is the sort of work we would normally put into a table called "email_queue", with a cron job running a script to process the queue in the table.
I have been reading a lot about message queue systems like beanstalkd, and have even set it up. It was easy and nice to use; the problem is that I am unsure whether I am missing something.
Could someone detail the benefits of using a queue system rather than a table and a cron job? I really can't seem to see what they are.
Thanks
Comments (4)
Differences:
Once a message is put on the queue, it can be delivered immediately. So if your cron job normally runs every 5 minutes, you could process messages sooner with a queue.
If your queueing system supports transactions, it will automatically redeliver a message if processing fails.
It can be harder to query what is in your queue. A database table has a nice way to search (SQL).
If you have multiple servers/processes/threads handling messages, the queue system will make sure a message is delivered to only one of them. With a DB table you need to handle this in application code (locking, flags, etc.).
A message queue (a distributed one at least, e.g. RabbitMQ) gives you the ability to distribute work across physical nodes. You still need to have a process on each node to dequeue work and process it.
It ultimately comes down to your requirements, I guess. You can achieve a more manageable solution at scale by using message queues: you can decouple your nodes more easily.
Of course, there is a learning curve, so it again comes back to your goals.
Note that on each node you can still reuse your cron/db table until (and if) you wish to change the implementation. That's what's great about decoupling when you can.
First, queues are often backed by actual DB tables and can maintain message durability. That aside, the queue is a natural way to shove off work that needs to be done asynchronously, which is very powerful if you design on that principle from the start.
Other than the fact that a table (entity) has a fixed set of columns (attributes), a table composed of records and a queue are both nothing more than lists of stuff. You are already employing the table as a formal queue; you are just polling it on a regular (cron) basis.
MQs add another nifty feature, though: they generally synchronize access to the message itself (you may or may not be doing this in your SQL when fetching the next item).
I like to consider the cron/table mechanism as POLL-based and the MQ as EVENT-based.
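To make the POLL vs. EVENT distinction concrete, here is a small in-process sketch that uses Python's standard queue.Queue as a stand-in for a real broker. The job names and function names are invented for the example. The consumer does not wake up on a timer; it blocks in get() and resumes as soon as a producer puts something on the queue.

```python
import queue
import threading


def consume(jobs, handled):
    """Handle jobs as they arrive; stop when the None sentinel is received."""
    while True:
        job = jobs.get()  # blocks here until a producer calls put()
        if job is None:
            break
        handled.append(job)


def run_demo():
    """Start one consumer, publish two hypothetical jobs, return what it handled."""
    jobs = queue.Queue()
    handled = []
    worker = threading.Thread(target=consume, args=(jobs, handled))
    worker.start()
    for name in ("resize-image", "send-email"):
        jobs.put(name)  # the waiting consumer is woken immediately
    jobs.put(None)  # sentinel: tell the consumer to exit
    worker.join()
    return handled
```

With the cron/table approach, the equivalent of get() is a SELECT that runs once per cron tick, so a job added just after a tick waits for the whole interval.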
The benefit of a queue, in my opinion, is that it takes care of the syncing and status updating. MQs can be set up to "broadcast" (topic) or to make the message available to a group of consumers or listeners.
An MQ, though asynchronous, would likely operate between your cron windows. How do you know that the messages in your table can all be processed before the next cron job runs and tries to step on the previous job?
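If you stay with cron, one common way to keep a run from stepping on the previous one is an exclusive lock file. The sketch below assumes a Unix-like system (it uses the standard fcntl module); the function name and lock path are made up for the example.

```python
import fcntl


def run_exclusively(lock_path, job):
    """Run job() only if no other run holds the lock.

    Returns True if the job ran, False if a previous run is still active.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous run still in progress; skip this tick
        try:
            job()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The cron script would wrap its queue-processing function in run_exclusively, so a slow run makes the next tick skip rather than process the same rows twice.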
Multiple consumers for the MQ allow you to scale the work as you see fit. In the example above, if you saw that your load average (just the same as in the OS's process queue) is greater than you like, you can provision another consumer to handle said load, bringing it online and offline as metrics demand.
MQs can be set up to have different operational parameters such as message priority and performance (some queues can remain in memory, others persist to disk).
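As a rough in-process illustration of scaling consumers and of message priority, the sketch below uses Python's standard queue.PriorityQueue with a configurable number of worker threads. It only models the idea; a real broker would run consumers as separate processes or machines, and the job names are hypothetical.

```python
import queue
import threading


def process_all(jobs, consumer_count):
    """Drain (priority, name) jobs using consumer_count worker threads.

    Lower priority numbers are taken first. Returns the names in the
    order they were handled.
    """
    pending = queue.PriorityQueue()
    for job in jobs:
        pending.put(job)

    handled = []
    handled_lock = threading.Lock()

    def consumer():
        while True:
            try:
                _priority, name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left; this consumer goes offline
            with handled_lock:
                handled.append(name)

    workers = [threading.Thread(target=consumer) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return handled
```

Raising consumer_count is the in-process analogue of provisioning another consumer. With a single consumer the jobs come out strictly in priority order; with several, each job is still handled exactly once but the interleaving is up to the scheduler.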
The downside is that (as already mentioned) the queue can sometimes be hard to query and to obtain metrics from. I always look for MQ systems that have a DB backing store so that I can watch the queue myself with SQL.
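For comparison, this is the kind of monitoring query a table-backed queue makes easy. It is a sketch only: it assumes a table named email_queue (from the question) with a hypothetical status column, and uses Python's built-in sqlite3 module.

```python
import sqlite3


def queue_depth_by_status(conn):
    """Return a {status: row_count} dict for the assumed email_queue table."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM email_queue"
        " GROUP BY status ORDER BY status"
    ).fetchall()
    return dict(rows)
```

Getting the same numbers from a broker usually means using whatever stats interface it exposes, which varies from system to system.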
This gets asked fairly frequently, and there's usually not a compelling reason to go MQ if you're comfortable with databases. Here's one example thread.
My take is that you might want to avoid the learning curve unless your data requirements include exceptionally high volumes, which is unlikely if you're thinking cron rather than a process with a timer (much less multiple processes with timers).