How do I set up timeout detection on a RabbitMQ server?

Posted on 2024-08-03 03:27:08

I am trying out RabbitMQ with this Python binding.

One thing I noticed is that if I kill a consumer uncleanly (emulating a crashed program), the server will think that this consumer is still there for a long time. The result of this is that every other message will be ignored.

For example, if you kill a consumer once and reconnect, then 1/2 of the messages will be ignored. If you kill another consumer, then 2/3 of the messages will be ignored. If you kill a third, then 3/4 of the messages will be ignored, and so on.

I've tried turning on acknowledgments, but it doesn't seem to be helping. The only solution I have found is to manually stop the server and reset it.

Is there a better way?

How to recreate this scenario

  • Run rabbitmq.

  • Unarchive this library.

  • Download the consumer and publisher here.
    Run amqp_consumer.py twice. Run amqp_publisher.py, feed in some data, and observe that it works as expected. Messages are received round-robin style.

  • Kill one of the consumer processes with kill -9 or task manager.

  • Now when you publish a message, 50% of the messages will be lost.
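The fractions described above (1/2, 2/3, 3/4) are what you would expect if the broker keeps dispatching round robin to consumers it still believes are connected. Here is a minimal stdlib-only sketch of that arithmetic. It is a toy model, not RabbitMQ code, and the function name `lost_fraction` is made up for illustration:

```python
from itertools import cycle


def lost_fraction(live, dead, messages=1200):
    """Dispatch messages round robin over every consumer the broker still
    believes in, and return the fraction handed to dead consumers."""
    consumers = ["live"] * live + ["dead"] * dead
    targets = cycle(consumers)
    lost = sum(1 for _ in range(messages) if next(targets) == "dead")
    return lost / messages


# One live consumer plus 1, 2, then 3 consumers that were killed uncleanly.
for dead in (1, 2, 3):
    print(dead, "dead consumer(s):", lost_fraction(live=1, dead=dead))
```

With one live consumer, the share of messages going nowhere grows as dead / (dead + 1), which matches the symptom in the question.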

Comments (3)

枉心 2024-08-10 03:27:08

I don't see amqp_consumer.py or amqp_producer.py in the tarball, so reproducing the fault is tricky.

RabbitMQ terminates connections, releasing their unacknowledged messages for redelivery to other clients, whenever it is told by the operating system that a socket has closed. Your symptoms are very strange, in that even a kill -9 ought to cause the TCP socket to be cleaned up properly.

Some people have noticed problems with sockets surviving longer than they should when running with a firewall or NAT device between the AMQP clients and the server. Could that be an issue here, or are you running everything on localhost? Also, what operating system are you running the various components of the system on?

ETA: From your comment below, I am guessing that while you are running the server on Linux, you may be running the clients on Windows. If this is the case, then it could be that the Windows TCP driver is not closing the sockets correctly, which is different from the kill -9 behaviour on Unix. (On Unix, the kernel will properly close the TCP connections of any killed process.)

If that's the case, then the bad news is that RabbitMQ can only release resources when the socket is closed, so if the client operating system doesn't do that, there's nothing it can do. This is the same as almost every other TCP-based service out there.

The good news, though, is that AMQP supports a "heartbeat" option for exactly these cases, where the networking fabric is untrustworthy. You could try enabling heartbeats. When they're enabled, if the server doesn't receive any traffic within a configurable interval, it decides that the connection must be dead.

The bad news, however, is that I don't think py-amqplib supports heartbeats at the moment. Worth a try, though!
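To make the heartbeat idea concrete, here is a minimal sketch of the timeout rule: remember when traffic was last seen and declare the connection dead once the silence exceeds the configured interval. This is an illustration only, not how RabbitMQ or any client library implements it. The class name `HeartbeatMonitor` and the "two missed intervals" threshold are assumptions of mine:

```python
import time


class HeartbeatMonitor:
    """Tracks the last time traffic was seen on one connection."""

    def __init__(self, interval, missed_allowed=2, clock=time.monotonic):
        self.interval = interval              # seconds between heartbeats
        self.missed_allowed = missed_allowed  # assumed tolerance, not from AMQP docs
        self._clock = clock
        self._last_seen = clock()

    def on_traffic(self):
        """Call for every frame received, heartbeat or otherwise."""
        self._last_seen = self._clock()

    def is_dead(self):
        """True once the peer has been silent for too long."""
        silence = self._clock() - self._last_seen
        return silence > self.interval * self.missed_allowed


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
monitor = HeartbeatMonitor(interval=5, clock=lambda: now[0])
now[0] = 9.0
print(monitor.is_dead())   # still within 2 * 5 seconds of silence
now[0] = 11.0
print(monitor.is_dead())   # silent for more than 10 seconds
```

The point is that the server no longer depends on the client's operating system closing the socket; silence alone is enough to give up on the connection.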

鸩远一方 2024-08-10 03:27:08

RabbitMQ doesn't have a timeout on acknowledgements from the client that a message has been processed: see this post (the whole thread might be of interest). Some salient points from the post:

The AMQP ack model for subscriptions
and "pull" are identical. In both
cases the message is kept on the
server but is unavailable to other
consumers until it either has been
ack'ed (and gets removed), nack'ed
(with basic.reject; though RabbitMQ
does not implement that) or the
channel/connection is closed (at which
point the message becomes available
to other consumers).

and (my emphases)

There is no timeout on waiting for
acks. Usually that is not a problem
since the common cases of a missing
ack - network or client failure -
will result in the connection getting
dropped (and thus trigger the
behaviour described above). Still,
a timeout could be useful to, say,
deal with alive but unresponsive
consumers. That has come up in
discussion before. Is there a specific
use case you have in mind that
requires such functionality?

The problem might well be occurring because in a client pull model, it's harder for the server to detect a broken connection (as opposed to an alive but unresponsive consumer), particularly as the server seems happy to wait forever for an ack.
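The ack model quoted above can be sketched as a small in-memory queue: a delivered message stays on the server, invisible to other consumers, until it is acked or its connection closes. This is a toy model to show the behaviour, not RabbitMQ's implementation. `ToyQueue` and its method names are hypothetical:

```python
from collections import deque


class ToyQueue:
    def __init__(self):
        self.ready = deque()   # messages available for delivery
        self.unacked = {}      # delivery tag -> (connection id, message)
        self._next_tag = 1

    def publish(self, message):
        self.ready.append(message)

    def deliver(self, connection_id):
        """Hand the next ready message to a consumer and hold it as unacked."""
        if not self.ready:
            return None
        message = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = (connection_id, message)
        return tag, message

    def ack(self, tag):
        """The consumer confirmed processing, so the message is removed."""
        self.unacked.pop(tag)

    def connection_closed(self, connection_id):
        """Requeue everything the closed connection had not acked yet."""
        for tag in sorted(self.unacked, reverse=True):
            owner, message = self.unacked[tag]
            if owner == connection_id:
                del self.unacked[tag]
                self.ready.appendleft(message)


queue = ToyQueue()
for body in ("a", "b", "c"):
    queue.publish(body)
tag_a, _ = queue.deliver("conn-1")
tag_b, _ = queue.deliver("conn-2")
queue.ack(tag_b)
# conn-1 dies without acking; nothing is requeued until the close is noticed.
queue.connection_closed("conn-1")
print(list(queue.ready))
```

Note that there is no timer anywhere in this model: if `connection_closed` is never called for a dead client, its message stays in `unacked` forever, which is the situation described in the question.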

Update: On Linux, you can attach signal handlers for SIGTERM and/or SIGINT and hopefully close down the connection in an orderly way from the client. Note that SIGKILL (which is what kill -9 sends) cannot be caught or handled, so this only helps with gentler forms of termination. On Windows, I believe closing from Task Manager invokes the Win32 TerminateProcess API, about which MSDN says:

If a process is terminated by
TerminateProcess, all threads of the
process are terminated immediately
with no chance to run additional code.
This means that the thread does not
execute code in termination handler
blocks. In addition, no attached DLLs
are notified that the process is
detaching.

This means it might be difficult to catch termination and close down in an orderly way.
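For the signals that can be caught, an orderly shutdown might look roughly like the sketch below. It uses only Python's standard `signal` module. `install_shutdown_handlers` and the `close_connection` callback are hypothetical names; in a real consumer the callback would close the AMQP connection:

```python
import signal
import sys


def install_shutdown_handlers(close_connection):
    """Close the connection cleanly on SIGTERM or SIGINT, then exit.

    SIGKILL is deliberately absent: the kernel does not let a process
    catch it, so kill -9 never reaches a handler like this one.
    """
    def handler(signum, frame):
        close_connection()
        sys.exit(0)

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handler)
    return handler


# Usage with a stand-in for the real "close the AMQP connection" call.
closed = []
handler = install_shutdown_handlers(lambda: closed.append(True))
```

This must run in the main thread, since Python only allows signal handlers to be installed there.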

It might be worth pursuing on the RabbitMQ list with your own use case for an ack timeout.

真心难拥有 2024-08-10 03:27:08

Please provide a few more specifics regarding the components you've declared. Usually (and independent of the client implementation) a queue with the properties

  • exclusive and
  • auto-delete

should get removed as soon as the connection between the declaring client and the broker breaks up. This won't help you with shared queues, though. Please detail a bit what exactly you are trying to model.
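As a rough illustration of the difference, here is a toy broker in which an exclusive queue is tied to the connection that declared it, while a shared queue is not. It models only the exclusive part (auto-delete, which depends on consumers going away, is left out), and `ToyBroker` and its methods are invented for this sketch:

```python
class ToyBroker:
    def __init__(self):
        # queue name -> owning connection id, or None for a shared queue
        self.queues = {}

    def declare_queue(self, name, connection_id, exclusive=False):
        self.queues[name] = connection_id if exclusive else None

    def connection_closed(self, connection_id):
        """Drop every exclusive queue owned by the closed connection."""
        owned = [name for name, owner in self.queues.items()
                 if owner == connection_id]
        for name in owned:
            del self.queues[name]


broker = ToyBroker()
broker.declare_queue("private", "conn-1", exclusive=True)
broker.declare_queue("shared", "conn-1")
broker.connection_closed("conn-1")
print(sorted(broker.queues))
```

Only the shared queue survives the disconnect, which is why this approach does not help when several consumers read from one queue.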
