NServiceBus MSMQ messages intermittently get stuck on the outgoing queue
We have a Pub/Sub system based on NServiceBus, in which messages intermittently get stuck on the publisher's outgoing queue indefinitely instead of being transmitted to the subscribers' input queues.
Points to note:
- When we restart the Publisher Service and Subscriber services, message flow resumes normally for a while.
- The problem seems to occur more often after a sustained idle period between messages.
- The publisher service resides on the LAN; the subscribers are on the other side of a firewall.
- Some messages get through! As mentioned, after service restarts things go fine for a while.
- Using QueueExplorer, I can see the messages on the Outgoing queue have a state of WAITING.
Annoyingly, our development environment does not exhibit this behaviour, but then again the publisher and subscribers all reside on the same LAN in that environment.
Comments (4)
MSMQ messages being stuck in an outgoing queue is purely an MSMQ issue.
Restarting the Publisher and Subscriber services should make no difference, as they are not directly involved in message delivery. If you can fix the problem by restarting ONLY the Pub/Sub services and NOT the Message Queuing service, then it looks like a resource or memory leak problem.
I imagine something like this happening:
Occasional messages get through when just enough kernel memory is temporarily freed up by one of the many services and device drivers that use it.
Item 4 of this blog post is the most likely culprit:
http://blogs.msdn.com/b/johnbreakwell/archive/2006/09/18/insufficient-resources-run-away-run-away.aspx
Cheers
John Breakwell
We had a similar scenario in production. It turned out that we had migrated one of our subscriber endpoints to a new physical host and forgot to unsubscribe before shutting down the old endpoint. Our publisher was trying to deliver messages to both the old and new endpoints but could only reach the new one. Eventually the publisher's outbound queue grew so large that it started affecting all outgoing messages.
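One way to spot a stale subscriber like this is to check which subscriber hosts the publisher can still reach on MSMQ's TCP port (1801). The following is only a rough sketch: the helper names are made up for illustration, the host names in the usage comment are hypothetical, and a successful TCP connect only shows that something is listening on the port, not that the queue itself is healthy.

```python
import socket

MSMQ_TCP_PORT = 1801  # standard MSMQ TCP port for message traffic


def can_reach(host, port=MSMQ_TCP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def unreachable_subscribers(hosts, port=MSMQ_TCP_PORT, timeout=3.0):
    """Return the subset of hosts that could not be reached on the port."""
    return [h for h in hosts if not can_reach(h, port, timeout)]


# Example usage (hypothetical host names, run from the publisher machine):
#   for host in unreachable_subscribers(["old-subscriber", "new-subscriber"]):
#       print(f"cannot reach {host}:{MSMQ_TCP_PORT}")
```

Any host that shows up as unreachable but still has an entry in the publisher's subscription storage is a candidate for the situation described above.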
I have run into this issue as well. I know it is not Item 4, as I don't send anything to it before it gets stuck in the outgoing queue. If I let both publisher and subscriber sit for about 10 minutes before sending a message, it never leaves the outgoing queue. If I send a message before that amount of time has passed, it flows fine. Also, if I restart the subscriber, the message will then flow. This is reproducible every time I let them sit idle for 10 minutes.
I think I found the answer here, at least this fixed the issue I was having:
http://support.microsoft.com/kb/2554746
Also, in my case it had nothing to do with restarting, so don't let that throw you off. I did see the symptoms in the netstat output, and messages would initially go through when the client was first started up.
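For anyone wanting to look at the same netstat output, here is a small sketch that filters `netstat -an` text down to the TCP rows involving the MSMQ port (1801). It only lists connections and their states; which states count as a symptom is for the KB article to say, not this snippet. The function name is made up for illustration, and the column layout is assumed to be the usual Windows one (Proto, Local Address, Foreign Address, State).

```python
def msmq_connections(netstat_output, port=1801):
    """Return (local, remote, state) tuples for TCP rows that involve the port."""
    suffix = f":{port}"
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: TCP  <local addr>  <remote addr>  <STATE> [PID]
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            local, remote, state = fields[1], fields[2], fields[3]
            if local.endswith(suffix) or remote.endswith(suffix):
                rows.append((local, remote, state))
    return rows
```

To run it against a live machine, the text can be captured with `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout` and passed in. Comparing the result right after startup with the result after the 10 minute idle period should show whether the connection to the subscriber is still there.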
Just to throw my 2p in:
We had an issue where the message queuing service had some kind of memory leak and would consume large amounts of memory, which it did not release.
This led to messages getting stuck for long periods of time - although they would eventually be delivered (sometimes after 3 days).
We have not bothered fixing this yet as it only happens when the service is under heavy load which does not happen often.