Queue storage file system full in WebSphere MQ
We came across a scenario where disk space was occupied by empty queues in a Linux environment.
Our queue manager ended unexpectedly because the file system became full, and we had to empty the q file to bring the queue manager back.
But we actually have no messages at all in the queue. This happens for one particular queue.
Why is the disk space still held here? What is the root cause?
WMQ does not shrink the queue files in real time. For example, you have 100 messages on a queue and you consume the first one. WMQ does not then shrink the file and move all the messages up by one position. If it tried to do that for each message, you'd never be able to get the throughput that you currently see in the product.
What does occur is that WMQ will shrink the queue files at certain points in the processing lifecycle. There is some latency between a queue becoming empty and the file under it shrinking, but this latency is normally so small as to be unnoticeable.
The event you are describing could in theory occur under some very specific conditions, however it would be extremely rare. In fact, in the 15 years I've been working with WMQ I've only ever seen a couple of instances where the latency in shrinking a queue file was even noticeable. I would guess that what is actually going on here is that one of your assumptions or observations is faulty. For example:
Was the queue actually empty?
Was it actually the queue file that filled up the file system?
Was it even MQ?
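The three checks above can be run directly on the MQ host. A sketch, assuming a queue manager named QM1, a queue named APP.QUEUE, and the default /var/mqm data directory (all placeholders; substitute your own names). These commands need a running queue manager, so they are shown as a worksheet rather than a script:

```shell
# 1. Is the queue actually empty? Ask the queue manager for its depth.
echo "DISPLAY QLOCAL(APP.QUEUE) CURDEPTH" | runmqsc QM1

# 2. Is it actually the queue file? Each queue's data lives in a file
#    named 'q' under the queue manager's data directory. Note the
#    directory-name mangling: dots in the queue name become '!'.
ls -lh /var/mqm/qmgrs/QM1/queues/APP!QUEUE/q

# 3. Is it even MQ? Compare overall usage against what MQ itself holds
#    (logs and error dumps under the data directory are common culprits).
df -h /var/mqm
du -sh /var/mqm/qmgrs/QM1/* | sort -h
```

If the `q` file is large while CURDEPTH is zero, you have the rare shrink-latency case the answer describes; more often, `du` points at transaction logs or something outside MQ entirely.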
One of the issues that we see very frequently is that an application will try to put more than 5,000 messages on a queue and receive a QFULL error. The very first thing most people then do is set MAXDEPTH(999999999) to make sure this NEVER happens again. The problem with this is that QFULL is a soft error from which an application can recover but filling up the filesystem is a hard error which can bring down the entire QMgr. Setting MAXDEPTH(999999999) trades a manageable soft error for a fatal error. It is the responsibility of the MQ administrator to make sure that MAXDEPTH and MAXMSGL on the queues are set such that the underlying filesystem does not fill. In most shops additional monitoring is in place on all the filesystems to raise alerts well before they fill.
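One way to pick a defensible cap, instead of reaching for MAXDEPTH(999999999), is to size MAXDEPTH from the filesystem and the queue's MAXMSGL. A minimal sketch with assumed numbers (20 GB filesystem, 4 MB messages, a half-filesystem budget, and the hypothetical names QM1/APP.QUEUE):

```shell
# Cap MAXDEPTH so one full queue of worst-case messages cannot take
# more than a fixed share of the filesystem.
fs_bytes=$((20 * 1024 * 1024 * 1024))   # filesystem size (assumed: 20 GB)
maxmsgl=$((4 * 1024 * 1024))            # MAXMSGL: 4 MB worst-case message
budget=$((fs_bytes / 2))                # let this queue use at most half
maxdepth=$((budget / maxmsgl))
echo "MAXDEPTH=$maxdepth"               # prints MAXDEPTH=2560

# Apply it (requires a running queue manager; names are examples):
# echo "ALTER QLOCAL(APP.QUEUE) MAXDEPTH($maxdepth) MAXMSGL($maxmsgl)" | runmqsc QM1
```

In practice the budget must be split across every queue sharing the filesystem, so the real per-queue share is smaller; the point is that the cap is derived from disk capacity, not picked to make QFULL go away.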
So to sum up, WMQ does a very good job of shrinking queue files in most cases. In particular, when a queue empties this is a natural point of synchronization at which the file can be shrunk, and this usually occurs within seconds of the queue emptying. You have either hit a rare race condition in which the file was not shrunk fast enough, or there is something else going on here that is not readily apparent in your initial analysis. In any case, manage MAXDEPTH and MAXMSGL so that no queue can fill up the filesystem, and write code to handle QFULL conditions.
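The filesystem monitoring mentioned above can be as simple as a periodic disk check. A minimal sketch; the 90% threshold is arbitrary, the alert is just an echo (swap in your paging mechanism), and on a real MQ host you would point it at /var/mqm:

```shell
# Warn while QFULL is still a recoverable soft error, well before the
# filesystem-full hard error that can bring down the whole QMgr.
check_usage() {
    mount=$1
    threshold=$2
    # POSIX df -P: field 5 of the second line is "Use%"; strip the %.
    pct=$(df -P "$mount" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$pct" -ge "$threshold" ]; then
        echo "ALERT: $mount at ${pct}%, threshold ${threshold}%"
        return 1
    fi
    echo "OK: $mount at ${pct}%, threshold ${threshold}%"
}

check_usage / 90    # on an MQ host: check_usage /var/mqm 90
```

Run it from cron every few minutes; the nonzero return code makes it easy to chain into whatever alerting the shop already has.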