How to debug a JBoss or PostgreSQL out-of-memory problem?

Posted 2024-10-10 01:14:59

I am trying to debug a JBoss out-of-memory problem. When JBoss starts up and runs for a while, it uses memory as intended by the startup configuration. However, when some unknown user action is taken in the sole web application JBoss is serving (or when the log file grows to a certain size), memory increases dramatically and JBoss freezes. Once JBoss freezes, it is difficult to kill the process or do anything else because of the low memory.

When the process is finally killed with kill -9 and the server is restarted, the log file is very small and only contains output from the startup of the newly started process, with no information on why memory increased so much. This is why it is so hard to debug: server.log has no information from the killed process. The log is allowed to grow to 2 GB, and the log file for the new process is only about 300 KB, though it grows properly under normal memory conditions.

This is information on the JBoss configuration:
JBoss (MX MicroKernel) 4.0.3
JDK 1.6.0 update 22
PermSize=512m
MaxPermSize=512m
Xms=1024m
Xmx=6144m
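
For reference, these settings are passed as JVM arguments along the following lines (a sketch of the JAVA_OPTS line of the kind JBoss 4.x normally reads from bin/run.conf; the exact file and wrapper script may differ per installation):

    # The heap and permgen settings above, expressed as JVM arguments
    # (JBoss 4.x typically appends these to JAVA_OPTS in bin/run.conf)
    JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx6144m -XX:PermSize=512m -XX:MaxPermSize=512m"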

This is basic info on the system:
Operating system: CentOS Linux 5.5
Kernel and CPU: Linux 2.6.18-194.26.1.el5 on x86_64
Processor information: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz, 8 cores

This is representative system information under normal, pre-freeze conditions a few minutes after the JBoss service starts:
Running processes: 183
CPU load averages: 0.16 (1 min) 0.06 (5 mins) 0.09 (15 mins)
CPU usage: 0% user, 0% kernel, 1% IO, 99% idle
Real memory: 17.38 GB total, 2.46 GB used
Virtual memory: 19.59 GB total, 0 bytes used
Local disk space: 113.37 GB total, 11.89 GB used

When JBoss freezes, system information looks like this:
Running processes: 225
CPU load averages: 4.66 (1 min) 1.84 (5 mins) 0.93 (15 mins)
CPU usage: 0% user, 12% kernel, 73% IO, 15% idle
Real memory: 17.38 GB total, 17.18 GB used
Virtual memory: 19.59 GB total, 706.29 MB used
Local disk space: 113.37 GB total, 11.89 GB used

===========================================================

UPDATE TO THIS QUESTION IS ADDED BELOW

Thank you very much for your comments. We are posting an update to this question that will likely be helpful.

On 3 more occurrences of the memory issue, the unix top utility indicated that the JBoss process is the one consuming all the memory. When the problem occurs, it happens very quickly. For example, after JBoss has been running fine for a while (e.g. several days), at some point users take certain mysterious actions, after which it takes roughly 1-3 minutes for memory consumption to ramp up to a level that causes major performance degradation, and another 5-10 minutes for that degradation to become severe (e.g. it is difficult to run simple bash commands through ssh). Of course, this pattern varies a bit depending on what users are doing in the web application.

For example, when sorting by memory, on one occurrence the JBoss process is reported to have the following statistics (note that the real memory is 17.38 GB total and JBoss is only given a 6 GB heap):
VIRT (total virtual memory): 23.1g
RES (resident set size): 15g
%CPU: 111.3%
%MEM: 97.6%

In that same example, 9 minutes later the JBoss process is reported to have the following statistics:
VIRT (total virtual memory): 39.1g
RES (resident set size): 17g
%CPU: 415.6%
%MEM: 98.4%

After killing the JBoss process with a SIGKILL signal (-9), the new JBoss process is reported to have statistics similar to the following:
VIRT (total virtual memory): 7147m
RES (resident set size): 1.3g
%CPU: 11.6%
%MEM: 7.3%

Now that we know it is the JBoss process that is consuming all the memory, we want to figure out where it is going. We have tried jmap with a command such as jmap -dump:file=/home/dump.txt 16054, however this seems to make the server much less responsive, and after some time nothing seems to happen (e.g. the prompt does not return). Our guess is that, because so little memory is available and the heap is so large, something hangs.
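
For reference, the documented form of the dump command we were attempting looks like the following (PID and path are from the example above; the -F variant is sometimes suggested for an unresponsive JVM, though we have not verified that it helps here):

    # standard binary heap dump (produces an .hprof file rather than text)
    jmap -dump:format=b,file=/home/dump.hprof 16054
    # forced mode, sometimes suggested when the target JVM does not respond
    # (attaches via the serviceability agent and is much slower)
    jmap -F -dump:format=b,file=/home/dump.hprof 16054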

Also, we set the JVM options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps when starting the JVM but nothing seems to be written to the path when the memory problem occurs.
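
One check worth noting (a small sketch, using the PID from the example above) is whether the running JVM actually received those flags, since a process started before the options were added to the startup script would never write a dump:

    # confirm the flags actually made it onto the running process
    ps -fww -p 16054 | grep -o 'HeapDump[^ ]*'
    # or query the live JVM directly (JDK 6 jinfo)
    jinfo -flag HeapDumpOnOutOfMemoryError 16054
    jinfo -flag HeapDumpPath 16054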

These other options have been suggested:
[1] use pmap to produce a listing of the process address space and look for large chunks (particularly large chunks that have the name [anon])
[2] send SIGQUIT (kill -QUIT) to the process several times in succession and look for common stack traces
[3] use jstack to get a thread dump with a command such as jstack <pid> > tdump.out (rough command forms for [1]-[3] are sketched after this list)
[4] use the JBoss management tools/console included with JBoss to see what kinds of objects are left hanging around as it starts to eat up memory
[5] explore Nagios as another monitoring solution
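
These are the command forms we understand [1]-[3] to mean (the PID is the one from the example above; output paths are placeholders):

    # [1] address-space listing, largest mappings last ([ anon ] segments are of interest)
    pmap 16054 | sort -k2 -n | tail -20
    # [2] thread dump to the JVM's console output; repeat a few times and compare
    kill -QUIT 16054
    # [3] thread dump captured to a file
    jstack 16054 > tdump.out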

Here are some follow-up questions:
* From the above top report information, are there any new insights or thoughts on the problem?
* For the above options 1-5, which are the most likely to work under the extremely low memory circumstances that the problem creates?
* For the above options 1-5, which are the most likely to work under the very short time frame that the problem allows for diagnosis (ex. 1-3 minutes)?
* Is there a way to automatically write a timestamp to a text file when the memory use of a specific process reaches certain percentage thresholds, so this timestamp can be used when looking through the JBoss log files?
* Is there a way to automatically send an email with a timestamp when the memory use of a specific process reaches certain percentage thresholds, so we can begin more focused monitoring? (A rough sketch of what we mean follows.)
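
This is roughly the kind of watcher we mean (a minimal sketch only; the process name, thresholds, log path, alert address, and use of the mail command are all assumptions, not something we have in place):

    #!/bin/bash
    # Poll the largest java process once a minute; log (and mail) when %MEM crosses thresholds.
    # A real version would remember which thresholds have already fired to avoid repeat alerts.
    LOG=/var/log/jboss-mem-watch.log                  # assumed log path
    while true; do
        PCT=$(ps -C java -o pmem= | sort -n | tail -1 | cut -d. -f1 | tr -d ' ')
        for T in 50 75 90; do                         # example thresholds
            if [ "${PCT:-0}" -ge "$T" ]; then
                STAMP=$(date '+%Y-%m-%d %H:%M:%S')
                echo "$STAMP java at ${PCT}% of memory (threshold ${T}%)" >> "$LOG"
                echo "$STAMP java at ${PCT}% of memory" | mail -s "JBoss memory alert (${T}%)" admin@example.com
            fi
        done
        sleep 60
    done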

Comments (3)

踏雪无痕 2024-10-17 01:14:59

I've worked through these types of problems before with this basic process:

  1. set the JVM options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps when starting the JVM.
  2. run the application, wait for failure (or cause it if you can), and collect the dump (.hprof file; see the note after this list)
  3. View the dump in Eclipse Memory Analyzer (MAT), which has a nice "Leak Suspects Report"
  4. The report will hopefully say something like "82,302 instances of class XYZ are occupying 74% of heap space". You can then inspect some of those objects if you need more info.
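
As a small note on steps 2-3 (a sketch; the dump path is the one from step 1, and the file name is the HotSpot default when HeapDumpPath points at a directory):

    # when an OutOfMemoryError fires, HotSpot writes a file such as
    #   /path/to/dumps/java_pid<pid>.hprof
    ls -lh /path/to/dumps/*.hprof    # check whether a dump was actually produced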

Hopefully that would be enough to at least point you in the right direction to find your leak.

Happy debugging!

爱你是孤单的心事 2024-10-17 01:14:59

This is not enough information for a diagnosis.

But let's start with what we have. I don't know what you're using to show memory statistics, but it shows that your overall system memory consumption has jumped 15 GB. Which is strange, considering you've only given JBoss a 6 GB heap.

So the first thing to do is verify that JBoss is the actual problem. Easiest way to do this is with top, sorting either by total virtual memory (VIRT) or resident set size (RES). To change the sort field, type a capital "F" and then select the field in the screen that follows.

If it is the JBoss process that's consuming all that memory, then you need to figure out where it's going. Possibilities include large memory-mapped JAR files, off-heap buffers allocated via Java, and memory allocated by a native module. Since you'll have the process ID from top, use pmap to produce a listing of the process address space and look for large chunks (particularly large chunks that have the name [anon]).

If it's not clear where the memory is being allocated, you can always send SIGQUIT (kill -QUIT) to the process, which will write a thread dump to stderr (which will either go to the console or -- hopefully -- to a logfile). Do this several times in succession, and look for common stack traces.
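
If you prefer something non-interactive over sorting inside top, a quick sketch (assuming the GNU ps shipped with CentOS 5):

    # processes sorted by resident memory, largest first
    ps aux --sort=-rss | head -5
    # or just the java process(es), with PID, resident size (KB), %MEM and virtual size
    ps -C java -o pid,rss,pmem,vsz,args --sort=-rss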


Based on your updates, which show the virtual size growing for the JBoss process, I think that examining the Java heap is a waste of time. While I suppose it's possible that the JVM is ignoring the -Xmx option, it's extremely unlikely.

So that means the growth is happening in non-heap memory. Some possibilities:

  • Use of direct ByteBuffers. If you're using buffers to cache results from the database, then it's very possible that you're allocating too many buffers. This would be diagnosed via pmap, looking for large [anon] blocks.
  • Uncontrolled thread creation. Each thread requires some amount of space for its thread stack. I wouldn't expect this to be the problem, because the amount of per-thread space is tiny (iirc, under 1 MB); you'd have to be creating tens of thousands of them. You can diagnose this with pmap, looking for small [anon] blocks, or by sending SIGQUIT to the JVM.
  • Native code that's allocating lots of memory on the C heap. You can probably diagnose with pmap, but a first step is to check your dependencies to see if there's a native library. And if there is, debug with gdb or equivalent. (A couple of command-level checks for these cases are sketched after this list.)
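
For instance (a sketch; the PID is a placeholder):

    # total size of anonymous mappings, in KB (pmap prints sizes like "65536K")
    pmap <pid> | grep anon | awk '{ sub(/K/, "", $2); total += $2 } END { print total " KB anon" }'
    # thread count (each thread stack shows up as a small anon mapping)
    ps -o nlwp= -p <pid>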

As a final comment: rather than ask what is likely to work under low-memory conditions, I recommend just trying the options and seeing what does and doesn't work.

佼人 2024-10-17 01:14:59

One solution is to connect to the JBoss server over a remote JMX connection with VisualVM (included in the latest JDKs).
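
As a sketch, the JVM system properties usually needed to expose a remote JMX port for VisualVM look like this (the port is a placeholder; this form disables authentication and SSL, so use it only on a trusted network, and JBoss may need additional configuration to expose its own MBeanServer):

    -Dcom.sun.management.jmxremote
    -Dcom.sun.management.jmxremote.port=9010
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false

In VisualVM you would then add the remote host and a JMX connection to host:9010 and watch heap, PermGen, and thread counts live.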
