JVM Tenured/Old gen has reached its limit and the server hangs

Posted 2024-11-06 07:28:41

Our application requires a very large amount of memory because it deals with very large data sets, so we increased the max heap size to 12GB (-Xmx).

Following are the environment details:

OS - Linux 2.6.18-164.11.1.el5    
JBoss - 5.0.0.GA
VM Version - 16.0-b13 Sun JVM
JDK - 1.6.0_18

We have the above environment and configuration in both QA and prod.
In QA the max PS Old Gen (heap) size is 8.67GB, whereas in prod it is only 8GB.

In prod, for a particular job, the Old Gen heap reaches 8GB, stays there, and the web URL becomes inaccessible; the server effectively goes down.
In QA it also climbs, up to 8.67GB, but a full GC is performed and it comes back down to around 6.5GB. There it does not hang.

We couldn't figure out a solution for this, because the environment and configuration on both boxes are the same.

I have three questions here:

  1. Supposedly 2/3 of the max heap gets allocated to the old/tenured generation. If that is the case, why is it 8GB in one place and 8.67GB in the other? (A rough arithmetic check follows this list.)
  2. How do I choose a sensible ratio between the new and tenured generations for this 12GB heap?
  3. Why is a full GC performed in one place and not in the other?
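For reference, here is the rough arithmetic behind question 1 (assuming the commonly quoted -XX:NewRatio=2 default, i.e. old:new = 2:1, which may not be what either JVM actually chose):

    # 12GB heap split under the assumed NewRatio=2 default (old:new = 2:1)
    #   old gen = 12GB * 2/3 = 8GB      <- matches the prod figure
    #   new gen = 12GB * 1/3 = 4GB
    # An 8.67GB old gen would leave 12 - 8.67 = 3.33GB for the new gen,
    # i.e. old:new ~ 2.6:1, so the two JVMs are not using the same effective split.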

Any help would be really appreciated. Thanks.

Please let me know if you need further details on the environment or configuration.


Comments (2)

爱本泡沫多脆弱 2024-11-13 07:28:41

For your specific questions:

  1. The default ratio between the new and old generations can depend on the system and on what the JVM determines is best.
  2. Use -XX:NewRatio=3 to specify a particular ratio between the new and old generations (see the sketch after this list).
  3. If your JVM is hanging and the heap is full, it is probably stuck doing constant GCs.
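As a rough illustration of what -XX:NewRatio does to a 12GB heap (a minimal sketch that ignores survivor-space and perm-gen details; the "..." stands for the rest of your command line):

    # -XX:NewRatio=N sets old:new = N:1, so for -Xmx12g:
    java -Xmx12g -XX:NewRatio=2 ...   # old gen ~ 8GB,   new gen ~ 4GB
    java -Xmx12g -XX:NewRatio=3 ...   # old gen ~ 9GB,   new gen ~ 3GB
    java -Xmx12g -XX:NewRatio=4 ...   # old gen ~ 9.6GB, new gen ~ 2.4GB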

It sounds like you need more memory for prod. If the job finishes on QA, then perhaps that extra 0.67GB is all it needs, though that doesn't seem to leave you much headroom. Are you running the same test on QA as will run on prod?

Since you're using 12GB you must be on 64-bit. You can reduce the memory overhead of 64-bit addressing with the -XX:+UseCompressedOops option. It typically saves around 40% of memory, so your 12GB will go a lot further.
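One way to confirm compressed oops are actually in effect; note that -XX:+PrintFlagsFinal only exists on later JDK 6 updates (roughly 6u21+), so this check may not work on 6u18 and is an assumption here:

    # Compressed oops apply here because 12GB is well below the ~32GB limit
    java -Xmx12g -XX:+UseCompressedOops -XX:+PrintFlagsFinal -version | grep UseCompressedOops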

Depending on what you're doing, the concurrent collector might be better as well, particularly for reducing long GC pause times. I'd recommend trying these options, as I've found them to work well:

-Xmx12g -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68
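If you go this route, one common place to set these for JBoss AS 5 on Linux is the JAVA_OPTS variable in bin/run.conf (the exact file, and whether you append to or replace the existing options, depends on your setup and is an assumption here):

    # bin/run.conf -- picked up by run.sh at startup
    JAVA_OPTS="-Xmx12g -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseCompressedOops \
               -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC \
               -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled \
               -XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68"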
双马尾 2024-11-13 07:28:41

You need to get some more data in order to know what is going on; only then will you know what needs to be fixed. To my mind that means:

  1. Get detailed information about what the garbage collector is doing; these params are a good start (substitute your preferred path and file in place of gc.log):

    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log -verbose:gc

  2. Repeat the run, scan through the GC log for the period when it is hanging, and post back with that output.

  3. Consider watching the output with visualgc (requires jstatd running on the server; plenty of write-ups explain that setup), which is part of jvmstat. It is a very easy way to see how the various generations in the heap are sized (though perhaps not for 6 hours!). A minimal setup sketch follows this list.
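A minimal sketch of that setup, based on the standard jstatd documentation (the policy file contents and the <pid>/<server-host> placeholders are assumptions; adjust to your environment):

    # 1. Scan the GC log for full collections and long stop-the-world pauses
    grep "Full GC" gc.log
    grep "Total time for which application threads were stopped" gc.log

    # 2. Start jstatd on the server so visualgc can attach remotely.
    #    jstatd needs a security policy file, e.g. jstatd.policy containing:
    #      grant codebase "file:${java.home}/../lib/tools.jar" {
    #          permission java.security.AllPermission;
    #      };
    jstatd -J-Djava.security.policy=jstatd.policy &

    # 3. From your workstation, point visualgc at the JBoss JVM
    visualgc <pid>@<server-host>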

I also strongly recommend doing some reading so you know what all these switches refer to; otherwise you'll be blindly trying things with no real understanding of why one thing helps and another doesn't. I'd start with the Oracle Java 6 GC tuning guide.

I'd only suggest changing options once you have baselined performance. Having said that, CompressedOops is very likely to be an easy win; you may want to note it has defaulted to on since 6u23.

Finally, you should consider upgrading the JVM; 6u18 is getting on a bit and performance keeps improving.

"each job will take 3 hours to complete and almost 6 jobs running one after another. Last job when running reaches 8GB max and getting hang in prod"

Are these jobs related at all? This really sounds like a gradual memory leak if they're not working on the same data set. If heap usage keeps going up and up and eventually blows up, then you have a memory leak. You should consider using -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/some/dir to catch a heap dump if/when it blows (though note that with a heap this size it will be a big file, so make sure you have the disk space). You can then use jhat to look at what was on the heap at the time.
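A minimal sketch of that flow (the dump directory is the placeholder path from above, the .hprof name is just the default java_pid<pid>.hprof pattern, and jhat itself needs a heap roughly the size of the dump):

    # Production JVM: write a heap dump on OutOfMemoryError
    java ... -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/some/dir ...

    # Later, analyse the dump; jhat serves its report on http://localhost:7000 by default
    jhat -J-mx16g /path/to/some/dir/java_pid12345.hprof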
