Java ConcurrentMarkSweep 垃圾收集器未清除所有垃圾

发布于 2024-09-26 12:11:58 字数 3669 浏览 2 评论 0原文

简短形式：CMS 垃圾收集器似乎无法收集不断增加的垃圾量；最终，我们的 JVM 被填满，应用程序变得无响应。通过外部工具（JConsole 或 jmap -histo:live）强制执行 GC 会将其清理一次。

更新：问题似乎与 JConsole 的 JTop 插件有关；如果我们不运行 JConsole，或者在没有 JTop 插件的情况下运行它，该行为就会消失。

（技术说明：我们在 Linux 2.6.9 机器上运行 Sun JDK 1.6.0_07，32 位。升级 JDK 版本并不是真正的选择，除非有不可避免的重大原因。而且，我们的系统不是连接到可访问 Internet 的计算机，因此无法选择 JConsole 的屏幕截图等。）

我们当前正在使用以下标志运行 JVM：

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC

观察 JConsole 中的内存图，每大约 15 次就会运行一次完整的 GC在我们的应用程序生命周期的最初几个小时内大约几分钟；每次完整GC之后，仍有越来越多的内存仍在使用中。几个小时后，系统达到稳定状态，旧 CMS 中大约有 2GB 的已用内存。

这听起来像是一个经典的内存泄漏，除了如果我们使用任何强制执行完整 GC 的工具（点击 JConsole 中的“收集垃圾”按钮，或运行 jmap -histo:live 等）， old gen 的使用量突然下降到约 500MB，并且我们的应用程序在接下来的几个小时内再次响应（在此期间，相同的模式继续 - 在每次完整 GC 之后，越来越多的 old gen 已满。）

值得注意的一件事：在 JConsole 中，报告的 ConcurrentMarkSweep GC 计数将保持为 0，直到我们使用 jconsole/jmap/etc 强制执行 GC。

依次使用 jmap -histo 和 jmap -histo:live，我能够确定明显未收集的对象包括：

数百万个 HashMap HashMap$Entry 和数组（比例为 1:1）
数百万个 Vector 和对象数组（比例为 1:1，大约与HashMap）
数百万个 HashSet、Hashtable 和 com.sun.jmx.remote.util.OrderClassLoader，以及 数组>Hashtable$Entry（每个数量大约相同；大约是 HashMap 和 Vector 的一半）

下面是 GC 输出的一些摘录；我对它们的解释似乎是 CMS GC 正在中止，而不会故障转移到 stop-the-world GC。我是否以某种方式误解了这个输出？有什么东西会导致这种情况吗？

在正常运行时，CMS GC 输出块看起来像这样：

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

就是这样；下一行将是下一个 ParNew GC。

当我们使用 jmap -histo:live 强制执行 GC 时，我们会得到：后面

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

跟着大约 125 行以下形式的内容：（一些GenevedMethodAccessor、一些GenerateSerializationConstructorAccessor、一些GeneratedConstructorAccessor 等），

[Unloading class sun.reflect.GeneratedMethodAccessor3]

然后是：

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]

提前致谢！

原文

Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or jmap -histo:live) cleans it up once.

UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away.

(Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc aren't an option.)

We're currently running our JVM with the following flags:

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC

Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen.

Which sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running jmap -histo:live, etc), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which time the same pattern continues - after each full GC, more and more of the old gen is full.)

One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc.

Using jmap -histo and jmap -histo:live in sequence, I am able to determine that the apparently uncollected objects consist of:

several million HashMaps and arrays of HashMap$Entry (in a 1:1 ratio)
several million Vectors and Object arrays (1:1 ratio, and about the same as the number of HashMaps)
several million HashSet, Hashtable, and com.sun.jmx.remote.util.OrderClassLoaders, as well as arrays of Hashtable$Entry (about the same number of each; about half as many as the HashMaps and Vectors)

There are some excerpts from the GC output below; my interpretation of them appears to be that the CMS GC is getting aborted without failing over to the stop-the-world GC. Am I misinterpreting this output somehow? Is there something that would cause that?

During the normal runtime, the CMS GC output blocks look about like this:

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

and that's it; the next line will be the next ParNew GC.

When we force a GC using jmap -histo:live, we instead get:

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

followed by ~125 lines of the form below: (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc)

[Unloading class sun.reflect.GeneratedMethodAccessor3]

followed by:

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]

Thanks in advance!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梦途 2024-10-03 12:11:58

com.sun.jmx.remote.util.OrderClassLoader 用于 JMX 的远程层，快速查看代码表明它们是作为 JVM 内部远程请求的解组过程的一部分创建的。这些类加载器的生命周期将与被解组的事物的生命周期直接相关，这样一旦不再有对该事物的任何引用，就可以释放该类加载器。

如果在这种情况下这些实例的存在是您使用 JConsole 检查 JVM 中发生的情况的直接结果，我不会感到惊讶。看起来它们只是作为正常操作的一部分被 GC 清理掉了。

我猜想 JMX 实现中可能存在错误（在相对最新的 JVM 中似乎不太可能），或者您可能有一些自定义 MBean 或正在使用一些导致问题的自定义 JMX 工具。但最终，我怀疑 OrderClassLoader 可能是转移注意力，问题出在其他地方（GC 损坏或其他泄漏）。

回复收藏 0 原文

々眼睛长脚气 2024-10-03 12:11:58

技术说明：我们正在运行 Sun JDK
1.6.0_07，32 位，在 Linux 2.6.9 机器上。升级JDK版本不行
确实是一个选择，除非有
不可避免，主要原因。

几个较新的 Java 版本已对 CMS 垃圾收集器进行了更新。特别是 6u12、6u14 和 6u18。

我不是 GC 方面的专家，但我猜测 6u14 中的预清理修复可能可以解决您遇到的问题。当然，对于 6u18 的类卸载错误，我也可以这么说。就像我说的，我不是 GC 方面的专家。

修复了以下问题：

6u10：（影响 6u4+）当 -XX:+ParallelRefProcEnabled 时，CMS 从不清除所指对象
6u12：CMS：并发预清理期间溢出对象数组的编码不正确
6u12：CMS：使用并行并发标记时溢出处理不正确
6u14：CMS：断言失败“is_cms_thread == Thread::current()->is_ConcurrentGC_thread()”
6u14：CMS：需要 CMSInitiatingPermOccupancyFraction 进行永久存储，与 CMSInitiatingOccupancyFraction 分离
6u14：CMS 断言：_concurrent_iteration_safe_limit 更新丢失
6u14：CMS：预清理参考期间的溢出处理不正确列出
6u14：使用 CMS 和 COOP 运行时的 SIGSEGV 或 (!is_null(v)，“oop 值永远不能为零”) 断言
6u14：CMS：CompactibleFreeListSpace::block_size() 中的活锁。
6u14：使 CMS 与压缩 oops 一起工作
6u18：CMS：使用 -XX:+UseCompressedOops 进行核心转储
6u18：CMS：与类卸载相关的错误
6u18：CMS：ReduceInitialCardMarks 在存在 cms 预清理的情况下不安全
6u18：[回归] -XX：NewRatio使用 -XX:+UseConcMarkSweepGC 会导致致命错误
6u20: 卡标记可以推迟太久

除了上述所有内容之外，6u14 还引入了 G1 垃圾收集器，尽管它仍处于测试阶段。 G1 旨在取代 Java 7 中的 CMS。G1

可以通过以下命令行开关在 Java 6u14 及更高版本中使用：

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

回复收藏 0 原文

只有一腔孤勇 2024-10-03 12:11:58

我会从更简单的事情开始，例如：

-server -Xms3072m -Xmx3072m -XX:+UseParallelOldGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

看看这是否满足您的需求。

I would start with something far simpler, like:

-server -Xms3072m -Xmx3072m -XX:+UseParallelOldGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

And see if this meets your needs.

回复收藏 0 原文

蹲墙角沉默 2024-10-03 12:11:58

看起来您正在构建指向其所有者的对象（ A 指向 B 指向 A ）。这会导致引用计数保持大于零，因此垃圾收集器无法清理它们。当你释放它们时，你需要打破这个循环。取消 A 或 B 中的引用即可解决问题。即使在较大的参考中（A -> B -> C -> D -> A），这也有效。 HashMap 可以使用向量和对象数组。

远程加载程序的存在可能表明无法清理和关闭对通过 JNDI 或其他远程访问方法加载的对象的引用。

编辑：我又看了你的最后一行。您可能想要增加 Perm 分配。

回复收藏 0 原文

~没有更多了~