Java ConcurrentMarkSweep 垃圾收集器未清除所有垃圾

发布于 2024-09-26 12:11:58 字数 3669 浏览 2 评论 0原文

简短形式:CMS 垃圾收集器似乎无法收集不断增加的垃圾量;最终,我们的 JVM 被填满,应用程序变得无响应。通过外部工具(JConsole 或 jmap -histo:live)强制执行 GC 会将其清理一次。

更新:问题似乎与 JConsole 的 JTop 插件有关;如果我们不运行 JConsole,或者在没有 JTop 插件的情况下运行它,该行为就会消失。

(技术说明:我们在 Linux 2.6.9 机器上运行 Sun JDK 1.6.0_07,32 位。升级 JDK 版本并不是真正的选择,除非有不可避免的重大原因。而且,我们的系统不是连接到可访问 Internet 的计算机,因此无法选择 JConsole 的屏幕截图等。)

我们当前正在使用以下标志运行 JVM:

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC

观察 JConsole 中的内存图,每大约 15 次就会运行一次完整的 GC在我们的应用程序生命周期的最初几个小时内大约几分钟;每次完整GC之后,仍有越来越多的内存仍在使用中。几个小时后,系统达到稳定状态,旧 CMS 中大约有 2GB 的已用内存。

这听起来像是一个经典的内存泄漏,除了如果我们使用任何强制执行完整 GC 的工具(点击 JConsole 中的“收集垃圾”按钮,或运行 jmap -histo:live 等), old gen 的使用量突然下降到约 500MB,并且我们的应用程序在接下来的几个小时内再次响应(在此期间,相同的模式继续 - 在每次完整 GC 之后,越来越多的 old gen 已满。)

值得注意的一件事:在 JConsole 中,报告的 ConcurrentMarkSweep GC 计数将保持为 0,直到我们使用 jconsole/jmap/etc 强制执行 GC。

依次使用 jmap -histojmap -histo:live,我能够确定明显未收集的对象包括:

  • 数百万个 HashMap HashMap$Entry 和数组(比例为 1:1)
  • 数百万个 Vector 和对象数组(比例为 1:1,大约与HashMap)
  • 数百万个 HashSetHashtablecom.sun.jmx.remote.util.OrderClassLoader,以及 数组>Hashtable$Entry(每个数量大约相同;大约是 HashMap 和 Vector 的一半)

下面是 GC 输出的一些摘录;我对它们的解释似乎是 CMS GC 正在中止,而不会故障转移到 stop-the-world GC。我是否以某种方式误解了这个输出?有什么东西会导致这种情况吗?

在正常运行时,CMS GC 输出块看起来像这样:

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

就是这样;下一行将是下一个 ParNew GC。

当我们使用 jmap -histo:live 强制执行 GC 时,我们会得到: 后面

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

跟着大约 125 行以下形式的内容:(一些GenevedMethodAccessor、一些GenerateSerializationConstructorAccessor、一些GeneratedConstructorAccessor 等),

[Unloading class sun.reflect.GeneratedMethodAccessor3]

然后是:

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]

提前致谢!

Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or jmap -histo:live) cleans it up once.

UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away.

(Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc aren't an option.)

We're currently running our JVM with the following flags:

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC

Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen.

Which sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running jmap -histo:live, etc), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which time the same pattern continues - after each full GC, more and more of the old gen is full.)

One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc.

Using jmap -histo and jmap -histo:live in sequence, I am able to determine that the apparently uncollected objects consist of:

  • several million HashMaps and arrays of HashMap$Entry (in a 1:1 ratio)
  • several million Vectors and Object arrays (1:1 ratio, and about the same as the number of HashMaps)
  • several million HashSet, Hashtable, and com.sun.jmx.remote.util.OrderClassLoaders, as well as arrays of Hashtable$Entry (about the same number of each; about half as many as the HashMaps and Vectors)

There are some excerpts from the GC output below; my interpretation of them appears to be that the CMS GC is getting aborted without failing over to the stop-the-world GC. Am I misinterpreting this output somehow? Is there something that would cause that?

During the normal runtime, the CMS GC output blocks look about like this:

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

and that's it; the next line will be the next ParNew GC.

When we force a GC using jmap -histo:live, we instead get:

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

followed by ~125 lines of the form below: (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc)

[Unloading class sun.reflect.GeneratedMethodAccessor3]

followed by:

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]

Thanks in advance!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

梦途 2024-10-03 12:11:58

com.sun.jmx.remote.util.OrderClassLoader 用于 JMX 的远程层,快速查看代码表明它们是作为 JVM 内部远程请求的解组过程的一部分创建的。这些类加载器的生命周期将与被解组的事物的生命周期直接相关,这样一旦不再有对该事物的任何引用,就可以释放该类加载器。

如果在这种情况下这些实例的存在是您使用 JConsole 检查 JVM 中发生的情况的直接结果,我不会感到惊讶。看起来它们只是作为正常操作的一部分被 GC 清理掉了。

我猜想 JMX 实现中可能存在错误(在相对最新的 JVM 中似乎不太可能),或者您可能有一些自定义 MBean 或正在使用一些导致问题的自定义 JMX 工具。但最终,我怀疑 OrderClassLoader 可能是转移注意力,问题出在其他地方(GC 损坏或其他泄漏)。

com.sun.jmx.remote.util.OrderClassLoader is used in the remote'ing layer for JMX and a quick review of the code suggests they're created as part of the unmarshalling process for remote requests inside of the JVM. The lifetime of those classloaders will be directly related to the lifetime of the thing that was unmarshalled such that once there are no longer any references to that thing the classloader could be released.

I wouldn't be surprised if in this case the presence of these instances was a direct result of you using JConsole to examine the goings on in the JVM. And it looks like they'd just be cleaned up by GC as part of normal operation.

I guess it's possible there's a bug in the JMX implementation (seems unlikely in a relatively up-to-date JVM) or perhaps you have some custom MBeans or are using some custom JMX tools that are causing the problem. But ultimately, I'm suspecting the OrderClassLoader is probably a red-herring and the issue lies elsewhere (broken GC or some other leak).

々眼睛长脚气 2024-10-03 12:11:58

技术说明:我们正在运行 Sun JDK
1.6.0_07,32 位,在 Linux 2.6.9 机器上。升级JDK版本不行
确实是一个选择,除非有
不可避免,主要原因。

几个较新的 Java 版本已对 CMS 垃圾收集器进行了更新。特别是 6u12、6u14 和 6u18。

我不是 GC 方面的专家,但我猜测 6u14 中的预清理修复可能可以解决您遇到的问题。当然,对于 6u18 的类卸载错误,我也可以这么说。就像我说的,我不是 GC 方面的专家。

修复了以下问题:

  • 6u10:(影响 6u4+)当 -XX:+ParallelRefProcEnabled 时,CMS 从不清除所指对象
  • 6u12:CMS:并发预清理期间溢出对象数组的编码不正确
  • 6u12:CMS:使用并行并发标记时溢出处理不正确
  • 6u14:CMS:断言失败“is_cms_thread == Thread::current()->is_ConcurrentGC_thread()”
  • 6u14:CMS:需要 CMSInitiatingPermOccupancyFraction 进行永久存储,与 CMSInitiatingOccupancyFraction 分离
  • 6u14:CMS 断言:_concurrent_iteration_safe_limit 更新丢失
  • 6u14:CMS:预清理参考期间的溢出处理不正确列出
  • 6u14:使用 CMS 和 COOP 运行时的 SIGSEGV 或 (!is_null(v),“oop 值永远不能为零”) 断言
  • 6u14:CMS:CompactibleFreeListSpace::block_size() 中的活锁。
  • 6u14:使 CMS 与压缩 oops 一起工作
  • 6u18:CMS:使用 -XX:+UseCompressedOops 进行核心转储
  • 6u18:CMS:与类卸载相关的错误
  • 6u18:CMS:ReduceInitialCardMarks 在存在 cms 预清理的情况下不安全
  • 6u18:[回归] -XX:NewRatio使用 -XX:+UseConcMarkSweepGC 会导致致命错误
  • 6u20: 卡标记可以推迟太久

除了上述所有内容之外,6u14 还引入了 G1 垃圾收集器,尽管它仍处于测试阶段。 G1 旨在取代 Java 7 中的 CMS。G1

可以通过以下命令行开关在 Java 6u14 及更高版本中使用:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

Technical notes: we're running Sun JDK
1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not
really an option, unless there's an
unavoidable, major reason.

Several newer Java versions have had updates to the CMS garbage collector. Notably 6u12, 6u14, and 6u18.

I'm not an expert with GC stuff, but I'm guessing the preclean fixes in 6u14 may fix the issue you're seeing. Of course, I could say the same thing about 6u18's class unloading bugs. Like I said, I'm not an expert at GC stuff.

There are fixes for:

  • 6u10: (affects 6u4+) CMS never clears referents when -XX:+ParallelRefProcEnabled
  • 6u12: CMS: Incorrect encoding of overflown object arrays during concurrent precleaning
  • 6u12: CMS: Incorrect overflow handling when using parallel concurrent marking
  • 6u14: CMS: assertion failure "is_cms_thread == Thread::current()->is_ConcurrentGC_thread()"
  • 6u14: CMS: Need CMSInitiatingPermOccupancyFraction for perm, divorcing from CMSInitiatingOccupancyFraction
  • 6u14: CMS assert: _concurrent_iteration_safe_limit update missed
  • 6u14: CMS: Incorrect overflow handling during precleaning of Reference lists
  • 6u14: SIGSEGV or (!is_null(v),"oop value can never be zero") assertion when running with CMS and COOPs
  • 6u14: CMS: Livelock in CompactibleFreeListSpace::block_size().
  • 6u14: Make CMS work with compressed oops
  • 6u18: CMS: core dump with -XX:+UseCompressedOops
  • 6u18: CMS: bugs related to class unloading
  • 6u18: CMS: ReduceInitialCardMarks unsafe in the presence of cms precleaning
  • 6u18: [Regression] -XX:NewRatio with -XX:+UseConcMarkSweepGC causes fatal error
  • 6u20: card marks can be deferred too long

In addition to all of the above, 6u14 also introduced the G1 garbage collector, although it is still in testing. G1 is intended to replace CMS in Java 7.

G1 can be used in Java 6u14 and newer with the following command-line switches:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

只有一腔孤勇 2024-10-03 12:11:58

我会从更简单的事情开始,例如:

-server -Xms3072m -Xmx3072m -XX:+UseParallelOldGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 

看看这是否满足您的需求。

I would start with something far simpler, like:

-server -Xms3072m -Xmx3072m -XX:+UseParallelOldGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 

And see if this meets your needs.

蹲墙角沉默 2024-10-03 12:11:58

看起来您正在构建指向其所有者的对象( A 指向 B 指向 A )。这会导致引用计数保持大于零,因此垃圾收集器无法清理它们。当你释放它们时,你需要打破这个循环。取消 A 或 B 中的引用即可解决问题。即使在较大的参考中(A -> B -> C -> D -> A),这也有效。 HashMap 可以使用向量和对象数组。

远程加载程序的存在可能表明无法清理和关闭对通过 JNDI 或其他远程访问方法加载的对象的引用。

编辑:我又看了你的最后一行。您可能想要增加 Perm 分配。

It looks like you are building objects which point back to their owners ( A points to B points to A ). This results in the reference counts remaining greater than zero, so the garbage collector can't clean them up. You need to break the cycle when you release them. Nullifying the reference in either A or B will solve the problem. This works even in larger reference looks like ( A -> B -> C -> D -> A). Vectors and object arrays may be used by your HashMaps.

The presense of the remote loaders may indicate a failure to cleanup and close references to objects loaded via JNDI or other remote access method.

EDIT: I took a second look at your last line. You may want to increase the perm allocation.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文