OutOfMemory, but many objects have no gcroots
We are developing a rather large Windows Forms application. On several customers' computers it often crashes with an OutOfMemory exception. After obtaining a full memory dump of the application moments after the exception (clrdump invoked from the UnhandledException handler), I analyzed it with ".NET Memory Profiler" and windbg.
The Memory Profiler has shown only 130MB in live object instances. What's interesting is that for many object types it has shown a very large number of unreachable instances (e.g. 22,000 unreachable Byte[] instances). The native memory statistics total 127MB across all heaps for data (which is OK), but indicate 133MB of unreachable memory in the gen #2 heap and 640MB in the large object heap (not OK!).
Analyzing the dump with windbg confirmed the above stats:
!dumpheap -stat
..... acceptable object sizes...
79330a00 467216 30638712 System.String
0016d488 4804 221756612 Free
79333470 27089 574278304 System.Byte[]
The application does use a large number of short-lived buffers throughout its run time, but it does not leak them. Testing many of the Byte[] instances with !gcroot ends up with no roots. Obviously most of those arrays are unreachable, as indicated by the memory profiler.
Just to ensure all is fine, !finalizequeue shows no objects are waiting to be finalized:
generation 0 has 138 finalizable objects (18bd1938->18bd1b60)
generation 1 has 182 finalizable objects (18bd1660->18bd1938)
generation 2 has 75372 finalizable objects (18b87cb0->18bd1660)
Ready for finalization 0 objects (18bd1b60->18bd1b60)
A check of the native finalizer thread's stack trace also shows it is not blocked.
At the moment I don't know how to diagnose why the GC doesn't collect the data (and I believe it would love to, since the process ran out of memory...).
Edit: Based on the input below, I read some more on Large Object Heap fragmentation, and it seems that this could be the case.
I have seen advice to allocate bigger blocks of memory for this kind of data (various byte[] in my case) and manage the memory in that area myself, but this seems like a rather hackish solution, not one I would expect to need for a not-so-special desktop application.
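For illustration, a minimal sketch of that pooling idea (the class and buffer size are hypothetical, not from the original application): rent fixed-size byte[] blocks from a pool and return them after use, so the LOH holds a handful of long-lived arrays instead of seeing constant churn.

using System.Collections.Generic;

// Hypothetical minimal pool of fixed-size buffers. Reusing a few
// long-lived arrays avoids allocating a fresh byte[] per operation.
class BufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize)
    {
        _bufferSize = bufferSize;
    }

    public byte[] Rent()
    {
        lock (_free)
        {
            return _free.Count > 0 ? _free.Pop() : new byte[_bufferSize];
        }
    }

    public void Return(byte[] buffer)
    {
        // Only accept buffers from this pool's size class.
        if (buffer == null || buffer.Length != _bufferSize)
            return;
        lock (_free)
        {
            _free.Push(buffer);
        }
    }
}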
The fragmentation issue is caused by the fact (at least, that is what many people from Microsoft state in blogs) that objects on the LOH are never relocated during their lifetime. That is understandable, but it seems logical that once some memory pressure threshold is reached, such as the threat of an OOM, relocation should be performed.
The only thing that worries me before fully trusting that fragmentation is the cause is that so many objects on the LOH are without gcroot references - is it because garbage collection is performed only partially, even on the LOH?
I'd be happy to be pointed to any interesting solution, as at the moment the only one I know of is custom management of a preallocated memory block.
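For what it's worth, later runtimes did add exactly this as an opt-in: starting with .NET Framework 4.5.1, the LOH can be compacted once during the next full blocking collection. This API does not exist on the mscorwks-era runtime in this dump:

using System;
using System.Runtime;

class CompactLohOnce
{
    static void Main()
    {
        // .NET Framework 4.5.1+ only: request a one-time compaction of
        // the large object heap on the next full blocking collection.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}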
Any ideas are welcome.
Thanks.
The LOH is subject to fragmentation. This article provides an analysis and basic directions for working around it.
Maybe you could post some code showing a 'typical' usage of those byte[] buffers?
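To make the failure mode concrete, here is a small sketch (the sizes are illustrative assumptions) of an allocation pattern that fragments the LOH: short-lived and long-lived large arrays interleave, leaving free holes that a slightly larger request cannot reuse, because the LOH is never compacted.

using System;
using System.Collections.Generic;

class LohFragmentationDemo
{
    static void Main()
    {
        var survivors = new List<byte[]>();

        // Arrays over ~85,000 bytes are allocated on the LOH.
        // Interleave short-lived and long-lived large allocations.
        for (int i = 0; i < 500; i++)
        {
            byte[] temp = new byte[90000];    // garbage after this iteration
            survivors.Add(new byte[100000]);  // stays alive, fixing the layout
        }

        GC.Collect();

        // The 90,000-byte holes between survivors are now free, but since
        // the LOH is not compacted, a 120,000-byte request cannot fit in
        // any of them and the heap has to grow instead.
        byte[] bigger = new byte[120000];
        Console.WriteLine("survivors={0}, bigger={1}",
                          survivors.Count, bigger.Length);
    }
}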
As usual, things turned out to be a little different. We found a use case where the application did consume lots of memory and would eventually go OOM. What was strange in the dumps we got before we found this was that there were lots of objects without a gcroot - I didn't understand why that memory wasn't freed and used for new allocations. Then it came to me what probably happened when the OOM occurred: the stack was unwound, the objects that owned the memory were no longer reachable, and THEN the dump was performed. That is why there seemed to be lots of memory that could be GCed.
What I did in a debug version - to retrieve a dump of the real state of the memory - was to create a Threading.Timer that checks whether some reasonably large object can still be allocated; if it can't, that's an indication that we're near OOM and that it's a good time to take the memory dump. Code follows:
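The snippet itself is missing from this copy of the answer; the following is a reconstruction of what the paragraph describes, with the probe size, interval, and dump helper as assumptions:

using System;
using System.Threading;

// Reconstruction of the probe described above; values are assumptions.
class OomWatchdog
{
    private static Timer _timer;

    public static void Start()
    {
        // Probe every 5 seconds.
        _timer = new Timer(Probe, null, 0, 5000);
    }

    private static void Probe(object state)
    {
        try
        {
            // Try to allocate a reasonably large block (here 32 MB).
            byte[] probe = new byte[32 * 1024 * 1024];
            GC.KeepAlive(probe);
        }
        catch (OutOfMemoryException)
        {
            // Near OOM: take the dump now, while the objects that own
            // the memory are still reachable.
            TakeMemoryDump();
        }
    }

    private static void TakeMemoryDump()
    {
        // Placeholder: invoke the dump tool here (clrdump in our setup).
    }
}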
Sometimes Image.FromFile("a non-image file") throws OutOfMemoryException. A zero-byte file is one such file.
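A quick way to see this behavior (the temp-file repro here is illustrative): GDI+ surfaces "unrecognized image format" as an OutOfMemoryException, so loading an empty file triggers it.

using System;
using System.Drawing;
using System.IO;

class NonImageOomRepro
{
    static void Main()
    {
        // GetTempFileName creates a zero-byte file on disk.
        string path = Path.GetTempFileName();
        try
        {
            using (Image img = Image.FromFile(path)) { }
        }
        catch (OutOfMemoryException ex)
        {
            // GDI+ reports "not a valid image" as an OutOfMemoryException.
            Console.WriteLine("OOM from a non-image file: " + ex.Message);
        }
        finally
        {
            File.Delete(path);
        }
    }
}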
If you think the LOH is the problem, then setting a breakpoint on LOH allocation could point you in the right direction. You could probably do something like this:
bp mscorwks!gc_heap::allocate_large_object "!clrstack;.echo *********Allocation of large object heap***********;g"