Should I call GC.Collect immediately after using the large object heap to prevent fragmentation?

Published 2024-08-15 04:24:57


My application does a good deal of binary serialization and compression of large objects. Uncompressed, the serialized dataset is about 14 MB; compressed, it is around 1.5 MB. I find that whenever I call the serialize method on my dataset, my large object heap performance counter jumps from under 1 MB to about 90 MB. I also know that on a relatively heavily loaded system, usually after running for a while (days) in which this serialization happens a few times, the application has been known to throw out-of-memory exceptions when this serialization method is called, even though there seems to be plenty of memory. I'm guessing that fragmentation is the issue (though I can't say I'm 100% sure, I'm pretty close).

The simplest short-term fix (I guess I'm looking for both a short-term and a long-term answer) I can think of is to call GC.Collect right after I'm done with the serialization process. This, in my opinion, will garbage-collect the object from the LOH, and will likely do so BEFORE other objects can be added to it. This will allow other objects to fit tightly against the remaining objects in the heap without causing much fragmentation.

Other than this ridiculous 90 MB allocation, I don't think I have anything else that uses a lot of the LOH. This 90 MB allocation is also relatively rare (around every 4 hours). We will of course still have the 1.5 MB array in there, and maybe some other smaller serialized objects.

Any ideas?

Update as a result of good responses

Here is my code which does the work. I've actually tried changing this to compress WHILE serializing, so that serialization writes to a stream at the same time, and I didn't get much better results. I've also tried preallocating the memory stream to 100 MB and using the same stream twice in a row; the LOH goes up to 180 MB anyway. I'm using Process Explorer to monitor it. It's insane. I think I'm going to try the UnmanagedMemoryStream idea next.

I would encourage you to try it out if you want. It doesn't have to be this exact code. Just serialize a large dataset and you will get surprising results (mine has lots of tables, around 15, and lots of strings and columns).

        byte[] bytes;
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter serializer =
            new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        using (System.IO.MemoryStream memStream = new System.IO.MemoryStream())
        {
            serializer.Serialize(memStream, obj);
            // ToArray() copies the entire buffer into a second large array
            bytes = CompressionHelper.CompressBytes(memStream.ToArray());
        }
        return bytes;
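For anyone reproducing the compress-while-serializing experiment mentioned above, here is a sketch that serializes straight through the framework's GZipStream, in place of the CompressionHelper shown here (which is not a framework class). This is illustrative only; as the updates above note, BinaryFormatter may still allocate large internal buffers regardless.

```csharp
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

static class StreamingSerializer
{
    // Sketch: serialize directly through a compression stream so the
    // uncompressed form is never materialized as one large array.
    public static byte[] SerializeCompressed(object obj)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // 'true' = leave the underlying stream open after the
            // GZipStream is closed, so output.ToArray() is still valid.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                new BinaryFormatter().Serialize(gzip, obj);
            }
            return output.ToArray();
        }
    }
}
```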

Update after trying binary serialization with UnmanagedMemoryStream

Even if I serialize to an UnmanagedMemoryStream, the LOH jumps up to the same size. It seems that no matter what I do, calling BinaryFormatter to serialize this large object will use the LOH. As for pre-allocating, it doesn't seem to help much. Say I preallocate 100 MB, then serialize; it will use 170 MB. Here is the code for that. Even simpler than the above code:

BinaryFormatter serializer = new BinaryFormatter();
MemoryStream memoryStream = new MemoryStream(1024 * 1024 * 100); // preallocate 100 MB
GC.Collect(); // only here to refresh the LOH performance counter
serializer.Serialize(memoryStream, assetDS);

The GC.Collect() in the middle there is just to update the LOH performance counter. You will see that it allocates the expected 100 MB. But when you call Serialize, you will notice that it seems to add that on top of the 100 MB you have already allocated.


Comments (6)

凉月流沐 2024-08-22 04:24:57


Beware of the way collection classes and streams like MemoryStream work in .NET. They have an underlying buffer, a simple array. Whenever the collection or stream buffer grows beyond the allocated size of the array, the array gets re-allocated, now at double the previous size.

This can cause many copies of the array in the LOH. Your 14MB dataset will start using the LOH at 128KB, then take another 256KB, then another 512KB, etcetera. The last one, the one actually used, will be around 16MB. The LOH contains the sum of these, around 30MB, only one of which is in actual use.

Do this three times without a gen2 collection and your LOH has grown to 90MB.

Avoid this by pre-allocating the buffer to the expected size. MemoryStream has a constructor that takes an initial capacity. So do all collection classes. Calling GC.Collect() after you've nulled all references can help unclog the LOH and purge those intermediate buffers, at the cost of clogging the gen1 and gen2 heaps too soon.
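The doubling described above is easy to observe directly through MemoryStream.Capacity; a minimal sketch (the 14 MB / 16 MB figures mirror the numbers in this answer):

```csharp
using System;
using System.IO;

static class GrowthDemo
{
    // Sketch: write 14 MB in small chunks and watch the default
    // stream's backing array double, while a preallocated stream
    // keeps a single backing array throughout.
    public static int[] Run()
    {
        MemoryStream grown = new MemoryStream();                    // grows by doubling
        MemoryStream prealloc = new MemoryStream(16 * 1024 * 1024); // one allocation up front
        byte[] chunk = new byte[4096];
        int lastCapacity = -1;
        for (int written = 0; written < 14 * 1024 * 1024; written += chunk.Length)
        {
            grown.Write(chunk, 0, chunk.Length);
            if (grown.Capacity != lastCapacity)
            {
                // Every capacity past ~85,000 bytes is a fresh LOH allocation.
                Console.WriteLine("capacity grew to {0}", grown.Capacity);
                lastCapacity = grown.Capacity;
            }
            prealloc.Write(chunk, 0, chunk.Length); // capacity never changes
        }
        return new int[] { grown.Capacity, prealloc.Capacity };
    }
}
```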

谢绝鈎搭 2024-08-22 04:24:57


Unfortunately, the only way I could fix this was to break the data up into chunks so as not to allocate large blocks on the LOH. All the proposed answers here were good and were expected to work, but they did not. It seems that binary serialization in .NET (using .NET 2.0 SP2) does its own little magic under the hood which prevents users from having control over memory allocation.

The answer to the question, then, would be: this is not likely to work. When it comes to using .NET serialization, your best bet is to serialize the large object in smaller chunks. For all other scenarios, the answers mentioned above are great.
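Objects of 85,000 bytes and over go on the LOH, so "smaller chunks" means keeping every individual array under that threshold. A rough sketch of a write-only stream backed by 64 KB blocks (names are illustrative; and, per this answer, BinaryFormatter's own internal buffers may defeat the point):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: a write-only stream that accumulates data in 64 KB blocks,
// so no single backing array ever crosses the 85,000-byte LOH
// threshold. Only what sequential writing needs is implemented.
class ChunkedWriteStream : Stream
{
    const int BlockSize = 64 * 1024;
    readonly List<byte[]> blocks = new List<byte[]>();
    int offsetInLast; // bytes used in the last block

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            if (blocks.Count == 0 || offsetInLast == BlockSize)
            {
                blocks.Add(new byte[BlockSize]); // each block stays off the LOH
                offsetInLast = 0;
            }
            int n = Math.Min(count, BlockSize - offsetInLast);
            Array.Copy(buffer, offset, blocks[blocks.Count - 1], offsetInLast, n);
            offsetInLast += n;
            offset += n;
            count -= n;
        }
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length
    {
        get { return blocks.Count == 0 ? 0 : (long)(blocks.Count - 1) * BlockSize + offsetInLast; }
    }
    public override long Position
    {
        get { return Length; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override int Read(byte[] b, int o, int c) { throw new NotSupportedException(); }
    public override long Seek(long o, SeekOrigin so) { throw new NotSupportedException(); }
    public override void SetLength(long v) { throw new NotSupportedException(); }
}
```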

你的背包 2024-08-22 04:24:57


90MB of RAM is not much.

Avoid calling GC.Collect unless you have a problem. If you have a problem and no better fix, try calling GC.Collect and see if your problem is solved.

隔岸观火 2024-08-22 04:24:57


Don't worry about LOH size jumping up. Worry about allocating/deallocating the LOH. .NET is very dumb about the LOH: rather than allocating LOH objects far away from the regular heap, it allocates at the next available VM page. I have a 3D app that does a lot of allocation/deallocation of both LOH and regular objects. The result (as seen in a DebugDiag dump report) is that pages of the small heap and the large heap end up alternating throughout RAM, until there are no large chunks of the application's 2 GB VM space left. The solution, when possible, is to allocate what you need once, and then don't release it; re-use it next time.

Use DebugDiag to analyze your process. See how the VM addresses gradually creep up towards the 2 GB address mark. Then make a change that keeps that from happening.
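The allocate-once-and-reuse advice above can be as simple as holding one serialization buffer for the lifetime of the process; a sketch, assuming serialization calls can be funneled through one place:

```csharp
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Sketch: one preallocated buffer reused for every serialization.
// SetLength(0) rewinds the stream without shrinking or freeing the
// backing array, so the large allocation happens exactly once.
static class ReusableSerializer
{
    static readonly MemoryStream buffer = new MemoryStream(100 * 1024 * 1024);
    static readonly BinaryFormatter formatter = new BinaryFormatter();

    public static byte[] Serialize(object obj)
    {
        lock (buffer)                // not re-entrant; serialize one at a time
        {
            buffer.SetLength(0);     // reuse, don't reallocate
            formatter.Serialize(buffer, obj);
            return buffer.ToArray(); // still one copy out per call
        }
    }
}
```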

漫雪独思 2024-08-22 04:24:57


I agree with some of the other posters here that you might want to try and use tricks to work with the .NET Framework instead of trying to force it to work with you via GC.Collect.

You may find this Channel 9 video helpful which discusses ways to ease pressure on the Garbage collector.

愿与i 2024-08-22 04:24:57


If you really need to use the LOH for something like a service or something that needs to be running for a long time, you need to use buffer pools that are never deallocated and that you can ideally allocate on start-up. This means you'll have to do your 'memory management' yourself for this, of course.
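The never-deallocated buffer pool described here can be sketched as a fixed set of large arrays allocated at start-up and handed out on demand (class and method names are illustrative, not a framework API):

```csharp
using System.Collections.Generic;

// Sketch of a buffer pool allocated once at start-up: the large
// arrays live for the life of the process and are checked out and
// returned by callers, so the LOH never churns.
class LargeBufferPool
{
    readonly Stack<byte[]> free = new Stack<byte[]>();

    public LargeBufferPool(int count, int size)
    {
        for (int i = 0; i < count; i++)
            free.Push(new byte[size]); // all large allocations happen here, once
    }

    public byte[] Rent()
    {
        lock (free)
            return free.Count > 0 ? free.Pop() : null; // caller must handle exhaustion
    }

    public void Return(byte[] buffer)
    {
        lock (free)
            free.Push(buffer);
    }
}
```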

Depending on what you're doing with this memory, you might also have to p/Invoke over to native code for selected parts to avoid having to call some .NET API that forces you to put the data on newly allocated space in the LOH.

This is a good starting point article about the issues: https://devblogs.microsoft.com/dotnet/using-gc-efficiently-part-3/

I'd consider you very lucky if your GC trick works, and it would really only work if there isn't much going on in the system at the same time. If you have work going on in parallel, this will just slightly delay the inevitable.

Also read up on the documentation for GC.Collect. IIRC, GC.Collect(n) only says that it collects no further than generation n, not that it actually ever GETS to generation n.
