Should I call GC.Collect immediately after using the large object heap to prevent fragmentation?

Published 2024-08-15 04:24:57


My application does a good deal of binary serialization and compression of large objects. Uncompressed, the serialized dataset is about 14 MB; compressed, it is around 1.5 MB. I find that whenever I call the serialize method on my dataset, my large object heap performance counter jumps from under 1 MB to about 90 MB. I also know that on a relatively heavily loaded system, usually after running for a while (days) in which this serialization happens a few times, the application has been known to throw out-of-memory exceptions when this serialization method is called, even though there seems to be plenty of memory. I'm guessing that fragmentation is the issue (though I can't say I'm 100% sure, I'm pretty close).

The simplest short-term fix (I guess I'm looking for both a short-term and a long-term answer) I can think of is to call GC.Collect right after I'm done with the serialization process. This, in my opinion, will garbage-collect the object from the LOH, and will likely do so BEFORE other objects can be added to it. This will allow other objects to fit tightly against the remaining objects in the heap without causing much fragmentation.

Other than this ridiculous 90 MB allocation, I don't think I have anything else that uses a lot of the LOH. This 90 MB allocation is also relatively rare (around every 4 hours). We will of course still have the 1.5 MB array in there, and maybe some other smaller serialized objects.

Any ideas?

Update as a result of good responses

Here is my code which does the work. I've actually tried changing this to compress WHILE serializing, so that serialization writes to a stream at the same time, and I didn't get much better results. I've also tried preallocating the memory stream to 100 MB and using the same stream twice in a row; the LOH goes up to 180 MB anyway. I'm using Process Explorer to monitor it. It's insane. I think I'm going to try the UnmanagedMemoryStream idea next.

I would encourage you to try it out if you want. It doesn't have to be this exact code. Just serialize a large dataset and you will get surprising results (mine has lots of tables, around 15, and lots of strings and columns).

        byte[] bytes;
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter serializer =
            new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        using (System.IO.MemoryStream memStream = new System.IO.MemoryStream())
        {
            serializer.Serialize(memStream, obj);
            // ToArray() copies the entire buffer into a second large array
            bytes = CompressionHelper.CompressBytes(memStream.ToArray());
        }
        return bytes;
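For anyone reproducing the compress-while-serializing experiment mentioned above, here is a sketch that serializes straight through the framework's GZipStream, in place of the CompressionHelper shown here (which is not a framework class). This is illustrative only; as the updates above note, BinaryFormatter may still allocate large internal buffers regardless.

```csharp
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

static class StreamingSerializer
{
    // Sketch: serialize directly through a compression stream so the
    // uncompressed form is never materialized as one large array.
    public static byte[] SerializeCompressed(object obj)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // 'true' = leave the underlying stream open after the
            // GZipStream is closed, so output.ToArray() is still valid.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                new BinaryFormatter().Serialize(gzip, obj);
            }
            return output.ToArray();
        }
    }
}
```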

Update after trying binary serialization with UnmanagedMemoryStream

Even if I serialize to an UnmanagedMemoryStream, the LOH jumps up to the same size. It seems that no matter what I do, calling BinaryFormatter to serialize this large object will use the LOH. As for pre-allocating, it doesn't seem to help much. Say I preallocate 100 MB, then serialize; it will use 170 MB. Here is the code for that. Even simpler than the above code:

BinaryFormatter serializer = new BinaryFormatter();
MemoryStream memoryStream = new MemoryStream(1024 * 1024 * 100); // preallocate 100 MB
GC.Collect(); // only here to refresh the LOH performance counter
serializer.Serialize(memoryStream, assetDS);

The GC.Collect() in the middle there is just to update the LOH performance counter. You will see that it allocates the expected 100 MB. But when you call Serialize, you will notice that it seems to add that on top of the 100 MB you have already allocated.


Comments (6)

凉月流沐 2024-08-22 04:24:57


Beware of the way collection classes and streams like MemoryStream work in .NET. They have an underlying buffer, a simple array. Whenever the collection or stream buffer grows beyond the allocated size of the array, the array gets re-allocated, now at double the previous size.

This can cause many copies of the array in the LOH. Your 14MB dataset will start using the LOH at 128KB, then take another 256KB, then another 512KB, etcetera. The last one, the one actually used, will be around 16MB. The LOH contains the sum of these, around 30MB, only one of which is in actual use.

Do this three times without a gen2 collection and your LOH has grown to 90MB.

Avoid this by pre-allocating the buffer to the expected size. MemoryStream has a constructor that takes an initial capacity. So do all collection classes. Calling GC.Collect() after you've nulled all references can help unclog the LOH and purge those intermediate buffers, at the cost of clogging the gen1 and gen2 heaps too soon.
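The doubling described above is easy to observe directly through MemoryStream.Capacity; a minimal sketch (the 14 MB / 16 MB figures mirror the numbers in this answer):

```csharp
using System;
using System.IO;

static class GrowthDemo
{
    // Sketch: write 14 MB in small chunks and watch the default
    // stream's backing array double, while a preallocated stream
    // keeps a single backing array throughout.
    public static int[] Run()
    {
        MemoryStream grown = new MemoryStream();                    // grows by doubling
        MemoryStream prealloc = new MemoryStream(16 * 1024 * 1024); // one allocation up front
        byte[] chunk = new byte[4096];
        int lastCapacity = -1;
        for (int written = 0; written < 14 * 1024 * 1024; written += chunk.Length)
        {
            grown.Write(chunk, 0, chunk.Length);
            if (grown.Capacity != lastCapacity)
            {
                // Every capacity past ~85,000 bytes is a fresh LOH allocation.
                Console.WriteLine("capacity grew to {0}", grown.Capacity);
                lastCapacity = grown.Capacity;
            }
            prealloc.Write(chunk, 0, chunk.Length); // capacity never changes
        }
        return new int[] { grown.Capacity, prealloc.Capacity };
    }
}
```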

谢绝鈎搭 2024-08-22 04:24:57


Unfortunately, the only way I could fix this was to break the data up into chunks so as not to allocate large blocks on the LOH. All the proposed answers here were good and were expected to work, but they did not. It seems that binary serialization in .NET (using .NET 2.0 SP2) does its own little magic under the hood which prevents users from having control over memory allocation.

The answer to the question, then, would be: this is not likely to work. When it comes to using .NET serialization, your best bet is to serialize the large object in smaller chunks. For all other scenarios, the answers mentioned above are great.
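Objects of 85,000 bytes and over go on the LOH, so "smaller chunks" means keeping every individual array under that threshold. A rough sketch of a write-only stream backed by 64 KB blocks (names are illustrative; and, per this answer, BinaryFormatter's own internal buffers may defeat the point):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: a write-only stream that accumulates data in 64 KB blocks,
// so no single backing array ever crosses the 85,000-byte LOH
// threshold. Only what sequential writing needs is implemented.
class ChunkedWriteStream : Stream
{
    const int BlockSize = 64 * 1024;
    readonly List<byte[]> blocks = new List<byte[]>();
    int offsetInLast; // bytes used in the last block

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            if (blocks.Count == 0 || offsetInLast == BlockSize)
            {
                blocks.Add(new byte[BlockSize]); // each block stays off the LOH
                offsetInLast = 0;
            }
            int n = Math.Min(count, BlockSize - offsetInLast);
            Array.Copy(buffer, offset, blocks[blocks.Count - 1], offsetInLast, n);
            offsetInLast += n;
            offset += n;
            count -= n;
        }
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length
    {
        get { return blocks.Count == 0 ? 0 : (long)(blocks.Count - 1) * BlockSize + offsetInLast; }
    }
    public override long Position
    {
        get { return Length; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override int Read(byte[] b, int o, int c) { throw new NotSupportedException(); }
    public override long Seek(long o, SeekOrigin so) { throw new NotSupportedException(); }
    public override void SetLength(long v) { throw new NotSupportedException(); }
}
```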

你的背包 2024-08-22 04:24:57


90MB of RAM is not much.

Avoid calling GC.Collect unless you have a problem. If you have a problem and no better fix, try calling GC.Collect and see if your problem is solved.

隔岸观火 2024-08-22 04:24:57


Don't worry about LOH size jumping up. Worry about allocating/deallocating the LOH. .NET is very dumb about the LOH: rather than allocating LOH objects far away from the regular heap, it allocates at the next available VM page. I have a 3D app that does a lot of allocation/deallocation of both LOH and regular objects. The result (as seen in a DebugDiag dump report) is that pages of the small heap and the large heap end up alternating throughout RAM, until there are no large chunks of the application's 2 GB VM space left. The solution, when possible, is to allocate what you need once, and then don't release it; re-use it next time.

Use DebugDiag to analyze your process. See how the VM addresses gradually creep up towards the 2 GB address mark. Then make a change that keeps that from happening.
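The allocate-once-and-reuse advice above can be as simple as holding one serialization buffer for the lifetime of the process; a sketch, assuming serialization calls can be funneled through one place:

```csharp
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Sketch: one preallocated buffer reused for every serialization.
// SetLength(0) rewinds the stream without shrinking or freeing the
// backing array, so the large allocation happens exactly once.
static class ReusableSerializer
{
    static readonly MemoryStream buffer = new MemoryStream(100 * 1024 * 1024);
    static readonly BinaryFormatter formatter = new BinaryFormatter();

    public static byte[] Serialize(object obj)
    {
        lock (buffer)                // not re-entrant; serialize one at a time
        {
            buffer.SetLength(0);     // reuse, don't reallocate
            formatter.Serialize(buffer, obj);
            return buffer.ToArray(); // still one copy out per call
        }
    }
}
```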

漫雪独思 2024-08-22 04:24:57


I agree with some of the other posters here that you might want to try and use tricks to work with the .NET Framework instead of trying to force it to work with you via GC.Collect.

You may find this Channel 9 video helpful which discusses ways to ease pressure on the Garbage collector.

愿与i 2024-08-22 04:24:57


If you really need to use the LOH for something like a service or something that needs to be running for a long time, you need to use buffer pools that are never deallocated and that you can ideally allocate on start-up. This means you'll have to do your 'memory management' yourself for this, of course.
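The never-deallocated buffer pool described here can be sketched as a fixed set of large arrays allocated at start-up and handed out on demand (class and method names are illustrative, not a framework API):

```csharp
using System.Collections.Generic;

// Sketch of a buffer pool allocated once at start-up: the large
// arrays live for the life of the process and are checked out and
// returned by callers, so the LOH never churns.
class LargeBufferPool
{
    readonly Stack<byte[]> free = new Stack<byte[]>();

    public LargeBufferPool(int count, int size)
    {
        for (int i = 0; i < count; i++)
            free.Push(new byte[size]); // all large allocations happen here, once
    }

    public byte[] Rent()
    {
        lock (free)
            return free.Count > 0 ? free.Pop() : null; // caller must handle exhaustion
    }

    public void Return(byte[] buffer)
    {
        lock (free)
            free.Push(buffer);
    }
}
```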

Depending on what you're doing with this memory, you might also have to p/Invoke over to native code for selected parts to avoid having to call some .NET API that forces you to put the data on newly allocated space in the LOH.

This is a good starting point article about the issues: https://devblogs.microsoft.com/dotnet/using-gc-efficiently-part-3/

I'd consider you very lucky if your GC trick works, and it would really only work if there isn't much going on in the system at the same time. If you have work going on in parallel, this will just slightly delay the inevitable.

Also read up on the documentation for GC.Collect. IIRC, GC.Collect(n) only says that it collects no further than generation n, not that it actually ever GETS to generation n.
