Collections and memory
I have an application that reads 3-4 GB of data, builds an entity out of each line, and then stores them in Lists.
The problem I had is that memory grows insanely, to something like 13 to 15 GB. Why does storing these entities take so much memory?
So I built a tree and did something similar to Huffman encoding, and the overall memory size became around 200-300 MB.
I understand that I compacted the data. But I wasn't expecting that storing objects in a list would increase the memory so much. Why did that happen?
How about other data structures like dictionary, stack, queue, array, etc.?
Where can I find more information about the internals and memory allocations of data structures?
Or am I doing something wrong?
3 Answers
In .NET, large objects go on the large object heap (LOH), which is not compacted. "Large" means everything above 85,000 bytes. When you grow your lists, their backing arrays will probably become larger than that and have to be reallocated once you cross the current capacity. Reallocation means they are very likely put at the end of the heap. So you end up with a very fragmented LOH and lots of memory usage.
Update: If you initialize your lists with the required capacity (which I guess you can determine from the DB), your memory consumption should go down a bit.
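To see that growth behavior concretely, here is a minimal sketch (the element type and loop bound are placeholders, not from the original answer). Each time Capacity changes, List&lt;T&gt; has allocated a new backing array and copied the old one over; once the array passes 85,000 bytes it lands on the LOH:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var list = new List<long>();
        for (int i = 0; i < 100_000; i++)
        {
            int before = list.Capacity;
            list.Add(i);
            if (list.Capacity != before)
            {
                // A new backing array was allocated and the old one copied over.
                // At 8 bytes per element, any array beyond ~10,625 elements
                // exceeds 85,000 bytes and goes on the large object heap.
                Console.WriteLine($"resized: {before} -> {list.Capacity} " +
                                  $"(~{(long)list.Capacity * sizeof(long)} bytes)");
            }
        }
    }
}
```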
Regardless of the data structure you're going to use, your memory consumption is never going to drop below the memory required to store all your data.
Have you calculated how much memory is required to store one instance of your class?
Your Huffman encoding is a space-saving optimization, which means that you are eliminating a lot of duplicated data within your class objects yourself. This has nothing to do with the data structure you use to hold your data. It depends on how your data itself is structured, so that you can take advantage of different space-saving strategies (of which Huffman encoding is one out of many possibilities, suitable for eliminating common prefixes, and the data structure used to store it is a tree).
Now, back to your question. Without optimizing your data (i.e. your objects), there are things you can watch out for to improve memory usage efficiency.
Are all your objects of similar size?
Did you simply run a loop, allocating memory on the fly and then inserting into a list, like this:
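(The code block that originally followed this sentence did not survive; below is a plausible reconstruction of the pattern being described, assuming a hypothetical Entity type and a line-based input file, both illustrative rather than from the original.)

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical entity type standing in for whatever each line parses into.
class Entity
{
    public string Value = "";
    public static Entity Parse(string line) => new Entity { Value = line };
}

class Program
{
    static void Main()
    {
        var entities = new List<Entity>();          // starts with capacity 0
        foreach (string line in File.ReadLines("data.txt"))
        {
            // Each Add that hits the current capacity forces the list to
            // allocate a bigger backing array and copy everything across.
            entities.Add(Entity.Parse(line));
        }
        Console.WriteLine(entities.Count);
    }
}
```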
In that case, your list object is constantly being expanded. If there is not enough free memory at the end to expand the list, .NET will allocate a new, larger piece of memory and copy the original array to the new memory. Essentially you end up with two pieces of memory: the original one, and the new, expanded one (now holding the list). Do this many, many times (as you obviously need to for GBs of data), and you are looking at a LOT of fragmented memory space.
You'll be better off just allocating enough memory for the entire list in one go.
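A minimal sketch of that fix, reusing the hypothetical Entity type from above (the source of expectedCount is assumed, e.g. a row count from the database):

```csharp
// Size the list once so the backing array never has to be reallocated
// while it fills. expectedCount is assumed to be known up front.
int expectedCount = 50_000_000;
var entities = new List<Entity>(expectedCount);
```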
As an afterthought, I can't help but wonder: how in the world are you going to search this HUGE list to find something you need? Shouldn't you be using something like a binary tree or a hash table to aid your searching? Maybe you are just reading in all the data, performing some processing on it, and then writing it all back out...
If you are using classes, read the response to this: Understanding CLR object size between 32 bit vs 64 bit
On 64 bits (you are using 64 bits, right?), object overhead is 16 bytes PLUS the reference to the object (someone is referencing it, right?), so another 8 bytes. So an empty object will "eat" at least 24 bytes.
If you are using List&lt;T&gt;s, remember that a List&lt;T&gt; grows by doubling, so you could be wasting much space. Other .NET collections grow in the same way. I'll add that the "pure" overhead of millions of List&lt;T&gt;s could bring the memory to its knees. Other than the 16 + 8 bytes of space "eaten" by the List&lt;T&gt; object itself, it is composed (in the .NET implementation) of 2 ints (8 bytes), a SyncRoot reference (8 bytes, normally null), and a reference to the internal array (so another 8 bytes for the reference, plus 16 bytes of header for the array itself, plus the array contents).
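To put rough numbers on that per-object overhead, here is a small measurement sketch (the Entity type is hypothetical, and exact figures vary by runtime, platform, and GC state):

```csharp
using System;

class Entity
{
    public int Id;    // 4 bytes of payload
}

class Program
{
    static void Main()
    {
        const int N = 1_000_000;
        var keep = new object[N];    // allocated up front so only the
                                     // Entity instances affect the delta

        long before = GC.GetTotalMemory(forceFullCollection: true);
        for (int i = 0; i < N; i++)
            keep[i] = new Entity { Id = i };
        long after = GC.GetTotalMemory(forceFullCollection: true);

        // On 64-bit, expect roughly 24 bytes per instance even though the
        // payload is only 4 bytes: 16 bytes of header/method-table overhead
        // plus the field padded up to the 8-byte allocation granularity.
        Console.WriteLine($"~{(after - before) / (double)N:F1} bytes per object");
        GC.KeepAlive(keep);
    }
}
```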