What are the RAM consumption characteristics of StringBuilder?
We have a few operations where we are doing a large number of large string concatenations, and we have recently encountered an out-of-memory exception. Unfortunately, debugging the code is not an option, as this is occurring at a customer site.
So, before looking into an overhaul of our code, I would like to ask: what are the RAM consumption characteristics of StringBuilder for large strings?
Especially as they compare to the standard string type. The strings are well over 10 MB in size, and we seem to run into the issues at around 20 MB.
NOTE: This is not about speed but RAM.
You might be interested in the rope data structure. The article "Ropes: Theory and practice" explains its advantages. Maybe there is an implementation for .NET.
[Update, to answer the comment]
Does it use less memory? Search the article for "memory" and you will find some hints.
Basically, yes, despite the structural overhead, because a rope only adds memory when it is needed. When a StringBuilder exhausts its buffer, it must allocate a much bigger one (which may already waste empty space) and drop the old one (which will be garbage collected, but can still occupy a lot of memory in the meantime).
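To make "only adds memory when it is needed" concrete, here is a minimal rope sketch. It is in Python only for brevity, since the thread names no .NET implementation, and the names `Leaf`, `Concat`, `concat`, `char_at` and `flatten` are illustrative assumptions. Concatenation allocates one small node and never copies existing characters. A real rope would also rebalance the tree, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    text: str

    @property
    def length(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class Concat:
    left: Rope
    right: Rope
    length: int  # cached total length of both subtrees


Rope = Union[Leaf, Concat]


def concat(a: Rope, b: Rope) -> Rope:
    # O(1): one small node is allocated; neither operand's characters are copied
    return Concat(a, b, a.length + b.length)


def char_at(r: Rope, i: int) -> str:
    # Walk down the tree using the cached lengths to pick a side
    while isinstance(r, Concat):
        if i < r.left.length:
            r = r.left
        else:
            i -= r.left.length
            r = r.right
    return r.text[i]


def flatten(r: Rope) -> str:
    # Materialise the full string only when it is actually needed
    parts: list[str] = []
    stack: list[Rope] = [r]
    while stack:
        node = stack.pop()
        if isinstance(node, Leaf):
            parts.append(node.text)
        else:
            stack.append(node.right)  # pushed first so the left side is visited first
            stack.append(node.left)
    return "".join(parts)
```

For example, concatenating `"abc"`, `"def"` and `"gh"` onto an empty leaf gives a rope of length 8. Reading index 4 yields `"e"`, and `flatten` produces `"abcdefgh"` without ever having built the intermediate strings.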
I haven't found an implementation for .NET, but there is at least one C++ implementation (in SGI's STL: http://www.sgi.com/tech/stl/Rope.html). Maybe you can leverage it. Note that the page I reference includes work on memory performance.
Note that ropes aren't a cure for every problem: their usefulness depends heavily on how you build your large strings and how you use them. The articles point out both advantages and drawbacks.
Each time StringBuilder runs out of space, it reallocates a new buffer twice the size of the original buffer, copies the old characters, and lets the old buffer get GC'd. It's possible that you're just using enough (call it x) such that 2x is larger than the memory you're allowed to allocate. You may want to determine a maximum length for your strings, and pass it to the constructor of StringBuilder so you preallocate, and you're not at the mercy of the doubling reallocation.
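A back-of-the-envelope model of the doubling behaviour follows, again in Python for brevity. It assumes that growth exactly doubles the buffer, that the initial capacity is 16, and that the old and new buffers are both alive during the copy. It is a model, not the actual .NET implementation, and it ignores older discarded buffers that the GC may not yet have reclaimed.

```python
def peak_chars_with_doubling(final_length: int, initial_capacity: int = 16) -> int:
    """Model only: peak number of char slots alive at once while growing to final_length.

    Assumes each grow allocates a buffer twice as large and copies into it before
    the old one is released, so both buffers briefly coexist.
    """
    capacity = initial_capacity
    peak = capacity
    while capacity < final_length:
        new_capacity = capacity * 2
        # Old and new buffers are both alive during the copy
        peak = max(peak, capacity + new_capacity)
        capacity = new_capacity
    return peak


def peak_chars_preallocated(final_length: int) -> int:
    # A single buffer sized up front (the StringBuilder(int capacity) idea): no grow, no copy
    return final_length
```

Under these assumptions, growing to 20,000,000 characters peaks at 50,331,648 live character slots, compared with 20,000,000 when the buffer is preallocated. A .NET `char` is 2 bytes (UTF-16), so that is roughly 100 MB versus 40 MB.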
Here is a nice study about String Concatenation vs Memory Allocation.
StringBuilder is a perfectly good solution to memory problems caused by concatenating strings.
To answer your specific question: compared with a normal string whose length equals that of the currently allocated StringBuilder buffer, a StringBuilder has only a constant-size overhead. The buffer can be up to twice the size of the resulting string, but appending to the StringBuilder triggers no further memory allocations until the buffer is full, so it really is an excellent solution.
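The claim that a doubling buffer allocates rarely and ends up less than twice the content size can be checked with a small model. It is written in Python and assumes pure doubling from a capacity of 16, with a grow jumping straight to the required size when one append needs more than double. `grow_stats` is a hypothetical helper, not a .NET API.

```python
from __future__ import annotations


def grow_stats(total_chars: int, chunk: int, initial_capacity: int = 16) -> tuple[int, int]:
    """Model only: append `chunk` chars at a time until `total_chars` are written.

    Returns (number_of_buffer_allocations, final_capacity) under a doubling-growth
    assumption. `chunk` must be positive.
    """
    capacity = initial_capacity
    allocations = 1  # the initial buffer
    length = 0
    while length < total_chars:
        length += min(chunk, total_chars - length)
        if length > capacity:
            # Grow: allocate a new buffer (double, or exactly what is needed) and copy
            capacity = max(capacity * 2, length)
            allocations += 1
    return allocations, capacity
```

Appending 1,000,000 characters in 100-character pieces takes 16 buffer allocations in this model and ends at a capacity of 1,638,400, which is under twice the content length. Naive concatenation of the same 10,000 pieces would create roughly one new string per piece. Passing the final size as `initial_capacity`, the analogue of the `StringBuilder(int capacity)` constructor, brings it down to a single allocation.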
Compared with plain string concatenation, this is outstanding.
Consider an expression built from four string literals, two strings created by method calls and one variable. It uses six separate intermediate strings, each longer than the last. If this pattern continues, the total memory allocated grows quadratically with the number of pieces, because each new intermediate copies everything before it, until the GC kicks in to clean it up.
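The cost of those intermediate strings can be counted with a simple model. It is in Python and assumes that each `+` produces a new string copying everything so far, as with `s += piece` in a loop. `naive_concat_cost` and `builder_concat_cost` are illustrative names.

```python
from __future__ import annotations


def naive_concat_cost(pieces: list[str]) -> tuple[int, int]:
    """Model only: (intermediate strings created, total chars copied) for p0 + p1 + ... step by step."""
    if not pieces:
        return 0, 0
    current = len(pieces[0])
    intermediates = 0
    copied = 0
    for p in pieces[1:]:
        current += len(p)
        intermediates += 1
        copied += current  # the new intermediate holds every character so far
    return intermediates, copied


def builder_concat_cost(pieces: list[str]) -> int:
    # Appending into one sufficiently large buffer: each character is written once
    return sum(len(p) for p in pieces)
```

Seven 10-character pieces give 6 intermediate strings and 270 characters copied, compared with 70 for a single buffer. With 1,000 such pieces, the naive total is 5,004,990 characters versus 10,000.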
I don't know the exact memory pattern of StringBuilder, but plain string concatenation is not an option.
When you concatenate plain strings, every concatenation creates another couple of string objects. Memory consumption skyrockets, and the garbage collector gets called too often.