What are the RAM consumption characteristics of StringBuilder?

Posted on 2024-07-06 00:34:28 · 264 words · 13 views · 0 comments

We have a few operations where we are doing a large number of large string concatenations, and have recently encountered an out of memory exception. Unfortunately, debugging the code is not an option, as this is occurring at a customer site.

So, before looking into an overhaul of our code, I would like to ask: what are the RAM consumption characteristics of StringBuilder for large strings?

In particular, how do they compare to the standard string type? The strings are well over 10 MB in size, and we seem to run into the issues at around 20 MB.

NOTE: This is not about speed but RAM.

Comments (5)

不弃不离 2024-07-13 00:34:28

You might be interested in the rope data structure. The article "Ropes: Theory and practice" explains their advantages. Maybe there is an implementation for .NET.

[Update, to answer the comment]
Does it use less memory? Search for "memory" in the article and you will find some hints.
Basically, yes, despite the structural overhead, because a rope only adds memory when needed. A StringBuilder, when it exhausts its old buffer, must allocate a much bigger one (which can already waste unused memory) and drop the old one (which will be garbage collected, but can still occupy a lot of memory in the meantime).

I haven't found an implementation for .NET, but there is at least a C++ implementation (in SGI's STL: http://www.sgi.com/tech/stl/Rope.html). Maybe you can leverage this implementation. Note that the page I reference includes work on memory performance.

Note that ropes aren't the cure for all problems: their usefulness depends heavily on how you build your large strings and how you use them. The articles point out advantages and drawbacks.
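To make the idea concrete, here is a minimal sketch of a rope in C#. It is not taken from the article or from SGI's STL; the class and member names (Rope, FromString, Concat, WriteTo) are made up for illustration, and a real rope would also rebalance the tree and support indexing and substrings.

```csharp
using System;
using System.Text;

// Hypothetical minimal rope: a binary tree whose leaves hold string pieces.
public abstract class Rope
{
    public abstract int Length { get; }

    // Appends this rope's characters, left to right, to the given builder.
    public abstract void WriteTo(StringBuilder sb);

    public static Rope FromString(string s) { return new Leaf(s); }

    // Concatenation allocates one small node and copies no characters.
    public static Rope Concat(Rope left, Rope right) { return new Node(left, right); }

    // Flattening is the only step that needs a buffer for the whole text.
    public override string ToString()
    {
        var sb = new StringBuilder(Length);
        WriteTo(sb);
        return sb.ToString();
    }

    private sealed class Leaf : Rope
    {
        private readonly string text;
        public Leaf(string text) { this.text = text; }
        public override int Length { get { return text.Length; } }
        public override void WriteTo(StringBuilder sb) { sb.Append(text); }
    }

    private sealed class Node : Rope
    {
        private readonly Rope left;
        private readonly Rope right;
        private readonly int length;

        public Node(Rope left, Rope right)
        {
            this.left = left;
            this.right = right;
            this.length = left.Length + right.Length;
        }

        public override int Length { get { return length; } }

        // Recursive walk; an unbalanced rope with very many nodes could
        // recurse deeply, which is one reason real ropes rebalance.
        public override void WriteTo(StringBuilder sb)
        {
            left.WriteTo(sb);
            right.WriteTo(sb);
        }
    }
}
```

If the result can be written out piece by piece (to a file or a network stream, say), the flattening step can be skipped entirely, which is where the memory saving would come from.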

挽心 2024-07-13 00:34:28

Each time StringBuilder runs out of space, it reallocates a new buffer twice the size of the original buffer, copies the old characters, and lets the old buffer get GC'd. It's possible that you're just using enough (call it x) such that 2x is larger than the memory you're allowed to allocate. You may want to determine a maximum length for your strings, and pass it to the constructor of StringBuilder so you preallocate, and you're not at the mercy of the doubling reallocation.
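The preallocation advice can be checked with a small sketch like the one below. The helper name CountCapacityChanges and the sizes are assumptions chosen for illustration. The exact growth pattern depends on the runtime: the doubling described above matches older .NET versions, while, as far as I know, .NET 4 and later store the text in linked chunks, so the number of capacity changes will differ between versions.

```csharp
using System;
using System.Text;

public static class PreallocationSketch
{
    // Appends fixed-size pieces until the builder holds totalChars characters
    // and returns how many times its reported capacity changed on the way.
    public static int CountCapacityChanges(StringBuilder sb, int totalChars, int pieceSize)
    {
        string piece = new string('x', pieceSize);
        int changes = 0;
        int lastCapacity = sb.Capacity;

        while (sb.Length < totalChars)
        {
            sb.Append(piece);
            if (sb.Capacity != lastCapacity)
            {
                changes++;
                lastCapacity = sb.Capacity;
            }
        }
        return changes;
    }
}
```

Calling CountCapacityChanges(new StringBuilder(), 1000000, 1000) and then CountCapacityChanges(new StringBuilder(1000000), 1000000, 1000) should show the difference: the builder constructed with the final size up front has no reason to grow.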

明月夜 2024-07-13 00:34:28

Here is a nice study about String Concatenation vs Memory Allocation.

If you can avoid concatenating, do it!

This is a no-brainer: if you don't have to concatenate but want your source code to look nice, use the first method. It will get optimized as if it were a single string.

Don't ever use += to concatenate.

Too many changes take place behind the scenes that aren't obvious from my code in the first place. I advise using String.Concat() explicitly with any overload (2 strings, 3 strings, string array) instead. This clearly shows what your code does without any surprises, while letting you keep a check on the efficiency.

Try to estimate the target size of a StringBuilder.

The more accurately you can estimate the needed size, the fewer temporary strings the StringBuilder will have to create to grow its internal buffer.

Do not use any Format() methods when performance is an issue.

Too much overhead is involved in parsing the format, when you could construct an array out of pieces if all you are using are {x} replacements. Format() is good for readability, but it is one of the things to go when you are squeezing all possible performance out of your application.
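As a rough illustration of the last two tips, the sketch below builds the same text once with String.Format and once with an explicit String.Concat overload. The class name, method names, and the message text are invented for this example; it makes no claim about how large the difference is in practice.

```csharp
using System;

public static class ConcatVsFormat
{
    // Readable, but the format string has to be parsed on every call.
    public static string WithFormat(string name, int count)
    {
        return String.Format("{0} has {1} items", name, count);
    }

    // Same result from explicit pieces, using the four-string Concat overload.
    public static string WithConcat(string name, int count)
    {
        return String.Concat(name, " has ", count.ToString(), " items");
    }
}
```

Both methods return the same string for the same arguments, so the choice between them is purely one of readability versus overhead.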

眼眸里的那抹悲凉 2024-07-13 00:34:28

StringBuilder is a perfectly good solution to memory problems caused by concatenating strings.

To answer your specific question: a StringBuilder has a constant-size overhead compared to a normal string whose length equals the length of the builder's currently allocated buffer. The buffer could potentially be twice the size of the resulting string, but no further memory allocations are made when appending to the StringBuilder until the buffer is full, so it is really an excellent solution.

Compared with plain string concatenation, this is outstanding:

string output = "Test";
output += ", printed on " + datePrinted.ToString();
output += ", verified by " + verificationName;
output += ", number lines: " + numberLines.ToString();

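This code has four strings that are stored as literals in the code, two that are created by method calls, and one that comes from a variable, but it uses six separate intermediate strings which get longer and longer. If this pattern continues, every append copies everything built so far, so the total memory allocated grows roughly quadratically with the number of appends (not exponentially, but still far faster than the final string's size) until the GC kicks in to clean it up.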
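For comparison, the same line could be built with a StringBuilder roughly as follows. This is a sketch only: the method name Build, the parameter types (DateTime, string, int), and the initial capacity of 128 are assumptions, since the original snippet does not show how datePrinted, verificationName, and numberLines are declared.

```csharp
using System;
using System.Text;

public static class ReportLine
{
    public static string Build(DateTime datePrinted, string verificationName, int numberLines)
    {
        // 128 is a guessed upper bound for this short line; for large strings,
        // pass the best available estimate of the final length instead.
        var output = new StringBuilder(128);
        output.Append("Test");
        output.Append(", printed on ").Append(datePrinted.ToString());
        output.Append(", verified by ").Append(verificationName);
        output.Append(", number lines: ").Append(numberLines);

        // The only full-length string is created here, once.
        return output.ToString();
    }
}
```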

趴在窗边数星星i 2024-07-13 00:34:28

I don't know the exact memory pattern of StringBuilder, but the common string is not an option.

When you use the common string, every concatenation creates another couple of string objects, and memory consumption skyrockets, causing the garbage collector to be called too often.

string a = "a";

// creates an object holding "a"

a += "b";

// creates an object holding "b", creates an object holding "ab", and assigns the "ab" object to the reference a
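The point that += never modifies the existing string can be verified with a few lines. The class and method names below are made up for the demonstration.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true when += produced a new object and left the old one intact.
    public static bool AppendCreatesNewObject()
    {
        string a = "a";
        string before = a;   // second reference to the same string object
        a += "b";            // builds a new string; the old one is untouched

        return !Object.ReferenceEquals(before, a)
            && before == "a"
            && a == "ab";
    }
}
```

With 10 MB strings, each such step means a fresh allocation of the full new length while the previous copy is still waiting to be collected.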