How does StringBuilder work internally in C#?

Posted 2024-11-17 21:26:01

How does StringBuilder work?

What does it do internally? Does it use unsafe code?
And why is it so fast (compared to the + operator)?

Comments (4)

绝情姑娘 2024-11-24 21:26:01

When you use the + operator to build up a string:

string s = "01";
s += "02";
s += "03";
s += "04";

then on the first concatenation we make a new string of length four and copy "01" and "02" into it -- four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it -- six characters are copied. On the third concat, we make a string of length eight and copy "010203" and "04" into it -- eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.

...
s += "99";

On the 98th concat we make a string of length 198 and copy "010203...98" and "99" into it. That gives us a total of 4 + 6 + 8 + ... + 198 = 9,898 characters copied, which is a lot, in order to make this 198-character string.
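The arithmetic above can be checked with a small sketch. It assumes, as the answer does, that every += allocates a new string and copies both operands into it; the class and method names here are made up for illustration.

```csharp
using System;

public static class ConcatCost
{
    // Counts the characters copied when `pieces` strings of `pieceLength`
    // characters each are joined one at a time with +=, assuming every
    // concatenation copies both operands into a freshly allocated string.
    public static long CharsCopiedByRepeatedConcat(int pieces, int pieceLength)
    {
        long copied = 0;
        long length = pieceLength;       // the initial string, e.g. "01"
        for (int i = 2; i <= pieces; i++)
        {
            length += pieceLength;       // length of the newly built string
            copied += length;            // every character of it is copied
        }
        return copied;
    }

    public static void Main()
    {
        Console.WriteLine(CharsCopiedByRepeatedConcat(4, 2));   // 18
        Console.WriteLine(CharsCopiedByRepeatedConcat(99, 2));  // 9898
    }
}
```

The cost grows roughly with the square of the number of pieces, which is why the gap widens quickly in long loops.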

A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.

What happens when the guess is wrong and the array gets full? There are two strategies. In earlier versions of the framework (before .NET 4), the string builder reallocated and copied the array when it got full, doubling its size. In the newer implementation, the string builder maintains a linked list of relatively small arrays, and appends a new array onto the end of the list when the old one gets full.
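To make the two strategies concrete, here is a toy sketch of each. These are hypothetical classes written for illustration, not the framework's actual code: the names, the 16-character sizes, and the use of LinkedList<char[]> are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Strategy 1 (toy): a single array that is doubled and copied when full.
public sealed class DoublingBuilder
{
    private char[] _buffer = new char[16];
    private int _length;

    public void Append(string value)
    {
        int needed = _length + value.Length;
        if (needed > _buffer.Length)
        {
            // Grow to at least double; Array.Resize allocates and copies.
            Array.Resize(ref _buffer, Math.Max(needed, _buffer.Length * 2));
        }
        value.CopyTo(0, _buffer, _length, value.Length);
        _length = needed;
    }

    public override string ToString() => new string(_buffer, 0, _length);
}

// Strategy 2 (toy): a linked list of small arrays; a full array is never
// copied again until the final string is produced.
public sealed class ChunkedBuilder
{
    private const int ChunkSize = 16;
    private readonly LinkedList<char[]> _chunks = new LinkedList<char[]>();
    private int _usedInLast;

    public ChunkedBuilder()
    {
        _chunks.AddLast(new char[ChunkSize]);
    }

    public void Append(string value)
    {
        int offset = 0;
        while (offset < value.Length)
        {
            if (_usedInLast == ChunkSize)
            {
                _chunks.AddLast(new char[ChunkSize]);   // old chunks stay put
                _usedInLast = 0;
            }
            int count = Math.Min(ChunkSize - _usedInLast, value.Length - offset);
            value.CopyTo(offset, _chunks.Last.Value, _usedInLast, count);
            _usedInLast += count;
            offset += count;
        }
    }

    public override string ToString()
    {
        var result = new char[(_chunks.Count - 1) * ChunkSize + _usedInLast];
        int position = 0;
        foreach (char[] chunk in _chunks)
        {
            // Every chunk except the last is full; the last holds the remainder.
            int count = Math.Min(ChunkSize, result.Length - position);
            Array.Copy(chunk, 0, result, position, count);
            position += count;
        }
        return new string(result);
    }
}
```

Both toys produce the same string for the same sequence of Append calls. The difference is only in what happens when space runs out: the first copies everything written so far into a bigger array, the second just allocates another small array.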

Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.

拥抱影子 2024-11-24 21:26:01

StringBuilder's implementation has changed between versions, I believe. Fundamentally though, it maintains a mutable structure of some form. I believe it used to use a string which was still being mutated (using internal methods) and would just make sure it would never be mutated after it was returned.

The reason StringBuilder is faster than using string concatenation in a loop is precisely because of the mutability - it doesn't require a new string to be constructed after each mutation, which would mean copying all the data within the string etc.

For just a single concatenation, it's actually slightly more efficient to use + than to use StringBuilder. It's only when you're performing multiple operations and you don't really need the intermediate results that StringBuilder shines.
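As a rough illustration of that rule of thumb, the sketch below uses StringBuilder for a loop of many appends and plain + for a one-off concatenation. The class and method names are invented for the example; only StringBuilder.Append and ToString come from the framework.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Many appends, intermediate results not needed: StringBuilder fits.
    public static string JoinNumbers(int count)
    {
        var sb = new StringBuilder();
        for (int i = 1; i <= count; i++)
        {
            sb.Append(i.ToString("D2"));   // "01", "02", ...
        }
        return sb.ToString();              // one final immutable string
    }

    // A single concatenation: plain + is simpler and at least as cheap.
    public static string FullName(string first, string last)
    {
        return first + " " + last;
    }

    public static void Main()
    {
        Console.WriteLine(JoinNumbers(4));             // 01020304
        Console.WriteLine(JoinNumbers(99).Length);     // 198
        Console.WriteLine(FullName("Ada", "Lovelace"));
    }
}
```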

See my article on StringBuilder for more information.

若沐 2024-11-24 21:26:01

The Microsoft CLR does perform some operations via internal calls (not quite the same as unsafe code). The biggest performance benefit over a bunch of + concatenated strings is that it writes to a char[] and doesn't create as many intermediate strings. When you call ToString(), it builds a completed, immutable string from your contents.

静若繁花 2024-11-24 21:26:01

The StringBuilder uses a string buffer that can be altered, compared to a regular String that can't be. When you call the ToString method of the StringBuilder it will just freeze the string buffer and convert it into a regular string, so it doesn't have to copy all the data one extra time.

As the StringBuilder can alter the string buffer, it doesn't have to create a new string value for each and every change to the string data. When you use the + operator, the compiler turns that into a String.Concat call that creates a new string object. This seemingly innocent piece of code:

str += ",";

compiles into this:

str = String.Concat(str, ",");
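A small sketch of what that means in practice: after +=, the variable points at a new string object and the original string is left unchanged. The class and variable names are only for illustration.

```csharp
using System;

public static class ConcatDemo
{
    public static void Main()
    {
        string original = "a,b";
        string str = original;       // both variables refer to the same object

        str += ",";                  // same effect as String.Concat(str, ",")

        string viaConcat = String.Concat(original, ",");

        Console.WriteLine(original);                              // a,b
        Console.WriteLine(str);                                   // a,b,
        Console.WriteLine(str == viaConcat);                      // True
        Console.WriteLine(object.ReferenceEquals(original, str)); // False
    }
}
```

Because strings are immutable, each += inside a loop has to produce another new object like this one, which is the copying that StringBuilder avoids.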