Adding strings in C#: how does the compiler do it?
A = string.Concat("abc","def")
B = "abc" + "def"
A vs. B
Lately I have been confused about why many people say that A is definitely much faster than B. The thing is, they only say so because somebody else said so, or because that is just the way it is. I suppose I can get a much better explanation here.
How does the compiler treat these strings?
Thank you!
The very first thing I did when I joined the C# compiler team was I rewrote the optimizer for string concatenations. Good times.
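As already noted, concatenations of constant strings are done at compile time. Non-constant strings get some fancy treatment: a chain of + operators is flattened into a single call to String.Concat that receives all the operands.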
The benefits of these optimizations are that the String.Concat method can look at all the arguments, determine the sum of their lengths, and then make one big string that can hold all the results.
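As a rough illustration of the point above, the sketch below compares a chain of + with the single String.Concat call it can be expressed as. The class and method names (ConcatShapes, WithPlus, WithConcat, WithArray) are made up for this example, and the code only shows that the results agree; it does not inspect what the compiler actually emits.

```csharp
using System;

public static class ConcatShapes
{
    // What you write: a chain of + over non-constant strings.
    public static string WithPlus(string a, string b, string c) => a + b + c;

    // One call that receives every operand at once, so the total
    // length can be computed before the result string is allocated.
    public static string WithConcat(string a, string b, string c) => string.Concat(a, b, c);

    // For longer chains, the array overload of String.Concat takes all the pieces.
    public static string WithArray(params string[] parts) => string.Concat(parts);

    public static void Main()
    {
        Console.WriteLine(WithPlus("x", "y", "z") == WithConcat("x", "y", "z"));
        Console.WriteLine(WithArray("a", "b", "c", "d", "e"));
    }
}
```

To see the real transformation, compile the first method and look at the generated IL with a decompiler.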
Here's an interesting one. Suppose you have a method M that returns a string, and you concatenate its result with the empty string:
s = M() + "";
If M() returns null then the result is the empty string. (null + empty is empty.) If M() does not return null then the result is unchanged by the concatenation of the empty string. Therefore, this is actually optimized so that it is not a call to String.Concat at all! It becomes
s = M() ?? "";
Neat, eh?
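A small sketch of why that rewrite is safe: for both a null and a non-null input, appending the empty string gives the same value as the null-coalescing form. The names EmptyConcat, PlusEmpty and Coalesce are hypothetical, chosen only for this example.

```csharp
using System;

public static class EmptyConcat
{
    // Appending "" turns null into the empty string and leaves other values unchanged.
    public static string PlusEmpty(string s) => s + "";

    // The null-coalescing form has the same observable result.
    public static string Coalesce(string s) => s ?? "";

    public static void Main()
    {
        Console.WriteLine(PlusEmpty(null) == Coalesce(null));
        Console.WriteLine(PlusEmpty("abc") == Coalesce("abc"));
    }
}
```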
Read this: The Sad Tragedy of Micro-Optimization Theater (Coding Horror)
In C#, the addition operator for strings is just syntactic sugar for String.Concat. You can verify that by opening the output assembly in Reflector.
Another thing to note is that if you have string literals (or constants) in your code, as in the example, the compiler even changes this to
B = "abcdef"
But if you use String.Concat with two string literals or constants, String.Concat will still be called, skipping the optimization, so the + operation would actually be faster.
So, to sum it up:
stringA + stringB becomes String.Concat(stringA, stringB).
"abc" + "def" becomes "abcdef".
String.Concat("abc", "def") stays the same.
Something else I just had to try:
In C++/CLI, "abc" + "def" + "ghi" is actually translated to String.Concat(String.Concat("abc", "def"), "ghi").
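One way to check the literal case without a decompiler, sketched under the assumption that a plain C# compiler is used: a const field only accepts a compile-time constant, so "abc" + "def" is allowed there while a call to String.Concat is not. The class name ConstantFolding is hypothetical.

```csharp
using System;

public static class ConstantFolding
{
    // Compiles: "abc" + "def" is a constant expression, evaluated by the compiler.
    public const string B = "abc" + "def";

    // A method call is not a constant expression, so this line would be rejected
    // by the compiler if it were uncommented:
    // public const string A = string.Concat("abc", "def");

    // The call has to run at execution time instead.
    public static readonly string A = string.Concat("abc", "def");

    public static void Main()
    {
        // Both variants produce the same text; only the time of evaluation differs.
        Console.WriteLine(A == B);
    }
}
```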
Actually, B is resolved at compile time. You will end up with
B = "abcdef"
whereas for A, the concatenation is postponed until execution time.
In this particular case, the two are actually identical. The compiler will transform the second variant, the one using the + operator, into a call to Concat, the first variant.
Well, that is, if the two actually concatenated string variables.
This code:
B = "abc" + "def"
actually transforms into this, without concatenation at all:
B = "abcdef"
This can be done because the result of the addition can be computed at compile time, so the compiler does this.
However, if you were to concatenate two string variables instead, once with String.Concat and once with +, then those two would generate the same code.
However, I would like to know exactly what those "many" said, as I think it is something different.
What I think they said is that string concatenation is bad, and you should use StringBuilder or similar.
For instance, if you append to a string with + inside a loop, then for each iteration through the loop, you'll build one new string, and let the old one become eligible for garbage collection.
Additionally, each such new string will have all the contents of the old one copied into it, which means you'll be moving a large amount of memory around.
Whereas the same loop written with StringBuilder will instead use an internal buffer that is larger than what is needed, just in case you need to append more text to it. When that buffer becomes full, a new, larger one will be allocated, and the old one left for garbage collection.
So in terms of memory use and CPU usage, the latter variant is much better.
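The two loop styles being compared might look like the sketch below. The class name LoopConcat, the method names and the "test" payload are illustrative assumptions rather than the original poster's code; both methods return the same text and differ only in how many intermediate strings they allocate.

```csharp
using System;
using System.Text;

public static class LoopConcat
{
    // Repeated +: every iteration allocates a new string and copies
    // everything accumulated so far into it.
    public static string WithPlus(int count)
    {
        string s = "";
        for (int i = 0; i < count; i++)
        {
            s += "test";
        }
        return s;
    }

    // StringBuilder: appends go into an internal buffer that grows as needed,
    // and one final string is produced by ToString().
    public static string WithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("test");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithPlus(1000) == WithBuilder(1000));
    }
}
```

For a handful of iterations the difference is unlikely to matter; it is the long-running loop that makes the copying cost add up, so measure before switching.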
Other than that, I would try to avoid focusing too much on "is code variant X better than Y", beyond what you already have experience with. For instance, I use StringBuilder now just because I'm aware of the case, but that isn't to say that all the code I write that uses it actually needs it.
Try to avoid spending time micro-optimizing your code, until you know you have a bottleneck. At that time, the usual tip about measure first, cut later, is still in effect.
If the strings are literals, as in your question, then the concatenation of the strings assigned to B will be done at compile time. Your example translates to:
B = "abcdef"
If the strings aren't literals, then the compiler will translate the + operator into a Concat call. So for two string variables (say x and y, names chosen here for illustration), this...
B = x + y
...is translated to this at compile time:
B = String.Concat(x, y)