Pretending .NET strings are value types
In .NET, strings are immutable reference types. This often surprises newer .NET developers, who may mistake them for value types because of how they behave. However, other than the practice of using StringBuilder for long concatenations, especially in loops, is there any reason in practice to know this distinction?
What real-world scenarios are helped or avoided by understanding the value-reference distinction with regard to .NET strings vs. just pretending/misunderstanding them to be value types?
Strings were deliberately designed so that, as a programmer, you shouldn't need to worry too much about this. In many situations, you can just assign, move, copy and change strings without thinking about the intricate consequences that arise when another reference to the same object exists and is changed at the same time (as happens with ordinary object references).
String parameters in a method call
(EDIT: this section added later)
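When strings are passed to a method, the reference is passed by value: the method gets its own copy of the reference, not a copy of the characters. When the string is only read in the method body, nothing special happens. When the method "changes" it, for example with text += "!", a new string is created and assigned to the local parameter. That parameter refers to the new string for the rest of the method, while the caller's variable still refers to the original string. This looks like copy-on-write, but the original string is never copied or modified.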
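A minimal C# sketch of this behaviour, written with top-level statements (C# 9 or later). The method and variable names are made up for illustration. It also shows the ref variant that the following paragraph discusses:

```csharp
using System;

// Without ref: the method receives a copy of the caller's reference.
// "Changing" the string creates a new string and repoints only the local parameter.
static void AppendWithoutRef(string text)
{
    text += " world";
}

// With ref: the method works on the caller's variable itself,
// so assigning a new string to it is visible to the caller.
static void AppendWithRef(ref string text)
{
    text += " world";
}

string greeting = "hello";
AppendWithoutRef(greeting);
Console.WriteLine(greeting);      // hello

string greetingRef = "hello";
AppendWithRef(ref greetingRef);
Console.WriteLine(greetingRef);   // hello world
```

In both methods the original "hello" object is left untouched. The only difference is which variable ends up pointing at the new string.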
What troubles juniors is that they are used to objects being references: when a method changes an object passed to it, the caller sees the change. To get the same effect with strings, they need the ref keyword, which lets the method replace the caller's string reference with a new one. Without ref, nothing the method body does can change the caller's string.
On StringBuilder
This distinction is important, but beginner programmers are usually better off not knowing too much about it. Using StringBuilder when you do a lot of string "building" is good, but often your application has bigger fish to fry, and the small performance gain of StringBuilder is negligible. Be wary of programmers who tell you that all string manipulation should be done with StringBuilder.
As a very rough rule of thumb: StringBuilder has some creation cost, but appending is cheap. A string is cheap to create, but concatenation is relatively expensive, because every concatenation allocates a new string and copies both parts into it. The turning point is around 400-500 concatenations, depending on size: after that, StringBuilder becomes more efficient.
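The two patterns being compared can be sketched in C# as follows (top-level statements, C# 9 or later; the helper names are illustrative). The sketch only shows that both produce the same result. It does not measure the crossover point mentioned above:

```csharp
using System;
using System.Text;

// Repeated concatenation: each += allocates a new string and copies
// everything built so far, so the total copying grows quadratically.
static string JoinWithConcat(int count)
{
    string result = "";
    for (int i = 0; i < count; i++)
    {
        result += i + ",";
    }
    return result;
}

// StringBuilder: one builder is created up front, Append writes into its
// internal buffer, and ToString produces the final string once.
static string JoinWithBuilder(int count)
{
    var sb = new StringBuilder();
    for (int i = 0; i < count; i++)
    {
        sb.Append(i).Append(',');
    }
    return sb.ToString();
}

Console.WriteLine(JoinWithConcat(5));  // 0,1,2,3,4,
Console.WriteLine(JoinWithBuilder(5)); // 0,1,2,3,4,
```

For a handful of pieces the simpler concatenation is perfectly fine. The builder only pays off once the number of appends gets large.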
More on StringBuilder vs string performance
EDIT: based on a comment from Konrad Rudolph, I added this section.
If the previous rule of thumb makes you wonder, consider the following slightly more detailed explanations:
When code mixes concatenation (+), StringBuilder.Append, String.Format, ToString and other string operations, using StringBuilder is hardly ever effective.
So, when is it efficient? It helps when many small strings are appended, for instance when serializing data to a file, and you don't need to change the data once it has been written to the StringBuilder. It also helps when many methods need to append something. Because StringBuilder is a reference type, they can all append to the same instance, whereas every change to a string produces a new string.
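A minimal sketch of the "many methods append to one builder" case (top-level statements, C# 9 or later). WriteHeader and WriteRow are hypothetical helpers standing in for serialization code. The example builds a string in memory instead of writing to a file:

```csharp
using System;
using System.Text;

// Hypothetical helpers: each one appends its part to the same builder.
// Because StringBuilder is a reference type, they all write into one
// shared buffer; nothing has to be returned and re-concatenated.
static void WriteHeader(StringBuilder output)
{
    output.Append("id;name").Append('\n');
}

static void WriteRow(StringBuilder output, int id, string name)
{
    output.Append(id).Append(';').Append(name).Append('\n');
}

var builder = new StringBuilder();
WriteHeader(builder);
WriteRow(builder, 1, "Alice");
WriteRow(builder, 2, "Bob");

string text = builder.ToString(); // the final string is created once, here
Console.Write(text);
```

With plain strings, each helper would have to return a new string for the caller to concatenate, or take it by ref.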
On interned strings
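The pitfall this section describes can be reproduced with a minimal C# sketch (top-level statements, C# 9 or later; variable names are illustrative). A literal and an equal string built at run time compare equal with ==, but not with ReferenceEquals, until both are passed through String.Intern:

```csharp
using System;

// A literal from the source code: string literals are normally interned.
string literal = "hello";

// An equal string built at run time from a char array: a separate object.
string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

bool valueEqual = literal == built;                       // compares characters
bool sameObject = object.ReferenceEquals(literal, built); // compares references

// String.Intern returns the single pooled instance for this character sequence,
// so interning both sides makes the reference comparison succeed.
bool sameAfterIntern = object.ReferenceEquals(string.Intern(literal), string.Intern(built));

Console.WriteLine($"== {valueEqual}, ReferenceEquals {sameObject}, after Intern {sameAfterIntern}");
// == True, ReferenceEquals False, after Intern True
```

Only the == result means "these strings have the same value". The ReferenceEquals result depends on where each string came from.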
A problem arises, and not only for junior programmers, when they try a reference comparison and find that the result is sometimes true and sometimes false in seemingly identical situations. What happened? String literals are interned: they are added to a global pool, so two equal interned strings point to the same memory address and a reference comparison returns true. Reference-comparing two equal strings where one is interned and the other is not yields false. When dealing with strings, use == or Equals, and do not play around with ReferenceEquals.
On String.Empty
A related oddity sometimes occurs with String.Empty. The static String.Empty is always interned, whereas a string assigned to a variable at run time generally is not. However, a variable initialized with "" or String.Empty typically points to the same memory address as String.Empty. Result: a string variable that you intend to change later returns true when compared with String.Empty using ReferenceEquals, while you might expect false instead.
In depth
You basically asked what situations can trip up the uninitiated. I think my point boils down to avoiding object.ReferenceEquals, because it cannot be trusted when used with strings. The reason is that string interning is applied to strings that are constant in the code, but not always, and you cannot rely on this behavior. Although String.Empty and "" are always interned, a string value computed at run time generally is not. Different optimization options (debug vs. release and others) can yield different results.
When do you need ReferenceEquals anyway? With objects it makes sense, but with strings it does not. Teach anybody working with strings to avoid it unless they also understand unsafe code and pinned objects.
Performance
When performance is important, you may find that strings are not actually immutable (unsafe code can modify them in place) and that StringBuilder is not always the fastest approach.
A lot of the information I used here is detailed in this excellent article on strings, along with a "how to" for manipulating strings in place (mutable strings).
Update: added code sample
Update: added 'in depth' section (hope someone finds this useful ;)
Update: added some links, added section on string params
Update: added estimation for when to switch from strings to StringBuilder
Update: added an extra section on StringBuilder vs String performance, after a remark by Konrad Rudolph
The only distinction that really matters for most code is the fact that null can be assigned to string variables.
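A short illustration of that difference (top-level statements, C# 9 or later; names are illustrative). With nullable reference types enabled, the first line produces a compiler warning but still compiles:

```csharp
using System;

string name = null;          // allowed: string is a reference type
// int count = null;         // would not compile: int is a value type
int? maybeCount = null;      // a value type needs Nullable<T> to hold null

// Calling a member on a null string throws NullReferenceException, so check first.
bool isMissing = string.IsNullOrEmpty(name);
int length = name?.Length ?? 0;   // null-conditional access: 0 when name is null

Console.WriteLine($"{isMissing} {length} {maybeCount.HasValue}"); // True 0 False
```

Code that "pretends" strings are value types tends to forget the null case, which is where NullReferenceException comes from.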
An immutable class acts like a value type in all common situations, and you can do quite a lot of programming without caring much about the difference.
It's when you dig a little deeper and care about performance that the distinction becomes really useful. For example, it helps to know that although passing a string as a parameter to a method acts as if a copy of the string were created, no copying actually takes place. This might surprise people used to languages where strings actually are value types (like VB6?), where passing a lot of strings as parameters would be bad for performance.
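A minimal sketch showing that no copy is made (top-level statements, C# 9 or later). PassThrough is a hypothetical helper that simply returns the parameter it received:

```csharp
using System;

// A large string built at run time: 1,000,000 'x' characters.
string big = new string('x', 1_000_000);

// Hypothetical helper that hands back the reference it was given.
static string PassThrough(string parameter) => parameter;

string returned = PassThrough(big);

// Same object: only the reference was passed; the characters were not copied.
Console.WriteLine(object.ReferenceEquals(returned, big)); // True
Console.WriteLine(returned.Length);                       // 1000000
```

Passing the string cost one reference, regardless of its length.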
String is a special breed. It is a reference type, yet most coders use it as a value type. Making it immutable and using the intern pool optimizes memory usage, which would be huge if it were a pure value type.
Further reading:
C# .NET String object is really by reference? on Stack Overflow
String.Intern Method on MSDN
string (C# Reference) on MSDN
Update: Please refer to abel's comment on this post. It corrected my misleading statement.