Pretending .NET strings are value types

Posted on 2024-08-10 03:23:46

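In .NET, strings are immutable and are reference types. This often surprises newer .NET developers, who may mistake them for value types because of how they behave. However, apart from the practice of using a StringBuilder for long concatenations, especially in loops, is there any practical reason to know this distinction?

Which real-world scenarios are helped or avoided by understanding the value/reference distinction for .NET strings, as opposed to simply pretending (or mistakenly believing) that they are value types?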

弥繁 2024-08-17 03:23:46

The design of strings was deliberately such that you shouldn't need to worry too much about it as a programmer. In many situations, this means that you can just assign, move, copy, change strings without thinking too much of the possible intricate consequences if another reference to your string existed and would be changed at the same time (as happens with object references).

String parameters in a method call

(EDIT: this section added later)
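When a string is passed to a method, only a reference is handed over (the reference itself is passed by value, as with any reference type). As long as the string is only read in the method body, nothing special happens. But because strings are immutable, any "change" produces a new string object, and assigning it to the parameter only rebinds the method's own local copy of the reference. The effect resembles copy-on-write.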

What troubles juniors is that they are used to objects being references, so that a method which modifies a passed object also changes it for the caller. To get the same effect with strings, they need to use the ref keyword. This allows the string reference itself to be changed and handed back to the calling function. Without it, the caller's string cannot be changed by the method body:

void ChangeBad(string s)      { s = "hello world"; }
void ChangeGood(ref string s) { s = "hello world"; }

// in calling method:
string s1 = "hi";
ChangeBad(s1);       // s1 remains "hi" on return, this is often confusing
ChangeGood(ref s1);  // s1 changes to "hello world" on return
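
As a hedged illustration of the same point: string methods never modify the instance they are called on, they return a new string, so the usual alternative to ref is simply to return the result. The class and method names below are invented for this sketch and are not part of the original answer.

```csharp
using System;

static class StringParameterDemo
{
    // Hypothetical helper: "s += ..." builds a new string and rebinds only
    // the method's local copy of the reference; the caller sees no change.
    public static void AppendBad(string s) { s += " world"; }

    // Returning the new string is usually simpler than using ref.
    public static string AppendGood(string s) { return s + " world"; }

    public static void Main()
    {
        string greeting = "hello";

        AppendBad(greeting);
        Console.WriteLine(greeting);   // hello

        greeting = AppendGood(greeting);
        Console.WriteLine(greeting);   // hello world

        // ToUpperInvariant returns a new string; the original is untouched.
        string upper = greeting.ToUpperInvariant();
        Console.WriteLine(greeting);   // hello world
        Console.WriteLine(upper);      // HELLO WORLD
    }
}
```

In practice this is where the "pretend it is a value type" mental model works well: a string handed to a method cannot be altered behind the caller's back.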

On StringBuilder

This distinction is important, but beginner programmers are usually better off not knowing too much about it. Using StringBuilder when you do a lot of string "building" is good, but often your application will have bigger fish to fry, and the small performance gain from StringBuilder is negligible. Be wary of programmers who tell you that all string manipulation should be done using StringBuilder.

As a very rough rule of thumb: StringBuilder has some creation cost, but appending is cheap. String has a cheap creation cost, but concatenation is relatively expensive. The turning point is around 400-500 concatenations, depending on size: after that, StringBuilder becomes more efficient.
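
A minimal sketch for checking this rule of thumb on your own machine, rather than trusting any fixed number. The class name, the loop sizes and the single-character payload are assumptions made for illustration; timings vary with runtime, hardware and string size, so no expected output is given.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ConcatDemo
{
    // Each += allocates a new string and copies the old contents into it.
    public static string WithConcat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++) s += "x";
        return s;
    }

    // StringBuilder appends into an internal buffer and builds the string once.
    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.Append("x");
        return sb.ToString();
    }

    public static void Main()
    {
        foreach (int n in new[] { 10, 500, 20000 })
        {
            var sw = Stopwatch.StartNew();
            WithConcat(n);
            long concatTicks = sw.ElapsedTicks;

            sw.Restart();
            WithBuilder(n);
            long builderTicks = sw.ElapsedTicks;

            Console.WriteLine("n={0}: concat {1} ticks, builder {2} ticks",
                n, concatTicks, builderTicks);
        }
    }
}
```

A single pass like this is noisy (JIT warm-up, garbage collection), so treat the numbers as a rough indication only.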

More on StringBuilder vs string performance

EDIT: based on a comment from Konrad Rudolph, I added this section.

If the previous rule of thumb makes you wonder, consider the following slightly more detailed explanations:

StringBuilder with many small string appends outruns string concatenation rather quickly (30 to 50 appends), but at around 2µs, even a 100% performance gain is often negligible (save for some rare situations);
StringBuilder with a few large string appends (strings of 80 characters or more) outruns string concatenation only after thousands, sometimes hundreds of thousands, of iterations, and the difference is often just a few percent;
Mixing string operations (replace, insert, substring, regex etc.) often makes StringBuilder and string concatenation perform about equally;
String concatenation of constants can be optimized away by the compiler, the CLR or the JIT; this is not possible for StringBuilder;
Code often mixes concatenation with +, StringBuilder.Append, String.Format, ToString and other string operations; using StringBuilder in such cases is hardly ever effective.

So, when is it efficient? In cases where many small strings are appended, for example when serializing data to a file, and when you don't need to change the data again once it has been "written" to the StringBuilder. And in cases where many methods need to append something, because StringBuilder is a mutable reference type, whereas every change to a string creates a new copy.
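
The "many methods need to append something" case might look like the following sketch. The CSV-like format and all names are hypothetical, chosen only to show several methods appending to one shared StringBuilder instance.

```csharp
using System;
using System.Text;

static class ReportDemo
{
    // Each helper mutates the same StringBuilder instance it receives;
    // no ref keyword and no return value is needed.
    static void AppendHeader(StringBuilder sb)
    {
        sb.AppendLine("id,name");
    }

    static void AppendRow(StringBuilder sb, int id, string name)
    {
        sb.Append(id).Append(',').AppendLine(name);
    }

    public static string Build(int rows)
    {
        var sb = new StringBuilder();
        AppendHeader(sb);
        for (int i = 1; i <= rows; i++)
        {
            AppendRow(sb, i, "item" + i);
        }
        return sb.ToString(); // the final string is created once, here
    }

    public static void Main()
    {
        Console.Write(Build(3));
    }
}
```

Doing the same with plain strings would require every helper to return a new string (or take a ref parameter), copying the accumulated text each time.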

On interned strings

A problem arises, and not only for junior programmers, when they try a reference comparison and find that the result is sometimes true and sometimes false in seemingly identical situations. What happened? When the compiler interns strings and adds them to the global intern pool, two equal strings can point to the same memory address, so a reference comparison yields true. Comparing (by reference!) two equal strings, one interned and one not, yields false. Use the == operator or Equals, and do not play around with ReferenceEquals when dealing with strings.
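
A small sketch of the difference, assuming one string that is built at run time (and therefore is a separate object that starts out outside the intern pool). The names are invented for illustration; String.Intern is the standard framework method.

```csharp
using System;

static class InternDemo
{
    // Builds an equal string at run time; the result is always a new object.
    public static string BuildAtRuntime()
    {
        return new string(new[] { 's', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g' });
    }

    public static void Main()
    {
        string literal = "something";
        string built = BuildAtRuntime();

        // Value comparisons agree:
        Console.WriteLine(literal == built);       // True
        Console.WriteLine(literal.Equals(built));  // True

        // Reference comparison does not: two distinct objects.
        Console.WriteLine(object.ReferenceEquals(literal, built)); // False

        // After interning, equal strings map to one shared instance.
        string a = string.Intern(BuildAtRuntime());
        string b = string.Intern(BuildAtRuntime());
        Console.WriteLine(object.ReferenceEquals(a, b)); // True
    }
}
```

Whether an interned string is also the same object as a given literal depends on how the runtime treated that literal, which is exactly why reference comparison on strings is unreliable.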

On String.Empty

A related oddity sometimes shows up with String.Empty: the static String.Empty is always interned, but a variable whose value is assigned at run time is not. However, by default the compiler will make an empty literal refer to String.Empty, so both point to the same memory address. Result: a string variable that you would expect to be a distinct object returns true when compared with ReferenceEquals, while you might expect false.

// emptiness is treated differently:
string empty1 = String.Empty;
string empty2 = "";
string nonEmpty1 = "something";
string nonEmpty2 = "something";

// yields false (debug) true (release)
bool compareNonEmpty = object.ReferenceEquals(nonEmpty1, nonEmpty2);

// yields true (debug) false (release, depends on .NET version and how it's assigned)
bool compareEmpty = object.ReferenceEquals(empty1, empty2);

In depth

You basically asked which situations can trip up the uninitiated. My point boils down to avoiding object.ReferenceEquals, because it cannot be trusted when used with strings. The reason is that string interning is applied to strings that are constant in the code, but not always. You cannot rely on this behavior. Although String.Empty and "" are always interned, a string is not interned when the compiler believes its value can change. Different optimization options (debug vs release and others) will yield different results.

When do you need ReferenceEquals anyway? With objects it makes sense, but with strings it does not. Teach anybody working with strings to avoid its usage unless they also understand unsafe and pinned objects.

Performance

When performance is important, you may discover that strings are not strictly immutable (they can be modified in place) and that using StringBuilder is not always the fastest approach.

Much of the information I used here is detailed in this excellent article on strings, along with a "how to" for manipulating strings in place (mutable strings).

Update: added code sample
Update: added 'in depth' section (hope someone finds this useful ;)
Update: added some links, added section on string params
Update: added estimation for when to switch from strings to stringbuilder
Update: added an extra section on StringBuilder vs String performance, after a remark by Konrad Rudolph
