String immutability in C#

Posted on 2024-09-16 01:48:17

I was curious how the StringBuilder class is implemented internally, so I decided to check out Mono's source code and compare it with the Reflector-disassembled code of Microsoft's implementation. Essentially, Microsoft's implementation uses a char[] to store the string representation internally, and a bunch of unsafe methods to manipulate it. This is straightforward and did not raise any questions. But I was confused when I found that Mono uses a string inside StringBuilder:

private int _length;
private string _str;

The first thought was: "What a senseless StringBuilder". But then I figured out that it is possible to mutate a string using pointers:

public StringBuilder Append (string value) 
{
     // ...
     String.CharCopy (_str, _length, value, 0, value.Length);
}

internal static unsafe void CharCopy (char *dest, char *src, int count) 
{
    // ...
    ((short*)dest) [0] = ((short*)src) [0]; dest++; src++;
}    

I used to program in C/C++ a little, so I can't say that this code confused me much, but I thought that strings were completely immutable (i.e., that there is absolutely no way to mutate them). So the actual questions are:

  • Can I create a completely immutable type?
  • Is there any reason to use such code apart from performance concerns?
    (unsafe code to change immutable types)
  • Are strings then inherently thread-safe or not?

Comments (6)

記憶穿過時間隧道 2024-09-23 01:48:17

Can I create a completely immutable type?

You can create a type where the CLR enforces immutability on it. You can then use "unsafe" to turn off the CLR enforcement mechanisms. That's why "unsafe" is called "unsafe" - because it turns off the safety system. In unsafe code every single byte of memory in the process can be writable if you try hard enough, including both the immutable bytes and the code in the CLR which enforces immutability.

You can also use Reflection to break immutability. Both Reflection and unsafe code require an extremely high level of trust to be granted.
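As a rough illustration of the reflection route, here is a minimal sketch. `Point` and `ReflectionDemo` are hypothetical names made up for this example, and it assumes a runtime that lets fully trusted code write instance readonly fields through reflection (static readonly fields can behave differently).

```csharp
using System;
using System.Reflection;

// Hypothetical type for illustration: immutable through its public API.
public sealed class Point
{
    private readonly int _x;
    public Point(int x) { _x = x; }
    public int X => _x;
}

public static class ReflectionDemo
{
    // Overwrites the private readonly field behind the type's back.
    public static int OverwriteX(Point p, int newValue)
    {
        FieldInfo field = typeof(Point).GetField(
            "_x", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(p, newValue);
        return p.X; // now reports newValue
    }
}
```

Nothing in `Point` itself allows this change; the immutability holds only for callers that stay within the public API.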

Is there any reason to use such code apart from performance concerns?

Sure, there are lots of reasons to use immutable data structures. Immutable data structures rock. Some good reasons to use immutable data structures:

  • immutable data structures are easier to reason about than mutable data structures. When you ask "is this list empty?" and you get an answer then you know that answer is correct not just now, but forever. With mutable data structures you cannot actually ask "is this list empty?" All you can ask is "is this list empty right now?" and then the answer logically answers the question "was this list empty at some point in the past?"

The fact that the answer to a question about an immutable type stays true forever has security implications. Suppose you have code like this:

void Frob(Bar bar)
{
    if (!IsSafe(bar)) throw something;
    DoSomethingDangerous(bar);
}

If Bar is a mutable type then there is a race condition here; bar could be made unsafe on another thread after the check but before something dangerous happens. If Bar is an immutable type then the answer to the question stays the same throughout, which is much safer. (Imagine if you could mutate a string containing a path after the security check but before the file was opened, for example.)
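The race can be made deterministic for illustration. In the sketch below all type and method names are hypothetical, and the `interfere` callback merely stands in for another thread that runs between the check and the use.

```csharp
using System;

// Hypothetical types, for illustration only.
public sealed class MutableRequest
{
    public string Path { get; set; } = "";
}

public sealed class ImmutableRequest
{
    public ImmutableRequest(string path) { Path = path; }
    public string Path { get; }
}

public static class CheckThenAct
{
    // Toy safety check: reject paths that try to climb out of a directory.
    static bool IsSafe(string path) => !path.Contains("..");

    public static string FrobMutable(MutableRequest r, Action interfere)
    {
        if (!IsSafe(r.Path)) throw new ArgumentException("unsafe path");
        interfere();   // "another thread" gets a chance to run here
        return r.Path; // may no longer be the value that was checked
    }

    public static string FrobImmutable(ImmutableRequest r, Action interfere)
    {
        if (!IsSafe(r.Path)) throw new ArgumentException("unsafe path");
        interfere();   // has no way to change r.Path
        return r.Path; // the same value that was checked
    }
}
```

Calling `FrobMutable(m, () => m.Path = "../secret")` returns the unchecked path, while `ImmutableRequest` offers the callback no setter to do the same.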

  • methods which take immutable data structures as their arguments and return them as their results and perform no side effects are called "pure methods". Pure methods can be memoized, which trades increased memory use for increased speed, often enormously increased speed.

  • immutable data structures can often be used on multiple threads simultaneously without locking. Locking is there to prevent creation of inconsistent state of an object in the face of a mutation, but immutable objects don't have mutations. (Some so-called immutable data structures are logically immutable but actually do mutations inside themselves; imagine for example a lookup table which does not change its contents, but does reorganize its internal structure if it can deduce what the next query is likely to be. Such a data structure would not be automatically threadsafe.)

  • immutable data structures that efficiently re-use their internal parts when a new structure is built from an old one make it easy to "take a snapshot" of the state of a program without wasting lots of memory. That makes undo-redo operations trivial to implement. It makes it easier to write debugging tools that can show you how you got to a particular program state.

  • and so on.
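To make the memoization point concrete, here is a small sketch. `Memo.Memoize` is a hypothetical helper, not a framework API, and it is only valid if the wrapped function is pure and its argument type has value-like equality (as string does).

```csharp
using System;
using System.Collections.Generic;

public static class Memo
{
    // Wraps a pure function so that each distinct argument is computed once.
    // Note: this sketch is not thread-safe; the cache itself is mutable state.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> pure)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            TResult result;
            if (!cache.TryGetValue(arg, out result))
            {
                result = pure(arg);
                cache[arg] = result;
            }
            return result;
        };
    }
}
```

Because a string key can never change after it is stored, the cached answer for that key stays valid for as long as the cache lives; a mutable key could silently invalidate the cache.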

Are strings then inherently thread-safe or not?

If everyone plays by the rules, they are. If someone uses unsafe code or private reflection then there is no rule enforcement anymore. You have to trust that if someone is using high-privilege code then they are doing so correctly and not mutating a string. Use your power to run unsafe code only for good; with great power comes great responsibility.

So do I need to use locks or not?

That is a strange question. Remember, locks are co-operative. Locks only work if everyone accessing a particular object agrees upon the locking strategy that must be used.

You have to use locks if the agreed-upon locking strategy for accessing a particular object in a particular storage location is to use locks. If that isn't the agreed-upon locking strategy then using locks is pointless; you're carefully locking and unlocking the front door while someone else is walking in the open back door.

If you have a string which you know is being mutated by unsafe code, and you don't want to see inconsistent partial mutations, and the code which is doing the unsafe mutation documents that it takes out a particular lock during that mutation, then yes, you need to use locks when accessing that string. But this situation is very rare; ideally no one would use unsafe code to manipulate a string accessible by other code on another thread, because doing so is an incredibly bad idea. That's why we require that code that does so is fully trusted. And that's why we require that the C# source code for such a function wave a big red flag that says "this code is unsafe, review it carefully!"

稀香 2024-09-23 01:48:17

If you go unsafe, it is possible to mutate strings in C# too (IIRC).
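A sketch of what that can look like follows. It deliberately uses `MemoryMarshal` rather than the `unsafe` keyword and raw pointers, so it compiles without the unsafe compiler switch, but it is just as unchecked. It assumes a runtime that provides `MemoryMarshal.CreateSpan` (.NET Core 2.1 or later); `StringMutation` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

public static class StringMutation
{
    // Overwrites the first character of an existing string in place.
    // Never do this to a string that other code can see.
    public static void OverwriteFirstChar(string s, char c)
    {
        // AsSpan yields a read-only view; MemoryMarshal reinterprets the
        // same memory as a writable span, bypassing the type system.
        Span<char> writable = MemoryMarshal.CreateSpan(
            ref MemoryMarshal.GetReference(s.AsSpan()), s.Length);
        writable[0] = c;
    }
}
```

For example, after `string s = new string('a', 3); StringMutation.OverwriteFirstChar(s, 'b');` the same string object reads "baa". The example builds the string at run time on purpose: literals are interned and shared, so mutating one would affect every use of that literal in the program.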

如何视而不见 2024-09-23 01:48:17

There is no completely immutable type. A class is immutable because it doesn't allow any outside code to alter it. Using reflection or unsafe code, you can still change its values.

You can use the readonly keyword to create an immutable variable, but that works only for value types. If you use it on a reference type, it's only the reference that is protected, not the object that it points to.
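A short sketch of that distinction, using a hypothetical `Holder` class: the readonly field can never point at a different list, yet the list it points at can still be changed.

```csharp
using System.Collections.Generic;

public sealed class Holder
{
    // readonly fixes which list the field refers to, not the list's contents.
    private readonly List<int> _items = new List<int> { 1, 2, 3 };

    public int Count => _items.Count;

    public void Add(int value)
    {
        _items.Add(value);            // allowed: mutates the referenced object
        // _items = new List<int>();  // compile error CS0191: field is readonly
    }
}
```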

There are several reasons for immutable types, like performance and robustness.

The fact that strings are known to be immutable (outside the StringBuilder) means that the compiler can make optimisations based on that. The compiler never has to produce code to copy a string to protect it from being changed when it's passed as a parameter.

Objects created from immutable types can also be safely passed between threads. As they can't be changed, there is no risk of different threads changing them at the same time, so there is no need to synchronise access to them.

Immutable types can be used to avoid coding errors. If you know that a value should not be changed, it's generally a good idea to make sure that it can't be changed by mistake.

执笔绘流年 2024-09-23 01:48:17

There is no black magic at work here. The string class is immutable simply because it doesn't have any public fields, properties or methods that allow you to modify the internal string. Any method that appears to change a string actually returns a new string instance. You can of course do the same with your own classes.
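A quick sketch of that behaviour, with `StringDemo` as a hypothetical wrapper so the results can be inspected:

```csharp
using System;

public static class StringDemo
{
    public static string[] Run()
    {
        string original = "hello";
        string upper = original.ToUpperInvariant();   // returns a new string
        string replaced = original.Replace('l', 'L'); // returns a new string

        // original is untouched by both calls.
        return new[] { original, upper, replaced };
    }
}
```

`Run()` returns "hello", "HELLO" and "heLLo": the two derived strings are separate instances, and the original keeps its value.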

夏末染殇 2024-09-23 01:48:17

Can I create a completely immutable type?

Yes. Have a constructor to set private fields, get-only properties, and no methods.
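A minimal sketch of that recipe is shown here. `Money` is a hypothetical type, and the `WithAmount` helper goes slightly beyond the recipe: it is a method, but one that never modifies the object it is called on.

```csharp
public sealed class Money
{
    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Get-only auto-properties: assignable only in the constructor.
    public decimal Amount { get; }
    public string Currency { get; }

    // "Changing" a value produces a new object; this one is left as it was.
    public Money WithAmount(decimal amount) => new Money(amount, Currency);
}
```

Sealing the class is a design choice in this sketch: it prevents a subclass from adding mutable state behind the same type.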

Is there any reason to use such code apart from performance concerns?

One example: such types don't require locks to be safely used from multiple concurrent threads, which makes correct code easier to write (no locks to get wrong).

Additional: it is always possible for sufficiently privileged code to bypass .NET protections: either reflection to read and write to private fields, or unsafe code to directly manipulate an object's memory.

This is also true outside of .NET: a privileged process (i.e. one with a process or thread token holding one of the "God" privileges, e.g. Take Ownership) can break into any other process, load DLLs, inject threads running arbitrary code, and read or write memory (including overriding execute prevention, etc.). The integrity of the system is only as strong as the cooperation of the owner of the system.
