Immutable objects that reference each other?

Posted 2024-12-08 10:38:02

Today I was trying to wrap my head around immutable objects that reference each other. I came to the conclusion that you can't possibly do that without using lazy evaluation but in the process I wrote this (in my opinion) interesting code.

public class A
{
    public string Name { get; private set; }
    public B B { get; private set; }
    public A()
    {
        B = new B(this);
        Name = "test";
    }
}

public class B
{
    public A A { get; private set; }
    public B(A a)
    {
        // a.Name is still null here: A's constructor has not assigned it yet
        A = a;
    }
}

What I find interesting is that I cannot think of another way to observe an object of type A in a state in which it is not yet fully constructed, and that includes using threads. Why is this even valid? Are there any other ways to observe the state of an object that is not fully constructed?

公布 2024-12-15 10:38:02

Why is this even valid?

Why do you expect it to be invalid?

Because a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object.

Correct. But the compiler is not responsible for maintaining that invariant. You are. If you write code that breaks that invariant, and it hurts when you do that, then stop doing that.

Are there any other ways to observe the state of an object that is not fully constructed?

Sure. For reference types, all of them involve somehow passing "this" out of the constructor, obviously, since the only user code that holds the reference to the storage is the constructor. Some ways the constructor can leak "this" are:

Put "this" in a static field and reference it from another thread.
Make a method call or constructor call and pass "this" as an argument.
Make a virtual call; this is particularly nasty if the virtual method is overridden by a derived class, because then it runs before the derived class ctor body runs.
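To make the third case concrete, here is a minimal sketch with hypothetical Base and Derived types (they are not from the answer): the base constructor makes a virtual call, and the override reads a field that Derived's constructor body has not assigned yet, so it sees null.

```csharp
#nullable enable
using System;

var d = new Derived("ready");
Console.WriteLine(Base.SeenDuringConstruction ?? "<null>"); // prints "<null>"
Console.WriteLine(d.Describe());                            // prints "ready"

class Base
{
    // Records what Describe() returned while Base's constructor was running.
    public static string? SeenDuringConstruction;

    protected Base()
    {
        // Virtual call from a constructor: this dispatches to Derived.Describe,
        // even though Derived's constructor body has not run yet.
        SeenDuringConstruction = Describe();
    }

    public virtual string? Describe() => "base";
}

class Derived : Base
{
    private readonly string _label;

    // The Base() constructor runs before this body.
    public Derived(string label)
    {
        _label = label;
    }

    public override string? Describe() => _label;
}
```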

I said that the only user code that holds a reference is the ctor, but of course the garbage collector also holds a reference. Therefore, another interesting way in which an object can be observed to be in a half-constructed state is if the object has a destructor, and the constructor throws an exception (or gets an asynchronous exception like a thread abort; more on that later.) In that case, the object is about to be dead and therefore needs to be finalized, but the finalizer thread can see the half-initialized state of the object. And now we are back in user code that can see the half-constructed object!

Destructors are required to be robust in the face of this scenario. A destructor must not depend on any invariant of the object set up by the constructor being maintained, because the object being destroyed might never have been fully constructed.
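The finalizer scenario can be sketched as follows, assuming a hypothetical Resource type whose constructor throws before it assigns its only field. GC timing is not guaranteed, so on some runs the finalizer may not have executed by the time the result is printed. The finalizer is written to be robust in the sense described above: it checks the field instead of assuming the constructor finished.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;

Demo.CreateAndAbandon();
GC.Collect();                  // the half-constructed object is unreachable, so it is queued for finalization
GC.WaitForPendingFinalizers(); // wait for the finalizer thread to process the queue
Console.WriteLine(Resource.FinalizerSaw ?? "finalizer has not run");

static class Demo
{
    // NoInlining keeps the abandoned object out of the caller's stack frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void CreateAndAbandon()
    {
        try
        {
            _ = new Resource("db", fail: true);
        }
        catch (InvalidOperationException)
        {
            // The constructor threw, but the object had already been allocated
            // and registered for finalization.
        }
    }
}

// Hypothetical type with a finalizer whose constructor can fail part-way.
sealed class Resource
{
    public static volatile string? FinalizerSaw;

    private readonly string? _name;

    public Resource(string name, bool fail)
    {
        if (fail) throw new InvalidOperationException("constructor failed");
        _name = name; // never reached when fail is true
    }

    ~Resource()
    {
        // Robust finalizer: it does not assume the constructor completed.
        FinalizerSaw = _name is null
            ? "half-constructed (_name is null)"
            : $"fully constructed ({_name})";
    }
}
```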

Another crazy way that a half-constructed object could be observed by outside code is of course if the destructor sees the half-initialized object in the scenario above, and then copies a reference to that object to a static field, thereby ensuring that the half-constructed, half-finalized object is rescued from death. Please do not do that. Like I said, if it hurts, don't do it.

If you're in the constructor of a value type then things are basically the same, but there are some small differences in the mechanism. The language requires that a constructor call on a value type create a temporary variable that only the ctor has access to, mutate that variable, and then do a struct copy of the mutated value to the actual storage. That ensures that if the constructor throws, then the final storage is not in a half-mutated state.

Note that since struct copies are not guaranteed to be atomic, it is possible for another thread to see the storage in a half-mutated state; use locks correctly if you are in that situation. Also, it is possible for an asynchronous exception like a thread abort to be thrown halfway through a struct copy. These non-atomicity problems arise regardless of whether the copy is from a ctor temporary or a "regular" copy. And in general, very few invariants are maintained if there are asynchronous exceptions.

In practice, the C# compiler will optimize away the temporary allocation and copy if it can determine that there is no way for that scenario to arise. For example, if the new value is initializing a local that is not closed over by a lambda and not in an iterator block, then S s = new S(123); just mutates s directly.
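A minimal sketch of the temporary-then-copy behaviour, using a hypothetical Pair struct whose constructor assigns X and then throws. The target is a static field, i.e. heap storage rather than an unshared local, so the optimization just described does not apply: the half-mutated temporary is discarded and the field keeps its old value.

```csharp
#nullable enable
using System;

Holder.Value = new Pair(1, 1);
try
{
    // The constructor mutates the temporary's X, then throws before the copy.
    Holder.Value = new Pair(5, -1);
}
catch (ArgumentOutOfRangeException)
{
}
Console.WriteLine($"{Holder.Value.X}, {Holder.Value.Y}"); // prints "1, 1"

static class Holder
{
    public static Pair Value;
}

struct Pair
{
    public int X;
    public int Y;

    public Pair(int x, int y)
    {
        X = x;                                              // partially builds the temporary...
        if (y < 0) throw new ArgumentOutOfRangeException(nameof(y));
        Y = y;                                              // ...never reached when y < 0
    }
}
```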

For more information on how value type constructors work, see:

Debunking another myth about value types

And for more information on how C# language semantics try to save you from yourself, see:

Why Do Initializers Run In The Opposite Order As Constructors? Part One

Why Do Initializers Run In The Opposite Order As Constructors? Part Two

I seem to have strayed from the topic at hand. In a struct you can of course observe an object to be half-constructed in the same ways -- copy the half-constructed object to a static field, call a method with "this" as an argument, and so on. (Obviously calling a virtual method on a more derived type is not a problem with structs.) And, as I said, the copy from the temporary to the final storage is not atomic and therefore another thread can observe the half-copied struct.

Now let's consider the root cause of your question: how do you make immutable objects that reference each other?

Typically, as you've discovered, you don't. If you have two immutable objects that reference each other then logically they form a directed cyclic graph. You might consider simply building an immutable directed graph! Doing so is quite easy. An immutable directed graph consists of:

An immutable list of immutable nodes, each of which contains a value.
An immutable list of immutable node pairs, each of which has the start and end point of a graph edge.

Now the way you make nodes A and B "reference" each other is:

var A = new Node("A");
var B = new Node("B");
var G = Graph.Empty.AddNode(A).AddNode(B).AddEdge(A, B).AddEdge(B, A);

And you're done, you've got a graph where A and B "reference" each other.
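Here is one way such an immutable graph could look in C#. Graph.Empty, AddNode, AddEdge and Node mirror the snippet above; the Edge record, the Successors method and the copy-on-write arrays are assumptions made for this sketch, not an existing library API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var a = new Node("A");
var b = new Node("B");
var g = Graph.Empty.AddNode(a).AddNode(b).AddEdge(a, b).AddEdge(b, a);

// Neither node holds a reference to the other; the edges live in g.
Console.WriteLine(string.Join(", ", g.Successors(a).Select(n => n.Value))); // prints "B"
Console.WriteLine(string.Join(", ", g.Successors(b).Select(n => n.Value))); // prints "A"

// Immutable node holding a value; nodes are compared by reference identity.
sealed class Node
{
    public string Value { get; }
    public Node(string value) => Value = value;
}

// An edge is an immutable (start, end) pair.
sealed record Edge(Node From, Node To);

// Immutable directed graph: every Add* returns a new graph and leaves this one unchanged.
sealed class Graph
{
    public static readonly Graph Empty = new(Array.Empty<Node>(), Array.Empty<Edge>());

    private readonly Node[] _nodes;
    private readonly Edge[] _edges;

    private Graph(Node[] nodes, Edge[] edges)
    {
        _nodes = nodes;
        _edges = edges;
    }

    public Graph AddNode(Node node) => new(_nodes.Append(node).ToArray(), _edges);

    public Graph AddEdge(Node from, Node to)
    {
        if (!_nodes.Contains(from) || !_nodes.Contains(to))
            throw new ArgumentException("Both endpoints must already be nodes of the graph.");
        return new(_nodes, _edges.Append(new Edge(from, to)).ToArray());
    }

    // Following an edge requires the graph, not just the node.
    public IEnumerable<Node> Successors(Node node) =>
        _edges.Where(e => ReferenceEquals(e.From, node)).Select(e => e.To);
}
```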

The problem, of course, is that you cannot get to B from A without having G in hand. Having that extra level of indirection might be unacceptable.
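If that indirection is unacceptable, the lazy evaluation mentioned in the question is one alternative. In this sketch (hypothetical A and B types), A's constructor stores a Lazy<B> whose factory captures "this" but does not run until B is first read, so B's constructor only ever sees a fully constructed A. "this" still escapes into the closure, so the sketch is only safe as long as nothing reads B from inside A's constructor.

```csharp
#nullable enable
using System;

var a = new A("test", owner => new B(owner));
Console.WriteLine(a.B.A.Name);                        // prints "test"
Console.WriteLine(object.ReferenceEquals(a.B.A, a));  // prints "True"

sealed class A
{
    private readonly Lazy<B> _b;

    public string Name { get; }

    // B is created on first access, i.e. after A's constructor has finished.
    public B B => _b.Value;

    public A(string name, Func<A, B> createB)
    {
        Name = name;
        // The lambda captures "this" but is not invoked during construction.
        _b = new Lazy<B>(() => createB(this));
    }
}

sealed class B
{
    public A A { get; }
    public B(A a) => A = a;
}
```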

眼趣 2024-12-15 10:38:02

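Yes, this is the only way for two immutable objects to reference each other: at least one of them has to see the other in a state that is not fully constructed.

It's generally a bad idea to let "this" escape from your constructor, but if you're confident about what both constructors do, and it's the only alternative to mutability, I don't think it's too bad.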

浅暮の光 2024-12-15 10:38:02

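"Fully constructed" is defined by your code, not by the language.

This is a variation on calling a virtual method from the constructor,
and the general guideline is: don't do that.

To implement the notion of "fully constructed" correctly, don't pass "this" out of your constructor.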

请别遗忘我 2024-12-15 10:38:02

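Indeed, leaking the "this" reference during the constructor will allow you to do this; obviously, it can cause problems if methods are invoked on the incomplete object. As for "other ways to observe the state of an object that is not fully constructed":

Calling a virtual method in a constructor: the subclass constructor has not been called yet, so an override may try to access incomplete state (fields declared in the subclass and assigned in its constructor, etc.)
Reflection, perhaps using FormatterServices.GetUninitializedObject (which creates an object without calling any constructor at all)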
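A small sketch of the reflection route, using a hypothetical Person type. Newer .NET versions mark FormatterServices as obsolete, so this sketch uses RuntimeHelpers.GetUninitializedObject, which serves the same purpose: it returns a zeroed instance without running any constructor or field initializer.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;

// Allocates a zeroed Person: no constructor and no field initializer runs.
var p = (Person)RuntimeHelpers.GetUninitializedObject(typeof(Person));
Console.WriteLine(p.Name is null); // prints "True"
Console.WriteLine(p.Id);           // prints "0", not 42

// Hypothetical type whose constructor guarantees a non-null Name.
sealed class Person
{
    public readonly int Id = 42;     // field initializer: skipped as well
    public string Name { get; }

    public Person(string name) =>
        Name = name ?? throw new ArgumentNullException(nameof(name));
}
```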

天暗了我发光 2024-12-15 10:38:02

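If you consider the initialization order:

Derived static field initializers
Derived static constructor
Derived instance field initializers
Base static field initializers
Base static constructor
Base instance field initializers
Base instance constructor
Derived instance constructor

then clearly, through up-casting, you can access the class before the derived instance constructor is called. (This is the reason you shouldn't call virtual methods from constructors: they can easily access derived fields that are only initialized by the derived class's constructor, and that constructor has not yet had a chance to bring the derived class into a "consistent" state.)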
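The order listed above can be checked with a sketch like the following. Base, Derived and the Log helper are hypothetical; each field initializer and constructor records a step when it runs. Both classes declare explicit static constructors, which keeps the timing of static initialization precise (without one, the runtime has more freedom about when static field initializers run).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

_ = new Derived();
foreach (var step in Log.Steps)
    Console.WriteLine(step);

// Hypothetical helper: initializers call Record so we can see when they run.
static class Log
{
    public static readonly List<string> Steps = new();

    public static int Record(string step)
    {
        Steps.Add(step);
        return 0;
    }
}

class Base
{
    public static readonly int BaseStatic = Log.Record("Base static field initializer");
    public readonly int BaseInstance = Log.Record("Base instance field initializer");

    static Base() => Log.Record("Base static constructor");
    public Base() => Log.Record("Base instance constructor");
}

class Derived : Base
{
    public static readonly int DerivedStatic = Log.Record("Derived static field initializer");
    public readonly int DerivedInstance = Log.Record("Derived instance field initializer");

    static Derived() => Log.Record("Derived static constructor");
    public Derived() => Log.Record("Derived instance constructor");
}
```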

夜无邪 2024-12-15 10:38:02

You can avoid the problem by instantiating B last in your constructor:

public A()
{
    Name = "test";
    B = new B(this);
}

If what you suggest were not possible, then A would not be immutable.

Edit: fixed, thanks to leppie.
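A runnable version of the reordered constructor, in which B records what it observes (NameSeenAtConstruction and OwnerBWasNull are additions for illustration). The fix is partial: inside B's constructor, a.B is still null, because A has not assigned it yet.

```csharp
#nullable enable
using System;

var a = new A();
Console.WriteLine(a.B.NameSeenAtConstruction); // prints "test"
Console.WriteLine(a.B.OwnerBWasNull);          // prints "True"

sealed class A
{
    public string Name { get; }
    public B B { get; }

    public A()
    {
        Name = "test";     // finish A's own state first...
        B = new B(this);   // ...then let "this" escape
    }
}

sealed class B
{
    public A A { get; }
    public string? NameSeenAtConstruction { get; }
    public bool OwnerBWasNull { get; }

    public B(A a)
    {
        NameSeenAtConstruction = a.Name; // already "test"
        OwnerBWasNull = a.B is null;     // still true: A has not assigned B yet
        A = a;
    }
}
```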

掀纱窥君容 2024-12-15 10:38:02

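The principle is: don't let your "this" object escape from the constructor body.

Another way to observe this kind of problem is to call a virtual method inside the constructor.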

暮倦 2024-12-15 10:38:02

As noted, the compiler has no means of knowing at what point an object has been constructed well enough to be useful; it therefore assumes that a programmer who passes "this" out of a constructor will know whether an object has been constructed well enough to satisfy his needs.

I would add, however, that for objects which are intended to be truly immutable, one must avoid passing "this" to any code which will examine the state of a field before it has been assigned its final value. This implies that "this" not be passed to arbitrary outside code, but it does not imply that there is anything wrong with having an object under construction pass itself to another object for the purpose of storing a back-reference which will not actually be used until after the first constructor has completed.
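The back-reference case can be sketched with a hypothetical immutable TreeNode: the parent passes "this" to each child's constructor, and the child only stores it. The parent's Children property is not assigned until all the children exist, so a child that dereferenced its parent during construction would find Children still null.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var root = new TreeNode("root", new[] { "left", "right" });
foreach (var child in root.Children)
    Console.WriteLine($"{child.Label} -> parent {child.Parent?.Label}");
// prints "left -> parent root" then "right -> parent root"

sealed class TreeNode
{
    public string Label { get; }
    public TreeNode? Parent { get; }
    public IReadOnlyList<TreeNode> Children { get; }

    public TreeNode(string label, IEnumerable<string> childLabels)
        : this(label, null, childLabels) { }

    private TreeNode(string label, TreeNode? parent, IEnumerable<string> childLabels)
    {
        Label = label;
        Parent = parent; // only stored, never dereferenced here
        // "this" escapes to each child, which merely keeps the reference.
        Children = childLabels
            .Select(l => new TreeNode(l, this, Array.Empty<string>()))
            .ToList()
            .AsReadOnly();
    }
}
```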

If one were designing a language to facilitate the construction and use of immutable objects, it might be helpful for the language to allow methods to be declared as usable only during construction, only after construction, or either; fields could be declared as being non-dereferenceable during construction and read-only afterward; parameters could likewise be tagged to indicate that they should be non-dereferenceable. Under such a system, a compiler could allow the construction of data structures which referred to each other, but where no property could ever change after it was observed. As to whether the benefits of such static checking would outweigh the cost, I'm not sure, but it might be interesting.

Incidentally, a related feature which would be helpful would be the ability to declare parameters and function returns as ephemeral, returnable, or (the default) persistable. If a parameter or function return were declared ephemeral, it could not be copied to any field nor passed as a persistable parameter to any method. Additionally, passing an ephemeral or returnable value as a returnable parameter to a method would cause the return value of the function to inherit the restrictions of that value (if a function has two returnable parameters, its return value would inherit the more restrictive constraint from its parameters). A major weakness of Java and .NET is that all object references are promiscuous; once outside code gets its hands on one, there's no telling who may end up with it. If parameters could be declared ephemeral, it would more often be possible for code which held the only reference to something to know it held the only reference, and thus avoid needless defensive copy operations. Additionally, things like closures could be recycled if the compiler could know that no references to them existed after they returned.
