Where are ref value type parameters stored for asynchronous method calls in Microsoft's CLR?

Posted on 2024-09-26 23:19:16

I understand that this is an implementation detail. I'm actually curious what that implementation detail is in Microsoft's CLR.

Now, bear with me as I did not study CS in college, so I might have missed out on some fundamental principles.

But my understanding of the "stack" and the "heap" as implemented in the CLR as it stands today is, I think, solid. I'm not going to make some inaccurate umbrella statement such as "value types are stored on the stack," for example. But, in most common scenarios -- plain vanilla local variables, of value type, either passed as parameters or declared within the method and not contained inside a closure -- value type variables are stored on the stack (again, in Microsoft's CLR).

I guess what I'm unsure of is where ref value type parameters come in.

Originally what I was thinking was that, if the call stack looks like this (left = bottom):

A() -> B() -> C()

...then a local variable declared within the scope of A and passed as a ref parameter to B could still be stored on the stack--couldn't it? B would simply need the memory location where that local variable was stored within A's frame (forgive me if that isn't the right terminology; I think it's clear what I mean, anyway).

I realized this couldn't be strictly true, though, when it occurred to me that I could do this:

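// Note: Delegate.BeginInvoke/EndInvoke only work on .NET Framework;
// on .NET Core and .NET 5+ they throw PlatformNotSupportedException.
using System;
using System.Runtime.Remoting.Messaging; // AsyncResult
using System.Threading;                  // Thread

delegate void RefAction<T>(ref T arg);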

void A()
{
    int x = 100;

    RefAction<int> b = B;

    // This is a non-blocking call; A will return immediately
    // after this.
    b.BeginInvoke(ref x, C, null);
}

void B(ref int arg)
{
    // Putting a sleep here to ensure that A has exited by the time
    // the next line gets executed.
    Thread.Sleep(1000);

    // Where is arg stored right now? The "x" variable
    // from the "A" method should be out of scope... but its value
    // must somehow be known here for this code to make any sense.
    arg += 1;
}

void C(IAsyncResult result)
{
    var asyncResult = (AsyncResult)result;
    var action = (RefAction<int>)asyncResult.AsyncDelegate;

    int output = 0;

    // This variable originally came from A... but then
    // A returned, it got updated by B, and now it's still here.
    action.EndInvoke(ref output, result);

    // ...and this prints "101" as expected (?).
    Console.WriteLine(output);
}

So in the example above, where is x (in A's scope) stored? And how does this work? Is it boxed? If not, is it subject to garbage collection now, despite being a value type? Or can the memory immediately be reclaimed?

I apologize for the long-winded question. But even if the answer is quite simple, maybe this will be informative to others who find themselves wondering the same thing in the future.

Comments (3)

别闹i 2024-10-03 23:19:16

I don't believe that when you use BeginInvoke() and EndInvoke() with ref or out arguments you are truly passing the variables by ref. The fact that we have to call EndInvoke() with a ref parameter as well should be a clue to this.

Let's change your example to demonstrate the behavior I describe:

void A()
{
    int x = 100;
    int z = 400;

    RefAction<int> b = B;

    //b.BeginInvoke(ref x, C, null);
    var ar = b.BeginInvoke(ref x, null, null);
    b.EndInvoke(ref z, ar);

    Console.WriteLine(x);  // outputs '100'
    Console.WriteLine(z);  // outputs '101'
}

If you examine the output now, you will see that the value of x is actually unchanged, but z now contains the updated value.

I suspect that the compiler alters the semantics of passing variables by ref when you use the asynchronous Begin/EndInvoke methods.

After taking a look at the IL produced by this code, it appears that ref arguments to BeginInvoke() are still passed by ref. While Reflector doesn't show the IL for this method, I suspect that it simply doesn't pass along the parameter as a ref argument, but instead creates a separate variable behind the scenes to pass to B(). When you then call EndInvoke() you must supply a ref argument again to retrieve the value from the async state. It's likely that such arguments are actually stored as part of (or in conjunction with) the IAsyncResult object which is needed to ultimately retrieve their values.
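
The declared signatures can also be checked without a decompiler. The following is a minimal sketch using reflection; the class name `SignatureDemo` and the helper `Describe` are made up for illustration, and it assumes the compiler emitted `BeginInvoke`/`EndInvoke` on the delegate type, as it normally does. It only reads metadata and never invokes the methods, so it should also run on runtimes where calling `BeginInvoke` itself is unsupported.

```csharp
using System;
using System.Linq;
using System.Reflection;

delegate void RefAction<T>(ref T arg);

static class SignatureDemo
{
    // Hypothetical helper: formats one method of the closed delegate type
    // as "ReturnType Name(ParamType name, ...)".
    public static string Describe(string methodName)
    {
        MethodInfo m = typeof(RefAction<int>).GetMethod(methodName);
        string parameters = string.Join(", ",
            m.GetParameters().Select(p => p.ParameterType.Name + " " + p.Name));
        return m.ReturnType.Name + " " + methodName + "(" + parameters + ")";
    }

    static void Main()
    {
        // By-ref parameter types are reported with a trailing '&'.
        Console.WriteLine(Describe("BeginInvoke"));
        Console.WriteLine(Describe("EndInvoke"));
    }
}
```

If the by-ref parameter shows up in both signatures, that matches the observation above: the parameter is declared by reference on `BeginInvoke()`, even though the caller's variable is not what the target method ends up modifying.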

Let's think about why the behavior likely works this way. When you make an async call to a method, the method runs on a separate thread. That thread has its own stack, so the usual mechanism of aliasing ref/out variables in the caller's frame cannot be used. To get any returned values from an async method, you eventually need to call EndInvoke() to complete the operation and retrieve them. That call to EndInvoke() could just as easily occur on a completely different thread than the original call to BeginInvoke() or the actual body of the method. Clearly the call stack is not a good place to store such data, especially since the thread used for the async call could be re-purposed for a different method once the async operation completes. As a result, some mechanism other than the stack is needed to "marshal" the return value and out/ref arguments from the method being called back to the site where they will ultimately be used.
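
The copy-in/copy-out behavior described here can be modeled in a few lines. This is only a sketch of the idea, not the CLR's actual implementation: `PendingCall<T>` is a hypothetical stand-in for the async-result object, and `Task.Run` stands in for the thread pool dispatch. The point is that the target method aliases a field of a heap object rather than the caller's local.

```csharp
using System;
using System.Threading.Tasks;

delegate void RefAction<T>(ref T arg);

// Hypothetical stand-in for the async-result object: it lives on the heap
// and owns a private copy of the by-ref argument.
sealed class PendingCall<T>
{
    private T _copy;              // field of a heap object, not a slot in the caller's frame
    private readonly Task _work;

    public PendingCall(RefAction<T> target, T initialValue)
    {
        _copy = initialValue;                       // copy-in (what BeginInvoke is assumed to do)
        _work = Task.Run(() => target(ref _copy));  // the target aliases the field, not the caller's local
    }

    public void End(ref T arg)
    {
        _work.Wait();   // block until the target has finished
        arg = _copy;    // copy-out (what EndInvoke is assumed to do)
    }
}

static class CopyInCopyOutDemo
{
    static void B(ref int arg) { arg += 1; }

    static void Main()
    {
        int x = 100;
        int z = 400;

        var call = new PendingCall<int>(B, x);
        call.End(ref z);

        Console.WriteLine(x); // prints 100: the original local was never aliased
        Console.WriteLine(z); // prints 101: the updated copy was written back here
    }
}
```

Because the copy is a field of an ordinary object, it stays alive for as long as something references that object, regardless of which stack frames have returned in the meantime.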

I believe this mechanism (in the Microsoft .NET implementation) is the IAsyncResult object. In fact, if you examine the IAsyncResult object in the debugger, you will notice that in the non-public members there exists _replyMsg, which contains a Properties collection. This collection contains elements like __OutArgs and __Return whose data appear to reflect their namesakes.

EDIT: Here's a theory about the async delegate design that occurs to me. It seems likely that the signatures of BeginInvoke() and EndInvoke() were chosen to be as similar as possible to each other to avoid confusion and improve clarity. The BeginInvoke() method doesn't actually need to accept ref/out arguments, since it only needs their value, not their identity (it is never going to assign anything back to them). However, it would be really odd (for example) to have a BeginInvoke() call that takes an int and an EndInvoke() call that takes a ref int. Now, it's possible that there are technical reasons why begin/end calls should have identical signatures, but I think that the benefits of clarity and symmetry are sufficient to justify such a design.

All of this is, of course, an implementation detail of the CLR and C# compiler and could change in the future. It is interesting, however, that there is the possibility for confusion - if you expect that the original variable passed to BeginInvoke() will actually be modified. It also underscores the importance of calling EndInvoke() to complete an async operation.

Perhaps someone from the C# team (if they see this question) could offer more insight into the details and design choices behind this functionality.

烦人精 2024-10-03 23:19:16

The CLR is completely out of the loop on this; it is the job of the JIT compiler to generate the appropriate machine code to pass an argument by reference. That is an implementation detail in itself, since there are different jitters for different machine architectures.

But the common ones do it exactly the way a C programmer does it: they pass a pointer to the variable. That pointer is passed in a CPU register or on the stack frame, depending on how many arguments the method takes.

Where the variable lives doesn't matter: a pointer to a variable in the stack frame of the caller is just as valid as a pointer to a member of a reference type object that's stored on the heap. The garbage collector knows the difference between them by virtue of the pointer value, and adjusts the pointer if necessary when it moves an object.
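
A small sketch of that point: the same by-ref method works whether the argument is a local variable or a field of a heap object. The names `Counter`, `Increment`, and `RefTargetDemo` are made up for illustration, and where the local actually ends up (stack slot or register) is up to the JIT.

```csharp
using System;

sealed class Counter
{
    public int Value;   // field of an object allocated on the GC heap
}

static class RefTargetDemo
{
    // The callee only sees a reference to an int; it cannot tell where the int lives.
    public static void Increment(ref int arg) { arg += 1; }

    static void Main()
    {
        int local = 100;                            // ordinary local of the caller
        var counter = new Counter { Value = 200 };  // heap object with an int field

        Increment(ref local);           // reference to the caller's local
        Increment(ref counter.Value);   // reference into the heap object

        Console.WriteLine(local);         // prints 101
        Console.WriteLine(counter.Value); // prints 201
    }
}
```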

Your code snippet invokes magic inside the .NET framework that's required to make marshaling calls from one thread to another work. This is the same kind of plumbing that makes Remoting work. To make such a call, a new stack frame has to be created on the thread where the call is performed. The remoting code uses the type definition of the delegate to know what that stack frame should look like. It can also deal with arguments passed by reference: it knows that it needs to allocate a slot in the stack frame to store the pointed-to variable, x in your case. The BeginInvoke call initializes the copy of the x variable in the remoted stack frame.

The same thing happens on the EndInvoke() call: the results are copied back from the stack frame in the threadpool thread. The key point is that there isn't actually a pointer to the x variable; there's a pointer to a copy of it.

I'm not so sure this answer is very clear. Having some understanding of how CPUs work and a bit of C knowledge, so that the concept of a pointer is crystal clear, can help a lot.

别在捏我脸啦 2024-10-03 23:19:16

Look at the code generated with Reflector to find out. My guess is that an anonymous class containing x is generated, like when you use closures (lambda expressions that reference variables in the current stack frame). Forget about this and read the other answers.
