How does callvirt work under the hood?
I am trying to understand how the CLR implements reference types and polymorphism. I have been reading Don Box's Essential .NET, Volume 1, which has helped clarify most of it. But I got confused by the following when I started experimenting with some IL code to understand it better.
I will try to explain the problem as best I can.
Consider the following code:
    class Base
    {
        public void m()
        {
            Console.WriteLine("Base.m");
        }
    }
    class Derived : Base
    {
        public void m()
        {
            Console.WriteLine("Derived.m");
        }
    }
Now consider a simple console application whose Main method has the IL shown below.
I manually tweaked the IL produced by the compiler, to understand it better, and reassembled it with ILAsm.exe.
    .class private auto ansi beforefieldinit Console1.Program
           extends [mscorlib]System.Object
    {
      .method private hidebysig static void Main(string[] args) cil managed
      {
        .entrypoint
        // Code size 44 (0x2c)
        .maxstack 1
        .locals init ([0] class Console1.Base d)
        nop
        newobj instance void Console1.Base::.ctor()
        stloc.0
        ldloc.0
        callvirt instance void Console1.Derived::m()
        nop
        call string [mscorlib]System.Console::ReadLine()
        pop
        ret
      } // end of method Program::Main
    } // end of class Console1.Program
I expected this code NOT to run, since the object reference points to an instance of Base, and there is no way the method table of a Base object has an entry for the method m() defined in the Derived class.
But magically this code executes Derived.m()!
So there are two things I don't understand in the code above:
- What is the significance of the type specified in the IL line below? I experimented by changing it to different types (e.g. System.Exception!) and no errors were reported. Why?
    .locals init ([0] class Console1.Base d)
- How exactly does callvirt work? How did the call get routed to Derived.m()?
Thanks in advance!!
Regards,
Ajay
Comments (5)
My guess is that the jitter realizes that Derived.m isn't virtual and thus can never point anywhere else. So the callvirt reduces to a null check and a direct call instead of a call through the v-table.
Try making Derived.m virtual. I bet it'll throw then.
The C# compiler emits callvirt instructions even when calling non-virtual methods if it can't prove that this != null, so that the call gets a null check. In that case the jitter is intelligent enough to replace the virtual call with a normal call to a fixed address (or even inline it).
You should also check whether your code is verifiable. I think it isn't.
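The null-check behaviour described above can be observed from plain C#. This is a minimal sketch, not taken from the original post: the names Greeter, Hello, NullCheckDemo and TryCall are made up for illustration, and it assumes the compiler emits callvirt for this call, as it normally does for instance calls through a reference that may be null.

```csharp
using System;

class Greeter
{
    // Non-virtual, and it never touches 'this', so nothing inside the
    // method body would fail if the receiver were null.
    public string Hello() => "hello";
}

static class NullCheckDemo
{
    // Reports what happens when Hello() is invoked through 'g'.
    public static string TryCall(Greeter g)
    {
        try
        {
            // Assumption: compiled as 'callvirt', which null-checks 'g'
            // before the (direct, non-virtual) call is made.
            return g.Hello();
        }
        catch (NullReferenceException)
        {
            return "NullReferenceException";
        }
    }

    static void Main()
    {
        Console.WriteLine(TryCall(new Greeter())); // hello
        Console.WriteLine(TryCall(null));          // NullReferenceException
    }
}
```

The exception comes from the call site rather than from the body of Hello(), which is the null check that callvirt adds on top of an otherwise direct call.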
Your code isn't verifiable (run it through peverify). I've written a blog post about how callvirt works under the hood that might help you understand what it does and how your code executes.
Bear in mind that the CLR does try to execute non-verifiable code when it is run as a normal program; it only fails if the code actually causes a problem.
In your example, calling Derived.m() on an instance of Base works because the actual run-time binary representation of the object instances is the same; the this object is basically the same, and no instance fields of the objects are accessed.
Try putting an instance field access into both methods and see what happens...
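A rough way to reproduce the same situation without hand-editing IL is sketched below. It assumes a modern .NET runtime where System.Runtime.CompilerServices.Unsafe.As is available; the field extra and the names ReadExtra, ReinterpretDemo and CallDerivedOnBase are hypothetical additions for illustration. Like the tweaked IL, this is unverifiable code whose behaviour the runtime does not guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;

class Base
{
    public string m() => "Base.m";
}

class Derived : Base
{
    public int extra = 42;                 // exists only in Derived's layout
    public new string m() => "Derived.m";  // hides Base.m and touches no fields

    // Never called below: on a reinterpreted Base instance this would read
    // memory past the end of the object, which is undefined behaviour.
    public int ReadExtra() => extra;
}

static class ReinterpretDemo
{
    public static string CallDerivedOnBase()
    {
        // Unsafe.As performs no type check, so 'd' really refers to a Base
        // object, much like the hand-edited IL in the question.
        Derived d = Unsafe.As<Derived>(new Base());

        // Non-virtual call: the target is fixed by the static type Derived,
        // and m() never looks at any field of 'this'.
        return d.m();
    }

    static void Main()
    {
        Console.WriteLine(CallDerivedOnBase());
    }
}
```

On current runtimes this typically runs Derived.m, for the reason given above: the method never inspects the object it was handed. Calling ReadExtra() instead is the "instance field access" experiment, and its result is not something to rely on.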
Please note that, by default, code executed from the local machine is not verified. This means that invalid code can be written and executed. I suspect your Main function will not pass verification as-is. The PEVerify tool can check an assembly to ensure the code is type-safe, or you can enable these checks for code from the local machine or from a specific location via Security Policy Administration.
The purpose of the type in the locals statement is to declare the type of the local variable. This provides the information needed by the type verifier to verify that member accesses on the local variable are operating on an object of the correct type.
Callvirt could be implemented in several ways. The most likely way is the same way C++ vtables are implemented: an object contains a pointer to a table of function pointers for its type. Each function is located at a predefined offset in the table. To call the function, the address at the predefined offset is loaded and called. Note that in some cases, the CLR could do additional optimizations if the type of the object is known. Whether this is done, I don't know.
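The table-lookup scheme described above can be modelled in a few lines of C#. This is only a toy illustration of the idea, not how the CLR lays out its method tables; MethodTable, Obj, VTableDemo and the slot number are all invented for the sketch.

```csharp
using System;

// Toy per-type method table: each virtual method lives at a fixed slot index.
sealed class MethodTable
{
    public readonly Func<Obj, string>[] Slots;
    public MethodTable(params Func<Obj, string>[] slots) { Slots = slots; }
}

// Every object carries a reference to the method table of its run-time type.
sealed class Obj
{
    public readonly MethodTable Table;
    public Obj(MethodTable table) { Table = table; }
}

static class VTableDemo
{
    const int SlotM = 0; // predefined offset of m() in every table

    static readonly MethodTable BaseTable = new MethodTable(self => "Base.m");
    static readonly MethodTable DerivedTable = new MethodTable(self => "Derived.m");

    public static Obj NewBase() => new Obj(BaseTable);
    public static Obj NewDerived() => new Obj(DerivedTable);

    // Virtual dispatch: null-check the receiver, load its table,
    // index the fixed slot, then call through the pointer found there.
    public static string CallVirtual(Obj receiver, int slot)
    {
        if (receiver == null) throw new NullReferenceException();
        return receiver.Table.Slots[slot](receiver);
    }

    static void Main()
    {
        Console.WriteLine(CallVirtual(NewBase(), SlotM));    // Base.m
        Console.WriteLine(CallVirtual(NewDerived(), SlotM)); // Derived.m
    }
}
```

The caller only knows the slot number; which function runs is decided by the table the object points to. A non-virtual method needs no slot at all, which is why the call in the question can be bound without consulting the object.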
I think this is a side effect of a JIT compiler optimization. If the m() method were virtual, the JIT would have to generate machine code to dig the method table pointer out of the object and then make the virtual call. But this method isn't virtual, and the JIT compiler already knows the method table pointer for the Derived class. So it bypasses the pointer retrieval and supplies it directly, which makes the call work as you observed. You can verify my guess by checking the generated machine code.
Yeah, the IL verifier isn't scoring any points here. You could make it more interesting by having the Derived.m() method tinker with a field that's only declared in Derived. I've seen too much Reflection.Emit code crash with an AccessViolation to be greatly surprised by this. It may well be intentional, however: there is no need to verify IL that crashes anyway. I'm not sure; exploiting this kind of verification loophole isn't (yet) common. Thankfully.
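The difference between the two kinds of call is visible from ordinary, verifiable C# as well. The sketch below is not from the original thread; Plain, Virt, BindingDemo and its helpers are hypothetical names, chosen to put a non-virtual and a virtual method side by side.

```csharp
using System;

class Base
{
    public string Plain() => "Base.Plain";        // non-virtual
    public virtual string Virt() => "Base.Virt";  // virtual
}

class Derived : Base
{
    public new string Plain() => "Derived.Plain";     // hides Base.Plain
    public override string Virt() => "Derived.Virt";  // overrides Base.Virt
}

static class BindingDemo
{
    // Non-virtual: the target is chosen from the static type (Base),
    // so the object's method table is never consulted.
    public static string CallPlain(Base b) => b.Plain();

    // Virtual: the target is looked up through the object's run-time type.
    public static string CallVirt(Base b) => b.Virt();

    static void Main()
    {
        Base b = new Derived();
        Console.WriteLine(CallPlain(b)); // Base.Plain
        Console.WriteLine(CallVirt(b));  // Derived.Virt
    }
}
```

Both calls are normally emitted as callvirt, but only the second depends on what the object actually is. That is consistent with the guess above that a non-virtual target is fixed before the object is ever inspected.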
For more information about how this works even deeper under the hood, check out this StackExchange question/answer:
How does the callvirt .NET instruction work for interfaces?