Why is calli faster than a delegate call?

Posted 2024-11-05 18:59:19

I was playing around with Reflection.Emit and found out about the little-used EmitCalli. Intrigued, I wondered if it's any different from a regular method call, so I whipped up the code below:

using System;
using System.Diagnostics;
using System.Reflection.Emit;
using System.Runtime.InteropServices;
using System.Security;

[SuppressUnmanagedCodeSecurity]
static class Program
{
    const long COUNT = 1 << 22;
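    // Raw machine code for a native function that multiplies two 32-bit integers.
    static readonly byte[] multiply = IntPtr.Size == sizeof(int) ?
      // x86: mov eax, [esp+4]; imul eax, [esp+8]; ret
      new byte[] { 0x8B, 0x44, 0x24, 0x04, 0x0F, 0xAF, 0x44, 0x24, 0x08, 0xC3 }
      // x64: imul ecx, edx; mov eax, ecx; ret
    : new byte[] { 0x0f, 0xaf, 0xca, 0x8b, 0xc1, 0xc3 };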

    static void Main()
    {
        var handle = GCHandle.Alloc(multiply, GCHandleType.Pinned);
        try
        {
            //Make the native method executable (0x40 = PAGE_EXECUTE_READWRITE)
            uint old;
            VirtualProtect(handle.AddrOfPinnedObject(),
                (IntPtr)multiply.Length, 0x40, out old);
            var mulDelegate = (BinaryOp)Marshal.GetDelegateForFunctionPointer(
                handle.AddrOfPinnedObject(), typeof(BinaryOp));

            var T = typeof(uint); //To avoid redundant typing

            //Generate the method
            var method = new DynamicMethod("Mul", T,
                new Type[] { T, T }, T.Module);
            var gen = method.GetILGenerator();
            gen.Emit(OpCodes.Ldarg_0);
            gen.Emit(OpCodes.Ldarg_1);
            gen.Emit(OpCodes.Ldc_I8, (long)handle.AddrOfPinnedObject());
            gen.Emit(OpCodes.Conv_I);
            gen.EmitCalli(OpCodes.Calli, CallingConvention.StdCall,
                T, new Type[] { T, T });
            gen.Emit(OpCodes.Ret);

            var mulCalli = (BinaryOp)method.CreateDelegate(typeof(BinaryOp));

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < COUNT; i++) { mulDelegate(2, 3); }
            Console.WriteLine("Delegate: {0:N0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            for (int i = 0; i < COUNT; i++) { mulCalli(2, 3); }
            Console.WriteLine("Calli:    {0:N0}", sw.ElapsedMilliseconds);
        }
        finally { handle.Free(); }
    }

    delegate uint BinaryOp(uint a, uint b);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualProtect(
        IntPtr address, IntPtr size, uint protect, out uint oldProtect);
}

I ran the code in x86 mode and x64 mode. The results (elapsed milliseconds)?

32-bit:
Delegate version: 994
Calli version: 46
64-bit:
Delegate version: 326
Calli version: 83

I guess the question's obvious by now... why is there such a huge speed difference?

Update:

I created a 64-bit P/Invoke version as well:

Delegate version: 284
Calli version: 77
P/Invoke version: 31

Apparently, P/Invoke is faster... is this a problem with my benchmarking, or is there something going on I don't understand? (I'm in release mode, by the way.)

橘虞初梦 2024-11-12 18:59:19

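Given your performance numbers, I assume you must be using the 2.0 framework or something similar? The numbers are much better in 4.0, but the "Marshal.GetDelegate" version is still slower.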

The thing is, not all delegates are created equal.

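A delegate to a managed-code function is essentially just a direct function call (on x86, a __fastcall), plus a little "switcheroo" if you are calling a static function (but that is only 3 or 4 instructions on x86).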
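
To see the two managed cases side by side, here is a minimal sketch that times a delegate bound to a static method against one bound to an instance method. The names DelegateKinds, MulStatic, Multiplier and Time are made up for illustration and are not from the question's code. It only shows how you might measure the difference; the actual numbers depend on the runtime version and JIT, so no expected timings are given.

```csharp
using System;
using System.Diagnostics;

public static class DelegateKinds
{
    public delegate uint BinaryOp(uint a, uint b);

    // Static target: the case described above as needing a small argument shuffle.
    public static uint MulStatic(uint a, uint b) { return a * b; }

    // Instance target: the delegate can pass its stored target as 'this'.
    public sealed class Multiplier
    {
        public uint Mul(uint a, uint b) { return a * b; }
    }

    // Times 'count' invocations of 'op' and returns elapsed milliseconds.
    // The checksum keeps the result observable so the calls are not dead code.
    public static long Time(BinaryOp op, int count, out uint checksum)
    {
        op(2, 3); // warm-up call so one-time JIT cost is not timed
        checksum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++) { checksum += op(2, 3); }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int count = 1 << 22; // same iteration count as the question
        BinaryOp viaStatic = MulStatic;
        BinaryOp viaInstance = new Multiplier().Mul;
        uint sum;
        Console.WriteLine("Static target:   {0:N0} ms", Time(viaStatic, count, out sum));
        Console.WriteLine("Instance target: {0:N0} ms", Time(viaInstance, count, out sum));
    }
}
```

Neither of these delegates goes through an interop stub, so this sketch covers only the managed side of the comparison.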

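A delegate created by "Marshal.GetDelegateForFunctionPointer", on the other hand, is a direct function call into a "stub" function, which adds some overhead (marshalling and so on) before calling the unmanaged function. In this case there is very little marshalling, and the marshalling for this call appears to be largely optimized away in 4.0 (though it most likely still goes through the ML interpreter on 2.0). But even in 4.0 there is a stack walk demanding unmanaged-code permission, which is not part of your calli delegate.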

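I have generally found that, short of knowing someone on the .NET dev team, the best way to figure out what is going on with managed/unmanaged interop is to do a little digging with WinDbg and SOS.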
