Curiosity: Why does a compiled `Expression<...>` run faster than a minimal `DynamicMethod`?
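I'm currently doing some last optimizations, mostly for fun and learning, and discovered something that left me with a couple of questions.
First, the questions:
- When I create a method using `DynamicMethod` and run it under the debugger, is there any way for me to step into the generated assembly code when viewing it in the disassembler view? The debugger seems to just step over the whole method for me.
- Or, if that's not possible, can I somehow save the generated IL code to disk as an assembly, so that I can inspect it with Reflector?
- Why does the `Expression<...>` version of my simple addition method (Int32+Int32 => Int32) run faster than a minimal `DynamicMethod` version?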
Here's a short and complete program that demonstrates this. On my system, the output is:
DynamicMethod: 887 ms
Lambda: 1878 ms
Method: 1969 ms
Expression: 681 ms
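I expected the lambda and method calls to have higher values, but the `DynamicMethod` version is consistently about 30-50% slower (the variation is probably due to Windows and other programs). Does anyone know the reason?
Here's the program: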
```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Reflection.Emit;

namespace Sandbox
{
    public class Program
    {
        public static void Main(String[] args)
        {
            // f1: minimal DynamicMethod computing arg0 + arg1
            DynamicMethod method = new DynamicMethod("TestMethod",
                typeof(Int32), new Type[] { typeof(Int32), typeof(Int32) });
            var il = method.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);
            Func<Int32, Int32, Int32> f1 =
                (Func<Int32, Int32, Int32>)method.CreateDelegate(
                    typeof(Func<Int32, Int32, Int32>));

            // f2: C# lambda; f3: static method group; f4: compiled expression tree
            Func<Int32, Int32, Int32> f2 = (Int32 a, Int32 b) => a + b;
            Func<Int32, Int32, Int32> f3 = Sum;
            Expression<Func<Int32, Int32, Int32>> f4x = (a, b) => a + b;
            Func<Int32, Int32, Int32> f4 = f4x.Compile();

            for (Int32 pass = 1; pass <= 2; pass++)
            {
                // Pass 1 just runs all the code without writing out anything
                // to avoid JIT overhead influencing the results
                Time(f1, "DynamicMethod", pass);
                Time(f2, "Lambda", pass);
                Time(f3, "Method", pass);
                Time(f4, "Expression", pass);
            }
        }

        private static void Time(Func<Int32, Int32, Int32> fn,
            String name, Int32 pass)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (Int32 index = 0; index <= 100000000; index++)
            {
                Int32 result = fn(index, 1);
            }
            sw.Stop();
            if (pass == 2)
            {
                // Console.WriteLine rather than Debug.WriteLine, which is
                // compiled out of Release builds
                Console.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");
            }
        }

        private static Int32 Sum(Int32 a, Int32 b)
        {
            return a + b;
        }
    }
}
```
The method created via `DynamicMethod` goes through two thunks, while the method created via `Expression<>` doesn't go through any.
Here's how it works. Here's the calling sequence for invoking `fn(0, 1)` in the `Time` method (I hard-coded the arguments to 0 and 1 for ease of debugging):
For the first invocation I investigated, `DynamicMethod`, the `call eax` line comes up like so:
This appears to be doing some stack swizzling to rearrange arguments. I speculate that it's owing to the difference between delegates that use the implicit 'this' argument and those that don't.
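A minimal sketch of the distinction being speculated about (an illustration, not code from the answer): the same addition is bound to `Func<int, int, int>` once as an ordinary open static delegate, and once as a static method with an extra leading `object` parameter whose delegate is closed over a dummy target via `Delegate.CreateDelegate(Type, object, MethodInfo)`. In the closed form the bound object occupies the slot where an instance method's implicit `this` would go, which, following the answer's reasoning, is what avoids rearranging the caller's arguments. `DelegateShapes`, `Add`, and `AddWithDummy` are hypothetical names.

```csharp
using System;
using System.Reflection;

public static class DelegateShapes
{
    // Hypothetical helpers for illustration only.
    public static int Add(int a, int b) => a + b;

    // Same operation with an extra leading reference-type parameter that
    // plays the role of an instance method's implicit 'this'.
    public static int AddWithDummy(object dummy, int a, int b) => a + b;

    // Open static delegate: the delegate's arguments map one-for-one onto
    // the static method's parameters (the shape the answer associates with
    // stack swizzling).
    public static Func<int, int, int> MakeOpen()
    {
        MethodInfo mi = typeof(DelegateShapes).GetMethod(nameof(Add));
        return (Func<int, int, int>)Delegate.CreateDelegate(
            typeof(Func<int, int, int>), mi);
    }

    // Closed delegate: the first parameter is bound to a target object once,
    // so at call time that object is passed where 'this' would be.
    public static Func<int, int, int> MakeClosed()
    {
        MethodInfo mi = typeof(DelegateShapes).GetMethod(nameof(AddWithDummy));
        return (Func<int, int, int>)Delegate.CreateDelegate(
            typeof(Func<int, int, int>), new object(), mi);
    }

    public static void Main()
    {
        Console.WriteLine(MakeOpen()(2, 3));   // 5
        Console.WriteLine(MakeClosed()(2, 3)); // 5
    }
}
```

Both delegates compute the same result; they differ only in how the runtime wires the call to the target method.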
That jump at the end resolves like so:
The remainder of the code at 0098c098 looks like a JIT thunk, whose start got rewritten with a `jmp` after the JIT. It's only after this jump that we get to real code:
The invocation sequence for the method created via `Expression<>` is different: it's missing the stack swizzling code. Here it is, from the first jump via `eax`:
Now, how did things get like this?
I don't know how LINQ forced the JIT, but I do know how to force a JIT myself: by calling the function at least once. UPDATE: I found another way to force a JIT: use the `restrictedSkipVisibility` argument to the constructor and pass `true`. So, here's modified code that eliminates stack swizzling by using the implicit 'this' parameter, and uses the alternate constructor to pre-compile so that the bound address is the real address, rather than the thunk:
Here are the runtimes on my system:
UPDATED TO ADD:
I tried running this code on my new system, which is a Core i7 920 running Windows 7 x64 with .NET 4 beta 2 installed (mscoree.dll ver. 4.0.30902), and the results are, well, variable.
Perhaps this is Intel SpeedStep affecting results, or possibly Turbo Boost. In any case, it's very annoying.
Many of these results will be accidents of timing, whatever it is that is causing the random speedups in the C# 3.5 / runtime v2.0 scenario. I'll have to reboot to see if SpeedStep or Turbo Boost is responsible for these effects.
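Since the variability is attributed to timing effects, possibly SpeedStep or Turbo Boost, one common way to reduce (not eliminate) that noise is to repeat each measurement and keep the fastest run. It cannot correct for a CPU that runs at a different clock speed for every run. A minimal sketch, assuming the same `Func<Int32, Int32, Int32>` shape as the question's `Time` method; `Bench` and `MinElapsedMs` are hypothetical names:

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Times `iterations` calls of fn, repeats that `trials` times, and returns
    // the fastest run in milliseconds. Keeping the minimum reduces the weight
    // of runs slowed by outside effects (other processes, clock changes).
    public static long MinElapsedMs(Func<int, int, int> fn, int iterations, int trials)
    {
        fn(0, 1); // warm-up call so JIT compilation is not timed

        long best = long.MaxValue;
        for (int t = 0; t < trials; t++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                fn(i, 1);
            }
            sw.Stop();
            best = Math.Min(best, sw.ElapsedMilliseconds);
        }
        return best;
    }

    public static void Main()
    {
        Func<int, int, int> add = (a, b) => a + b;
        Console.WriteLine("Lambda, best of 5: " + MinElapsedMs(add, 100000000, 5) + " ms");
    }
}
```

With `trials` set to 0 the method returns `long.MaxValue`, so callers should pass at least one trial.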