Why is a Func<> created from an Expression<Func<>> slower than a Func<> declared directly?

Posted on 2024-10-03 10:09:26

Why is a Func<> created from an Expression<Func<>> via .Compile() considerably slower than just using a Func<> declared directly?

I just changed from using a Func<IInterface, object> declared directly to one created from an Expression<Func<IInterface, object>> in an app I am working on, and I noticed that the performance went down.

I have just done a little test, and the Func<> created from an Expression takes "almost" double the time of a Func<> declared directly.

On my machine the Direct Func<> takes about 7.5 seconds and the Expression<Func<>> takes about 12.6 seconds.

Here is the test code I used (running .NET 4.0):

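using System;
using System.Diagnostics;
using System.Linq.Expressions;

// The statements below belong inside a method body (for example Main);
// Foo is the class defined at the end.

// Direct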
Func<int, Foo> test1 = x => new Foo(x * 2);

int counter1 = 0;

Stopwatch s1 = new Stopwatch();
s1.Start();
for (int i = 0; i < 300000000; i++)
{
 counter1 += test1(i).Value;
}
s1.Stop();
var result1 = s1.Elapsed;



// Expression.Compile()
Expression<Func<int, Foo>> expression = x => new Foo(x * 2);
Func<int, Foo> test2 = expression.Compile();

int counter2 = 0;

Stopwatch s2 = new Stopwatch();
s2.Start();
for (int i = 0; i < 300000000; i++)
{
 counter2 += test2(i).Value;
}
s2.Stop();
var result2 = s2.Elapsed;



public class Foo
{
 public Foo(int i)
 {
  Value = i;
 }
 public int Value { get; set; }
}

How can I get the performance back?

Is there anything I can do to get the Func<> created from the Expression<Func<>> to perform like one declared directly?


Answers (7)

抽个烟儿 2024-10-10 10:09:27

I've tossed this into BenchmarkDotNet to get some more reliable numbers. As far as I can tell, the compiled Expression is a bit faster than the direct Func on .NET 7. This answer potentially explains why. After some repeated benchmark runs, I get this as a typical result:

| Method     | Mean     | Error     | StdDev    |
|----------- |---------:|----------:|----------:|
| Func       | 4.274 ns | 0.1302 ns | 0.1447 ns |
| Expression | 3.598 ns | 0.1055 ns | 0.1903 ns |

Here's my hardware and software:

BenchmarkDotNet v0.13.9+228a464e8be6c580ad9408e98f18813f6407fb5a, Windows 11 (10.0.22631.2338)
AMD Ryzen 9 5900HX with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.100-rc.1.23455.8
  [Host]     : .NET 7.0.11 (7.0.1123.42427), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.11 (7.0.1123.42427), X64 RyuJIT AVX2

Here is the benchmark code:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Linq.Expressions;

BenchmarkRunner.Run<FuncVsExpression>();

public class FuncVsExpression
{
    private Func<int, Foo> myFunc;
    private Func<int, Foo> myExpressionFunc;

    [GlobalSetup]
    public void Setup()
    {
        this.myFunc = x => new Foo(x * 2);

        Expression<Func<int, Foo>> expression = x => new Foo(x * 2);
        this.myExpressionFunc = expression.Compile();
    }

    [Benchmark]
    public Foo Func() => myFunc(42);

    [Benchmark]
    public Foo Expression() => myExpressionFunc(42);
}

public class Foo
{
    public Foo(int i)
    {
        Value = i;
    }

    public int Value { get; set; }
}
您的好友蓝忘机已上羡 2024-10-10 10:09:26

As others have mentioned, the overhead of calling a dynamic delegate is causing your slowdown. On my computer that overhead is about 12ns with my CPU at 3GHz. The way to get around that is to load the method from a compiled assembly, like this:

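using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

// .NET Framework only: AppDomain.DefineDynamicAssembly and
// LambdaExpression.CompileToMethod do not exist on .NET Core / .NET 5+.
// `expression` and Foo are the ones from the question.
var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(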
             new AssemblyName("assembly"), AssemblyBuilderAccess.Run);
var mod = ab.DefineDynamicModule("module");
var tb = mod.DefineType("type", TypeAttributes.Public);
var mb = tb.DefineMethod(
             "test3", MethodAttributes.Public | MethodAttributes.Static);
expression.CompileToMethod(mb);
var t = tb.CreateType();
var test3 = (Func<int, Foo>)Delegate.CreateDelegate(
                typeof(Func<int, Foo>), t.GetMethod("test3"));

int counter3 = 0;
Stopwatch s3 = new Stopwatch();
s3.Start();
for (int i = 0; i < 300000000; i++)
{
    counter3 += test3(i).Value;
}
s3.Stop();
var result3 = s3.Elapsed;

When I add the above code, result3 is always just a fraction of a second higher than result1, for about a 1ns overhead.

So why even bother with a compiled lambda (test2) when you can have a faster delegate (test3)? Because creating the dynamic assembly generally costs much more up front, and it only saves you 10-20 ns on each invocation.
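
The trade-off can be put as a rough break-even estimate: the extra one-time cost of building the dynamic assembly (compared with just calling Compile()) only pays off after it has been spread over enough calls, each saving Δt. The 1 ms setup figure below is a hypothetical value for illustration, not a measurement from this answer; Δt uses the low end of the 10-20 ns range quoted above.

$$
n_{\text{break-even}} \approx \frac{C_{\text{extra setup}}}{\Delta t_{\text{per call}}},
\qquad
\frac{1\,\text{ms}}{10\,\text{ns}} = \frac{10^{6}\,\text{ns}}{10\,\text{ns}} = 10^{5}\ \text{calls}
$$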

天冷不及心凉 2024-10-10 10:09:26

(This is not a proper answer, but is material intended to help discover the answer.)

Statistics gathered from Mono 2.6.7 - Debian Lenny - Linux 2.6.26 i686 - 2.80GHz single core:

      Func: 00:00:23.6062578
Expression: 00:00:23.9766248

So on Mono, at least, both mechanisms perform about the same, which suggests that they generate equivalent IL.

This is the IL generated by Mono's gmcs for the anonymous method:

// method line 6
.method private static  hidebysig
       default class Foo '<Main>m__0' (int32 x)  cil managed
{
    .custom instance void class [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::'.ctor'() =  (01 00 00 00 ) // ....

    // Method begins at RVA 0x2204
    // Code size 9 (0x9)
    .maxstack 8
    IL_0000:  ldarg.0
    IL_0001:  ldc.i4.2
    IL_0002:  mul
    IL_0003:  newobj instance void class Foo::'.ctor'(int32)
    IL_0008:  ret
} // end of method Default::<Main>m__0

I will work on extracting the IL generated by the expression compiler.

远山浅 2024-10-10 10:09:26

Ultimately, what it comes down to is that Expression<T> is not a precompiled delegate; it's only an expression tree. Calling Compile on a LambdaExpression (which is what Expression<T> actually is) generates IL code at runtime and creates something akin to a DynamicMethod for it.

If you just use a Func<T> in code, the C# compiler compiles the lambda at build time, just like any other delegate target.
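
To make "something akin to a DynamicMethod" concrete, here is a minimal sketch that emits, by hand, IL for a simplified `x => x * 2` (the `Foo` allocation is left out for brevity) and wraps it in a typed delegate. It uses only public `System.Reflection.Emit` APIs. It illustrates the kind of object Compile() produces; it is not the expression compiler's actual code, and the method name `"Double"` is arbitrary.

```csharp
using System;
using System.Reflection.Emit;

// Emit IL for the equivalent of: int Double(int x) => x * 2;
// Hand-written stand-in for what Expression.Compile() generates,
// not a copy of the expression compiler's internals.
var method = new DynamicMethod("Double", typeof(int), new[] { typeof(int) });
ILGenerator il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);   // push x
il.Emit(OpCodes.Ldc_I4_2);  // push the constant 2
il.Emit(OpCodes.Mul);       // x * 2
il.Emit(OpCodes.Ret);       // return the product

// Wrap the dynamic method in a strongly typed delegate.
var doubler = (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));

Console.WriteLine(doubler(21)); // prints 42
```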

So there are 2 sources of slowness here:

  1. The initial compilation time to compile Expression<T> into a delegate. This is huge. If you're doing this for every invocation, definitely don't (but that isn't the case here, since you start your Stopwatch after calling Compile).

  2. After you call Compile, it's basically a DynamicMethod. DynamicMethods (even when wrapped in strongly typed delegates) ARE in fact slower to execute than direct calls, and Func<T>s resolved at compile time are direct calls. There are performance comparisons out there between dynamically emitted IL and compile-time emitted IL. Random URL: http://www.codeproject.com/KB/cs/dynamicmethoddelegates.aspx?msg=1160046
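
Because point 1 (the compile step) is the expensive part, the usual remedy is to call Compile() once and reuse the resulting delegate. A minimal sketch, assuming the expression is fixed for the program's lifetime. The names `doublerExpr` and `cachedDoubler` are illustrative, not from the question, and `Lazy<T>` is just one way to cache.

```csharp
using System;
using System.Linq.Expressions;

// Compile once, lazily and thread-safely (Lazy<T>'s default mode),
// then reuse the same delegate for every call.
Expression<Func<int, int>> doublerExpr = x => x * 2;
var cachedDoubler = new Lazy<Func<int, int>>(() => doublerExpr.Compile());

int total = 0;
for (int i = 0; i < 1000; i++)
{
    // Avoid doublerExpr.Compile()(i) here: it would rebuild the method every iteration.
    total += cachedDoubler.Value(i);
}

Console.WriteLine(total); // prints 999000
```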

...Also, in your Stopwatch test for the Expression<T>, you should start your timer at i = 1, not 0 (that is, make the first call outside the timed region). I believe your compiled lambda will not be JIT-compiled until the first invocation, so that first call takes a performance hit.
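
A minimal sketch of that advice: one untimed warm-up call per delegate, then the timed loop. It uses `Func<int, int>` instead of the question's `Func<int, Foo>` to stay self-contained, and `TimeWithWarmup` is a hypothetical helper written for this illustration. Timings from a loop like this are still rough; run a Release build without a debugger attached, or use a harness such as BenchmarkDotNet.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;

// Times a delegate after one untimed warm-up call, so any one-time
// first-call cost (such as JIT) falls outside the timed region.
// The checksum keeps the results observable so the loop is not optimized away.
static (TimeSpan Elapsed, long Checksum) TimeWithWarmup(Func<int, int> f, int iterations)
{
    f(0); // warm-up call, not timed
    long checksum = 0;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        checksum += f(i);
    }
    sw.Stop();
    return (sw.Elapsed, checksum);
}

Func<int, int> direct = x => x * 2;
Expression<Func<int, int>> expr = x => x * 2;
Func<int, int> compiled = expr.Compile();

var (directTime, directSum) = TimeWithWarmup(direct, 10_000_000);
var (compiledTime, compiledSum) = TimeWithWarmup(compiled, 10_000_000);
Console.WriteLine($"direct:   {directTime}");
Console.WriteLine($"compiled: {compiledTime}");
```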

人│生佛魔见 2024-10-10 10:09:26

Just for the record: I can reproduce the numbers with the code above.

One thing to note is that both delegates create a new instance of Foo for every iteration. This could be more important than how the delegates are created. Not only does that lead to a lot of heap allocations, but GC may also affect the numbers here.

If I change the code to

Func<int, int> test1 = x => x * 2;

and

Expression<Func<int, int>> expression = x => x * 2;
Func<int, int> test2 = expression.Compile();

the performance numbers are virtually identical (actually result2 is a little better than result1). This supports the theory that the expensive part is heap allocations and/or collections, not how the delegate is constructed.

UPDATE

Following the comment from Gabe, I tried changing Foo to be a struct. Unfortunately this yields more or less the same numbers as the original code, so perhaps heap allocation/garbage collection is not the cause after all.

However, I also verified the numbers for delegates of the type Func<int, int> and they are quite similar and much lower than the numbers for the original code.
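
One way to test the allocation/GC theory directly is to count gen-0 collections around each loop with `GC.CollectionCount(0)`. This is a minimal sketch. `StrongBox<int>` stands in for the question's `Foo` (a small heap-allocated class with a public `Value`, here a field rather than a property), and `CountGen0` is a hypothetical helper. The counts depend on the runtime version, GC settings and JIT optimizations. If both loops show similar counts, GC is unlikely to explain the gap between them, which would be consistent with the update above.

```csharp
using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;

// Runs the delegate in a loop and reports how many gen-0 garbage
// collections happened during the loop, plus a checksum of the results.
static (int Gen0Collections, long Checksum) CountGen0(Func<int, StrongBox<int>> f, int iterations)
{
    int before = GC.CollectionCount(0);
    long checksum = 0;
    for (int i = 0; i < iterations; i++)
    {
        checksum += f(i).Value; // each call allocates a new StrongBox<int>
    }
    return (GC.CollectionCount(0) - before, checksum);
}

Func<int, StrongBox<int>> direct = x => new StrongBox<int>(x * 2);
Expression<Func<int, StrongBox<int>>> expr = x => new StrongBox<int>(x * 2);
Func<int, StrongBox<int>> compiled = expr.Compile();

Console.WriteLine($"direct:   {CountGen0(direct, 10_000_000).Gen0Collections} gen-0 collections");
Console.WriteLine($"compiled: {CountGen0(compiled, 10_000_000).Gen0Collections} gen-0 collections");
```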

I'll keep digging and look forward to seeing more/updated answers.

似梦非梦 2024-10-10 10:09:26

It is most likely because the first invocation of the code was not jitted.
I decided to look at the IL and they are virtually identical.

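using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

// Note: the second half relies on private, undocumented members
// (m_owner, BakeByteArray) of .NET Framework's reflection types, so it
// may break on other runtimes or versions. Foo is the class from the question.
Func<int, Foo> func = x => new Foo(x * 2);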
Expression<Func<int, Foo>> exp = x => new Foo(x * 2);
var func2 = exp.Compile();
Array.ForEach(func.Method.GetMethodBody().GetILAsByteArray(), b => Console.WriteLine(b));

var mtype = func2.Method.GetType();
var fiOwner = mtype.GetField("m_owner", BindingFlags.Instance | BindingFlags.NonPublic);
var dynMethod = fiOwner.GetValue(func2.Method) as DynamicMethod;
var ilgen = dynMethod.GetILGenerator();


byte[] il = ilgen.GetType().GetMethod("BakeByteArray", BindingFlags.NonPublic | BindingFlags.Instance).Invoke(ilgen, null) as byte[];
Console.WriteLine("Expression version");
Array.ForEach(il, b => Console.WriteLine(b));

This code gets the byte arrays and prints them to the console. Here is the output on my machine:

2
24
90
115
13
0
0
6
42
Expression version
3
24
90
115
2
0
0
6
42

And here is Reflector's disassembly of the first function:

    L_0000: ldarg.0
    L_0001: ldc.i4.2
    L_0002: mul
    L_0003: newobj instance void ConsoleApplication7.Foo::.ctor(int32)
    L_0008: ret

There are only 2 bytes different in the entire method!
The first is the first opcode: ldarg.0 (load argument 0) in the first method, but ldarg.1 (load argument 1) in the second. The difference arises because the delegate produced from the expression actually has a Closure object as its target, which occupies argument 0. This may also factor into the timing difference.

The next opcode for both is ldc.i4.2 (24), which loads 2 onto the stack; next comes mul (90), then newobj (115). The next 4 bytes are the metadata token for the Foo constructor. They differ because the two methods are actually hosted in different assemblies: the expression's method is a dynamic method in an anonymous assembly. Unfortunately, I haven't yet figured out how to resolve these tokens. The final opcode is 42, which is ret. Every CLI method that returns normally must end with ret, even one that doesn't return a value.
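
To double-check the byte-by-byte reading above, here is a minimal decoder sketch that maps the printed bytes back to opcode names using the public `System.Reflection.Emit.OpCodes` table. It only handles single-byte opcodes, which is enough for these two methods. `Decode` is a hypothetical helper written for this illustration, not part of any library, and it prints the constructor tokens in hex without resolving them.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Build a lookup table of single-byte opcodes from the public OpCodes class.
var singleByteOpCodes = new Dictionary<byte, OpCode>();
foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
{
    var op = (OpCode)field.GetValue(null)!;
    if (op.Size == 1 && !singleByteOpCodes.ContainsKey((byte)op.Value))
        singleByteOpCodes[(byte)op.Value] = op;
}

// Decode a method body made only of single-byte opcodes into readable text.
// Four-byte operands (such as the newobj constructor token) are printed in hex.
string Decode(byte[] il)
{
    var parts = new List<string>();
    int pos = 0;
    while (pos < il.Length)
    {
        if (il[pos] == 0xFE)
            throw new NotSupportedException("Two-byte opcodes are not handled in this sketch.");
        OpCode op = singleByteOpCodes[il[pos++]];
        switch (op.OperandType)
        {
            case OperandType.InlineNone:
                parts.Add(op.Name!);
                break;
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar:
            case OperandType.ShortInlineBrTarget:
                parts.Add($"{op.Name} {il[pos]}"); // raw operand byte
                pos += 1;
                break;
            case OperandType.InlineI8:
            case OperandType.InlineR:
                parts.Add($"{op.Name} <8-byte operand>");
                pos += 8;
                break;
            case OperandType.InlineSwitch:
                throw new NotSupportedException("switch is not handled in this sketch.");
            default: // tokens, InlineI, InlineBrTarget, ShortInlineR: 4 bytes
                int value = BinaryPrimitives.ReadInt32LittleEndian(il.AsSpan(pos, 4));
                parts.Add($"{op.Name} 0x{value:X8}");
                pos += 4;
                break;
        }
    }
    return string.Join(" ", parts);
}

// Byte arrays copied from the console output in this answer.
byte[] lambdaIL = { 2, 24, 90, 115, 13, 0, 0, 6, 42 };
byte[] expressionIL = { 3, 24, 90, 115, 2, 0, 0, 6, 42 };

Console.WriteLine(Decode(lambdaIL));     // ldarg.0 ldc.i4.2 mul newobj 0x0600000D ret
Console.WriteLine(Decode(expressionIL)); // ldarg.1 ldc.i4.2 mul newobj 0x06000002 ret
```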

There are a few possibilities. The closure object is somehow causing things to be slower, which might be true (but is unlikely). Or the jitter didn't jit the method, and since you were firing calls in rapid succession it didn't have time to jit that path, so a slower path was invoked. Or the C# compiler in VS may be emitting different calling conventions and MethodAttributes, which may act as hints to the jitter to perform different optimizations.

Ultimately, I would not even remotely worry about this difference. If you really are invoking your function 300 million times in the course of your application, and the difference incurred is 5 whole seconds, you're probably going to be OK.

夜空下最亮的亮点 2024-10-10 10:09:26

I was interested in the answer by Michael B., so in each case I added an extra call before the stopwatch was started. In debug mode the compiled (case 2) method was nearly twice as fast (6 seconds vs. 10 seconds), and in release mode both versions were on par (the difference was about 0.2 seconds).

What strikes me is that, with JIT taken out of the equation, I got the opposite result from Martin.

Edit: Initially I overlooked how Foo was defined, so the results above are for a Foo with a field rather than a property. With the original Foo the comparison is the same, only the times are longer: 15 seconds for the direct func and 12 seconds for the compiled version. Again, in release mode the times are similar; now the difference is about 0.5 seconds.

However, this indicates that if your expression is more complex, there will be a real difference even in release mode.
