How can this virtual method call be faster than the sealed method call?

Posted on 2024-10-07 06:50:56

I am doing some tinkering on the performance of virtual vs sealed members.

Below is my test code.

The output is

virtual total 3166ms
per call virtual 3.166ns
sealed total 3931ms
per call sealed 3.931ns

I must be doing something wrong because according to this the virtual call is faster than the sealed call.

I am running in Release mode with "Optimize code" turned on.

Edit: When running outside of VS (as a console app), the times are close to a dead heat, but the virtual call almost always comes out in front.

using System;
using System.Diagnostics;
using NUnit.Framework;

[TestFixture]
public class VirtTests
{

    public class ClassWithNonEmptyMethods
    {
        private double x;
        private double y;

        public virtual void VirtualMethod()
        {
            x++;
        }
        // Non-virtual method (C# only allows "sealed" on overrides and classes)
        public void SealedMethod()
        {
            y++;
        }
    }

    const int iterations = 1000000000;


    [Test]
    public void NonEmptyMethodTest()
    {

        var foo = new ClassWithNonEmptyMethods();
        //Pre-call
        foo.VirtualMethod();
        foo.SealedMethod();

        var virtualWatch = new Stopwatch();
        virtualWatch.Start();
        for (var i = 0; i < iterations; i++)
        {
            foo.VirtualMethod();
        }
        virtualWatch.Stop();
        Console.WriteLine("virtual total {0}ms", virtualWatch.ElapsedMilliseconds);
        Console.WriteLine("per call virtual {0}ns", ((float)virtualWatch.ElapsedMilliseconds * 1000000) / iterations);


        var sealedWatch = new Stopwatch();
        sealedWatch.Start();
        for (var i = 0; i < iterations; i++)
        {
            foo.SealedMethod();
        }
        sealedWatch.Stop();
        Console.WriteLine("sealed total {0}ms", sealedWatch.ElapsedMilliseconds);
        Console.WriteLine("per call sealed {0}ns", ((float)sealedWatch.ElapsedMilliseconds * 1000000) / iterations);

    }

}


Comments (4)

笨笨の傻瓜 2024-10-14 06:50:56

You are testing the effects of memory alignment on code efficiency. The 32-bit JIT compiler has trouble generating efficient code for value types that are more than 32 bits in size, such as long and double in C#. The root of the problem is the 32-bit GC heap allocator: it only promises to align allocated memory on addresses that are a multiple of 4. That's an issue here because you are incrementing doubles. A double is efficient only when it is aligned on an address that's a multiple of 8. The stack has the same issue for local variables; it is also aligned only to 4 on a 32-bit machine.

The L1 CPU cache is internally organized in blocks called "cache lines". There is a penalty when the program reads a misaligned double, especially one that straddles the end of a cache line, since bytes from two cache lines have to be read and glued together. Misalignment isn't uncommon with the 32-bit jitter; the odds are merely 50-50 that the 'x' field happens to be allocated on an address that's a multiple of 8. If it isn't, then 'x' and 'y' are both going to be misaligned, and one of them may well straddle a cache line. The way you wrote the test, that's going to make either VirtualMethod or SealedMethod slower. Make sure you let them use the same field to get comparable results.

The same is true for code. Swap the code for the virtual and sealed tests to arbitrarily change the outcome. I had no trouble making the sealed test quite a bit faster that way. Given the modest difference in speed, you are probably looking at a code alignment issue. The x64 jitter makes an effort to insert NOPs to get a branch target aligned; the x86 jitter doesn't.

You should also run the timing test several times in a loop, at least 20 times. You are then also likely to observe the effect of the garbage collector moving the class object. The double may have a different alignment afterward, dramatically changing the timing. Accessing a 64-bit value type like long or double has 3 distinct timings, in fast-to-slow order: aligned on 8, aligned on 4 within a cache line, and aligned on 4 across two cache lines.
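A minimal sketch of that advice, not the original poster's code: both methods increment the same field, and the pair of timings is repeated 20 times so run-to-run variation becomes visible. The names `Counter`, `NonVirtualMethod`, and `AlignmentBench`, and the iteration count, are assumptions chosen for illustration.

```csharp
using System;
using System.Diagnostics;

public class Counter
{
    private double x; // single field shared by both methods, so alignment affects both equally

    public virtual void VirtualMethod() { x++; }
    public void NonVirtualMethod() { x++; }

    public double Value { get { return x; } }
}

public static class AlignmentBench
{
    // Times `iterations` virtual calls and returns the elapsed milliseconds.
    public static double TimeVirtual(Counter c, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) c.VirtualMethod();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Times `iterations` non-virtual calls and returns the elapsed milliseconds.
    public static double TimeNonVirtual(Counter c, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) c.NonVirtualMethod();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 10000000; // hypothetical count, kept small so 20 runs finish quickly
        const int runs = 20;
        var c = new Counter();

        for (var run = 0; run < runs; run++)
        {
            var virtualMs = TimeVirtual(c, iterations);
            var nonVirtualMs = TimeNonVirtual(c, iterations);
            Console.WriteLine("run {0,2}: virtual {1:F1} ms, non-virtual {2:F1} ms",
                run, virtualMs, nonVirtualMs);
        }
    }
}
```

If the timings shift noticeably between runs, that would be consistent with the alignment effects described here rather than with a real difference between the two kinds of call.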

The penalty is steep: reading a double that straddles a cache line is roughly three times slower than reading an aligned one. This is also the core reason why a double[] (array of doubles) is allocated in the Large Object Heap even when it has only 1000 elements, well south of the normal threshold of 85,000 bytes: the LOH has an alignment guarantee of 8. These alignment problems entirely disappear in code generated by the x64 jitter, where both the stack and the GC heap have an alignment of 8.

飘过的浮云 2024-10-14 06:50:56

First, you have to mark the method sealed.

Secondly, provide an override to the virtual method. Create an instance of the derived class.

As a third test, create a sealed override method.

Now you can start comparing.

Edit: You should probably run this outside VS.

Update:

Example of what I mean.

abstract class Foo
{
  // virtual members cannot be private, so the methods are public
  public virtual void Bar() {}
}

class Baz : Foo
{
  public sealed override void Bar() {}
}

class Woz : Foo
{
  public override void Bar() {}
}

Now test the call speed of Bar for an instance of Baz and Woz.
I also suspect member and class visibility outside the assembly could affect JIT analysis.
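One way the suggested comparison could be set up is sketched below. The `Calls` counter, the `SealedOverrideBench` class, and the iteration counts are assumptions added for illustration; whether the JIT actually treats the sealed override differently depends on the runtime and is not something this sketch proves.

```csharp
using System;
using System.Diagnostics;

public abstract class Foo
{
    public long Calls; // hypothetical counter so the method bodies are not empty
    public virtual void Bar() { Calls++; }
}

public class Baz : Foo
{
    public sealed override void Bar() { Calls++; }
}

public class Woz : Foo
{
    public override void Bar() { Calls++; }
}

public static class SealedOverrideBench
{
    // The parameter is statically typed as Baz, so the JIT can see that Bar is sealed.
    public static double TimeBaz(Baz baz, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) baz.Bar();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Woz.Bar is an ordinary override that a further subclass could replace.
    public static double TimeWoz(Woz woz, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) woz.Bar();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 10000000; // hypothetical count
        var baz = new Baz();
        var woz = new Woz();

        for (var run = 0; run < 5; run++)
        {
            Console.WriteLine("run {0}: sealed override {1:F1} ms, plain override {2:F1} ms",
                run, TimeBaz(baz, iterations), TimeWoz(woz, iterations));
        }
    }
}
```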

情话难免假 2024-10-14 06:50:56

You might be seeing some start up cost. Try wrapping the Test-A/Test-B code in a loop and run it several times. You might also be seeing some kind of ordering effects. To avoid that (and top/bottom of loop effects), unroll it 2-3 times.

转角预定愛 2024-10-14 06:50:56

Using the following code as a reference for our test, let's analyze the Microsoft intermediate language (MSIL) generated by the compiler with the Ildasm.exe (IL Disassembler) tool.

public sealed class Sealed
{
    public string Message { get; set; }
    public void DoStuff() { }
}
public class Derived : Base
{
    public sealed override void DoStuff() { }
}
public class Base
{
    public string Message { get; set; }
    public virtual void DoStuff() { }
}
static void Main()
{
    Sealed sealedClass = new Sealed();
    sealedClass.DoStuff();
    Derived derivedClass = new Derived();
    derivedClass.DoStuff();
    Base BaseClass = new Base();
    BaseClass.DoStuff();
}

To run this tool, open the Developer Command Prompt for Visual Studio and execute the command ildasm.

**********************************************************************
** Visual Studio 2017 Developer Command Prompt v15.9.13
** Copyright (c) 2017 Microsoft Corporation
**********************************************************************


C:\Program Files (x86)\Microsoft Visual Studio\2017\Community>ildasm

Once the application has started, load the executable (or assembly) of the previous application.

Double click on the Main method to view the Microsoft intermediate language (MSIL) information.

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       41 (0x29)
  .maxstack  8
  IL_0000:  newobj     instance void ConsoleApp1.Program/Sealed::.ctor()
  IL_0005:  callvirt   instance void ConsoleApp1.Program/Sealed::DoStuff()
  IL_000a:  newobj     instance void ConsoleApp1.Program/Derived::.ctor()
  IL_000f:  callvirt   instance void ConsoleApp1.Program/Base::DoStuff()
  IL_0014:  newobj     instance void ConsoleApp1.Program/Base::.ctor()
  IL_0019:  callvirt   instance void ConsoleApp1.Program/Base::DoStuff()
  IL_0028:  ret
} // end of method Program::Main

As you can see, each class uses newobj to create a new instance, pushing an object reference onto the stack, and callvirt to make a late-bound call to the DoStuff() method of its respective object.

Judging from this information, it seems that sealed, derived, and base classes are all handled the same way by the compiler. Just to be sure, let's go deeper by analyzing the JIT-compiled code with the Disassembly window in Visual Studio.

Enable disassembly by selecting Enable address-level debugging under Tools > Options > Debugging > General.

Set a breakpoint at the beginning of the application and start debugging. Once the application hits the breakpoint, open the Disassembly window by selecting Debug > Windows > Disassembly.

--- C:\Users\Ivan Porta\source\repos\ConsoleApp1\Program.cs --------------------
        {
0066084A  in          al,dx  
0066084B  push        edi  
0066084C  push        esi  
0066084D  push        ebx  
0066084E  sub         esp,4Ch  
00660851  lea         edi,[ebp-58h]  
00660854  mov         ecx,13h  
00660859  xor         eax,eax  
0066085B  rep stos    dword ptr es:[edi]  
0066085D  cmp         dword ptr ds:[5842F0h],0  
00660864  je          0066086B  
00660866  call        744CFAD0  
0066086B  xor         edx,edx  
0066086D  mov         dword ptr [ebp-3Ch],edx  
00660870  xor         edx,edx  
00660872  mov         dword ptr [ebp-48h],edx  
00660875  xor         edx,edx  
00660877  mov         dword ptr [ebp-44h],edx  
0066087A  xor         edx,edx  
0066087C  mov         dword ptr [ebp-40h],edx  
0066087F  nop  
            Sealed sealedClass = new Sealed();
00660880  mov         ecx,584E1Ch  
00660885  call        005730F4  
0066088A  mov         dword ptr [ebp-4Ch],eax  
0066088D  mov         ecx,dword ptr [ebp-4Ch]  
00660890  call        00660468  
00660895  mov         eax,dword ptr [ebp-4Ch]  
00660898  mov         dword ptr [ebp-3Ch],eax  
            sealedClass.DoStuff();
0066089B  mov         ecx,dword ptr [ebp-3Ch]  
0066089E  cmp         dword ptr [ecx],ecx  
006608A0  call        00660460  
006608A5  nop  
            Derived derivedClass = new Derived();
006608A6  mov         ecx,584F3Ch  
006608AB  call        005730F4  
006608B0  mov         dword ptr [ebp-50h],eax  
006608B3  mov         ecx,dword ptr [ebp-50h]  
006608B6  call        006604A8  
006608BB  mov         eax,dword ptr [ebp-50h]  
006608BE  mov         dword ptr [ebp-40h],eax  
            derivedClass.DoStuff();
006608C1  mov         ecx,dword ptr [ebp-40h]  
006608C4  mov         eax,dword ptr [ecx]  
006608C6  mov         eax,dword ptr [eax+28h]  
006608C9  call        dword ptr [eax+10h]  
006608CC  nop  
            Base BaseClass = new Base();
006608CD  mov         ecx,584EC0h  
006608D2  call        005730F4  
006608D7  mov         dword ptr [ebp-54h],eax  
006608DA  mov         ecx,dword ptr [ebp-54h]  
006608DD  call        00660490  
006608E2  mov         eax,dword ptr [ebp-54h]  
006608E5  mov         dword ptr [ebp-44h],eax  
            BaseClass.DoStuff();
006608E8  mov         ecx,dword ptr [ebp-44h]  
006608EB  mov         eax,dword ptr [ecx]  
006608ED  mov         eax,dword ptr [eax+28h]  
006608F0  call        dword ptr [eax+10h]  
006608F3  nop  
        }
0066091A  nop  
0066091B  lea         esp,[ebp-0Ch]  
0066091E  pop         ebx  
0066091F  pop         esi  
00660920  pop         edi  
00660921  pop         ebp  

00660922  ret  

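As we can see in the code above, while the creation of the objects is the same, the instructions executed to invoke the methods of the sealed class and of the derived/base classes are slightly different. After moving the object reference into a register (mov instruction), the sealed method call performs a comparison between dword ptr [ecx] and ecx (cmp instruction), which dereferences the object and so acts as a null check, and then calls the method directly at a fixed address. The derived and base calls instead load the method table pointer from the object and make an indirect call (call dword ptr [eax+10h]).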
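The cmp dword ptr [ecx],ecx that precedes the direct call reads through the object reference, which is how callvirt's null-check behavior shows up for a non-virtual method. A small sketch of the observable effect follows; the `NullCheckDemo` class and `ThrowsOnNull` method are hypothetical names introduced here.

```csharp
using System;

public sealed class Sealed
{
    // Never touches `this`, so only the call-site check can detect a null receiver.
    public void DoStuff() { }
}

public static class NullCheckDemo
{
    // Returns true when calling DoStuff() on a null reference throws.
    public static bool ThrowsOnNull()
    {
        Sealed s = null;
        try
        {
            s.DoStuff(); // C# compiles this to callvirt, which checks the receiver for null
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine("call on null threw: {0}", ThrowsOnNull());
    }
}
```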

According to the report Instruction latencies and throughput for AMD and Intel x86 processors by Torbjörn Granlund, the speeds of these instructions on an Intel Pentium 4 are:

  • mov: 1 cycle of latency, and the processor can sustain 2.5
    instructions of this type per cycle
  • cmp: 1 cycle of latency, and the processor can sustain 2
    instructions of this type per cycle

In conclusion, the optimizations in today's compilers and processors have made the performance difference between sealed and non-sealed classes so small that it is irrelevant to the majority of applications.
