Exceptions, move semantics and optimizations: at the compiler's mercy (MSVC2010)?

Posted 2024-12-29 10:39:20

While upgrading my old exception class hierarchy to use some C++11 features, I ran some speed tests and got results that are somewhat frustrating. Everything was built with the x64 MSVC++ 2010 compiler at maximum speed optimization (/O2).

There are two very simple structs, both with bitwise copy semantics. One has no move assignment operator (why would it need one?), the other has one. Two simple inlined functions return newly created instances of these structs by value, and the results are assigned to local variables. Also note the surrounding try/catch block. Here is the code:

#include <iostream>
#include <windows.h>

struct TFoo
{
  unsigned long long int m0;
  unsigned long long int m1;

  TFoo( unsigned long long int f ) : m0( f ), m1( f / 2 ) {}
};

struct TBar
{
  unsigned long long int m0;
  unsigned long long int m1;

  TBar( unsigned long long int f ) : m0( f ), m1( f / 2 ) {}
  TBar & operator=( TBar && f )
  {
   m0 = f.m0;
   m1 = f.m1;
   f.m0 = f.m1 = 0;

   return ( *this );
  }
};

TFoo MakeFoo( unsigned long long int f )
{
 return ( TFoo( f ) );
}

TBar MakeBar( unsigned long long int f )
{
 return ( TBar( f ) );
}

int main( void )
{
 try
 {
  unsigned long long int lMin = 0;
  unsigned long long int lMax = 20000000;
  LARGE_INTEGER lStart = { 0 };
  LARGE_INTEGER lEnd = { 0 };
  TFoo lFoo( 0 );
  TBar lBar( 0 );

  ::QueryPerformanceCounter( &lStart );
  for( auto i = lMin; i < lMax; i++ )
  {
   lFoo = MakeFoo( i );
  }
  ::QueryPerformanceCounter( &lEnd );
  std::cout << "lFoo = ( " << lFoo.m0 << " , " << lFoo.m1 << " )\t\tMakeFoo count : " << lEnd.QuadPart - lStart.QuadPart << std::endl;

  ::QueryPerformanceCounter( &lStart );
  for( auto i = lMin; i < lMax; i++ )
  {
   lBar = MakeBar( i );
  }
  ::QueryPerformanceCounter( &lEnd );
  std::cout << "lBar = ( " << lBar.m0 << " , " << lBar.m1 << " )\t\tMakeBar count : " << lEnd.QuadPart - lStart.QuadPart << std::endl;
 }
 catch( ... ){}

 return ( 0 );
}

Program output:

lFoo = ( 19999999 , 9999999 )       MakeFoo count : 428652
lBar = ( 19999999 , 9999999 )       MakeBar count : 74518

Assembly for both loops (including the surrounding counter calls):

//- MakeFoo loop START --------------------------------
00000001`3f4388aa 488d4810        lea     rcx,[rax+10h]
00000001`3f4388ae ff1594db0400    call    qword ptr [Prototype_Console!_imp_QueryPerformanceCounter (00000001`3f486448)]

00000001`3f4388b4 448bdf          mov     r11d,edi
00000001`3f4388b7 48897c2428      mov     qword ptr [rsp+28h],rdi
00000001`3f4388bc 0f1f4000        nop     dword ptr [rax]
00000001`3f4388c0 4981fb002d3101  cmp     r11,1312D00h
00000001`3f4388c7 732a            jae     Prototype_Console!main+0x83 (00000001`3f4388f3)
00000001`3f4388c9 4c895c2450      mov     qword ptr [rsp+50h],r11
00000001`3f4388ce 498bc3          mov     rax,r11
00000001`3f4388d1 48d1e8          shr     rax,1
00000001`3f4388d4 4889442458      mov     qword ptr [rsp+58h],rax       // these 3 lines
00000001`3f4388d9 0f28442450      movaps  xmm0,xmmword ptr [rsp+50h]    // are of interest 
00000001`3f4388de 660f7f442430    movdqa  xmmword ptr [rsp+30h],xmm0    // see MakeBar
00000001`3f4388e4 49ffc3          inc     r11
00000001`3f4388e7 4c895c2428      mov     qword ptr [rsp+28h],r11        
00000001`3f4388ec 4c8b6c2438      mov     r13,qword ptr [rsp+38h]       // this one too
00000001`3f4388f1 ebcd            jmp     Prototype_Console!main+0x50 (00000001`3f4388c0)

00000001`3f4388f3 488d8c24c0000000 lea     rcx,[rsp+0C0h]
00000001`3f4388fb ff1547db0400    call    qword ptr [Prototype_Console!_imp_QueryPerformanceCounter (00000001`3f486448)]
//- MakeFoo loop END --------------------------------

//- MakeBar loop START --------------------------------
00000001`3f4389d1 488d8c24c8000000 lea     rcx,[rsp+0C8h]
00000001`3f4389d9 ff1569da0400    call    qword ptr [Prototype_Console!_imp_QueryPerformanceCounter (00000001`3f486448)]

00000001`3f4389df 4c8bdf          mov     r11,rdi
00000001`3f4389e2 48897c2440      mov     qword ptr [rsp+40h],rdi
00000001`3f4389e7 4981fb002d3101  cmp     r11,1312D00h
00000001`3f4389ee 7322            jae     Prototype_Console!main+0x1a2 (00000001`3f438a12)
00000001`3f4389f0 4c895c2478      mov     qword ptr [rsp+78h],r11
00000001`3f4389f5 498bf3          mov     rsi,r11
00000001`3f4389f8 48d1ee          shr     rsi,1
00000001`3f4389fb 4d8be3          mov     r12,r11                     // these 3 lines
00000001`3f4389fe 4c895c2468      mov     qword ptr [rsp+68h],r11     // are of interest
00000001`3f438a03 48897c2478      mov     qword ptr [rsp+78h],rdi     // see MakeFoo
00000001`3f438a08 49ffc3          inc     r11
00000001`3f438a0b 4c895c2440      mov     qword ptr [rsp+40h],r11
00000001`3f438a10 ebd5            jmp     Prototype_Console!main+0x177 (00000001`3f4389e7)

00000001`3f438a12 488d8c24c0000000 lea     rcx,[rsp+0C0h]
00000001`3f438a1a ff1528da0400    call    qword ptr [Prototype_Console!_imp_QueryPerformanceCounter (00000001`3f486448)]
//- MakeBar loop END --------------------------------

Both times are the same if I remove the try/catch block. With it present, the compiler clearly optimizes the code better for the struct with the redundant move operator=. Also, the MakeFoo time depends on the size of TFoo and its layout, but in general it is several times worse than the MakeBar time, which does not depend on small size changes.

Questions:

  1. Is this a compiler-specific behavior of MSVC++ 2010 (could someone check with GCC)?

  2. Is it because the compiler has to preserve the temporary until the call finishes, so it cannot "rip it apart" in the MakeFoo case, whereas in the MakeBar case it knows we allow move semantics and "rips it apart", generating faster code?

  3. Can I expect the same behavior for similar code without a try/catch block, but in more complicated scenarios?
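
For anyone checking this on GCC or Clang, the sketch below shows what a standard-conforming compiler reports about the two types. It is not part of the original test. The helper name MoveAssignZeroesSource is made up for illustration, and the static_assert results reflect the C++11 rules for special member functions, which MSVC++ 2010 (as far as I know) does not implement. The user-declared move operator= makes TBar non-trivially copyable and deletes its implicit copy operations, so before C++17 a conforming compiler should reject `return ( TBar( f ) );` in MakeBar.

```cpp
#include <type_traits>
#include <utility>

struct TFoo
{
  unsigned long long int m0;
  unsigned long long int m1;

  TFoo( unsigned long long int f ) : m0( f ), m1( f / 2 ) {}
};

struct TBar
{
  unsigned long long int m0;
  unsigned long long int m1;

  TBar( unsigned long long int f ) : m0( f ), m1( f / 2 ) {}
  TBar & operator=( TBar && f )
  {
    m0 = f.m0;
    m1 = f.m1;
    f.m0 = f.m1 = 0;
    return ( *this );
  }
};

// TFoo keeps its implicit, trivial copy/move operations.
static_assert( std::is_trivially_copyable<TFoo>::value, "TFoo is a plain bitwise-copyable type" );

// The user-provided move assignment makes TBar non-trivially copyable ...
static_assert( !std::is_trivially_copyable<TBar>::value, "TBar has a user-provided move operator=" );
static_assert( std::is_move_assignable<TBar>::value, "TBar can be move-assigned" );

// ... and causes the implicit copy constructor and copy assignment to be deleted.
static_assert( !std::is_copy_constructible<TBar>::value, "implicit copy constructor is deleted" );
static_assert( !std::is_copy_assignable<TBar>::value, "implicit copy assignment is deleted" );

// Hypothetical helper: confirms the move assignment writes zeros back into its source,
// which is extra work the plain copy in TFoo never has to do.
bool MoveAssignZeroesSource()
{
  TBar lTarget( 0 );
  TBar lSource( 10 );
  lTarget = std::move( lSource );
  return lTarget.m0 == 10 && lTarget.m1 == 5 && lSource.m0 == 0 && lSource.m1 == 0;
}
```

If these assertions hold on your compiler, the two structs are not equivalent "bitwise copy" types from the language's point of view, which is worth keeping in mind when comparing the generated code.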

Comments (1)

爺獨霸怡葒院 2025-01-05 10:39:20

Your test is flawed. When compiled with /O2 /EHsc, it runs to completion in a fraction of a second and there is high variability in the results of the test.

I retried the same test, but ran it for 100x as many iterations, with the following results (the results were similar across several runs of the test):

lFoo = ( 1999999999 , 999999999 )               MakeFoo count : 16584927
lBar = ( 1999999999 , 999999999 )               MakeBar count : 16613002

Your test does not show any difference between the performance of assignment of the two types.
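
One way to reduce the variability described above is to repeat the timed loop several times and keep the best time. The sketch below is an illustration only, not the code used for the numbers above. BestOfTrialsNs is a made-up name, std::chrono::steady_clock stands in for QueryPerformanceCounter so the test also builds with GCC or Clang, and the loop assigns from a temporary directly (`sink = T( i )`) instead of going through MakeFoo/MakeBar.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

struct TFoo
{
  unsigned long long int m0;
  unsigned long long int m1;

  TFoo( unsigned long long int f ) : m0( f ), m1( f / 2 ) {}
};

struct TBar
{
  unsigned long long int m0;
  unsigned long long int m1;

  TBar( unsigned long long int f ) : m0( f ), m1( f / 2 ) {}
  TBar & operator=( TBar && f )
  {
    m0 = f.m0;
    m1 = f.m1;
    f.m0 = f.m1 = 0;
    return ( *this );
  }
};

// Hypothetical helper: runs the assignment loop `trials` times and returns the
// fastest run in nanoseconds. Taking the minimum filters out runs that were
// disturbed by the scheduler or other system activity.
template <typename T>
std::int64_t BestOfTrialsNs( int trials, unsigned long long int iterations, T & sink )
{
  typedef std::chrono::steady_clock Clock;
  std::int64_t lBest = std::numeric_limits<std::int64_t>::max();

  for( int t = 0; t < trials; ++t )
  {
    const Clock::time_point lStart = Clock::now();
    for( unsigned long long int i = 0; i < iterations; ++i )
    {
      sink = T( i ); // copy assignment for TFoo, move assignment for TBar
    }
    const Clock::time_point lEnd = Clock::now();

    const std::int64_t lNs =
      std::chrono::duration_cast<std::chrono::nanoseconds>( lEnd - lStart ).count();
    lBest = std::min( lBest, lNs );
  }
  return lBest;
}
```

A call such as `BestOfTrialsNs( 5, 2000000000ULL, lFoo )` would correspond roughly to the 100x iteration count used above. Note that an optimizer may reduce the whole loop to its final assignment, so it is still worth inspecting the generated assembly before trusting any number.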
