C++ 有何不同？ math.h abs() 与我的 abs() 相比

发布于 2024-09-02 04:37:09 字数 903 浏览 12 评论 0原文

我目前正在用 C++ 编写一些类似于向量数学类的 glsl，并且我刚刚实现了一个 abs() 函数，如下所示：

template<class T>
static inline T abs(T _a)
{
    return _a < 0 ? -_a : _a;
}

我将其速度与来自 abs 的默认 C++ abs 进行了比较。 code>math.h 像这样：

clock_t begin = clock();
for(int i=0; i<10000000; ++i)
{
    float a = abs(-1.25);
};

clock_t end = clock();
unsigned long time1 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));

begin = clock();
for(int i=0; i<10000000; ++i)
{
    float a  = myMath::abs(-1.25);
};
end = clock();
unsigned long time2 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));

std::cout<<time1<<std::endl;
std::cout<<time2<<std::endl;

现在默认的abs大约需要25ms，而我的需要60ms。我猜正在进行一些低级优化。有谁知道 math.h abs 内部是如何工作的？性能差异并不明显，但我只是很好奇！

原文

I am currently writing some glsl like vector math classes in C++, and I just implemented an abs() function like this:

template<class T>
static inline T abs(T _a)
{
    return _a < 0 ? -_a : _a;
}

I compared its speed to the default C++ abs from math.h like this:

clock_t begin = clock();
for(int i=0; i<10000000; ++i)
{
    float a = abs(-1.25);
};

clock_t end = clock();
unsigned long time1 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));

begin = clock();
for(int i=0; i<10000000; ++i)
{
    float a  = myMath::abs(-1.25);
};
end = clock();
unsigned long time2 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));

std::cout<<time1<<std::endl;
std::cout<<time2<<std::endl;

Now the default abs takes about 25ms while mine takes 60. I guess there is some low level optimisation going on. Does anybody know how math.h abs works internally? The performance difference is nothing dramatic, but I am just curious!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

音盲 2024-09-09 04:37:09

由于它们是实现，因此它们可以自由地做出任意多的假设。他们知道 double 的格式，并且可以用它来玩把戏。

可能（几乎不是一个问题），您的 double 是 binary64格式。这意味着符号有自己的位，绝对值只是清除该位。例如，作为专业化，编译器实现者可以执行以下操作：

template <>
double abs<double>(const double x)
{
    // breaks strict aliasing, but compiler writer knows this behavior for the platform
    uint64_t i = reinterpret_cast<const std::uint64_t&>(x);
    i &= 0x7FFFFFFFFFFFFFFFULL; // clear sign bit

    return reinterpret_cast<const double&>(i);
}

这会删除分支并可能运行得更快。

Since they are the implementation, they are free to make as many assumptions as they want. They know the format of the double and can play tricks with that instead.

Likely (as in almost not even a question), your double is the binary64 format. This means the sign has it's own bit, and an absolute value is merely clearing that bit. For example, as a specialization, a compiler implementer may do the following:

template <>
double abs<double>(const double x)
{
    // breaks strict aliasing, but compiler writer knows this behavior for the platform
    uint64_t i = reinterpret_cast<const std::uint64_t&>(x);
    i &= 0x7FFFFFFFFFFFFFFFULL; // clear sign bit

    return reinterpret_cast<const double&>(i);
}

This removes branching and may run faster.

回复收藏 0 原文

太阳公公是暖光 2024-09-09 04:37:09

计算二进制补码有符号数的绝对值有一些众所周知的技巧。如果该数为负数，则翻转所有位并加1，即与-1异或并减-1。如果是正数，则不执行任何操作，即与 0 异或并减去 0。

int my_abs(int x)
{
    int s = x >> 31;
    return (x ^ s) - s;
}

There are well-known tricks for computing the absolute value of a two's complement signed number. If the number is negative, flip all the bits and add 1, that is, xor with -1 and subtract -1. If it is positive, do nothing, that is, xor with 0 and subtract 0.

int my_abs(int x)
{
    int s = x >> 31;
    return (x ^ s) - s;
}

回复收藏 0 原文

放肆 2024-09-09 04:37:09

你的编译器和设置是什么？我确信 MS 和 GCC 为许多数学和字符串操作实现了“内在函数”。

以下行：

printf("%.3f", abs(1.25));

落入以下“fabs”代码路径（在 msvcr90d.dll 中）：

004113DE  sub         esp,8 
004113E1  fld         qword ptr [__real@3ff4000000000000 (415748h)] 
004113E7  fstp        qword ptr [esp] 
004113EA  call        abs (4110FFh)

abs 调用 MSVCR90D 上的 C 运行时“fabs”实现（相当大）：

102F5730  mov         edi,edi 
102F5732  push        ebp  
102F5733  mov         ebp,esp 
102F5735  sub         esp,14h 
102F5738  fldz             
102F573A  fstp        qword ptr [result] 
102F573D  push        0FFFFh 
102F5742  push        133Fh 
102F5747  call        _ctrlfp (102F6140h) 
102F574C  add         esp,8 
102F574F  mov         dword ptr [savedcw],eax 
102F5752  movzx       eax,word ptr [ebp+0Eh] 
102F5756  and         eax,7FF0h 
102F575B  cmp         eax,7FF0h 
102F5760  jne         fabs+0D2h (102F5802h) 
102F5766  sub         esp,8 
102F5769  fld         qword ptr [x] 
102F576C  fstp        qword ptr [esp] 
102F576F  call        _sptype (102F9710h) 
102F5774  add         esp,8 
102F5777  mov         dword ptr [ebp-14h],eax 
102F577A  cmp         dword ptr [ebp-14h],1 
102F577E  je          fabs+5Eh (102F578Eh) 
102F5780  cmp         dword ptr [ebp-14h],2 
102F5784  je          fabs+77h (102F57A7h) 
102F5786  cmp         dword ptr [ebp-14h],3 
102F578A  je          fabs+8Fh (102F57BFh) 
102F578C  jmp         fabs+0A8h (102F57D8h) 
102F578E  push        0FFFFh 
102F5793  mov         ecx,dword ptr [savedcw] 
102F5796  push        ecx  
102F5797  call        _ctrlfp (102F6140h) 
102F579C  add         esp,8 
102F579F  fld         qword ptr [x] 
102F57A2  jmp         fabs+0F8h (102F5828h) 
102F57A7  push        0FFFFh 
102F57AC  mov         edx,dword ptr [savedcw] 
102F57AF  push        edx  
102F57B0  call        _ctrlfp (102F6140h) 
102F57B5  add         esp,8 
102F57B8  fld         qword ptr [x] 
102F57BB  fchs             
102F57BD  jmp         fabs+0F8h (102F5828h) 
102F57BF  mov         eax,dword ptr [savedcw] 
102F57C2  push        eax  
102F57C3  sub         esp,8 
102F57C6  fld         qword ptr [x] 
102F57C9  fstp        qword ptr [esp] 
102F57CC  push        15h  
102F57CE  call        _handle_qnan1 (102F98C0h) 
102F57D3  add         esp,10h 
102F57D6  jmp         fabs+0F8h (102F5828h) 
102F57D8  mov         ecx,dword ptr [savedcw] 
102F57DB  push        ecx  
102F57DC  fld         qword ptr [x] 
102F57DF  fadd        qword ptr [__real@3ff0000000000000 (1022CF68h)] 
102F57E5  sub         esp,8 
102F57E8  fstp        qword ptr [esp] 
102F57EB  sub         esp,8 
102F57EE  fld         qword ptr [x] 
102F57F1  fstp        qword ptr [esp] 
102F57F4  push        15h  
102F57F6  push        8    
102F57F8  call        _except1 (102F99B0h) 
102F57FD  add         esp,1Ch 
102F5800  jmp         fabs+0F8h (102F5828h) 
102F5802  mov         edx,dword ptr [ebp+0Ch] 
102F5805  and         edx,7FFFFFFFh 
102F580B  mov         dword ptr [ebp-0Ch],edx 
102F580E  mov         eax,dword ptr [x] 
102F5811  mov         dword ptr [result],eax 
102F5814  push        0FFFFh 
102F5819  mov         ecx,dword ptr [savedcw] 
102F581C  push        ecx  
102F581D  call        _ctrlfp (102F6140h) 
102F5822  add         esp,8 
102F5825  fld         qword ptr [result] 
102F5828  mov         esp,ebp 
102F582A  pop         ebp  
102F582B  ret

在发布模式下，改为使用 FPU FABS 指令（需要 1）仅在 FPU >= Pentium 上的时钟周期），反汇编输出为：

00401006  fld         qword ptr [__real@3ff4000000000000 (402100h)] 
0040100C  sub         esp,8 
0040100F  fabs             
00401011  fstp        qword ptr [esp] 
00401014  push        offset string "%.3f" (4020F4h) 
00401019  call        dword ptr [__imp__printf (4020A0h)]

What is your compiler and settings? I'm sure MS and GCC implement "intrinsic functions" for many math and string operations.

The following line:

printf("%.3f", abs(1.25));

falls into the following "fabs" code path (in msvcr90d.dll):

004113DE  sub         esp,8 
004113E1  fld         qword ptr [__real@3ff4000000000000 (415748h)] 
004113E7  fstp        qword ptr [esp] 
004113EA  call        abs (4110FFh)

abs call the C runtime 'fabs' implementation on MSVCR90D (rather large):

102F5730  mov         edi,edi 
102F5732  push        ebp  
102F5733  mov         ebp,esp 
102F5735  sub         esp,14h 
102F5738  fldz             
102F573A  fstp        qword ptr [result] 
102F573D  push        0FFFFh 
102F5742  push        133Fh 
102F5747  call        _ctrlfp (102F6140h) 
102F574C  add         esp,8 
102F574F  mov         dword ptr [savedcw],eax 
102F5752  movzx       eax,word ptr [ebp+0Eh] 
102F5756  and         eax,7FF0h 
102F575B  cmp         eax,7FF0h 
102F5760  jne         fabs+0D2h (102F5802h) 
102F5766  sub         esp,8 
102F5769  fld         qword ptr [x] 
102F576C  fstp        qword ptr [esp] 
102F576F  call        _sptype (102F9710h) 
102F5774  add         esp,8 
102F5777  mov         dword ptr [ebp-14h],eax 
102F577A  cmp         dword ptr [ebp-14h],1 
102F577E  je          fabs+5Eh (102F578Eh) 
102F5780  cmp         dword ptr [ebp-14h],2 
102F5784  je          fabs+77h (102F57A7h) 
102F5786  cmp         dword ptr [ebp-14h],3 
102F578A  je          fabs+8Fh (102F57BFh) 
102F578C  jmp         fabs+0A8h (102F57D8h) 
102F578E  push        0FFFFh 
102F5793  mov         ecx,dword ptr [savedcw] 
102F5796  push        ecx  
102F5797  call        _ctrlfp (102F6140h) 
102F579C  add         esp,8 
102F579F  fld         qword ptr [x] 
102F57A2  jmp         fabs+0F8h (102F5828h) 
102F57A7  push        0FFFFh 
102F57AC  mov         edx,dword ptr [savedcw] 
102F57AF  push        edx  
102F57B0  call        _ctrlfp (102F6140h) 
102F57B5  add         esp,8 
102F57B8  fld         qword ptr [x] 
102F57BB  fchs             
102F57BD  jmp         fabs+0F8h (102F5828h) 
102F57BF  mov         eax,dword ptr [savedcw] 
102F57C2  push        eax  
102F57C3  sub         esp,8 
102F57C6  fld         qword ptr [x] 
102F57C9  fstp        qword ptr [esp] 
102F57CC  push        15h  
102F57CE  call        _handle_qnan1 (102F98C0h) 
102F57D3  add         esp,10h 
102F57D6  jmp         fabs+0F8h (102F5828h) 
102F57D8  mov         ecx,dword ptr [savedcw] 
102F57DB  push        ecx  
102F57DC  fld         qword ptr [x] 
102F57DF  fadd        qword ptr [__real@3ff0000000000000 (1022CF68h)] 
102F57E5  sub         esp,8 
102F57E8  fstp        qword ptr [esp] 
102F57EB  sub         esp,8 
102F57EE  fld         qword ptr [x] 
102F57F1  fstp        qword ptr [esp] 
102F57F4  push        15h  
102F57F6  push        8    
102F57F8  call        _except1 (102F99B0h) 
102F57FD  add         esp,1Ch 
102F5800  jmp         fabs+0F8h (102F5828h) 
102F5802  mov         edx,dword ptr [ebp+0Ch] 
102F5805  and         edx,7FFFFFFFh 
102F580B  mov         dword ptr [ebp-0Ch],edx 
102F580E  mov         eax,dword ptr [x] 
102F5811  mov         dword ptr [result],eax 
102F5814  push        0FFFFh 
102F5819  mov         ecx,dword ptr [savedcw] 
102F581C  push        ecx  
102F581D  call        _ctrlfp (102F6140h) 
102F5822  add         esp,8 
102F5825  fld         qword ptr [result] 
102F5828  mov         esp,ebp 
102F582A  pop         ebp  
102F582B  ret

In release mode, the FPU FABS instruction is used instead (takes 1 clock cycle only on FPU >= Pentium), the dissasembly output is:

00401006  fld         qword ptr [__real@3ff4000000000000 (402100h)] 
0040100C  sub         esp,8 
0040100F  fabs             
00401011  fstp        qword ptr [esp] 
00401014  push        offset string "%.3f" (4020F4h) 
00401019  call        dword ptr [__imp__printf (4020A0h)]

回复收藏 0 原文

朮生 2024-09-09 04:37:09

它可能只是使用位掩码将符号位设置为 0。

回复收藏 0 原文

爱的那么颓废 2024-09-09 04:37:09

可能有以下几种情况：

您确定第一次调用使用std::abs吗？它也可以使用 C 中的整数 abs（显式调用 std::abs，或者使用 using std::abs;）
编译器可能具有某些浮点函数的内在实现（例如，将它们直接编译为 FPU 指令）

但是，我令我惊讶的是，编译器没有完全消除循环 - 因为您在循环内没有做任何有任何影响的事情，并且至少在 abs 的情况下，编译器应该知道没有边-效果。

回复收藏 0 原文

溺ぐ爱和你が 2024-09-09 04:37:09

您的ABS版本是内联的，可以计算一次，并且编译器可以简单地知道返回的值不会改变，所以它甚至不需要调用该函数。

您确实需要查看生成的汇编代码（设置断点，然后打开“大”调试器视图，如果没记错的话，反汇编代码将位于左下角），然后您就可以看到发生了什么。

您可以轻松地在线找到有关处理器的文档，它会告诉您所有说明是什么，以便您可以弄清楚发生了什么。或者，将其粘贴到此处，我们会告诉您。 ;)

回复收藏 0 原文

月隐月明月朦胧 2024-09-09 04:37:09

库版本的abs可能是一个内在函数，编译器完全知道它的行为，编译器甚至可以在编译时计算值（因为在你的情况下它是已知的）并优化调用。您应该使用仅在运行时已知的值（由用户提供或在两个周期之前通过 rand() 获得）来尝试基准测试。

如果仍然存在差异，可能是因为库abs是直接用魔术技巧手工伪造的汇编编写的，所以它可能比生成的要快一点。

回复收藏 0 原文

燕归巢 2024-09-09 04:37:09

库abs函数对整数进行操作，而您显然是在测试浮点数。这意味着使用 float 参数调用 abs 涉及从 float 到 int 的转换（可能是无操作，因为您使用的是常量，编译器可能在编译时执行此操作），然后是 INTEGER abs 操作和转换 int->float。您的模板化函数将涉及浮点运算，这可能会产生影响。

回复收藏 0 原文

~没有更多了~