C++: how does math.h abs() differ from my abs()?
I am currently writing some GLSL-like vector math classes in C++, and I just implemented an abs() function like this:
    template<class T>
    static inline T abs(T _a)
    {
        return _a < 0 ? -_a : _a;
    }
I compared its speed to the default C++ abs from math.h like this:
    // Time 10,000,000 calls to the library abs
    clock_t begin = clock();
    for(int i=0; i<10000000; ++i)
    {
        float a = abs(-1.25);
    }
    clock_t end = clock();
    // Elapsed time in milliseconds
    unsigned long time1 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));

    // Same loop, this time calling my own abs
    begin = clock();
    for(int i=0; i<10000000; ++i)
    {
        float a = myMath::abs(-1.25);
    }
    end = clock();
    unsigned long time2 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));

    std::cout<<time1<<std::endl;
    std::cout<<time2<<std::endl;
Now the default abs takes about 25 ms while mine takes 60 ms. I guess there is some low-level optimisation going on. Does anybody know how math.h abs works internally? The performance difference is nothing dramatic, but I am just curious!
Since they are the implementation, they are free to make as many assumptions as they want. They know the format of the double and can play tricks with that instead.
Likely (as in almost not even a question), your double is the binary64 format. This means the sign has its own bit, and an absolute value is merely clearing that bit. For example, as a specialization, a compiler implementer may clear the sign bit directly. This removes branching and may run faster.
There are well-known tricks for computing the absolute value of a two's complement signed number. If the number is negative, flip all the bits and add 1, that is, xor with -1 and subtract -1. If it is positive, do nothing, that is, xor with 0 and subtract 0.
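A minimal sketch of that xor-and-subtract trick for 32-bit integers might look like this. The function name abs_branchless is hypothetical, and the sketch assumes that right-shifting a negative signed value is an arithmetic shift (implementation-defined before C++20, guaranteed from C++20 on).

```cpp
#include <cstdint>

// Hypothetical helper for illustration.
// Like std::abs, the result is undefined for INT32_MIN, whose absolute
// value does not fit in a 32-bit signed integer.
inline std::int32_t abs_branchless(std::int32_t x)
{
    // Assumption: arithmetic shift, so mask is -1 (all bits set) for
    // negative x and 0 for non-negative x.
    const std::int32_t mask = x >> 31;

    // Negative x: (x ^ -1) flips all bits, and subtracting -1 adds 1.
    // Non-negative x: (x ^ 0) - 0 leaves x unchanged.
    return (x ^ mask) - mask;
}
```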
What compiler and settings are you using? I'm sure MS and GCC implement "intrinsic functions" for many math and string operations.
The following line:
falls into the following "fabs" code path (in msvcr90d.dll):
abs calls the C runtime 'fabs' implementation on MSVCR90D (rather large):
In release mode, the FPU FABS instruction is used instead (takes only 1 clock cycle on FPU >= Pentium); the disassembly output is:
It probably just uses a bitmask to set the sign bit to 0.
There can be several things:
Are you sure the first call uses std::abs? It could just as well use the integer abs from C (either call std::abs explicitly, or have using std::abs;).
The compiler might have an intrinsic implementation of some float functions (e.g. compile them directly into FPU instructions).
However, I'm surprised the compiler doesn't eliminate the loop altogether, since you don't do anything with any effect inside the loop, and at least in the case of abs, the compiler should know there are no side effects.
Your version of abs is inlined and can be computed once, and the compiler can trivially know that the value returned isn't going to change, so it doesn't even need to call the function.
You really need to look at the generated assembly code (set a breakpoint and open the "large" debugger view; the disassembly will be on the bottom left if memory serves), and then you can see what's going on.
You can find documentation on your processor online without too much trouble; it'll tell you what all of the instructions are so you can figure out what's happening. Alternatively, paste it here and we'll tell you. ;)
The library version of abs is probably an intrinsic function whose behavior is exactly known by the compiler, which can even compute the value at compile time (since in your case it's known) and optimize the call away. You should try your benchmark with a value known only at runtime (provided by the user or obtained with rand() before the two loops).
If there's still a difference, it may be because the library abs is written directly in hand-forged assembly with magic tricks, so it could be a little faster than the generated one.
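A rough sketch of such a benchmark is shown below. Everything in it is an assumption made for illustration: the BenchResult struct and the run_abs_benchmark function are made-up names, myMath::abs is a re-declaration of the template from the question, and std::chrono::steady_clock is used instead of clock(). The inputs come from rand(), so they are not known at compile time, and the results are summed and returned so the loops cannot simply be discarded.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

namespace myMath {
// The abs template from the question, re-declared for this sketch.
template <class T>
inline T abs(T a) { return a < 0 ? -a : a; }
}

// Hypothetical result type: accumulated sums plus elapsed milliseconds.
struct BenchResult {
    double sum_std;
    double sum_mine;
    long long ms_std;
    long long ms_mine;
};

// Hypothetical harness: times std::abs and myMath::abs on the same random input.
inline BenchResult run_abs_benchmark(std::size_t n, unsigned seed)
{
    // Fill the input with values in roughly [-0.5, 0.5], known only at runtime.
    std::srand(seed);
    std::vector<float> input(n);
    for (float& v : input)
        v = static_cast<float>(std::rand()) / RAND_MAX - 0.5f;

    using steady = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::milliseconds;

    BenchResult r{};

    const auto t0 = steady::now();
    for (float v : input) r.sum_std += std::abs(v);      // library abs
    const auto t1 = steady::now();
    for (float v : input) r.sum_mine += myMath::abs(v);  // hand-written abs
    const auto t2 = steady::now();

    r.ms_std  = duration_cast<milliseconds>(t1 - t0).count();
    r.ms_mine = duration_cast<milliseconds>(t2 - t1).count();
    return r;
}
```

Calling run_abs_benchmark(10000000, seed) and printing ms_std and ms_mine mirrors the measurement in the question. The compiler may still vectorize both loops, so inspecting the generated assembly remains the more reliable way to see what actually runs.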
The library abs function operates on integers while you are obviously testing floats. This means that a call to abs with a float argument involves a conversion from float to int (which may be a no-op, as you are using a constant and the compiler may do it at compile time), then an INTEGER abs operation and an int->float conversion. Your templated function will involve operations on floats, and this is probably making a difference.
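Whether an unqualified abs call really resolves to the integer version depends on which headers are included and on the compiler, so the sketch below only makes the two possible paths explicit. The helper names abs_via_int and abs_via_double are hypothetical.

```cpp
#include <cmath>
#include <cstdlib>

// Hypothetical helper: the path described above, where the argument is
// converted to int, the integer abs runs, and the result is converted back.
inline double abs_via_int(double x)
{
    return static_cast<double>(std::abs(static_cast<int>(x)));
}

// Hypothetical helper: the floating-point overload from <cmath>.
inline double abs_via_double(double x)
{
    return std::abs(x);
}
```

For an input of -1.25 the integer path yields 1.0, because the fractional part is truncated, while the floating-point path yields 1.25. Printing the result of the call in question is therefore a quick way to check which overload was actually chosen.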