How to efficiently compare the sign of two floating-point values while handling negative zeros
Given two floating-point numbers, I'm looking for an efficient way to check whether they have the same sign, with the rule that if either value is zero (+0.0 or -0.0), they should be considered to have the same sign.
For instance,
- SameSign(1.0, 2.0) should return true
- SameSign(-1.0, -2.0) should return true
- SameSign(-1.0, 2.0) should return false
- SameSign(0.0, 1.0) should return true
- SameSign(0.0, -1.0) should return true
- SameSign(-0.0, 1.0) should return true
- SameSign(-0.0, -1.0) should return true
A naive but correct implementation of SameSign in C++ would be:
#include <cmath>  // std::fabs

bool SameSign(float a, float b)
{
    // A zero of either sign matches any other value.
    if (std::fabs(a) == 0.0f || std::fabs(b) == 0.0f)
        return true;
    return (a >= 0.0f) == (b >= 0.0f);
}
Assuming the IEEE floating-point model, here's a variant of SameSign that compiles to branchless code (at least with Visual C++ 2008):
bool SameSign(float a, float b)
{
    int ia = binary_cast<int>(a);
    int ib = binary_cast<int>(b);
    int az = (ia & 0x7FFFFFFF) == 0;  // a is +0.0 or -0.0
    int bz = (ib & 0x7FFFFFFF) == 0;  // b is +0.0 or -0.0
    int ab = (ia ^ ib) >= 0;          // sign bits of a and b are equal
    return (az | bz | ab) != 0;
}
with binary_cast defined as follows:
template <typename Target, typename Source>
inline Target binary_cast(Source s)
{
    union
    {
        Source m_source;
        Target m_target;
    } u;
    u.m_source = s;
    return u.m_target;
}
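One caveat on the union approach: reading a union member other than the one last written is undefined behavior in standard C++, although many compilers support it as an extension. A minimal sketch of an alternative based on std::memcpy follows, assuming C++11 or later for static_assert; the name binary_cast_memcpy is hypothetical and not part of the original question.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical alternative to the union-based binary_cast above.
// Copies the object representation of `s` into a Target of the same size.
template <typename Target, typename Source>
inline Target binary_cast_memcpy(Source s)
{
    static_assert(sizeof(Target) == sizeof(Source),
                  "binary_cast_memcpy requires types of equal size");
    Target t;
    std::memcpy(&t, &s, sizeof(Target));
    return t;
}
```

Optimizing compilers typically reduce the fixed-size memcpy to a plain register move. With C++20, std::bit_cast from <bit> serves the same purpose.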
I'm looking for two things:
- A faster, more efficient implementation of SameSign, using bit tricks, FPU tricks or even SSE intrinsics.
- An efficient extension of SameSign to three values.
Edit:
I've made some performance measurements on the three variants of SameSign (the two variants described in the original question, plus Stephen's one). Each function was run 200-400 times, on all consecutive pairs of values in an array of 101 floats filled at random with -1.0, -0.0, +0.0 and +1.0. Each measurement was repeated 2000 times and the minimum time was kept (to weed out all cache effects and system-induced slowdowns). The code was compiled with Visual C++ 2008 SP1 with maximum optimization and SSE2 code generation enabled. The measurements were done on a Core 2 Duo P8600 2.4 GHz.
Here are the timings, not counting the overhead of fetching input values from the array, calling the function and retrieving the result (which amounts to 6-7 clock ticks):
- Naive variant: 15 ticks
- Bit magic variant: 13 ticks
- Stephen's variant: 6 ticks
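The "repeat and keep the minimum" methodology described above can be sketched roughly as follows. This is not the harness used for the numbers above: it assumes C++11, reports nanoseconds from std::chrono::steady_clock rather than clock ticks, and the name MinNanosecondsPerCall is hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical harness: runs `f` over all consecutive pairs of `data`,
// repeats the whole pass `repeats` times, and returns the time of the
// fastest pass in nanoseconds per call.
template <typename F>
double MinNanosecondsPerCall(F f, const std::vector<float>& data, int repeats)
{
    if (data.size() < 2 || repeats <= 0)
        return 0.0;

    typedef std::chrono::steady_clock Clock;
    double best = std::numeric_limits<double>::max();
    volatile bool sink = false;  // keeps the calls from being optimized away

    for (int r = 0; r < repeats; ++r) {
        const Clock::time_point start = Clock::now();
        for (std::size_t i = 0; i + 1 < data.size(); ++i)
            sink = f(data[i], data[i + 1]);
        const std::chrono::duration<double, std::nano> elapsed = Clock::now() - start;

        // Keep the minimum: slower passes are assumed to include cache
        // misses or interference from the rest of the system.
        const double perCall = elapsed.count() / static_cast<double>(data.size() - 1);
        if (perCall < best)
            best = perCall;
    }
    (void)sink;
    return best;
}
```

The result still includes loop and call overhead, which the measurements above subtracted separately.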
Comments (3)
If you don't need to support infinities, you can just use:
which is actually pretty fast on most modern hardware, and is completely portable. It doesn't work properly in the (zero, infinity) case, however, because zero * infinity is NaN, and the comparison will return false regardless of the signs. It will also incur a denormal stall on some hardware when a and b are both tiny.
Perhaps something like:
See the man page for copysign for more information on what it does (you may also want to check that -0 != +0).
Or possibly this, if you have C99 functions:
As a side note, on gcc at least, both copysign and signbit are builtin functions, so they should be fast. If you want to make sure the builtin version is being used, you can write __builtin_signbitf(a).
EDIT: This should also be easy to extend to the three-value case (actually, both of these should be).
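The two snippets from this answer are missing on this page. The following is a guess at their general shape using std::copysign and std::signbit from <cmath> (C++11), not the author's code, and all function names are hypothetical. Note that comparing raw sign bits treats -0.0 and 1.0 as having different signs, so the last two functions add an explicit zero check to follow the question's rule, with a pairwise extension to three values as one possible interpretation.

```cpp
#include <cmath>  // std::copysign, std::signbit

// Compares raw sign bits by transferring each sign onto 1.0f.
inline bool SameSignBitCopysign(float a, float b)
{
    return std::copysign(1.0f, a) == std::copysign(1.0f, b);
}

// Compares raw sign bits directly. In C++, std::signbit returns bool.
inline bool SameSignBit(float a, float b)
{
    return std::signbit(a) == std::signbit(b);
}

// Follows the question's rule: a zero of either sign matches anything.
inline bool SameSignOrZero(float a, float b)
{
    return a == 0.0f || b == 0.0f || std::signbit(a) == std::signbit(b);
}

// Three-value extension: every pair must agree, so all nonzero values
// share one sign and zeros act as wildcards.
inline bool SameSignOrZero3(float a, float b, float c)
{
    return SameSignOrZero(a, b) && SameSignOrZero(b, c) && SameSignOrZero(a, c);
}
```

The zero check relies on -0.0f == 0.0f being true under IEEE comparison rules.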
A small note on signbit: the macro returns an int, and the man page states that "It returns a nonzero value if the value of x has its sign bit set." This means that Spudd86's bool same_sign() is not guaranteed to work if signbit returns two different nonzero ints for two different negative values. Casting to bool first ensures a correct return value:
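The corrected snippet is not shown on this page. A sketch of what the cast might look like, keeping the same_sign name from the answer above but with an assumed body:

```cpp
#include <cmath>  // std::signbit

// Assumed body; only the name same_sign comes from the discussion above.
inline bool same_sign(float a, float b)
{
    // Normalizing each result to bool before comparing means any two
    // nonzero return values compare equal.
    return static_cast<bool>(std::signbit(a)) == static_cast<bool>(std::signbit(b));
}
```

In C++, std::signbit from <cmath> already returns bool, so the casts are redundant there; they matter for the C macro, which returns an int.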