How to efficiently compare the sign of two floating-point values while handling negative zeros

Posted on 2024-09-03 01:38:21

Given two floating-point numbers, I'm looking for an efficient way to check whether they have the same sign, with the rule that if either of the two values is zero (+0.0 or -0.0), they should be considered to have the same sign.

For instance,

  • SameSign(1.0, 2.0) should return true
  • SameSign(-1.0, -2.0) should return true
  • SameSign(-1.0, 2.0) should return false
  • SameSign(0.0, 1.0) should return true
  • SameSign(0.0, -1.0) should return true
  • SameSign(-0.0, 1.0) should return true
  • SameSign(-0.0, -1.0) should return true

A naive but correct implementation of SameSign in C++ would be:

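#include <cmath>  // fabs

bool SameSign(float a, float b)
{
    // A zero of either sign is treated as matching any other value.
    if (fabs(a) == 0.0f || fabs(b) == 0.0f)
        return true;

    // Both values are nonzero here, so compare their signs directly.
    return (a >= 0.0f) == (b >= 0.0f);
}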

Assuming the IEEE floating-point model, here's a variant of SameSign that compiles to branchless code (at least with Visual C++ 2008):

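bool SameSign(float a, float b)
{
    // Reinterpret the bits of each float as a 32-bit integer.
    int ia = binary_cast<int>(a);
    int ib = binary_cast<int>(b);

    int az = (ia & 0x7FFFFFFF) == 0;  // a is +0.0 or -0.0 (sign bit masked off)
    int bz = (ib & 0x7FFFFFFF) == 0;  // b is +0.0 or -0.0
    int ab = (ia ^ ib) >= 0;          // the two sign bits are equal

    return (az | bz | ab) != 0;
}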

with binary_cast defined as follows:

template <typename Target, typename Source>
inline Target binary_cast(Source s)
{
    union
    {
        Source  m_source;
        Target  m_target;
    } u;
    u.m_source = s;
    return u.m_target;
}
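Reading a union member other than the one last written is not guaranteed to work in standard C++, even though common compilers tolerate it. As a minimal sketch, the same bit reinterpretation can be done with std::memcpy. The names binary_cast_memcpy and SameSignBits are hypothetical and do not appear in the original post, and the code assumes that int and float are both 32 bits wide with IEEE layout.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical alternative to binary_cast: copies the object representation
// instead of reading an inactive union member.
template <typename Target, typename Source>
inline Target binary_cast_memcpy(Source s)
{
    static_assert(sizeof(Target) == sizeof(Source),
                  "binary_cast_memcpy requires types of equal size");
    Target t;
    std::memcpy(&t, &s, sizeof(Target));
    return t;
}

// The branchless variant from the question, rewritten on top of the helper.
// Assumes 32-bit int and IEEE single-precision float.
inline bool SameSignBits(float a, float b)
{
    const int ia = binary_cast_memcpy<int>(a);
    const int ib = binary_cast_memcpy<int>(b);

    const int az = (ia & 0x7FFFFFFF) == 0;  // a is +0.0 or -0.0
    const int bz = (ib & 0x7FFFFFFF) == 0;  // b is +0.0 or -0.0
    const int ab = (ia ^ ib) >= 0;          // sign bits are equal

    return (az | bz | ab) != 0;
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a plain register move, so the sketch should not cost more than the union version, though that is worth checking in the generated code.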

I'm looking for two things:

  1. A faster, more efficient implementation of SameSign, using bit tricks, FPU tricks or even SSE intrinsics.

  2. An efficient extension of SameSign to three values.
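One possible reading of the three-value case is that every pair must satisfy the two-value rule. Under that assumption, the result is false only when the inputs contain both a strictly negative and a strictly positive value. The sketch below follows that reading. SameSign3 is a hypothetical name, and the approach is an illustration, not something proposed in the original post.

```cpp
// Hypothetical helper, not from the original post: returns true unless the
// inputs contain both a strictly negative and a strictly positive value.
// +0.0 and -0.0 are neither, so a zero never causes a mismatch.
// NaN compares false in both tests, so it is treated like a zero here.
inline bool SameSign3(float a, float b, float c)
{
    // Non-short-circuit | and & keep the expression free of explicit branches.
    const bool anyNegative = (a < 0.0f) | (b < 0.0f) | (c < 0.0f);
    const bool anyPositive = (a > 0.0f) | (b > 0.0f) | (c > 0.0f);
    return !(anyNegative & anyPositive);
}
```

For example, SameSign3(-0.0f, 1.0f, 2.0f) is true, while SameSign3(0.0f, 1.0f, -1.0f) is false because the pair (1.0, -1.0) differs in sign. Whether this beats the bit-level variants would have to be measured.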

Edit:

I've made some performance measurements on the three variants of SameSign (the two variants described in the original question, plus Stephen's). Each function was run 200-400 times on all consecutive pairs of values in an array of 101 floats filled at random with -1.0, -0.0, +0.0 and +1.0. Each measurement was repeated 2000 times and the minimum time was kept (to weed out cache effects and system-induced slowdowns). The code was compiled with Visual C++ 2008 SP1 with maximum optimization and SSE2 code generation enabled. The measurements were done on a Core 2 Duo P8600 at 2.4 GHz.

Here are the timings, not counting the overhead of fetching input values from the array, calling the function and retrieving the result (which amounts to 6-7 clock ticks):

  • Naive variant: 15 ticks
  • Bit magic variant: 13 ticks
  • Stephen's variant: 6 ticks
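The keep-the-minimum methodology described above can be sketched roughly as follows. This is not the original benchmark code: the names MinElapsedNs, CountSameSignPairs and SameSignNaive are hypothetical, and the sketch reports nanoseconds from std::chrono::steady_clock, whereas the original measurements were in clock ticks.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical harness: runs `work` `repeats` times and returns the shortest
// observed duration in nanoseconds, which filters out one-off slowdowns.
template <typename Work>
long long MinElapsedNs(Work work, int repeats)
{
    typedef std::chrono::steady_clock Clock;
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < repeats; ++r)
    {
        const Clock::time_point start = Clock::now();
        work();
        const Clock::time_point stop = Clock::now();
        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (ns < best)
            best = ns;
    }
    return best;
}

// The naive variant from the question, used here as the function under test.
inline bool SameSignNaive(float a, float b)
{
    if (a == 0.0f || b == 0.0f)
        return true;
    return (a >= 0.0f) == (b >= 0.0f);
}

// Applies `pred` to every consecutive pair in values[0..count) and counts the
// pairs for which it returns true, so the work has an observable result.
inline std::size_t CountSameSignPairs(const float* values, std::size_t count,
                                      bool (*pred)(float, float))
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i + 1 < count; ++i)
        hits += pred(values[i], values[i + 1]) ? 1 : 0;
    return hits;
}
```

A caller would wrap CountSameSignPairs in a lambda, pass it to MinElapsedNs with the desired repeat count, and subtract the time of an empty predicate to estimate the loop overhead.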

Comments (3)

≈。彩虹 2024-09-10 01:38:21

If you don't need to support infinities, you can just use:

inline bool SameSign(float a, float b) {
    return a*b >= 0.0f;
}

which is actually pretty fast on most modern hardware, and is completely portable. It doesn't work properly in the (zero, infinity) case, however, because zero * infinity is NaN, and the comparison will return false regardless of the signs. It will also incur a denormal stall on some hardware when a and b are both tiny.

无声无音无过去 2024-09-10 01:38:21

Perhaps something like:

inline bool same_sign(float a, float b) {
    return copysignf(a,b) == a;
}

See the man page for copysign for more information on what it does (you may also want to check that -0 != +0).

Or possibly this, if you have the C99 functions:

inline bool same_sign(float a, float b) {
    // signbit is the type-generic C99 macro from <math.h>
    return signbit(a) == signbit(b);
}

As a side note, on GCC at least, both copysign and signbit are built-in functions, so they should be fast. If you want to make sure the built-in version is being used, you can call __builtin_signbitf(a).

EDIT: This should also be easy to extend to the three-value case (actually, both of these should):

inline bool same_sign(float a, float b, float c) {
    return copysignf(a,b) == a && copysignf(a,c) == a;
}

// trust the compiler to do common sub-expression elimination
inline bool same_sign(float a, float b, float c) {
    return signbit(a) == signbit(b) && signbit(a) == signbit(c);
}

// the man pages do not say that signbit returns 1 for negative values; however,
// if it does, this should be good (no branches, for one thing)
inline bool same_sign(float a, float b, float c) {
    int s = signbit(a) + signbit(b) + signbit(c);
    return !s || s == 3;
}
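One detail worth checking against the question's zero rule: copysign(1.0f, -0.0f) is -1.0f, so the copysign comparison reports a mismatch for the pair (1.0, -0.0), even though it accepts (-0.0, 1.0). A minimal sketch that tests for zeros explicitly before comparing is shown below. The names same_sign_copysign and same_sign_zero_aware are hypothetical and not part of this answer.

```cpp
#include <cmath>  // std::copysign

// The copysign comparison from the answer above, using the C++ overload.
inline bool same_sign_copysign(float a, float b)
{
    return std::copysign(a, b) == a;
}

// Hypothetical zero-aware wrapper: a zero of either sign matches anything,
// as the question requires; otherwise fall back to the copysign comparison.
inline bool same_sign_zero_aware(float a, float b)
{
    return a == 0.0f || b == 0.0f || std::copysign(a, b) == a;
}
```

The wrapper adds two comparisons, so whether it still beats the other variants would need measuring.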
送君千里 2024-09-10 01:38:21

A small note on signbit: the macro returns an int, and the man page states that "It returns a nonzero value if the value of x has its sign bit set." This means that Spudd86's bool same_sign() is not guaranteed to work if signbit returns two different nonzero ints for two different negative values.

Casting to bool first ensures a correct return value:

inline bool same_sign(float a, float b) {
    // normalize each result to 0 or 1 before comparing
    return (bool)signbit(a) == (bool)signbit(b);
}
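In C++11 and later, std::signbit from <cmath> returns bool, so the normalization happens without a cast. The sketch below uses it for both the two-value and three-value forms. The name same_sign_bit is hypothetical. This compares sign bits strictly, so -0.0 and +1.0 count as different, and the question's zero rule would still need a separate check.

```cpp
#include <cmath>  // std::signbit

// Strict sign-bit comparison: std::signbit returns bool in C++11 and later.
inline bool same_sign_bit(float a, float b)
{
    return std::signbit(a) == std::signbit(b);
}

// Three-value form: all sign bits must match the first one.
inline bool same_sign_bit(float a, float b, float c)
{
    const bool s = std::signbit(a);
    return s == std::signbit(b) && s == std::signbit(c);
}
```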