How can I head off rounding errors in floating-point computation of inverse trig (and sqrt) functions in C?
I have a fairly complicated function that takes several double values representing two vectors in 3-space of the form (magnitude, latitude, longitude), where latitude and longitude are in radians, plus an angle. The purpose of the function is to rotate the first vector around the second by the angle specified and return the resultant vector. I have already verified that the code is logically correct and works.
The expected purpose of the function is for graphics, so double precision is not necessary; however, on the target platform, trig (and sqrt) functions that take floats (sinf, cosf, atan2f, asinf, acosf and sqrtf specifically) work faster on doubles than on floats (probably because the instruction to calculate such values may actually require a double; if a float is passed, the value must be cast to a double, which requires copying it to an area with more memory -- i.e. overhead). As a result, all of the variables involved in the function are double precision.
Here is the issue: I am trying to optimize my function so that it can be called more times per second. I have therefore replaced the calls to sin, cos, sqrt, et cetera with calls to the floating point versions of those functions, as they result in a 3-4 times speed increase overall. This works for almost all inputs; however, if the input vectors are close to parallel with the standard unit vectors (i, j, or k), round-off errors for the various functions build up enough to cause later calls to sqrtf or inverse trig functions (asinf, acosf, atan2f) to pass arguments that are just barely outside of the domain of those functions.
So, I am left with this dilemma: either I can only call double precision functions and avoid the problem (and end up with a limit of about 1,300,000 vector operations per second), or I can try to come up with something else. Ultimately, I would like a way to sanitize the input to the inverse trig functions to take care of edge cases (it is trivial to do so for sqrt: just use abs). Branching is not an option, as even a single conditional statement adds so much overhead that any performance gains are lost.
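For illustration, this is the sort of branch-free sanitizing I have in mind; the sqrt case is what I already do, and the asin clamp is a sketch only, since I have not yet verified that fminf/fmaxf compile without branches on my target:

```c
#include <math.h>

/* sqrt case: round-off can push a value that should be >= 0 slightly
 * negative; taking the absolute value fixes it without a branch. */
static inline float safe_sqrtf(float x)
{
    return sqrtf(fabsf(x));
}

/* Inverse-trig case (sketch): clamp the argument into [-1, 1] before the
 * call; fminf/fmaxf usually lower to min/max instructions, not branches. */
static inline float safe_asinf(float x)
{
    return asinf(fminf(1.0f, fmaxf(-1.0f, x)));
}
```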
So, any ideas?
Edit: someone expressed confusion over my using doubles versus floating point operations. The function is much faster if I actually store all my values in double-size containers (i.e. double-type variables) than if I store them in float-size containers. However, single-precision (float) trig operations are faster than double-precision trig operations for obvious reasons.
3 Answers
Basically, you need to find a numerically stable algorithm that solves your problem. There are no generic solutions to this kind of thing; it needs to be done for your specific case, using concepts such as the condition number of the individual steps. And it may in fact be impossible if the underlying problem is itself ill-conditioned.
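To see concretely what ill-conditioning looks like at a domain boundary, consider acosf near 1: because acos(1 - e) behaves like sqrt(2e), a one-ulp change in the input already moves the result by roughly 40% in relative terms. A minimal demonstration (standard C, link with the math library where required):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* One and two ulps below 1.0f: a one-ulp perturbation of the input
     * changes acosf's result by about 40% in relative terms, because the
     * derivative of acos blows up as the argument approaches 1. */
    float x1 = nextafterf(1.0f, 0.0f);
    float x2 = nextafterf(x1, 0.0f);
    printf("acosf(%.9g) = %.9g\n", x1, acosf(x1));
    printf("acosf(%.9g) = %.9g\n", x2, acosf(x2));
    return 0;
}
```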
Single precision floating point inherently introduces error. So, you need to build your math so that all comparisons have a certain degree of "slop" by using an epsilon factor, and you need to sanitize inputs to functions with limited domains.
The former is easy enough when branching, e.g. by testing against a small tolerance instead of exact equality.
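A minimal sketch of that kind of comparison; the EPSILON value is purely illustrative and would need to be tuned for your data:

```c
#include <math.h>

#define EPSILON 1e-6f  /* illustrative tolerance only */

/* "Sloppy" equality: treat x and y as equal if they are within EPSILON. */
static inline int nearly_equal(float x, float y)
{
    return fabsf(x - y) <= EPSILON;
}
```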
But that's messy. Clamping domain inputs is a little trickier, but better. The key is to use conditional move operators, which in general select between two values in a single op, without incurring a branch, along the lines of the sketch below.
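In C the shape is just a ternary; a minimal sketch (the name fsel is borrowed from the PowerPC opcode mentioned below, and the clamp helper is only an example of how you might use it on an asinf/acosf argument):

```c
/* Branch-free select: returns x if a >= 0, otherwise y.
 * This is the operation a conditional-move instruction implements in hardware. */
static inline float fsel(float a, float x, float y)
{
    return (a >= 0.0f) ? x : y;
}

/* Example use: clamp t into [-1, 1] before passing it to asinf/acosf. */
static inline float clamp_unit(float t)
{
    return fsel(t - 1.0f, 1.0f, fsel(-1.0f - t, -1.0f, t));
}
```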
These vary depending on architecture. On the x87 floating point unit you can do it with the FCMOV conditional-move op, but that is clumsy because it depends on condition flags being set previously, so it's slow. Also, there isn't a consistent compiler intrinsic for cmov. This is one of the reasons why we avoid x87 floating point in favor of SSE2 scalar math where possible.
Conditional move is much better supported in SSE by pairing a comparison operator with a bitwise AND. This is preferable even for scalar math:
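A sketch of that pattern using the SSE intrinsics from <xmmintrin.h> (the function name fsel_sse is mine; treat this as an illustration of the compare-and-mask idiom rather than a drop-in routine):

```c
#include <xmmintrin.h>

/* Branch-free select in SSE: returns x if a >= 0, otherwise y.
 * The comparison produces an all-ones or all-zeros mask, which is then
 * used to blend the two candidates with AND/ANDNOT/OR. */
static inline float fsel_sse(float a, float x, float y)
{
    __m128 va   = _mm_set_ss(a);
    __m128 vx   = _mm_set_ss(x);
    __m128 vy   = _mm_set_ss(y);
    __m128 mask = _mm_cmpge_ss(va, _mm_setzero_ps()); /* a >= 0 ? ~0 : 0 */
    __m128 r    = _mm_or_ps(_mm_and_ps(mask, vx),     /* keep x where mask set   */
                            _mm_andnot_ps(mask, vy)); /* keep y where mask clear */
    return _mm_cvtss_f32(r);
}
```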
Compilers are better, but not great, about emitting this sort of pattern for ternaries when SSE2 scalar math is enabled. You can do that with the compiler flag /arch:sse2 on MSVC or -mfpmath=sse on GCC. On the PowerPC and many other RISC architectures, fsel() is a hardware opcode and thus usually a compiler intrinsic as well.
Have you looked at the Graphics Programming Black Book or perhaps handing the calculations off to your GPU?