Is catastrophic cancellation an issue when calculating dot products of floating-point vectors? If so, how is it usually addressed?
I am writing a physics simulator in C++ and am concerned about robustness. I've read that catastrophic cancellation can occur in floating point arithmetic when the difference of two numbers of almost equal magnitude is calculated.
It occurred to me that this may happen in the simulator when the dot product of two almost orthogonal vectors is calculated.
However, the references I have looked at only discuss solving the problem by rewriting the equation concerned (e.g. the quadratic formula can be rewritten to avoid the cancellation), but that approach doesn't seem to apply when calculating a dot product?
I guess I'd be interested to know if this is typically an issue in physics engines and how it is addressed.
One common trick is to make the accumulator variable a type with higher precision than the vectors themselves.
Alternatively, one can use Kahan summation when summing the terms.
Another approach is to use one of the various blocked dot-product algorithms instead of the canonical algorithm.
One can of course combine both of the above approaches.
Note that the above addresses the general error behavior of dot products, not catastrophic cancellation specifically.
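A minimal sketch of the first two techniques, assuming float input vectors (function names are my own):

```cpp
#include <vector>
#include <cstddef>

// Technique 1: higher-precision accumulator.
// Each float*float product is exact in double (24+24 <= 53 significand bits),
// so only the running sum can round.
double dot_double_acc(const std::vector<float>& a, const std::vector<float>& b) {
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += (double)a[i] * (double)b[i];
    return acc;
}

// Technique 2: Kahan (compensated) summation of the terms,
// staying entirely in single precision.
float dot_kahan(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    float c = 0.0f;               // running compensation for lost low-order bits
    for (std::size_t i = 0; i < a.size(); ++i) {
        float term = a[i] * b[i];
        float y = term - c;
        float t = sum + y;
        c = (t - sum) - y;        // algebraically zero; captures the rounding error
        sum = t;
    }
    return sum;
}
```

One caveat: Kahan summation relies on strict IEEE-754 semantics, so compiling with `-ffast-math` (which licenses reassociation) can optimize the compensation away.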
You say in a comment that you have to calculate x1*x2 + y1*y2, where all variables are floats. So if you do the calculation in double precision, you lose no accuracy at all, because double precision has more than twice as many bits of precision as float (assuming your target uses IEEE-754).

Specifically: let xx, yy be the real numbers represented by the float variables x, y. Let xxyy be their product, and let xy be the result of the double-precision multiplication x * y. Then in all cases, xxyy is the real number represented by xy.