Floating-point addition: loss-of-precision issues

Posted 2024-07-30 03:43:03

In short: how can I execute a+b such that any loss-of-precision due to truncation is away from zero rather than toward zero?

The Long Story

I'm computing the sum of a long series of floating-point values in order to compute the sample mean and variance of the set. Since Var(X) = E(X²) - E(X)², it suffices to maintain a running count of all numbers, the sum of all numbers so far, and the sum of the squares of all numbers so far.

So far so good.

However, it's absolutely required that E(X²) ≥ E(X)² (a variance is never negative), which due to floating-point rounding isn't always the case. In pseudo-code, the problem is this:

int count;
double sum, sumOfSquares;
...
double value = <current-value>;
double sqrVal = value*value; 

count++;
sum += value; //slightly rounded down since value is truncated to fit into sum
sumOfSquares += sqrVal; //rounded down MORE since the order-of-magnitude 
//difference between sqrVal and sumOfSquares is twice that between value and sum;

For sequences with plenty of variation, this isn't a big issue: you end up slightly under-estimating the variance, which is usually tolerable. However, for constant or almost-constant sets with a non-zero mean, it can mean that E(X²) < E(X)², resulting in a negative computed variance, which violates the expectations of consuming code.
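The failure described above can be reproduced with a short sketch. The function names and the four-element data set are hypothetical, chosen only for illustration (a small spread around a large mean); the exact reference value is computed with Python's `fractions.Fraction`, and the negative result assumes IEEE 754 double precision.

```python
from fractions import Fraction


def naive_variance(data):
    """Population variance via E(X^2) - E(X)^2, using running sums."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for value in data:
        count += 1
        total += value
        total_sq += value * value
    mean = total / count
    return total_sq / count - mean * mean


def exact_variance(data):
    """The same formula in exact rational arithmetic, for reference."""
    xs = [Fraction(x) for x in data]
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean


# Hypothetical data: tiny spread around a large mean.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(naive_variance(data))         # negative with IEEE 754 doubles
print(float(exact_variance(data)))  # 22.5
```

The squares are around 1e18, where adjacent doubles are 128 apart, so the rounding error in each term is larger than the true variance.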

Now, I know about Kahan summation, but it isn't an attractive solution. Firstly, it makes the code susceptible to optimization vagaries (depending on optimization flags, the code may or may not exhibit this problem). Secondly, the problem isn't really the precision, which is good enough; it's that addition introduces a systematic error towards zero. If I could execute the line

sumOfSquares += sqrVal;

in such a way as to ensure that sqrVal is rounded up, not down, into the precision of sumOfSquares, I'd have a numerically reasonable solution. But how can I achieve that?

Edit: Finished question. Why does pressing Enter in the drop-down list in the tag field submit the question anyhow?

沧桑㈠ 2024-08-06 03:43:03

There's another single-pass algorithm which rearranges the calculation a bit. In
pseudocode:

n = 0
mean = 0
M2 = 0

for x in data:
    n = n + 1
    delta = x - mean
    mean = mean + delta/n
    M2 = M2 + delta*(x - mean)  # This expression uses the new value of mean

variance_n = M2/n         # Population variance (divides by n)
variance = M2/(n - 1)     # Sample variance (unbiased estimate, divides by n - 1)

(Source: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance )

This seems better behaved with respect to the issues you pointed out
with the usual algorithm.
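As a runnable version of the pseudocode above, here is a minimal Python sketch. The function name `welford_variance`, the `ValueError` for short input, and the sample data are assumptions made for illustration, not part of the original answer.

```python
def welford_variance(data):
    """Single-pass mean/variance update (often attributed to Welford).

    Returns (population_variance, sample_variance).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean
    if n < 2:
        raise ValueError("need at least two values")
    return m2 / n, m2 / (n - 1)


# Hypothetical data: tiny spread around a large mean.
population, sample = welford_variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16])
print(population, sample)  # 22.5 30.0
```

Because the update works with differences from the running mean rather than with raw squares, the quantities being added stay small, which is why this form tends to avoid the cancellation seen with the sum-of-squares formula.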


深白境迁sunset 2024-08-06 03:43:03

IEEE 754 provides four rounding modes (toward -inf, toward +inf, toward 0, to nearest). Toward +inf is what you seem to want. There is no standard control in C90 or in C++ before C++11. C99 added the header <fenv.h>, which is also present as an extension in some C90 and C++ implementations (C++11 later standardized it as <cfenv>). To respect the C99 standard, you'd have to write something like:

#include <assert.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

int old_round_mode = fegetround();          /* remember the current mode */
int set_round_ok = fesetround(FE_UPWARD);   /* round toward +inf */
assert(set_round_ok == 0);
...
set_round_ok = fesetround(old_round_mode);  /* restore the previous mode */
assert(set_round_ok == 0);
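Where changing the rounding mode is not possible, a single upward-rounded addition can be emulated under the default round-to-nearest mode. The sketch below is an illustration of that idea, not something from the answer: it uses Knuth's TwoSum to recover the exact rounding error of `a + b` and steps to the next float when the true sum was rounded down. The name `add_round_up` is hypothetical, the inputs are assumed finite with no overflow, and `math.nextafter` requires Python 3.9 or later.

```python
import math


def add_round_up(a, b):
    """Return a + b rounded toward +inf, emulated under round-to-nearest.

    Assumes finite inputs and no overflow in a + b.
    """
    s = a + b
    # Knuth's TwoSum: err is the exact rounding error, so a + b == s + err.
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    if err > 0.0:
        # The true sum lies above s, so move up to the next double.
        s = math.nextafter(s, math.inf)
    return s


# 1e16 + 1 is not representable; round-to-nearest gives 1e16.
print(1e16 + 1.0 == 1e16)                  # True
print(add_round_up(1e16, 1.0) == 1e16 + 2)  # True
```

In a compiled language the same error-recovery arithmetic is exposed to the optimizer concerns raised in the question, so this is best read as a way to check the idea rather than a drop-in replacement for `fesetround`.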

It is well known that the algorithm you use is numerically unstable and has precision problems. For better precision, make two passes over the data: one to compute the mean, and one to sum the squared deviations from it.
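A two-pass computation might look like the following sketch. The function name `two_pass_variance` and the sample data are hypothetical, and the use of `math.fsum` for the sums is an extra choice made here, not something the answer prescribes.

```python
import math


def two_pass_variance(data):
    """Two-pass variance: first the mean, then squared deviations.

    Returns (population_variance, sample_variance).
    """
    values = list(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # Pass 1: the mean.
    mean = math.fsum(values) / n
    # Pass 2: sum of squared deviations; every term is non-negative.
    sum_sq_dev = math.fsum((x - mean) * (x - mean) for x in values)
    return sum_sq_dev / n, sum_sq_dev / (n - 1)


# Hypothetical data: tiny spread around a large mean.
population, sample = two_pass_variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16])
print(population, sample)  # 22.5 30.0
```

Since the second pass only adds squares, the result cannot come out negative, at the cost of keeping the data around for a second traversal.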
