计算平均值的最佳数值方法是什么

发布于 2024-12-06 11:54:46 字数 157 浏览 2 评论 0原文

计算平均值的最佳方法是什么?带着这个问题,我想知道哪种计算平均值的算法在数值意义上是最好的。它应该具有最小的舍入误差,不应该对上溢或下溢等敏感。

谢谢。


附加信息:首选增量方法,因为值的数量可能无法装入 RAM(对大于 4 GB 的文件进行多次并行计算)。

what's the best way to calculate the average? With this question I want to know which algorithm for calculating the average is the best in a numerical sense. It should have the least rounding errors, should not be sensitive to over- or underflows and so on.

Thank you.


Additional information: incremental approaches preferred since the number of values may not fit into RAM (several parallel calculations on files larger than 4 GB).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

花间憩 2024-12-13 11:54:46

如果您想要 O(N) 算法,请查看 Kahan 求和

If you want an O(N) algorithm, look at Kahan summation.

玉环 2024-12-13 11:54:46

您可以查看 http://citeseer.ist.psu.edu/viewdoc/摘要?doi=10.1.1.43.3535(Nick Higham,“浮点求和的准确性”,SIAM 科学计算杂志, 1993)。

如果我没记错的话,如果所有数字都是正数,补偿求和(卡汉求和)就很好,至少和对它们进行排序并按升序相加一样好(除非有非常非常多的数字)。如果有些数字是正数,有些是负数,那么情况就会复杂得多,这样你就会被取消。在这种情况下,有一种观点认为按降序添加它们。

You can have a look at http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.3535 (Nick Higham, "The accuracy of floating point summation", SIAM Journal of Scientific Computation, 1993).

If I remember it correctly, compensated summation (Kahan summation) is good if all numbers are positive, as least as good as sorting them and adding them in ascending order (unless there are very very many numbers). The story is much more complicated if some numbers are positive and some are negative, so that you get cancellation. In that case, there is an argument for adding them in descending order.

凉栀 2024-12-13 11:54:46

按数量级升序对数字进行排序。将它们相加,首先是低震级。除以计数。

Sort the numbers in ascending order of magnitude. Sum them, low magnitude first. Divide by the count.

难以启齿的温柔 2024-12-13 11:54:46

我总是使用以下伪代码:

float mean=0.0; // could use doulbe
int n=0;  // could use long

for each x in data:
    ++n;
    mean+=(x-mean)/n;

我没有其稳定性的正式证明,但您可以看到,假设数据值表现良好,我们不会遇到数值溢出问题。 Knuth 的计算机编程艺术中提到了它

I always use the following pseudocode:

float mean=0.0; // could use doulbe
int n=0;  // could use long

for each x in data:
    ++n;
    mean+=(x-mean)/n;

I don't have formal proofs of its stability but you can see that we won't have problems with numerical overflow, assuming that the data values are well behaved. It's referred to in Knuth's The Art of Computer Programming

泛泛之交 2024-12-13 11:54:46

只是为了进一步讨论添加一个可能的答案:

增量计算每一步的平均值:

AVG_n = AVG_(n-1) * (n-1)/n + VALUE_n / n

或两两组合

AVG_(n_a + n_b) = (n_a * AVG_a + n_b * AVG_b) / (n_a + n_b)

(我希望公式足够清楚)

Just to add one possible answer for further discussion:

Incrementally calculate the average for each step:

AVG_n = AVG_(n-1) * (n-1)/n + VALUE_n / n

or pairwise combination

AVG_(n_a + n_b) = (n_a * AVG_a + n_b * AVG_b) / (n_a + n_b)

(I hope the formulas are clear enough)

倾城°AllureLove 2024-12-13 11:54:46

一篇很晚的帖子,但由于我没有足够的声誉来发表评论,@Dave 的方法是 Gnu 科学图书馆

以下是从mean_source.c 中提取的代码:

double FUNCTION (gsl_stats, mean) (const BASE data[], const size_t stride, const size_t size)
{
/* Compute the arithmetic mean of a dataset using the recurrence relation mean_(n) = mean(n-1) + (data[n] - mean(n-1))/(n+1)   */

long double mean = 0;
size_t i;

for (i = 0; i < size; i++)
{
  mean += (data[i * stride] - mean) / (i + 1);
}

return mean;
}

GSL 使用相同的算法来计算方差,毕竟,方差只是给定数字的平方差的平均值。

A very late post, but since I don't have enough reputation to comment, @Dave's method is the one used (as at December 2020) by the Gnu Scientific Library.

Here is the code, extracted from mean_source.c:

double FUNCTION (gsl_stats, mean) (const BASE data[], const size_t stride, const size_t size)
{
/* Compute the arithmetic mean of a dataset using the recurrence relation mean_(n) = mean(n-1) + (data[n] - mean(n-1))/(n+1)   */

long double mean = 0;
size_t i;

for (i = 0; i < size; i++)
{
  mean += (data[i * stride] - mean) / (i + 1);
}

return mean;
}

GSL uses the same algorithm to calculate the variance, which is, after all, just a mean of squared differences from a given number.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文