当前位置：文江博客话题详情

计算平均值的最佳数值方法是什么

发布于 2024-12-06 11:54:46 字数 157 浏览 2 评论 0原文

计算平均值的最佳方法是什么？带着这个问题，我想知道哪种计算平均值的算法在数值意义上是最好的。它应该具有最小的舍入误差，不应该对上溢或下溢等敏感。

谢谢。

附加信息：首选增量方法，因为值的数量可能无法装入 RAM（对大于 4 GB 的文件进行多次并行计算）。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

花间憩 2024-12-13 11:54:46

如果您想要 O(N) 算法，请查看 Kahan 求和。

回复收藏 0 原文

玉环 2024-12-13 11:54:46

您可以查看 http://citeseer.ist.psu.edu/viewdoc/摘要？doi=10.1.1.43.3535（Nick Higham，“浮点求和的准确性”，SIAM 科学计算杂志， 1993）。

如果我没记错的话，如果所有数字都是正数，补偿求和（卡汉求和）就很好，至少和对它们进行排序并按升序相加一样好（除非有非常非常多的数字）。如果有些数字是正数，有些是负数，那么情况就会复杂得多，这样你就会被取消。在这种情况下，有一种观点认为按降序添加它们。

回复收藏 0 原文

凉栀 2024-12-13 11:54:46

按数量级升序对数字进行排序。将它们相加，首先是低震级。除以计数。

回复收藏 0 原文

难以启齿的温柔 2024-12-13 11:54:46

我总是使用以下伪代码：

float mean=0.0; // could use doulbe
int n=0;  // could use long

for each x in data:
    ++n;
    mean+=(x-mean)/n;

我没有其稳定性的正式证明，但您可以看到，假设数据值表现良好，我们不会遇到数值溢出问题。 Knuth 的计算机编程艺术中提到了它

I always use the following pseudocode:

float mean=0.0; // could use doulbe
int n=0;  // could use long

for each x in data:
    ++n;
    mean+=(x-mean)/n;

I don't have formal proofs of its stability but you can see that we won't have problems with numerical overflow, assuming that the data values are well behaved. It's referred to in Knuth's The Art of Computer Programming

回复收藏 0 原文

泛泛之交 2024-12-13 11:54:46

只是为了进一步讨论添加一个可能的答案：

增量计算每一步的平均值：

AVG_n = AVG_(n-1) * (n-1)/n + VALUE_n / n

或两两组合

AVG_(n_a + n_b) = (n_a * AVG_a + n_b * AVG_b) / (n_a + n_b)

（我希望公式足够清楚）

回复收藏 0 原文

倾城°AllureLove 2024-12-13 11:54:46

一篇很晚的帖子，但由于我没有足够的声誉来发表评论，@Dave 的方法是 Gnu 科学图书馆。

以下是从mean_source.c 中提取的代码：

double FUNCTION (gsl_stats, mean) (const BASE data[], const size_t stride, const size_t size)
{
/* Compute the arithmetic mean of a dataset using the recurrence relation mean_(n) = mean(n-1) + (data[n] - mean(n-1))/(n+1)   */

long double mean = 0;
size_t i;

for (i = 0; i < size; i++)
{
  mean += (data[i * stride] - mean) / (i + 1);
}

return mean;
}

GSL 使用相同的算法来计算方差，毕竟，方差只是给定数字的平方差的平均值。

A very late post, but since I don't have enough reputation to comment, @Dave's method is the one used (as at December 2020) by the Gnu Scientific Library.

Here is the code, extracted from mean_source.c:

double FUNCTION (gsl_stats, mean) (const BASE data[], const size_t stride, const size_t size)
{
/* Compute the arithmetic mean of a dataset using the recurrence relation mean_(n) = mean(n-1) + (data[n] - mean(n-1))/(n+1)   */

long double mean = 0;
size_t i;

for (i = 0; i < size; i++)
{
  mean += (data[i * stride] - mean) / (i + 1);
}

return mean;
}

GSL uses the same algorithm to calculate the variance, which is, after all, just a mean of squared differences from a given number.

回复收藏 0 原文

~没有更多了~