计算平均值的最佳数值方法是什么
计算平均值的最佳方法是什么?带着这个问题,我想知道哪种计算平均值的算法在数值意义上是最好的。它应该具有最小的舍入误差,不应该对上溢或下溢等敏感。
谢谢。
附加信息:首选增量方法,因为值的数量可能无法装入 RAM(对大于 4 GB 的文件进行多次并行计算)。
what's the best way to calculate the average? With this question I want to know which algorithm for calculating the average is the best in a numerical sense. It should have the least rounding errors, should not be sensitive to over- or underflows and so on.
Thank you.
Additional information: incremental approaches preferred since the number of values may not fit into RAM (several parallel calculations on files larger than 4 GB).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
如果您想要 O(N) 算法,请查看 Kahan 求和。
If you want an O(N) algorithm, look at Kahan summation.
您可以查看 http://citeseer.ist.psu.edu/viewdoc/摘要?doi=10.1.1.43.3535(Nick Higham,“浮点求和的准确性”,SIAM 科学计算杂志, 1993)。
如果我没记错的话,如果所有数字都是正数,补偿求和(卡汉求和)就很好,至少和对它们进行排序并按升序相加一样好(除非有非常非常多的数字)。如果有些数字是正数,有些是负数,那么情况就会复杂得多,这样你就会被取消。在这种情况下,有一种观点认为按降序添加它们。
You can have a look at http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.3535 (Nick Higham, "The accuracy of floating point summation", SIAM Journal of Scientific Computation, 1993).
If I remember it correctly, compensated summation (Kahan summation) is good if all numbers are positive, as least as good as sorting them and adding them in ascending order (unless there are very very many numbers). The story is much more complicated if some numbers are positive and some are negative, so that you get cancellation. In that case, there is an argument for adding them in descending order.
按数量级升序对数字进行排序。将它们相加,首先是低震级。除以计数。
Sort the numbers in ascending order of magnitude. Sum them, low magnitude first. Divide by the count.
我总是使用以下伪代码:
我没有其稳定性的正式证明,但您可以看到,假设数据值表现良好,我们不会遇到数值溢出问题。 Knuth 的计算机编程艺术中提到了它
I always use the following pseudocode:
I don't have formal proofs of its stability but you can see that we won't have problems with numerical overflow, assuming that the data values are well behaved. It's referred to in Knuth's The Art of Computer Programming
只是为了进一步讨论添加一个可能的答案:
增量计算每一步的平均值:
或两两组合
(我希望公式足够清楚)
Just to add one possible answer for further discussion:
Incrementally calculate the average for each step:
or pairwise combination
(I hope the formulas are clear enough)
一篇很晚的帖子,但由于我没有足够的声誉来发表评论,@Dave 的方法是 Gnu 科学图书馆。
以下是从mean_source.c 中提取的代码:
GSL 使用相同的算法来计算方差,毕竟,方差只是给定数字的平方差的平均值。
A very late post, but since I don't have enough reputation to comment, @Dave's method is the one used (as at December 2020) by the Gnu Scientific Library.
Here is the code, extracted from mean_source.c:
GSL uses the same algorithm to calculate the variance, which is, after all, just a mean of squared differences from a given number.