Calculating the actual average value
I've got a relatively small set (~100 values) of integers: each one represents how long (in milliseconds) a test I ran lasted.
The trivial algorithm to calculate the average is to sum all n values and divide the result by n, but this doesn't take into account that some ridiculously high or low values must be wrong and should be discarded.
What algorithms are available to estimate the actual average value?
Comments (4)
As you said, you can discard all values that diverge from the average by more than a given amount and then recompute the average. Another value that can be interesting is the median, which is the middle value of the sorted data (the most frequent value is the mode, not the median).
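The discard-and-recompute idea can be sketched roughly as follows. The function name `mean_without_outliers`, the `max_deviation` parameter, and the sample durations are made up for illustration; choosing a sensible threshold is left to you.

```python
from statistics import mean

def mean_without_outliers(values, max_deviation):
    """Mean of the values lying within max_deviation of the plain mean."""
    center = mean(values)
    kept = [v for v in values if abs(v - center) <= max_deviation]
    # Fall back to the plain mean if the threshold rejects every value.
    return mean(kept) if kept else center

# Hypothetical test durations in milliseconds, with one obviously bad run.
durations_ms = [102, 98, 101, 99, 100, 5000]
print("plain mean:   ", mean(durations_ms))
print("filtered mean:", mean_without_outliers(durations_ms, max_deviation=1000))
```

Note that a single extreme value also drags the plain mean toward itself, so a tight threshold can end up rejecting good values. Measuring the deviation from the median instead of the mean is one way around that.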
It depends on the conditions of your test, and it is a problem from probability theory.
One of the simplest approaches is to calculate the median, which is not thrown off by ridiculously high or low values. See the link below:
Wiki about median
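As a quick illustration of why the median copes with extreme values, here is a small sketch using Python's standard `statistics` module. The sample durations are invented for the example.

```python
from statistics import mean, median

# Hypothetical test durations in milliseconds, with one obviously bad run.
durations_ms = [102, 98, 101, 99, 100, 5000]

average = mean(durations_ms)    # pulled far upward by the 5000 ms run
typical = median(durations_ms)  # stays close to the bulk of the values

print("mean:  ", average)
print("median:", typical)
```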
As you noted, the arithmetic mean isn't good if there are very high or low values.
You could compute the median, as someone suggested: in a sorted list of your values, it is the "middle" value (if your set contains an odd number of items) or the arithmetic mean of the two "middle" values (otherwise).
Another method would be to drop, say, the lowest and highest 5 percent of the values and compute the arithmetic mean of the rest.
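The last method is usually called a trimmed (or truncated) mean. A minimal sketch, assuming the cut is made by count after sorting; the name `trimmed_mean` and the `fraction` parameter are chosen here for illustration:

```python
from statistics import mean

def trimmed_mean(values, fraction=0.05):
    """Mean after dropping `fraction` of the values from each end."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of values dropped per side
    return mean(ordered[k:len(ordered) - k])

# Hypothetical data: 95 plausible durations plus 5 wildly wrong ones.
durations_ms = list(range(1, 96)) + [10000] * 5
print("plain mean:  ", mean(durations_ms))
print("trimmed mean:", trimmed_mean(durations_ms))
```

With about 100 values and `fraction=0.05`, this drops the five smallest and five largest values before averaging. For very small inputs `k` rounds down to 0 and the result is just the plain mean.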
Some options:
Wikipedia lists some ways to compute different "mean" values