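How do I calculate the median of a Map?
Given a map in which each key is a number from a sequence and each value is the number of times that number occurs in the sequence, how would an algorithm that calculates the median be implemented in Java?
For example:
1,1,2,2,2,2,3,3,3,4,5,6,6,6,7,7
As a map: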
Map<Integer, Integer> map = ...
map.put(1, 2);
map.put(2, 4);
map.put(3, 3);
map.put(4, 1);
map.put(5, 1);
map.put(6, 3);
map.put(7, 2);
double median = calculateMedian(map);
print(median);
which would result in:
> print(median);
3
>
So what I am looking for is a Java implementation of calculateMedian.
Use a SortedMap, i.e. a TreeMap.
For an easy but maybe not-so-efficient algorithm, I'd do it like this:
1. Expand the map to a list.
In practice: iterate through the map and add each key 'value' times to a new list. Finally, sort the list.
2. Calculate the median.
Now you have to implement a method
int calculateMedian(List<Integer> sorted)
How it works depends on the kind of median you need. If it's just the sample median, the result is either the middle value (for lists with an odd number of elements) or the average of the two middle values (for lists with an even number of elements). Note that the list needs to be sorted! (Ref: Sample Median / Wikipedia)
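The two steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name ExpandedMedian and the helper expand are made up for this example, and calculateMedian returns a double here (rather than the int in the signature above) so that the average of two middle values is not truncated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named only for this sketch.
class ExpandedMedian {

    // Step 1: expand the value -> frequency map into a sorted list.
    static List<Integer> expand(Map<Integer, Integer> frequencies) {
        List<Integer> values = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : frequencies.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                values.add(e.getKey());
            }
        }
        Collections.sort(values);
        return values;
    }

    // Step 2: sample median of a sorted, non-empty list.
    static double calculateMedian(List<Integer> sorted) {
        if (sorted.isEmpty()) {
            throw new IllegalArgumentException("median of an empty list is undefined");
        }
        int n = sorted.size();
        int mid = n / 2;
        if (n % 2 == 1) {
            return sorted.get(mid);
        }
        // Even length: average the two middle values (long avoids int overflow).
        return ((long) sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }
}
```

The expanded list needs memory proportional to the total number of values, so this is best suited to small counts.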
OK, OK, even though Chris didn't mention efficiency, here's an idea for how to calculate the sample median (!) without expanding the map...
(I have no compiler at hand, so if it has too many syntax errors, please treat it as pseudocode ;) )
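One way such a map-only computation might look is sketched below. It is not the answerer's original code; the class name FrequencyMedian is invented for this example, and it assumes every count is non-negative. The map is copied into a TreeMap so that the keys are visited in ascending order, then the counts are summed until the middle rank or ranks are reached.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name, used only for this sketch.
class FrequencyMedian {

    static double calculateMedian(Map<Integer, Integer> frequencies) {
        // Sort by key; assumes all counts are >= 0.
        TreeMap<Integer, Integer> sorted = new TreeMap<>(frequencies);

        long total = 0;
        for (int count : sorted.values()) {
            total += count;
        }
        if (total == 0) {
            throw new IllegalArgumentException("median of an empty sequence is undefined");
        }

        // 1-based ranks of the middle element(s); they coincide when total is odd.
        long lowRank = (total + 1) / 2;
        long highRank = total / 2 + 1;

        long seen = 0;       // running sum of the counts visited so far
        long low = 0;        // value at lowRank once found
        boolean lowFound = false;
        for (Map.Entry<Integer, Integer> e : sorted.entrySet()) {
            seen += e.getValue();
            if (!lowFound && seen >= lowRank) {
                low = e.getKey();
                lowFound = true;
            }
            if (seen >= highRank) {
                return (low + e.getKey()) / 2.0;
            }
        }
        throw new IllegalStateException("not reachable when counts are non-negative");
    }
}
```

For the example map the total is 16, so the ranks are 8 and 9; both fall on the key 3, which gives a median of 3.0.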
Using Guava:
Now the answer to your question is:
Really. That's it. (Or, to be precise about it, check whether the size is even and average the two central values.)
If the counts are particularly large, it would be faster to use the multiset's entrySet and keep a running sum, but the simplest way is usually fine.
Linear time
If you know the total count n of the numbers (in your case it is 16) and the map is iterated in key order, you can start from the beginning or the end of the map and sum up the counts until you reach the middle. If n is odd, the median is the ((n+1)/2)th element; if n is even, it is the average of the (n/2)th and (n/2+1)th elements.
If you don't know the total count, you will have to go through all of the entries at least once.
Sublinear time
If you can decide on the data structure and can do pre-processing, see Wikipedia on selection algorithms; you might even get a sublinear algorithm.
You can also get sublinear time if you know something about the distribution of the data.
EDIT:
So, under the assumption that we have a sequence with counts, what we can do is this: alongside the
key -> count
pairs, maintain another map,
key -> running_total
This will double the memory usage, but will give O(log n) performance for the median and O(1) for total_count, where n is the number of distinct keys.
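As a rough sketch of the running-total idea, the example below precomputes the totals once and then finds the median with a binary search. It is an illustration under assumptions, not code from the answer: the class name RunningTotalMedian is invented, the totals are kept in two parallel arrays rather than in a second map, counts are assumed to be non-negative, and the structure is a read-only snapshot (changing a count would mean recomputing the totals that follow it).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name, used only for this sketch.
class RunningTotalMedian {
    private final int[] keys;           // distinct values in ascending order
    private final long[] runningTotal;  // runningTotal[i] = number of values <= keys[i]

    RunningTotalMedian(Map<Integer, Integer> frequencies) {
        TreeMap<Integer, Integer> sorted = new TreeMap<>(frequencies);
        keys = new int[sorted.size()];
        runningTotal = new long[sorted.size()];
        long sum = 0;
        int i = 0;
        for (Map.Entry<Integer, Integer> e : sorted.entrySet()) {
            sum += e.getValue();
            keys[i] = e.getKey();
            runningTotal[i] = sum;
            i++;
        }
    }

    // O(1): the last running total is the total count.
    long totalCount() {
        return keys.length == 0 ? 0 : runningTotal[keys.length - 1];
    }

    // O(log n): binary search for the first index whose running total reaches the 1-based rank.
    private int valueAtRank(long rank) {
        int lo = 0;
        int hi = keys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (runningTotal[mid] >= rank) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return keys[lo];
    }

    double median() {
        long n = totalCount();
        if (n == 0) {
            throw new IllegalStateException("median of an empty sequence is undefined");
        }
        long low = valueAtRank((n + 1) / 2);
        long high = valueAtRank(n / 2 + 1);
        return (low + high) / 2.0;
    }
}
```

The precomputation is still linear in the number of distinct keys; only the repeated median and total_count queries afterwards benefit from the extra memory.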