根据出现频率排列列表中的元素(包含重复元素)
根据列表中出现的频率来排列列表元素(具有重复元素)的好方法是什么?
我需要使用列表中最常出现的 5 个项目。
我正在考虑使用 HashMap 来计算元素的频率,方法是每次元素出现时增加相应的计数器。然后进行 HashMap 迭代 5 次以找到最高频率。每次迭代的元素。
What would be a good way to arrange the elements of a list(with repeating elements) according to the frequency of their occurrence in the list.
I need to use the top 5 frequently occurring items in the list.
I am thinking of using HashMap to count the elements' frequencies by incrementing the corresponding counter each time an element occurs & then doing HashMap iteration 5 times to find the highest freq. element over each iteration.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您可以使用 番石榴
Multiset
和 按频率排序关于性能。当然,这取决于您有多少个不同的值,但是这个测试代码在我的机器上花费了大约一秒钟。我想说这对于 1000 万个项目来说是足够合理的:
You can use a Guava
Multiset
and order it by frequencyAnd about performance. Of course it depends on how many distinct values you have, but this test code took about a second on my machine. And I'd say that's reasonable enough for 10 M items:
任何基于比较的排序都会导致 O(N log N) 或更糟糕的时间复杂度,因此(渐近地)这些都不是好建议。
您的方法具有
O(N)
时间复杂度,这是您可以获得的最佳方法。您可以尝试降低常量(当前您对列表的元素进行大约6*N
次访问)。我会像这样进行两次迭代:首先使用 HashMap 计算频率。接下来,迭代映射中的条目并保留迄今为止看到的 5 个最常见值的有序 5 元素数组。对于每个新元素,检查该值是否比迄今为止第五个最常见的值更常见,并在必要时更新“前 5 个”。
更新一个更简单的解决方案,
具有相同的时间复杂度。首先,使用HashMap
计算频率。接下来,将所有条目放入PriorityQueue< /code>
并弹出五个值。这些条目应该是值-频率对,可按频率进行比较(如 @Jigar 的解决方案中所示)。这样的顺序不会“与 equals 一致”(请参阅 类似的解释),但没关系。
Any comparison-based sorting will incur
O(N log N)
or worse time complexity, so (asymptotically) these are not good advices.Your approach has
O(N)
time complexity, and that's the best you can get. You can try to lower the constant (currently you're making roughly6*N
accesses to elements of the list).I would do it in two iterations like this: first count the frequencies using a HashMap. Next, iterate over the entries in the map and keep an ordered 5-element array of the 5 most frequent values seen so far. With each new element, check whether the value is more common than the 5th most common so far, and update the "Top 5" if necessary.
UPDATE A simpler solution,
of the same time complexity. First, calculate frequencies usingHashMap
. Next, put all the entries into aPriorityQueue
and pop five values. The entries should be value-frequency pairs, comparable by frequency (as in @Jigar's solution). Such an ordering will not be "consistent with equals" (see Comparable for explanation), but that's OK.我还会使用 HashMap。我找到了一些代码,我就是这么做的:
列出元素:
I would also use a HashMap. I found some code where I did just that:
Listing the elements:
这种方法怎么样?
维护一个包含计数的映射,
只需使用
list
中的更新来更新映射。使用
addFooToList()
、removeFooFromList()
强制包装对 List 的访问,并封装其中的地图更新逻辑。How about this approach ?
maintain a map in that holds count
just update map with update in
list
.Wrap the access to List forcefully with the
addFooToList()
,removeFooFromList()
and encapsulate the map updation logic there.