I need a map in Java that is automatically sorted by value, so that it stays sorted at all times while I add new key-value pairs, update the value of an existing pair, or even delete entries.
Please also keep in mind that this map is going to be really big (hundreds of thousands, or even tens of millions, of entries).
So basically I'm looking for the following functionality:
Suppose we had a class 'SortedByValuesMap' that implements the aforementioned functionality
and we have the following code:
SortedByValuesMap<String, Long> sorted_map = new SortedByValuesMap<String, Long>();
sorted_map.put("apples", 4L);
sorted_map.put("oranges", 2L);
sorted_map.put("bananas", 1L);
sorted_map.put("lemons", 3L);
sorted_map.put("bananas", 6L);
for (String key : sorted_map.keySet()) {
System.out.println(key + ":" + sorted_map.get(key));
}
the output should be:
bananas:6
apples:4
lemons:3
oranges:2
In particular, what's really important for me is to be able to get the entry with the lowest value at any time, using a command like:
smallestItem = sorted_map.lastEntry();
which should give me the 'oranges' entry.
EDIT: I am a Java newbie so please elaborate a bit in your answers - thanks
EDIT2: This might help: I am using this for counting words (for those who are familiar: n-grams in particular) in huge text files. So I need to build a map where keys are words and values are the frequencies of those words. However, due to limitations (like RAM), I want to keep only the X most frequent words, but of course you can't know beforehand which words are going to be the most frequent. So the way I thought it might work (as an approximation) is to start counting words, and when the map reaches an upper limit (like 1 million entries), the least frequent entry is deleted so that the map's size always stays at 1 million.
Comments (8)
Keep 2 data structures:
A map from words to counts; a HashMap<String, Long> will do.
An "array" to keep track of order, such that list[count] holds a Set<String> of words with that count.
I'm writing this as though it were an array as a notational convenience. In fact, you probably don't know an upper bound on the number of occurrences, so you need a resizable data structure. Implement it using a Map<Long, Set<String>>. Or, if that uses too much memory, use an ArrayList<Set<String>> (you'll have to test for count == size() - 1 and, if so, use add() instead of set(count + 1)).
To increment the number of occurrences for a word (pseudocode):
To iterate over words in order (pseudocode):
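A minimal Java sketch of both operations, assuming a TreeMap for the count-to-words index so that counts can be walked in order. The class and method names (WordCounter, increment, wordsDescending) are illustrative and not from the original answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch: word counts plus a count -> words index that is kept in sync. */
class WordCounter {
    private final Map<String, Long> counts = new HashMap<>();
    // Assumption: a TreeMap stands in for the "array", so buckets stay sorted by count.
    private final TreeMap<Long, Set<String>> wordsByCount = new TreeMap<>();

    /** Increments a word's count and moves it from its old bucket to the next one. */
    void increment(String word) {
        long old = counts.getOrDefault(word, 0L);
        if (old > 0) {
            Set<String> bucket = wordsByCount.get(old);
            bucket.remove(word);
            if (bucket.isEmpty()) {
                wordsByCount.remove(old); // drop empty buckets to save memory
            }
        }
        counts.put(word, old + 1);
        wordsByCount.computeIfAbsent(old + 1, k -> new HashSet<>()).add(word);
    }

    /** Returns all words from most to least frequent; ties come in no particular order. */
    List<String> wordsDescending() {
        List<String> result = new ArrayList<>();
        for (Set<String> bucket : wordsByCount.descendingMap().values()) {
            result.addAll(bucket);
        }
        return result;
    }

    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

Each increment touches one HashMap entry and at most two buckets, so the cost is dominated by the O(log n) TreeMap lookups over the number of distinct counts.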
How about using an additional index, or just a TreeMap<Long, TreeSet<String>> (or a TreeMap<Long, String> if the Long values are distinct)?
You can also write a heap.
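As a rough sketch of the "additional index" idea, the class below pairs a HashMap with a TreeMap<Long, TreeSet<String>> and keeps the two in step. The names ValueSortedIndex, put and removeLowest are made up for illustration; treat this as one possible arrangement, not a complete Map implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: key -> value map with a value -> keys index for finding the lowest value. */
class ValueSortedIndex {
    private final Map<String, Long> values = new HashMap<>();
    private final TreeMap<Long, TreeSet<String>> keysByValue = new TreeMap<>();

    /** Inserts or updates a key, re-indexing it under its new value. */
    void put(String key, long value) {
        Long old = values.put(key, value);
        if (old != null) {
            unindex(key, old);
        }
        keysByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(key);
    }

    /** Removes and returns a key with the lowest value, or null if the map is empty. */
    String removeLowest() {
        Map.Entry<Long, TreeSet<String>> lowest = keysByValue.firstEntry();
        if (lowest == null) {
            return null;
        }
        String key = lowest.getValue().first();
        unindex(key, lowest.getKey());
        values.remove(key);
        return key;
    }

    private void unindex(String key, long value) {
        TreeSet<String> keys = keysByValue.get(value);
        keys.remove(key);
        if (keys.isEmpty()) {
            keysByValue.remove(value);
        }
    }

    int size() {
        return values.size();
    }
}
```

With the question's sample data, removeLowest() returns "oranges", because "bananas" was re-indexed from 1 to 6 before the call.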
I needed a similar structure to keep a list of objects ordered by associated values. Based on the suggestion from Mechanical snail in this thread, I coded up a basic implementation of such a map. Feel free to use it.
This implementation does not honor all the contracts of the Map interface, such as reflecting value changes and removals made through the returned key set and entry set in the actual map, but such a solution would be a bit large to include in a forum like this. Perhaps I will work on one and make it available via GitHub or something similar.
Guava BiMap Solution:
Try the solution posted on http://paaloliver.wordpress.com/2006/01/24/sorting-maps-in-java/ . It also gives you the flexibility of sorting in ascending or descending order.
Here is what they say
Outputs
Update: You cannot sort maps by values, sorry.
You can use a SortedMap implementation like TreeMap with a Comparator defining the order by values (instead of the default, by keys).
Or, even better, you can put the elements into a PriorityQueue with a predefined comparator by values. It should be faster and take less memory compared to TreeMap.
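To illustrate the PriorityQueue suggestion, here is a small sketch that picks the n highest-valued entries from an existing map using a min-heap ordered by value. TopWords and top are hypothetical names, and the sketch assumes the map is not modified while the heap is in use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Sketch: select the n most frequent entries with a bounded min-heap. */
class TopWords {
    /** Returns the n entries with the highest values, highest first. */
    static List<Map.Entry<String, Long>> top(Map<String, Long> counts, int n) {
        Comparator<Map.Entry<String, Long>> byValue = Map.Entry.comparingByValue();
        // Min-heap: the head is always the smallest entry kept so far.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byValue);
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // evict the current minimum
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(byValue.reversed());
        return result;
    }
}
```

One caveat: a PriorityQueue has no efficient way to change the value of an element that is already queued (remove(Object) is a linear scan), so this fits a final "top X" pass better than a map whose counts change on every word.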
You may refer to the implementation of java.util.LinkedHashMap. The basic idea is to use an internal linked list to store the order. Here are some details:
Extend HashMap. In HashMap, each entry has a key and a value; that is the basic part. You can add next and prev pointers to keep the entries in order by value, plus head and tail pointers to get the first and last entry. For every modification (add, remove, update), you can add your own code to change the list order. It is no more than a linear search and a pointer switch.
Sure, add/update will be slow if there are too many entries, because it is a linked list and not an array. But as long as the list is sorted, I believe there are lots of ways to speed up the search.
So here is what you get: a map that is as fast as HashMap when retrieving an entry by key, and a linked list that stores the entries in order.
We can discuss this further if this solution meets your requirements.
To jtahlborn:
As I said, it surely is slow without any optimization. Since we are now talking about performance rather than implementation, lots of things can be done.
One solution is to use a tree instead of a linked list, like a red-black tree. Then iterate over the tree instead of the map.
The smallest value is easier. Just use a member variable to store the smallest; when adding or updating an element, update the smallest value. When deleting, search the tree for the smallest (this is very fast).
If a tree is too complex, it is also possible to use another list/array to mark some positions in the list, for example one marker per 100 elements. Then, when searching, search the position list first and then the real list. This position list also needs to be maintained; it would be reasonable to rebuild it after a certain number of modifications, maybe 100.
if all you need is the "min" value, then just use a normal map and keep track of the "min" value anytime it is modified.
EDIT:
so, if you really need value ordering and you want to use out-of-the-box solutions, you basically need 2 collections. One normal map (e.g. HashMap), and one SortedSet (e.g. a TreeSet of key/value entries). you can traverse ordered elements via the TreeSet, and find frequencies by key using the HashMap.
obviously, you could always code up something yourself sort of like a LinkedHashMap, where the elements are locatable by key and traversable by order, but that's pretty much going to be entirely custom code (i doubt anything that specific already exists, but i could be wrong).
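The two-collection approach might look roughly like the sketch below. It assumes the TreeSet holds immutable (key, value) entries ordered by value descending and then by key, so that equal values do not collide. SortedByValueCounts and its methods are illustrative names, not an existing library class.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Sketch: a HashMap for lookup by key plus a TreeSet for traversal by value. */
class SortedByValueCounts {
    private final Map<String, Long> counts = new HashMap<>();
    // Highest value first; the key breaks ties so entries with equal values stay distinct.
    private final TreeSet<Map.Entry<String, Long>> ordered = new TreeSet<>(
            Map.Entry.<String, Long>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Long>comparingByKey()));

    private static Map.Entry<String, Long> entry(String key, long value) {
        return new SimpleImmutableEntry<>(key, value);
    }

    /** Inserts or updates a key; the stale entry must leave the set before the new one is added. */
    void put(String key, long value) {
        Long old = counts.put(key, value);
        if (old != null) {
            ordered.remove(entry(key, old));
        }
        ordered.add(entry(key, value));
    }

    /** Returns the entry with the lowest value, or null if empty. */
    Map.Entry<String, Long> lastEntry() {
        return ordered.isEmpty() ? null : ordered.last();
    }

    /** Entries from highest to lowest value. */
    Iterable<Map.Entry<String, Long>> entries() {
        return ordered;
    }

    /** Drops the lowest-valued entries until at most maxSize remain. */
    void trimTo(int maxSize) {
        while (ordered.size() > maxSize) {
            counts.remove(ordered.pollLast().getKey());
        }
    }
}
```

With the sample data from the question, iterating entries() yields bananas, apples, lemons, oranges, and lastEntry() is the 'oranges' entry. Every update costs O(log n) in the TreeSet on top of the HashMap write.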