Java中按值映射自动排序

发布于 2024-12-05 09:04:38 字数 1057 浏览 2 评论 0 原文

我需要在 Java 中有一个自动按值排序的映射 - 以便在我添加新的键值对或更新现有键的值时随时对其进行排序 -值对,甚至删除某些条目。

还请记住,这张地图将非常大(大小为数百个,甚至数百万个条目)。

所以基本上我正在寻找以下功能:

假设我们有一个实现上述功能的“SortedByValuesMap”类 我们有以下代码:

SortedByValuesMap<String,Long> sorted_map = new SortedByValuesMap<String, Long>();
sorted_map.put("apples", 4);
sorted_map.put("oranges", 2);
sorted_map.put("bananas", 1);
sorted_map.put("lemons", 3);
sorted_map.put("bananas", 6);

for (String key : sorted_map.keySet()) {
  System.out.println(key + ":" + sorted_map.get(key));
}

输出应该是:

bananas:6
apples:4
lemons:3
oranges:2

特别是,对我来说真正重要的是能够使用 任何时候的最低值 - 使用如下命令:

smallestItem = sorted_map.lastEntry();

这应该给我“橙色”条目

编辑:我是一个 Java 新手,所以请在你的答案中详细说明一下 - 谢谢

编辑2:这可能有帮助:我用它来计算单词数(对于熟悉的人来说:特别是 n-gram)在巨大的文本文件中。所以我需要构建一个地图,其中键是单词,值是这些单词的频率。但是,由于限制(例如 RAM),我只想保留 X 个最常见的单词 - 但您当然无法事先知道哪些将是最常见的单词。因此,我认为它可能起作用的方式(作为近似值)是开始计算单词数,当地图达到上限(例如 100 万个条目)时,将删除最不频繁的条目,以便将地图的大小保持为总是一百万。

I need to have an automatically sorted-by-values map in Java - so that It keeps being sorted at any time while I'm adding new key-value pairs or update the value of an existing key-value pair, or even delete some entry.

Please also have in mind that this map is going to be really big (100's of thousands, or even 10's of millions of entries in size).

So basically I'm looking for the following functionality:

Supposed that we had a class 'SortedByValuesMap' that implements the aforementioned functionality
and we have the following code:

SortedByValuesMap<String,Long> sorted_map = new SortedByValuesMap<String, Long>();
sorted_map.put("apples", 4);
sorted_map.put("oranges", 2);
sorted_map.put("bananas", 1);
sorted_map.put("lemons", 3);
sorted_map.put("bananas", 6);

for (String key : sorted_map.keySet()) {
  System.out.println(key + ":" + sorted_map.get(key));
}

the output should be:

bananas:6
apples:4
lemons:3
oranges:2

In particular, what's really important for me, is to be able to get the entry with the
lowest value at any time - using a command like:

smallestItem = sorted_map.lastEntry();

which should give me the 'oranges' entry

EDIT: I am a Java newbie so please elaborate a bit in your answers - thanks

EDIT2: This might help: I am using this for counting words (for those who are familiar: n-grams in particular) in huge text files. So I need to build a map where keys are words and values are the frequencies of those words. However, due to limitations (like RAM), I want to keep only the X most frequent words - but you can't know beforehand which are going to be the most frequent words of course. So, the way I thought it might work (as an approximation) is to start counting words and when the map reaches a top-limit (like 1 mil entries) , the least frequent entry will be deleted so as to keep the map's size to 1 mil always.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

怀里藏娇 2024-12-12 09:04:38

保留2个数据结构:

  • 单词词典->数数。只需使用普通的 HashMap 即可。
  • 用于跟踪顺序的“数组”,例如 list[count] 保存具有该计数的单词的 Set

    为了符号方便,我把它写成一个数组。事实上,您可能不知道出现次数的上限,因此您需要一个可调整大小的数据结构。使用 Map> 实现。或者,如果使用太多内存,请使用 ArrayList> (您必须测试 count == size() - 1,如果是这样,请使用 add() 而不是 set(count + 1))。

增加单词出现的次数(伪代码):

// assumes data structures are in instance variables dict and arr
public void tally(final String word)
{
    final long count = this.dict.get(word) or 0 if absent;
    this.dict.put(word, count + 1);
    // move word up one place in arr
    this.arr[count].remove(word);   // This is why we use a Set: for fast deletion here.
    this.arr[count + 1].add(word);
}

按顺序迭代单词(伪代码):

for(int count = 0; count < arr.size; count++)
    for(final String word : this.arr[count])
        process(word, count);

Keep 2 data structures:

  • A dictionary of words -> count. Just use an ordinary HashMap<String, Long>.
  • An "array" to keep track of order, such that list[count] holds a Set<String> of words with that count.

    I'm writing this as though it were an array as a notational convenience. In fact, you probably don't know an upper bound on the number of occurrences, so you need a resizable data structure. Implement using a Map<Long, Set<String>>. Or, if that uses too much memory, use an ArrayList<Set<String>> (you'll have to test for count == size() - 1, and if so, use add() instead of set(count + 1)).

To increment the number of occurrences for a word (pseudocode):

// assumes data structures are in instance variables dict and arr
public void tally(final String word)
{
    final long count = this.dict.get(word) or 0 if absent;
    this.dict.put(word, count + 1);
    // move word up one place in arr
    this.arr[count].remove(word);   // This is why we use a Set: for fast deletion here.
    this.arr[count + 1].add(word);
}

To iterate over words in order (pseudocode):

for(int count = 0; count < arr.size; count++)
    for(final String word : this.arr[count])
        process(word, count);
一抹微笑 2024-12-12 09:04:38

如果 Long 值不同,那么使用附加索引或仅使用 TreeMap>TreeMap 怎么样?

您还可以编写

How about using additional index or only TreeMap<Long, TreeSet<String>> or TreeMap<Long, String> if Long values are distinct?

You can also write a Heap.

梦在深巷 2024-12-12 09:04:38

我发现需要一个类似的结构来保存按关联值排序的对象列表。根据 Mechanical snail 在该线程中的建议,我编写了此类地图的基本实现。请随意使用。

import java.util.*;

/**
 * A map where {@link #keySet()} and {@link #entrySet()} return sets ordered
 * with ascending associated values with respect to the the comparator provided
 * at constuction. The order of two or more keys with identical values is not
 * defined.
 * <p>
 * Several contracts of the Map interface are not satisfied by this minimal
 * implementation.
 */
public class ValueSortedMap<K, V> extends HashMap<K, V> {
    protected Map<V, Collection<K>> valueToKeysMap;

    public ValueSortedMap() {
        this((Comparator<? super V>) null);
    }

    public ValueSortedMap(Comparator<? super V> valueComparator) {
        this.valueToKeysMap = new TreeMap<V, Collection<K>>(valueComparator);
    }

    public boolean containsValue(Object o) {
        return valueToKeysMap.containsKey(o);
    }

    public V put(K k, V v) {
        V oldV = null;
        if (containsKey(k)) {
            oldV = get(k);
            valueToKeysMap.get(oldV).remove(k);
        }
        super.put(k, v);
        if (!valueToKeysMap.containsKey(v)) {
            Collection<K> keys = new ArrayList<K>();
            keys.add(k);
            valueToKeysMap.put(v, keys);
        } else {
            valueToKeysMap.get(v).add(k);
        }
        return oldV;
    }

    public void putAll(Map<? extends K, ? extends V> m) {
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
            put(e.getKey(), e.getValue());
    }

    public V remove(Object k) {
        V oldV = null;
        if (containsKey(k)) {
            oldV = get(k);
            super.remove(k);
            valueToKeysMap.get(oldV).remove(k);
        }
        return oldV;
    }

    public void clear() {
        super.clear();
        valueToKeysMap.clear();
    }

    public Set<K> keySet() {
        LinkedHashSet<K> ret = new LinkedHashSet<K>(size());
        for (V v : valueToKeysMap.keySet()) {
            Collection<K> keys = valueToKeysMap.get(v);
            ret.addAll(keys);
        }
        return ret;
    }

    public Set<Map.Entry<K, V>> entrySet() {
        LinkedHashSet<Map.Entry<K, V>> ret = new LinkedHashSet<Map.Entry<K, V>>(size());
        for (Collection<K> keys : valueToKeysMap.values()) {
            for (final K k : keys) {
                final V v = get(k);
                ret.add(new Map.Entry<K,V>() {
                    public K getKey() {
                        return k;
                    }

                    public V getValue() {
                        return v;
                    }

                    public V setValue(V v) {
                        throw new UnsupportedOperationException();
                    }
                });
            }
        }
        return ret;
    }
}

此实现并不遵守 Map 接口的所有约定,例如反映实际地图中返回的键集和条目集中的值更改和删除,但这样的解决方案包含在这样的论坛中会有点大。也许我会开发一个并通过 github 或类似的东西提供它。

I found the need of a similar structure to keep a list of objects ordered by associated values. Based on the suggestion from Mechanical snail in this thread, I coded up a basic implementation of such a map. Feel free to use.

import java.util.*;

/**
 * A map where {@link #keySet()} and {@link #entrySet()} return sets ordered
 * with ascending associated values with respect to the the comparator provided
 * at constuction. The order of two or more keys with identical values is not
 * defined.
 * <p>
 * Several contracts of the Map interface are not satisfied by this minimal
 * implementation.
 */
public class ValueSortedMap<K, V> extends HashMap<K, V> {
    protected Map<V, Collection<K>> valueToKeysMap;

    public ValueSortedMap() {
        this((Comparator<? super V>) null);
    }

    public ValueSortedMap(Comparator<? super V> valueComparator) {
        this.valueToKeysMap = new TreeMap<V, Collection<K>>(valueComparator);
    }

    public boolean containsValue(Object o) {
        return valueToKeysMap.containsKey(o);
    }

    public V put(K k, V v) {
        V oldV = null;
        if (containsKey(k)) {
            oldV = get(k);
            valueToKeysMap.get(oldV).remove(k);
        }
        super.put(k, v);
        if (!valueToKeysMap.containsKey(v)) {
            Collection<K> keys = new ArrayList<K>();
            keys.add(k);
            valueToKeysMap.put(v, keys);
        } else {
            valueToKeysMap.get(v).add(k);
        }
        return oldV;
    }

    public void putAll(Map<? extends K, ? extends V> m) {
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
            put(e.getKey(), e.getValue());
    }

    public V remove(Object k) {
        V oldV = null;
        if (containsKey(k)) {
            oldV = get(k);
            super.remove(k);
            valueToKeysMap.get(oldV).remove(k);
        }
        return oldV;
    }

    public void clear() {
        super.clear();
        valueToKeysMap.clear();
    }

    public Set<K> keySet() {
        LinkedHashSet<K> ret = new LinkedHashSet<K>(size());
        for (V v : valueToKeysMap.keySet()) {
            Collection<K> keys = valueToKeysMap.get(v);
            ret.addAll(keys);
        }
        return ret;
    }

    public Set<Map.Entry<K, V>> entrySet() {
        LinkedHashSet<Map.Entry<K, V>> ret = new LinkedHashSet<Map.Entry<K, V>>(size());
        for (Collection<K> keys : valueToKeysMap.values()) {
            for (final K k : keys) {
                final V v = get(k);
                ret.add(new Map.Entry<K,V>() {
                    public K getKey() {
                        return k;
                    }

                    public V getValue() {
                        return v;
                    }

                    public V setValue(V v) {
                        throw new UnsupportedOperationException();
                    }
                });
            }
        }
        return ret;
    }
}

This implementation does not honor all the contracts of the Map interface such as reflecting value changes and removals in the returned key set and entry sets in the actual map, but such a solution would be a bit large to include in a forum like this. Perhaps I will work on one and make it available via github or something similar.

盛装女皇 2024-12-12 09:04:38

番石榴 BiMap 解决方案:

//Prepare original data
BiMap<String, Integer> biMap = HashBiMap.create();
biMap.put("apples" , 4);
biMap.put("oranges", 2);
biMap.put("bananas", 1);
biMap.put("lemons" , 3);
biMap.put("bananas", 6);

//Create a desc order SortedMap
SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>(new Comparator<Integer>(){
    @Override public int compare(Integer o1, Integer o2) {
      return o2-o1;
}});

//Put inversed map
sortedMap.putAll(biMap.inverse());
for (Map.Entry<Integer, String> e: sortedMap.entrySet()) {
      System.out.println(e);
}
System.out.println(sortedMap.lastKey()); 

Guava BiMap Solution:

//Prepare original data
BiMap<String, Integer> biMap = HashBiMap.create();
biMap.put("apples" , 4);
biMap.put("oranges", 2);
biMap.put("bananas", 1);
biMap.put("lemons" , 3);
biMap.put("bananas", 6);

//Create a desc order SortedMap
SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>(new Comparator<Integer>(){
    @Override public int compare(Integer o1, Integer o2) {
      return o2-o1;
}});

//Put inversed map
sortedMap.putAll(biMap.inverse());
for (Map.Entry<Integer, String> e: sortedMap.entrySet()) {
      System.out.println(e);
}
System.out.println(sortedMap.lastKey()); 
弃爱 2024-12-12 09:04:38

尝试 http://paaloliver.wordpress.com 上发布的解决方案/2006/01/24/sorting-maps-in-java/ 。您也可以灵活地进行升序或降序排序。

这是他们所说

import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class MapValueSort {

    /** inner class to do soring of the map **/
    private static class ValueComparer implements Comparator<String> {
        private Map<String, String>  _data = null;
        public ValueComparer (Map<String, String> data){
            super();
            _data = data;
        }

         public int compare(String o1, String o2) {
             String e1 = (String) _data.get(o1);
             String e2 = (String) _data.get(o2);
             return e1.compareTo(e2);
         }
    }

    public static void main(String[] args){

        Map<String, String> unsortedData = new HashMap<String, String>();
        unsortedData.put("2", "DEF");
        unsortedData.put("1", "ABC");
        unsortedData.put("4", "ZXY");
        unsortedData.put("3", "BCD");


        SortedMap<String, String> sortedData = new TreeMap<String, String>(new MapValueSort.ValueComparer(unsortedData));

        printMap(unsortedData);

        sortedData.putAll(unsortedData);
        System.out.println();
        printMap(sortedData);
    }

    private static void printMap(Map<String, String> data) {
        for (Iterator<String> iter = data.keySet().iterator(); iter.hasNext();) {
            String key = (String) iter.next();
            System.out.println("Value/key:"+data.get(key)+"/"+key);
        }
    }

}

Value/key:BCD/3
Value/key:DEF/2
Value/key:ABC/1
Value/key:ZXY/4

Value/key:ABC/1
Value/key:BCD/3
Value/key:DEF/2
Value/key:ZXY/4

Try the solution posted on http://paaloliver.wordpress.com/2006/01/24/sorting-maps-in-java/ . You have the flexibility of doing sorting ascending or descending too.

Here is what they say

import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class MapValueSort {

    /** inner class to do soring of the map **/
    private static class ValueComparer implements Comparator<String> {
        private Map<String, String>  _data = null;
        public ValueComparer (Map<String, String> data){
            super();
            _data = data;
        }

         public int compare(String o1, String o2) {
             String e1 = (String) _data.get(o1);
             String e2 = (String) _data.get(o2);
             return e1.compareTo(e2);
         }
    }

    public static void main(String[] args){

        Map<String, String> unsortedData = new HashMap<String, String>();
        unsortedData.put("2", "DEF");
        unsortedData.put("1", "ABC");
        unsortedData.put("4", "ZXY");
        unsortedData.put("3", "BCD");


        SortedMap<String, String> sortedData = new TreeMap<String, String>(new MapValueSort.ValueComparer(unsortedData));

        printMap(unsortedData);

        sortedData.putAll(unsortedData);
        System.out.println();
        printMap(sortedData);
    }

    private static void printMap(Map<String, String> data) {
        for (Iterator<String> iter = data.keySet().iterator(); iter.hasNext();) {
            String key = (String) iter.next();
            System.out.println("Value/key:"+data.get(key)+"/"+key);
        }
    }

}

Outputs

Value/key:BCD/3
Value/key:DEF/2
Value/key:ABC/1
Value/key:ZXY/4

Value/key:ABC/1
Value/key:BCD/3
Value/key:DEF/2
Value/key:ZXY/4
与君绝 2024-12-12 09:04:38

更新:抱歉,您无法按值对地图进行排序。

您可以使用 SortedMap 实现,如 TreeMapComparator 按值定义顺序(而不是默认 - 按键)。

或者,更好的是,您可以将元素放入 PriorityQueue< /a> 具有按值预定义的比较器。与 TreeMap 相比,它应该更快并且占用更少的内存。

Update: You cannot sort maps by values, sorry.

You can use SortedMap implementation like TreeMap with Comparator defining order by values (instead of default - by keys).

Or, even better, you can put elements into a PriorityQueue with predefined comparator by values. It should be faster and take less memory compared to TreeMap.

沫离伤花 2024-12-12 09:04:38

您可以参考java.util.LinkedHashMap的实现。
基本思想是,使用内部链表来存储订单。以下是一些细节:

从 HashMap 扩展。在HashMap中,每个条目都有一个键和值,这是基本的。您可以添加下一个和上一个指针来按值顺序存储条目。以及用于获取第一个和最后一个条目的标头和尾部指针。对于每次修改(添加、删除、更新),您可以添加自己的代码来更改列表顺序。它只不过是一个线性搜索和指针开关。

当然,如果条目太多,添加/更新会很慢,因为它是链表而不是数组。但只要列表是排序的,我相信有很多方法可以加快搜索速度。

所以这就是你得到的:一个在通过键检索条目时与 HashMap 具有相同速度的映射。按顺序存储条目的链接列表。

如果该解决方案满足您的要求,我们可以进一步讨论。


致杰塔尔伯恩:
正如我所说,如果没有任何优化,它肯定会很慢。由于我们现在讨论的是性能而不是实现,因此可以做很多事情。

一种解决方案是使用树而不是链表,例如红黑树。然后迭代树而不是迭代映射。

关于最小值,就比较容易了。只是用一个成员变量来存储最小值,当添加或更新元素时​​,更新最小值。删除时,在树中搜索最小的(这非常快),

如果树太复杂,也可以使用另一个列表/数组来标记列表中的某些位置。例如,每个元素可能有 100 个。那么搜索的时候就先搜索位置列表,再搜索真实列表即可。这个列表也需要维护,对于某些修改次数重新统计位置列表是合理的,也许100个。

You may refer to the implementation of java.util.LinkedHashMap.
The basic idea is, using a inner linked list to store orders. Here is some details:

Extends from HashMap. In HashMap, each entry has a key and value, that is basic. You can Add a next and a prev pointer to store entries in order by value. And a header and tail pointer to get the first and last entry. For every modification (add, remove, update), you can add your own code to change the list order. It is no more than a linear search and pointer switch.

Sure it will be slow for add/update if there are too many entries because it is a linked list not array. But as long as the list is sorted, I believe there are lots of ways to speedup the search.

So here is what you got: A map that has the same speed with HashMap when retrieving an entry by a key. A linked list which stores entries in order.

We can discuss this further if this solution meets your requirement.


to jtahlborn:
As I said, it surely is slow without any optimization. Since we are talking about performance not impl now, lots of things can be done.

One solution is using a tree instead of Linked List, like Red-Black Tree. Then iterate the tree instead of iterator the map.

About the smallest value, it is easier. Just using a member variable to store the smallest, when add or update an element, update the smallest value. When delete, search the tree for the smallest (this is very fast)

if tree is too complex, it is also possible to using another list/array to mark the some positions in the list. for example, maybe 100 element each. Then when search, just search the position list first and then the real list. This list also needs to be maintained, it would be reasonable to recount the position list for certain times of modification, maybe 100.

怪我鬧 2024-12-12 09:04:38

如果您需要的只是“min”值,那么只需使用法线贴图并在修改时跟踪“min”值。

编辑:

所以,如果您确实需要值排序并且想要使用开箱即用的解决方案,那么您基本上需要 2 个集合。一张法线图(例如HashMap)和一张SortedSet(例如TreeSet>)。您可以通过 TreeSet 遍历有序元素,并使用 HashMap 按键查找频率。

显然,你总是可以自己编写一些类似于 LinkedHashMap 的东西,其中元素可以通过键定位并可以按顺序遍历,但这几乎将是完全自定义的代码(我怀疑任何特定的东西已经存在,但我可以错误的)。

if all you need is the "min" value, then just use a normal map and keep track of the "min" value anytime it is modified.

EDIT:

so, if you really need value ordering and you want to use out-of-the-box solutions, you basically need 2 collections. One normal map (e.g. HashMap), and one SortedSet (e.g. TreeSet>). you can traverse ordered elements via the TreeSet, and find frequencies by key using the HashMap.

obviously, you could always code up something yourself sort of like a LinkedHashMap, where the elements are locatable by key and traversable by order, but that's pretty much going to be entirely custom code (i doubt anything that specific already exists, but i could be wrong).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文