Finding the position of an element in a Java TreeMap

Posted on 2024-12-21 06:38:22

I am working with a TreeMap of Strings, TreeMap<String, String>, and using it to implement a dictionary of words.

I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.

Each file should have a vector representing it, with the following properties:

  • the vector should have the same size as the dictionary
  • for each word contained in the file, the vector should have a 1 in the position corresponding to the word's position in the dictionary
  • for each word not contained in the file, the vector should have a -1 in the position corresponding to the word's position in the dictionary

So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing documents in a collection is called the Boolean Model - http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)
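As a rough sketch of the representation described above, assuming the position of every dictionary word is already known: the class name BooleanModelSketch and the method toVector are made up for illustration, and an int[] is used instead of a Vector<Boolean> because a Boolean cannot hold the value -1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original question.
class BooleanModelSketch {

    // positions maps each dictionary word to its index in the dictionary.
    static int[] toVector(List<String> fileWords, Map<String, Integer> positions) {
        int[] vector = new int[positions.size()];
        Arrays.fill(vector, -1);              // every word starts as "absent"
        for (String word : fileWords) {
            Integer pos = positions.get(word);
            if (pos != null) {                // ignore words outside the dictionary
                vector[pos] = 1;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> positions = new HashMap<>();
        positions.put("brown", 0);
        positions.put("dog", 1);
        positions.put("fox", 2);
        System.out.println(Arrays.toString(
                toVector(Arrays.asList("fox", "brown", "cat"), positions)));
        // prints [1, -1, 1]
    }
}
```

The open question is then only how to obtain the word-to-position mapping cheaply, which is what the rest of the post asks about.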

The problem I am facing in the procedure to create this vector is that I need a way to find the position of a word in the dictionary, something like this:

String key;
int i = get_position_of_key_in_Treemap(key); // <--- purely invented method...

1) Is there any method like this that I can use on a TreeMap? If not, could you provide some code to help me implement it myself?

2) Is there an iterator on TreeMap (it is alphabetically ordered on keys) from which I can get the position?

3) Alternatively, should I use another class to implement the dictionary (if you think I can't do what I need with a TreeMap)? If yes, which one?

Thanks in advance.

ADDED PART:

The solution proposed by dasblinkenlight looks fine, but it has a complexity problem (linear in the size of the dictionary, due to copying the keys into an array), and doing that for each file is not acceptable.

Any other ideas for my questions?

Comments (8)

£冰雨忧蓝° 2024-12-28 06:38:22

Once you have constructed your tree map, copy its sorted keys into an array, and use Arrays.binarySearch to look up the index in O(log N) time. If you need the value, look it up in the original map as well.

Edit: this is how you copy keys into an array

String[] mapKeys = new String[treeMap.size()];
int pos = 0;
for (String key : treeMap.keySet()) {
    mapKeys[pos++] = key;
}
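
The lookup step itself is not shown above. A minimal sketch of it could look as follows, assuming the map uses the natural String ordering (no custom Comparator) and that the key array is built once and reused for every file; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical helper illustrating the array + binary search approach.
class BinarySearchIndexSketch {

    // Snapshot of the map's keys; keySet() of a TreeMap iterates in sorted order.
    static String[] sortedKeys(TreeMap<String, String> treeMap) {
        return treeMap.keySet().toArray(new String[0]);
    }

    // Returns the index of key, or -1 if the key is absent.
    static int indexOf(String[] sortedKeys, String key) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        return pos >= 0 ? pos : -1;   // binarySearch returns a negative value when absent
    }

    public static void main(String[] args) {
        TreeMap<String, String> treeMap = new TreeMap<>();
        treeMap.put("quick", "one");
        treeMap.put("brown", "two");
        treeMap.put("fox", "three");
        String[] keys = sortedKeys(treeMap);        // build once, reuse for every file
        System.out.println(indexOf(keys, "fox"));   // prints 1
        System.out.println(indexOf(keys, "cat"));   // prints -1
    }
}
```

The array has to be rebuilt whenever the map changes, so this fits best when the dictionary is fixed before the files are processed.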
诺曦 2024-12-28 06:38:22

An alternative solution would be to use TreeMap's headMap method. If the word exists in the TreeMap, then the size() of its head map is equal to the index of the word in the dictionary. It may be a bit wasteful compared to my other answer, though.

Here is how you code it in Java:

import java.util.*;

class Test {
    public static void main(String[] args) {
        TreeMap<String,String> tm = new TreeMap<String,String>();
        tm.put("quick", "one");
        tm.put("brown", "two");
        tm.put("fox", "three");
        tm.put("jumps", "four");
        tm.put("over", "five");
        tm.put("the", "six");
        tm.put("lazy", "seven");
        tm.put("dog", "eight");
        for (String s : new String[] {
            "quick", "brown", "fox", "jumps", "over",
            "the", "lazy", "dog", "before", "way_after"}
        ) {
            if (tm.containsKey(s)) {
                // Here is the operation you are looking for.
                // It does not work for items not in the dictionary.
                int pos = tm.headMap(s).size();
                System.out.println("Key '"+s+"' is at the position "+pos);
            } else {
                System.out.println("Key '"+s+"' is not found");
            }
        }
    }
}

Here is the output produced by the program:

Key 'quick' is at the position 6
Key 'brown' is at the position 0
Key 'fox' is at the position 2
Key 'jumps' is at the position 3
Key 'over' is at the position 5
Key 'the' is at the position 7
Key 'lazy' is at the position 4
Key 'dog' is at the position 1
Key 'before' is not found
Key 'way_after' is not found
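
The same idea can be wrapped in a small helper that returns -1 for missing words. This is only a sketch with made-up names (HeadMapIndexSketch, indexOf). Note also that size() on a head-map view is not necessarily constant time; as far as I can tell, the OpenJDK implementation counts the entries of the view, which would make each call linear in the returned index.

```java
import java.util.TreeMap;

// Hypothetical helper around the headMap(...).size() trick.
class HeadMapIndexSketch {

    // Returns the sorted position of key, or -1 if the map does not contain it.
    static int indexOf(TreeMap<String, String> map, String key) {
        if (!map.containsKey(key)) {
            return -1;
        }
        // headMap(key) is a view of all entries whose keys are strictly less than key.
        return map.headMap(key).size();
    }

    public static void main(String[] args) {
        TreeMap<String, String> tm = new TreeMap<>();
        tm.put("quick", "one");
        tm.put("brown", "two");
        tm.put("fox", "three");
        System.out.println(indexOf(tm, "quick"));   // prints 2
        System.out.println(indexOf(tm, "before"));  // prints -1
    }
}
```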
树深时见影 2024-12-28 06:38:22

https://github.com/geniot/indexed-tree-map

I had the same problem. So I took the source code of java.util.TreeMap and wrote IndexedTreeMap. It implements my own IndexedNavigableMap:

public interface IndexedNavigableMap<K, V> extends NavigableMap<K, V> {
   K exactKey(int index);
   Entry<K, V> exactEntry(int index);
   int keyIndex(K k);
}

The implementation is based on updating node weights in the red-black tree whenever it changes. The weight of a node is the number of nodes beneath it plus one (the node itself). For example, when a tree is rotated to the left:

private void rotateLeft(Entry<K, V> p) {
    if (p != null) {
        Entry<K, V> r = p.right;

        int delta = getWeight(r.left) - getWeight(p.right);
        p.right = r.left;
        p.updateWeight(delta);

        if (r.left != null) {
            r.left.parent = p;
        }

        r.parent = p.parent;


        if (p.parent == null) {
            root = r;
        } else if (p.parent.left == p) {
            delta = getWeight(r) - getWeight(p.parent.left);
            p.parent.left = r;
            p.parent.updateWeight(delta);
        } else {
            delta = getWeight(r) - getWeight(p.parent.right);
            p.parent.right = r;
            p.parent.updateWeight(delta);
        }

        delta = getWeight(p) - getWeight(r.left);
        r.left = p;
        r.updateWeight(delta);

        p.parent = r;
    }
}

updateWeight simply updates weights up to the root:

void updateWeight(int delta) {
    weight += delta;
    Entry<K, V> p = parent;
    while (p != null) {
        p.weight += delta;
        p = p.parent;
    }
}

And when we need to find the element by index here is the implementation that uses weights:

public K exactKey(int index) {
    if (index < 0 || index > size() - 1) {
        throw new ArrayIndexOutOfBoundsException();
    }
    return getExactKey(root, index);
}

private K getExactKey(Entry<K, V> e, int index) {
    if (e.left == null && index == 0) {
        return e.key;
    }
    if (e.left == null && e.right == null) {
        return e.key;
    }
    if (e.left != null && e.left.weight > index) {
        return getExactKey(e.left, index);
    }
    if (e.left != null && e.left.weight == index) {
        return e.key;
    }
    return getExactKey(e.right, index - (e.left == null ? 0 : e.left.weight) - 1);
}

It also comes in very handy for finding the index of a key:

public int keyIndex(K key) {
    if (key == null) {
        throw new NullPointerException();
    }
    Entry<K, V> e = getEntry(key);
    if (e == null) {
        throw new NullPointerException();
    }
    if (e == root) {
        return getWeight(e) - getWeight(e.right) - 1;//index to return
    }
    int index = 0;
    int cmp;
    if (e.left != null) {
        index += getWeight(e.left);
    }
    Entry<K, V> p = e.parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        while (p != null) {
            cmp = cpr.compare(key, p.key);
            if (cmp > 0) {
                index += getWeight(p.left) + 1;
            }
            p = p.parent;
        }
    } else {
        Comparable<? super K> k = (Comparable<? super K>) key;
        while (p != null) {
            if (k.compareTo(p.key) > 0) {
                index += getWeight(p.left) + 1;
            }
            p = p.parent;
        }
    }
    return index;
}

You can find the result of this work at https://github.com/geniot/indexed-tree-map
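
To see the weight idea in isolation, here is a deliberately simplified sketch: a plain, unbalanced binary search tree over Strings that stores subtree sizes. It is not the library's code and it is not a red-black tree (so it can degrade to O(n) on sorted input); the class SizeTreeSketch and its methods are hypothetical and only mirror the names keyIndex and exactKey used above.

```java
// Simplified, unbalanced binary search tree that stores subtree sizes.
// Illustration only: the library described above keeps weights in a red-black tree.
class SizeTreeSketch {

    private static final class Node {
        final String key;
        Node left, right;
        int weight = 1;                 // nodes in this subtree, including this one
        Node(String key) { this.key = key; }
    }

    private Node root;

    private static int weight(Node n) {
        return n == null ? 0 : n.weight;
    }

    void add(String key) {
        root = add(root, key);
    }

    private Node add(Node n, String key) {
        if (n == null) {
            return new Node(key);
        }
        int cmp = key.compareTo(n.key);
        if (cmp < 0) {
            n.left = add(n.left, key);
        } else if (cmp > 0) {
            n.right = add(n.right, key);
        }
        // Recompute from the children; a duplicate key leaves the weight unchanged.
        n.weight = 1 + weight(n.left) + weight(n.right);
        return n;
    }

    // Index of key in sorted order, or -1 if absent.
    int keyIndex(String key) {
        int index = 0;
        Node n = root;
        while (n != null) {
            int cmp = key.compareTo(n.key);
            if (cmp < 0) {
                n = n.left;
            } else if (cmp > 0) {
                index += weight(n.left) + 1;   // skip the left subtree and this node
                n = n.right;
            } else {
                return index + weight(n.left);
            }
        }
        return -1;
    }

    // Key at the given sorted index.
    String exactKey(int index) {
        if (index < 0 || index >= weight(root)) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Node n = root;
        while (true) {
            int leftWeight = weight(n.left);
            if (index < leftWeight) {
                n = n.left;
            } else if (index == leftWeight) {
                return n.key;
            } else {
                index -= leftWeight + 1;
                n = n.right;
            }
        }
    }

    public static void main(String[] args) {
        SizeTreeSketch tree = new SizeTreeSketch();
        for (String w : new String[] {"quick", "brown", "fox", "jumps"}) {
            tree.add(w);
        }
        System.out.println(tree.keyIndex("jumps"));  // prints 2
        System.out.println(tree.exactKey(3));        // prints quick
    }
}
```

Both operations walk a single root-to-node path, which is why a balanced version gives O(log N) for index lookups in either direction.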

眼眸印温柔 2024-12-28 06:38:22

There's no such implementation in the JDK itself. Although TreeMap iterates in natural key ordering, its internal data structures are all based on trees and not arrays (remember that Maps do not order keys by definition, even though that is a very common use case).

That said, you have to make a choice, as it is not possible to have O(1) computation time under your comparison criteria both for insertion into the Map and for the indexOf(key) calculation. This is because lexicographical order is not stable in a mutable data structure (as opposed to insertion order, for instance). An example: once you insert the first key-value pair (entry) into the map, its position is one. However, depending on the second key inserted, that position might change, as the new key may be "greater" or "lower" than the one already in the Map. You can certainly implement this by maintaining and updating an indexed list of keys during the insertion operation, but then your insert operations will cost O(n log(n)) (as you will need to re-order an array). That might or might not be acceptable, depending on your data access patterns.

ListOrderedMap and LinkedMap in Apache Commons both come close to what you need, but they rely on insertion order. I believe you can check out their implementation and develop your own solution to the problem with little to moderate effort (it should just be a matter of replacing the ListOrderedMap's internal backing array with a sorted list, TreeList in Apache Commons, for instance).

You can also calculate the index yourself, by counting the number of elements that are lower than the given key (which should be faster than iterating through the list searching for your element, in the most frequent case, as you're not comparing anything).
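
One possible shape for the "indexed list of keys" mentioned above is a sorted ArrayList that is kept in order by inserting each key at the position reported by Collections.binarySearch. This is a sketch under my own assumptions (the class SortedKeyListSketch is hypothetical); inserting this way shifts elements and so costs O(n) per insert rather than a full re-sort, while index lookups are O(log n).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sorted key list: O(n) insert, O(log n) index lookup.
class SortedKeyListSketch {

    private final List<String> keys = new ArrayList<>();

    // Inserts key at its sorted position; returns false if it is already present.
    boolean add(String key) {
        int pos = Collections.binarySearch(keys, key);
        if (pos >= 0) {
            return false;
        }
        // When the key is absent, binarySearch returns -(insertion point) - 1.
        keys.add(-pos - 1, key);
        return true;
    }

    // Returns the sorted position of key, or -1 if absent.
    int indexOf(String key) {
        int pos = Collections.binarySearch(keys, key);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        SortedKeyListSketch list = new SortedKeyListSketch();
        list.add("quick");
        list.add("brown");
        list.add("fox");
        System.out.println(list.indexOf("fox"));    // prints 1
        System.out.println(list.indexOf("quick"));  // prints 2
    }
}
```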

坦然微笑 2024-12-28 06:38:22

I agree with Isolvieira. Perhaps the best approach would be to use a different structure than TreeMap.

However, if you still want to go with computing the index of the keys, a solution would be to count how many keys are lower than the key you are looking for.

Here is a code snippet:

    java.util.SortedMap<String, String> treeMap = new java.util.TreeMap<String, String>();
    treeMap.put("d", "content 4");
    treeMap.put("b", "content 2");
    treeMap.put("c", "content 3");
    treeMap.put("a", "content 1");

    String key = "d"; // key to get the index for
    System.out.println( treeMap.keySet() );

    final String firstKey = treeMap.firstKey(); // assuming treeMap structure doesn't change in the mean time
    System.out.format( "Index of %s is %d %n", key, treeMap.subMap(firstKey, key).size() );
一念一轮回 2024-12-28 06:38:22

I'd like to thank all of you for the effort you put into answering my question. The answers were all very useful, and taking the best from each of them led me to the solution I actually implemented in my project.


What I believe to be the best answers to my individual questions are:

2) There is no Iterator defined on TreeMap, as @Isoliveira says:

There's no such implementation in the JDK itself. 
Although TreeMap iterates in natural key ordering,
its internal data structures are all based on trees and not arrays
(remember that Maps do not order keys, by definition, 
in spite of that the very common use case).

and, as I found in this SO answer, How to iterate over a TreeMap?, the only way to iterate over the elements of a Map is to use map.entrySet() and the Iterator defined on that Set (or on some other class with iterators).


3) It's possible to use a TreeMap to implement the dictionary, but this guarantees only O(log N) complexity for finding the index of a contained word (the cost of a lookup in a tree data structure).

Using a HashMap with the same procedure gives O(1) expected complexity instead.


1) There is no such method. The only solution is to implement it yourself.

As @Paul stated:

Assumes that once getPosition() has been called, the dictionary is not changed.

The solution assumes that once the dictionary has been created it will not be changed afterwards, so the position of a word will always be the same.

Given this assumption, I found a solution that builds the dictionary in O(N log N) time (O(N) to fill the HashMap, plus a single sort of the keys) and afterwards guarantees that the index of a contained word can be looked up in constant time, O(1).

I defined the dictionary as a HashMap like this:

public HashMap<String, WordStruct> dictionary = new HashMap<String, WordStruct>();
  • key --> the String representing the word contained in Dictionary
  • value --> an Object of a created class WordStruct

where the WordStruct class is defined like this:

public class WordStruct {

    private int DictionaryPosition;    // defines the position of word in dictionary once it is alphabetically ordered

    public WordStruct(){

    }

    public void SetWordPosition(int pos){
        this.DictionaryPosition = pos;
    }

    // Getter so that callers can read the stored position.
    public int GetWordPosition(){
        return this.DictionaryPosition;
    }

}

and allows me to keep track of any kind of attribute I would like to associate with the word entry of the dictionary.

Now I fill the dictionary by iterating over all the words contained in all the files of my collection:

THE FOLLOWING IS PSEUDOCODE

for (int i = 0; i < number_of_files; i++) {

    file = get_file(i);

    for each word in file {

        dictionary.put(word, new WordStruct());

    }

}

Once the HashMap is filled, in whatever order, I use the procedure indicated by @dasblinkenlight to order it once and for all (the sort itself costs O(N log N)):

    String[] dictionaryArray = dictionary.keySet().toArray(new String[0]);
    Arrays.sort(dictionaryArray);

    for (int i = 0; i < dictionaryArray.length; i++) {

        String word = dictionaryArray[i];
        dictionary.get(word).SetWordPosition(i);

    }

From now on, to get the position of a word in the alphabetically ordered dictionary, the only thing needed is to read its DictionaryPosition variable:

since the word is known, you just need to look it up, and this has constant cost in a HashMap.
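
Put together, the procedure above can be sketched as one self-contained program. This is a simplified variant, not the exact project code: it stores the position directly as an Integer instead of using WordStruct, represents each file as a list of words, and uses hypothetical names (DictionaryIndexSketch, buildPositions).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical end-to-end sketch: fill a HashMap, sort the keys once, store positions.
class DictionaryIndexSketch {

    // Builds word -> alphabetical position for all words in all files.
    static Map<String, Integer> buildPositions(List<List<String>> files) {
        Map<String, Integer> positions = new HashMap<>();
        for (List<String> file : files) {
            for (String word : file) {
                positions.put(word, -1);           // placeholder until sorted
            }
        }
        String[] sorted = positions.keySet().toArray(new String[0]);
        Arrays.sort(sorted);                       // one-time O(N log N) sort
        for (int i = 0; i < sorted.length; i++) {
            positions.put(sorted[i], i);           // replaces the placeholder
        }
        return positions;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("the", "quick", "fox"),
                Arrays.asList("the", "lazy", "dog"));
        Map<String, Integer> positions = buildPositions(files);
        System.out.println(positions.get("dog"));  // prints 0
        System.out.println(positions.get("the"));  // prints 4
    }
}
```

After the one-time build, each per-file lookup is a plain HashMap get, which is what makes the per-file cost independent of the dictionary size.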


Thanks again, and I wish you all a Merry Christmas!!

用心笑 2024-12-28 06:38:22

Have you thought to make the values in your TreeMap contain the position in your dictionary? I am using a BitSet here for my file details.

This doesn't work nearly as well as my other idea below.

Map<String,Integer> dictionary = new TreeMap<String,Integer> ();

private void test () {
  // Construct my dictionary.
  buildDictionary();
  // Make my file data.
  String [] file1 = new String[] {
    "1", "3", "5"
  };
  BitSet fileDetails = getFileDetails(file1, dictionary);
  printFileDetails("File1", fileDetails, dictionary.size());
}

// size is the dictionary size: BitSet.length() only reaches the highest set bit,
// so looping up to it would drop trailing -1 entries.
private void printFileDetails(String fileName, BitSet details, int size) {
  System.out.println("File: "+fileName);
  for ( int i = 0; i < size; i++ ) {
    System.out.print ( details.get(i) ? 1: -1 );
    if ( i < size - 1 ) {
      System.out.print ( "," );
    }
  }
}

private BitSet getFileDetails(String [] file, Map<String, Integer> dictionary ) {
  BitSet details = new BitSet(dictionary.size());
  for ( String word : file ) {
    // The value in the dictionary is the index of the word in the dictionary.
    Integer index = dictionary.get(word);
    if ( index != null ) { // skip words that are not in the dictionary
      details.set(index);
    }
  }
  return details;
}

String [] dictionaryWords = new String[] {
  "1", "2", "3", "4", "5"
};

private void buildDictionary () {
  for ( String word : dictionaryWords ) {
    // Initially make the value 0. We will change that later.
    dictionary.put(word, 0);
  }
  // Make the indexes.
  int wordNum = 0;
  for ( String word : dictionary.keySet() ) {
    dictionary.put(word, wordNum++);
  }
}

Here the building of the file details consists of a single lookup in the TreeMap for each word in the file.

If you were planning to use the value in the dictionary TreeMap for something else you could always compose it with an Integer.

Added

Thinking about it further, if the value field of the Map is earmarked for something you could always use special keys that calculate their own position in the Map and act just like Strings for comparison.

private void test () {
  // Dictionary
  Map<PosKey, String> dictionary = new TreeMap<PosKey, String> ();
  // Fill it with words.
  String[] dictWords = new String[] {
                       "0", "1", "2", "3", "4", "5"};
  for ( String word : dictWords ) {
    dictionary.put( new PosKey( dictionary, word ), word );
  }
  // File
  String[] fileWords = new String[] {
                       "0", "2", "3", "5"};
  int[] file = new int[dictionary.size()];
  // Initially all -1.
  for ( int i = 0; i < file.length; i++ ) {
    file[i] = -1;
  }
  // Temp file words set.
  Set<String> fileSet = new HashSet<String>( Arrays.asList( fileWords ) );
  for ( PosKey key : dictionary.keySet() ) {
    if ( fileSet.contains( key.getKey() ) ) {
      file[key.getPosition()] = 1;
    }
  }

  // Print out.
  System.out.println( Arrays.toString( file ) );
  // Prints: [1, -1, 1, 1, -1, 1]

}

class PosKey
    implements Comparable<PosKey> {
  final String key;
  // Initially -1
  int position = -1;
  // The map I am keying on.
  Map<PosKey, ?> map;

  public PosKey ( Map<PosKey, ?> map, String word ) {
    this.key = word;
    this.map = map;
  }

  public int getPosition () {
    if ( position == -1 ) {
      // First access to the key.
      int pos = 0;
      // Calculate all positions in one loop.
      for ( PosKey k : map.keySet() ) {
        k.position = pos++;
      }
    }
    return position;
  }

  public String getKey () {
    return key;
  }

  @Override
  public int compareTo ( PosKey it ) {
    return key.compareTo( it.key );
  }

  // Keep equals consistent with hashCode and compareTo.
  @Override
  public boolean equals ( Object o ) {
    return o instanceof PosKey && key.equals( ( ( PosKey )o ).key );
  }

  @Override
  public int hashCode () {
    return key.hashCode();
  }
}

NB: Assumes that once getPosition() has been called, the dictionary is not changed.

享受孤独 2024-12-28 06:38:22

I would suggest that you write a SkipList to store your dictionary, since this will still offer O(log N) lookups, insertion and removal while also being able to provide an index (tree implementations generally cannot return an index, since the nodes don't know it and there would be a cost to keeping them updated). Unfortunately, Java's ConcurrentSkipListMap does not provide an index, so you would need to implement your own version.

Getting the index of an item would be O(log N). If you wanted both the index and the value without doing two lookups, you would need to return a wrapper object holding both.
