Searching a HashMap

Posted 2024-10-30 05:33:27

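Hello, I am populating a HashMap from a dictionary.txt file and splitting it into sets by word length.

I am having trouble searching the HashMap for a pattern such as "a*d**k".

Can anyone help me?

I need to know how to search a HashMap.

I would be very grateful for any help. Thank you.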

世态炎凉 2024-11-06 05:33:27

A HashMap is simply the wrong data structure for a pattern search.

You should look into technologies that feature pattern searching out of the box, like Lucene.

And in answer to this comment:

"I'm using it for Android, and it's the fastest way of searching."

HashMaps are awfully fast, that's true, but only if you use them as intended. In your scenario, hashing buys you nothing: the keys are just word lengths, so they are small integers, and you probably won't have any word that's longer than, say, 30 letters.

So why not just use an array or ArrayList of Sets instead of a HashMap, and replace map.get(string.length()) with list.get(string.length()-1) or array[string.length()-1]? I bet the performance will be better than with a HashMap (though we won't be able to tell the difference unless you have a really old machine or gazillions of entries).
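
The list-of-sets idea could look roughly like the sketch below. The class name LengthBuckets and its method names are made up for illustration, and '*' is assumed to stand for exactly one unknown letter, as in the "a*d**k" example from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper: buckets words by length in a plain list, no HashMap needed.
public class LengthBuckets {

    // buckets.get(n - 1) holds every word of length n
    private final List<Set<String>> buckets = new ArrayList<Set<String>>();

    public void addWord(final String word) {
        final String lower = word.toLowerCase();
        if (lower.isEmpty()) {
            return; // there is no bucket for the empty string
        }
        // grow the list until a bucket for this length exists
        while (buckets.size() < lower.length()) {
            buckets.add(new TreeSet<String>());
        }
        buckets.get(lower.length() - 1).add(lower);
    }

    public Set<String> wordsOfLength(final int length) {
        if (length < 1 || length > buckets.size()) {
            return Collections.<String>emptySet();
        }
        return buckets.get(length - 1);
    }

    // Assumption: '*' matches exactly one character, so it maps to the regex '.'
    public Set<String> findByPattern(final String patternString) {
        final Pattern pattern =
            Pattern.compile(patternString.toLowerCase().replace('*', '.'));
        final Set<String> results = new TreeSet<String>();
        // only words with the same length as the pattern can match
        for (final String word : wordsOfLength(patternString.length())) {
            if (pattern.matcher(word).matches()) {
                results.add(word);
            }
        }
        return results;
    }
}
```

The lookup by length is a plain index operation here, so no hash code is ever computed. The pattern match itself is still a linear scan over one bucket.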

I'm not saying my design with a List or Array is nicer, but you are using a data structure for a purpose it wasn't intended for.

Seriously: how about writing all your words to a flat file (one word per line, sorted by word length and then alphabetically) and just running the regex query on that file? Stream the file and search the individual lines if it's too large, or read it as a String and keep that in memory if IO is too slow.
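
Both variants of the flat-file idea might be sketched as follows. FlatFileSearch and its method names are hypothetical; the sketch assumes the file holds lowercase words, one per line, and that '*' stands for exactly one unknown letter. The early exit in the streaming variant only works if the file really is sorted by word length.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for searching a word list kept as plain text.
public class FlatFileSearch {

    // In-memory variant: 'text' is the whole file as one String.
    public static List<String> find(final String text, final String patternString) {
        // MULTILINE makes ^ and $ match at line boundaries; '.' never matches
        // a line terminator, so a match cannot span two words.
        final Pattern pattern = Pattern.compile(
            "^" + patternString.toLowerCase().replace('*', '.') + "$",
            Pattern.MULTILINE);
        final List<String> results = new ArrayList<String>();
        final Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    // Streaming variant: reads one word per line and never holds the whole file.
    public static List<String> findInStream(final BufferedReader reader,
                                            final String patternString) throws IOException {
        final Pattern pattern =
            Pattern.compile(patternString.toLowerCase().replace('*', '.'));
        final List<String> results = new ArrayList<String>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.length() > patternString.length()) {
                break; // assumes lines are sorted by length: nothing later can match
            }
            if (pattern.matcher(line).matches()) {
                results.add(line);
            }
        }
        return results;
    }
}
```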

Or how about just using a TreeSet with a custom Comparator?

Sample code:

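import java.util.Comparator;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class PatternSearch{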

    enum StringComparator implements Comparator<String>{
        LENGTH_THEN_ALPHA{

            @Override
            public int compare(final String first, final String second){

                // compare lengths
                int result = Integer.compare(first.length(), second.length());
                // and if they are the same, compare contents
                if(result == 0){
                    result = first.compareTo(second);
                }

                return result;
            }
        }
    }

    private final SortedSet<String> data =
        new TreeSet<String>(StringComparator.LENGTH_THEN_ALPHA);

    public boolean addWord(final String word){
        return data.add(word.toLowerCase());
    }

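    public Set<String> findByPattern(final String patternString){
        // '*' stands for exactly one unknown letter, i.e. the regex '.'
        final Pattern pattern =
            Pattern.compile(patternString.toLowerCase().replace('*', '.'));
        final Set<String> results = new TreeSet<String>();
        // Only words of the same length can match. Under LENGTH_THEN_ALPHA a
        // run of n NUL characters sorts before every other word of length n,
        // so these two bounds enclose the whole length-n bucket.
        final int length = patternString.length();
        for(final String word : data.subSet(
            // this should probably be optimized :-)
            new String(new char[length]),
            new String(new char[length + 1]))){
            if(pattern.matcher(word).matches()){
                results.add(word);
            }
        }
        return results;
    }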

}
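
One possible way to act on the "this should probably be optimized" comment is to let the fixed letters of the pattern narrow the range that gets scanned. The sketch below is a condensed, standalone variant with a small driver, not a drop-in patch for the class above. PatternSearchDemo and find are made-up names, and it assumes Java 8 or later (lambda, diamond operator) and a NavigableSet so that both bounds can be inclusive.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical standalone demo of the length-then-alphabetical TreeSet search.
public class PatternSearchDemo {

    // shorter words first, ties broken alphabetically
    public static final Comparator<String> LENGTH_THEN_ALPHA = (a, b) -> {
        final int byLength = Integer.compare(a.length(), b.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    };

    // 'words' must have been created with LENGTH_THEN_ALPHA as its comparator
    public static Set<String> find(final NavigableSet<String> words,
                                   final String patternString) {
        final String p = patternString.toLowerCase();
        final Pattern regex = Pattern.compile(p.replace('*', '.'));
        // Smallest and largest strings that agree with the fixed letters:
        // every wildcard becomes the lowest or the highest possible char.
        final String lo = p.replace('*', Character.MIN_VALUE);
        final String hi = p.replace('*', Character.MAX_VALUE);
        final Set<String> results = new TreeSet<>();
        for (final String word : words.subSet(lo, true, hi, true)) {
            // the range is only a coarse filter, the regex decides
            if (regex.matcher(word).matches()) {
                results.add(word);
            }
        }
        return results;
    }

    public static void main(final String[] args) {
        final NavigableSet<String> words = new TreeSet<>(LENGTH_THEN_ALPHA);
        for (final String w : new String[] {"cat", "cot", "cut", "dog", "cart", "at"}) {
            words.add(w);
        }
        System.out.println(find(words, "c*t")); // prints [cat, cot, cut]
    }
}
```

The narrowing only helps when the pattern starts with fixed letters. A pattern that begins with a wildcard still scans the whole bucket for that length.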
