Searching a HashMap
Hi, I am populating a HashMap from a dictionary.txt file, and I am splitting the words into sets keyed by word length.
I'm having trouble searching the HashMap for a pattern such as "a*d**k".
Can anyone help me?
I need to know how to search a HashMap.
I would really appreciate it if you could help me.
Thank you.
Comments (1)
A HashMap is simply the wrong data structure for a pattern search. You should look into technologies that feature pattern searching out of the box, like Lucene.
And in answer to this comment:
HashMaps are awfully fast, that's true, but only if you use them as intended. In your scenario, hash codes are not important, as you know that all keys are numeric and you probably won't have any word that's longer than, say, 30 letters. So why not just use an array or ArrayList of Sets instead of a HashMap and replace map.get(string.length()) with list.get(string.length()-1) or array[string.length()-1]? I bet the performance will be better than with a HashMap (but we won't be able to tell the difference unless you have a really old machine or gazillions of entries). I'm not saying my design with a List or array is nicer, but you are using a data structure for a purpose it wasn't intended for.
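A minimal sketch of the List-of-Sets idea, under two assumptions that are not confirmed in the question: that '*' in a pattern stands for exactly one arbitrary character, and that no word is longer than 30 letters. The class name WordLengthIndex and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: buckets words by length in a List instead of a HashMap.
class WordLengthIndex {
    // Assumption: no word is longer than 30 letters.
    private static final int MAX_LENGTH = 30;

    // Index i holds all words of length i + 1.
    private final List<Set<String>> byLength = new ArrayList<>();

    WordLengthIndex() {
        for (int i = 0; i < MAX_LENGTH; i++) {
            byLength.add(new TreeSet<>());
        }
    }

    void add(String word) {
        if (word.isEmpty() || word.length() > MAX_LENGTH) {
            return; // outside the assumed range; ignored in this sketch
        }
        byLength.get(word.length() - 1).add(word);
    }

    // Assumption: '*' matches exactly one arbitrary character.
    List<String> search(String pattern) {
        List<String> hits = new ArrayList<>();
        if (pattern.isEmpty() || pattern.length() > MAX_LENGTH) {
            return hits;
        }
        // Only words of the same length as the pattern can match.
        for (String word : byLength.get(pattern.length() - 1)) {
            if (matches(word, pattern)) {
                hits.add(word);
            }
        }
        return hits;
    }

    // Compares position by position; word and pattern have equal length here.
    private static boolean matches(String word, String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '*' && p != word.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        WordLengthIndex index = new WordLengthIndex();
        for (String w : new String[] {"cat", "cot", "cut", "cart", "dog"}) {
            index.add(w);
        }
        System.out.println(index.search("c*t"));
    }
}
```

The lookup by length narrows the candidates to a single bucket, but the bucket itself is still scanned linearly. That is the part a HashMap cannot help with either.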
Seriously: how about writing all your words to a flat file (one word per line, sorted by word length and then alphabetically) and just running the regex query on that file? Stream the file and search the individual lines if it's too large, or read it as a String and keep that in memory if IO is too slow.
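The streaming variant could be sketched roughly as follows. It again assumes that '*' means exactly one character, so the pattern is translated into a regex where each '*' becomes '.'. FlatFileSearch, toRegex and search are hypothetical names, and the demo reads from a StringReader instead of a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: scans a one-word-per-line source with a regex.
class FlatFileSearch {
    // Assumption: '*' matches exactly one character, so it becomes '.'.
    // Every other character is quoted so it is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            sb.append(c == '*' ? "." : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Streams the source line by line and keeps the lines that match fully.
    static List<String> search(Reader source, String wildcard) {
        Pattern regex = toRegex(wildcard);
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .filter(line -> regex.matcher(line).matches())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the word file: one word per line.
        Reader words = new StringReader("cat\ncot\ndog\ncart\n");
        System.out.println(search(words, "c*t"));
    }
}
```

For a real file, the Reader could come from Files.newBufferedReader(Path.of("dictionary.txt")). Because the file is sorted by length, a refinement would be to stop reading once the lines become longer than the pattern; the sketch leaves that out.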
Or how about just using a TreeSet with a custom Comparator? Sample code:
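One way this idea could look, as a minimal sketch rather than the answer's original sample: the comparator orders words by length first and then alphabetically, so all words of a given length form one contiguous range of the set. The names LengthOrderedWords and BY_LENGTH_THEN_ALPHA are hypothetical, and '*' is again assumed to match exactly one character.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper: one sorted set instead of a map of sets.
class LengthOrderedWords {
    // Orders by length first, then alphabetically within the same length.
    static final Comparator<String> BY_LENGTH_THEN_ALPHA = (a, b) -> {
        int byLength = Integer.compare(a.length(), b.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    };

    private final NavigableSet<String> words = new TreeSet<>(BY_LENGTH_THEN_ALPHA);

    void add(String word) {
        words.add(word);
    }

    // Assumption: '*' matches exactly one arbitrary character.
    List<String> search(String pattern) {
        int n = pattern.length();
        // n NUL characters form the smallest possible string of length n
        // under this ordering, so [from, to) covers every word of length n.
        String from = "\0".repeat(n);
        String to = "\0".repeat(n + 1);
        List<String> hits = new ArrayList<>();
        for (String word : words.subSet(from, true, to, false)) {
            if (matches(word, pattern)) {
                hits.add(word);
            }
        }
        return hits;
    }

    // Compares position by position; word and pattern have equal length here.
    private static boolean matches(String word, String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '*' && p != word.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LengthOrderedWords set = new LengthOrderedWords();
        for (String w : new String[] {"dog", "cart", "cut", "cat", "cot"}) {
            set.add(w);
        }
        System.out.println(set.search("c*t"));
    }
}
```

The subSet call does the job that map.get(string.length()) did in the HashMap design. String.repeat requires Java 11 or later.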