How do I choose a hashcode for words stored in a dictionary?
I am preparing for an interview and came across this question:
Suppose I have 1,000,000 words and want to build a dictionary. The data structures I can use are a Map or a B+ tree.
On what criteria should I write my hashcode() so that retrieval is fast?
I would welcome everybody's views.
Comments (2)
I would use neither, and would store the dictionary as a Patricia trie instead.
It also uses less memory, since a prefix shared by many strings is stored once rather than separately for each string.
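To make the prefix-sharing idea concrete, here is a minimal sketch of a compressed (radix) trie. It is a simplification and not a full Patricia trie: edges are labeled with character strings rather than bit positions, and the names `Node`, `insert` and `contains` are made up for illustration.

```python
class Node:
    """One trie node. Each outgoing edge carries a string label."""

    def __init__(self):
        # first character of the edge label -> (edge label, child node)
        self.children = {}
        self.is_word = False


def _common_prefix_len(a, b):
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def insert(root, word):
    """Insert word, splitting an edge when only part of its label matches."""
    node = root
    while True:
        if not word:
            node.is_word = True
            return
        entry = node.children.get(word[0])
        if entry is None:
            # No edge starts with this character: hang the rest as one edge.
            leaf = Node()
            leaf.is_word = True
            node.children[word[0]] = (word, leaf)
            return
        label, child = entry
        k = _common_prefix_len(label, word)
        if k < len(label):
            # Partial match: split the edge so the shared prefix is stored once.
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            node.children[word[0]] = (label[:k], mid)
            child = mid
        node = child
        word = word[k:]


def contains(root, word):
    """Return True if word was inserted as a whole word."""
    node = root
    while word:
        entry = node.children.get(word[0])
        if entry is None:
            return False
        label, child = entry
        if not word.startswith(label):
            return False
        word = word[len(label):]
        node = child
    return node.is_word
```

After inserting "test" and "team", the shared prefix "te" lives on a single edge and only "st" and "am" are stored below it. Lookup cost depends on the length of the word, not on how many words are stored, and no hashcode is needed.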
In the "old days" (the 1980s) we tended to use B* (or B*+) trees and were very picky about hitting the disk, but nowadays 1,000,000 keys fit in memory with room to spare, so stick them in a dict and be done with it.
And tell your interviewer this: memory is close to free compared to the cost of developers. The time you spend trying to be clever here will never be recovered in efficiency by anything you can come up with. If they don't understand why that's true, then ... eh.
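As a rough sketch of the "stick it in a dict" approach in Python: the built-in dict hashes string keys itself, so there is no hashcode to write by hand. The helpers `synthetic_entries` and `build_dictionary` and the generated words are placeholders for illustration, standing in for a real word list.

```python
def synthetic_entries(count):
    """Yield (word, definition) pairs; stand-in for a real word list."""
    for i in range(count):
        yield f"word{i}", f"definition of word{i}"


def build_dictionary(entries):
    """Map each word to its definition using the built-in dict."""
    return {word: definition for word, definition in entries}


if __name__ == "__main__":
    # One million keys is the size mentioned in the question.
    dictionary = build_dictionary(synthetic_entries(1_000_000))
    print(len(dictionary))
    print(dictionary.get("word123456"))
    print(dictionary.get("not-a-word"))  # missing keys give None
```

Lookups with `dictionary.get(word)` or `word in dictionary` take roughly constant time on average, whatever the number of entries.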