Complexity of using binary search and a trie
Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines whether x is in the list. Preprocessing is OK, since I will be calling this function many times with different inputs.
Priorities: 1. speed, 2. memory.
I already know I can use one of the following (n is the number of words, m is the average word length):
1. A trie: time O(log n); space O(log(nm)) in the best case and O(nm) in the worst case.
2. Load the complete list into memory, then binary search: time O(log n), space O(nm).
I'm not sure about the trie complexities; please correct me if they are wrong. Also, are there other good approaches?
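Approach 2 can be sketched in Python with the standard `bisect` module. This is a minimal sketch, not the asker's code. It makes three assumptions:

- The file has one word per line.
- The file is sorted in the same order Python compares strings (by code point, so uppercase sorts before lowercase). If "alphabetical" means something else for your file, sort the list once after loading.
- The function names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left
from collections.abc import Iterable


def load_sorted_words(lines: Iterable[str]) -> list[str]:
    """Read one word per line, skipping blank lines; assumes the input is already sorted."""
    return [line.strip() for line in lines if line.strip()]


def contains(sorted_words: list[str], x: str) -> bool:
    """Binary search for x in a list sorted in Python's string order."""
    # bisect_left performs O(log n) comparisons; each string comparison can
    # read up to m characters, so one lookup costs O(m log n) character steps.
    i = bisect_left(sorted_words, x)
    return i < len(sorted_words) and sorted_words[i] == x
```

To load the list once, use for example `with open("words.txt", encoding="utf-8") as f: words = load_sorted_words(f)` (the file name is a placeholder). Each query is then a call to `contains(words, x)`.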
Comments (5)
It is O(m) time for the trie, and up to O(m log n) for the binary search, since each of the O(log n) comparisons may read up to m characters. The space is asymptotically O(nm) for any reasonable method, which you can probably reduce in some cases using compression. In theory, the trie is somewhat better on memory because shared prefixes are stored only once. In practice, the devil is in the implementation details: the memory needed to store pointers, and potentially poor cache locality.
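Here is a minimal Python sketch of a trie lookup, for illustration only; the class and method names are not from the answer. Each node maps the next character to a child node, so `contains` takes one step per character of the query. That is the O(m) bound, independent of n. The per-node dictionaries are also a concrete example of the pointer overhead mentioned above.

```python
from __future__ import annotations

from collections.abc import Iterable


class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}  # one entry per possible next character
        self.is_word = False  # True if some word ends exactly at this node


class Trie:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                # Shared prefixes reuse existing nodes; only new suffixes allocate.
                child = node.children[ch] = TrieNode()
            node = child
        node.is_word = True

    def contains(self, word: str) -> bool:
        # One dict lookup per character: O(m) for a word of length m,
        # no matter how many words (n) the trie holds.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word
```

Build the trie once with `Trie(words)` and call `contains` for each query. A more compact node layout than one Python dict per node would reduce the memory overhead.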
There are other options for implementing a set structure: hash sets and tree sets are easy choices in most languages. I'd go for the hash set, as it is efficient and simple.
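In Python, the built-in `set`/`frozenset` is a hash set, so the suggested approach is short. This is a minimal sketch, assuming one word per line; the function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def build_word_set(lines: Iterable[str]) -> frozenset[str]:
    """Preprocess once: one word per line, blank lines skipped."""
    return frozenset(line.strip() for line in lines if line.strip())


def contains(words: frozenset[str], x: str) -> bool:
    # Average-case O(1) hash-table probes; hashing x itself still reads its m characters.
    return x in words
```

Preprocessing reads the file once, for example `with open("words.txt", encoding="utf-8") as f: words = build_word_set(f)` (the file name is a placeholder). The input does not need to be sorted.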
I think a HashMap is perfectly fine for your case, since both put and get run in expected O(1) time (plus O(m) to hash a word of length m). It also works even if your list isn't sorted.
As food for thought, have you considered building a set from the input data and then looking words up by hash? Building the set takes extra time the first time. However, if the word list is fixed and you will query it repeatedly, a set may be a good idea: with a good hash function, the "contains" operation takes expected O(1) time.
I'd recommend a hash map. Both VC and GCC provide one as a C++ extension, and since C++11 the standard library includes std::unordered_map and std::unordered_set.
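Use a Bloom filter. It is space-efficient even for very large data sets, and it rejects absent words quickly. Because it can return false positives (but never false negatives), a "maybe present" answer still has to be confirmed against the full list.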
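Here is a minimal Python sketch of the fast-rejection idea. The following are illustrative assumptions, not part of the answer:

- the class and function names;
- the default 1% target false-positive rate (`fp_rate` must be between 0 and 1);
- SHA-256 with double hashing to derive the k bit positions.

The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash positions. Only words the filter lets through are checked against the exact sorted list. Here that list is in memory for simplicity; it could instead stay on disk so that only the filter occupies RAM.

```python
from __future__ import annotations

import hashlib
import math
from bisect import bisect_left
from collections.abc import Iterator


class BloomFilter:
    """Bit array of m_bits bits; each word sets/checks k bit positions."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 positions.
        self.m_bits = max(8, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m_bits / n * math.log(2)))
        self.bits = bytearray((self.m_bits + 7) // 8)

    def _positions(self, word: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m_bits

    def add(self, word: str) -> None:
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, word: str) -> bool:
        # False means definitely absent; True only means "probably present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(word))


def lookup(bloom: BloomFilter, sorted_words: list[str], x: str) -> bool:
    if not bloom.might_contain(x):
        return False  # fast rejection: a Bloom filter has no false negatives
    # Confirm with an exact binary search, since the filter can give false positives.
    i = bisect_left(sorted_words, x)
    return i < len(sorted_words) and sorted_words[i] == x
```

Build the filter once during preprocessing by calling `add` for every word. Most queries for absent words then cost only k bit checks.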