trie 使用哪个节点数据结构
我是第一次使用 trie。我想知道哪种数据结构最适合用于 trie,同时决定应该遍历的下一个分支。我正在寻找一个数组、一个哈希图和一个链表。
I am using trie for the first time.I wanted to know which is the best data structure to use for a trie while deciding which is the next branch that one is supposed to traverse. I was looking among an array,a hashmap and a linked list.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这些选项中的每一个都有其优点和缺点。
如果将子节点存储在数组中,则只需在数组中建立索引即可极其高效地查找要访问的子节点。但是,每个节点的空间使用量会很高:O(|Σ|),其中 Σ 是可以构成单词的字母集,即使这些子节点中的大多数都为空。
如果将子节点存储在链表中,则查找子节点所需的时间将为 O(|Σ|),因为您可能需要扫描链表的所有节点才能找到所需的子节点。另一方面,空间效率会非常好,因为您只存储正在使用的子项。您还可以考虑在这里使用固定大小的数组,它具有更好的空间利用率,但会导致非常昂贵的插入和删除操作。
如果将子节点存储在哈希表中,那么查找子节点的(预期)时间将为 O(1),并且内存使用量将仅与您拥有的子节点数量成正比(大致)。有趣的是,因为您事先知道要散列的值,所以您可以考虑使用 动态完美哈希表来确保最坏情况下的 O(1) 查找,但代价是一些预计算。
另一种选择是将子节点存储在二叉搜索树中。这就产生了三元搜索树数据结构。这种选择介于链表和哈希表选项之间 - 空间使用率较低,您可以有效地执行前驱和后继查询,但由于 BST 中的搜索成本,执行查找的成本略有增加。如果您有一个永远不会发生插入的静态特里树,则可以考虑使用权重平衡树 作为每个点的 BST;这为搜索提供了出色的运行时间(O(n + log k),其中 n 是要搜索的字符串的长度,k 是 trie 中的单词总数)。
简而言之,数组查找速度最快,但在最坏情况下其空间使用率也是最差的。静态大小的数组具有最佳的内存使用率,但插入和删除的成本很高。哈希表具有相当快的查找速度和良好的内存使用率(平均而言)。二叉搜索树位于中间的某个位置。我建议在这里使用哈希表,但如果您重视空间并且不关心查找时间,则链接列表可能会更好。另外,如果你的字母表很小(比如,你正在制作一个二进制特里树),那么数组开销不会太糟糕,你可能想使用它。
希望这有帮助!
Each of these options has their advantages and disadvantages.
If you store the child nodes in an array, then you can look up which child to visit extremely efficiently by just indexing into the array. However, the space usage per node will be high: O(|Σ|), where Σ is the set of letters that your words can be formed from, even if most of those children are null.
If you store the child nodes in a linked list, then the time required to find a child will be O(|Σ|), since you may need to scan across all of the nodes of the linked list to find the child you want. On the other hand, the space efficiency will be quite good, because you only store the children that you're using. You could also consider using a fixed-sized array here, which has even better space usage but leads to very expensive insertions and deletions.
If you store the child nodes in a hash table, then the (expected) time to find a child will be O(1) and the memory usage will only be proportional (roughly) to the number of children you have. Interestingly, because you know in advance what values you're going to be hashing, you could consider using a dynamic perfect hash table to ensure worst-case O(1) lookups, at the expense of some precomputation.
Another option would be to store the child nodes in a binary search tree. This gives rise to the ternary search tree data structure. This choice is somewhere between the linked list and hash table options - the space usage is low and you can perform predecessor and successor queries efficiently, but there's a slight increase in the cost of performing a lookup due to the search cost in the BST. If you have a static trie where insertions never occur, you can consider using weight-balanced trees as the BSTs at each point; this gives excellent runtime for searches (O(n + log k), where n is the length of the string to search for and k is the total number of words in the trie).
In short, the array lookups are fastest but its space usage in the worst case is the worst. A statically-sized array has the best memory usage but expensive insertions and deletions. The hash table has decently fast lookups and good memory usage (on average). Binary search trees are somewhere in the middle. I would suggest using the hash table here, though if you put a premium on space and don't care about lookup times the linked list might be better. Also, if your alphabet is small (say, you're making a binary trie), the array overhead won't be too bad and you may want to use that.
Hope this helps!
如果您尝试仅为字母表构建特里树,我建议使用数组,然后使用particia树(空间优化特里树)。
http://en.wikipedia.org/wiki/Radix_tree
这将允许您进行快速查找与数组一起使用,如果某个节点的分支因子较低,则不会浪费太多空间。
If you are trying to build trie just for alphabets, I would suggest to use array and then use particia tree (space optimized trie).
http://en.wikipedia.org/wiki/Radix_tree
This will allow you to do fast lookup with array and doesn't waste too much of space if branching factor of certain node is low.