STL-like lower_bound function for a Radix/Patricia trie
Lately I've been studying Patricia tries, and working with a really good C++ implementation which can be used as an STL Sorted Associative Container. Patricia tries differ from normal binary trees because leaf nodes have back pointers which point back to internal nodes. Nonetheless, it's possible to traverse a Patricia trie in alphabetical order by doing an in-order traversal, if you only visit internal nodes through leaf-node back pointers.
Which brings me to the question: is it possible to implement the STL lower_bound and upper_bound functions with a Patricia trie? The implementation I'm using does in fact implement these functions, but they don't work as expected.
For example:
typedef uxn::patl::trie_set<std::string> trie;
trie ts;
ts.insert("LR");
ts.insert("BLQ");
ts.insert("HCDA");
trie::iterator it = ts.lower_bound("GG");
std::cout << *it << std::endl;
This outputs BLQ, when I would expect it to output HCDA. (A std::set, for example, would certainly output HCDA here.)
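For comparison, here is a minimal sketch of the same three keys and the same query against std::set. The function name expected_for_gg is only an illustrative helper, not part of any library.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: runs the query from the example above against std::set.
std::string expected_for_gg() {
    std::set<std::string> s;
    s.insert("LR");
    s.insert("BLQ");
    s.insert("HCDA");
    // lower_bound returns an iterator to the first element that is >= "GG".
    std::set<std::string>::iterator it = s.lower_bound("GG");
    assert(it != s.end()); // "HCDA" and "LR" both sort after "GG"
    return *it;            // "HCDA", since "BLQ" < "GG" < "HCDA" < "LR"
}
```

Calling expected_for_gg() returns "HCDA", which is the result I would expect from the trie as well.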
I emailed the developer who made this library, but never got a response. Regardless, I feel I have a pretty good understanding of how Patricia tries work, and I can't figure out how something like lower_bound would even be possible. The problem is that lower_bound seems to rely on the ability to lexicographically compare two strings. Since "GG" doesn't exist in the tree, we'd need to find the first element that is >= "GG". But Radix/Patricia tries don't use lexicographical comparison to move from node to node; rather, each node stores a bit index which is used to test a single bit of the search key. The result of that bit test tells you whether to move left or right. This makes it easy to find a particular prefix in the tree. But if the prefix doesn't exist in the tree (as in the case of my search for "GG"), there doesn't seem to be any way, short of a lexicographical comparison, to get the lower_bound.
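To make the bit-test idea concrete, here is a small sketch of the two primitives involved. The names bit_at, first_diff and less_by_first_diff are made up for illustration, and the code assumes keys contain no '\0' bytes, so that bytes past the end of a key can be read as zero. One thing it suggests is that the bit tests and the lexicographical order may not be as unrelated as they look: the value of the first differing bit already says which of two keys is smaller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Bit i of s, counting from the most significant bit of byte 0.
// Bytes past the end read as zero (assumes keys contain no '\0').
int bit_at(const std::string& s, std::size_t i) {
    unsigned char c = i / 8 < s.size() ? static_cast<unsigned char>(s[i / 8]) : 0;
    return (c >> (7 - i % 8)) & 1;
}

// Index of the first bit where a and b differ, or npos if they are equal.
std::size_t first_diff(const std::string& a, const std::string& b) {
    std::size_t bits = 8 * std::max(a.size(), b.size());
    for (std::size_t i = 0; i < bits; ++i)
        if (bit_at(a, i) != bit_at(b, i)) return i;
    return std::string::npos;
}

// a < b in byte-wise order exactly when a has a 0 at the first differing bit.
bool less_by_first_diff(const std::string& a, const std::string& b) {
    std::size_t d = first_diff(a, b);
    return d != std::string::npos && bit_at(a, d) == 0;
}
```

For the keys in my example, 'G' is 01000111, 'B' is 01000010 and 'H' is 01001000. "GG" and "BLQ" first differ at bit 5, where "GG" has a 1, so "GG" is greater than "BLQ". "GG" and "HCDA" first differ at bit 4, where "GG" has a 0, so "GG" is less than "HCDA".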
The fact that the C++ implementation I'm using doesn't seem to implement lower_bound properly confirms my suspicion that it may not be possible. Still, the fact that you can iterate over the tree in alphabetical order makes me think there might be a way to do it.
Does anyone have experience with this, or know whether it is possible to implement lower_bound functionality with a Patricia trie?
Comments (1)
Yes, it is possible. I have implemented a variant which does this, and D. J. Bernstein's page describes that as one of the fast operations.
http://cr.yp.to/critbit.html
In principle, you keep matching the prefix until you can't match any more, and then you go to the next value, and there's the node you're after.
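The idea above can be sketched roughly as follows. This is not the code from the linked page or from the patl library, just a minimal binary crit-bit set written for illustration. The class name CritBitSet and all member names are made up, and it assumes keys contain no '\0' bytes so that bytes past the end of a key can be read as zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Minimal crit-bit (binary Patricia) set of strings; illustrative only.
class CritBitSet {
    struct Node {
        std::size_t bit = 0;            // critical bit index (internal nodes only)
        std::unique_ptr<Node> child[2]; // both null for a leaf
        std::string key;                // used by leaves only
        bool leaf() const { return !child[0]; }
    };
    std::unique_ptr<Node> root_;

    // Bit i of s, most significant bit of byte 0 first; past the end reads as 0.
    static int bit_at(const std::string& s, std::size_t i) {
        unsigned char c = i / 8 < s.size() ? static_cast<unsigned char>(s[i / 8]) : 0;
        return (c >> (7 - i % 8)) & 1;
    }
    // Index of the first bit where a and b differ, or npos if they are equal.
    static std::size_t first_diff(const std::string& a, const std::string& b) {
        std::size_t bits = 8 * std::max(a.size(), b.size());
        for (std::size_t i = 0; i < bits; ++i)
            if (bit_at(a, i) != bit_at(b, i)) return i;
        return std::string::npos;
    }
    // Leaf reached by following key's bits; no leaf shares a longer bit prefix.
    const Node* best_match(const std::string& key) const {
        assert(root_);
        const Node* n = root_.get();
        while (!n->leaf()) n = n->child[bit_at(key, n->bit)].get();
        return n;
    }
    // Smallest key in the subtree rooted at n (n may be null).
    static const Node* leftmost(const Node* n) {
        while (n && !n->leaf()) n = n->child[0].get();
        return n;
    }

public:
    void insert(const std::string& key) {
        auto leaf = std::make_unique<Node>();
        leaf->key = key;
        if (!root_) { root_ = std::move(leaf); return; }
        std::size_t d = first_diff(key, best_match(key)->key);
        if (d == std::string::npos) return; // already present
        // Find the link where the new branch belongs (bit indices grow downward).
        std::unique_ptr<Node>* link = &root_;
        while (!(*link)->leaf() && (*link)->bit < d)
            link = &(*link)->child[bit_at(key, (*link)->bit)];
        auto branch = std::make_unique<Node>();
        branch->bit = d;
        int dir = bit_at(key, d);
        branch->child[dir] = std::move(leaf);
        branch->child[1 - dir] = std::move(*link);
        *link = std::move(branch);
    }

    // Smallest stored key >= key in byte-wise order, or nullptr if there is none.
    const std::string* lower_bound(const std::string& key) const {
        if (!root_) return nullptr;
        const Node* p = best_match(key);
        std::size_t d = first_diff(key, p->key);
        if (d == std::string::npos) return &p->key; // exact match
        // Descend again, stopping where the shared prefix ends (at bit d).
        const Node* n = root_.get();
        const Node* next = nullptr; // subtree that follows n in sorted order
        while (!n->leaf() && n->bit < d) {
            int dir = bit_at(key, n->bit);
            if (dir == 0) next = n->child[1].get();
            n = n->child[dir].get();
        }
        // Every key under n matches `key` before bit d and differs from it at bit d.
        // A 0 bit in `key` means all of n is larger; a 1 bit means all of n is smaller.
        const Node* hit = bit_at(key, d) == 0 ? leftmost(n) : leftmost(next);
        return hit ? &hit->key : nullptr;
    }
};
```

With "LR", "BLQ" and "HCDA" inserted, lower_bound("GG") follows the bit tests to "BLQ", finds that the two keys first differ at a bit where "GG" has a 1, and therefore steps to the next subtree in order, whose smallest key is "HCDA". The only comparison used is the value of that one differing bit.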