有人可以解释何时以及如何扩展后缀树吗?
我正在编写一个 php 脚本,它必须找到最长的重复子字符串。我发现了这个后缀树的东西。我正在尝试实现 Ukkonnen 的算法,但我不知道何时以及如何扩展树。
如果我有不在树中的新字符也没关系,但我必须从根为其创建一个新节点和egde。但我该如何知道是否必须分割一条边呢?
我找到了它的 C++ 实现(链接),我尝试翻译它到 php 但我想我在里面有一个打字,因为它给出了几乎很好的结果,问题是我无法修复它,直到我不完全理解它......
我读了很多描述后缀树,但其中一些并没有深入其中,另一些在第二句话后让我头痛。
这是我现在拥有的代码: Suffix-tree.php (抱歉,但这个编辑器无法接受)我使用这个网站来检查结果。
因此,任何建议将不胜感激...
编辑:我从上述网站上找到的 JavaScript 内容重写了它。以下是来源链接:Suffix-Tree v0.1
I'm working on a php script which has to find the longest repeated substring. I found this Suffix-Tree thing. I'm trying to implement Ukkonnen's algorithm, but I can't get when and how to extend the tree.
It's okay if i have new charachter which is not in the tree yet I have to create a new node and egde from the root for it. But how should I know if I have to split up an edge?
I found a C++ implementation of it (link) and I tried to translate it to php but I think I've got a typeo in it because it gives an almost good result, the problem is I can't fix it until I do not understand it fully...
I read a dozen of descriptions of Suffix-Trees but some of them do not go too deep in it, others give me headache after the second sentece.
Here is the code what I have now: Suffix-tree.php (Sorry but this editor couldn't take it) I used this site to check the result.
So any advise would be appreciated...
EDIT: I rewrote it from the JavaScript stuff found on the mentioned site. Here is the link to the source: Suffix-Tree v0.1
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
数据压缩专家 Matt Mahoney 给出了很好的解释。但我也不明白其实现,这相当困难。仅供参考,我已经成功运行了 suffix-tree php 扩展。如果有帮助的话,你可以在 sourceforge 上找到我的代码。我很想看看你的最终代码!
A good explanation is given by Matt Mahoney, a data compression expert. But I didn't understand the implementation too, it's quite difficult. FYI I've managed to run a suffix-tree php extension. You can find my code at sourceforge if it helps. I would love to see your final code though!