使用优化的 Levenshtein 算法查找最近邻居
我最近发布了一个关于优化计算 Levenshtein 距离的算法的问题 ,这些回复将我引向维基百科关于 Levenshtein Distance 的文章。
文章提到,如果给定查询可能结果的最大距离有一个限制 k,那么运行时间可以从 O(mn) 减少O(kn)、m 和 n 是字符串的长度。我查阅了算法,但我无法真正弄清楚如何实现它。我希望在这里能得到一些线索。
优化是“可能的改进”下的#4。
让我困惑的部分是我们只需要计算宽度 2k+1 的对角线条纹,以主对角线为中心(主对角线定义为坐标 (i,i) )。
如果有人可以提供一些帮助/见解,我将非常感激。如果需要的话,我可以在这里发布书中算法的完整描述作为答案。
I recently posted a question about optimizing the algorithm to compute the Levenshtein Distance, and the replies lead me to the Wikipedia article on Levenshtein Distance.
The article mentioned that if there is a bound k on the maximum distance a possible result can be from the given query, then the running time can be reduced from O(mn) to O(kn), m and n being the lengths of the strings. I looked up the algorithm, but I couldn't really figure out how to implement it. I was hoping to get some leads on that here.
The optimization is #4 under "Possible Improvements".
The part that confused me was the one that said that we only need to compute a diagonal stripe of width 2k+1, centered on the main diagonal (the main diagonal is defined as coordinates (i,i)).
If someone could offer some help/insight, I would really appreciate it. If needed, I can post the complete description of the algorithm in the book as an answer here.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我已经做过很多次了。我的方法是对可能变化的游戏树进行递归深度优先树遍历。我有一个 k 项更改预算,用于修剪树。有了这个例程,我首先使用 k=0 运行它,然后 k=1,然后 k=2,直到我得到命中或者我不想再更高。
添加用于解释 trie 搜索:
现在,为了将其限制在预算范围内,如果 hdis 太大,则拒绝下降。
I've done it a number of times. The way I do it is with a recursive depth-first tree-walk of the game tree of possible changes. There is a budget k of changes, that I use to prune the tree. With that routine in hand, first I run it with k=0, then k=1, then k=2 until I either get a hit or I don't want to go any higher.
Added to explain trie-search:
Now, to limit it to a budget, just refuse to descend if hdis is too large.