最长公共子序列
考虑 2 个序列 X[1..m] 和 Y[1..n]。记忆算法将在 O(m*n) 时间内计算 LCS。有没有更好的算法来找出 LCS wrt 时间?我猜对角线进行记忆可以给我们带来 O(min(m,n)) 时间复杂度。
Consider 2 sequences X[1..m] and Y[1..n]. The memoization algorithm would compute the LCS in time O(m*n). Is there any better algorithm to find out LCS wrt time? I guess memoization done diagonally can give us O(min(m,n)) time complexity.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
Gene Myers 在 1986 年为此提出了一个非常好的算法,如下所述:一种 O(ND) 差分算法及其变体。
该算法所花费的时间与序列之间的编辑距离成正比,因此当差异较小时,速度要快得多。它的工作原理是从 0 开始循环遍历所有可能的编辑距离,直到找到可以构建编辑脚本(在某些方面是 LCS 的对偶)的距离。这意味着,如果差异超过某个阈值,您可以“提前退出”,这有时很方便。
我相信这个算法仍然在许多
diff
实现中使用。Gene Myers in 1986 came up with a very nice algorithm for this, described here: An O(ND) Difference Algorithm and Its Variations.
This algorithm takes time proportional to the edit distance between sequences, so it is much faster when the difference is small. It works by looping over all possible edit distances, starting from 0, until it finds a distance for which an edit script (in some ways the dual of an LCS) can be constructed. This means that you can "bail out early" if the difference grows above some threshold, which is sometimes convenient.
I believe this algorithm is still used in many
diff
implementations.如果您事先知道您关心的最大大小k的上限,则可以通过在内循环中添加额外的检查来强制 LCS 算法提前退出。这意味着当k <<时min(m,n) 尽管您正在进行 LCS,但您可以获得较短的运行时间。
If you know a priori an upper bound on the maximum size k you care about, you can force the LCS algorithm to exit early by adding an extra check in the inner loop. This means then when k << min(m,n) you can get small running times in spite of the fact you are doing LCS.
是的,我们可以创建一个比 Order O(m*n) 更好的算法——
即 O(min(m,n))。找到一个长度......
只需比较对角线元素。每当增量完成时,假设它发生在 c[2,2] 中,然后将 c[2,2++] 和 c[2++,2] 中的所有值增加 1..
并继续直到 c[m,m]..(假设 m
yes we could create a better algorithm than Order O(m*n)---
i.e O(min(m,n)). to find a length.....
just compare the diagonal elements.and whenever the increment is done suppose it occured in c[2,2] then increment all the value from c[2,2++] and c[2++,2] by 1..
and proceed till c[m,m]..(suppose m