多个序列的最长公共子序列
我已经做了很多研究来寻找 M = 2 序列的最长序列,但我试图弄清楚如何为 M ≥ 2 序列做到这一点,
我得到了 N 和 M: M 序列,具有 N 个独特元素。 N 是 {1 - N} 的集合。我已经考虑过动态编程方法,但我仍然对如何实际合并它感到困惑。
示例输入
5 3
5 3 4 1 2
2 5 4 3 1
5 2 3 1 4
这里的最大序列可以看作
5 3 1
预期输出
Length = 3
I have done a bunch of research for finding the longest for M = 2 sequences, but I am trying to figure out how to do it for M ≥ 2 sequences
I am being given N and M: M sequences, with N unique elements. N is the set of {1 - N}. I have thought about the dynamic programming approach, but I am still confused as to how to actually incorporate it.
Example input
5 3
5 3 4 1 2
2 5 4 3 1
5 2 3 1 4
The max sequence here can be seen to be
5 3 1
Expected output
Length = 3
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
一个简单的想法。
对于
1
和N
之间的每个数字i
,计算最后一个数字为i
的最长子序列。 (我们称之为a[i]
)为此,我们将从头到尾迭代第一个序列中的数字
i
。如果a[i]> 1
,然后是数字j
,在每个序列中它都位于i
之前。现在我们可以检查
j
的所有可能值,并且(如果前面的条件成立)执行a[i] = max(a[i], a[j] + 1)
。作为最后一位,因为在第一个序列中
j
出现在i
之前,这意味着a[j]
已经计算出来了。如果你事先计算位置矩阵,它的复杂度是
O(N^2*M)
。A simple idea.
For each number
i
between1
andN
, calculate the longest subsequence where the last number isi
. (Let's call ita[i]
)To do that, we'll iterate over numbers
i
in the first sequence from start to end. Ifa[i] > 1
, then there's numberj
such that in each sequence it comes beforei
.Now we can just check all possible values of
j
and (if previous condition holds) doa[i] = max(a[i], a[j] + 1)
.As the last bit, because
j
comes beforei
in first sequence, it meansa[j]
is already calculated.It's
O(N^2*M)
, if you calculate matrix of positions beforehand.由于您有独特的元素,@Nikita Rybak 的答案是可以使用的答案,但既然您提到了动态编程,那么当您有两个以上序列时,以下是使用 DP 的方法:
要获取实际的子序列,请使用递归函数从 dp[a.Length, b.Length, c.Length] 开始,基本上反转了上面的公式:如果三个元素相等,则回溯到 dp[a.Length - 1 , b.Length - 1, c.Length - 1] 并打印该字符。如果不是,则按照上述值中的最大值回溯。
Since you have unique elements, @Nikita Rybak's answer is the one to go with, but since you mentioned dynamic programming, here's how you'd use DP when you have more than two sequences:
To get the actual subsequence back, use a recursive function that starts from
dp[a.Length, b.Length, c.Length]
and basically reverses the above formulas: if the three elements are equal, backtrack todp[a.Length - 1, b.Length - 1, c.Length - 1]
and print the character. If not, backtrack according to the max of the above values.您可以查看“设计用于查找常见 DNA 子序列的新确定性算法”论文。您可以使用此算法构建 DAG(第 8 页,图 5)。从 DAG 中读取最大的公共不同子序列。然后尝试使用动态规划方法,使用 M 的值来决定每个序列需要构建多少个 DAG。基本上使用这些子序列作为键,并将相应的序列号存储在找到的地方,然后尝试找到最大的子序列(可以大于1)。
You can look into "Design of a new Deterministic Algorithm for finding Common DNA Subsequence" paper. You can use this algorithm to construct the DAG (pg 8, figure 5). From the DAG, read the largest common distinct subsequences. Then try a dynamic programming approach on that using the value of M to decide how many DAGs you need to construct per sequence. Basically use these subsequences as key and store the corresponding sequence numbers where it is found and then try to find the largest subsequence (which can be more than 1).