多个序列的最长公共子序列

发布于 2024-11-02 16:36:30 字数 353 浏览 5 评论 0原文

我已经做了很多研究来寻找 M = 2 序列的最长序列,但我试图弄清楚如何为 M ≥ 2 序列做到这一点,

我得到了 N 和 M: M 序列,具有 N 个独特元素。 N 是 {1 - N} 的集合。我已经考虑过动态编程方法,但我仍然对如何实际合并它感到困惑。

示例输入

5 3
5 3 4 1 2
2 5 4 3 1
5 2 3 1 4

这里的最大序列可以看作

5 3 1

预期输出

Length = 3

I have done a bunch of research for finding the longest for M = 2 sequences, but I am trying to figure out how to do it for M ≥ 2 sequences

I am being given N and M: M sequences, with N unique elements. N is the set of {1 - N}. I have thought about the dynamic programming approach, but I am still confused as to how to actually incorporate it.

Example input

5 3
5 3 4 1 2
2 5 4 3 1
5 2 3 1 4

The max sequence here can be seen to be

5 3 1

Expected output

Length = 3

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

浴红衣 2024-11-09 16:36:30

一个简单的想法。

对于 1N 之间的每个数字 i,计算最后一个数字为 i 的最长子序列。 (我们称之为a[i]

为此,我们将从头到尾迭代第一个序列中的数字i。如果a[i]> 1,然后是数字 j,在每个序列中它都位于 i 之前。
现在我们可以检查 j 的所有可能值,并且(如果前面的条件成立)执行 a[i] = max(a[i], a[j] + 1)

作为最后一位,因为在第一个序列中 j 出现在 i 之前,这意味着 a[j] 已经计算出来了。

for each i in first_sequence
    // for the OP's example, 'i' would take values [5, 3, 4, 1, 2], in this order
    a[i] = 1;
    for each j in 1..N
        if j is before i in each sequence
            a[i] = max(a[i], a[j] + 1)
        end
    end
end

如果你事先计算位置矩阵,它的复杂度是O(N^2*M)

A simple idea.

For each number i between 1 and N, calculate the longest subsequence where the last number is i. (Let's call it a[i])

To do that, we'll iterate over numbers i in the first sequence from start to end. If a[i] > 1, then there's number j such that in each sequence it comes before i.
Now we can just check all possible values of j and (if previous condition holds) do a[i] = max(a[i], a[j] + 1).

As the last bit, because j comes before i in first sequence, it means a[j] is already calculated.

for each i in first_sequence
    // for the OP's example, 'i' would take values [5, 3, 4, 1, 2], in this order
    a[i] = 1;
    for each j in 1..N
        if j is before i in each sequence
            a[i] = max(a[i], a[j] + 1)
        end
    end
end

It's O(N^2*M), if you calculate matrix of positions beforehand.

浊酒尽余欢 2024-11-09 16:36:30

由于您有独特的元素,@Nikita Rybak 的答案是可以使用的答案,但既然您提到了动态编程,那么当您有两个以上序列时,以下是使用 DP 的方法:

dp[i, j, k] = length of longest common subsequence considering the prefixes
              a[1..i], b[1..j], c[1..k].


dp[i, j, k] = 1 + dp[i - 1, j - 1, k - 1] if a[i] = b[j] = c[k]
            = max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise

要获取实际的子序列,请使用递归函数从 dp[a.Length, b.Length, c.Length] 开始,基本上反转了上面的公式:如果三个元素相等,则回溯到 dp[a.Length - 1 , b.Length - 1, c.Length - 1] 并打印该字符。如果不是,则按照上述值中的最大值回溯。

Since you have unique elements, @Nikita Rybak's answer is the one to go with, but since you mentioned dynamic programming, here's how you'd use DP when you have more than two sequences:

dp[i, j, k] = length of longest common subsequence considering the prefixes
              a[1..i], b[1..j], c[1..k].


dp[i, j, k] = 1 + dp[i - 1, j - 1, k - 1] if a[i] = b[j] = c[k]
            = max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise

To get the actual subsequence back, use a recursive function that starts from dp[a.Length, b.Length, c.Length] and basically reverses the above formulas: if the three elements are equal, backtrack to dp[a.Length - 1, b.Length - 1, c.Length - 1] and print the character. If not, backtrack according to the max of the above values.

真心难拥有 2024-11-09 16:36:30

您可以查看“设计用于查找常见 DNA 子序列的新确定性算法”论文。您可以使用此算法构建 DAG(第 8 页,图 5)。从 DAG 中读取最大的公共不同子序列。然后尝试使用动态规划方法,使用 M 的值来决定每个序列需要构建多少个 DAG。基本上使用这些子序列作为键,并将相应的序列号存储在找到的地方,然后尝试找到最大的子序列(可以大于1)。

You can look into "Design of a new Deterministic Algorithm for finding Common DNA Subsequence" paper. You can use this algorithm to construct the DAG (pg 8, figure 5). From the DAG, read the largest common distinct subsequences. Then try a dynamic programming approach on that using the value of M to decide how many DAGs you need to construct per sequence. Basically use these subsequences as key and store the corresponding sequence numbers where it is found and then try to find the largest subsequence (which can be more than 1).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文