如何求指数时间内的最长公共子序列?

发布于 2024-12-28 18:18:20 字数 109 浏览 4 评论 0 原文

我可以使用动态编程以正确的方式做到这一点,但我不知道如何在指数时间内做到这一点。

我正在寻找两个字符串之间最大的公共子序列。 注意:我的意思是子序列而不是子串,组成序列的符号不必是连续的。

I can do this the proper way using dynamic programming but I can't figure out how to do it in exponential time.

I'm looking to find the largest common sub-sequence between two strings.
Note: I mean subsequences and not sub-strings the symbols that make up a sequence need not be consecutive.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

英雄似剑 2025-01-04 18:18:20

只需用递归调用替换动态编程代码中表中的查找即可。换句话说,只需实现 LCS 问题的递归公式:

在此处输入图像描述

编辑

在伪代码中(实际上几乎是 python):

def lcs(s1, s2):
 if len(s1)==0 or len(s2)==0: return 0
 if s1[0] == s2[0]: return 1 + lcs(s1[1:], s2[1:])
 return max(lcs(s1, s2[1:]), lcs(s1[1:], s2))

Just replace the lookups in the table in your dynamic programming code with recursive calls. In other words, just implement the recursive formulation of the LCS problem:

enter image description here

EDIT

In pseudocode (almost-python, actually):

def lcs(s1, s2):
 if len(s1)==0 or len(s2)==0: return 0
 if s1[0] == s2[0]: return 1 + lcs(s1[1:], s2[1:])
 return max(lcs(s1, s2[1:]), lcs(s1[1:], s2))
待天淡蓝洁白时 2025-01-04 18:18:20

假设您有两个长度为 n 的字符串 ab。最长公共子序列将是字符串 a 中最长的子序列,同时也出现在字符串 b 中。

因此,我们可以迭代 a 中所有可能的子序列,并查看它在 b 中。

一个高级伪代码是:

for i=n to 0
    for all length i subsequences s of a
        if s is a subsequence of b
            return s

Let's say you have two strings a and b of length n. The longest common subsequence is going to be the longest subsequence in string a that is also present in string b.

Thus we can iterate through all possible subsequences in a and see it is in b.

A high-level pseudocode for this would be:

for i=n to 0
    for all length i subsequences s of a
        if s is a subsequence of b
            return s
梦太阳 2025-01-04 18:18:20

字符串A和字符串B。一个递归算法,也许它很幼稚,但很简单:

看看A的第一个字母。这要么是一个共同的序列,要么不是。当考虑“not”选项时,我们修剪掉第一个字母并递归调用。当考虑“处于公共序列中”选项时,我们也会将其修剪掉,并且还会从 B 的开头一直修剪到 B 中的相同字母(包括 B 中的相同字母)。一些伪代码:

def common_subsequences(A,B, len_subsequence_so_far = 0):
    if len(A) == 0 or len(B) == 0:
        return
    first_of_A = A[0] // the first letter in A.
    A1 = A[1:] // A, but with the first letter removed
    common_subsequences(A1,B,len_subsequence_so_far) // the first recursive call
    if(the_first_letter_of_A_is_also_in_B):
        Bn = ... delete from the start of B up to, and including,
             ... the first letter which equals first_of_A
        common_subsequences(A1,Bn, 1+len_subsequence_so_far )

您可以从它开始,然后通过以下方式进行优化记住迄今为止找到的最长子序列,然后在当前函数无法击败该子序列时(即当 min(len(A), len(B))+len_subsequence_so_far 小于找到的最长子序列时提前返回)所以 远的。

String A and String B. A recursive algorithm, maybe it's naive but it is simple:

Look at the first letter of A. This will either be in a common sequence or not. When considering the 'not' option, we trim off the first letter and call recursively. When considering the 'is in a common sequence' option we also trim it off and we also trim off from the start of B up to, and including, the same letter in B. Some pseudocode:

def common_subsequences(A,B, len_subsequence_so_far = 0):
    if len(A) == 0 or len(B) == 0:
        return
    first_of_A = A[0] // the first letter in A.
    A1 = A[1:] // A, but with the first letter removed
    common_subsequences(A1,B,len_subsequence_so_far) // the first recursive call
    if(the_first_letter_of_A_is_also_in_B):
        Bn = ... delete from the start of B up to, and including,
             ... the first letter which equals first_of_A
        common_subsequences(A1,Bn, 1+len_subsequence_so_far )

You could start with that and then optimize by remembering the longest subsequence found so far, and then returning early when the current function cannot beat that (i.e. when min(len(A), len(B))+len_subsequence_so_far is smaller than the longest length found so far.

2025-01-04 18:18:20

本质上,如果您不使用动态编程范例,您就会达到指数时间。这是因为,如果不存储部分值,您将多次重新计算部分值。

Essentially if you don't use dynamic programming paradigm - you reach exponential time. This is because, by not storing your partial values - you are recomputing the partial values multiple times.

往日 2025-01-04 18:18:20

为了实现指数时间,生成两个数组的所有子序列并将每个子序列相互比较就足够了。如果匹配两个相同的长度,请检查它们的长度是否大于当前的最大值。伪代码是:

Generate all subsequences of `array1` and `array2`.
for each subsequence of `array1` as s1
    for each subsequece of `array2` as s2
        if s1 == s2 //check char by char
            if len(s1) > currentMax
                currentMax = len(s1)
for i = 0; i < 2^2; i++;

这绝对不是最佳。它甚至不尝试。然而问题是关于非常低效的算法,所以我提供了一个。

To achieve exponential time it's enough to generate all subsequences of both arrays and compare each one with each other. If you match two that are identical check if their length is larger then current maximum. The pseudocode would be:

Generate all subsequences of `array1` and `array2`.
for each subsequence of `array1` as s1
    for each subsequece of `array2` as s2
        if s1 == s2 //check char by char
            if len(s1) > currentMax
                currentMax = len(s1)
for i = 0; i < 2^2; i++;

It's absolutely not optimal. It doesn't even try. However the question is about the very inefficient algorithm so I've provided one.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文