当前位置：文江博客话题详情

最短异常子序列

发布于 2025-01-11 04:31:04 字数 166 浏览 4 评论 0原文

给定两个字符串 s 和 t，确定最短字符串 z 的长度，使得 z 是 s 的子序列而不是 t 的子序列。

例如：

s :babab, t :babba

sol : 3 (aab)

不寻找复制粘贴代码，如果有人可以帮助直觉解决这个问题。

多谢！

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浮萍、无处依 2025-01-18 04:31:04

您可以使用动态规划在二次时间内解决这个问题，就像最长公共子序列一样。我将展示该公式以及您将如何得出它。

首先，一些定义。令m为S的长度，令n为T的长度。将 DP(i, j) 定义为 S[:i] 中非 T[:j] 子序列的最短子序列的长度code> 或 'INF'（如果不存在）。这里，表达式 S[:i] 是切片符号，意思是“S 的第一个 i 个字符，因此 S[:0” ] 为空，S[:m] 是 S 的全部。我们需要DP(m, n)。

有两种简单的基本情况：由于 T[:0] 为空，因此 S 中的任何字符都可以工作，因此如果 i 则 DP(i, 0) = 1 > 0 。同样，对于所有 j，DP(0, j) = 'INF'。

现在，我们只需为 DP(i, j) 编写一个通用公式，该公式仅取决于小于 (i ，j)。 S[:i] 的最后一个字符只是某个字符 S[i-1]。 S[:i], T[:j] 的最佳子序列要么以 S[i-1] 结尾，要么不是。

如果我们的最佳子序列不是以 S[i-1] 结尾，那么我们可以从考虑中删除 S[i-1]，我们的答案是 DP（i，j）= DP（i-1，j）。

如果我们的最佳子序列确实以 S[i-1] 结尾，那么我们需要知道 S[i-1] 在 T[ 中最右边出现的位置： j]。

如果 S[i-1] 根本没有出现在 T[:j] 中，则 S[i-1] 本身就是最短子序列，因此 DP(i, j) = 1。
否则，令 Rightmost(c, j) 为 T[:j] 的最右边索引，等于某个字符 c。由于我们使用 S[i-1] 来结束最优子序列，因此我们可以忽略 T[:j] 中最右边出现 之后的所有字符S[i-1]：它们不再影响字符串是否是 T[:j] 的子序列。那么DP(i, j) = DP(i-1, Rightmost(S[i-1], j)) + 1，其中+1来自事实上，我们确实选择使用 S[i-1]。

将它们放在一起，得到 i, j > 的一般公式： 0 变为：

DP(i, j) = 1 if (S[i-1] not in T[:j]), or 

         = min(DP(i-1, j),
               DP(i-1, Rightmost(S[i-1], j)) + 1) otherwise.

由于根据定义，Rightmost(c, j) 始终小于 j，因此我们仅使用小于（按字典顺序）的索引实现了一个公式(i, j)，我们可以直接使用该公式进行递归算法。

You can use dynamic programming to solve this in quadratic time just like the longest common subsequence. I'll show the formula and how you would come up with it.

First, some definitions. Let m be the length of S, and let n be the length of T. Define DP(i, j) as the length of the shortest subsequence of S[:i] that is not a subsequence of T[:j], or 'INF' if none exists. Here, the expression S[:i] is slice notation meaning 'the first i characters of S, so S[:0] is empty and S[:m] is all of S. We want DP(m, n).

There's two easy base cases: Since T[:0] is empty, any character in S will work, so DP(i, 0) = 1 if i > 0. Similarly, DP(0, j) = 'INF' for all j.

Now, we just have to write a general formula for DP(i, j) which only depends on the value of DP() on indices smaller than (i, j). The last character of S[:i] is just some character S[i-1]. Either our optimal subsequence for S[:i], T[:j] ends with S[i-1] or it doesn't.

If our optimal subsequence doesn't end with S[i-1], then we can delete S[i-1] from consideration, and our answer is DP(i, j) = DP(i-1, j).

If our optimal subsequence does end with S[i-1], then we need to know the rightmost occurrence of S[i-1] in T[:j].

If S[i-1] does not occur in T[:j] at all, then S[i-1] by itself is a shortest subsequence, so DP(i, j) = 1.
Otherwise, let Rightmost(c, j) be the rightmost index of T[:j] equal to some character c. Since we are using S[i-1] to end our optimal subsequence, we can ignore all the characters in T[:j] after the rightmost occurrence of S[i-1]: they can no longer affect whether a string is a subsequence of T[:j]. So then DP(i, j) = DP(i-1, Rightmost(S[i-1], j)) + 1, where the +1 comes from the fact that we did choose to use S[i-1].

Putting those together, the general formula for i, j > 0 becomes:

DP(i, j) = 1 if (S[i-1] not in T[:j]), or 

         = min(DP(i-1, j),
               DP(i-1, Rightmost(S[i-1], j)) + 1) otherwise.

Since Rightmost(c, j) is always less than j by definition, we've achieved a formula using only indices smaller (lexicographically) than (i, j), and we can use that formula directly for a recursive algorithm.

回复收藏 0 原文

~没有更多了~