用更少的内存找到最长的回文子序列

发布于 2024-11-09 15:03:14 字数 1778 浏览 0 评论 0原文

我正在尝试解决 Cormem 的算法简介第三版中的动态规划问题（第 405 页）要求以下内容：

回文是一个非空字符串一些读起来相同的字母表前进和后退。的例子回文都是长度的字符串 1、civic、racecar 和 aibohphobia （对回文的恐惧）。
给出一个高效的算法来查找最长的回文是给定输入字符串的子序列。例如，给定输入 字符，你的算法应该返回carac。

好吧，我可以通过两种方式解决它：

第一种解决方案：

字符串的最长回文子序列（LPS）就是自身及其反向的最长公共子序列。（我在解决另一个相关问题后构建了这个解决方案，该问题要求最长递增子序列一个序列）。由于它只是 LCS 变体，因此也需要 O(n²) 时间和 O(n²) 内存。

第二个解决方案：

第二个解决方案更详细一些，但也遵循通用的 LCS 模板。它来自以下递归：

lps(s[i..j]) = 
    s[i] + lps(s[i+1]..[j-1]) + s[j], if s[i] == s[j];
    max(lps(s[i+1..j]), lps(s[i..j-1])) otherwise

计算 lps 长度的伪代码如下：

compute-lps(s, n):

    // palindromes with length 1
    for i = 1 to n:
        c[i, i] = 1
    // palindromes with length up to 2
    for i = 1 to n-1:
        c[i, i+1] = (s[i] == s[i+1]) ? 2 : 1

    // palindromes with length up to j+1
    for j = 2 to n-1:
        for i = 1 to n-i:
            if s[i] == s[i+j]:
                c[i, i+j] = 2 + c[i+1, i+j-1]
            else:
                c[i, i+j] = max( c[i+1, i+j] , c[i, i+j-1] )

如果我想有效构造 lps，它仍然需要 O(n²) 时间和内存（因为我将需要表格上的所有单元格）。分析相关问题，例如LIS，可以使用类似于LCS以外的方法以更少的内存来解决（LIS可以用O(n)内存来解决），我想知道是否可以用O(n)内存来解决它，也。

LIS 通过链接候选子序列来实现此界限，但对于回文来说，这就更难了，因为这里重要的不是子序列中的前一个元素，而是第一个元素。有谁知道是否可以做到这一点，或者以前的解决方案内存是否最佳？

原文

I am trying to solve a dynamic programming problem from Cormem's Introduction to Algorithms 3rd edition (pg 405) which asks the following:

A palindrome is a nonempty string over
some alphabet that reads the same
forward and backward. Examples of
palindromes are all strings of length
1, civic, racecar, and aibohphobia
(fear of palindromes).
Give an efficient algorithm to find
the longest palindrome that is a
subsequence of a given input string.
For example, given the input
character, your algorithm should
return carac.

Well, I could solve it in two ways:

First solution:

The Longest Palindrome Subsequence (LPS) of a string is simply the Longest Common Subsequence of itself and its reverse. (I've build this solution after solving another related question which asks for the Longest Increasing Subsequence of a sequence).
Since it's simply a LCS variant, it also takes O(n²) time and O(n²) memory.

Second solution:

The second solution is a bit more elaborated, but also follows the general LCS template. It comes from the following recurrence:

lps(s[i..j]) = 
    s[i] + lps(s[i+1]..[j-1]) + s[j], if s[i] == s[j];
    max(lps(s[i+1..j]), lps(s[i..j-1])) otherwise

The pseudocode for calculating the length of the lps is the following:

compute-lps(s, n):

    // palindromes with length 1
    for i = 1 to n:
        c[i, i] = 1
    // palindromes with length up to 2
    for i = 1 to n-1:
        c[i, i+1] = (s[i] == s[i+1]) ? 2 : 1

    // palindromes with length up to j+1
    for j = 2 to n-1:
        for i = 1 to n-i:
            if s[i] == s[i+j]:
                c[i, i+j] = 2 + c[i+1, i+j-1]
            else:
                c[i, i+j] = max( c[i+1, i+j] , c[i, i+j-1] )

It still takes O(n²) time and memory if I want to effectively construct the lps (because I 'll need all cells on the table). Analysing related problems, such as LIS, which can be solved with approaches other than LCS-like with less memory (LIS is solvable with O(n) memory), I was wondering if it's possible to solve it with O(n) memory, too.

LIS achieves this bound by linking the candidate subsequences, but with palindromes it's harder because what matters here is not the previous element in the subsequence, but the first. Does anyone know if is possible to do it, or are the previous solutions memory optimal?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

长梦不多时 2024-11-16 15:03:14

这是一个非常节省内存的版本。但我还没有证明它总是 O(n) 内存。（通过预处理步骤，它可以比 O(n²) CPU 更好，尽管 O(n²)是最坏的情况。）

从最左边的位置开始。对于每个位置，跟踪最远点的表，在这些点上您可以生成长度为 1、2、3 等的反射子序列（这意味着我们点左侧的子序列会反射到右侧。）每个反射子序列我们存储一个指向子序列下一部分的指针。

当我们按照正确的方式工作时，我们从字符串的右侧搜索到当前元素出现的任何位置，并尝试使用这些匹配来改进我们之前的边界。完成后，我们查看最长的镜像子序列，我们可以轻松构建最佳回文。

让我们考虑一下角色。

我们从最好的回文字母“c”开始，并通过位于字符串末尾的一对 (0, 11) 到达我们的镜像子序列。
接下来考虑位置 1 处的“c”。 (length, end, start) 形式的最佳镜像子序列现在是 [(0, 11, 0), (1, 6, 1)]。（我将省略实际查找回文所需生成的链表。
接下来考虑位置 2 处的 h。我们不会改进边界 [(0, 11, 0 ), (1, 6, 1)]
接下来考虑位置 3 处的 a。我们将边界改进为 [(0, 11, 0), (1, 6, 1), (2, 5, 3)]
接下来考虑位置 4 处的 r。我们将边界改进为 [(0, 11, 0), (1, 10, 4), (2, 5, 3)]（这就是链表有用的地方。

通过列表的其余部分，我们不会改进该组边界。

所以我们最终最长的镜像列表的长度为 2。我们将沿着链表（我没有在本描述中记录它）发现它是 ac。因为该列表的末尾是在位置 (5, 3) 处，我们可以翻转列表，插入字符 4，然后追加列表以获取 carac

一般情况下的最大值。其需要的存储器是存储最大镜像子序列的所有长度加上存储所述子序列的链表的存储器。通常，这将是非常小的内存量。

在经典的内存/CPU 权衡中，您可以及时对列表进行 O(n) 预处理，以生成特定序列元素出现位置的 O(n) 大小的数组散列。这可以让您扫描“通过此配对改进镜像子序列”，而不必考虑整个字符串，对于较长的字符串，这通常应该是 CPU 的主要节省。

Here is a very memory efficient version. But I haven't demonstrated that it is always O(n) memory. (With a preprocessing step it can better than O(n²) CPU, though O(n²) is the worst case.)

Start from the left-most position. For each position, keep track of a table of the farthest out points at which you can generate reflected subsequences of length 1, 2, 3, etc. (Meaning that a subsequence to the left of our point is reflected to the right.) For each reflected subsequence we store a pointer to the next part of the subsequence.

As we work our way right, we search from the RHS of the string to the position for any occurrences of the current element, and try to use those matches to improve the bounds we previously had. When we finish, we look at the longest mirrored subsequence and we can easily construct the best palindrome.

Let's consider this for character.

We start with our best palindrome being the letter 'c', and our mirrored subsequence being reached with the pair (0, 11) which are off the ends of the string.
Next consider the 'c' at position 1. Our best mirrored subsequences in the form (length, end, start) are now [(0, 11, 0), (1, 6, 1)]. (I'll leave out the linked list you need to generate to actually find the palindrome.
Next consider the h at position 2. We do not improve the bounds [(0, 11, 0), (1, 6, 1)].
Next consider the a at position 3. We improve the bounds to [(0, 11, 0), (1, 6, 1), (2, 5, 3)].
Next consider the r at position 4. We improve the bounds to [(0, 11, 0), (1, 10, 4), (2, 5, 3)]. (This is where the linked list would be useful.

Working through the rest of the list we do not improve that set of bounds.

So we wind up with the longest mirrored list is of length 2. And we'd follow the linked list (that I didn't record in this description to find it is ac. Since the ends of that list are at positions (5, 3) we can flip the list, insert character 4, then append the list to get carac.

In general the maximum memory that it will require is to store all of the lengths of the maximal mirrored subsequences plus the memory to store the linked lists of said subsequences. Typically this will be a very small amount of memory.

At a classic memory/CPU tradeoff you can preprocess the list once in time O(n) to generate a O(n) sized hash of arrays of where specific sequence elements appear. This can let you scan for "improve mirrored subsequence with this pairing" without having to consider the whole string, which should generally be a major saving on CPU for longer strings.

回复收藏 0 原文

梦开始←不甜 2024-11-16 15:03:14

@Luiz Rodrigo 问题中的第一个解决方案是错误的：字符串的最长公共子序列（LCS）及其反向不一定是回文。

示例：对于字符串 CBACB，CAB 是字符串的 LCS 及其反转，它显然不是回文。
然而，有一种方法可以让它发挥作用。字符串及其反向的 LCS 建立后，取其左半部分（对于奇数长度字符串，包括中间字符），并在右侧用反转的左半部分对其进行补码（如果字符串长度为奇数，则不包括中间字符））。
显然它是一个回文，并且可以简单地证明它将是字符串的子序列。

对于上述LCS，以这种方式构建的回文将是CAC。

回复收藏 0 原文

~没有更多了~