如何查找字符串中不同子序列的数量?
Here is another spoj problem that asks how to find the number of distinct subsequences of a string ?
For example,
Input
AAA
ABCDEFG
CODECRAFTOutput
4
128
496
How can I solve this problem ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
这是一个经典的动态规划问题。
设:
空字符串有一个子序列,因此
dp[0] = 1
。解释
最初,我们假设我们可以将
a[i]
附加到以先前字符结尾的所有子序列,但这可能违反计数子序列必须不同的条件。请记住,last[a[i]]
为我们提供了a[i]
迄今为止出现的最后位置。我们唯一过量计算的子序列是那些附加了之前的a[i]
的子序列,因此我们将它们减去。根据定义更新这些值。
如果您的索引从 0 开始,请在我使用
a[i]
的地方使用a[i - 1]
。如果您要提交代码,还请记住将计算包装在mod
函数中。应该这样实现:为了正确处理某些语言(例如 C/C++)中的负值。
It's a classic dynamic programming problem.
Let:
A null string has one subsequence, so
dp[0] = 1
.Explanation
Initially, we assume we can append
a[i]
to all subsequences ending on previous characters, but this might violate the condition that the counted subsequences need to be distinct. Remember thatlast[a[i]]
gives us the last positiona[i]
appeared on until now. The only subsequences we overcount are those that the previousa[i]
was appended to, so we subtract those.Update these values as per their definition.
If your indexing starts from 0, use
a[i - 1]
wherever I useda[i]
. Also remember to wrap your computations in amod
function if you're going to submit code. This should be implemented like this:In order to correctly handle negative values in some languages (such as C/C++).
对于这个问题有一个更简单的解决方案。
这个想法是:如果字符串中的所有字符都不同,则子序列的总数为
2^n。
现在,如果我们发现之前已经出现过的任何字符,我们应该只考虑它的最后一次出现(否则顺序不会不同)。因此,我们必须减去由于其先前出现而产生的子序列的数量。我的实现是这样的:
There exists an easier solution to this problem.
The idea is: If all character of the string are distinct, total number of subsequences is
2^n.
Now, if we find any character that have already occurred before, we should consider its last occurrence only (otherwise sequence won't be distinct). So we have to subtract the number of subsequences due to its previous occurrence.My implementation is like this:
这是我的代码:
解释:
我从字符串的后面扫描,即从最后一个元素到第一个元素,因此发送第一个
n-1< /code> 字符用于在递归中进一步扫描。
一旦
n==-1或n<0(两者相同)
,我到达空字符串并返回1,因为没有。空字符串的子序列是 1。因此,在从递归返回时,我们知道将当前的非重复字符添加到前一个字符串会使数字加倍。的子序列。发生加倍是因为现在我可以在所有先前子序列的末尾添加这个字符。因此,
with
和without
这个字符意味着所有先前子序列的两倍。假设当前字符不是重复的,我将乘以前一个字符。子序列与 2.
在总数之后。前
n-1
字符的子序列已计算完毕,我们将前n
字符的子序列加倍。但是,假设当前遇到的字符(第 n 个字符)已经存在于之前的前 n-1 个字符中(即 - 在字符串 s[0....n-1] 中找到(注意: s[n] 是当前字符)),然后我们必须减去那些数字。可能的子序列的数量,最多(不包括)上次遇到当前字符时 s 的部分,并且该部分已被计算并存储在 L['此特定字符'] 中。
即 -
BACA
- 对于给定的字符串,之前已经遇到过第 4 个A
(从递归返回时,我们首先遇到B
,然后是A
,然后是C
,最后是A
),所以我们扣除了编号。计算到(不包括)第二个A
(为 2(因为A
之前的子序列数量为 2))的子序列数。所以,每次我们都计算出没有。前
n-1
个字符的子序列,我们将它们存储在数组L中。注意:L[k]存储第n个字符。第 k 个索引之前的子序列。
我使用了访问数组来检查我当前所在的给定字符是否已经被扫描过。
遇到当前字符时,我将当前位置的位置更新为
n
访问的数组。需要这样做是因为我们必须排除重复的序列。注意:
visited[]
初始化为全-1,因为字符串s
中任何字符的位置都是非负的(基于0的索引)。摘要:
如何得出重复项的数量?假设当前字符在 i 处的最后一次出现是在第 j 个位置。然后,我们将有重复的子序列:考虑从第 i 个字符开始,然后是 [0,j-1] 中可能的所有子序列,而不是从第 j 个字符开始,然后是 [0,j-1] 中可能的所有子序列。因此,为了消除这个问题,您可以从 upto(不包括)j 中减去可能的子序列数,其中 L[0]=1 意味着 upto(不包括 0),没有。 subseq 的个数为 1(空字符串有 1 个子序列)。
Here is my CODE:
Explanation :
I scan from the back of a string ie- from the last element to the first and therefore send the first
n-1
characters for further scanning in the recursion.Once
n==-1 or n<0(both are same)
, I reach on the empty string and return 1 because no. of subsequences of an empty string is 1.So, on returning back from recursion, we know that adding the current non-duplicate character to the previous string doubles the no. of subsequences. Doubling happens because now I can add this character at the end of all the previous subsequences. So,
with
andwithout
this character means double of all previous subsequences.Assuming that the current character is not a duplicate, I multiply the previous no. of subsequences with 2.
After the total no. of subsequences of the first
n-1
characters has been computed, we double them for the firstn
characters.But, suppose the character currently encountered(nth character) has already been present in the first
n-1
characters before(ie - found within the string s[0....n-1] (Note: s[n] is the current character)), then we have to subtract those no. of subsequences possible from up to (excluding) that part of s when the last time this current character was encountered and which has already been computed and stored in L['this particular character'].ie -
BACA
- for the given string, the 4thA
has already been encountered before(while returning from the recursion, we first encounterB
, thenA
, thenC
and at lastA
) and so we deduct the no. of subsequences calculated upto (excluding) the 2ndA
(which is 2 (because no. of subseq. beforeA
is 2)).So, every time we have calculated the no. of subsequences for the first
n-1
characters, we store them in the array L.Notice : L[k] store the no. of subsequences before the kth index.
I've used the visited array in order to check whether the given character that I'm currently present at has already been scanned through or not.
On encountering the current character, I update the visited array with the position of current position as
n
. This need to be done because we have to exclude the duplicate sequences.Note:
visited[]
is initialized with all -1 because the position of any character in the strings
is non-negative (0 based indexing).Summary:
How do you arrive at the number of duplicates? Let's say the last occurrence of current character at i, was at j'th position. Then, we will have duplicate subsequences: consider starting with i'th character and then all subsequences possible from [0,j-1] vs. starting at j'th character and then all subsequences possible from [0,j-1]. So, to eliminate this, you subtract the number of subsequences possible from upto (excluding) j with L[0]=1 mean that upto(excluding 0), no. of subseq are 1(empty string has 1 subsequence).