How do I find the number of distinct subsequences of a string?

Posted 2024-10-19 13:52:18 · 284 words · 4 views · 0 comments

Here is another SPOJ problem. It asks for the number of distinct subsequences of a string.

For example:

Input
AAA
ABCDEFG
CODECRAFT
Output
4
128
496
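
For short inputs, the sample can be checked by brute force: enumerate every subset of positions, build the corresponding subsequence, and collect the results in a set. The sketch below does this in C++. The name `bruteForceCount` is made up for illustration, and the approach is only practical for very short strings, since it visits 2^n subsets.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: counts distinct subsequences (the empty one included)
// by enumerating all 2^n subsets of positions. Only usable for short strings.
long long bruteForceCount(const std::string& s) {
    const std::size_t n = s.size();
    assert(n <= 20);  // keep the 2^n enumeration small
    std::set<std::string> seen;
    for (unsigned long mask = 0; mask < (1UL << n); ++mask) {
        std::string sub;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (1UL << i)) sub.push_back(s[i]);
        }
        seen.insert(sub);  // the set discards repeated subsequences
    }
    return static_cast<long long>(seen.size());
}
```

Under this definition `bruteForceCount("AAA")` gives 4 (the empty string, A, AA, AAA) and `bruteForceCount("ABCDEFG")` gives 128, so the expected output counts the empty subsequence.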

How can I solve this problem?


羁绊已千年 2024-10-26 13:52:18

It's a classic dynamic programming problem.

Let:

dp[i] = number of distinct subsequences ending with a[i] that first become available at position i
sum[i] = dp[0] + dp[1] + ... + dp[i], so sum[n] is the answer.
last[c] = last position at which character c has appeared so far (0 if it has not appeared yet).

The empty string has one subsequence (itself), so dp[0] = sum[0] = 1.

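read a
n = strlen(a)
dp[0] = 1
sum[0] = 1
last[c] = 0 for every character c   // 0 means "not seen yet"
for i = 1 to n
  dp[i] = sum[i - 1]
  if last[a[i]] > 0
    dp[i] = dp[i] - sum[last[a[i]] - 1]
  sum[i] = sum[i - 1] + dp[i]
  last[a[i]] = i

return sum[n]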
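
As a rough illustration, the pseudocode above might look like this in C++. This is a sketch under two assumptions that the AAA sample requires: sum[0] starts at 1, and a character that has not been seen yet subtracts nothing. The name `countBySums` is made up here, and no modulus is applied, so it only works while the count fits in 64 bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the dp/sum/last recurrence with 1-based positions.
// No modulus is applied, so the input length is limited.
long long countBySums(const std::string& a) {
    const std::size_t n = a.size();
    assert(n <= 62);  // 2^62 still fits in a signed 64-bit integer
    std::vector<long long> dp(n + 1), sum(n + 1);
    std::array<std::size_t, 256> last{};  // 0 means "not seen yet"
    dp[0] = 1;   // the empty subsequence
    sum[0] = 1;
    for (std::size_t i = 1; i <= n; ++i) {
        // a[i] in the pseudocode is a[i - 1] with 0-based strings
        const unsigned char c = static_cast<unsigned char>(a[i - 1]);
        dp[i] = sum[i - 1];
        if (last[c] > 0) {
            dp[i] -= sum[last[c] - 1];  // already produced by the previous c
        }
        sum[i] = sum[i - 1] + dp[i];
        last[c] = i;
    }
    return sum[n];
}
```

For "AAA" the values are dp = 1, 1, 1, 1 and sum = 1, 2, 3, 4, so the function returns 4.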

Explanation

dp[i] = sum[i - 1] - sum[last[a[i]] - 1]

Initially, we assume that a[i] can be appended to every subsequence of the first i - 1 characters (there are sum[i - 1] of them, counting the empty one), but this may violate the condition that the counted subsequences must be distinct. Recall that last[a[i]] is the last position at which a[i] has appeared so far. The only subsequences we overcount are those that the previous a[i] was already appended to, and there are sum[last[a[i]] - 1] of those, so we subtract them. If a[i] has not appeared before, nothing is subtracted.

sum[i] = sum[i - 1] + dp[i]
last[a[i]] = i

Update these values as per their definition.

If your indexing starts from 0, use a[i - 1] wherever I used a[i]. Also remember to wrap your computations in a mod function if you are going to submit code. It should be implemented like this:

mod(x) = (x % m + m) % m

so that negative values are handled correctly in languages such as C/C++, where the result of % can be negative.
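
A minimal C++ sketch of that helper follows. Passing the modulus `m` as a parameter and the extra `subMod` helper are choices made for this illustration; the answer itself only gives the formula.

```cpp
#include <cassert>

// mod(x) as described above: maps any x into the range [0, m).
long long mod(long long x, long long m) {
    assert(m > 0);
    return (x % m + m) % m;
}

// Hypothetical helper: (a - b) modulo m for a and b already in [0, m).
long long subMod(long long a, long long b, long long m) {
    return mod(a - b, m);
}
```

In C++ the expression `-3 % 7` evaluates to -3, whereas `mod(-3, 7)` gives 4, which is what the subtraction in the recurrence needs.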

伤痕我心 2024-10-26 13:52:18

There is an easier solution to this problem.

The idea is: if all characters of the string are distinct, the total number of subsequences is 2^n. Now, if we find a character that has already occurred before, we should consider only its last occurrence (otherwise the subsequences won't be distinct). So we have to subtract the number of subsequences contributed by its previous occurrence.

My implementation is like this:

read s   // s[1..len], 1-based
dp[0] = 1
len = strlen(s)
last[256] = {0} // `last` is indexed by character; 0 means the character has not been seen yet.

for (i = 1; i <= len; i++) 
{
    dp[i] = (dp[i - 1] * 2)
    if (last[s[i]] > 0) dp[i] = (dp[i] - dp[last[s[i]] - 1])
    last[s[i]] = i
}
// dp[len] is the answer (the empty subsequence is counted)
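
The same recurrence could be written in C++ roughly as follows. This is a sketch, not the answerer's actual code: the name `countByDoubling` is made up, `last` is indexed by character value and stores 1-based positions with 0 meaning "not seen", and no modulus is applied.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the doubling recurrence. dp[i] is the number of distinct
// subsequences (empty one included) of the first i characters.
long long countByDoubling(const std::string& s) {
    const std::size_t len = s.size();
    assert(len <= 62);  // without a modulus the count must fit in 64 bits
    std::vector<long long> dp(len + 1);
    std::array<std::size_t, 256> last{};  // 1-based last position, 0 = unseen
    dp[0] = 1;
    for (std::size_t i = 1; i <= len; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i - 1]);
        dp[i] = dp[i - 1] * 2;
        if (last[c] > 0) {
            dp[i] -= dp[last[c] - 1];  // duplicates from the previous occurrence
        }
        last[c] = i;
    }
    return dp[len];
}
```

For "AAA" the table is dp = 1, 2, 3, 4: the second A gives 2 * 2 - dp[0] = 3 and the third gives 2 * 3 - dp[1] = 4.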

神也荒唐 2024-10-26 13:52:18

Here is my code:

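#include <cstring>
#include <iostream>
#include <string>
#include <vector>
typedef long long ll;

// Returns the number of distinct subsequences (empty one included) of s[0..n].
// visited[c] is the last index at which character c was seen, or -1.
// L[k] is the number of distinct subsequences of s[0..k-1].
ll fun(const std::string& s, ll visited[256], ll n, std::vector<ll>& L) {
    if (n < 0) {
        return 1;  // the empty string has exactly one subsequence
    }

    ll ans = fun(s, visited, n - 1, L);
    L[n] = ans;
    ans = ans * 2;  // every earlier subsequence, with and without s[n]
    unsigned char c = static_cast<unsigned char>(s[n]);
    if (visited[c] >= 0) {
        ans -= L[visited[c]];  // already produced by the previous occurrence
    }
    visited[c] = n;

    return ans;
}

int main() {
    std::string s;
    std::cin >> s;
    ll n = s.length();

    ll visited[256];
    std::memset(visited, -1, sizeof(visited));
    std::vector<ll> L(n, -1);

    std::cout << fun(s, visited, n - 1, L);

    return 0;
}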

Explanation:

I start from the back of the string, i.e. at the last element, and pass the preceding characters on for further scanning in the recursion. The actual counting happens as the recursion returns, from the first character to the last.

Once n < 0 (that is, n == -1), I have reached the empty string and return 1, because the empty string has exactly one subsequence.

On returning from the recursion, adding a character that has not occurred before doubles the number of subsequences. The doubling happens because the character can be appended to every previous subsequence, so each previous subsequence exists both with and without it.

So, assuming the current character is not a duplicate, I multiply the number of subsequences of the preceding characters by 2.

But suppose the current character s[n] already occurs among the earlier characters of s. Then we have to subtract the number of subsequences of the part of s up to (but excluding) the last earlier occurrence of this character. That number has already been computed and is stored in L[visited[s[n]]].

For example, in BACA the A at the 4th position has been encountered before (while returning from the recursion we encounter B, then A, then C, and finally A again), so we deduct the number of subsequences computed up to (but excluding) the A at the 2nd position, which is 2, because the prefix "B" has 2 subsequences.
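
To make the BACA example concrete, the sketch below records the count after every prefix. It is an iterative rewrite rather than the recursion used in the answer, and the names `prefixCounts` and `lastIndex` are made up for this illustration; `counts[lastIndex[c]]` plays the role of `L[visited[c]]`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts[k] is the number of distinct subsequences
// (empty one included) of the first k characters of s.
std::vector<long long> prefixCounts(const std::string& s) {
    assert(s.size() <= 62);  // no modulus, so keep the counts within 64 bits
    std::vector<long long> counts(s.size() + 1);
    std::array<long long, 256> lastIndex;
    lastIndex.fill(-1);  // -1 means the character has not been seen
    counts[0] = 1;
    for (std::size_t k = 0; k < s.size(); ++k) {
        const unsigned char c = static_cast<unsigned char>(s[k]);
        counts[k + 1] = counts[k] * 2;
        if (lastIndex[c] >= 0) {
            // subtract the count of the prefix before the previous occurrence
            counts[k + 1] -= counts[static_cast<std::size_t>(lastIndex[c])];
        }
        lastIndex[c] = static_cast<long long>(k);
    }
    return counts;
}
```

For "BACA" this yields 1, 2, 4, 8, 14: the last step is 2 * 8 - 2 = 14, where the subtracted 2 is the count for the prefix "B".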

So, every time we have calculated the number of subsequences of the characters before index n, we store it in the array L.

Note: L[k] stores the number of subsequences of the part of s before index k, i.e. of s[0..k-1].

I've used the visited array to check whether the character I'm currently at has already been scanned.

On encountering the current character, I update its entry in the visited array to the current position n. This needs to be done because we have to exclude the duplicate subsequences.

Note: visited[] is initialized with all -1 because the position of any character in the string s is non-negative (0-based indexing).

Summary:

How do you arrive at the number of duplicates? Let's say the current character is at position i and its last earlier occurrence was at position j. Then we get duplicate subsequences: any subsequence of s[0..j-1] followed by the i-th character is the same as that subsequence followed by the j-th character. To eliminate these, you subtract the number of subsequences of the part up to (but excluding) j, which is L[j]. L[0] = 1 means that up to (but excluding) index 0 there is 1 subsequence (the empty string has 1 subsequence).

凤舞天涯 2024-10-26 13:52:18

// I get WA (wrong answer)
int finding_dist_subs(int len, char data[])
{
    dp[0] = 1;
    for (int i = 1; i < len; i++)
    {
        dp[i] = (dp[i-1] * 2 + 1) % 1000000007;
        for (int j = i - 1; j >= 0; j--)
        {
            if (data[i] == data[j])
            {
                if (j != 0)
                    dp[i] = (dp[i] - (dp[j-1]) - 1) % 1000000007;
                else
                    dp[i] = (dp[i] - 1) % 1000000007;

                break;
            }
        }
    }
    return dp[len-1];
}
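
A few things in the function above could plausibly cause a wrong answer, although this is a guess without the judge's exact statement. In C++ the % operator can return a negative value after the subtraction. `dp[i-1]*2+1` can overflow if `dp` holds 32-bit ints. The recurrence also counts only non-empty subsequences, whereas the sample in the question (AAA gives 4) counts the empty one too. The sketch below changes those three points and keeps the modulus 1000000007 from the post. The function name and the last-position table that replaces the inner loop are choices made here, not part of the original post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const long long MOD = 1000000007LL;  // modulus taken from the post above

// Counts distinct subsequences modulo MOD, the empty one included.
long long countDistinctMod(const std::string& data) {
    const std::size_t len = data.size();
    std::vector<long long> dp(len + 1);
    std::array<std::size_t, 256> last{};  // 1-based last position, 0 = unseen
    dp[0] = 1;
    for (std::size_t i = 1; i <= len; ++i) {
        const unsigned char c = static_cast<unsigned char>(data[i - 1]);
        dp[i] = dp[i - 1] * 2 % MOD;
        if (last[c] > 0) {
            // add MOD before reducing so the result never goes negative
            dp[i] = (dp[i] - dp[last[c] - 1] + MOD) % MOD;
        }
        assert(dp[i] >= 0 && dp[i] < MOD);  // invariant: always a reduced value
        last[c] = i;
    }
    return dp[len];
}
```

If the judge turns out to exclude the empty subsequence, subtracting 1 from the result (again reduced into the range 0 to MOD - 1) would give that variant.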
