查找具有特定字符串后缀的字符串在最佳时间内?

发布于 2025-02-06 12:56:41 字数 257 浏览 1 评论 0原文

我们有一个数组“ a”字符串和一个字符串阵列“ b”。对于B中的每个字符串“ S”,我们必须在“ A”中找到具有“ S”的“ A”中的字符串数?

a≤10^51≤b≤10

^51≤1≤10

| ai |≤10^21≤10^

21≤10| bi |≤10^2

的1≤Sizesize naive方法只是通过'b'穿越,而对于每个字符串在b上迭代a以找到一个数字,但时间复杂性为n^2。我们需要一个具有更好时间复杂的解决方案。

We have an array ‘A’ of strings and an array ‘B’ of strings. For each string ‘s’ in B, we have to find the number of strings in ‘A’ which have the suffix as ‘s’?

1≤size of A ≤10^5

1≤size of B ≤10^5

1≤|Ai|≤10^2

1≤|Bi|≤10^2

The naive approach is simply traversing through 'B' and for each string in B iterate over A to find a number but it has a time complexity of N^2. We need a solution with better time complexity.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

帝王念 2025-02-13 12:56:41

构造A 前缀树基于a。在树的每个顶点中,还保留了有关其中数量“通过”的信息。

然后,对于b中的每个s,在前缀树中找到一个与s对应的顶点,然后只读来自>的多少个字符串通过它(已经存在的信息)。

将来自a的单词添加到前缀树的反转,以便您可以在后缀和前缀上操作。

时间复杂度为o(size(a) + size(b))

伪代码:

struct node
{
    node* children[ALPHABET_SIZE]
    int num_words;
}

func solve(string[] a, string[] b)
{
    node* treeRoot = new node()
    
    for (string s in a)
    {
        string x = s.reverse()

        node* currNode = treeRoot
        
        for (char ch in x)
        {
            currNode.num_words++
            currNode.children[ch] = new node()
            currNode = currNode.children[ch]
        }
        currNode.num_words++
    }

    int answer[len(b)]

    for (int i=0;i<len(b);++i)
    {
        string x = b[i].reverse()
        node* currNode = treeRoot
        bool brk = false
        
        for (char ch in x)
        {
            if (currNode.children[ch] == null)
            {
                brk = true
                break
            }
            currNode = currNode.children[ch]
        }

        if (brk)
            answer[i] = 0
        else
            answer[i] = currNode.num_words
    }

    return answer
}

编辑:
size(a)我的意思是a中的所有字符串中的总数

Construct a prefix tree based on A. In each vertex of the tree also keep information on how many strings 'pass' through it.

Then, for each s in B, find a vertex in the prefix tree that corresponds to s and just read how many strings from A passed through it (the information that is already there).

Add words from A to prefix tree reversed, so you can operate on suffixes, and not prefixes.

Time complexity is O(size(A) + size(B))

Pseudo code:

struct node
{
    node* children[ALPHABET_SIZE]
    int num_words;
}

func solve(string[] a, string[] b)
{
    node* treeRoot = new node()
    
    for (string s in a)
    {
        string x = s.reverse()

        node* currNode = treeRoot
        
        for (char ch in x)
        {
            currNode.num_words++
            currNode.children[ch] = new node()
            currNode = currNode.children[ch]
        }
        currNode.num_words++
    }

    int answer[len(b)]

    for (int i=0;i<len(b);++i)
    {
        string x = b[i].reverse()
        node* currNode = treeRoot
        bool brk = false
        
        for (char ch in x)
        {
            if (currNode.children[ch] == null)
            {
                brk = true
                break
            }
            currNode = currNode.children[ch]
        }

        if (brk)
            answer[i] = 0
        else
            answer[i] = currNode.num_words
    }

    return answer
}

EDIT:
By size(A) I mean total number of chars in all strings in A

赴月观长安 2025-02-13 12:56:41

您可以在O(n)时间中执行此操作,其中n是输入的大小(字符串数 *平均长度),而没有任何复杂的数据结构。

首先将所有字符串从后缀问题更改为前缀问题。现在,任务是在A中找到A中的字符串数,而B中的每个字符串作为前缀。

对于B中的每个字符串s,请记住这两个字符串:

  • S_start只是s。这是最高的字符串,因此所有具有s前缀的字符串都是词法&gt; = s_start
  • s_end是最小的字符串,因此所有带有s的字符串作为前缀均为&lt;发送。您可以通过增加s的最后一个字符来获得此字符串。

例如,如果S为“ ABC”,则S_START =“ ABC”和S_END =“ ABD”。如果我们有“ ABC”&lt; = X&lt; “ ABD”,然后“ ABC”是X的前缀。

现在对A进行排序,对B启动列表进行排序,然后对B端的列表进行排序。如果使用radix排序,则需要O(n)时间。

然后浏览三个列表,以便将它们合并到一个排序列表中。如果在多个列表中找到相同的字符串,请在字符串之前处理B字符串。

跟踪随着消耗的数量,并且:

  • 每当看到s_start时,
  • 请在每当看到s_end时设置start [s] = as_consumed_so_far,seled end end [s] = as_consumed_so_far

完成后,对于b,对于b,对于b,对于b的每个s, end [s] - 启动[s]是a中的字符串数,作为前缀。

You can do this in O(N) time, where N is the size of the input (number of strings * average length), and without any complicated data structures.

First reverse all the strings to change this from a suffix problem into a prefix problem. The task is now to find the number of strings in A with each string in B as a prefix.

For each string s in B, remember these two strings:

  • s_start is just s. This is the highest string such that all strings with s as a prefix are lexographically >= s_start
  • s_end is the smallest string such that all strings with s as a prefix are < s_end. You can get this string by incrementing the last character of s.

For example, if s is "abc", then s_start = "abc" and s_end = "abd". Iff we have "abc" <= x < "abd", then "abc" is a prefix of x.

Now sort A, sort the list of B starts, and sort the list of B ends. If you use a radix sort, this takes O(N) time.

Then walk through the 3 lists in order as if you were merging them into one sorted list. If you find the same string in multiple lists, process the B strings before A strings.

Keep track of the number of As you consume, and:

  • Whenever you see an s_start, set start[s] = As_consumed_so_far
  • Whenever you see an s_end, set end[s] = As_consumed_so_far

When you're done, for each s in B, end[s]-start[s] is the number of strings in A with s as a prefix.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文