所有后缀的最长前缀字符串长度

发布于 2024-12-25 00:16:04 字数 1384 浏览 0 评论 0原文

查找字符串所有后缀的最长前缀字符串的长度。

例如,字符串 ababaa 的后缀为 ababaababaaabaabaaaaa。这些字符串中的每一个与字符串“ababaa”的相似度分别为6,0,3,0,1,1。因此,答案是 6 + 0 + 3 + 0 + 1 + 1 = 11。

我编写了以下代码,

#include <iostream>
#include <string.h>

#include <stdio.h>
#include <time.h>

int main ( int argc, char **argv) {
    size_t T;
    std::cin >> T;
    char input[100000];
    for ( register size_t i = 0; i < T; ++i) {
        std::cin >> input;

        double t = clock();

        size_t len    = strlen(input);
        char *left    = input;
        char *right   = input + len - 1;
        long long sol = 0;
        int end_count = 1;
        while ( left < right ) {
            if ( *right != '\0') {
                if ( *left++ == *right++ ) {
                    sol++;
                    continue;
                }
            }
            end_count++;
            left = input; // reset the left pointer
            right = input + len - end_count; // set right to one left.
        }
        std::cout << sol + len << std::endl;
        printf("time= %.3fs\n", (clock() - t) / (double)(CLOCKS_PER_SEC));
    }
}

工作正常,但对于 100000 长且具有相同字符的字符串,即 aaaaaaaaaa。 ......a,花了很长时间,我该如何进一步优化这个。

Find the length of the longest prefix string for all the suffixes of the string.

For example suffixes of the string ababaa are ababaa, babaa, abaa, baa, aa and a. The similarities of each of these strings with the string "ababaa" are 6,0,3,0,1,1 respectively. Thus the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.

I wrote following code

#include <iostream>
#include <string.h>

#include <stdio.h>
#include <time.h>

int main ( int argc, char **argv) {
    size_t T;
    std::cin >> T;
    char input[100000];
    for ( register size_t i = 0; i < T; ++i) {
        std::cin >> input;

        double t = clock();

        size_t len    = strlen(input);
        char *left    = input;
        char *right   = input + len - 1;
        long long sol = 0;
        int end_count = 1;
        while ( left < right ) {
            if ( *right != '\0') {
                if ( *left++ == *right++ ) {
                    sol++;
                    continue;
                }
            }
            end_count++;
            left = input; // reset the left pointer
            right = input + len - end_count; // set right to one left.
        }
        std::cout << sol + len << std::endl;
        printf("time= %.3fs\n", (clock() - t) / (double)(CLOCKS_PER_SEC));
    }
}

Working fine, but for a string which is 100000 long and having same character i.e. aaaaaaaaaa.......a, it is taking long time , how can i optimize this one more.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

神妖 2025-01-01 00:16:04

您可以使用后缀数组: http://en.wikipedia.org/wiki/Suffix_array

凉薄对峙 2025-01-01 00:16:04

假设您的 ababaa 是一个模式 P。
我认为您可以使用以下算法:

  1. 为 P 的所有可能后缀创建一个后缀自动机。
  2. 使用 P 作为输入遍历自动机,计算到目前为止遍历的边数。对于自动机的每个接受状态,将当前边数添加到总和中。遍历自动机,直到到达输入的末尾或没有更多的边可以通过。
  3. 总和就是结果。

Let's say your ababaa is a pattern P.
I think you could use the following algorithm:

  1. Create a suffix automata for all possible suffixes of P.
  2. Walk the automata using P as input, count edges traversed so far. For each accepting state of the automata add the current edge count to total sum. Walk the automata until you either reach the end of the input or there are no more edges to go through.
  3. The total sum is the result.
疑心病 2025-01-01 00:16:04

使用Z算法计算所有子字符串的长度,其前缀也是O(n),然后扫描结果数组并将其值求和。

参考:https://www .geeksforgeeks.org/sum-of-similarities-of-string-with-all-of-its-suffixes/

Use Z algorithm to calculate length of all substrings, which also prefixes in O(n) and then scan resulting array and sum its values.

Reference: https://www.geeksforgeeks.org/sum-of-similarities-of-string-with-all-of-its-suffixes/

如何视而不见 2025-01-01 00:16:04

据我所知,您正在使用普通数组来评估后缀,尽管它可能对某些数据集有效,但对于某些情况(例如您提到的情况)可能无效。

您需要实现一个 Prefix-TreeTrie 之类的数据结构。这些代码并不简单,所以如果您不熟悉它们,我建议您阅读一些有关它们的内容。

From what I see, you are using plain array to evaluate the suffix and though it may turn out to be efficient for some data set, it would fail to be efficient for some cases, such as the one you mentioned.

You would need to implement a Prefix-Tree or Trie like Data Structure. The code for those aren't straightforward, so if you are not familiar with them, I would suggest you read a little bit about them.

So要识趣 2025-01-01 00:16:04

我不确定 Trie 是否会给你带来很大的性能提升..但我肯定会考虑它。

我的另一个想法是尝试压缩你的字符串。我并没有真正考虑过它,只是一个疯狂的想法......

如果你有一个像这样的字符串:ababaa 将其压缩为:abab2a。然后你必须想出一种技术,可以将你的算法用于这些字符串。优点是您可以有效地相互比较长字符串 100000a。或者更重要的是:您可以非常快速地计算总和。

但我又没有想清楚,也许这是一个非常糟糕的主意;)

I'm not sure whether a Trie gives you much performance gain.. but I would certainly think about it.

The other idea I had is to try to compress your string. I didn't really think about it, just a crazy idea...

if you have a string like this: ababaa compress it maybe to: abab2a. Then you have to come up with a technique where you can use your algorithm with those strings. The advantage is you can then compare long strings 100000a efficiently with each other. Or more importantly: you can calculate your sum very fast.

But again, I didn't think it through, maybe this is a very bad idea ;)

¢蛋碎的人ぎ生 2025-01-01 00:16:04

这里有一个java实现:

        // sprefix
        String s = "abababa";
        Vector<Integer>[] v = new Vector[s.length()];
        int sPrefix = s.length();
        v[0] = new Vector<Integer>();
        v[0].add(new Integer(0));
        for(int j = 1; j < s.length(); j++)
        {
            v[j] = new Vector<Integer>();
            v[j].add(new Integer(0));
            for(int k = 0; k < v[j - 1].size(); k++)
                if(s.charAt(j) == s.charAt(v[j - 1].get(k)))
                {
                    v[j].add(v[j - 1].get(k) + 1);
                    v[j - 1].set(k, 0);
                }
        }

        for(int j = 0; j < v.length; j++)
            for(int k = 0; k < v[j].size(); k++)
                sPrefix += v[j].get(k);

        System.out.println("Result = " + sPrefix);

Here a java implementation:

        // sprefix
        String s = "abababa";
        Vector<Integer>[] v = new Vector[s.length()];
        int sPrefix = s.length();
        v[0] = new Vector<Integer>();
        v[0].add(new Integer(0));
        for(int j = 1; j < s.length(); j++)
        {
            v[j] = new Vector<Integer>();
            v[j].add(new Integer(0));
            for(int k = 0; k < v[j - 1].size(); k++)
                if(s.charAt(j) == s.charAt(v[j - 1].get(k)))
                {
                    v[j].add(v[j - 1].get(k) + 1);
                    v[j - 1].set(k, 0);
                }
        }

        for(int j = 0; j < v.length; j++)
            for(int k = 0; k < v[j].size(); k++)
                sPrefix += v[j].get(k);

        System.out.println("Result = " + sPrefix);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文