所有后缀的最长前缀字符串长度
查找字符串所有后缀的最长前缀字符串的长度。
例如,字符串 ababaa
的后缀为 ababaa
、babaa
、abaa
、baa
、aa
和 a
。这些字符串中的每一个与字符串“ababaa”的相似度分别为6,0,3,0,1,1。因此,答案是 6 + 0 + 3 + 0 + 1 + 1 = 11。
我编写了以下代码,
#include <iostream>
#include <string.h>
#include <stdio.h>
#include <time.h>
int main ( int argc, char **argv) {
size_t T;
std::cin >> T;
char input[100000];
for ( register size_t i = 0; i < T; ++i) {
std::cin >> input;
double t = clock();
size_t len = strlen(input);
char *left = input;
char *right = input + len - 1;
long long sol = 0;
int end_count = 1;
while ( left < right ) {
if ( *right != '\0') {
if ( *left++ == *right++ ) {
sol++;
continue;
}
}
end_count++;
left = input; // reset the left pointer
right = input + len - end_count; // set right to one left.
}
std::cout << sol + len << std::endl;
printf("time= %.3fs\n", (clock() - t) / (double)(CLOCKS_PER_SEC));
}
}
工作正常,但对于 100000
长且具有相同字符的字符串,即 aaaaaaaaaa。 ......a
,花了很长时间,我该如何进一步优化这个。
Find the length of the longest prefix string for all the suffixes of the string.
For example suffixes of the string ababaa
are ababaa
, babaa
, abaa
, baa
, aa
and a
. The similarities of each of these strings with the string "ababaa" are 6,0,3,0,1,1 respectively. Thus the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.
I wrote following code
#include <iostream>
#include <string.h>
#include <stdio.h>
#include <time.h>
int main ( int argc, char **argv) {
size_t T;
std::cin >> T;
char input[100000];
for ( register size_t i = 0; i < T; ++i) {
std::cin >> input;
double t = clock();
size_t len = strlen(input);
char *left = input;
char *right = input + len - 1;
long long sol = 0;
int end_count = 1;
while ( left < right ) {
if ( *right != '\0') {
if ( *left++ == *right++ ) {
sol++;
continue;
}
}
end_count++;
left = input; // reset the left pointer
right = input + len - end_count; // set right to one left.
}
std::cout << sol + len << std::endl;
printf("time= %.3fs\n", (clock() - t) / (double)(CLOCKS_PER_SEC));
}
}
Working fine, but for a string which is 100000
long and having same character i.e. aaaaaaaaaa.......a
, it is taking long time , how can i optimize this one more.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
您可以使用后缀数组: http://en.wikipedia.org/wiki/Suffix_array
You can use Suffix Array: http://en.wikipedia.org/wiki/Suffix_array
假设您的
ababaa
是一个模式 P。我认为您可以使用以下算法:
Let's say your
ababaa
is a pattern P.I think you could use the following algorithm:
使用
Z
算法计算所有子字符串的长度,其前缀也是O(n)
,然后扫描结果数组并将其值求和。参考:https://www .geeksforgeeks.org/sum-of-similarities-of-string-with-all-of-its-suffixes/
Use
Z
algorithm to calculate length of all substrings, which also prefixes inO(n)
and then scan resulting array and sum its values.Reference: https://www.geeksforgeeks.org/sum-of-similarities-of-string-with-all-of-its-suffixes/
据我所知,您正在使用普通数组来评估后缀,尽管它可能对某些数据集有效,但对于某些情况(例如您提到的情况)可能无效。
您需要实现一个 Prefix-Tree 或 Trie 之类的数据结构。这些代码并不简单,所以如果您不熟悉它们,我建议您阅读一些有关它们的内容。
From what I see, you are using plain array to evaluate the suffix and though it may turn out to be efficient for some data set, it would fail to be efficient for some cases, such as the one you mentioned.
You would need to implement a Prefix-Tree or Trie like Data Structure. The code for those aren't straightforward, so if you are not familiar with them, I would suggest you read a little bit about them.
我不确定 Trie 是否会给你带来很大的性能提升..但我肯定会考虑它。
我的另一个想法是尝试压缩你的字符串。我并没有真正考虑过它,只是一个疯狂的想法......
如果你有一个像这样的字符串:
ababaa
将其压缩为:abab2a
。然后你必须想出一种技术,可以将你的算法用于这些字符串。优点是您可以有效地相互比较长字符串100000a
。或者更重要的是:您可以非常快速地计算总和。但我又没有想清楚,也许这是一个非常糟糕的主意;)
I'm not sure whether a Trie gives you much performance gain.. but I would certainly think about it.
The other idea I had is to try to compress your string. I didn't really think about it, just a crazy idea...
if you have a string like this:
ababaa
compress it maybe to:abab2a
. Then you have to come up with a technique where you can use your algorithm with those strings. The advantage is you can then compare long strings100000a
efficiently with each other. Or more importantly: you can calculate your sum very fast.But again, I didn't think it through, maybe this is a very bad idea ;)
这里有一个java实现:
Here a java implementation: