较大循环字符串中的最小循环子字符串
我试图找到一种算法,可以返回较大循环字符串中最短循环子字符串的长度。
循环字符串将被定义为两个或多个相同字符串的串联,例如“abababab”或“aaaa”...
现在在给定的示例中,字符串 T =“abbcabbcabbcabbc” 存在模式“abbc”的循环" 但最短的循环子串将是“bb”。
I am trying to find an algorithm that culd return the length of the shortest cyclic sub string in a larger cyclic string.
A cyclic string would be defined as a concatenation of tow or more identicle strings, e.g. "abababab", or "aaaa"...
Now in a given for example a string T = "abbcabbcabbcabbc" there is a cycle of the pattern "abbc" but the shortest cyclic sub string would be "bb".
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果您只是寻找出现多次的子字符串:
构建后缀树。
创建后缀树时,可以统计每个子字符串的重复出现次数,并将其保存在节点上出现的次数。
然后只需在树上执行 BFS 搜索(将为您提供分层搜索,从较短的字符串到较长的字符串)并找到第一个大于 1 且出现多次的子字符串。
总复杂度:O(n),其中 n 是字符串的长度
编辑:
您可以实现每个节点包含一个字母的树,这将为您提供更好的粒度并允许您按长度查看所有子字符串。
这是一棵香蕉的后缀树,其中每个节点都包含一个字母,您可以看到那里拥有所有子字符串。
如果您查看applications 后缀树的部分,您会看到它正是用于此类任务 - 查找有关的内容子串。
从根部看图片,你可以看到所有的子串都是从根部开始的(BFS列表):
If you're just looking for a substring that appears more than once:
Build a Suffix tree from the string.
While creating the suffix tree, you can count re-occurrences of every substring and save it on the number of occurrences on the node.
Then just do a BFS search on the tree (which will give you a layered search, from shorter to longer strings) and find the first substring which is longer than 1 that occurred more than once.
Total complexity: O(n) where n is the length of the string
Edit:
You can implement the tree that each node contains one letter, that will give you better granularity and allow you to see all the substrings by length.
Here's a suffix tree of banana where every node contains one letter, you can see that you have all the substrings there.
If you'll look at the applications section of the suffix tree, you'll see that it is used for exactly this kind of tasks - finding stuff about substrings.
Look at the image from the root, you can see ALL the substrings start from the root (BFS list):
让我在您的示例中将生成器称为“abbc” - 即您为了获得更大的字符串而重复的字符串。
第一个观察结果是,较小的字符串应该通过重复某些子字符串两次来生成。
很明显,最小的字符串应该小于重复两次的生成器(2*生成器),因为2*生成器是循环的。
现在请注意,在搜索较小的循环字符串时,您只需要考虑使用生成器 3 次获得的字符串。事实上,如果最小的不存在,但它在 4* 生成器中,那么它必须跨越至少两个生成器,但它不会是最小的。
现在假设较大的字符串是 3*generator(或 2*generator)。
而且很明显,如果生成器只有不同的数字,那么答案是 2*生成器。如果不是,那么您只需要在较大的字符串(例如位置 i 和 j)中找到所有相同字符对,并检查以 ai 开头(长度为 2*(ji))的字符串是否是循环的。如果你按照ji增加的顺序尝试,那么你可以在第一次成功后停止。
Let me call "abbc" the generator in your example - i.e. the string that you repeat in order to get the bigger string.
The very first observation is that the smaller string should be made by repeating some substring twice.
It's clear that the smallest string should be smaller than the generator repeated twice (2*generator), because 2*generator is cyclic.
Now note that you only need to consider the string obtained by taking the generator 3 times, when searching for smaller cyclic string. Indeed, if the smallest is not there, but it is in the 4*generator, then it must span at least two generators, but then it wouldn't be the smallest.
So now lets assume the bigger string is 3*generator (or 2*generator).
Also it's clear that if the generator has only different digits, then the answer is 2*generator. If not then you just need to find all pairs of identical characters in the bigger string say at position i and j and check whether the string starting a i, which is 2*(j-i) long is cyclic. If you try them in order of increasing j-i, then you can stop after the first success.