从子字符串列表中获取字符串
我在一次编程竞赛(facebook 招聘)中要求解决以下问题
输入:子字符串列表:
{"bar","foo","hi"} //from 1 to 100 sub-strings
和句子:
"hellothisisfoobarhi" //from 1 to 1000000 character
输出:12 //句子中第一个匹配项的索引 (foo)
另一个例子是:
sub -strings:{"hi","hi"}
句子:"hiJonIamSayinghihiforYou"
output:15 // hi的索引,第二个'hi'因为第一个'hi'只是一个技巧,子句子是'hi' hi"
再一个 ex:
sub-strings:{"hi","foo"}
句子 :"sayingfoohi"
输出:6 // 顺序并不重要,它们只需要彼此相邻
谁知道这个问题有好的算法吗?
I asked to solve the following problem in a programming competition (facebook recruitment)
Input: list of sub-strings:
{"bar","foo","hi"} //from 1 to 100 sub-strings
and the sentence:
"hellothisisfoobarhi" //from 1 to 1000000 character
Output: 12 //the index of the first match in the sentence (foo)
another example would be :
sub-strings:{"hi","hi"}
sentence :"hiJonIamSayinghihiforYou"
output:15 // the index of hi,the second 'hi' because the first 'hi' is just a trick,the sub-sentece is 'hi' hi"
one more ex:
sub-strings:{"hi","foo"}
sentence :"sayingfoohi"
output:6 // the order doesn't matter ,they just need to be beside each others
Who knows a good algorithm for this problem?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
构造大字符串的后缀树——树的构造是 O(n),其中 n 是大字符串的长度。
现在,只需沿着子字符串向下穿过树,您就可以在 O(m) 时间内找到任何子字符串的位置(其中 m 是子字符串的长度)——子字符串结束的节点或叶子将保存key 对应于大字符串的索引。
遍历子字符串集,找到它们在大字符串中的位置,并跟踪最小索引。
Construct a suffix tree of the large string -- the construction of the tree is O(n) where n is the length of the large string.
Now you can find the location of any of the substrings in O(m) time (where m is the length of the substring) by simply following the the substring down through the tree -- the node or leaf where the substring ends will hold the key corresponding to the index into the large string.
Go through the set of substrings finding their location in the big string, keeping track of the minimum index.
另一种方法 - 专注于子字符串而不是主字符串 - 将是 http:// en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
Another way - concentrating on the substrings rather than the main string - would be http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm