使用正则表达式在 JavaScript 中查找最长重复子字符串
我想找到字符串中最长的重复字符串,用 JavaScript 实现并使用基于正则表达式的方法。
我有一个 PHP 实现,当直接移植到 JavaScript 时,它不起作用。
PHP 实现取自问题“查找最长重复字符串?” 的答案
preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $input, $matches, PREG_SET_ORDER);
:将使用最长的重复子字符串填充 $matches[0][X]
(其中 X
是 $matches[0]
的长度)可以在 $input
中找到。我已经用许多输入字符串对此进行了测试,并且确信输出是正确的。
JavaScript 中最接近的直接端口是:
var matches = /(?=((.+)(?:.*?\2)+))/.exec(input);
这不会给出正确的结果
input Excepted result matches[0][X] ====================================================== inputinput input input 7inputinput input input inputinput7 input input 7inputinput7 input 7 XXinputinputYY input XX
我对正则表达式不够熟悉,无法理解此处使用的正则表达式的作用。
我当然可以实现一些算法来找到最长的重复子串。在尝试这样做之前,我希望不同的正则表达式能够在 JavaScript 中产生正确的结果。
是否可以修改上述正则表达式,以便在 JavaScript 中返回预期的输出?我承认这在一句台词中可能是不可能的。
I'd like to find the longest repeating string within a string, implemented in JavaScript and using a regular-expression based approach.
I have an PHP implementation that, when directly ported to JavaScript, doesn't work.
The PHP implementation is taken from an answer to the question "Find longest repeating strings?":
preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $input, $matches, PREG_SET_ORDER);
This will populate $matches[0][X]
(where X
is the length of $matches[0]
) with the longest repeating substring to be found in $input
. I have tested this with many input strings and found am confident the output is correct.
The closest direct port in JavaScript is:
var matches = /(?=((.+)(?:.*?\2)+))/.exec(input);
This doesn't give correct results
input Excepted result matches[0][X] ====================================================== inputinput input input 7inputinput input input inputinput7 input input 7inputinput7 input 7 XXinputinputYY input XX
I'm not familiar enough with regular expressions to understand what the regular expression used here is doing.
There are certainly algorithms I could implement to find the longest repeating substring. Before I attempt to do that, I'm hoping a different regular expression will produce the correct results in JavaScript.
Can the above regular expression be modified such that the expected output is returned in JavaScript? I accept that this may not be possible in a one-liner.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
Javascript 匹配仅返回第一个匹配项——您必须循环才能找到多个结果。一些测试表明这得到了预期的结果:
请注意,对于“xxabcdyy”,您只能返回“x”,因为它返回最大长度的第一个字符串。
Javascript matches only return the first match -- you have to loop in order to find multiple results. A little testing shows this gets the expected results:
Note that for "xxabcdyy" you only get "x" back, as it returns the first string of maximum length.
JS 正则表达式似乎有点奇怪。我没有完整的答案,但这是我发现的。
虽然我认为他们做了同样的事情 re.exec() 和 "string".match(re) 的行为不同。 Exec 似乎只返回它找到的第一个匹配项,而 match 似乎返回所有匹配项(在两种情况下都使用 /g)。
另一方面, exec 似乎可以与正则表达式中的 ?= 一起正常工作,而 match 返回所有空字符串。删除 ?= 留给我们
使用
返回
而
返回
所以至少在匹配时你得到 inputinput 作为你的值之一。显然,这既没有拉出最长的,也没有消除冗余,但也许它仍然有帮助。
另一件事,我在 firebug 的控制台中进行了测试,它抛出了一个关于不支持 $1 的错误,所以也许 $ 变量中有一些值得一看的东西。
It seems JS regexes are a bit weird. I don't have a complete answer, but here's what I found.
Although I thought they did the same thing re.exec() and "string".match(re) behave differently. Exec seems to only return the first match it finds, whereas match seems to return all of them (using /g in both cases).
On the other hand, exec seems to work correctly with ?= in the regex whereas match returns all empty strings. Removing the ?= leaves us with
Using that
returns
whereas
returns
So at least with match you get inputinput as one of your values. Obviously, this neither pulls out the longest, nor removes the redundancy, but maybe it helps nonetheless.
One other thing, I tested in firebug's console which threw an error about not supporting $1, so maybe there's something in the $ vars worth looking at.