使用正则表达式在 JavaScript 中查找最长重复子字符串

发布于 2024-09-26 11:22:54 字数 1145 浏览 2 评论 0原文

我想找到字符串中最长的重复字符串,用 JavaScript 实现并使用基于正则表达式的方法。

我有一个 PHP 实现,当直接移植到 JavaScript 时,它不起作用。

PHP 实现取自问题“查找最长重复字符串?” 的答案

preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $input, $matches, PREG_SET_ORDER);

:将使用最长的重复子字符串填充 $matches[0][X](其中 X$matches[0] 的长度)可以在 $input 中找到。我已经用许多输入字符串对此进行了测试,并且确信输出是正确的。

JavaScript 中最接近的直接端口是:

var matches = /(?=((.+)(?:.*?\2)+))/.exec(input);

这不会给出正确的结果

input                  Excepted result   matches[0][X]
======================================================
inputinput             input             input
7inputinput            input             input
inputinput7            input             input
7inputinput7           input             7
XXinputinputYY         input             XX

我对正则表达式不够熟悉,无法理解此处使用的正则表达式的作用。

我当然可以实现一些算法来找到最长的重复子串。在尝试这样做之前,我希望不同的正则表达式能够在 JavaScript 中产生正确的结果。

是否可以修改上述正则表达式,以便在 JavaScript 中返回预期的输出?我承认这在一句台词中可能是不可能的。

I'd like to find the longest repeating string within a string, implemented in JavaScript and using a regular-expression based approach.

I have an PHP implementation that, when directly ported to JavaScript, doesn't work.

The PHP implementation is taken from an answer to the question "Find longest repeating strings?":

preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $input, $matches, PREG_SET_ORDER);

This will populate $matches[0][X] (where X is the length of $matches[0]) with the longest repeating substring to be found in $input. I have tested this with many input strings and found am confident the output is correct.

The closest direct port in JavaScript is:

var matches = /(?=((.+)(?:.*?\2)+))/.exec(input);

This doesn't give correct results

input                  Excepted result   matches[0][X]
======================================================
inputinput             input             input
7inputinput            input             input
inputinput7            input             input
7inputinput7           input             7
XXinputinputYY         input             XX

I'm not familiar enough with regular expressions to understand what the regular expression used here is doing.

There are certainly algorithms I could implement to find the longest repeating substring. Before I attempt to do that, I'm hoping a different regular expression will produce the correct results in JavaScript.

Can the above regular expression be modified such that the expected output is returned in JavaScript? I accept that this may not be possible in a one-liner.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

蛮可爱 2024-10-03 11:22:54

Javascript 匹配仅返回第一个匹配项——您必须循环才能找到多个结果。一些测试表明这得到了预期的结果:

function maxRepeat(input) {
 var reg = /(?=((.+)(?:.*?\2)+))/g;
 var sub = ""; //somewhere to stick temp results
 var maxstr = ""; // our maximum length repeated string
 reg.lastIndex = 0; // because reg previously existed, we may need to reset this
 sub = reg.exec(input); // find the first repeated string
 while (!(sub == null)){
  if ((!(sub == null)) && (sub[2].length > maxstr.length)){
   maxstr = sub[2];
  }
  sub = reg.exec(input);
  reg.lastIndex++; // start searching from the next position
 }
 return maxstr;
}

// I'm logging to console for convenience
console.log(maxRepeat("aabcd"));             //aa
console.log(maxRepeat("inputinput"));        //input
console.log(maxRepeat("7inputinput"));       //input
console.log(maxRepeat("inputinput7"));       //input
console.log(maxRepeat("7inputinput7"));      //input
console.log(maxRepeat("xxabcdyy"));          //x
console.log(maxRepeat("XXinputinputYY"));    //input

请注意,对于“xxabcdyy”,您只能返回“x”,因为它返回最大长度的第一个字符串。

Javascript matches only return the first match -- you have to loop in order to find multiple results. A little testing shows this gets the expected results:

function maxRepeat(input) {
 var reg = /(?=((.+)(?:.*?\2)+))/g;
 var sub = ""; //somewhere to stick temp results
 var maxstr = ""; // our maximum length repeated string
 reg.lastIndex = 0; // because reg previously existed, we may need to reset this
 sub = reg.exec(input); // find the first repeated string
 while (!(sub == null)){
  if ((!(sub == null)) && (sub[2].length > maxstr.length)){
   maxstr = sub[2];
  }
  sub = reg.exec(input);
  reg.lastIndex++; // start searching from the next position
 }
 return maxstr;
}

// I'm logging to console for convenience
console.log(maxRepeat("aabcd"));             //aa
console.log(maxRepeat("inputinput"));        //input
console.log(maxRepeat("7inputinput"));       //input
console.log(maxRepeat("inputinput7"));       //input
console.log(maxRepeat("7inputinput7"));      //input
console.log(maxRepeat("xxabcdyy"));          //x
console.log(maxRepeat("XXinputinputYY"));    //input

Note that for "xxabcdyy" you only get "x" back, as it returns the first string of maximum length.

颜漓半夏 2024-10-03 11:22:54

JS 正则表达式似乎有点奇怪。我没有完整的答案,但这是我发现的。

虽然我认为他们做了同样的事情 re.exec() 和 "string".match(re) 的行为不同。 Exec 似乎只返回它找到的第一个匹配项,而 match 似乎返回所有匹配项(在两种情况下都使用 /g)。

另一方面, exec 似乎可以与正则表达式中的 ?= 一起正常工作,而 match 返回所有空字符串。删除 ?= 留给我们

re = /((.+)(?:.*?\2)+)/g

使用

"XXinputinputYY".match(re);

返回

["XX", "inputinput", "YY"]

re.exec("XXinputinputYY");

返回

["XX", "XX", "X"]

所以至少在匹配时你得到 inputinput 作为你的值之一。显然,这既没有拉出最长的,也没有消除冗余,但也许它仍然有帮助。

另一件事,我在 firebug 的控制台中进行了测试,它抛出了一个关于不支持 $1 的错误,所以也许 $ 变量中有一些值得一看的东西。

It seems JS regexes are a bit weird. I don't have a complete answer, but here's what I found.

Although I thought they did the same thing re.exec() and "string".match(re) behave differently. Exec seems to only return the first match it finds, whereas match seems to return all of them (using /g in both cases).

On the other hand, exec seems to work correctly with ?= in the regex whereas match returns all empty strings. Removing the ?= leaves us with

re = /((.+)(?:.*?\2)+)/g

Using that

"XXinputinputYY".match(re);

returns

["XX", "inputinput", "YY"]

whereas

re.exec("XXinputinputYY");

returns

["XX", "XX", "X"]

So at least with match you get inputinput as one of your values. Obviously, this neither pulls out the longest, nor removes the redundancy, but maybe it helps nonetheless.

One other thing, I tested in firebug's console which threw an error about not supporting $1, so maybe there's something in the $ vars worth looking at.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文