Regex non-greedy is greedy

Posted on 2024-07-08 05:45:07

I have the following text

tooooooooooooon

According to this book I'm reading, when a ? follows any quantifier, that quantifier becomes non-greedy.

My regex to*?n is still returning tooooooooooooon.

It should return ton, shouldn't it?

Any idea why?

Comments (5)

雾里花 2024-07-15 05:45:07

A regular expression can only match a fragment of text that actually exists.

Because the substring 'ton' doesn't exist anywhere in your string, it can't be the result of a match. A match can only ever return a substring of the original string.

EDIT: To be clear, if you were using the string below, with an extra 'n'

toooooooonoooooon

this regular expression (which doesn't specify 'o's)

t.*n

would match the following (as many characters as possible before an 'n')

toooooooonoooooon

but the regular expression

t.*?n

would only match the following (as few characters as possible before an 'n')

toooooooon
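
The difference between the two patterns can be checked with a short script. This is a minimal sketch using Python's standard `re` module (the thread itself does not name a language for this answer, so Python is an assumption here); the variable names are illustrative only.

```python
import re

text = "toooooooonoooooon"

# Greedy: .* grabs as much as it can, so the match runs to the last 'n'.
greedy = re.search(r"t.*n", text).group()

# Lazy: .*? grows one character at a time and stops at the first 'n' that works.
lazy = re.search(r"t.*?n", text).group()

print(greedy)  # the whole string
print(lazy)    # everything up to and including the first 'n'
```

In both cases the result is a real substring of `text`; the quantifier only decides which of the possible substrings is chosen.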
悲欢浪云 2024-07-15 05:45:07

A regular expression is always eager to match.

Your expression says this:

A 't', followed by *as few as possible* 'o's, followed by a 'n'.

That means as many o's as necessary will be matched, because there is an 'n' at the end, which the expression is eager to reach. Matching all the o's is its only way to succeed.
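
As a quick check of this reasoning, here is a small sketch in Python's `re` module (Python is an assumption; the question does not say which engine is in use). It compares the lazy and greedy forms on the text from the question.

```python
import re

text = "tooooooooooooon"

lazy = re.search(r"to*?n", text)
greedy = re.search(r"to*n", text)

# Both quantifiers are forced to consume every 'o' to reach the only 'n',
# so the two matches are identical and cover the whole text.
print(lazy.group() == greedy.group() == text)  # True

# "ton" is not a substring of the text, so no pattern can return it.
print("ton" in text)  # False
```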

小帐篷 2024-07-15 05:45:07

Regexps try to match everything in them. Because the 'n' can only be reached after every 'o' in toooon has been matched, no smaller number of 'o's will do, so everything is matched. Also, because you are using o*? instead of o+?, you are not requiring an o to be present.

Example, in Perl

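# No trailing 'n': the lazy o*? is free to match zero 'o's.
my $without_n = "toooooo";
# Trailing 'n': o*? must consume every 'o' to reach it.
my $with_n = "toooooon";

if ($without_n =~ m/(to*?)/) {
        print $1,"\n";   # prints "t"
}
if ($with_n =~ m/(to*?n)/) {
        print $1,"\n";   # prints "toooooon"
}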

~>perl ex.pl
t
toooooon
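
The o*? versus o+? point can be seen side by side in a short sketch. It is written in Python's `re` module rather than Perl purely for illustration; the lazy quantifier syntax is the same in both.

```python
import re

text = "toooooo"  # no trailing 'n'

# With nothing after the lazy quantifier, it takes the minimum it is allowed.
star = re.search(r"to*?", text).group()   # zero 'o's is acceptable for *?
plus = re.search(r"to+?", text).group()   # +? still needs at least one 'o'
print(star)  # t
print(plus)  # to

# Once an 'n' must follow, the minimum that works is "all of them".
with_n = re.search(r"to*?n", text + "n").group()
print(with_n == text + "n")  # True
```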
花想c 2024-07-15 05:45:07

The regex always does its best to match. The only thing the non-greedy quantifier does in this case is slow the matcher down, by making it backtrack into the /o*?/ node once for every single 'o' in "tooooon". With normal (greedy) matching, it takes as many 'o's as it can the first time through. Since the next element to match is 'n', which can never be matched by 'o', there is little point in using minimal matching here.

Normal matching has its own cost, though: when it fails, it takes quite a while to fail, because it has to backtrack through every 'o' until there is none left to give back. In this case I would actually use maximal matching, /to*+n/ (a possessive quantifier). The 'o*+' takes all it can and never gives any of it back, so when the match fails, it fails quickly.

Minimal RE succeeding:

'toooooon' ~~ /to*?n/

 t  o  o  o  o  o  o  n       
{t}                           match [t]
[t]                           match [o] 0 times
[t]<n>                        fail to match [n] -> retry [o]
[t]{o}                        match [o] 1 times
[t][o]<n>                     fail to match [n] -> retry [o]
[t][o]{o}                     match [o] 2 times
[t][o][o]<n>                  fail to match [n] -> retry [o]

. . . .

[t][o][o][o][o]{o}            match [o] 5 times
[t][o][o][o][o][o]<n>         fail to match [n] -> retry [o]
[t][o][o][o][o][o]{o}         match [o] 6 times
[t][o][o][o][o][o][o]{n}      match [n]

Normal RE succeeding:

(NOTE: Similar for Maximal RE)

'toooooon' ~~ /to*n/

 t  o  o  o  o  o  o  n       
{t}                           match [t]
[t]{o}{o}{o}{o}{o}{o}         match [o] 6 times
[t][o][o][o][o][o][o]{n}      match [n]

Failure of Minimal RE:

'toooooo' ~~ /to*?n/

 t  o  o  o  o  o  o

. . . .

. . . .

[t][o][o][o][o]{o}            match [o] 5 times
[t][o][o][o][o][o]<n>         fail to match [n] -> retry [o]
[t][o][o][o][o][o]{o}         match [o] 6 times
[t][o][o][o][o][o][o]<n>      fail to match [n] -> retry [o]
[t][o][o][o][o][o][o]<o>      fail to match [o] 7 times -> match failed

Failure of Normal RE:

'toooooo' ~~ /to*n/

 t  o  o  o  o  o  o       
{t}                           match [t]
[t]{o}{o}{o}{o}{o}{o}         match [o] 6 times
[t][o][o][o][o][o][o]<n>      fail to match [n] -> retry [o]
[t][o][o][o][o][o]            match [o] 5 times
[t][o][o][o][o][o]<n>         fail to match [n] -> retry [o]

. . . .

[t][o]                        match [o] 1 times
[t][o]<n>                     fail to match [n] -> retry [o]
[t]                           match [o] 0 times
[t]<n>                        fail to match [n] -> match failed

Failure of Maximal RE:

'toooooo' ~~ /to*+n/

 t  o  o  o  o  o  o
{t}                           match [t]
[t]{o}{o}{o}{o}{o}{o}         match [o] 6 times
[t][o][o][o][o][o][o]<n>      fail to match [n] -> match failed
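
The traces above can be reproduced with a toy model. The sketch below is not how any real regex engine is implemented; it is a hand-written, hypothetical function `match_to_n` that only handles the shape t, some 'o's, then n, anchored at the start of the text, and counts how many times the final 'n' is attempted under each strategy. Real engines add optimizations and also retry at later start positions.

```python
def match_to_n(text, mode):
    """Toy model of matching t, o*, n at position 0 (illustrative only).

    mode is "lazy", "greedy", or "possessive".
    Returns (matched, tries), where tries counts attempts at the final 'n'.
    """
    if not text.startswith("t"):
        return False, 0

    # Count the consecutive 'o's available right after the 't'.
    max_os = 0
    while 1 + max_os < len(text) and text[1 + max_os] == "o":
        max_os += 1

    if mode == "lazy":
        counts = range(0, max_os + 1)      # 0, 1, 2, ... one more each retry
    elif mode == "greedy":
        counts = range(max_os, -1, -1)     # all first, then give back one at a time
    elif mode == "possessive":
        counts = [max_os]                  # all, and never give anything back
    else:
        raise ValueError(mode)

    tries = 0
    for k in counts:
        tries += 1
        pos = 1 + k                        # where the 'n' would have to be
        if pos < len(text) and text[pos] == "n":
            return True, tries
    return False, tries


for text in ("toooooon", "toooooo"):
    for mode in ("lazy", "greedy", "possessive"):
        print(text, mode, match_to_n(text, mode))
```

On the matching text, the lazy strategy needs seven attempts at 'n' while the other two need one. On the non-matching text, lazy and greedy each need seven attempts before giving up, and the possessive strategy gives up after one.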
生活了然无味 2024-07-15 05:45:07

The string you are searching in (the haystack as it were) does not contain the substring "ton".

It does however contain the substring "tooooooooooooon".
