Regex non-greedy is greedy
I have the following text
tooooooooooooon
According to this book I'm reading, when the ? follows after any quantifier, it becomes non-greedy.
My regex to*?n is still returning tooooooooooooon.
It should return ton, shouldn't it?
Any idea why?
A regular expression can only match a fragment of text that actually exists.
Because the substring 'ton' doesn't exist anywhere in your string, it can't be the result of a match. A match will only ever return a substring of the original string.
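This can be checked quickly. The sketch below uses Python's re module, which is an assumption on my part since the question does not say which engine is in use; most backtracking engines behave the same way for these patterns.

```python
import re

text = "tooooooooooooon"

lazy = re.search(r"to*?n", text)
greedy = re.search(r"to*n", text)

# Both patterns must consume every 'o' to reach the only 'n',
# so lazy and greedy end up with the same match.
print(lazy.group() == text)    # True
print(greedy.group() == text)  # True

# "ton" is not a contiguous piece of the text, so no pattern can return it.
print("ton" in text)           # False
```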
EDIT: To be clear, if you were using the string below, with an extra 'n'
this regular expression (which doesn't specify 'o's)
would match the following (as many characters as possible before an 'n')
but the regular expression
would only match the following (as few characters as possible before an 'n')
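To make the contrast concrete, here is a minimal Python sketch. The input tooonooon and the patterns t.*n and t.*?n are chosen for illustration only and are not taken from the answer; they simply fit its description of a string with an extra 'n' and a pattern that doesn't specify 'o's.

```python
import re

# Hypothetical input with an extra 'n' in the middle.
text = "tooonooon"

# Greedy: as many characters as possible before an 'n'.
greedy = re.search(r"t.*n", text).group()

# Lazy: as few characters as possible before an 'n'.
lazy = re.search(r"t.*?n", text).group()

print(greedy)  # tooonooon
print(lazy)    # tooon
```

Here the lazy form makes a visible difference because there are two places where the match can legally end. In the question's string there is only one.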
A regular expression is always eager to match.
Your expression says this:
That means any o's necessary will be matched, because there is an 'n' at the end, which the expression is eager to reach. Matching all the o's is its only possibility to succeed.
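One way to spell out what to*?n says, part by part, is Python's re.VERBOSE flag, which allows comments inside the pattern. The wording of the comments is my own reading of the pattern, not the answerer's.

```python
import re

# to*?n written out with re.VERBOSE so each part can carry a comment.
pattern = re.compile(r"""
    t      # a literal 't'
    o*?    # zero or more 'o', as few as possible
    n      # the match only succeeds once an 'n' follows
""", re.VERBOSE)

text = "tooooooooooooon"
match = pattern.search(text)

# The only 'n' sits after all the o's, so "as few as possible" is all of them.
print(match.group() == text)  # True
```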
Regexps try to match everything in them. Because the n can only be reached after every o in toooon has been matched, there is no way to match fewer o's, so everything is matched. Also, because you are using o*? instead of o+?, you are not requiring an o to be present.
Example, in Perl
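The same point can be sketched in Python rather than Perl. The input toooooo and the capture groups are assumptions made for this illustration. With nothing required after the quantifier, the lazy forms stop as early as they are allowed to.

```python
import re

# Hypothetical input with no 'n' at all.
text = "toooooo"

# o*? may match zero o's, and nothing after it forces it to take more.
star = re.match(r"t(o*?)", text).group(1)

# o+? must match at least one 'o', then stops.
plus = re.match(r"t(o+?)", text).group(1)

print(repr(star))  # ''
print(repr(plus))  # 'o'
```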
The Regex always does its best to match. The only thing you are doing in this case would be slowing your parser down, by having it backtrack into the /o*?/ node, once for every single 'o' in "tooooon". Whereas with normal matching, it would take as many 'o's as it can the first time through. Since the next element to match against is 'n', which won't be matched by 'o', there is little point in trying to use minimal matching. Actually, when the normal matching fails, it would take quite a while for it to fail. It has to backtrack through every 'o', until there is none left to backtrack through. In this case I would actually use maximal matching, /to*+n/. The 'o' would take all it could, and never give any of it back. This would make it so that when it fails it fails quickly.
Minimal RE succeeding:
Normal RE succeeding:
(NOTE: Similar for Maximal RE)
Failure of Minimal RE:
Failure of Normal RE:
Failure of Maximal RE:
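The cost difference between the three styles can be sketched with a toy model. The function below is hypothetical and is not how any real regex engine is implemented: it handles only the fixed shape 't', a run of 'o', then 'n', anchored at the start of the text, and it counts how many times it has to try for the 'n'.

```python
def match_t_o_star_n(text, mode):
    """Toy matcher for 't', then o's, then 'n', anchored at index 0.

    Returns (matched_text_or_None, attempts), where attempts counts how many
    times the matcher tried to find 'n' after choosing a number of o's.
    """
    if not text.startswith("t"):
        return None, 0

    # Count the run of 'o' characters that directly follows the 't'.
    run = 0
    while 1 + run < len(text) and text[1 + run] == "o":
        run += 1

    if mode == "minimal":      # like o*? : try zero o's first, then grow
        candidates = range(0, run + 1)
    elif mode == "normal":     # like o*  : try all o's first, then give back
        candidates = range(run, -1, -1)
    elif mode == "maximal":    # like o*+ : take all o's and never give back
        candidates = [run]
    else:
        raise ValueError(f"unknown mode: {mode}")

    attempts = 0
    for count in candidates:
        attempts += 1
        end = 1 + count
        if end < len(text) and text[end] == "n":
            return text[: end + 1], attempts
    return None, attempts


hit = "t" + "o" * 5 + "n"   # has a trailing 'n'
miss = "t" + "o" * 5        # no 'n', so every mode fails

for mode in ("minimal", "normal", "maximal"):
    print(mode, match_t_o_star_n(hit, mode), match_t_o_star_n(miss, mode))
```

In this toy model, minimal matching needs six attempts to succeed on five o's while normal and maximal matching succeed on the first. On the failing input, minimal and normal both need six attempts and only the maximal variant gives up after one.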
The string you are searching in (the haystack as it were) does not contain the substring "ton".
It does however contain the substring "tooooooooooooon".