Confusion over the regex greedy operator and the terminating character
I'm studying for the SCJP exam, and the following mock question caught me off guard. The explanation in the tool wasn't very good, so I'm hoping the knowledgeable people of SO can explain it.
With the regex `C.*L`, identify the words it would capture from `CooLooLCuuLooC`.
I selected `CooL` and `CuuL`. My reasoning was that it would look for a starting `C`, then take any character zero or more times until it finds an `L`, and then terminate.
However, the answer is actually `CooLooLCuuL`. I'm confused as to how the first two `L`s make it through.
Could anyone please clear this up for me?
Thanks
Comments (3)
Just one more possibly useful explanation:
`.*` matches anything (except, by default, newlines), zero or more times. You understood that, generally. However, `.*?` also meets that definition. The difference is greediness:
`.*` will match as much as it can while still letting the rest of the pattern match ('greedy' or 'eager').
`.*?` will match as little as it can, stopping as soon as the following expression can be matched ('non-greedy' or 'reluctant').
`C.*L` will find a capital `C`, then match `ooLooLCuuLooC` with `.*`. It will then find it has to match a capital `L`. Being at the end of the string, that's not possible, so it backtracks to the last position where it can match an `L`, forcing the `.*` to give up the characters `LooC` in order to do so. Result: `CooLooLCuuL`
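The greedy result can be checked directly with `java.util.regex`. This is a minimal sketch; the class name `GreedyDemo` and the helper `firstMatch` are made up for illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Hypothetical helper: returns the first substring matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy .* runs to the end of the input, then backs off to the last L.
        System.out.println(firstMatch("C.*L", "CooLooLCuuLooC")); // CooLooLCuuL
    }
}
```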
If you were to use `C.*?L`, it will find `C`, then first try to match `L` right away. The next character is `o`, so this fails and `.*?` matches one `o`. The following `o` is then tested against `L`, which fails again, making it match `oo` and test the next character, `L`, for a match to `L`. This succeeds, and it returns `CooL`.
A third option for matching either `CooL` or `CuuL` (that is, any string that starts with `C`, ends with `L`, and has no `L` in between) would be `C[^L]*L`. This matches `C`, then any number of characters that are not a capital `L`, then a capital `L`.
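To compare the three patterns side by side, the sketch below collects every match that repeated calls to `Matcher.find()` produce on the string from the question. The class name `ReluctantDemo` and the helper `allMatches` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo {
    // Hypothetical helper: collects each successive non-overlapping match.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "CooLooLCuuLooC";
        System.out.println(allMatches("C.*L", input));    // [CooLooLCuuL]
        System.out.println(allMatches("C.*?L", input));   // [CooL, CuuL]
        System.out.println(allMatches("C[^L]*L", input)); // [CooL, CuuL]
    }
}
```

On this particular input the reluctant pattern and the negated character class give the same two matches; the trailing `ooC` yields nothing because no `L` follows the last `C`.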
`C.*L` matches `CooLooLCuuL` because it's greedy. It will try to consume as much of the string as it can while still finding a match. `C.*?L` is non-greedy and therefore matches `CooL`, since it is satisfied as soon as the first possible match is found. It even leaves enough of the string for a second match, `CuuL`, to be found.
This is because it's a greedy search: it will match as many characters as possible and then backtrack until it finds an `L` character.
Here's a great resource with more information on the matter: http://www.regular-expressions.info/repeat.html
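The "match as much as possible, then backtrack" idea can be sketched by hand for this one pattern. The code below is a simplified illustration and not how `java.util.regex` is actually implemented: the class `BacktrackSketch` and both methods are hypothetical, they hard-code the shape `C`, any characters, `L`, they only try a match starting at a given index, and they assume the input contains no line terminators.

```java
class BacktrackSketch {
    // Greedy: let the middle part take everything, then shrink it until an L follows.
    static String greedy(String input, int start) {
        if (start >= input.length() || input.charAt(start) != 'C') {
            return null;
        }
        // dotEnd is the index just past the characters consumed by the middle part.
        for (int dotEnd = input.length(); dotEnd >= start + 1; dotEnd--) {
            if (dotEnd < input.length() && input.charAt(dotEnd) == 'L') {
                return input.substring(start, dotEnd + 1);
            }
        }
        return null;
    }

    // Reluctant: let the middle part take nothing, then grow it until an L follows.
    static String reluctant(String input, int start) {
        if (start >= input.length() || input.charAt(start) != 'C') {
            return null;
        }
        for (int dotEnd = start + 1; dotEnd < input.length(); dotEnd++) {
            if (input.charAt(dotEnd) == 'L') {
                return input.substring(start, dotEnd + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "CooLooLCuuLooC";
        System.out.println(greedy(input, 0));    // CooLooLCuuL
        System.out.println(reluctant(input, 0)); // CooL
    }
}
```

Both loops look for the same thing, an `L` right after the middle part; the only difference is the direction in which candidate lengths are tried, which is why the greedy version lands on the last `L` and the reluctant one on the first.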