Confusion over regex greedy operators and terminating characters

Posted 2024-12-17 12:45:10 · 342 words · 0 views · 0 comments

I'm studying for the SCJP exam, and the following mock question caught me off guard. The explanation in the tool wasn't very good, so I'm hoping the knowledgeable people of SO can explain it.

With the regex C.*L, identify the words it would capture from CooLooLCuuLooC.

I selected CooL and CuuL. My reason for this choice is because I believed it would look for a starting match of C, then take any character zero or more times until it finds an L, and then terminate.

However, the answer is actually CooLooLCuuL. I'm confused as to how the first 2 L's make it through?

Could anyone please clear this up for me?

Thanks

Comments (3)

月亮邮递员 2024-12-24 12:45:10

Just one more possibly useful explanation:

The .* matches any character (except, by default, newlines), zero or more times. You understood that, generally. However, .*? also meets that definition. The difference is greediness:

  • .* will match anything until it can't match anything else ('greedy' or 'eager')
  • .*? will match anything until the following expression can be matched ('non-greedy' or 'reluctant')
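As an aside on the newline remark above, here is a minimal Java sketch of what "by default" means. The class name `DotNewlineDemo`, the helper `firstMatch`, and the input string are made up for illustration; `Pattern.DOTALL` is the standard `java.util.regex` flag that lets `.` match line terminators as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotNewlineDemo {
    // Hypothetical helper: returns the first match of the pattern, or null if there is none.
    public static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Illustrative input with a newline in the middle.
        String input = "CooL\nooL";

        // By default '.' stops at the newline, so .* cannot reach the second L.
        System.out.println(firstMatch(Pattern.compile("C.*L"), input).length()); // 4

        // With DOTALL, '.' also matches the newline, so the greedy .* runs to the last L.
        System.out.println(firstMatch(Pattern.compile("C.*L", Pattern.DOTALL), input).length()); // 8
    }
}
```

The first call matches only `CooL`; the second matches the whole eight-character input, newline included.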

Thus, C.*L will find a capital C, then match ooLooLCuuLooC with .*. It will then find it has to match a capital L. Being at the end of the string, that's not possible, so it goes to where it can match an L, forcing the .* to give up the characters LooC in order to do so. Result: CooLooLCuuL

If you were to use C.*?L, it would find C, then let .*? match nothing at first and test the first o for a match to L. This fails, so .*? takes one o and the next o is tested, which fails as well. Then .*? takes oo and the following L is tested for a match to L. This succeeds, and it returns CooL.

A third option for matching either CooL or CuuL (that is any strings that start with C and end with L) would be C[^L]*L. This matches C, then any number of characters that are not a capital L, then a capital L.
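The three patterns discussed above can be compared side by side with a small Java sketch. The class name `GreedyVsReluctant` and the helper `findAll` are illustrative only; the regex calls are the standard `java.util.regex` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: collects every successive match that find() reports.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "CooLooLCuuLooC";

        // Greedy: .* grabs everything, then backs off to the last L.
        System.out.println(findAll("C.*L", input));    // [CooLooLCuuL]

        // Reluctant: .*? grows one character at a time and stops at the first L.
        System.out.println(findAll("C.*?L", input));   // [CooL, CuuL]

        // Negated class: [^L]* can never run past an L, so no backtracking over L is needed.
        System.out.println(findAll("C[^L]*L", input)); // [CooL, CuuL]
    }
}
```

The trailing `ooC` produces no match with any of the three patterns, because no L follows the final C.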

GRAY°灰色天空 2024-12-24 12:45:10

C.*L matches CooLooLCuuL because it's greedy. It will chew through as much of the string as it can while still ending in a valid match. C.*?L is non-greedy and therefore matches CooL, since it is satisfied as soon as the first match is found. It even leaves enough of the string for a second match, CuuL, to be found.

原来是傀儡 2024-12-24 12:45:10

This is because it's a greedy search: it will match as many characters as possible and then backtrack until it finds an L character.

Here's a great resource to get more information in the matter: http://www.regular-expressions.info/repeat.html
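To make the "match as much as possible, then backtrack" idea concrete, here is a hand-rolled Java sketch that imitates the two strategies for this one pattern. It is not how `java.util.regex` is implemented. The class and method names are hypothetical, and the code assumes the input contains no line terminators, so that `.` could consume every character.

```java
public class BacktrackSketch {
    // Imitates greedy C.*L tried at a single start position.
    public static String matchGreedy(String input, int start) {
        if (start >= input.length() || input.charAt(start) != 'C') {
            return null; // the literal C must match first
        }
        // Greedy step: .* initially covers [start + 1, input.length()).
        int dotStarEnd = input.length();
        // Backtracking step: give back one character at a time until L can match next.
        while (dotStarEnd > start) {
            if (dotStarEnd < input.length() && input.charAt(dotStarEnd) == 'L') {
                return input.substring(start, dotStarEnd + 1);
            }
            dotStarEnd--;
        }
        return null;
    }

    // Imitates reluctant C.*?L: .*? starts empty and grows only when L fails to match.
    public static String matchReluctant(String input, int start) {
        if (start >= input.length() || input.charAt(start) != 'C') {
            return null;
        }
        for (int dotStarEnd = start + 1; dotStarEnd < input.length(); dotStarEnd++) {
            if (input.charAt(dotStarEnd) == 'L') {
                return input.substring(start, dotStarEnd + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "CooLooLCuuLooC";
        System.out.println(matchGreedy(input, 0));    // CooLooLCuuL
        System.out.println(matchReluctant(input, 0)); // CooL
        System.out.println(matchReluctant(input, 7)); // CuuL
    }
}
```

In the greedy case the loop hands back C, o, o and then L before the literal L can match, which is why the first two L's "make it through".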
