Why doesn't this expression follow the greedy approach?

Posted on 2024-08-15 14:47:33

Why is this expression not following the greedy approach?

string input = @"cool  man! your  dog can walk on water ";
string pattern = @"cool (?<cool>(.*))    (?<h>((dog)*)) (?(h)(?<dog>(.*))) ";

MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);


foreach (Match match in matches)
{
    Console.WriteLine("cool=" + match.Groups["cool"].Value);
    Console.WriteLine("dog=" + match.Groups["dog"].Value);
    Console.ReadLine();
}

Output:

cool=  man! your  dog can walk on water
dog=

As you can observe, the (dog) group is matched 0 times. But since * is greedy, why doesn't it try to find the maximum number of matches of (dog), which is 1?

Any clues?

Comments (2)

草莓酥 2024-08-22 14:47:33

The first .* initially matches the whole string. Then the regex engine determines whether it needs to back off to match the rest of the regex. But (?<h>((dog)*)) and (?(h)(?<dog>(.*))) can both legally match zero characters, so no backtracking is needed (as far as the .* is concerned). Try using a non-greedy .*? in that part.
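To make that concrete, here is a minimal sketch that reruns the question's pattern and inspects each named group. It assumes a recent .NET SDK with top-level statements; the variable names m and options are mine, not the asker's. If the explanation above holds, <cool> should hold everything after "cool", while <h> and <dog> should both report a successful but zero-length capture.

```csharp
using System;
using System.Text.RegularExpressions;

// Same input, pattern and options as in the question.
string input = @"cool  man! your  dog can walk on water ";
string pattern = @"cool (?<cool>(.*))    (?<h>((dog)*)) (?(h)(?<dog>(.*))) ";
RegexOptions options = RegexOptions.IgnoreCase
                     | RegexOptions.ExplicitCapture
                     | RegexOptions.IgnorePatternWhitespace;

Match m = Regex.Match(input, pattern, options);

// The greedy .* keeps everything after "cool"; nothing forces it to give any back.
Console.WriteLine("cool=[" + m.Groups["cool"].Value + "]");

// Both later groups take part in the match, but each captures an empty string.
Console.WriteLine("h:   success=" + m.Groups["h"].Success + ", length=" + m.Groups["h"].Length);
Console.WriteLine("dog: success=" + m.Groups["dog"].Success + ", length=" + m.Groups["dog"].Length);
```

Note that IgnorePatternWhitespace strips the literal spaces from the pattern, so the spaces in the input end up inside whichever group's .* consumes them.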

EDIT (in response to the additional info posted in the answer below): Okay, replacing the first .* with a non-greedy .*? does have an effect, just not the one you want. Where everything after the word "cool" was being captured in group <cool> before, now it's being captured in group <dog>. Here's what's happening:

After the word "cool" is matched, (?<cool>(.*?)) initially matches nothing (the opposite of the greedy behavior), and (?<h>((dog)*)) tries to match. This part will always succeed no matter where it's tried, because it can match either "dog" or an empty string. That means the conditional expression in (?(h)...) will always evaluate to true, so it goes ahead and matches the rest of the input with (?<dog>(.*)).
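A small sketch of that lazy variant, under the same assumptions as the question (same input and options; only the first .* is changed to .*?, and the variable names are mine). If the walkthrough above is right, <cool> should come back empty and <dog> should hold the rest of the input.

```csharp
using System;
using System.Text.RegularExpressions;

string input = @"cool  man! your  dog can walk on water ";

// Identical to the question's pattern except for the lazy .*? in <cool>.
string lazyPattern = @"cool (?<cool>(.*?))    (?<h>((dog)*)) (?(h)(?<dog>(.*))) ";
RegexOptions options = RegexOptions.IgnoreCase
                     | RegexOptions.ExplicitCapture
                     | RegexOptions.IgnorePatternWhitespace;

Match lazy = Regex.Match(input, lazyPattern, options);

// <cool> stops at zero characters because the rest of the pattern already succeeds there.
Console.WriteLine("cool length=" + lazy.Groups["cool"].Length);

// <h> matches the empty string, the conditional is true, and <dog> takes the remainder.
Console.WriteLine("dog=[" + lazy.Groups["dog"].Value + "]");
```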

As I understand it, you want to match everything after "cool" in named group <cool>, unless the string contains the word "dog"; then you want to capture everything after "dog" in named group <dog>. You're trying to use a conditional for that, but it's not really the right tool. Just do this:

string pattern = @"cool (?<cool>.*?) (dog (?<dog>.*))?$";

The key here is the $ at the end; it forces the non-greedy .*? to keep matching until it reaches the end of the string. Because it's non-greedy, it tries to match the next part of the regex, (dog (?<dog>.*)), before consuming each character. If the word "dog" is there, the rest of the string will be consumed by (?<dog>.*); if not, the regex still succeeds because the ? makes that whole part optional.
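Here is a rough sketch of the suggested pattern run against two inputs, one with "dog" and one without. The second input (with "cat") is a made-up example, and I'm assuming the same RegexOptions as in the question, which the answer does not specify. With IgnorePatternWhitespace the spaces in the pattern are ignored, so the captured values carry the input's spaces and are trimmed before printing.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"cool (?<cool>.*?) (dog (?<dog>.*))?$";
RegexOptions options = RegexOptions.IgnoreCase
                     | RegexOptions.ExplicitCapture
                     | RegexOptions.IgnorePatternWhitespace;

// Input from the question, plus a hypothetical input that has no "dog" in it.
Match withDog = Regex.Match(@"cool  man! your  dog can walk on water ", pattern, options);
Match withoutDog = Regex.Match(@"cool  man! your  cat can walk on water ", pattern, options);

foreach (Match match in new[] { withDog, withoutDog })
{
    // Trim() only tidies the display; the raw captures keep the surrounding spaces.
    Console.WriteLine("cool=" + match.Groups["cool"].Value.Trim());

    // <dog> only takes part in the match when the optional (dog ...) part was used.
    Console.WriteLine("dog=" + match.Groups["dog"].Value.Trim()
                      + " (captured: " + match.Groups["dog"].Success + ")");
}
```

Checking Groups["dog"].Success rather than comparing Value to an empty string distinguishes "dog was absent" from "dog was present but followed by nothing".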

短叹 2024-08-22 14:47:33

I did try the non-greedy (.*?), but it had no effect. That seemed obvious to me, since I understood non-greedy (.*?) to stand for {0,1}, and since even zero characters match here, nothing changes.

Any ideas on how to correct it? I mean, I want to capture the string following (dog) if it is present; otherwise the previous group should capture the string (cool(.*)).

The problem is that (dog) is optional, and if it is present, we need the string following it.

Using (dog)? doesn't have any effect, as it again matches zero characters.

Thanks.
