Lazy regex doesn't work as expected in C#

Posted on 2024-09-06 13:37:19

I have the following regex: a?\W*?b
and I have the string ,.! ,b
When I search for a match I get ,.! ,b instead of just b, which is what I expected. Why is that, and how can I modify the regex to get what I need?
Thank you for your help.

Comments (7)

柠北森屋 2024-09-13 13:37:19

A lazy quantifier doesn't help here for what you want. Let's see what's happening.

The regex engine starts at the beginning of the string. It first tries to match a. It can't, but that's no problem since the a is optional.

Then, there is a lazy \W*? so the regex engine skips it but remembers the current position.

It then tries to match b. It can't, so it backtracks and successfully matches the , with \W*?. It then goes on to try and match b (because of the lazy quantifier). It still can't and backtracks again. This repeats a few times until finally the regex engine has arrived at the b. Now the match is complete - the regex engine declares success.

So the regex works as specified - just not as intended. Now the question is: What exactly do you want the regex to do?
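
The walk-through above can be checked with a small console program. This is only a minimal sketch, assuming .NET 5 or later so that C# top-level statements are available; the pattern and input are the ones from the question, and the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern and input from the question.
string input = ",.! ,b";
Match lazy = Regex.Match(input, @"a?\W*?b");

// The engine starts at index 0 and keeps expanding the lazy \W*?
// one character at a time until the b can match.
Console.WriteLine($"Index={lazy.Index}, Value=\"{lazy.Value}\"");
// prints: Index=0, Value=",.! ,b"

// For comparison, the bare b the asker expected sits at index 5.
Match plain = Regex.Match(input, "b");
Console.WriteLine($"Index={plain.Index}, Value=\"{plain.Value}\"");
// prints: Index=5, Value="b"
```

The lazy quantifier only decides how far the match extends to the right; the starting index is still the earliest position from which any match is possible.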

For example, if what you really want is:

Match b alone, unless it's preceded by a and some non-word characters, in which case match everything from a to b, then use

b|a\W*b
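
As a quick check of the suggested alternation, the following sketch runs it against the string from the question and against a variant with a leading a. It assumes .NET 5 or later (top-level statements); the second input is the a,.! ,b string that another answer in this thread uses, and the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"b|a\W*b";

// No leading a: only the second branch could cross the punctuation, and it
// requires an a, so the match is the lone b at index 5.
Match noA = Regex.Match(",.! ,b", pattern);
Console.WriteLine($"{noA.Index}: \"{noA.Value}\"");     // prints: 5: "b"

// Leading a: the first branch fails at index 0, the second matches from a to b.
Match withA = Regex.Match("a,.! ,b", pattern);
Console.WriteLine($"{withA.Index}: \"{withA.Value}\""); // prints: 0: "a,.! ,b"
```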

心安伴我暖 2024-09-13 13:37:19

A lazy expression is only lazy from the right, i.e. it will be as short as possible by removing characters on the right, but it will not remove characters on the left.

To make the match start later, you need a greedy expression before it that swallows the characters that you don't want to match.

Alternatively, as Tim showed, you can make the match start later by only matching the first character and the following separators if the first character exists.
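
One possible reading of the "greedy expression in front" advice is sketched below. The pattern \W*(a?\W*?b) is made up for this illustration and does not come from the answer: a greedy \W* swallows the leading separators, and group 1 holds the part that is actually wanted. The sketch assumes .NET 5 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern: the greedy \W* eats leading separators,
// group 1 then contains the original (lazy) pattern.
string pattern = @"\W*(a?\W*?b)";

Group g1 = Regex.Match(",.! ,b", pattern).Groups[1];
Console.WriteLine($"{g1.Index}: \"{g1.Value}\"");  // prints: 5: "b"

Group g2 = Regex.Match("a,.! ,b", pattern).Groups[1];
Console.WriteLine($"{g2.Index}: \"{g2.Value}\"");  // prints: 0: "a,.! ,b"
```

Note that the overall match still begins at index 0 in both cases; only the captured group starts later.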

淡淡の花香 2024-09-13 13:37:19

For example, the following might work: (a\W*)?b

To know better what might solve your problem, you should include more examples.
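
A small sketch of how the suggested pattern behaves, assuming .NET 5 or later (top-level statements). The first input is the string from the question; the second, with a leading a, is the one another answer in this thread uses.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(a\W*)?b";

foreach (string input in new[] { ",.! ,b", "a,.! ,b" })
{
    Match m = Regex.Match(input, pattern);
    // Groups[1].Success tells whether the optional "a plus separators" part took part.
    Console.WriteLine($"\"{input}\" -> \"{m.Value}\" (prefix matched: {m.Groups[1].Success})");
}
// prints:
// ",.! ,b" -> "b" (prefix matched: False)
// "a,.! ,b" -> "a,.! ,b" (prefix matched: True)
```

Because the separators are inside the optional group together with the a, they can only be consumed when an a is actually present.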

找回味觉 2024-09-13 13:37:19

Your regexp matches the entire string like this:

  1. a, zero or one repetitions ("" in this case)
  2. Any non-word character (\W, i.e. not a letter, digit, or underscore), any number of repetitions, as few as possible (",.! ," in this case)
  3. b

In your case the regexp matches the entire string, and will therefore not find just the b (it doesn't report several matches over the same part of the input).

If you search in a string like ',.! ,db' it will find the b.
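
The ',.! ,db' case can be reproduced with the sketch below (assuming .NET 5 or later for top-level statements). The d is a word character, so \W*? cannot bridge it and every start position before the b fails.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the question; the input has a d (a word character) before the b.
Match m = Regex.Match(",.! ,db", @"a?\W*?b");
Console.WriteLine($"{m.Index}: \"{m.Value}\"");  // prints: 6: "b"
```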

痴骨ら 2024-09-13 13:37:19

The a? says "I want either zero or one instance of a" - this is satisfied as there are zero instances, and it is followed by

\W*, which says "I want zero or more non-word characters" - this is satisfied by the punctuation and space characters, and finally

b says "match a letter b", which it does. So your whole string satisfies the regex.

It would help if you gave more examples of possible inputs before anyone suggests a possible solution.

陈年往事 2024-09-13 13:37:19

Your example doesn't show why the a? is part of your regex, but to match only the b in a string that looks like ,.! ,b you can use a lookbehind like this: (?<=\W*?)b

This matches a b that is preceded by non-word characters, between zero and unlimited times (as few as possible). Because zero characters are allowed, the lookbehind always succeeds, so in practice this behaves like a plain b.

If you only want to match, say, a and b in a string such as a,.! ,b you'll have to use capturing groups: (a?)\W*?(b), where group 1 will hold the a if present and group 2 the b.
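
The capturing-group variant can be tried out as follows. This is a minimal sketch, assuming .NET 5 or later (top-level statements); the variable names are illustrative, and the second input is the string from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Capturing groups from the answer: group 1 is the optional a, group 2 the b.
Match m = Regex.Match("a,.! ,b", @"(a?)\W*?(b)");
Console.WriteLine($"whole=\"{m.Value}\", g1=\"{m.Groups[1].Value}\", g2=\"{m.Groups[2].Value}\"");
// prints: whole="a,.! ,b", g1="a", g2="b"

// Without a leading a, group 1 is empty and group 2 still locates the b.
Match n = Regex.Match(",.! ,b", @"(a?)\W*?(b)");
Console.WriteLine($"g1=\"{n.Groups[1].Value}\", g2 at {n.Groups[2].Index}");
// prints: g1="", g2 at 5
```

The overall match still covers the separators; the groups are what let the caller pick out just the a and the b.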

焚却相思 2024-09-13 13:37:19

It's a mistake to speak of a regex as being greedy or non-greedy. You can use non-greedy quantifiers throughout the regex, but it will still try to start matching at the earliest opportunity, as you discovered. Similarly, a regex that uses only greedy quantifiers isn't guaranteed to return the longest possible match. For example,

Regex.Match("foo bar", @"\w+ (?:b|bar)")

...returns foo b, because alternation settles for the first alternative that works, even if a later one would result in a longer match. (Note that I'm talking about Perl-derived regex flavors like .NET's; some flavors, like awk and egrep, do indeed hold out for the longest possible match. But, since those flavors don't have non-greedy quantifiers, greedy isn't just the default mode, it's the only mode.)

In short, there's no such thing as a greedy or non-greedy regex, only greedy or non-greedy quantifiers.
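
The alternation example above can be run as is. The sketch below (assuming .NET 5 or later for top-level statements) also adds a reordered variant, which is not part of the answer, to show that the order of the alternatives decides the result rather than their length.

```csharp
using System;
using System.Text.RegularExpressions;

// Ordered alternation: the first alternative that lets the match succeed wins.
string shortFirst = Regex.Match("foo bar", @"\w+ (?:b|bar)").Value;
Console.WriteLine(shortFirst);  // prints: foo b

// Same alternatives with the longer one listed first (reordering added for this sketch).
string longFirst = Regex.Match("foo bar", @"\w+ (?:bar|b)").Value;
Console.WriteLine(longFirst);   // prints: foo bar
```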
