C# Regex: Singleline option makes negative lookahead fail

Posted on 2024-09-03 15:59:21


I am trying to figure out why a regex with a negative lookahead fails when the "single line" option (RegexOptions.Singleline) is turned on.

Example (simplified):

<source>Test 1</source>
<source>Test 2</source>
<target>Result 2</target>
<source>Test 3</source>

This:

<source>(?!.*<source>)(.*?)</source>(?!\s*<target)

does not return the matches I expect if the single line option is on, but works if the single line option is off. For instance, this version, which disables the single line option inline, works:

(?-s:<source>(?!.*<source>)(.*?)</source>(?!\s*<target))

My understanding is that single line mode only allows the dot "." to match newlines, and I don't see why it would affect the expression above.
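A minimal C# sketch that reproduces the behaviour described above. It assumes .NET 5 or later (top-level statements) and that the sample lines are separated by "\n"; `Captures` is a hypothetical helper introduced only for this illustration, while `Regex.Matches` and `RegexOptions.Singleline` are the standard `System.Text.RegularExpressions` API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: the question's sample lines, joined with "\n".
string input = string.Join("\n",
    "<source>Test 1</source>",
    "<source>Test 2</source>",
    "<target>Result 2</target>",
    "<source>Test 3</source>");

// The pattern from the question, and the variant that switches "s" off inline.
const string pattern = @"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)";
const string inlineOff = @"(?-s:<source>(?!.*<source>)(.*?)</source>(?!\s*<target))";

// Hypothetical helper: the text captured by group 1 in every match, comma-separated.
string Captures(string p, RegexOptions options) =>
    string.Join(", ", Regex.Matches(input, p, options).Cast<Match>().Select(m => m.Groups[1].Value));

Console.WriteLine(Captures(pattern, RegexOptions.None));         // Test 1, Test 3
Console.WriteLine(Captures(pattern, RegexOptions.Singleline));   // Test 3
Console.WriteLine(Captures(inlineOff, RegexOptions.Singleline)); // Test 1, Test 3
```

The last line shows that the inline `(?-s:...)` group overrides `RegexOptions.Singleline` for everything inside it, so the workaround from the question behaves exactly like the default mode.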

Can anyone explain what I am missing here?

::::::::::::::::::::::

EDIT: (?!.*<source>) is a negative lookahead, not a capturing group.

<source>(?!.*?<source>)(.*?)</source>(?!\s*<target)

will ALSO FAIL if single line mode is on, so it doesn't look like a greediness issue. Try it in a regex designer (like Expresso or Rad regex):

With single line OFF, it matches (as expected):

<source>Test 1</source>
<source>Test 3</source>

With single line ON:

<source>Test 3</source>

I don't understand why it doesn't match the first one as well: <source>Test 1</source> contains no second <source>, so the first negative lookahead shouldn't exclude it, and it should match the expression.
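The EDIT's conclusion can be checked with a small C# sketch that compares the greedy and lazy lookahead in both modes. It assumes .NET 5 or later (top-level statements) and "\n" line breaks in the sample; `Captures` is a hypothetical helper for this illustration only. A lookahead just asks whether its sub-pattern can match at that position at all; greedy vs. lazy changes the order in which the engine tries lengths, not whether some match exists, so the two variants should agree.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: the question's sample lines, joined with "\n".
string input = string.Join("\n",
    "<source>Test 1</source>",
    "<source>Test 2</source>",
    "<target>Result 2</target>",
    "<source>Test 3</source>");

const string greedy = @"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)";
const string lazy   = @"<source>(?!.*?<source>)(.*?)</source>(?!\s*<target)";

// Hypothetical helper: group 1 of every match, comma-separated.
string Captures(string p, RegexOptions options) =>
    string.Join(", ", Regex.Matches(input, p, options).Cast<Match>().Select(m => m.Groups[1].Value));

foreach (RegexOptions options in new[] { RegexOptions.None, RegexOptions.Singleline })
{
    string g = Captures(greedy, options);
    string l = Captures(lazy, options);
    Console.WriteLine($"{options}: greedy [{g}], lazy [{l}]");
}
// None: greedy [Test 1, Test 3], lazy [Test 1, Test 3]
// Singleline: greedy [Test 3], lazy [Test 3]
```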


猫性小仙女 2024-09-10 15:59:21


I believe this is what you're looking for:

<source>((?:(?!</?source>).)*)</source>(?!\s*<target)

The idea is that you match one character at a time, but only after making sure it isn't the start of </source> (or, because of the /? in the lookahead, of another <source>). Since the repetition can never run past the closing tag, you don't have to use a non-greedy quantifier.
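A C# sketch checking that this pattern gives the same result whether or not Singleline is on. It assumes .NET 5 or later (top-level statements) and "\n" line breaks in the sample; `Captures` is a hypothetical helper. Because each character is consumed only after (?!</?source>) succeeds, the repetition cannot step over a source tag, so letting . match newlines no longer lets a match leak into the next element.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: the question's sample lines, joined with "\n".
string input = string.Join("\n",
    "<source>Test 1</source>",
    "<source>Test 2</source>",
    "<target>Result 2</target>",
    "<source>Test 3</source>");

// The answer's pattern: a greedy repetition that refuses to consume
// the '<' that starts <source> or </source>.
const string tempered = @"<source>((?:(?!</?source>).)*)</source>(?!\s*<target)";

// Hypothetical helper: group 1 of every match, comma-separated.
string Captures(RegexOptions options) =>
    string.Join(", ", Regex.Matches(input, tempered, options).Cast<Match>().Select(m => m.Groups[1].Value));

Console.WriteLine(Captures(RegexOptions.None));       // Test 1, Test 3
Console.WriteLine(Captures(RegexOptions.Singleline)); // Test 1, Test 3
```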


追星践月 2024-09-10 15:59:21


It "fails" because you seem to have misplaced the negative lookahead:

<source>(?!.*<source>)(.*?)</source>(?!\s*<target)
        ^^^^^^^^^^^^^^

Now, let's consider what (?!.*<source>) does here: it's a lookahead that says that there is NO match for .*<source> from that position.

Well, in single-line mode, . matches everything, including newlines. Right after each of the first two <source> tags, .*<source> can in fact match, by running across the line breaks to a later <source>! So the negative lookahead fails for the first two <source> tags.

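On the last <source>, .*<source> no longer matches, so the negative lookahead succeeds. The rest of the pattern also succeeds, and that's why you only get <source>Test 3</source> in single-line mode. With single-line mode off, . stops at the end of the line, so the lookahead only ever examines the rest of the current line and never blocks any of the three tags; <source>Test 2</source> is then rejected only by the trailing (?!\s*<target).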
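The reasoning above can be checked directly by evaluating the lookahead's body, .*<source>, right after each <source> tag. This is a minimal C# sketch assuming .NET 5 or later (top-level statements) and "\n" line breaks; `LookaheadBlocks` is a hypothetical helper, and anchoring \A on the remaining substring stands in for "from that position". True means .*<source> matches there, i.e. the negative lookahead fails and no match can start at that tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumption: the question's sample lines, joined with "\n".
string input = string.Join("\n",
    "<source>Test 1</source>",
    "<source>Test 2</source>",
    "<target>Result 2</target>",
    "<source>Test 3</source>");

const string Tag = "<source>";

// Hypothetical helper: for each <source> tag, does .*<source> match
// starting immediately after it? (true => (?!.*<source>) fails there)
List<bool> LookaheadBlocks(RegexOptions options)
{
    var blocked = new List<bool>();
    int pos = input.IndexOf(Tag, StringComparison.Ordinal);
    while (pos >= 0)
    {
        string rest = input.Substring(pos + Tag.Length);
        blocked.Add(Regex.IsMatch(rest, @"\A.*<source>", options));
        pos = input.IndexOf(Tag, pos + Tag.Length, StringComparison.Ordinal);
    }
    return blocked;
}

Console.WriteLine(string.Join(", ", LookaheadBlocks(RegexOptions.None)));       // False, False, False
Console.WriteLine(string.Join(", ", LookaheadBlocks(RegexOptions.Singleline))); // True, True, False
```

In default mode the lookahead never blocks anything, so the first and third tags both produce matches; in Singleline mode only the last tag survives, which is exactly the Test 3-only result.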
