C# Regex: Singleline option makes negative lookahead fail

Posted on 2024-09-03 15:59:21

I am trying to figure out why a regex with a negative lookahead fails when the "single line" option (RegexOptions.Singleline) is turned on.

Example (simplified):

<source>Test 1</source>
<source>Test 2</source>
<target>Result 2</target>
<source>Test 3</source>

This:

<source>(?!.*<source>)(.*?)</source>(?!\s*<target)

will fail if the Singleline option is on, and will work if it is off. For instance, this works (it disables the Singleline option inline):

(?-s:<source>(?!.*<source>)(.*?)</source>(?!\s*<target))
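Here is a minimal C# reproduction of the three cases. It assumes .NET 6 or later with top-level statements and "\n" line endings in the sample input. The names `input`, `pattern`, `inlineOff` and `Contents` are illustrative only. `Regex.Matches` and `RegexOptions` are the standard `System.Text.RegularExpressions` API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question (assumed to use "\n" line endings).
string input =
    "<source>Test 1</source>\n" +
    "<source>Test 2</source>\n" +
    "<target>Result 2</target>\n" +
    "<source>Test 3</source>";

string pattern = @"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)";
// The same pattern wrapped in (?-s:...), which turns Singleline off inside the group.
string inlineOff = @"(?-s:<source>(?!.*<source>)(.*?)</source>(?!\s*<target))";

// Hypothetical helper: returns group 1 (the element content) of every match.
string[] Contents(string p, RegexOptions options) =>
    Regex.Matches(input, p, options)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

string[] singleOff = Contents(pattern, RegexOptions.None);
string[] singleOn = Contents(pattern, RegexOptions.Singleline);
string[] singleOnInlineOff = Contents(inlineOff, RegexOptions.Singleline);

Console.WriteLine("Singleline off:            " + string.Join(", ", singleOff));
Console.WriteLine("Singleline on:             " + string.Join(", ", singleOn));
Console.WriteLine("Singleline on + (?-s:...): " + string.Join(", ", singleOnInlineOff));
```

Run as a console app, this should print "Test 1, Test 3" on the first and third lines and only "Test 3" on the second. That is the behavior described in the question.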

My understanding is that Singleline mode simply allows the dot (.) to match newlines, and I don't see why that would affect the expression above.
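That understanding of the option itself is correct. The sketch below checks it, assuming .NET 6 or later with top-level statements; the variable names are illustrative. The option only changes whether `.` can match `\n`, and the inline `(?s)` flag does the same from inside the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Without Singleline, '.' matches any character except '\n'.
bool withoutS = Regex.IsMatch("a\nb", "^a.b$");
// With RegexOptions.Singleline, '.' also matches '\n'.
bool withS = Regex.IsMatch("a\nb", "^a.b$", RegexOptions.Singleline);
// The inline (?s) flag enables the same option from inside the pattern.
bool inlineS = Regex.IsMatch("a\nb", "(?s)^a.b$");

Console.WriteLine($"{withoutS} {withS} {inlineS}"); // False True True
```

So the option behaves as described. What changes in the pattern above is how far the `.*` inside the lookahead can reach.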

Can anyone explain what I am missing here?

::::::::::::::::::::::

EDIT: (?!.*<source>) is a negative lookahead, not a capturing group.
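One way to confirm that the lookaheads don't create groups is to ask the compiled `Regex` for its group numbers. This is a sketch assuming .NET 6 or later with top-level statements; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)");

// GetGroupNumbers lists every group the pattern defines; group 0 is the whole match.
// (?!...) is a zero-width assertion and defines no group, so the only other
// numbered group is (.*?).
int[] groups = regex.GetGroupNumbers();
Console.WriteLine(string.Join(", ", groups)); // 0, 1
```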

 <source>(?!.*?<source>)(.*?)</source>(?!\s*<target)

will ALSO FAIL if Singleline mode is on, so it doesn't look like a greediness issue. Try it in a regex designer (like Expresso or Rad regex):

With single line OFF, it matches (as expected):

<source>Test 1</source>    
<source>Test 3</source>

With single line ON:

<source>Test 3</source>

I don't understand why it doesn't match the first one as well: its content does not contain <source>, which is what the first negative lookahead rules out, so it should match the expression.
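This sketch compares the greedy and lazy lookaheads under `RegexOptions.Singleline`. It assumes .NET 6 or later with top-level statements and "\n" line endings, and the helper `Contents` is hypothetical. A lookahead only asks whether *some* match exists from its position. Greediness can change which match is found first, but not whether one exists, so both versions reject the same positions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question (assumed to use "\n" line endings).
string input =
    "<source>Test 1</source>\n" +
    "<source>Test 2</source>\n" +
    "<target>Result 2</target>\n" +
    "<source>Test 3</source>";

// Greedy and lazy versions of the lookahead from the question.
string greedy = @"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)";
string lazy = @"<source>(?!.*?<source>)(.*?)</source>(?!\s*<target)";

// Hypothetical helper: group 1 of every match, with Singleline turned on.
string[] Contents(string p) =>
    Regex.Matches(input, p, RegexOptions.Singleline)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

string[] g = Contents(greedy);
string[] l = Contents(lazy);
Console.WriteLine("greedy: " + string.Join(", ", g)); // greedy: Test 3
Console.WriteLine("lazy:   " + string.Join(", ", l)); // lazy:   Test 3
```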

Comments (2)

猫性小仙女 2024-09-10 15:59:21

I believe this is what you're looking for:

<source>((?:(?!</?source>).)*)</source>(?!\s*<target)

The idea is that you match each character one at a time, but only after making sure it isn't the first character of <source> or </source>. Also, because the /? makes the lookahead reject </source> as well as <source>, the repetition can never run past the closing tag, so you don't have to use a non-greedy quantifier.
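This sketch runs this pattern with and without `RegexOptions.Singleline`. It assumes .NET 6 or later with top-level statements and "\n" line endings; the helper `Contents` is hypothetical. The pattern is an instance of what is often called a "tempered" dot.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question (assumed to use "\n" line endings).
string input =
    "<source>Test 1</source>\n" +
    "<source>Test 2</source>\n" +
    "<target>Result 2</target>\n" +
    "<source>Test 3</source>";

// The answer's pattern: each content character is consumed by '.' only after
// (?!</?source>) has checked that neither <source> nor </source> starts there.
string tempered = @"<source>((?:(?!</?source>).)*)</source>(?!\s*<target)";

// Hypothetical helper: returns group 1 (the element content) of every match.
string[] Contents(RegexOptions options) =>
    Regex.Matches(input, tempered, options)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

string[] singleOff = Contents(RegexOptions.None);
string[] singleOn = Contents(RegexOptions.Singleline);

// The content can never run past a tag, so for this input the result
// no longer depends on whether '.' matches '\n'.
Console.WriteLine("Singleline off: " + string.Join(", ", singleOff)); // Test 1, Test 3
Console.WriteLine("Singleline on:  " + string.Join(", ", singleOn));  // Test 1, Test 3
```

Test 2 is still excluded in both modes by the trailing (?!\s*<target), which is unchanged from the question.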

追星践月 2024-09-10 15:59:21

It "fails" because you seem to have misplaced the negative lookahead.

<source>(?!.*<source>)(.*?)</source>(?!\s*<target)
        ^^^^^^^^^^^^^^

Now, let's consider what (?!.*<source>) does here: it's a lookahead that says that there is NO match for .*<source> from that position.

Well, in single-line mode, . matches every character, including newlines. Right after each of the first two <source>, there IS in fact a match for .*<source>: the .* runs across the line breaks to a later <source>. So the negative lookahead fails for the first two <source>.

On the last <source>, .*<source> no longer matches, so the negative lookahead succeeds. The rest of the pattern also succeeds, and that's why you only get <source>Test 3</source> in single-line mode.
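The per-position reasoning can be checked directly. This sketch evaluates the lookahead's body, .*<source>, anchored right after each <source> tag. It assumes .NET 6 or later with top-level statements and "\n" line endings, and `LookaheadBodyMatches` is a hypothetical helper. The negative lookahead (?!.*<source>) succeeds exactly where this check returns false.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input from the question (assumed to use "\n" line endings).
string input =
    "<source>Test 1</source>\n" +
    "<source>Test 2</source>\n" +
    "<target>Result 2</target>\n" +
    "<source>Test 3</source>";

// Hypothetical helper: for each "<source>" tag, does .*<source> match
// starting immediately after the tag?
List<bool> LookaheadBodyMatches(RegexOptions options)
{
    var results = new List<bool>();
    int pos = input.IndexOf("<source>", StringComparison.Ordinal);
    while (pos >= 0)
    {
        int after = pos + "<source>".Length;
        // \A anchors the test at the start of the remaining text.
        results.Add(Regex.IsMatch(input.Substring(after), @"\A.*<source>", options));
        pos = input.IndexOf("<source>", after, StringComparison.Ordinal);
    }
    return results;
}

List<bool> singleOff = LookaheadBodyMatches(RegexOptions.None);
List<bool> singleOn = LookaheadBodyMatches(RegexOptions.Singleline);
Console.WriteLine("Singleline off: " + string.Join(", ", singleOff)); // False, False, False
Console.WriteLine("Singleline on:  " + string.Join(", ", singleOn));  // True, True, False
```

With Singleline off, the lookahead passes at all three tags, and Test 2 is rejected only by (?!\s*<target). With Singleline on, it passes only at the last tag.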
