C# regex: single-line option causes negative lookahead to fail
I am trying to figure out why a regex with a negative lookahead fails when the "single line" option is turned on.
Example (simplified):
<source>Test 1</source>
<source>Test 2</source>
<target>Result 2</target>
<source>Test 3</source>
This:
<source>(?!.*<source>)(.*?)</source>(?!\s*<target)
fails if the single-line option is on, and works if it is off. For instance, this works (it disables the single-line option inline):
(?-s:<source>(?!.*<source>)(.*?)</source>(?!\s*<target))
My understanding is that single-line mode simply allows the dot "." to match newlines, and I don't see why it would affect the expression above.
Can anyone explain what I am missing here?
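A minimal reproduction of the behavior described above, sketched in Python rather than C#. It assumes that Python's re.DOTALL plays the role of .NET's RegexOptions.Singleline and that "(?-s:...)" scoped flags (Python 3.6+) behave like the .NET inline option used in the question; the names TEXT, PATTERN, SCOPED and matches() are hypothetical.

```python
import re

# Sample input from the question (assumed to have no trailing newline).
TEXT = (
    "<source>Test 1</source>\n"
    "<source>Test 2</source>\n"
    "<target>Result 2</target>\n"
    "<source>Test 3</source>"
)

# Pattern from the question, unchanged.
PATTERN = r"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)"

# Same pattern wrapped in a scoped inline group that switches "s" (DOTALL) off,
# mirroring the question's (?-s:...) workaround.
SCOPED = r"(?-s:" + PATTERN + r")"


def matches(pattern, flags=0):
    """Return the full text of every non-overlapping match in TEXT."""
    return [m.group(0) for m in re.finditer(pattern, TEXT, flags)]


print("single line OFF:", matches(PATTERN))
print("single line ON: ", matches(PATTERN, re.DOTALL))
print("ON, scoped off: ", matches(SCOPED, re.DOTALL))
```

Under these assumptions, the first and third calls return the Test 1 and Test 3 elements, while the DOTALL call returns only Test 3, which is the same split the question reports.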
::::::::::::::::::::::
EDIT: (?!.*<source>) is a negative lookahead, not a capturing group.
<source>(?!.*?<source>)(.*?)</source>(?!\s*<target)
will ALSO FAIL if single-line mode is on, so this doesn't look like a greediness issue. Try it in a regex designer (like Expresso or Rad regex):
With single line OFF, it matches (as expected):
<source>Test 1</source>
<source>Test 3</source>
With single line ON:
<source>Test 3</source>
I don't understand why it doesn't match the first one as well: it does not contain what the first negative lookahead rules out, so it should match the expression.
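The edit's observation can be checked with a short sketch. It again assumes Python's re with re.DOTALL as a stand-in for the .NET Singleline option, and the helper captured() is hypothetical. A lookahead only answers "is there a match from here?", so making ".*" lazy inside it changes how the engine searches, not the yes/no result.

```python
import re

TEXT = (
    "<source>Test 1</source>\n"
    "<source>Test 2</source>\n"
    "<target>Result 2</target>\n"
    "<source>Test 3</source>"
)

GREEDY = r"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)"
LAZY = r"<source>(?!.*?<source>)(.*?)</source>(?!\s*<target)"


def captured(pattern, flags=0):
    """Return only the group-1 text of each match in TEXT."""
    return [m.group(1) for m in re.finditer(pattern, TEXT, flags)]


for flags, label in ((0, "OFF"), (re.DOTALL, "ON")):
    # Greedy and lazy lookaheads give the same verdict at every position.
    same = captured(GREEDY, flags) == captured(LAZY, flags)
    print(f"single line {label}: {captured(LAZY, flags)} (same as greedy: {same})")
```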
Answers (2)
I believe this is what you're looking for:
The idea is that you match each character one at a time, but only after making sure it isn't the first character of "</source>". Also, with the addition of "/?" to the lookahead, you don't have to use a non-greedy quantifier.
The reason why it "fails" is that you seem to have misplaced the negative lookahead.
Now, let's consider what "(?!.*<source>)" does here: it's a lookahead that asserts there is NO match for ".*<source>" starting from that position. Well, in single-line mode, "." matches everything, newlines included. After matching either of the first two "<source>" tags, there IS in fact a match for ".*<source>" further down the input! So the negative lookahead fails for the first two "<source>" tags. On the last "<source>", ".*<source>" no longer matches, so the negative lookahead succeeds. The rest of the pattern also succeeds, and that's why you only get "<source>Test 3</source>" in single-line mode. With single-line mode off, "." stops at the end of the line, so ".*<source>" can never reach a later line and the lookahead succeeds after every "<source>".
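The explanation above can be checked by evaluating the lookahead on its own right after each "<source>" tag. This is a sketch in Python, assuming re.DOTALL behaves like the .NET Singleline option for these constructs; the helper lookahead_results() is hypothetical. Pattern.match(string, pos) anchors each attempt at the position just past the tag.

```python
import re

TEXT = (
    "<source>Test 1</source>\n"
    "<source>Test 2</source>\n"
    "<target>Result 2</target>\n"
    "<source>Test 3</source>"
)

# The lookahead from the question, used alone as a zero-width test.
LOOKAHEAD = re.compile(r"(?!.*<source>)")
LOOKAHEAD_DOTALL = re.compile(r"(?!.*<source>)", re.DOTALL)


def lookahead_results(compiled):
    """For each <source> tag, report whether the lookahead succeeds just after it."""
    results = []
    for tag in re.finditer(r"<source>", TEXT):
        results.append(compiled.match(TEXT, tag.end()) is not None)
    return results


print("single line OFF:", lookahead_results(LOOKAHEAD))
print("single line ON: ", lookahead_results(LOOKAHEAD_DOTALL))
```

Without DOTALL the lookahead succeeds after all three tags; with DOTALL it fails after the first two, because ".*" can cross newlines and reach the last "<source>".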