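How does the regular expression `(?<=#)[^#]+(?=#)` work?
I have the following regular expression in a C# program, and I'm having trouble understanding it: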
(?<=#)[^#]+(?=#)
I'll break it down into what I think I understand:
(?<=#) a group, matching a hash. what's `?<=`?
[^#]+ one or more non-hashes (used to achieve non-greediness)
(?=#) another group, matching a hash. what's the `?=`?
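So the parts I'm having trouble with are `?<=` and `?<`. From reading MSDN, `?<name>` is used for naming groups, but in this case the angle bracket is never closed.
I couldn't find `?=` in the docs, and searching for it is really difficult, because search engines mostly ignore these special characters.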
They are called lookarounds; they allow you to assert whether a pattern matches or not, without actually making the match. There are 4 basic lookarounds:
- `(?=pattern)` - positive lookahead: the pattern must match to the right of the current position
- `(?<=pattern)` - positive lookbehind: the pattern must match to the left of the current position
- `(?!pattern)` - negative lookahead: the pattern must not match to the right
- `(?<!pattern)` - negative lookbehind: the pattern must not match to the left
As an easy reminder, for a lookaround:
- `=` is positive, `!` is negative
- `<` is look behind, otherwise it's look ahead
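As a rough illustration of the four forms listed above, here is a minimal sketch in Python. Python is used only for convenience; its `re` module treats the simple, fixed-width lookarounds shown here the same way, and the question's C# code would call `Regex.Matches` instead of `re.findall`. The sample string `a#b#c` is made up for the example.

```python
import re

# Made-up sample: three letters separated by '#'.
text = "a#b#c"

# (?=#)  positive lookahead: a letter with '#' immediately to its right.
followed_by_hash = re.findall(r"\w(?=#)", text)       # ['a', 'b']

# (?<=#) positive lookbehind: a letter with '#' immediately to its left.
preceded_by_hash = re.findall(r"(?<=#)\w", text)      # ['b', 'c']

# (?!#)  negative lookahead: a letter NOT followed by '#'.
not_followed_by_hash = re.findall(r"\w(?!#)", text)   # ['c']

# (?<!#) negative lookbehind: a letter NOT preceded by '#'.
not_preceded_by_hash = re.findall(r"(?<!#)\w", text)  # ['a']

# Lookarounds are zero-width: the match below is one character long,
# even though the pattern "looks at" three characters.
m = re.search(r"(?<=#)\w(?=#)", text)
middle = (m.group(0), m.span())                       # ('b', (2, 3))
```

In every case the `#` is only inspected, never included in the returned text.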
But why use lookarounds?
One might argue that the lookarounds in the pattern above aren't necessary, and that `#([^#]+)#` will do the job just fine (extracting the string captured by `\1` to get the non-`#` part).
Not quite. The difference is that since a lookaround doesn't consume the `#`, it can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.
Consider the following input string:
[input string not preserved in the source]
Now, `#([a-z]+)#` will give the following matches (as seen on rubular.com):
[match listing not preserved in the source]
Compare this with `(?<=#)[a-z]+(?=#)`, which matches:
[match listing not preserved in the source]
Unfortunately this can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with `#([a-z]+)(?=#)`, which matches (as seen on rubular.com):
[match listing not preserved in the source]
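To make the overlap argument concrete, the sketch below runs the three patterns from this answer over one input. The input string is made up for illustration and is not necessarily the one the answer used; `matched_text` is a hypothetical helper name. Python's `re` is used here as a stand-in for the .NET engine, on the assumption that the two agree for these simple patterns.

```python
import re

# Made-up input: note that "three" and "four" share a single '#'.
sample = "and #one# and #two# and #three#four#"

def matched_text(pattern, s):
    """Return the full text of every match (group 0), not just the capture."""
    return [m.group(0) for m in re.finditer(pattern, s)]

# Both delimiters are consumed, so the '#' after "three" cannot also
# open "four": only three matches are found.
consuming = matched_text(r"#([a-z]+)#", sample)
# ['#one#', '#two#', '#three#']

# Neither delimiter is consumed, so the shared '#' closes "three"
# and still opens "four".
lookaround = matched_text(r"(?<=#)[a-z]+(?=#)", sample)
# ['one', 'two', 'three', 'four']

# Only the closing delimiter is a lookahead; the opening '#' is consumed
# but the closing one stays available for the next match.
lookahead_only = matched_text(r"#([a-z]+)(?=#)", sample)
# ['#one', '#two', '#three', '#four']
```

The patterns use `[a-z]+` rather than `[^#]+` on purpose: with `[^#]+`, the lookaround version would also match text such as ` and ` that merely sits between two hashes.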
As another poster mentioned, these are lookarounds, special constructs for changing what gets matched and when. This says:
- `(?<=#)` - assert that a `#` comes immediately before, without making it part of the match
- `[^#]+` - match one or more characters that are not `#`
- `(?=#)` - assert that a `#` comes immediately after, without making it part of the match
So this will match all the characters in between two `#`s.
Lookaheads and lookbehinds are very useful in many cases. Consider, for example, the rule "match all `b`s not followed by an `a`." Your first attempt might be something like `b[^a]`, but that's not right: this will also match the `bu` in `bus` or the `bo` in `boy`, but you only wanted the `b`. And it won't match the `b` in `cab`, even though that's not followed by an `a`, because there are no more characters to match.
To do that correctly, you need a lookahead: `b(?!a)`. This says "match a `b` but don't match an `a` afterwards, and don't make that part of the match". Thus it'll match just the `b` in `bolo`, which is what you want; likewise it'll match the `b` in `cab`.
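The difference between the two attempts can be checked with a short sketch. It uses Python's `re` for convenience, on the assumption that it behaves like the .NET engine for these patterns; the word list reuses the words from the answer plus `bat`, which is added here as a case where the `b` is followed by an `a`.

```python
import re

# Words from the answer, plus "bat" (added for illustration).
words = ["bus", "boy", "cab", "bolo", "bat"]

# First attempt: 'b' followed by one character that is not 'a'.
# That extra character becomes part of the match, and a 'b' at the
# end of the string has nothing left to pair with.
naive = {w: re.findall(r"b[^a]", w) for w in words}
# {'bus': ['bu'], 'boy': ['bo'], 'cab': [], 'bolo': ['bo'], 'bat': []}

# Negative lookahead: 'b' as long as the next character is not 'a'.
# Nothing after the 'b' is consumed, and end-of-string is acceptable.
lookahead = {w: re.findall(r"b(?!a)", w) for w in words}
# {'bus': ['b'], 'boy': ['b'], 'cab': ['b'], 'bolo': ['b'], 'bat': []}
```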