regex regex-lookarounds lookbehind

How does the regular expression '(?<=#)[^#]+(?=#)' work?

Posted on 2024-09-06 17:34:49

I have the following regex in a C# program, and have difficulties understanding it:

(?<=#)[^#]+(?=#)

I'll break it down to what I think I understood:

(?<=#)    a group, matching a hash. what's `?<=`?
[^#]+     one or more non-hashes (used to achieve non-greediness)
(?=#)     another group, matching a hash. what's the `?=`?

So the problem I have is the ?<= and ?< part. From reading MSDN, ?<name> is used for naming groups, but in this case the angle bracket is never closed.

I couldn't find ?= in the docs, and searching for it is really difficult, because search engines will mostly ignore those special chars.

违心° 2024-09-13 17:34:49

They are called lookarounds; they allow you to assert if a pattern matches or not, without actually making the match. There are 4 basic lookarounds:

Positive lookarounds: see if we CAN match the pattern...
- (?=pattern) - ... to the right of the current position (lookahead)
- (?<=pattern) - ... to the left of the current position (lookbehind)
Negative lookarounds: see if we can NOT match the pattern...
- (?!pattern) - ... to the right (negative lookahead)
- (?<!pattern) - ... to the left (negative lookbehind)

As an easy reminder, for a lookaround:

= is positive, ! is negative
< is look behind, otherwise it's look ahead
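
To make the four forms concrete, here is a minimal sketch in Python rather than C#. It assumes Python's `re` module treats these simple lookarounds the same way the .NET engine does; the sample string and variable names are made up for illustration.

```python
import re

text = "ab cb ad"

# Positive lookahead: a word character that IS followed by "b".
followed_by_b = re.findall(r"\w(?=b)", text)

# Positive lookbehind: a word character that IS preceded by "a".
preceded_by_a = re.findall(r"(?<=a)\w", text)

# Negative lookahead: a word character that is NOT followed by "b".
not_followed_by_b = re.findall(r"\w(?!b)", text)

# Negative lookbehind: a word character that is NOT preceded by "a".
not_preceded_by_a = re.findall(r"(?<!a)\w", text)

print(followed_by_b)       # ['a', 'c']
print(preceded_by_a)       # ['b', 'd']
print(not_followed_by_b)   # ['b', 'b', 'a', 'd']
print(not_preceded_by_a)   # ['a', 'c', 'b', 'a']
```

In every case the match is a single character: the text inside the lookaround is checked but never becomes part of the match.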

References

regular-expressions.info/Lookarounds

But why use lookarounds?

One might argue that lookarounds in the pattern above aren't necessary, and #([^#]+)# will do the job just fine (extracting the string captured by \1 to get the non-#).

Not quite. The difference is that since a lookaround doesn't match the #, it can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.

Consider the following input string:

and #one# and #two# and #three#four#

Now, #([a-z]+)# will give the following matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \___/     \___/     \_____/

Compare this with (?<=#)[a-z]+(?=#), which matches:

and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/

Unfortunately this can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
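
The three diagrams above can be reproduced with a short script. This is a sketch in Python, assuming its `re` module behaves like the .NET engine for these patterns; `full_matches` is a hypothetical helper that returns the whole matched text rather than the captured group.

```python
import re

text = "and #one# and #two# and #three#four#"

def full_matches(pattern, s):
    """Return the full text of every non-overlapping match of pattern in s."""
    return [m.group(0) for m in re.finditer(pattern, s)]

# Both hashes are consumed, so the "#" after "three" cannot also open "four".
consuming = full_matches(r"#([a-z]+)#", text)

# Neither hash is consumed, so one "#" can close "three" and open "four".
lookaround = full_matches(r"(?<=#)[a-z]+(?=#)", text)

# Only the leading hash is consumed; the trailing one is merely checked.
lookahead_only = full_matches(r"#([a-z]+)(?=#)", text)

print(consuming)       # ['#one#', '#two#', '#three#']
print(lookaround)      # ['one', 'two', 'three', 'four']
print(lookahead_only)  # ['#one', '#two', '#three', '#four']
```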

References

regular-expressions.info/Flavor Comparison

暮色兮凉城 2024-09-13 17:34:49

As another poster mentioned, these are lookarounds, special constructs for changing what gets matched and when. This says:

(?<=#)    assert, without consuming it, that a `#` comes
            immediately before the next expression

[^#]+     one or more characters that are not `#`, and

(?=#)     assert, without consuming it, that a `#` comes
            immediately after the previous expression

So this will match all the characters in between two #s.
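
One consequence worth seeing in practice: "in between two #s" means between any two hashes, not only an opening and closing pair. A minimal sketch in Python, assuming its `re` module matches the .NET behaviour for this pattern; the input string is made up for illustration.

```python
import re

text = "and #one# and #two#"

# The pattern from the question: any run of non-# characters that has
# a "#" directly before it and a "#" directly after it.
between_hashes = re.findall(r"(?<=#)[^#]+(?=#)", text)

# Restricting the body to lowercase letters skips the text that merely
# sits between a closing "#" and the next opening "#".
letters_only = re.findall(r"(?<=#)[a-z]+(?=#)", text)

print(between_hashes)  # ['one', ' and ', 'two']
print(letters_only)    # ['one', 'two']
```

Because the hashes are never consumed, the " and " between #one# and #two# also qualifies as a match for the original pattern.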

Lookaheads and lookbehinds are very useful in many cases. Consider, for example, the rule "match all bs not followed by an a." Your first attempt might be something like b[^a], but that's not right: this will also match the bu in bus or the bo in boy, but you only wanted the b. And it won't match the b in cab, even though that's not followed by an a, because there are no more characters to match.

To do that correctly, you need a lookahead: b(?!a). This says "match a b but don't match an a afterwards, and don't make that part of the match". Thus it'll match just the b in bolo, which is what you want; likewise it'll match the b in cab.
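
The difference between the two attempts is easy to check. Here is a small sketch in Python, assuming its `re` module agrees with the .NET engine on these patterns; the word list extends the examples above with "bat" as a case that should not match at all.

```python
import re

words = ["bus", "boy", "cab", "bat", "bolo"]

# First attempt: "b" plus one character that is not "a".
# The extra character is consumed, and a "b" at the end of the word fails.
char_class = {w: re.findall(r"b[^a]", w) for w in words}

# Negative lookahead: "b" only, provided the next character is not "a".
# Nothing after the "b" is consumed, so a "b" at the end still matches.
lookahead = {w: re.findall(r"b(?!a)", w) for w in words}

for w in words:
    print(w, char_class[w], lookahead[w])
# bus ['bu'] ['b']
# boy ['bo'] ['b']
# cab [] ['b']
# bat [] []
# bolo ['bo'] ['b']
```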

About the author

长伴
