Confusion about alternation

Posted on 2024-11-16 10:55:23


I assumed that, within a regex, once one alternative of an alternation matches, matching stops right there even if more alternatives remain (assuming there are no other tokens in the regex outside the alternation).

This pattern searches for a doubled word (e.g., this this):

\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)

I am confused about what happens with this subject string:

"<i>whatever<i>         whatever"

It matches the pattern.

\b([a-z]+) matches the word.

((?:\s|<[^>]+>)+) is followed by a tag, so the second alternative matches.

(\1\b) has to match if the same word captured by the first parentheses follows.

Why is there a match when the tag is followed not by (\1\b) but by whitespace?

I know that \s exists within the alternation.

But isn't the tag match supposed to use up the alternation?

Why is the \s alternative still alive?


り繁华旳梦境 2024-11-23 10:55:23

The alternation is controlled by a + quantifier:

(?:\s|<[^>]+>)+

...so it tries to match multiple times. Each time, it may try both alternatives: first \s, and if that fails, <[^>]+>.

The first time, \s fails to match, but <[^>]+> succeeds in matching <i>.

The second time, \s matches one space.

The third time, \s matches another space.

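...and so on, until all the spaces are consumed.

Once neither alternative can match (the next character is the w of the second whatever), the + stops repeating and (\1\b) matches that word.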
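The repetitions described above can be made visible with a small Python sketch. It assumes Python's re module behaves like the regex flavor in the question for this pattern, and it assumes nine spaces in the subject (the exact count does not matter). Re-scanning the text captured by group 2 with the same two alternatives is only an approximation of what the engine did on each pass, but for this subject it lines up one-to-one.

```python
import re

# Pattern from the question: a word, then one or more whitespace
# characters or tags, then the same word again.
pattern = re.compile(r'\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)')

# Subject from the question; the nine spaces are an assumption.
subject = "<i>whatever<i>" + " " * 9 + "whatever"

m = pattern.search(subject)

# Group 2 holds everything the repeated alternation consumed.
# Splitting it with the same alternation shows one item per repetition.
steps = re.findall(r'\s|<[^>]+>', m.group(2))

for number, piece in enumerate(steps, start=1):
    kind = "tag" if piece.startswith("<") else "whitespace"
    print(f"repetition {number}: {kind} {piece!r}")
```

The first repetition is the tag and every later one is a single whitespace character, which is why the \s alternative is still available after the tag has matched.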


月下客 2024-11-23 10:55:23

The + marked below means "one or more of (?:\s|<[^>]+>)". Yes, the first repetition consumes the tag, but any number of additional tags or whitespace characters may follow before (\1\b) has to match.

\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)
                         ^
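To see that the + is what keeps the alternation going, here is a minimal Python comparison, assuming the re module and the same subject as in the question (with nine spaces assumed). The names repeated and single are only for illustration: single is the question's pattern with the marked + removed.

```python
import re

# The question's pattern, with the + after the alternation group.
repeated = re.compile(r'\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)')

# Hypothetical variant with that + removed: the group may match
# exactly one whitespace character or exactly one tag.
single = re.compile(r'\b([a-z]+)((?:\s|<[^>]+>))(\1\b)')

subject = "<i>whatever<i>" + " " * 9 + "whatever"

# With the +, the tag and all the spaces are consumed before \1.
print(repeated.search(subject) is not None)

# Without the +, the tag is consumed and \1 then faces a space.
print(single.search(subject) is not None)

# Without the +, a lone tag between the two words is still enough.
print(single.search("whatever<i>whatever") is not None)
```

Only the variant without the + behaves the way the question expected, stopping after the first alternative that matches.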


About the author

涙—继续流
