Regex with an optional part doesn't create a backreference

Posted 2024-09-05 04:18:19

I want to match an optional tag at the end of a line of text.

Example input text:

The quick brown fox jumps over the lazy dog {tag}

I want to match the part in curly-braces and create a back-reference to it.

My regex looks like this (somewhat simplified, I'm also matching parts before the tag):

^.*(\{\w+\})?

It matches the lines ok (with and without the tag) but doesn't create a back-reference to the tag.

If I remove the '?' character, so regex is:

^.*(\{\w+\})

It creates a back-reference to the tag but then doesn't match lines without the tag.

I understood from http://www.regular-expressions.info/refadv.html that the optional operator wouldn't affect the backreference:

Round brackets group the regex between them. They capture the text matched by the regex inside them that can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex.

but must've misunderstood something.

How do I make the tag part optional and create a back-reference when it exists?
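
The behaviour described above can be reproduced with a small sketch. It assumes Python's `re` module purely for illustration (the question does not name an engine), and the variable names are hypothetical. Both patterns are copied from the question.

```python
import re

lines = [
    "The quick brown fox jumps over the lazy dog {tag}",
    "The quick brown fox jumps over the lazy dog",
]

# Pattern with the optional group, as in the question.
optional = re.compile(r"^.*(\{\w+\})?")
# Same pattern with the '?' removed.
required = re.compile(r"^.*(\{\w+\})")

# With the optional group, every line matches but group 1 never participates.
optional_groups = [optional.match(s).group(1) for s in lines]

# With the required group, the tag is captured, but untagged lines do not match.
required_matches = [required.match(s) for s in lines]
required_groups = [m.group(1) if m else "no match" for m in required_matches]

print(optional_groups)
print(required_groups)
```

In this sketch the optional version reports an unset group for both lines, while the required version captures `{tag}` and fails on the untagged line, which is the symptom the question describes.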

Comments (5)

白云不回头 2024-09-12 04:18:19

It is not a backreference problem. The problem is that the regular expression is already satisfied by the text that matches .*, so it doesn't feel compelled to go on and read the optional end tag. The simplest solution, if you're truly reading to the end of the line, is to append a $ (dollar sign) to force the regular expression to match the whole line.

Edit:

BTW, I didn't take your regex literally since you said it matches other stuff, but just to be clear, .* will consume the whole line. You'd need something like [^{]* to prevent the tag from getting swallowed. I'm guessing that's not a problem for you.
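
To make the point of the edit concrete, here is a minimal sketch, again assuming Python's `re` module for illustration (the names `anchored_greedy` and `negated_class` are hypothetical). It compares a greedy .* followed by $ with the [^{]* prefix suggested above.

```python
import re

tagged = "The quick brown fox jumps over the lazy dog {tag}"
untagged = "The quick brown fox jumps over the lazy dog"

# Greedy .* plus $: .* already reaches the end of the line, so the
# optional group matches nothing and $ succeeds immediately.
anchored_greedy = re.match(r"^.*(\{\w+\})?$", tagged)

# [^{]* cannot cross the opening brace, so the group gets a chance to match.
negated_class = re.match(r"^[^{]*(\{\w+\})?$", tagged)
negated_untagged = re.match(r"^[^{]*(\{\w+\})?$", untagged)

print(anchored_greedy.group(1))   # group did not participate
print(negated_class.group(1))     # the tag, braces included
print(negated_untagged.group(1))  # line still matches, group unset
```

Under these assumptions the $ on its own does not change the capture while the prefix is a greedy .*; it is the restricted prefix that leaves the tag for the group.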

懵少女 2024-09-12 04:18:19

In addition to what others have explained, you might want to make the .* "lazy":

^.*?(\{\w+\})?

梓梦 2024-09-12 04:18:19

As David Gladfelter said, the actual problem is that when you make it optional, it doesn't match; however, his proposed fix won't work. Edit 1: You'll need to use what he put in his edit (which got written as I was writing this). The problem is that quantifiers (*, +, ?, {n,m}) are greedy: they always match as much as they possibly can. Thus, when you write ^.*(\{\w+\})?, the .* will always match the whole line, because an empty match satisfies the optional group. Also note that although ? is greedy, the first greediness (of .*) takes precedence. If you're only allowed to have curly brackets around that optional group, then you can solve your problem by saying so explicitly: ^[^\{]*(\{\w+\})?. This way, the first chunk will match everything up to the first curly bracket, and then (since ? is greedy) match the curly-bracketed word if it can.

Often, another way to solve this is to make the quantifiers lazy (or non-greedy, minimal, etc.) by appending a ?: *?, +?, ??, and {n,m}?. However, this won't help you here: instead, if you do ^.*?(\{\w+\})?, the lazy .*? will try to match zero characters, succeed, and then the optional group won't match. Still, though it won't work here, it's a useful tool in your toolbox. Edit 1: Also, note that these aren't available in all regex engines, although they are available in C#.
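
The difference between the greedy and lazy prefixes is easiest to see by looking at how much text the overall match covers. The following sketch assumes Python's `re` module for illustration, with no $ anchor on either pattern; the variable names are hypothetical.

```python
import re

line = "The quick brown fox jumps over the lazy dog {tag}"

# Greedy prefix: .* takes the whole line, the optional group matches empty.
greedy = re.match(r"^.*(\{\w+\})?", line)

# Lazy prefix: .*? takes zero characters, the group cannot match at
# position 0, so it is skipped and the overall match is empty.
lazy = re.match(r"^.*?(\{\w+\})?", line)

print(repr(greedy.group(0)), greedy.group(1))
print(repr(lazy.group(0)), lazy.group(1))
```

In both cases group 1 stays unset: the greedy version swallows the tag, and the unanchored lazy version stops before reaching it.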

高速公鹿 2024-09-12 04:18:19

Thanks guys. I used a combination of the answers, the non-greedy modifier and the end-of-line match, which seems to do the trick, so the regex is now:

^.*?(\{\w+\})?$ 

I didn't want to use [^{]* for the first part of the match, as non-tag curly brackets may appear here, but tags will always be at the end of the line.
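
As a rough check of this combination, the sketch below runs the lazy, anchored pattern against lines that also contain non-tag curly brackets, and compares it with the [^{]* variant. It assumes Python's `re` module for illustration. The sample lines are borrowed from another answer in this thread, and the variable names are hypothetical.

```python
import re

lines = [
    "The quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog {tag}",
    "The quick brown fox jumps over the {lazy} dog",
    "The quick brown fox jumps over the {lazy} {dog}",
]

# Lazy prefix plus $: the prefix grows only as far as needed for the
# optional group and the end-of-line anchor to succeed.
lazy_anchored = re.compile(r"^.*?(\{\w+\})?$")

# The [^{]* variant cannot get past a brace that is not the final tag.
negated_class = re.compile(r"^[^{]*(\{\w+\})?$")

tags = [lazy_anchored.match(s).group(1) for s in lines]
negated_matched = [negated_class.match(s) is not None for s in lines]

print(tags)
print(negated_matched)
```

With these inputs the lazy, anchored pattern matches every line and captures only a tag that sits at the very end, whereas the [^{]* variant fails to match the two lines that contain an earlier brace.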

Thanks for the answers, they were all helpful.

灰色世界里的红玫瑰 2024-09-12 04:18:19

If you're only interested in the tag, and don't care about the rest of the string, then you'd make your life much easier by just matching the tag with this regex (see it on rubular.com):

\{(\w+)\}$

That is, you're trying to match some {word} at the end of the string. If it's not there, then too bad, there's no match. There is no need for a ? modifier or a reluctant .* and all that stuff.

In C#, you may even want to use RegexOptions.RightToLeft, since you're trying to match a suffix anyway, so perhaps something like this:

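using System;
using System.Text.RegularExpressions;

string[] lines = {
  "The quick brown fox jumps over the lazy dog",
  "The quick brown fox jumps over the lazy dog {tag}",
  "The quick brown fox jumps over the {lazy} dog",
  "The quick brown fox jumps over the {lazy} {dog}",
};

// Group 1 captures the word inside the braces; $ pins the tag to the end of the line.
Regex r = new Regex(@"\{(\w+)\}$", RegexOptions.RightToLeft);

foreach (string line in lines) {
  // When nothing matches, Groups[1] is empty, so this prints "[]".
  Console.WriteLine("[" + r.Match(line).Groups[1] + "]");
}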

This prints (as seen on ideone.com):

[]
[tag]
[]
[dog]