Regex with an optional part doesn't create a backreference

Posted 2024-09-05 04:18:19

I want to match an optional tag at the end of a line of text.

Example input text:

The quick brown fox jumps over the lazy dog {tag}

I want to match the part in curly-braces and create a back-reference to it.

My regex looks like this (somewhat simplified; I'm also matching parts before the tag):

^.*(\{\w+\})?

It matches the lines ok (with and without the tag) but doesn't create a back-reference to the tag.

If I remove the '?' character, so regex is:

^.*(\{\w+\})

It creates a back-reference to the tag but then doesn't match lines without the tag.

I understood from http://www.regular-expressions.info/refadv.html that the optional operator wouldn't affect the backreference:

Round brackets group the regex between them. They capture the text matched by the regex inside them that can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex.

but I must have misunderstood something.

How do I make the tag part optional and create a back-reference when it exists?
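The behaviour described in the question can be reproduced with a short script. This is only a sketch using Python's `re` module, which is an assumption (the question does not name an engine); the variable names are illustrative. For these particular patterns the matching rules are the same in the common backtracking engines.

```python
import re

tagged = "The quick brown fox jumps over the lazy dog {tag}"
untagged = "The quick brown fox jumps over the lazy dog"

optional = re.compile(r"^.*(\{\w+\})?")  # tag group made optional with ?
required = re.compile(r"^.*(\{\w+\})")   # tag group required

for line in (tagged, untagged):
    m_opt = optional.match(line)
    m_req = required.match(line)
    print(repr(line))
    # The optional version matches both lines, but group 1 stays unset (None).
    print("  optional:", m_opt.group(1) if m_opt else "no match")
    # The required version captures the tag, but fails on the untagged line.
    print("  required:", m_req.group(1) if m_req else "no match")
```

In both cases the overall match succeeds or fails exactly as the question reports; the only surprise is that the optional group never participates in the match.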

白云不回头 2024-09-12 04:18:19

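This isn't a backreference problem. The problem is that the regex was already satisfied by the text that .* matched, so it had no need to go on and read the optional end tag. If you really want to read to the end of the line, the easiest solution is to append a $ (dollar sign) to force the regex to match the whole line.

Edit

BTW, I didn't take your regex literally, since you said it matches other things as well, but to be clear, .* will consume the whole line. You'd need something like [^{]* to keep the tag from being swallowed. I'm guessing that isn't a problem for you.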

懵少女 2024-09-12 04:18:19

In addition to what others have explained, you might want to make the .* "lazy":

^.*?(\{\w+\})?

梓梦 2024-09-12 04:18:19

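The actual problem, as David Gladfelter said, is that when you make the group optional, it matches nothing; ~~however, his proposed fix won't work~~. Edit 1: You need to use what he added in his edit (which was written while I was writing this).

The problem is that quantifiers (*, +, ?, {n,m}) are greedy: they always match as much as possible. Thus, when you write ^.*(\{\w+\})?, the .* will always match the entire line, since an empty match satisfies the optional group. Also note that while ? is greedy too, the first greedy quantifier (the .*) takes priority. If braces are only ever going to appear around that optional group, then you can solve your problem by saying so explicitly: ^[^\{]*(\{\w+\})?. This way, the first chunk matches everything before the first brace, and then (because ? is greedy) the group matches the brace-word, if it can.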
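The greedy behaviour described above can be checked by looking at how much of the line each part of the pattern takes. A minimal sketch, again assuming Python's `re` module (the variable names are illustrative):

```python
import re

line = "The quick brown fox jumps over the lazy dog {tag}"

# Greedy .* takes the whole line; the optional group then matches nothing.
greedy = re.match(r"^.*(\{\w+\})?", line)
# Adding $ on its own changes nothing: .* has already reached the end.
greedy_anchored = re.match(r"^.*(\{\w+\})?$", line)
# [^{]* cannot cross a brace, so the tag is left over for the group.
negated = re.match(r"^[^{]*(\{\w+\})?", line)

for name, m in [("greedy", greedy),
                ("greedy + $", greedy_anchored),
                ("negated class", negated)]:
    print(f"{name:14} whole match ends at {m.end(0)}, group 1 = {m.group(1)!r}")
```

Note that the negated-class version stops at the first brace on the line, so it only fits input where braces never appear outside the tag.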

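Generally, another way to solve this kind of problem is to make the quantifiers lazy (or non-greedy, minimal, etc.) by appending a ?, which gives *?, +?, ?? and {n,m}?. However, that won't help you here: if you write ^.*?(\{\w+\})?, the lazy .*? will try to match zero characters, succeed, and then the optional group won't match. Still, while it doesn't work here, it's a useful tool to have in your toolbox. Edit 1: Also, note that not all regex engines have lazy quantifiers, though they are available in C#.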
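To make the lazy case concrete, here is a small sketch, assuming Python's `re` module, which also supports lazy quantifiers. The `<a><b>` string is an invented example, not from the thread.

```python
import re

line = "The quick brown fox jumps over the lazy dog {tag}"

# Lazy .*? is happy with zero characters, and the optional group is then
# tried at position 0, fails on "T", and is skipped: the match is empty.
lazy = re.match(r"^.*?(\{\w+\})?", line)
print(repr(lazy.group(0)), lazy.group(1))

# Where lazy quantifiers do help: stopping at the first closing delimiter.
markup = "<a><b>"
print(re.match(r"<.+>", markup).group(0))   # greedy: runs to the last >
print(re.match(r"<.+?>", markup).group(0))  # lazy: stops at the first >
```

The lazy pattern only becomes useful for the tag problem once something after the group forces the match to extend, such as an end-of-line anchor.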

高速公鹿 2024-09-12 04:18:19

Thanks, guys. I used a combination of the answers, the non-greedy modifier plus the end-of-line anchor, which seems to do the trick, so the regex is now:

^.*?(\{\w+\})?$

I didn't want to use [^{]* for the first part of the match, as non-tag curly brackets may appear here, but tags will always be at the end of the line.

Thanks for the answers, they were all helpful.
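For reference, here is how that final pattern behaves on a few lines, including ones with braces that are not a trailing tag. This is a sketch assuming Python's `re` module; the constant name `TAG_AT_END` and the sample lines are chosen for illustration.

```python
import re

# Lazy prefix, optional tag group, anchored at the end of the line.
TAG_AT_END = re.compile(r"^.*?(\{\w+\})?$")

lines = [
    "The quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog {tag}",
    "The quick brown fox jumps over the {lazy} dog",
    "The quick brown fox jumps over the {lazy} {dog}",
]

results = []
for line in lines:
    m = TAG_AT_END.match(line)
    # Every line matches; group 1 is set only when a tag ends the line.
    results.append(m.group(1))
    print(repr(m.group(1)))
```

The $ anchor is what makes the lazy prefix work: .*? keeps growing until either the group followed by end-of-line fits, or the end of the line is reached without a tag.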

灰色世界里的红玫瑰 2024-09-12 04:18:19

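If you're only interested in the tag and don't care about the rest of the string, you can make your life much easier by matching just the tag with this regex (see it on rubular.com):

\{(\w+)\}$

That is, you're trying to match some {word} at the end of the string. If it's not there, then too bad, there's no match. There is no need for a ? modifier or a reluctant .* and all that stuff.

In C#, you may even want to use RegexOptions.RightToLeft, since you're trying to match a suffix anyway, so perhaps something like this: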

string[] lines = {
  "The quick brown fox jumps over the lazy dog",
  "The quick brown fox jumps over the lazy dog {tag}",
  "The quick brown fox jumps over the {lazy} dog",
  "The quick brown fox jumps over the {lazy} {dog}",
};

Regex r = new Regex(@"\{(\w+)\}$", RegexOptions.RightToLeft);

foreach (string line in lines) {
  Console.WriteLine("[" + r.Match(line).Groups[1] + "]");
}

This prints (as seen on ideone.com):

[]
[tag]
[]
[dog]
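The same suffix-only idea carries over to other engines. Below is a rough Python equivalent, offered as a sketch rather than a port: Python's `re` module has no right-to-left option, so this version simply uses `re.search`, which scans from the left. The helper name `last_tag` is hypothetical.

```python
import re

# Only the trailing tag is matched; the rest of the line is ignored.
TAG = re.compile(r"\{(\w+)\}$")

def last_tag(line):
    """Return the word inside a trailing {tag}, or None if there is none."""
    m = TAG.search(line)
    return m.group(1) if m else None

lines = [
    "The quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog {tag}",
    "The quick brown fox jumps over the {lazy} dog",
    "The quick brown fox jumps over the {lazy} {dog}",
]

for line in lines:
    print(f"[{last_tag(line) or ''}]")
```

Unlike the C# snippet, a failed search here returns None instead of an empty match object, so the helper checks for that before reading the group.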
