Why doesn't finite repetition in lookbehind work in some flavors?

Posted on 2024-09-08 01:17:49

I want to parse the 2 digits in the middle from a date in dd/mm/yy format but also allowing single digits for day and month.

This is what I came up with:

(?<=^[\d]{1,2}\/)[\d]{1,2}

I want a 1 or 2 digit number [\d]{1,2} with a 1 or 2 digit number and slash ^[\d]{1,2}\/ before it.

This doesn't work on many combinations, I have tested 10/10/10, 11/12/13, etc...

But to my surprise (?<=^\d\d\/)[\d]{1,2} worked.

But the [\d]{1,2} should also match if \d\d did, or am I wrong?


爱*していゐ 2024-09-15 01:17:49

On lookbehind support

Major regex flavors support lookbehind to varying degrees; some impose certain restrictions, and some don't support it at all.

JavaScript: not supported before ECMAScript 2018 (unrestricted since)
Python: fixed length only
Java: finite length only
.NET: no restriction

References

regular-expressions.info/Flavor comparison

Python

In Python, where only fixed length lookbehind is supported, your original pattern raises an error because \d{1,2} obviously does not have a fixed length. You can "fix" this by alternating on two different fixed-length lookbehinds, e.g. something like this:

(?<=^\d\/)\d{1,2}|(?<=^\d\d\/)\d{1,2}

Or perhaps you can put both lookbehinds as alternates of a non-capturing group:

(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}

(note that you can just use \d without the brackets).

That said, it's probably much simpler to use a capturing group instead:

^\d{1,2}\/(\d{1,2})

Note that findall returns what group 1 captures when the pattern has exactly one group. Capturing groups are more widely supported than lookbehind, and often lead to a more readable pattern (as in this case).

This snippet illustrates all of the above points:

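import re

# Two fixed-length lookbehinds, alternated inside a non-capturing group
p = re.compile(r'(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}')

print(p.findall("12/34/56"))   # ['34']
print(p.findall("1/23/45"))    # ['23']

# Capturing group instead of lookbehind
p = re.compile(r'^\d{1,2}\/(\d{1,2})')

print(p.findall("12/34/56"))   # ['34']
print(p.findall("1/23/45"))    # ['23']

# Variable-length lookbehind: rejected when the pattern is compiled
try:
    p = re.compile(r'(?<=^\d{1,2}\/)\d{1,2}')
except re.error as e:
    print(e)                   # look-behind requires fixed-width pattern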
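
What findall returns depends on how many groups the pattern has. The following is a small sketch of that behaviour, assuming Python 3's standard re module; the sample text and the variable names are made up for illustration. re.MULTILINE is passed so that ^ matches at the start of every line.

```python
import re

text = "12/34/56 date\n1/23/45 another date\n"

# No group: findall returns the whole match.
whole = re.findall(r'^\d{1,2}/\d{1,2}', text, re.MULTILINE)

# One group: findall returns only what that group captured.
months = re.findall(r'^\d{1,2}/(\d{1,2})', text, re.MULTILINE)

# Several groups: findall returns one tuple per match.
parts = re.findall(r'^(\d{1,2})/(\d{1,2})/(\d{2})', text, re.MULTILINE)

print(whole)   # ['12/34', '1/23']
print(months)  # ['34', '23']
print(parts)   # [('12', '34', '56'), ('1', '23', '45')]
```

If you need both the whole match and the group, finditer with match objects is probably the more convenient choice.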

References

regular-expressions.info/Lookarounds, Character classes, Alternation, Capturing groups

Java

Java supports only finite-length lookbehind, so you can use \d{1,2} like in the original pattern. This is demonstrated by the following snippet:

    String text =
        "12/34/56 date\n" +
        "1/23/45 another date\n";

    Pattern p = Pattern.compile("(?m)(?<=^\\d{1,2}/)\\d{1,2}");
    Matcher m = p.matcher(text);
    while (m.find()) {
        System.out.println(m.group());
    } // "34", "23"

Note that (?m) is the embedded Pattern.MULTILINE so that ^ matches the start of every line. Note also that since \ is an escape character for string literals, you must write "\\" to get one backslash in Java.

C-Sharp

C# supports full regex syntax inside lookbehind. The following snippet shows how you can use + repetition in a lookbehind:

var text = @"
1/23/45
12/34/56
123/45/67
1234/56/78
";

Regex r = new Regex(@"(?m)(?<=^\d+/)\d{1,2}");
foreach (Match m in r.Matches(text)) {
  Console.WriteLine(m);
} // "23", "34", "45", "56"

Note that unlike Java, C# lets you use an @-quoted (verbatim) string so that you don't have to escape \.

For completeness, here's how you'd use the capturing group option in C#:

Regex r = new Regex(@"(?m)^\d+/(\d{1,2})");
foreach (Match m in r.Matches(text)) {
  Console.WriteLine("Matched [" + m + "]; month = " + m.Groups[1]);
}

Given the previous text, this prints:

Matched [1/23]; month = 23
Matched [12/34]; month = 34
Matched [123/45]; month = 45
Matched [1234/56]; month = 56

Related questions

How can I match on, but exclude a regex pattern?


找回味觉 2024-09-15 01:17:49

Unless there's a specific reason for using the lookbehind which isn't noted in the question, how about simply matching the whole thing and only capturing the bit you're interested in instead?

JavaScript example:

>>> /^\d{1,2}\/(\d{1,2})\/\d{1,2}$/.exec("12/12/12")[1]
"12"
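
The same idea can be sketched in Python, assuming the standard re module. The names DATE and middle_part are hypothetical, chosen only for this illustration; fullmatch plays the role of the ^...$ anchors in the JavaScript pattern.

```python
import re

# Same shape as the JavaScript pattern, with named groups for readability.
DATE = re.compile(r'(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{1,2})')

def middle_part(s):
    """Return the middle number of a d/m/y string, or None if s does not match."""
    m = DATE.fullmatch(s)
    return m.group('month') if m else None

print(middle_part("12/12/12"))  # 12
print(middle_part("1/2/13"))    # 2
print(middle_part("123/4/56"))  # None
```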


不疑不惑不回忆 2024-09-15 01:17:49

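Quoting regular-expressions.info:

The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regex backwards. Therefore, the regular expression engine needs to be able to figure out how many steps to step back before checking the lookbehind. Therefore, many regex flavors, including those used by Perl and Python, only allow fixed-length strings. You can use any regex of which the length of the match can be predetermined. This means you can use literal text and character classes. You cannot use repetition or optional items. You can use alternation, but only if all options in the alternation have the same length.

In other words, your regex doesn't work because you're using a variable-width expression in a lookbehind and your regex engine doesn't support that.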
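
To see where one fixed-length engine draws the line, here is a minimal sketch that asks Python's standard re module which lookbehinds it will compile. The helper name compiles and the candidate patterns are made up for this illustration. One detail worth noting: Python accepts an exact repeat count such as \d{2}, since its width is still known in advance.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

candidates = [
    r'(?<=abc)x',        # literal text: fixed width
    r'(?<=[0-9]/)x',     # character class plus literal: fixed width
    r'(?<=\d{2}/)x',     # exact repeat count: still fixed width
    r'(?<=\d{1,2}/)x',   # variable repeat count: rejected
    r'(?<=\d?/)x',       # optional item: rejected
    r'(?<=ab|cd)x',      # alternatives of equal length: accepted
    r'(?<=a|bc)x',       # alternatives of different length: rejected
]

for pattern in candidates:
    print(pattern, compiles(pattern))
```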


爱已欠费 2024-09-15 01:17:49

In addition to those listed by @polygenelubricants, there are two more exceptions to the "fixed length only" rule. In PCRE (the regex engine for PHP, Apache, et al) and Oniguruma (Ruby 1.9, Textmate), a lookbehind may consist of an alternation in which each alternative may match a different number of characters, as long as the length of each alternative is fixed. For example:

(?<=\b\d\d/|\b\d/)\d{1,2}(?=/\d{2}\b)

Note that the alternation has to be at the top level of the lookbehind subexpression. You might, like me, be tempted to factor out the common elements, like this:

(?<=\b(?:\d\d|\d)/)\d{1,2}(?=/\d{2}\b)

...but it wouldn't work; at the top level, the subexpression now consists of a single alternative with a non-fixed length.

The second exception is much more useful: \K, supported by Perl and PCRE. It effectively means "pretend the match really started here." Whatever appears before it in the regex is treated as a positive lookbehind. As with .NET lookbehinds, there are no restrictions; whatever can appear in a normal regex can be used before the \K.

\b\d{1,2}/\K\d{1,2}(?=/\d{2}\b)
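
Python's standard re module does not implement \K, so there the closest equivalent is a capturing group. The sketch below assumes Python 3; the sample text and variable names are invented for illustration. It checks that \K is rejected and then reports the group's text and span, which is roughly what \K would report as the match.

```python
import re

# \K is not a recognised escape in the standard re module.
try:
    re.compile(r'\d/\K\d')
    k_supported = True
except re.error:
    k_supported = False

# A capturing group stands in for \K: the prefix is matched,
# but only the group is reported.
pattern = re.compile(r'\b\d{1,2}/(\d{1,2})(?=/\d{2}\b)')

text = "due 1/23/45, shipped 12/34/56"
found = [(m.group(1), m.span(1)) for m in pattern.finditer(text)]

print(k_supported)  # False
print(found)        # [('23', (6, 8)), ('34', (24, 26))]
```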

But most of the time, when someone has a problem with lookbehinds, it turns out they shouldn't even be using them. As @insin pointed out, this problem can be solved much more easily by using a capturing group.

EDIT: Almost forgot JGSoft, the regex flavor used by EditPad Pro and PowerGrep; like .NET, it has completely unrestricted lookbehinds, positive and negative.
