Why doesn't finite repetition in lookbehind work in some flavors?
I want to parse the 2 digits in the middle from a date in dd/mm/yy format but also allowing single digits for day and month.
This is what I came up with:
(?<=^[\d]{1,2}\/)[\d]{1,2}
I want a 1 or 2 digit number [\d]{1,2} with a 1 or 2 digit number and slash ^[\d]{1,2}\/ before it.
This doesn't work on many combinations; I have tested 10/10/10, 11/12/13, etc.
But to my surprise (?<=^\d\d\/)[\d]{1,2} worked.
But the [\d]{1,2} should also match if \d\d did, or am I wrong?
On lookbehind support
Major regex flavors differ in their support for lookbehind; some impose certain restrictions, and some don't support it at all.
Python
In Python, where only fixed-length lookbehind is supported, your original pattern raises an error because \d{1,2} obviously does not have a fixed length. You can "fix" this by alternating on two different fixed-length lookbehinds, one for a one-digit day and one for a two-digit day. Or perhaps you can put both lookbehinds as alternates of a non-capturing group (note that you can just use \d without the brackets).
That said, it's probably much simpler to use a capturing group instead. Note that findall returns what group 1 captures if you only have one group. Capturing groups are more widely supported than lookbehind, and often lead to a more readable pattern (such as in this case).
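The answer's original Python snippet is not preserved here, so the following is only a minimal sketch of the points above using the standard re module. The sample text and the variable names are assumptions made for illustration, and re.MULTILINE is passed so that ^ matches at the start of every line.

```python
import re

# Made-up sample input: one d/m/yy-style date per line.
text = "10/10/10\n1/2/13\n11/12/13\n3/14/15"

# 1. The original variable-length lookbehind is rejected at compile time.
try:
    re.compile(r"(?<=^\d{1,2}/)\d{1,2}")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# 2. Alternate on two fixed-length lookbehinds (one-digit day, two-digit day).
alternated = re.compile(r"(?<=^\d/)\d{1,2}|(?<=^\d\d/)\d{1,2}", re.MULTILINE)

# 3. The same idea, with both lookbehinds as alternatives of a non-capturing group.
grouped = re.compile(r"(?:(?<=^\d/)|(?<=^\d\d/))\d{1,2}", re.MULTILINE)

# 4. No lookbehind at all: match the prefix and capture only the middle number.
captured = re.compile(r"^\d{1,2}/(\d{1,2})", re.MULTILINE)

print(lookbehind_error)            # the error message from the re module
print(alternated.findall(text))    # whole matches, since there is no group
print(grouped.findall(text))       # whole matches, the group is non-capturing
print(captured.findall(text))      # contents of group 1 only
```

All three working patterns should produce the same list of middle numbers; the capturing-group version is the shortest and does not depend on any lookbehind support.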
Java
Java supports only finite-length lookbehind, so you can use \d{1,2} like in the original pattern.
If you prefix the pattern with (?m), the embedded form of Pattern.MULTILINE, then ^ matches the start of every line. Note also that since \ is an escape character for string literals, you must write "\\" to get one backslash in Java.
C-Sharp
C# supports full regex in lookbehind, so you can even use + repetition inside a lookbehind. Note that unlike Java, in C# you can use an @-quoted string so that you don't have to escape \.
For completeness, the capturing group option works in C# as well.
Unless there's a specific reason for using the lookbehind which isn't noted in the question, how about simply matching the whole thing and only capturing the bit you're interested in instead?
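The answer's JavaScript snippet is not preserved here. As a rough equivalent, here is a small Python sketch of matching the whole date and capturing only the middle number; the names DATE and middle_number are hypothetical, and the two-digit year is assumed from the dd/mm/yy format in the question.

```python
import re

# Match the whole date, but capture only the middle (month) part.
# Assumes a two-digit year, as in the question's dd/mm/yy format.
DATE = re.compile(r"^\d{1,2}/(\d{1,2})/\d{2}$")


def middle_number(date_text):
    """Return the middle number of a d/m/yy date as a string, or None if it does not match."""
    match = DATE.match(date_text)
    return match.group(1) if match else None


print(middle_number("10/10/10"))
print(middle_number("1/2/13"))
print(middle_number("not a date"))
```

Because the whole string is validated in one pass, a malformed date yields None instead of a partial match.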
To paraphrase regular-expressions.info: your regex does not work because you're using a variable-width expression inside a lookbehind and your regex engine does not support that.
In addition to those listed by @polygenelubricants, there are two more exceptions to the "fixed length only" rule. In PCRE (the regex engine for PHP, Apache, et al.) and Oniguruma (Ruby 1.9, TextMate), a lookbehind may consist of an alternation in which each alternative may match a different number of characters, as long as the length of each alternative is fixed.
Note that the alternation has to be at the top level of the lookbehind subexpression. You might, like me, be tempted to factor out the common elements, but that wouldn't work; at the top level, the subexpression would then consist of a single alternative with a non-fixed length.
The second exception is much more useful: \K, supported by Perl and PCRE. It effectively means "pretend the match really started here." Whatever appears before it in the regex is treated as a positive lookbehind. As with .NET lookbehinds, there are no restrictions; whatever can appear in a normal regex can be used before the \K.
But most of the time, when someone has a problem with lookbehinds, it turns out they shouldn't even be using them. As @insin pointed out, this problem can be solved much more easily by using a capturing group.
EDIT: Almost forgot JGSoft, the regex flavor used by EditPad Pro and PowerGrep; like .NET, it has completely unrestricted lookbehinds, positive and negative.