Backreferences in lookbehind
Can you use backreferences in a lookbehind?
Let's say I want to split a string at every position where the two characters immediately behind me are the same character repeated.
String REGEX1 = "(?<=(.)\\1)"; // DOESN'T WORK!
String REGEX2 = "(?<=(?=(.)\\1)..)"; // WORKS!
System.out.println(java.util.Arrays.toString(
"Bazooka killed the poor aardvark (yummy!)"
.split(REGEX2)
)); // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"
Using REGEX2 (where the backreference is in a lookahead nested inside a lookbehind) works, but REGEX1 gives this error at run-time:
Look-behind group does not have an obvious maximum length near index 8
(?<=(.)\1)
        ^
This sort of makes sense, I suppose, because in general a backreference can capture a string of any length (although a slightly smarter regex compiler could determine that \1 refers to (.) in this case, and therefore has a finite length).
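For anyone who wants to reproduce this, here is a minimal sketch that compiles both patterns and reports what the compiler says. The class name LookbehindCompileCheck and the helper compileError are made up for illustration; the exact message for REGEX1 may differ between JDK versions, so the sketch just prints whatever it gets.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of any library.
class LookbehindCompileCheck {
    // Returns null if the regex compiles, otherwise the compiler's error description.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String regex1 = "(?<=(.)\\1)";        // backreference directly in the lookbehind
        String regex2 = "(?<=(?=(.)\\1)..)";  // backreference in a nested lookahead

        // Report the compile result for each pattern (null means it compiled).
        System.out.println("REGEX1: " + compileError(regex1));
        System.out.println("REGEX2: " + compileError(regex2));

        // Only split with the pattern that is known to compile.
        String input = "Bazooka killed the poor aardvark (yummy!)";
        System.out.println(Arrays.toString(input.split(regex2)));
    }
}
```

The nested form works because the lookbehind body is just `..` plus a zero-width lookahead, so its maximum length is obviously 2.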
So is there a way to use a backreference in a lookbehind?
And if there isn't, can you always work around it using this nested lookahead? Are there other commonly-used techniques?
Comments (1)
Looks like your suspicion is correct that backreferences generally can't be used in Java lookbehinds. The workaround you proposed makes the finite length of the lookbehind explicit and looks very clever to me.
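As for other techniques: one option that sidesteps lookbehind entirely is to search for the doubled character with an ordinary `(.)\1` pattern and cut the string at the end of each match. The sketch below is only an illustration (the class DoubledCharSplitter and the method splitAfterDoubled are invented names); it restarts the search one character after each match start so that overlapping runs such as "aaa" are cut the same way the lookbehind version would cut them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DoubledCharSplitter {
    // Any character followed by itself; no lookaround needed.
    private static final Pattern DOUBLED = Pattern.compile("(.)\\1");

    // Splits the input right after every pair of identical adjacent characters.
    static List<String> splitAfterDoubled(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = DOUBLED.matcher(input);
        int pieceStart = 0; // where the current piece begins
        int from = 0;       // where the next search begins
        while (m.find(from)) {
            parts.add(input.substring(pieceStart, m.end()));
            pieceStart = m.end();
            from = m.start() + 1; // step by one so overlapping pairs are also found
        }
        if (pieceStart < input.length()) {
            parts.add(input.substring(pieceStart)); // whatever is left after the last cut
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitAfterDoubled("Bazooka killed the poor aardvark (yummy!)"));
        // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"
    }
}
```

One difference from String.split: for an empty input this returns an empty list rather than a single empty string.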
I was intrigued to find out what Python does with this regex. Python only supports fixed-length lookbehind, not finite-length like Java, but this regex is fixed length. I couldn't use re.split() directly because Python's re.split() never splits on an empty match, but I think I found a bug in re.sub(): the lookbehind matches between the two duplicate characters!