Backreferences in lookbehind

Posted on 2024-08-30 07:56:44

Can you use backreferences in a lookbehind?

Let's say I want to split at every position that is immediately preceded by the same character twice in a row.

    String REGEX1 = "(?<=(.)\\1)"; // DOESN'T WORK!
    String REGEX2 = "(?<=(?=(.)\\1)..)"; // WORKS!

    System.out.println(java.util.Arrays.toString(
        "Bazooka killed the poor aardvark (yummy!)"
        .split(REGEX2)
    )); // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"

Using REGEX2 (where the backreference is in a lookahead nested inside a lookbehind) works, but REGEX1 gives this error at run-time:

Look-behind group does not have an obvious maximum length near index 8
(?<=(.)\1)
        ^

This sort of makes sense, I suppose, because in general a backreference can match a string of any length (a slightly smarter regex compiler, though, could determine that \1 refers to (.) in this case and therefore has a finite length).
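A minimal sketch for reproducing both behaviours in isolation. The class name `LookbehindCompileCheck` and the helper `compiles` are made up for illustration. The sketch assumes the error surfaces as a `PatternSyntaxException` from `Pattern.compile` (which `String.split` calls internally for patterns like these); the exact message text and index may differ between JDK versions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookbehindCompileCheck {
    // Backreference directly inside the lookbehind.
    static final String REGEX1 = "(?<=(.)\\1)";
    // Backreference moved into a lookahead nested in the lookbehind.
    static final String REGEX2 = "(?<=(?=(.)\\1)..)";

    /** Returns true if java.util.regex accepts the pattern, false if it rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Report why the pattern was rejected instead of letting the exception escape.
            System.out.println("rejected " + regex + ": " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("REGEX1 compiles: " + compiles(REGEX1));
        System.out.println("REGEX2 compiles: " + compiles(REGEX2));
        System.out.println(Arrays.toString(
            "Bazooka killed the poor aardvark (yummy!)".split(REGEX2)));
    }
}
```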

So is there a way to use a backreference in a lookbehind?

And if there isn't, can you always work around it using this nested lookahead? Are there other commonly-used techniques?

Comments (1)

孤星 2024-09-06 07:56:44

Looks like your suspicion is correct that backreferences generally can't be used in Java lookbehinds. The workaround you proposed makes the finite length of the lookbehind explicit and looks very clever to me.
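To illustrate why the workaround makes the length explicit, here is a hedged sketch that generalizes it from "twice" to a run of n identical characters. The class `RepeatSplit` and the method `afterRun` are hypothetical names, not anything from the question. The lookahead carries the backreference, and the trailing `.{n}` is what gives the lookbehind a width the compiler can see.

```java
import java.util.Arrays;

class RepeatSplit {
    /**
     * Builds a zero-width pattern that matches at positions preceded by
     * n copies of the same character (n >= 2).
     */
    static String afterRun(int n) {
        if (n < 2) {
            throw new IllegalArgumentException("n must be at least 2");
        }
        // (?=(.)\1{n-1}) checks for the run while looking forward;
        // .{n} then consumes exactly n characters, so the lookbehind has a fixed width.
        return "(?<=(?=(.)\\1{" + (n - 1) + "}).{" + n + "})";
    }

    public static void main(String[] args) {
        System.out.println(afterRun(2));
        System.out.println(Arrays.toString(
            "Bazooka killed the poor aardvark (yummy!)".split(afterRun(2))));
        System.out.println(Arrays.toString("Bazoooka".split(afterRun(3))));
    }
}
```

For n = 2 this produces a pattern equivalent to REGEX2, with `.{2}` in place of `..` and an explicit `{1}` on the backreference.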

I was intrigued to find out what Python does with this regex. Python supports only fixed-length lookbehind, not merely finite-length lookbehind as Java does, but this regex is fixed-length. I couldn't use re.split() directly because Python's re.split() never splits on an empty match, but I think I found a bug in re.sub():

    >>> import re
    >>> r = re.compile("(?<=(.)\\1)")
    >>> a = re.sub(r, "|", "Bazooka killed the poor aardvark (yummy!)")
    >>> a
    'Bazo|oka kil|led the po|or a|ardvark (yum|my!)'

The lookbehind matches between the two duplicate characters!
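For comparison, the split points the question is after (just past each doubled character, not between the two copies) can be computed without any lookbehind. The following is a sketch in Java, the question's language, and the names `DoubleCharSplitter` and `splitAfterDoubled` are hypothetical. It scans with a plain lookahead and cuts two characters after each match start, dropping a trailing empty piece the way `String.split` does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubleCharSplitter {
    // Zero-width lookahead: matches at the start of every doubled character.
    private static final Pattern DOUBLED = Pattern.compile("(?=(.)\\1)");

    /** Splits the input immediately after each pair of identical adjacent characters. */
    static List<String> splitAfterDoubled(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = DOUBLED.matcher(input);
        int last = 0;
        while (m.find()) {
            // The pair starts at m.start(), so the cut point is two characters later.
            int cut = m.start() + 2;
            parts.add(input.substring(last, cut));
            last = cut;
        }
        // Keep the remainder unless it is an empty trailing piece.
        if (last < input.length() || parts.isEmpty()) {
            parts.add(input.substring(last));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitAfterDoubled("Bazooka killed the poor aardvark (yummy!)"));
    }
}
```

Because the lookahead is zero-width, `find()` moves forward one character after each match, so overlapping pairs such as those in "aaa" each produce their own cut point.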
