
Need regular expressions to match special cases

Posted on 2024-09-02 15:03:35

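I'm desperately searching for regular expressions that match these scenarios:

1) Match alternating chars

I have a string like "This is my foobababababaf string" - and I want to match "babababa"

The only thing I know is the length of the fragment to search for - I don't know what the chars/digits might be - but they are alternating.

I really have no clue where to start :(

2) Match combined groups

In a string like "This is my foobaafoobaaaooo string" - I want to match "aaaooo". As in 1), I don't know what the chars/digits might be. I only know that they will appear in two groups.

I experimented with (.)\1\1\1(.)\1\1\1 and things like that...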

海的爱人是光 2024-09-09 15:03:35

I think something like this is what you want.

For alternating characters:

(?=(.)(?!\1)(.))(?:\1\2){2,}

\0 will be the entire alternating sequence, \1 and \2 are the two (distinct) alternating characters.

For a run of N characters followed by a run of M characters, possibly separated by other characters (replace N and M with numbers here):

(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M}

\0 will be the entire match, including the infix. \1 is the character repeated (at least) N times, \2 is the character repeated (at least) M times.

Here's a test harness in Java.

import java.util.regex.*;

public class Regex3 {
    static String runNrunM(int N, int M) {
        return "(?=(.))\\1{N}.*?(?=(?!\\1)(.))\\2{M}"
            .replace("N", String.valueOf(N))
            .replace("M", String.valueOf(M));
    }
    static void dumpMatches(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        System.out.println(text + " <- " + pattern);
        while (m.find()) {
            System.out.println("  match");
            for (int g = 0; g <= m.groupCount(); g++) {
                System.out.format("    %d: [%s]%n", g, m.group(g));
            }
        }
    }
    public static void main(String[] args) {
        String[] tests = {
            "foobababababaf foobaafoobaaaooo",
            "xxyyyy axxayyyya zzzzzzzzzzzzzz"
        };
        for (String test : tests) {
            dumpMatches(test, "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");
        }
        for (String test : tests) {
            dumpMatches(test, runNrunM(3, 3));
        }
        for (String test : tests) {
            dumpMatches(test, runNrunM(2, 4));
        }
    }
}

This produces the following output:

foobababababaf foobaafoobaaaooo <- (?=(.)(?!\1)(.))(?:\1\2){2,}
  match
    0: [bababababa]
    1: [b]
    2: [a]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.)(?!\1)(.))(?:\1\2){2,}
foobababababaf foobaafoobaaaooo <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
  match
    0: [aaaooo]
    1: [a]
    2: [o]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
  match
    0: [yyyy axxayyyya zzz]
    1: [y]
    2: [z]
foobababababaf foobaafoobaaaooo <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
  match
    0: [xxyyyy]
    1: [x]
    2: [y]
  match
    0: [xxayyyy]
    1: [x]
    2: [y]

Explanation

(?=(.)(?!\1)(.))(?:\1\2){2,} has two parts
- (?=(.)(?!\1)(.)) establishes \1 and \2 using lookahead
  - Nested negative lookahead ensures that \1 != \2
  - Using lookahead to capture lets \0 have the entire match (instead of just the "tail" end)
- (?:\1\2){2,} matches the \1\2 sequence, which must repeat at least twice (the group is non-capturing).
(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} has three parts
- (?=(.))\1{N} captures \1 in a lookahead, and then matches it N times
  - Using lookahead to capture means the repetition can be N instead of N-1
- .*? allows an infix to separate the two runs; it is reluctant, which keeps the infix as short as possible
- (?=(?!\1)(.))\2{M}
  - Similar to first part
  - Nested negative lookahead ensures that \1 != \2
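
To illustrate the role of the nested negative lookahead, here is a minimal Java sketch that compares the alternating-character pattern with a variant that omits (?!\1). The class name, the WITHOUT_CHECK variant, and the firstMatch helper are made up for this illustration and are not part of the harness above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DistinctLookaheadDemo {
    // Pattern from the answer: the two alternating characters must differ.
    static final Pattern WITH_CHECK =
        Pattern.compile("(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");

    // Hypothetical variant without the nested negative lookahead,
    // shown only for comparison.
    static final Pattern WITHOUT_CHECK =
        Pattern.compile("(?=(.)(.))(?:\\1\\2){2,}");

    // Returns the first match of the pattern in text, or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(WITH_CHECK, "zzzzzz"));         // null
        System.out.println(firstMatch(WITHOUT_CHECK, "zzzzzz"));      // zzzzzz
        System.out.println(firstMatch(WITH_CHECK, "foobababababaf")); // bababababa
    }
}
```

Without the check, a plain run such as "zzzzzz" counts as an "alternating" sequence of z and z; with it, only sequences of two distinct characters match.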

The run regex also matches inside longer runs, e.g. run(2,2) finds a match in "xxxyyy" (the third x is absorbed by the infix, and the last y is left unmatched):

xxxyyy <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{2}
  match
    0: [xxxyy]
    1: [x]
    2: [y]

Also, it does not allow overlapping matches. That is, there is only one run(2,3) in "xx11yyy222".

xx11yyy222 <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{3}
  match
    0: [xx11yyy]
    1: [x]
    2: [y]
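
If overlapping matches are wanted, one possible approach (not part of the answer itself) is to restart the search one character after the start of each match, using Matcher.find(int). The sketch below assumes that approach; the class name and the overlappingMatches helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingRuns {
    // Builds the same pattern as runNrunM in the harness above.
    static String runNrunM(int n, int m) {
        return "(?=(.))\\1{" + n + "}.*?(?=(?!\\1)(.))\\2{" + m + "}";
    }

    // Hypothetical helper: collects matches that may overlap by restarting
    // the search one character after the start of the previous match.
    static List<String> overlappingMatches(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        int from = 0;
        while (from <= text.length() && m.find(from)) {
            found.add(m.group());
            from = m.start() + 1;
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(overlappingMatches("xx11yyy222", runNrunM(2, 3)));
    }
}
```

For "xx11yyy222" this reports [xx11yyy, 11yyy, yyy222, yy222]. Whether all of these count as separate hits depends on the use case, so treat the helper as a starting point only.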


沉睡月亮 2024-09-09 15:03:35

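Assuming you use perl/PCRE:

(.{2})\1+ or ((.)(?!\2)(.))\1+. The second regex prevents matching things like oooo.

UPD: Then 2. would be ((.)\2{N}).*?((?!\2)(.)\4{M}). Replace N and M with n-1 and m-1, since each group already consumes one character of its run before the quantifier applies. Remove (?!\2) if you want to get matches like oooaoooo.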
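
The answer assumes perl/PCRE, but these patterns use only backreferences and negative lookahead, which java.util.regex also supports. The following Java sketch tries them on the strings from the question; the class name and the firstMatch and runs helpers are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondAnswerDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    // Builds the run pattern for runs of n and m characters. The quantifiers
    // are n-1 and m-1 because (.) already consumes one character of each run.
    static String runs(int n, int m) {
        return "((.)\\2{" + (n - 1) + "}).*?((?!\\2)(.)\\4{" + (m - 1) + "})";
    }

    public static void main(String[] args) {
        // The first pattern accepts a pair of identical characters...
        System.out.println(firstMatch("(.{2})\\1+", "oooo"));          // oooo
        // ...while the second requires the two characters to differ.
        System.out.println(firstMatch("((.)(?!\\2)(.))\\1+", "oooo")); // null
        System.out.println(firstMatch("((.)(?!\\2)(.))\\1+",
            "This is my foobababababaf string"));                      // bababababa
        System.out.println(firstMatch(runs(3, 3),
            "This is my foobaafoobaaaooo string"));                    // aaaooo
    }
}
```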


℉絮湮 2024-09-09 15:03:35

Well, this works for the first one...

((.)(.))(\2\3)+


Saygoodbye 2024-09-09 15:03:35

Examples in JavaScript:

a = "This is my foobababababaf string"

console.log(a.replace(/(.)(.)(\1\2)+/, "<<$&>>"))

a = "This is my foobaafoobaaaooo string"

console.log(a.replace(/(.)\1+(.)\2+/, "<<$&>>"))
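
For comparison with the Java harness earlier in the thread, the same two replacements can be sketched in Java, where $0 in the replacement string refers to the whole match. The class name and the mark helper are hypothetical; String.replaceFirst is used because a JavaScript regex without the g flag replaces only the first match.

```java
public class ReplaceDemo {
    // Wraps the first match of regex in text with << and >>.
    static String mark(String text, String regex) {
        return text.replaceFirst(regex, "<<$0>>");
    }

    public static void main(String[] args) {
        // Alternating characters (note: this pattern does not require
        // the two characters to be different).
        System.out.println(mark("This is my foobababababaf string",
            "(.)(.)(\\1\\2)+"));
        // Two adjacent runs of repeated characters.
        System.out.println(mark("This is my foobaafoobaaaooo string",
            "(.)\\1+(.)\\2+"));
    }
}
```

This prints "This is my foo<<bababababa>>f string" and "This is my foobaafoob<<aaaooo>> string".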
