Saving a substring using regular expressions

Posted on 2024-09-05 05:04:42

I'm new to regular expressions in Java (or in any language, for that matter), and I want to use them to do a find. The tricky part that I don't understand is how to replace something inside the string that matches.

For example, if the line I'm looking for is

Person item6 [can {item thing [wrap]}]

I can write a regex that finds that line; my problem is finding out what the word in the position of "thing" is, since it may differ from line to line. I may want to either replace that word with something else or save it in a variable for later. Is there an easy way to do this using Java's regex engine?

Comments (2)

撩起发的微风 2024-09-12 05:04:42

Yes. You wrap the part of the regular expression that matches the interesting word in a "capturing group", which is just a pair of parentheses ( ) around it.

Here is an example:

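import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WidgetCount {
    public static void main(String[] args) {
        // The parentheses form capturing group 1 around the digits.
        Pattern pat = Pattern.compile("testing (\\d+) widgets");
        String text = "testing 5 widgets";
        Matcher matcher = pat.matcher(text);

        // matches() succeeds only if the entire input matches the pattern.
        if (matcher.matches()) {
            System.out.println("Widgets tested : " + matcher.group(1));
        } else {
            System.out.println("No match");
        }
    }
}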

Pattern and Matcher come from java.util.regex. The String class has some shortcuts (replaceAll, for example), but using Pattern and Matcher directly is the most flexible approach.
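
Applied to the line from the question, the same idea might look like the sketch below. The class name ItemWordExtractor and the method extractWord are made up for illustration, and the pattern assumes the word of interest always follows the literal text "{item " and consists of word characters. It uses find() rather than matches(), because the pattern describes only part of the line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ItemWordExtractor {
    // Assumed shape: the word to capture directly follows the literal "{item ".
    private static final Pattern ITEM_WORD = Pattern.compile("\\{item (\\w+)");

    /** Returns what group 1 captured, or empty if the line has no "{item ..." part. */
    public static Optional<String> extractWord(String line) {
        Matcher m = ITEM_WORD.matcher(line);
        // find() searches inside the line; matches() would require the whole line to match.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "Person item6 [can {item thing [wrap]}]";
        // Save the captured word in a variable for later use.
        String word = extractWord(line).orElse("(none)");
        System.out.println(word); // prints "thing"
    }
}
```

Returning an Optional is just one way to signal "no match"; returning null or throwing would work as well.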

伤痕我心 2024-09-12 05:04:42

The problem specification isn't very clear, but here are some ideas that may work:

Use lookarounds and replaceAll/replaceFirst

The following regex matches a \w+ that is preceded by the string "{item " and followed by the string " [". Because lookarounds are zero-width, the match itself consists of the \w+ only. The metacharacters { and [ are escaped as necessary.

String text =
    "Person item6 [can {item thing [wrap]}]\n" +
    "Cat item7 [meow meow {item thang [purr]}]\n" +
    "Dog item8 [maybe perhaps {itemmmm thong [woof]}]" ;

String LOOKAROUND_REGEX = "(?<=\\{item )\\w+(?= \\[)";

System.out.println(
    text.replaceAll(LOOKAROUND_REGEX, "STUFF")
);

This prints:

Person item6 [can {item STUFF [wrap]}]
Cat item7 [meow meow {item STUFF [purr]}]
Dog item8 [maybe perhaps {itemmmm thong [woof]}]
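
The same regex also works with String.replaceFirst, which replaces only the first match and leaves later ones alone. A minimal sketch, with a made-up class name and helper method:

```java
public class ReplaceFirstDemo {
    // Same lookaround regex as above: matches only the word between "{item " and " [".
    static final String LOOKAROUND_REGEX = "(?<=\\{item )\\w+(?= \\[)";

    /** Replaces only the first matching word in the text. */
    public static String replaceFirstWord(String text, String replacement) {
        return text.replaceFirst(LOOKAROUND_REGEX, replacement);
    }

    public static void main(String[] args) {
        String text =
            "Person item6 [can {item thing [wrap]}]\n" +
            "Cat item7 [meow meow {item thang [purr]}]";
        // Only "thing" on the first line is replaced; "thang" is left untouched.
        System.out.println(replaceFirstWord(text, "STUFF"));
    }
}
```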

Use capturing groups instead of lookarounds

Lookarounds should be used judiciously. Lookbehind in particular is quite limited in Java. A more commonly applied technique is to match more than just the interesting part and use capturing groups to tell the pieces apart.

The following regex matches a pattern similar to the one before: it still matches the \w+, but the match now also includes the "{item " prefix and the " [" suffix. Additionally, the m in item can repeat without limit (something that can't be matched in a lookbehind in Java).

String CAPTURING_REGEX = "(\\{item+ )(\\w+)( \\[)";

System.out.println(
    text.replaceAll(CAPTURING_REGEX, "$1STUFF$3")
);

This prints:

Person item6 [can {item STUFF [wrap]}]
Cat item7 [meow meow {item STUFF [purr]}]
Dog item8 [maybe perhaps {itemmmm STUFF [woof]}]

Our pattern has 3 capturing groups:

(\{item+ )(\w+)( \[)
\________/\___/\___/
 group 1    2    3

Note that we can't simply replace what we matched with "STUFF", because we match some "extraneous" parts. We're not interested in replacing them, so we capture these parts and just put them back in the replacement string. The way we refer to what a group captured in replacement strings in Java is to use the $ sigil; thus the $1 and $3 in the above example.
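
Two related details of replacement strings may be worth a sketch. Since Java 7, groups can be named with (?<name>...) and referenced as ${name} in the replacement, and Matcher.quoteReplacement escapes any $ or \ in text that should be inserted literally. The class name, method name, and group names below are illustrative only.

```java
import java.util.regex.Matcher;

public class ReplacementStringDemo {
    // Same structure as the three-group pattern above, with named groups instead of numbers.
    static final String NAMED_REGEX = "(?<prefix>\\{item+ )(?<word>\\w+)(?<suffix> \\[)";

    /** Replaces the captured word with the given text, inserted literally. */
    public static String replaceWord(String text, String literalReplacement) {
        // Without quoting, a '$' in the new word would be read as a group reference.
        String safe = Matcher.quoteReplacement(literalReplacement);
        return text.replaceAll(NAMED_REGEX, "${prefix}" + safe + "${suffix}");
    }

    public static void main(String[] args) {
        String text = "Person item6 [can {item thing [wrap]}]";
        System.out.println(replaceWord(text, "$5 bill"));
        // prints: Person item6 [can {item $5 bill [wrap]}]
    }
}
```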

Use a Matcher for more flexibility

Not everything can be done with replacement strings. Java doesn't have postprocessing to capitalize a captured string, for example. In these more general replacement scenarios, you can use a Matcher loop like the following:

Matcher m = Pattern.compile(CAPTURING_REGEX).matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    System.out.println("Match found");
    for (int i = 0; i <= m.groupCount(); i++) {
        System.out.printf("Group %d captured <%s>%n", i, m.group(i));
    }
    m.appendReplacement(sb,
        String.format("%s%s %<s and more %<SS%s",
            m.group(1), m.group(2), m.group(3)
        )
    );
}
m.appendTail(sb);

System.out.println(sb.toString());

The above prints:

Match found
Group 0 captured <{item thing [>
Group 1 captured <{item >
Group 2 captured <thing>
Group 3 captured < [>
Match found
Group 0 captured <{item thang [>
Group 1 captured <{item >
Group 2 captured <thang>
Group 3 captured < [>
Match found
Group 0 captured <{itemmmm thong [>
Group 1 captured <{itemmmm >
Group 2 captured <thong>
Group 3 captured < [>
Person item6 [can {item thing thing and more THINGS [wrap]}]
Cat item7 [meow meow {item thang thang and more THANGS [purr]}]
Dog item8 [maybe perhaps {itemmmm thong thong and more THONGS [woof]}]
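
On Java 9 or later, a similar computed replacement can be written more compactly with the Matcher.replaceAll overload that takes a function. This is a sketch under that version assumption, not part of the answer above; the class and method names are made up, and it simply upper-cases the captured word.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionalReplaceDemo {
    // Same three-group pattern as CAPTURING_REGEX above.
    static final Pattern CAPTURING = Pattern.compile("(\\{item+ )(\\w+)( \\[)");

    /** Upper-cases the word in group 2 and keeps groups 1 and 3 as they were. */
    public static String upperCaseWords(String text) {
        // Matcher.replaceAll(Function<MatchResult, String>) was added in Java 9.
        // The function's result is still treated as a replacement string, so it is quoted.
        return CAPTURING.matcher(text).replaceAll(mr ->
            Matcher.quoteReplacement(
                mr.group(1) + mr.group(2).toUpperCase(Locale.ROOT) + mr.group(3)));
    }

    public static void main(String[] args) {
        String text = "Dog item8 [maybe perhaps {itemmmm thong [woof]}]";
        System.out.println(upperCaseWords(text));
        // prints: Dog item8 [maybe perhaps {itemmmm THONG [woof]}]
    }
}
```

The explicit appendReplacement/appendTail loop remains the option to reach for on older Java versions or when each match needs side effects such as logging.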
