Extract text with Java regex and provide reference support

Posted on 2025-02-11 22:16:38

I am having some trouble writing a method in Java. It extracts the text that matches a pattern and returns ALL the extractions. It works just like calling java.util.regex.Matcher's find() and then group():

Matcher matcher = pattern.matcher(fileContent);
StringBuilder sb = new StringBuilder();
// find() advances to the next match; matches() would test the whole input and never advance
while (matcher.find()) {
    sb.append(matcher.group()).append("\n");
}
return sb.toString();

However, I would like the extractions to be formatted with support for group references (dollar sign, $) and literal-character escaping (backslash, \), just like the replacement string in Matcher.replaceAll(replacement). For example:

fileContent = """
    aaabbcac aabb
    bcbcbbccc babba
""";
pattern = Pattern.compile("bb.*(.)(abb)");
extractionFormatter = "$1: $0, \\$$2"; // \\$ is a literal dollar sign, $2 is group 2

The expected output would be:

a: bbcac aabb, $abb
b: bbccc babb, $abb

I hope you understand what I am trying to do. Do you know of any existing library or method that can achieve this, so that I don't have to reinvent the wheel?

Comments (2)

短叹 2025-02-18 22:16:38

You can use the results method of the Matcher class (Java 9+), which returns a stream of MatchResults, to get all matches first. Then take each match as a string with MatchResult.group, apply String.replaceAll with the pattern as the regex and your extractionFormatter as the replacement, and finally join everything with newlines:

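import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String fileContent = "aaabbcac aabb\n" +
                     "bcbcbbccc babba";

Pattern pattern = Pattern.compile("bb.*(.)(abb)");
// \\$ is a literal dollar sign; the following $2 refers to group 2
String extractionFormatter = "$1: $0, \\$$2";

String output = pattern.matcher(fileContent)
                        .results()                 // stream of MatchResult, one per match
                        .map(MatchResult::group)   // text of the whole match
                        .map(s -> s.replaceAll(pattern.pattern(), extractionFormatter))
                        .collect(Collectors.joining(System.lineSeparator()));

System.out.println(output);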
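
One caveat with this approach: pattern.pattern() returns only the regex source, so any flags passed to Pattern.compile (for example Pattern.CASE_INSENSITIVE) are lost when String.replaceAll recompiles it. Below is a minimal sketch of a variant that reuses the compiled Pattern instead. The class name FlagSafeExtract, the helper extract, and the upper-case input are made up for illustration and are not part of the original answer.

```java
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FlagSafeExtract {
    // Hypothetical helper: formats every match of `pattern` in `input`
    // with a replaceAll-style template and joins the results with '\n'.
    static String extract(Pattern pattern, String input, String template) {
        return pattern.matcher(input)
                .results()                   // Java 9+: one MatchResult per match
                .map(MatchResult::group)     // text of each whole match
                // Re-match with the same compiled Pattern so its flags still apply.
                .map(s -> pattern.matcher(s).replaceAll(template))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("bb.*(.)(abb)", Pattern.CASE_INSENSITIVE);
        String input = "AAABBCAC AABB\nbcbcbbccc babba";
        System.out.println(extract(pattern, input, "$1: $0, \\$$2"));
    }
}
```

With the case-insensitive flag this prints "A: BBCAC AABB, $ABB" followed by "b: bbccc babb, $abb". Passing pattern.pattern() to String.replaceAll would leave the first, upper-case match unformatted, because the recompiled regex is case-sensitive.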
温柔一刀 2025-02-18 22:16:38

You can use String.replaceAll instead.

Note that replaceAll leaves every unmatched part of the input in place. To get only the desired output from the capture groups, the pattern therefore has to match (and thereby remove) all the text that should not appear in the result.
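
To see why the extra matching is needed, here is a small sketch that applies the question's pattern directly with String.replaceAll. The class name ReplaceAllLeftovers and the helper naive are hypothetical names used only for this illustration.

```java
class ReplaceAllLeftovers {
    // Applies the question's pattern and formatter with no extra matching.
    static String naive(String input) {
        // \\$ is a literal dollar sign; $0, $1 and $2 are group references.
        return input.replaceAll("bb.*(.)(abb)", "$1: $0, \\$$2");
    }

    public static void main(String[] args) {
        String fileContent = "aaabbcac aabb\nbcbcbbccc babba\n";
        // Text before and after each match is kept, so the result is polluted.
        System.out.print(naive(fileContent));
    }
}
```

This prints "aaaa: bbcac aabb, $abb" and "bcbcb: bbccc babb, $abba": the unmatched prefixes "aaa" and "bcbc" and the trailing "a" survive the replacement.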

Using a pattern that would give the desired output:

String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pattern = "(?m)^.*?(bb\\S*).*(.)(abb).*$";
String extractionFormatter = "$2: $1 $2$3, \\$$3";
System.out.print(fileContent.replaceAll(pattern, extractionFormatter));

Output

a: bbcac aabb, $abb
b: bbccc babb, $abb


Or use a StringBuilder, a Matcher, and a while loop:

String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pat = "bb.*(.)(abb)";
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(fileContent);
String extractionFormatter = "$1: $0, \\$$2";
StringBuilder sb = new StringBuilder();

while(matcher.find()) {
    sb.append(matcher.group().replaceAll(pat, extractionFormatter)).append("\n");
}
System.out.print(sb);
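
Re-matching the extracted text works for this pattern, but it can fail when the pattern depends on surrounding context (a lookbehind, for instance), because that context is gone once the match has been cut out. A possible alternative, sketched below under that assumption, is to let Matcher.appendReplacement (the StringBuilder overload, Java 9+) expand the template against the live match and keep only the expanded part. The names TemplateExtract and extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateExtract {
    // Hypothetical helper: expands `template` for each match, one per line,
    // without running the regex a second time on the extracted text.
    static String extract(Pattern pattern, String input, String template) {
        Matcher matcher = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        StringBuilder scratch = new StringBuilder();
        int appendPos = 0; // mirrors the matcher's internal append position
        while (matcher.find()) {
            scratch.setLength(0);
            // Appends input[appendPos, start) followed by the expanded template.
            matcher.appendReplacement(scratch, template);
            int skipped = matcher.start() - appendPos;
            // Keep only the expanded template, dropping the skipped input text.
            out.append(scratch, skipped, scratch.length()).append('\n');
            appendPos = matcher.end();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("bb.*(.)(abb)");
        String fileContent = "aaabbcac aabb\nbcbcbbccc babba\n";
        System.out.print(extract(pattern, fileContent, "$1: $0, \\$$2"));
    }
}
```

For the question's input this prints the same two lines as the expected output. Since the template is expanded against the original match, it should also handle a pattern such as "(?<=a)bb.*(.)(abb)", which would no longer match the extracted text on its own.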
