Java repetitive pattern matching (3)

Posted on 2024-12-10 05:28:55

I am trying to solve a simple Java regex matching problem, but I am still getting conflicting results (following up on this and that question).

More specifically, I am trying to match a repetitive text input, consisting of groups that are delimited by '|' (vertical bar) that may be directly preceded by underscore ('_'), especially if the groups are not empty (i.e., if no two consecutive | delimiters appear in the input).

An example of such input is:

Text group 1_|Text group 2_|||Text group 5_|||Text group 8

In addition, I need a way to verify that a match has occurred, in order to avoid applying the processing related to that input to other, totally different inputs that my application also processes, using different regular expressions.

To confirm that a regex works, I am using RegexPal.

After several tests, the closest to what I want are the following two Regular Expressions, suggested in the questions I quoted above:

1. (?:\||^)([^\\|]*) 
2. \G([^\|]+?)_?\||\G()\||\G([^\|]*)$

Using either of these, if I run a matcher.find() loop I get:

  • All the text groups, with the trailing underscore included, from Regex 1
  • All the text groups apart from the last, with no underscore but with 2 empty groups at the end, from Regex 2.

So, apparently Regex 2 is not correct (and RegexPal also does not show it as matching).

I could use Regex 1 and do some post-processing to remove the trailing underscore, although ideally I would like the regex to do that for me.

However, neither of the two aforementioned regular expressions returns true for matcher.matches(), whereas matcher.find() is always true even for totally irrelevant input (reasonable, since there will often be at least 1 matching group, even in other text).

I thus have two questions:

  1. Is there a correct (fully working) regex that excludes the trailing underscore?
  2. Is there any way of checking that only the correct regex has matched?

The code used to test Regex 1 is something like:

String input = "Text group 1_|Text group 2_|||Text group 5_|||Text group 8";

Matcher matcher = Pattern.compile("(?:\\||^)([^\\\\|]*)").matcher(input);

if (matcher.matches())
{
    System.out.println("Input MATCHED: " + input);

    while (matcher.find())
    {
        System.out.println("\t\t" + matcher.group(1));
    }

}
else
{
    System.out.println("\tInput NOT MATCHED: " + input);
}

Using the above code always results in "NOT MATCHED". Removing the if/else and only using matcher.find() does retrieve all text groups.

Comments (2)

下壹個目標 2024-12-17 05:28:55

The Matcher#matches method attempts to match the entire input sequence against the pattern, which is why you are getting the result "Input NOT MATCHED". See the documentation here: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#matches
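To make the difference concrete, here is a minimal sketch that runs Regex 1 from the question through both methods. The class name MatchesVsFind and its two helper methods are hypothetical names chosen for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

public class MatchesVsFind
{
    // Regex 1 from the question, written as a Java string literal.
    static final Pattern REGEX_1 = Pattern.compile("(?:\\||^)([^\\\\|]*)");

    // True only if one match of the pattern spans the entire input.
    static boolean matchesWhole(String input)
    {
        return REGEX_1.matcher(input).matches();
    }

    // True if the pattern matches at least once anywhere in the input.
    static boolean findsAny(String input)
    {
        return REGEX_1.matcher(input).find();
    }

    public static void main(String[] args)
    {
        String input = "Text group 1_|Text group 2_|||Text group 5_|||Text group 8";

        // A single match of Regex 1 stops at the first '|', so it cannot span the input.
        System.out.println("matches(): " + matchesWhole(input));

        // find() only needs one match somewhere, so it succeeds here...
        System.out.println("find():    " + findsAny(input));

        // ...and also on text that has nothing to do with the '|' format.
        System.out.println("find() on unrelated text: " + findsAny("something else"));
    }
}
```

Because a single match of Regex 1 covers at most one group, matches() can only succeed on input that contains no '|' at all, which is the opposite of what the validation step needs.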

If you want to exclude the trailing underscore you can use this regex (slight modification of what you already have)

(?:\\||^)([^\\\\|_]*)

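This works only if you are sure that _ appears nowhere except directly before |. The character class stops at the first _ it meets, so any text after an underscore inside a group is skipped.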
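The following sketch runs the modified regex in a find() loop and collects group 1 of every match. The class name UnderscoreExclusion, the extract helper, and the second input "a_b|c" are assumptions added for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnderscoreExclusion
{
    // The modified regex from the answer above: '_' is added to the excluded characters.
    static final Pattern MODIFIED = Pattern.compile("(?:\\||^)([^\\\\|_]*)");

    // Collects capture group 1 of every successive match.
    static List<String> extract(String input)
    {
        List<String> groups = new ArrayList<>();
        Matcher matcher = MODIFIED.matcher(input);
        while (matcher.find())
        {
            groups.add(matcher.group(1));
        }
        return groups;
    }

    public static void main(String[] args)
    {
        // Underscores only before '|': eight groups, empty ones included, no trailing '_'.
        System.out.println(extract("Text group 1_|Text group 2_|||Text group 5_|||Text group 8"));

        // Underscore inside a group: the text after '_' is dropped, leaving [a, c].
        System.out.println(extract("a_b|c"));
    }
}
```

The second call shows the limitation: with an underscore in the middle of a group, the remainder of that group ("b") never appears in the result.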

当梦初醒 2024-12-17 05:28:55

RegexPal is a JavaScript regex tool, and the Java and JavaScript regular expression languages differ. Consider testing with a Java regex tool instead.

This may be close to what you want: (?:([^_\|]+)_{0,1}+\|*)+

Edit: Code added.
In Java 6, the find() loop below prints each group.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it only makes the snippet compilable.
public class GroupMatchDemo
{
    public static void main(String[] args)
    {
        String input = "Text group 1_|Text group 2_|||Text group 5_|||Text group 8";
        // Repeated form: has to cover the whole input for matches() to succeed.
        Pattern pattern = Pattern.compile("(?:([^_\\|]+)_{0,1}+\\|*)+");
        // Single-repetition form: used with find() to walk through the groups.
        Pattern groupPattern = Pattern.compile("(?:([^_\\|]+)_{0,1}+\\|*)");

        Matcher matcher = pattern.matcher(input);
        if (matcher.matches())
        {
            System.out.println("matcher.matches() is true");

            // A repeated capturing group only retains its last iteration.
            int groupCount = matcher.groupCount();
            for (int index = 1; index <= groupCount; ++index)
            {
                System.out.print("group (pattern)[");
                System.out.print(index);
                System.out.print("]: ");
                System.out.println(matcher.group(index));
            }

            // Print each whole match, then the text captured without "_" or "|".
            Matcher groupMatcher = groupPattern.matcher(input);
            while (groupMatcher.find())
            {
                System.out.print("group (groupPattern):");
                System.out.println(groupMatcher.group());
                System.out.println(groupMatcher.group(1));
            }
        }
        else
        {
            System.out.println("No match");
        }
    }
}
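Note that the pattern above requires at least one character per captured group, so the empty groups between consecutive | delimiters do not show up in the find() loop. If their positions matter, one possible alternative is to validate the whole input with matches() first and then split on the delimiter. This is only a sketch under stated assumptions: the class PipeGroups, its methods, and the validity rule (at least one | in the input) are hypothetical and do not come from the question or the answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PipeGroups
{
    // Assumed validity rule (not from the original post): fields without '|',
    // separated by one or more '|' delimiters.
    static final Pattern WHOLE_INPUT = Pattern.compile("[^|]*(?:\\|[^|]*)+");

    // matches() must cover the entire input, so text without any '|' is rejected.
    static boolean isPipeDelimited(String input)
    {
        return WHOLE_INPUT.matcher(input).matches();
    }

    // Splits on '|' and removes a single trailing '_' from each field.
    static List<String> groups(String input)
    {
        List<String> result = new ArrayList<>();
        // The limit -1 keeps empty fields, including trailing ones.
        for (String field : input.split("\\|", -1))
        {
            result.add(field.endsWith("_") ? field.substring(0, field.length() - 1) : field);
        }
        return result;
    }

    public static void main(String[] args)
    {
        String input = "Text group 1_|Text group 2_|||Text group 5_|||Text group 8";
        if (isPipeDelimited(input))
        {
            System.out.println(groups(input));
        }
        else
        {
            System.out.println("No match");
        }
    }
}
```

Keeping validation and extraction separate means matches() answers the "is this my kind of input" question, while the split keeps every field in place, empty or not. A stricter validity rule could be substituted in WHOLE_INPUT if the real format is narrower.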