Repeated parsing (and grouping) with a regular expression

Posted on 2024-11-15 03:08:28

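I am trying to parse rules in Java and read their contents with a regex, but since I am very new to regex I have run into several problems.

First, I tried to parse a predicate with this regex (I don't know whether it is overly complex): "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)", which I now find to be completely wrong. A predicate should look like this (I am being lazy and not writing the complete expression): p(), p(?a), p(?a,?b,c,?d). The predicate name must be a string containing only alphabetic characters, and each parameter is either a string containing only alphabetic characters or such a string prefixed with ?.

Given the element p(a,b,c), I found two problems:

When I loop over the groups (using a Matcher), the only results are p(a,b,c), p, a and ,c. How can I also retrieve b?
How can I keep the , (comma) out of the group, given that the repetition has to include it?

Another case: when I input p(), why do I get a group whose value is null?

Any idea how to solve this?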


南城旧梦 2024-11-22 03:08:29

One of the "arg" values in your longest sample string is ?b?, which doesn't seem to match your description. Remove that and your regex matches all the samples, but that still leaves you with the problem of extracting the individual arguments. The easiest way to do that in Java is to capture all the arguments as one string, then split that string to break out the individual arguments.
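To see why the individual arguments are hard to extract with the original pattern, here is a minimal sketch that reproduces the behaviour reported in the question. In Java, a capturing group under a quantifier keeps only the text of its last iteration, and a group that never takes part in the match reports null. The class name `RepeatedGroupDemo` and the helper `groups` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The pattern from the question: group 3 sits under a '*' quantifier.
    static final Pattern ORIGINAL = Pattern.compile(
        "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)");

    // Returns groups 0..groupCount() for a full match, or an empty list if there is no match.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i)); // null when the group did not participate
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("p(a,b,c)")); // [p(a,b,c), p, a, ,c]
        System.out.println(groups("p()"));      // [p(), p, null, null]
    }
}
```

Group 3 matched ",b" and then ",c", so only ",c" survives; that is where b goes missing and why the comma stays attached.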

As @Tomalak said, your regex is pretty good; the only thing I can see wrong with it is the ? after the group representing the first argument. It should control the whole argument string, not just the first argument. I mean, if there's no first argument, there's no point looking for a second, third, etc., is there? Here's how I would do it:

(?:[?]?[a-zA-Z0-9]+(?:,[?]?[a-zA-Z0-9]+)*)?

That will match nothing, or one argument, or several arguments separated by commas, but it won't match (for example) ,a or ,?a,b, as your regex does. Here's the full regex in the form of a Java string literal:

"([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)"

The predicate name is captured in group #1 and the arguments are captured in group #2. If there are no arguments, group #2 will contain an empty string (not a null). Otherwise, you can break out the individual arguments by splitting it on commas.
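The two claims above (malformed lists such as ,a are rejected, and group #2 is an empty string rather than null for p()) can be checked with a short sketch. The class name `StrictArgsDemo` and its helper methods are hypothetical; the two patterns are the one from the question and the revised one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictArgsDemo {
    // Pattern from the question: the first argument is optional on its own.
    static final Pattern ORIGINAL = Pattern.compile(
        "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)");
    // Revised pattern: the whole argument list is optional as a unit.
    static final Pattern REVISED = Pattern.compile(
        "([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)");

    static boolean originalAccepts(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean revisedAccepts(String s) {
        return REVISED.matcher(s).matches();
    }

    // Returns group #2 of the revised pattern, or null if the input does not match at all.
    static String argsGroup(String s) {
        Matcher m = REVISED.matcher(s);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "p()", "p(?a,?b,c,?d)", "p(,a)", "p(,?a,b)" }) {
            System.out.printf("%-14s original=%b revised=%b%n",
                              s, originalAccepts(s), revisedAccepts(s));
        }
        System.out.println("group 2 for p(): \"" + argsGroup("p()") + "\"");
    }
}
```

The group is empty instead of null because the capturing parentheses themselves always participate in the match; only the non-capturing group inside them is optional.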

BTW, you can escape most metacharacters with backslashes (\?) or square brackets ([?]); you don't need to do both. If it's only the one character (i.e., not part of a real character class like [!.?]), I advise using backslashes. I know it's the same number of characters in Java, but I think the backslashes make it a little more self-documenting.
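As a small illustration of the escaping point, the sketch below compares the backslash form, the bracket form, and the doubled-up form from the question. The class and method names are invented for the example, and the `[a-z]+` tail is only there to give the patterns something to match.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Backslash escape: regex \? matches a literal question mark.
    static boolean backslash(String s) {
        return Pattern.matches("\\?[a-z]+", s);
    }

    // Character-class escape: regex [?] matches a literal question mark too.
    static boolean bracket(String s) {
        return Pattern.matches("[?][a-z]+", s);
    }

    // Both at once, as in the question: regex [\?] works but is redundant.
    static boolean both(String s) {
        return Pattern.matches("[\\?][a-z]+", s);
    }

    public static void main(String[] args) {
        for (String s : new String[] { "?a", "a" }) {
            System.out.printf("%-3s backslash=%b bracket=%b both=%b%n",
                              s, backslash(s), bracket(s), both(s));
        }
    }
}
```

All three forms accept ?a and reject a, so the choice between them is purely about readability.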

EDIT: Here's the code I used:

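import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added only so the snippet compiles on its own; the name is arbitrary.
public class PredicateDemo
{
  public static void main(String[] args)
  {
    String[] inputs = { "p()", "p(?a)", "p(?a,?b,c,?d)", "p(a,b,c)" };
    // Group 1 = predicate name, group 2 = the whole argument list (possibly empty).
    Pattern p = Pattern.compile(
        "([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)");

    for ( String s : inputs )
    {
      Matcher m = p.matcher(s);
      if ( m.matches() )  // the entire input must be a predicate
      {
        System.out.printf("%nFull match: %s%nPredicate name:%n  %s%n",
                          m.group(), m.group(1));
        String allArgs = m.group(2);
        if (allArgs.length() == 0)
        {
          System.out.println("No arguments");
        }
        else
        {
          System.out.println("Arguments:");
          // The regex has already validated the list, so a plain split is safe.
          for (String arg : allArgs.split(","))
          {
            System.out.printf("  %s%n", arg);
          }
        }
      }
    }
  }
}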
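If the arguments are needed as a list rather than printed, the same capture-then-split idea could be wrapped in a helper along these lines. This is only a sketch: `PredicateArgs` and `parseArgs` are hypothetical names, and returning null for input that is not a predicate is an arbitrary choice made for brevity. The explicit empty check matters because `"".split(",")` returns an array holding one empty string, not an empty array.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredicateArgs {
    static final Pattern PREDICATE = Pattern.compile(
        "([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)");

    // Returns the arguments, an empty list for "p()", or null if the input is not a predicate.
    static List<String> parseArgs(String input) {
        Matcher m = PREDICATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String allArgs = m.group(2); // "" (never null) when there are no arguments
        if (allArgs.isEmpty()) {
            return Collections.emptyList(); // avoid the [""] that split would produce
        }
        return Arrays.asList(allArgs.split(","));
    }

    public static void main(String[] args) {
        System.out.println(parseArgs("p(?a,?b,c,?d)")); // [?a, ?b, c, ?d]
        System.out.println(parseArgs("p()"));           // []
        System.out.println(parseArgs("p(,a)"));         // null
    }
}
```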
