Repeatedly parsing (and grouping) with a regular expression

Posted 2024-11-15 03:08:28

I tried to parse a rule using Java and read its contents with a regex, but since I am very new to regex, I ran into several problems.

First, I tried to parse a predicate with this regex (I don't know whether it is too complicated): "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)", and I just found that it is completely wrong... A predicate should look something like this (I am too lazy to write the complete expression): p(), p(?a), p(?a,?b,c,?d). The predicate name has to be a string containing alphabetic characters only, and each argument is a string that contains alphabetic characters only or begins with ?.

Given the element p(a,b,c), I found two problems here:

  1. When I loop over the groups to see each element (using Matcher), the only results are p(a,b,c), p, a, and ,c. How can I retrieve b as well?
  2. How can I keep the , (comma) out of the group, given that the repetition still has to include it?

In the other case, when I input p(), why do I get a group whose value is null?

Any idea how to fix this?

Comments (3)

南城旧梦 2024-11-22 03:08:29

One of the "arg" values in your longest sample string is ?b?, which doesn't seem to match your description. Remove that and your regex matches all the samples, but that still leaves you with the problem of extracting the individual arguments. The easiest way to do that in Java is to capture all the arguments as one string, then split that string to break out the individual arguments.

As @Tomalak said, your regex is pretty good; the only thing I can see wrong with it is the ? after the group representing the first argument. It should control the whole argument string, not just the first argument. I mean, if there's no first argument, there's no point looking for a second, third, etc., is there? Here's how I would do it:

(?:[?]?[a-zA-Z0-9]+(?:,[?]?[a-zA-Z0-9]+)*)?

That will match nothing, or one argument, or several arguments separated by commas, but it won't match (for example) ,a or ,?a,b, as your regex does. Here's the full regex in the form of a Java string literal:

"([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)"

The predicate name is captured in group #1 and the arguments are captured in group #2. If there are no arguments, group #2 will contain an empty string (not a null). Otherwise, you can break out the individual arguments by splitting it on commas.
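To make the difference between the two patterns concrete, here is a minimal sketch that runs the regex from the question and the revised regex over a few inputs. The class name RegexComparison and the helper accepts are made up for this illustration; the two patterns are copied from the text above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regex strings come from the thread.
final class RegexComparison {
    // Regex from the question: only the FIRST argument is optional.
    static final Pattern ORIGINAL = Pattern.compile(
        "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)");

    // Revised regex: the WHOLE argument list is optional.
    static final Pattern REVISED = Pattern.compile(
        "([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)");

    // True if the pattern matches the entire input.
    static boolean accepts(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "p()", "p(a,b,c)", "p(,a)", "p(,?a,b)" };
        for (String s : inputs) {
            System.out.printf("%-10s original=%b revised=%b%n",
                              s, accepts(ORIGINAL, s), accepts(REVISED, s));
        }
    }
}
```

Both patterns accept p() and p(a,b,c); only the original accepts the malformed inputs that start with a comma.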

BTW, you can escape most metacharacters with backslashes (\?) or square brackets ([?]); you don't need to do both. If it's only the one character (i.e., not part of a real character class like [!.?]), I advise using backslashes. I know it's the same number of characters in Java, but I think the backslashes make it a little more self-documenting.


EDIT: Here's the code I used:

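import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredicateDemo
{
  public static void main(String[] args)
  {
    String[] inputs = { "p()", "p(?a)", "p(?a,?b,c,?d)", "p(a,b,c)" };
    // Group 1: predicate name. Group 2: the whole (possibly empty) argument list.
    Pattern p = Pattern.compile(
        "([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)");

    for ( String s : inputs )
    {
      Matcher m = p.matcher(s);
      // matches() requires the regex to cover the entire input string.
      if ( m.matches() )
      {
        System.out.printf("%nFull match: %s%nPredicate name:%n  %s%n",
                          m.group(), m.group(1));
        String allArgs = m.group(2);
        // An empty argument list yields "" here, never null.
        if (allArgs.length() == 0)
        {
          System.out.println("No arguments");
        }
        else
        {
          System.out.println("Arguments:");
          // Break the captured list into individual arguments.
          for (String arg : allArgs.split(","))
          {
            System.out.printf("  %s%n", arg);
          }
        }
      }
    }
  }
}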
圈圈圆圆圈圈 2024-11-22 03:08:29

"There are two problems here I found, given element p(a,b,c)"

  1. You can't (easily) do such things with regex. (In Perl you could use some tricks to do that, though.)
  2. Use something like (?:,(\w+)), which keeps the comma outside the capturing group.
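A short sketch of both points, assuming the question's character classes instead of \w (the class name RepeatedGroupDemo and the method groups are invented for this example). The comma sits in a non-capturing group, so it no longer appears in the capture, but a capturing group inside a repetition still keeps only its last iteration in Java, so b remains out of reach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class RepeatedGroupDemo {
    // The comma is inside (?:...), so group 3 captures the argument only.
    static final Pattern PATTERN = Pattern.compile(
        "([a-zA-Z]+)\\((\\??[a-zA-Z0-9]+)?(?:,(\\??[a-zA-Z0-9]+))*\\)");

    // Returns group 0 through group N, or an empty list if nothing matched.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i)); // null when the group did not participate
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Group 3 holds only the LAST repetition, so "b" is not listed.
        System.out.println(groups("p(a,b,c)")); // prints [p(a,b,c), p, a, c]
    }
}
```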

"The other case, when I input p(), why did it get a group in which the element is null?"

Because the groups that are supposed to match the "parameters" did not take part in the match at all, so they are undefined (null in Java). This is how capturing groups work. You can pick or filter the ones you want after the match.
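As a rough illustration of filtering after the match, the sketch below applies the regex from the question and keeps only the groups that actually participated. The class name UnmatchedGroupDemo and the method matchedGroups are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regex string is the one from the question.
final class UnmatchedGroupDemo {
    static final Pattern ORIGINAL = Pattern.compile(
        "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)");

    // Collects groups 1..N, skipping those that did not participate (null).
    static List<String> matchedGroups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                String value = m.group(i);
                if (value != null) {
                    result.add(value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchedGroups("p()"));      // prints [p]
        System.out.println(matchedGroups("p(a,b,c)")); // prints [p, a, ,c]
    }
}
```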

You want to use/construct a proper parser for this and not just use one regex.
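For comparison, a hand-written parser for this tiny grammar could look roughly like the sketch below. It assumes the grammar implied by the question's regex (a name of ASCII letters, then a parenthesized, comma-separated list of arguments made of letters and digits with an optional leading ?). The class name PredicateParser and its methods are invented here, and this is only one possible shape for such a parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a hand-written parser; not from the original answer.
final class PredicateParser {
    // Returns [name, arg1, arg2, ...] or throws IllegalArgumentException.
    static List<String> parse(String input) {
        int n = input.length();
        int pos = 0;
        // Predicate name: one or more ASCII letters.
        while (pos < n && isLetter(input.charAt(pos))) pos++;
        if (pos == 0) throw error("predicate name", pos);
        List<String> parts = new ArrayList<>();
        parts.add(input.substring(0, pos));
        if (pos >= n || input.charAt(pos) != '(') throw error("'('", pos);
        pos++;
        if (pos < n && input.charAt(pos) == ')') {
            pos++; // empty argument list
        } else {
            while (true) {
                int start = pos;
                if (pos < n && input.charAt(pos) == '?') pos++; // optional '?'
                int bodyStart = pos;
                while (pos < n && isLetterOrDigit(input.charAt(pos))) pos++;
                if (pos == bodyStart) throw error("argument", pos);
                parts.add(input.substring(start, pos));
                if (pos < n && input.charAt(pos) == ',') { pos++; continue; }
                if (pos < n && input.charAt(pos) == ')') { pos++; break; }
                throw error("',' or ')'", pos);
            }
        }
        if (pos != n) throw error("end of input", pos);
        return parts;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isLetterOrDigit(char c) {
        return isLetter(c) || (c >= '0' && c <= '9');
    }

    private static IllegalArgumentException error(String expected, int pos) {
        return new IllegalArgumentException("expected " + expected + " at index " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("p(?a,?b,c,?d)")); // prints [p, ?a, ?b, c, ?d]
    }
}
```

Unlike the single regex, this version hands back every argument directly and can report where the input went wrong.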

错爱 2024-11-22 03:08:29

"The predicate should be something like this (I am too lazy to write the complete expression), p(), p(?a), p(?a,?b?,c,?d)."

I wanted to add a comment, but IE6 is giving me trouble. If you give a better explanation, I will give you a solution.

What you are dealing with is text! Don't try to whitewash it as something more extravagant.
Being 'lazy' does not explain what p(), p(?a), p(?a,?b?,c,?d) means. Every single text character/symbol must be fully understood.
Regex is powerful and can be extremely daunting. A regex formula (an abstraction) cannot be inferred from an abstraction.

I'm sorry, I just can't understand the parameters. I'm going to delete my post...
(Apparently, I can't delete it. If someone could delete this for me, thanks!)
