Java Matcher groups: understanding the difference between "(?:X|Y)" and "(?:X)|(?:Y)"

Posted 2024-09-04 06:16:29 · 1262 words · 1 view · 0 comments

Can anyone explain:

  1. Why do the two patterns used below give different results? (answered below)
  2. Why does the second example give a group count of 1 but report the start
    and end of group 1 as -1?

 // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
 public void testGroups() throws Exception
 {
  String TEST_STRING = "After Yes is group 1 End";
  {
   Pattern p;
   Matcher m;
   String pattern="(?:Yes|No)(.*)End";
   p=Pattern.compile(pattern);
   m=p.matcher(TEST_STRING);
   boolean f=m.find();
   int count=m.groupCount();
   int start=m.start(1);
   int end=m.end(1);

   System.out.println("Pattern=" + pattern + "\t Found=" + f + " Group count=" + count + 
     " Start of group 1=" + start + " End of group 1=" + end );
  }

  {
   Pattern p;
   Matcher m;

   String pattern="(?:Yes)|(?:No)(.*)End";
   p=Pattern.compile(pattern);
   m=p.matcher(TEST_STRING);
   boolean f=m.find();
   int count=m.groupCount();
   int start=m.start(1);
   int end=m.end(1);

   System.out.println("Pattern=" + pattern + "\t Found=" + f + " Group count=" + count + 
     " Start of group 1=" + start + " End of group 1=" + end );
  }

 }

Which gives the following output:

Pattern=(?:Yes|No)(.*)End  Found=true Group count=1 Start of group 1=9 End of group 1=21
Pattern=(?:Yes)|(?:No)(.*)End  Found=true Group count=1 Start of group 1=-1 End of group 1=-1

Comments (4)

梦忆晨望 2024-09-11 06:16:29
  1. The difference is that in the second pattern "(?:Yes)|(?:No)(.*)End", the concatenation ("X followed by Y" in "XY") has higher precedence than the choice ("Either X or Y" in "X|Y"), like multiplication has higher precedence than addition, so the pattern is equivalent to

    "(?:Yes)|(?:(?:No)(.*)End)"
    

    What you wanted to get is the following pattern:

    "(?:(?:Yes)|(?:No))(.*)End"
    

    This yields the same output as your first pattern.

    In your test, the second pattern reports group 1 at the (empty) range [-1, -1) because that group did not match. Both start(1) and end(1) return -1 for a group that took no part in the match.

  2. A capturing group is a group that may capture input. If it captures, one also says it matches some substring of the input. If the regex contains choices, then not every capturing group may actually capture input, so there may be groups that do not match even if the regex matches.

  3. The group count, as returned by Matcher.groupCount(), is obtained purely by counting the parentheses of capturing groups in the pattern, irrespective of whether any of them matched on a given input. Your pattern has exactly one capturing group: (.*). This is group 1. The documentation states:

    (?:X)    X, as a non-capturing group
    

    and explains:

    Groups beginning with (? are either pure, non-capturing groups that do not capture text and do not count towards the group total, or named-capturing group.

    Whether any specific group matches on a given input is irrelevant to that definition. E.g., in the pattern (Yes)|(No), there are two groups ((Yes) is group 1, (No) is group 2), but only one of them can match for any given input.

  4. The call to Matcher.find() returns true if the regex matched some substring. You can determine which groups matched by looking at their start: if it is -1, the group did not match, and its end is -1 too. There is no built-in method that tells you how many capturing groups actually matched after a call to find() or matches(). You have to count these yourself by looking at each group's start.

  5. When it comes to backreferences, also note what the regex tutorial has to say:

    There is a difference between a backreference to
    a capturing group that matched nothing, and one to
    a capturing group that did not participate in the match at all.
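
The distinction in point 5 can be observed directly in java.util.regex. The following is a minimal sketch, not taken from the original answer; the class name BackrefDemo and the helper fullMatch are made up for illustration. In the first pattern the whole group is optional, so on the input "b" it does not participate; in the second the quantifier sits inside the group, so the group participates and captures the empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // (a)? is skipped on "b": group 1 does not participate,
        // and java.util.regex fails a backreference to such a group.
        System.out.println(fullMatch("(a)?b\\1", "b")); // false

        // (a?) participates and captures "", so \1 matches the empty string.
        System.out.println(fullMatch("(a?)b\\1", "b")); // true

        Matcher m = Pattern.compile("(a?)b\\1").matcher("b");
        if (m.matches()) {
            // An empty capture has start == end, and neither is -1.
            System.out.println(m.start(1) + " " + m.end(1)); // 0 0
        }
    }
}
```

Other regex engines may treat a backreference to a non-participating group differently, so this behaviour should be read as specific to Java.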

夏有森光若流苏 2024-09-11 06:16:29

To summarise,

1) The two patterns give different results because of the precedence rules of the operators.

  • (?:Yes|No)(.*)End matches (Yes or No) followed by .*End
  • (?:Yes)|(?:No)(.*)End matches (Yes) or (No followed by .*End)

2) The second pattern gives a group count of 1 but a start and end of -1 because of the (not necessarily intuitive) meanings of the results returned by the Matcher method calls.

  • Matcher.find() returns true if a match was found. In your case the match was on the (?:Yes) part of the pattern.
  • Matcher.groupCount() returns the number of capturing groups in the pattern regardless of whether the capturing groups actually participated in the match. In your case only the non-capturing (?:Yes) part of the pattern participated in the match, but the capturing (.*) group was still part of the pattern so the group count is 1.
  • Matcher.start(n) and Matcher.end(n) return the start and end index of the subsequence matched by the nth capturing group. In your case, although an overall match was found, the (.*) capturing group did not participate in the match and so did not capture a subsequence, hence the -1 results.

3) (Question asked in comment.) In order to determine how many capturing groups actually captured a subsequence, call Matcher.start(n) for each n from 0 to Matcher.groupCount() and count the results that are not -1. (Note that group 0 denotes the entire match and is not included in the value returned by groupCount(), so you may want to start at 1 for your purposes.)
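
The counting loop described in 3) might look like the sketch below. The class GroupParticipation and the helper countParticipating are illustrative names, not part of java.util.regex; the loop starts at 1 on the assumption that group 0 (the whole match) should be excluded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupParticipation {
    // Hypothetical helper: counts the groups 1..groupCount() that took part
    // in the most recent successful match of m.
    static int countParticipating(Matcher m) {
        int count = 0;
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.start(i) != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "After Yes is group 1 End";

        Matcher m = Pattern.compile("(?:Yes)|(?:No)(.*)End").matcher(input);
        if (m.find()) {
            System.out.println(m.groupCount());        // 1: one capturing group in the pattern
            System.out.println(countParticipating(m)); // 0: it took no part in this match
            System.out.println(m.group(1));            // null for a non-participating group
        }

        Matcher two = Pattern.compile("(Yes)|(No)").matcher(input);
        if (two.find()) {
            // Two capturing groups exist, but only (Yes) matched.
            System.out.println(two.groupCount() + " " + countParticipating(two)); // 2 1
        }
    }
}
```

Checking m.group(i) != null would work equally well, since group(i) returns null exactly when the group did not participate.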

淡笑忘祈一世凡恋 2024-09-11 06:16:29

Due to the precedence of the "|" operator in the pattern, the second pattern is equivalent to:

(?:Yes)|(?:(?:No)(.*)End)

What you want is

(?:(?:Yes)|(?:No))(.*)End

ゃ懵逼小萝莉 2024-09-11 06:16:29

When using regular expressions, it is important to remember that there is an implicit AND operator (concatenation) at work. This can be seen from the JavaDoc for java.util.regex.Pattern covering the logical operators:

Logical operators
XY      X followed by Y
X|Y     Either X or Y
(X)     X, as a capturing group

This AND takes precedence over the OR in the second Pattern. The second Pattern is equivalent to
(?:Yes)|(?:(?:No)(.*)End).
In order for it to be equivalent to the first Pattern it must be changed to
(?:(?:Yes)|(?:No))(.*)End
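
To check the equivalence claimed above, the three patterns can be run side by side. This is only a sketch: the class AlternationPrecedence, the helper describe, and the second input string (the question's test string with "No" substituted for "Yes") are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationPrecedence {
    // Hypothetical helper: reports the first match and the span of group 1.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "'" + m.group() + "' group1=[" + m.start(1) + "," + m.end(1) + ")";
    }

    public static void main(String[] args) {
        String[] regexes = {
            "(?:Yes)|(?:No)(.*)End",     // second pattern from the question
            "(?:Yes)|(?:(?:No)(.*)End)", // same pattern with the implicit grouping spelled out
            "(?:(?:Yes)|(?:No))(.*)End"  // the grouping the question intended
        };
        String[] inputs = {
            "After Yes is group 1 End",  // test string from the question
            "After No is group 1 End"    // assumed variant that exercises the No branch
        };
        for (String input : inputs) {
            for (String regex : regexes) {
                System.out.println(regex + " on \"" + input + "\": " + describe(regex, input));
            }
        }
    }
}
```

On the "Yes" input the first two patterns match only 'Yes' and leave group 1 at [-1,-1), while the third matches 'Yes is group 1 End' with group 1 at [9,21). On the "No" input all three match 'No is group 1 End' with group 1 at [8,20), because there the capturing group sits on the branch that is actually taken.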
