PHP regex: non-capturing, non-matching groups

Posted on 2024-11-06 06:08:08

I'm making a date-matching regex, and it's all going pretty well. I've got this so far:

"/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/"

It will (hopefully) match single- or double-digit days and months, and double- or quadruple-digit years in the 21st century. A bit of trial and error has gotten me this far.

But, I've got two simple questions regarding these results:

  1. (?: ) what is a simple explanation for this? Apparently it's a non-matching group. But then...

  2. What is the trailing ? for? E.g. (?: )?
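
For reference, here is a quick check of how the pattern from the question behaves with PHP's preg_match(). The sample strings are made up for illustration and are not part of the original post.

```php
<?php
// The pattern exactly as posted in the question.
$pattern = "/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/";

// Illustrative inputs (assumptions, not from the original post).
$samples = ['21-3-2011', '1-1-2019', '21-3-11', '39-19-2019', '233443-4-201154564'];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 when the pattern is found anywhere in $s, else 0.
    $results[$s] = preg_match($pattern, $s);
    printf("%-20s => %d\n", $s, $results[$s]);
}
```

Two things stand out. The (?:20) group has no trailing ?, so a two-digit year such as 21-3-11 is not accepted. And because the pattern is unanchored and only checks digit ranges, it also accepts 39-19-2019 and finds a date inside a longer run of digits.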

Comments (3)

丢了幸福的猪 2024-11-13 06:08:08

[Edited (again) to improve formatting and fix the intro.]

This is a comment and an answer.

The answer part... I do agree with alex's earlier answer.

  1. (?: ), in contrast to ( ), groups without capturing text, generally so that unwanted backreferences are not mixed in with the ones you do want, or to improve performance.

  2. The ? following the (?: ), or following anything except * + ? or {}, means that the preceding item may or may not be present in a legitimate match. E.g., /z34?/ will match z3 as well as z34; in z35 it matches only the z3 part, and it won't match a lone z.
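
Both points can be checked with a small sketch. The test strings are illustrative, and the second pattern is anchored with ^ and $ (an addition for this example) so that the whole string has to match.

```php
<?php
// Capturing group: the day's leading digit lands in $capturing[1].
preg_match('/([0-3])?[0-9]-/', '21-', $capturing);

// Non-capturing group: same overall match, but only element 0 is stored.
preg_match('/(?:[0-3])?[0-9]-/', '21-', $nonCapturing);

print_r($capturing);     // [0] => 21-, [1] => 2
print_r($nonCapturing);  // [0] => 21-

// The trailing ? makes the preceding item (here the digit 4) optional.
$optional = [];
foreach (['z3', 'z34', 'z35', 'z'] as $s) {
    $optional[$s] = preg_match('/^z34?$/', $s);
}
print_r($optional);      // z3 => 1, z34 => 1, z35 => 0, z => 0
```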

The comment part... I made what might be considered improvements to the regex you were working on:

(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)

-- First, it avoids things like 0-0-2011

-- Second, it avoids things like 233443-4-201154564

-- Third, it includes things like 1-1-2022

-- Fourth, it includes things like 1-1-11

-- Fifth, it avoids things like 34-4-11

-- Sixth, it allows you to capture the day, month, and year so you can refer to them more easily in code. That code could, for example, do a further check (is the second captured group 2, and is either the first captured group 29 in a leap year, or else the first captured group less than 29?) to see whether a Feb 29 date qualifies.
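
A minimal sketch of such a follow-up check, assuming PHP's built-in checkdate() is acceptable in place of a hand-written leap-year test. The helper name isRealDate and the rule that two-digit years mean 20xx are assumptions made for this example.

```php
<?php
// First improved pattern from the answer, wrapped in / delimiters.
$pattern = '/(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)/';

// Hypothetical helper: true only when $text holds a real calendar date.
function isRealDate(string $pattern, string $text): bool
{
    if (!preg_match($pattern, $text, $m)) {
        return false;
    }
    $day   = (int) $m[1];   // first captured group
    $month = (int) $m[2];   // second captured group
    $year  = (int) $m[3];   // third captured group
    if ($year < 100) {
        $year += 2000;      // assumption: two-digit years mean 20xx
    }
    // checkdate() knows month lengths and leap years.
    return checkdate($month, $day, $year);
}

var_dump(isRealDate($pattern, '29-2-2012')); // 2012 is a leap year
var_dump(isRealDate($pattern, '29-2-2011')); // 2011 is not
var_dump(isRealDate($pattern, '31-6-11'));   // June has 30 days
```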

Finally, note that you'll still get dates that won't exist, eg, 31-6-11. If you want to avoid these, then try:

(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)
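
One thing to be aware of with the pattern above: each alternative has its own pair of capturing groups, so the day and month end up in groups 1-2, 3-4 or 5-6 depending on the branch, and the year in group 7. The sketch below is a variant, not the answer's own pattern, that uses PCRE's branch-reset group (?| so the numbering stays the same in every branch.

```php
<?php
// Variant of the pattern above using a branch-reset group (?| ... ):
// every alternative stores the day in group 1 and the month in group 2,
// and the year becomes group 3. This rewrite is an illustration only.
$pattern = '/(?:^|\s)(?|'
    . '(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12)'   // 31-day months
    . '|(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11)'          // 30-day months
    . '|(0?[1-9]|[1-2][0-9])-(0?2)'                    // February, up to the 29th
    . ')-((?:20)?[0-9][0-9])(?:\s|$)/';

$found = preg_match($pattern, '30-6-11', $m);
print_r($m);           // day, month and year in $m[1], $m[2], $m[3]

$rejected = preg_match($pattern, '31-6-11');
var_dump($rejected);   // June 31st does not match
```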

Also, I assumed the dates would be preceded and followed by a space (or the beginning/end of a line), but you may want to adjust that (e.g., to allow punctuation).

A commenter elsewhere referenced this resource which you might find useful:
http://rubular.com/

多彩岁月 2024-11-13 06:08:08
  1. It is a non-capturing group. You cannot backreference it. Usually used to declutter backreferences and/or improve performance.
  2. It means the preceding group is optional.
夏至、离别 2024-11-13 06:08:08

Subpatterns

Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things:

  1. It localizes a set of alternatives. For example, the pattern
    cat(aract|erpillar|) matches one of the words "cat", "cataract", or
    "caterpillar". Without the parentheses, it would match "cataract",
    "erpillar" or the empty string.
  2. It sets up the subpattern as a capturing subpattern (as defined
    above). When the whole pattern matches, that portion of the subject
    string that matched the subpattern is passed back to the caller via
    the ovector argument of pcre_exec(). Opening parentheses are counted
    from left to right (starting from 1) to obtain the numbers of the
    capturing subpatterns.

For example, if the string "the red king" is matched against the pattern the ((red|white) (king|queen)) the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.

The fact that plain parentheses fulfill two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 65535. It may not be possible to compile such large patterns, however, depending on the configuration options of libpcre.
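
The two examples from the manual text can be reproduced with preg_match(); the variable names are chosen for this sketch.

```php
<?php
// Plain parentheses both group and capture.
preg_match('/the ((red|white) (king|queen))/', 'the red king', $capturing);
print_r($capturing);
// [0] => the red king, [1] => red king, [2] => red, [3] => king

// (?: ... ) groups the colour alternatives without capturing them,
// so the (king|queen) group is numbered 2 instead of 3.
preg_match('/the ((?:red|white) (king|queen))/', 'the white queen', $nonCapturing);
print_r($nonCapturing);
// [0] => the white queen, [1] => white queen, [2] => queen
```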

As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns

(?i:saturday|sunday)
(?:(?i)saturday|sunday)

match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
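
A small sketch comparing the two spellings. The ^ and $ anchors and the input strings are additions for this example, so that each input has to match as a whole.

```php
<?php
$shorthand = '/^(?i:saturday|sunday)$/';
$longhand  = '/^(?:(?i)saturday|sunday)$/';

$inputs = ['SUNDAY', 'Saturday', 'sunday', 'Monday'];
$matches = [];
foreach ($inputs as $s) {
    // One pair of results per input: [shorthand, longhand].
    $matches[$s] = [preg_match($shorthand, $s), preg_match($longhand, $s)];
}
print_r($matches);

// The option ends with the group: "urday" below is still case-sensitive.
var_dump(preg_match('/^(?i:sat)urday$/', 'SATurday')); // int(1)
var_dump(preg_match('/^(?i:sat)urday$/', 'SATURDAY')); // int(0)
```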

It is possible to name a subpattern using the syntax (?P<name>pattern). This subpattern will then be indexed in the matches array by its normal numeric position and also by name. PHP 5.2.2 introduced two alternative syntaxes, (?<name>pattern) and (?'name'pattern).
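
A short sketch of named subpatterns, using the three spellings (?P<name>...), (?<name>...) and (?'name'...) in one date pattern. The group names day, month and year are illustrative.

```php
<?php
// Three spellings of a named group in one pattern; names are illustrative.
$pattern = '/(?P<day>\d{1,2})-(?<month>\d{1,2})-(?\'year\'\d{4})/';

preg_match($pattern, '21-3-2011', $m);

// Each named group is available twice: by name and by numeric position.
echo $m['day'], ' ', $m[1], "\n";     // 21 21
echo $m['month'], ' ', $m[2], "\n";   // 3 3
echo $m['year'], ' ', $m[3], "\n";    // 2011 2011
```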

Sometimes it is necessary to have multiple matching, but alternating subgroups in a regular expression. Normally, each of these would be given their own backreference number even though only one of them would ever possibly match. To overcome this, the (?| syntax allows having duplicate numbers. Consider the following regex matched against the string Sunday:

(?:(Sat)ur|(Sun))day

Here Sun is stored in backreference 2, while backreference 1 is empty. Matching against Saturday yields Sat in backreference 1, while backreference 2 does not exist. Changing the pattern to use (?| fixes this problem:

(?|(Sat)ur|(Sun))day

Using this pattern, both Sun and Sat would be stored in backreference 1.
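
The difference shows up directly in the matches array filled by preg_match(); the variable names are chosen for this sketch.

```php
<?php
// Without branch reset: each (...) gets its own number.
preg_match('/(?:(Sat)ur|(Sun))day/', 'Sunday', $plainSun);
preg_match('/(?:(Sat)ur|(Sun))day/', 'Saturday', $plainSat);
print_r($plainSun);  // [0] => Sunday, [1] => (empty string), [2] => Sun
print_r($plainSat);  // [0] => Saturday, [1] => Sat

// With (?| ... ): both alternatives share group number 1.
preg_match('/(?|(Sat)ur|(Sun))day/', 'Sunday', $resetSun);
preg_match('/(?|(Sat)ur|(Sun))day/', 'Saturday', $resetSat);
print_r($resetSun);  // [0] => Sunday, [1] => Sun
print_r($resetSat);  // [0] => Saturday, [1] => Sat
```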

Reference : http://php.net/manual/en/regexp.reference.subpatterns.php
