Empty backreference causes match failure in PHP... is there a workaround?
I'm having trouble with a regular expression in PHP that uses a potentially empty backreference. I was hoping that it would work as explained in http://www.regular-expressions.info/brackets.html:
If a backreference was not used in a particular match attempt (such as in the first example where the question mark made the first backreference optional), it is simply empty. Using an empty backreference in the regex is perfectly fine. It will simply be replaced with nothingness.
However, PHP seems to behave differently. From http://php.net/manual/en/regexp.reference.back-references.php:
If a subpattern has not actually been used in a particular match, then any back references to it always fail.
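The difference between an unset group and a group that captured an empty string can be seen in a minimal sketch. The patterns and the variable names `$unset` and `$empty` are illustrative and not part of the original question:

```php
<?php
// Group 1 is optional and does not participate when the subject is "b",
// so PCRE treats \1 as a failed match rather than as an empty string.
$unset = preg_match('/(a)?b\1/', 'b');
var_dump($unset); // int(0)

// Here the group always participates and may capture "", so \1 matches "".
$empty = preg_match('/(a?)b\1/', 'b');
var_dump($empty); // int(1)
```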
As a simplified example, I want to match the following two things with this regex:
- {something} ... {/something}
- {something:else} ... {/something:else}
Where "something" is known ahead of time, and "else" can be anything (or nothing).
So I tried the following regex ("else" is hardcoded for simplicity):
preg_match("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches)
Unfortunately, if (:else)? doesn't match, the \2 backreference fails. If I make \2 optional (\2?), then a mismatched pair such as {something:else} ... {/something} also matches, which is no good.
Have I run into a limitation of regular expressions (the infamous "you need a parser, not a regex") or is this fixable?
Test program:
<?php
$data = "{something} ... {/something}
{something:else} ... {/something:else}
{something:else} ... {/something}";
// won't match {something} ... {/something}
preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches);
print_r($matches);
// change \\2 to \\2? and it matches too much
preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2?\}/is", $data, $matches);
print_r($matches);
?>
Comments (3)
Well, why not replace the ? with an "or"?
Change (:else)? to (:else|).
That way the reference will always be captured, but it will sometimes be empty (which is OK).
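A minimal sketch of this suggestion, assuming the intended change is from (:else)? to (:else|) and reusing the test data from the question. With the empty alternative, group 2 always takes part in the match, so \2 is either ":else" or an empty string and never unset:

```php
<?php
$data = "{something} ... {/something}
{something:else} ... {/something:else}
{something:else} ... {/something}";

// (:else|) captures ":else" when present and "" otherwise.
$pattern = '/\{(something(:else|))\}(.*?)\{\/something\2\}/is';

preg_match_all($pattern, $data, $matches);

// The two well-formed pairs match; the mismatched third line does not.
echo count($matches[0]), "\n"; // 2
print_r($matches[0]);
```

The pattern is single-quoted here so that \2 does not need to be written as \\2.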
Why don't you simply use \1 instead of \2?
As for the "you need a parser" problem: you will need one to parse nested constructs.
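One way to read the \1 suggestion, sketched against the question's test data: group 1 always participates because it wraps the whole tag name, so the closing tag can be written as {/\1}. Making the inner (:else)? group non-capturing is a choice made for this sketch, not something the answer specifies:

```php
<?php
$data = "{something} ... {/something}
{something:else} ... {/something:else}
{something:else} ... {/something}";

// Group 1 holds "something" or "something:else"; group 2 holds the content.
$pattern = '/\{(something(?::else)?)\}(.*?)\{\/\1\}/is';

preg_match_all($pattern, $data, $matches);

echo count($matches[0]), "\n"; // 2
print_r($matches[1]);          // the tag names that matched
```

As the answer notes, this still does not handle nested constructs.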
The following class was constructed for such cases (like
{something} ... {/something} or {something} ... {something} ... {/something} {/something})
and more.
Example with the SL5_preg_contentFinder class:
https://gist.github.com/sl5net/7029093#file-sl5_preg_contentfinder-php