Why doesn't my non-greedy Perl regex match anything?

Posted on 2024-07-16 06:18:34

I thought I understood Perl RE to a reasonable extent, but this is puzzling me:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

if($test =~ /\'?(.*?)\'?/) {
       print "Captured $1\n";
       print "Matched $&";
}
else {
       print "What?!!";
}

prints

Captured
Matched '

It seems it has matched the ending ' alone, and so captured nothing.
I would have expected it to match the entire string or, if it's totally non-greedy, nothing at all (as everything in the pattern is optional).
This in-between behaviour baffles me; can anyone explain what is happening?

Comments (5)

茶底世界 2024-07-23 06:18:34

The \'? at the beginning and end means match 0 or 1 apostrophes greedily. (As another poster has pointed out, to make it non-greedy, it would have to be \'??)

The .*? in the middle means match 0 or more characters non-greedily.

The Perl regular expression engine starts at the beginning of the string. The leading \'? is greedy, so it picks up the first apostrophe. The .*? then matches non-greedily (so it takes as little as it can, which here is nothing), and the trailing optional \'? is satisfied by the empty string, because the next character is not an apostrophe. The overall match is therefore just the opening apostrophe, with an empty capture.
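To check which apostrophe is actually matched, a small sketch like the one below can report the match offsets. The helper name match_span is hypothetical and only used for illustration; @- and @+ are Perl's built-in arrays holding the start and end offsets of the last successful match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (start offset, end offset, capture) for the
# pattern from the question, or an empty list if there is no match.
sub match_span {
    my ($text) = @_;
    if ($text =~ /'?(.*?)'?/) {
        # @- and @+ hold the start and end offsets of the last successful match
        return ($-[0], $+[0], $1);
    }
    return;
}

my ($start, $end, $captured) = match_span("'some random string'");
print "Match covers offsets $start to $end\n";   # 0 to 1: the opening quote
print "Captured [$captured]\n";                  # the capture is empty
```

The match starts at offset 0 and ends at offset 1, so it is the opening apostrophe that gets matched, not the closing one.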

望笑 2024-07-23 06:18:34

I think you mean something like:

/'(.*?)'/      # matches everything in single quotes

or

/'[^']*'/      # matches everything in single quotes, but faster

The single quotes don't need to be escaped, AFAIK.
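As a rough sketch, both patterns can be tried against the string from the question. The helper names quoted_lazy and quoted_negated are hypothetical, and a capture group has been added to the second pattern (an assumption, since the answer's version has none) so the two results can be compared directly.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns the text between the first pair of
# single quotes, or undef when no quoted text is found.
sub quoted_lazy {
    my ($text) = @_;
    return $text =~ /'(.*?)'/ ? $1 : undef;
}

sub quoted_negated {
    my ($text) = @_;
    # Capture group added here for comparison; the answer's pattern has none.
    return $text =~ /'([^']*)'/ ? $1 : undef;
}

my $test = "'some random string'";
print "lazy:    [", quoted_lazy($test), "]\n";     # [some random string]
print "negated: [", quoted_negated($test), "]\n";  # [some random string]
```

One difference to keep in mind: without the /s modifier, . does not match a newline, whereas [^'] does, so the two patterns can behave differently on multi-line input.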

太阳哥哥 2024-07-23 06:18:34

pattern? is greedy; if you want it to be non-greedy, you must say pattern??:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

if($test =~ /\'?(.*?)\'?/) {
       print "Captured [$1]\n";
       print "Matched  [$&]\n";
}
if($test =~ /\'??(.*?)\'??/) {
       print "Captured [$1]\n";
       print "Matched  [$&]\n";
}

From perldoc perlre:

The following standard quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

By default, a quantified subpattern is "greedy", that is, it will match
as many times as possible (given a particular starting location) while
still allowing the rest of the pattern to match. If you want it to
match the minimum number of times possible, follow the quantifier with
a "?". Note that the meanings don’t change, just the "greediness":

*?     Match 0 or more times
+?     Match 1 or more times
??     Match 0 or 1 time
{n}?   Match exactly n times
{n,}?  Match at least n times
{n,m}? Match at least n but not more than m times
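The effect of the greedy and non-greedy forms can be seen on a simple input. This is only an illustrative sketch: the helper name matched_text and the test string "aaaa" are assumptions chosen for the demonstration, not part of the perldoc text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text matched by $pattern in $text,
# or undef if the pattern does not match.
sub matched_text {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? $& : undef;
}

my $text = "aaaa";
for my $pattern (qr/a*/, qr/a*?/, qr/a+/, qr/a+?/,
                 qr/a?/, qr/a??/, qr/a{2,3}/, qr/a{2,3}?/) {
    # Each greedy form takes as many "a"s as allowed; each non-greedy form
    # takes as few as allowed, since nothing follows it in the pattern.
    printf "%-14s matched [%s]\n", $pattern, matched_text($pattern, $text);
}
```

With nothing after the quantifier to force more characters, the non-greedy forms settle for their minimum: a*? and a?? match the empty string, a+? matches a single a, and a{2,3}? matches aa.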

小…红帽 2024-07-23 06:18:34

Beware of making all elements of your regex optional (i.e. having all elements quantified with * or ?). This lets the Perl regex engine match as much or as little as it wants (even nothing), while still considering the match successful.

I suspect what you want is

/'(.*?)'/
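A minimal sketch of that pitfall, assuming a hypothetical helper describe_match and an input string with no quotes at all: the all-optional pattern still reports success, while the pattern with required quotes does not.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether $pattern matches $text and what it matched.
sub describe_match {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? "matched [$&]" : "no match";
}

my $unquoted = "no quotes here";
print "all optional:    ", describe_match(qr/'?(.*?)'?/, $unquoted), "\n";  # matched []
print "quotes required: ", describe_match(qr/'(.*?)'/,   $unquoted), "\n";  # no match
```

Requiring at least one non-optional element gives the engine something it must find, so a successful match actually tells you the quoted text is there.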

无尽的现实 2024-07-23 06:18:34

I would say the closest answer to what you are looking for is

/'?([^']*)'?/

So "get the single quote if it's there", "get anything and everything that's not a single quote", "get the last single quote if it's there".

Unless you want to match "'don't do this'" - but who uses an apostrophe in a single quote anyway (and gets away with it for long)? :)
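The following sketch tries this pattern on a few inputs, including the apostrophe case mentioned above. The helper name optional_quotes and the extra test strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the capture from the pattern suggested above.
sub optional_quotes {
    my ($text) = @_;
    return $text =~ /'?([^']*)'?/ ? $1 : undef;
}

print "[", optional_quotes("'some random string'"), "]\n";  # [some random string]
print "[", optional_quotes("'don't do this'"), "]\n";       # [don]
print "[", optional_quotes("no quotes"), "]\n";             # [no quotes]
```

Because [^']* stops at the first apostrophe it meets, the embedded apostrophe in "'don't do this'" cuts the capture short at "don".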
