Retaining captures with the Perl substitution operator
Can someone explain why the following code...
#!/opt/local/bin/perl
use strict;
use warnings;
my $string;
$string = "\t\t\tEntry";
print "String: >$string<\n";
$string =~ s/^(\t*)//gi;
print "\$1: >$1<\n";
print "String: >$string<\n";
print "\n";
$string = "\t\t\tEntry";
$string =~ s/^(\t*)([^\t]+)/$2/gi;
print "\$1: >$1<\n";
print "String: >$string<\n";
print "\n";
exit 0;
...produces the following output...
String: > Entry<
Use of uninitialized value in concatenation (.) or string at ~/sandbox.pl line 12.
$1: ><
String: >Entry<
$1: > <
String: >Entry<
...or more directly: Why is the matched value in the first substitution not retained in $1?
I tried this on two implementations of Perl 5.12 and did not encounter the problem. Perl 5.8 did.
Because you have the g option, perl tries to match the pattern until it fails. See the debug output below.
So it doesn't work in Perl 5.8, but this does:
Thus each time it matches, it saves it to $c1.
This is what use re 'debug' tells me:
Because you are trying to match whitespace at the beginning of the line, you need neither the g nor the i option. So it might be a case where you're trying to do something else.
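To illustrate the point about dropping g and i, here is a minimal sketch, assuming the goal is simply to strip the leading tabs and keep them for later. The helper name split_leading_tabs is hypothetical and not from the thread. It copies $1 right away, and only when the substitution reports success, so it does not depend on what a later failed /g pass does to the capture buffers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip leading tabs from a copy
# of the input and return both the removed tabs and the remainder.
sub split_leading_tabs {
    my ($text) = @_;
    my $tabs = '';

    # No /g: the pattern is anchored with ^, so it can match at most once.
    # Copy $1 immediately, and only when the substitution reports success.
    if ($text =~ s/^(\t*)//) {
        $tabs = $1;
    }
    return ($tabs, $text);
}

my ($tabs, $rest) = split_leading_tabs("\t\t\tEntry");
printf "tabs: %d, rest: >%s<\n", length $tabs, $rest;
```

Run as is, this should print "tabs: 3, rest: >Entry<".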
I think that in version 5.10 and beyond, a substitution only affects the capture buffers if there was a match.
The interesting thing in your example is that with
$string =~ s/^(\t*)([^\t]+)/$2/gi;
it didn't reset the capture buffers. That appears to be because of a preamble that estimates whether the match should be tried at all. In this case, ([^\t]+) consumed the entire string in the first match, so a "String too short" occurred and the buffers were never reset.
I can't test it, but
$string =~ s/^(\t*)([^\t])//gi
should give the same warning.
With if ( s///g ) {}, testing the capture buffers is not certain to find anything in them. This was the case in version 5.8. Even in later versions it's really just a debug feature.
Edit @theracoon - on your comment: "I'm reasonably certain that ([^\t]+) did not actually consume the entire string. The output definitely does not reflect that."
This is proof that your regex consumed the entire string on the first match, Pass 1.
There is nothing left to match on the second pass. That is the way the /g modifier works.
It tries to match the entire regex again, at the position in the string where the last match left off.
Pass 1 ..
Matching REx "^(\t*)([^\t]+)" against "%t%t%tEntry"
8 <%t%t%tEntry> <>
Match successful!
Pass 2 ..
Matching REx "^(\t*)([^\t]+)" against ""
(Nope, nothing left to match)
String too short [regexec_flags]...
Match failed
'Entry'
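The two passes can also be observed without the debug pragma. This is only a sketch: it uses a plain match with /g in scalar context instead of the substitution from the question, on the assumption that both resume from the end of the previous match in the same way. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "\t\t\tEntry";
my @passes;

# In scalar context, each evaluation of m//g resumes at pos($string).
# The loop ends on the first pass that fails to match.
while ($string =~ /^(\t*)([^\t]+)/g) {
    push @passes, { tabs => length($1), rest => $2, end => pos($string) };
}

for my $i (0 .. $#passes) {
    printf "Pass %d: %d tab(s), rest >%s<, ended at offset %d\n",
        $i + 1, @{ $passes[$i] }{qw(tabs rest end)};
}
print "Successful passes: ", scalar(@passes), "\n";
```

This should report a single successful pass ending at offset 8, which is the full length of the string and the same 8 shown in the debug output. The second attempt starts at the end of the string, where ^ can no longer match.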