Perl regex question
As a PHP programmer new to Perl working through 'Programming Perl', I have come across the following regex:
/^(.*?): (.*)$/;
This regex is intended to parse an email header and insert it into a hash. The email header is contained in a separate .txt file and is in the following format:
From: [email protected]
To: [email protected]
Date: Mon, 1st Jan 2000 09:00:00 -1000
Subject: Subject here
The entire code I am using to work with this example regex is as follows:
use warnings;
use strict;
my %fields = ();
open(FILE, 'header.txt') or die('Could not open.');
while(<FILE>)
{
    /^(.*?): (.*)$/;
    $fields{$1} = $2;
}
foreach(%fields)
{
    print;
    print "\n";
}
Now, onto my question. I am unsure why the first subpattern has been modified to use a minimal quantifier. It is perhaps a small point to get hung up on, but I cannot see why it has been done.
Thanks for any replies.
Comments (6)
If the first group were greedy, there is a risk that it wouldn't match correctly when the value contains ": " (a colon followed by a space). Imagine:
Subject: Urgent: Need a regex
Without the minimal match, $1 would get "Subject: Urgent" and $2 would be "Need a regex".
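To see the difference side by side, here is a minimal sketch that runs both forms of the pattern against that line. The variable names are only for illustration, and the greedy pattern is a hypothetical variant of the one in the question.

```perl
use strict;
use warnings;

# Hypothetical header line whose value itself contains ": ".
my $line = 'Subject: Urgent: Need a regex';

# Minimal first group: stops at the first ": ".
my ($min_key, $min_val) = $line =~ /^(.*?): (.*)$/;

# Greedy first group: runs to the last ": ".
my ($greedy_key, $greedy_val) = $line =~ /^(.*): (.*)$/;

print "minimal: key=[$min_key] value=[$min_val]\n";
print "greedy:  key=[$greedy_key] value=[$greedy_val]\n";
```

The minimal form should give the key "Subject" with the rest of the line as the value, while the greedy form splits at the second ": ".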
Consider what happens if the subject line is
Subject: RE: reply to something
A minimal quantifier will stop after "Subject", but a greedy quantifier will match up to "RE", so the key becomes "Subject: RE".
Because otherwise it will match all characters up to the last ": ". For example, without the minimal quantifier, a line of the form
Test: My: Weird: ...
will match "Test: My: Weird" as the first group. But with the minimal quantifier it will match only "Test".
The reason it uses a minimal quantifier is that it does not need to read any further than the colon, and in fact it should not. I'm not sure which characters can appear in these keywords, but I am pretty sure that . is a bit too wide, and that is the problem. If your fields contain any colons (followed by a space, as the pattern requires), a non-minimal regex would gobble them all up. For example, given a line that starts with
Subject: Counter Strike: ...
a greedy first subpattern would grab "Subject: Counter Strike", and not just "Subject".
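One way to narrow the key, instead of relying on a minimal quantifier, is a negated character class that cannot cross a colon. This is only a sketch of that alternative, not the pattern from the book; the sample lines and the array are made up for illustration, and it assumes header names never contain a colon.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the second value contains ": ".
my @lines = (
    'From: someone',
    'Subject: RE: reply to something',
);

my %fields;
for my $line (@lines) {
    # [^:]+ cannot cross a colon, so the key ends at the first one
    # even though the quantifier is greedy.
    if ($line =~ /^([^:]+): (.*)$/) {
        $fields{$1} = $2;
    }
}

print "$_ => $fields{$_}\n" for sort keys %fields;
```

Checking the match inside an if also avoids storing stale $1 and $2 values when a line does not look like a header.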
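Without a minimal quantifier, the first capture runs to the last place the separator can match. If the separator were a bare colon, the first capture for the Date line would be "Date: Mon, 1st Jan 2000 09:00" instead of "Date". With ": " as written, the colons inside the time are not followed by a space, so that particular line is safe, but any value containing ": " is not.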
Perl regex quantifiers are greedy by default, so without that minimal quantifier $1 would extend to the last ": " on the line. For the "Date:" line in the sample that is still just "Date", because the colons in 09:00:00 are not followed by a space; with a bare ":" separator it would be "Date: Mon, 1st Jan 2000 09:00".
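For the Date line specifically, the outcome depends on whether the separator in the pattern includes the space. A minimal sketch, using the Date line from the question; the bare-colon patterns and the variable names are hypothetical variants added for comparison:

```perl
use strict;
use warnings;

# Date line from the question's sample header.
my $line = 'Date: Mon, 1st Jan 2000 09:00:00 -1000';

# Greedy first group, separator is colon plus space (as in the question).
my ($key_space) = $line =~ /^(.*): (.*)$/;

# Greedy first group, separator is a bare colon (hypothetical variant).
my ($key_bare) = $line =~ /^(.*):(.*)$/;

# Minimal first group with the bare colon.
my ($key_min) = $line =~ /^(.*?):(.*)$/;

print "greedy, ': ' separator: [$key_space]\n";
print "greedy, ':' separator:  [$key_bare]\n";
print "minimal, ':' separator: [$key_min]\n";
```

With the ": " separator the greedy version still captures just "Date" for this line, since none of the colons in the time is followed by a space. The two forms only diverge once the separator is a bare colon or the value itself contains ": ".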