Perl regular expression question

Posted on 2024-11-08 16:07:37

As a PHP programmer new to Perl working through 'Programming Perl', I have come across the following regex:

/^(.*?): (.*)$/;

This regex is intended to parse an email header and insert it into a hash. The email header is contained in a separate .txt file and is in the following format:

From: [email protected]
To: [email protected]
Date: Mon, 1st Jan 2000 09:00:00 -1000
Subject: Subject here

The entire code I am using to work with this example regex is as follows:

use warnings;
use strict;

my %fields = ();

open(FILE, 'header.txt') or die('Could not open.');

while(<FILE>)
{
    /^(.*?): (.*)$/;
    $fields{$1} = $2;
}

foreach(%fields)
{
    print;
    print "\n";
}

Now, onto my question. I am unsure why the first subpattern has been modified to use a minimal quantifier. It is perhaps a small point to get hung up on, but I cannot see why it has been done.

Thanks for any replies.
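
As an aside to the question: the loop above reads $1 and $2 without checking that the match succeeded, and foreach(%fields) walks keys and values as one flat list. Below is a minimal sketch of a more defensive version, assuming the same "Name: value" layout. The parse_headers name and the in-memory sample data (standing in for header.txt) are illustrative and not from the book.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "Name: value" lines into a hash reference.
sub parse_headers {
    my ($fh) = @_;
    my %fields;
    while (my $line = <$fh>) {
        chomp $line;
        # Only use $1/$2 when the match actually succeeded.
        if ($line =~ /^(.*?): (.*)$/) {
            $fields{$1} = $2;
        }
    }
    return \%fields;
}

# Made-up sample data stands in for header.txt so the sketch runs on its own.
my $sample = "From: alice\nTo: bob\nSubject: Subject here\n\nnot a header\n";
open(my $fh, '<', \$sample) or die "Could not open: $!";
my $fields = parse_headers($fh);
close($fh);

# Iterate over the keys so each name is printed next to its value.
for my $name (sort keys %$fields) {
    print "$name: $fields->{$name}\n";
}
```

Lines that do not contain a colon followed by a space (the blank line and "not a header" here) are skipped rather than reusing captures from an earlier match.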


Comments (6)

画中仙 2024-11-15 16:07:37

If it weren't minimal, the match could split the line in the wrong place whenever the value contains a colon followed by a space.

Imagine:

Subject: Urgent: Need a regex

Without the minimal match $1 would get Subject: Urgent, and $2 would be Need a regex.
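
The difference can be seen by running both forms against that line. This is a small sketch; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = 'Subject: Urgent: Need a regex';

# Lazy first group: stops at the first ": " on the line.
my ($lazy_name, $lazy_value) = $line =~ /^(.*?): (.*)$/;

# Greedy first group: grabs everything, then backs off to the last ": ".
my ($greedy_name, $greedy_value) = $line =~ /^(.*): (.*)$/;

print "lazy:   [$lazy_name] [$lazy_value]\n";
print "greedy: [$greedy_name] [$greedy_value]\n";
# prints:
# lazy:   [Subject] [Urgent: Need a regex]
# greedy: [Subject: Urgent] [Need a regex]
```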

暮凉 2024-11-15 16:07:37

Consider what happens if the subject is Subject: RE: reply to something.

A minimal quantifier will stop after Subject, but the greedy quantifier will match up to and including RE.

江南月 2024-11-15 16:07:37

Because otherwise it will match all characters up to the last ': '. For example, without the minimal quantifier, this string:

Test: My: Weird: String

will match "Test: My: Weird" as the first group. But with minimal quantifier it will match only "Test".

左秋 2024-11-15 16:07:37

The reason it uses a minimal quantifier is that it does not need to read any further than the colon. And in fact, it should not. I'm not sure what characters can exist in these keywords, but I am pretty sure . is a bit too wide, and that is the problem. If your fields contain any colons, a non-minimal regex would gobble it all up, for example:

Subject: Counter Strike: Source

If the first subpattern was greedy, it would grab Subject: Counter Strike, and not just Subject.
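
One way to act on the remark that . is too wide is to use a negated character class for the name, so that the first group cannot cross a colon at all. This is a sketch under the assumption that header names never contain a colon; it is an alternative to the book's pattern, not the book's own.

```perl
use strict;
use warnings;

my $line = 'Subject: Counter Strike: Source';

# [^:]+ cannot cross a colon, so greediness no longer matters for the name.
my ($name, $value) = $line =~ /^([^:]+): (.*)$/;

print "name:  [$name]\n";
print "value: [$value]\n";
# prints:
# name:  [Subject]
# value: [Counter Strike: Source]
```

Unlike (.*?), this form also requires the name to be at least one character long.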

花开半夏魅人心 2024-11-15 16:07:37

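Without a minimal quantifier, the first capture runs up to the last ": " on the line rather than the first. For the sample Date line that is still just "Date", because the colons in 09:00:00 are not followed by a space, but any value containing ": " would be split in the wrong place.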

愁以何悠 2024-11-15 16:07:37

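Without that minimal quantifier, $1 takes as much of the line as it can, because Perl quantifiers are greedy by default. The "Date:" line in the sample happens to survive this, since the pattern needs a colon followed by a space, but a line such as "Subject: Re: hello" would give "Subject: Re" for $1.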
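
Whether the Date line is affected depends on the literal ": " in the pattern: a colon only counts as a separator when a space follows it. Here is a quick check with the greedy form, using the Date line from the question (variable names are illustrative):

```perl
use strict;
use warnings;

my $line = 'Date: Mon, 1st Jan 2000 09:00:00 -1000';

# Greedy first group; the only ": " on this line is the one right after "Date".
my ($greedy_name, $greedy_value) = $line =~ /^(.*): (.*)$/;

print "name:  [$greedy_name]\n";
print "value: [$greedy_value]\n";
# prints:
# name:  [Date]
# value: [Mon, 1st Jan 2000 09:00:00 -1000]
```

So the greedy and minimal forms agree on this particular line. They diverge only when the value itself contains a colon followed by a space.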
