Perl regex question

Posted 2024-11-08 16:07:37

As a PHP programmer new to Perl working through 'Programming Perl', I have come across the following regex:

/^(.*?): (.*)$/;

This regex is intended to parse an email header and insert it into a hash. The email header is contained in a separate .txt file and is in the following format:

From: [email protected]
To: [email protected]
Date: Mon, 1st Jan 2000 09:00:00 -1000
Subject: Subject here

The entire code I am using to work with this example regex is as follows:

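use warnings;
use strict;

my %fields;

# Three-argument open with a lexical filehandle; report the reason on failure.
open(my $fh, '<', 'header.txt') or die("Could not open header.txt: $!");

while (my $line = <$fh>)
{
    # Store a field only when the line matches; after a failed match,
    # $1 and $2 would still hold the captures from the previous success.
    if ($line =~ /^(.*?): (.*)$/)
    {
        $fields{$1} = $2;
    }
}

close($fh);

# Print each field name together with its value.
foreach my $name (keys %fields)
{
    print "$name: $fields{$name}\n";
}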

Now, on to my question. I am unsure why the first subpattern has been modified to use a minimal quantifier. It is perhaps a small point to get hung up on, but I cannot see why it has been done.

Thanks for any replies.

画中仙 2024-11-15 16:07:37

If it weren't minimal, there would be a risk of an incorrect match whenever the value itself contains a colon followed by a space.

Imagine:

Subject: Urgent: Need a regex

Without the minimal match $1 would get Subject: Urgent, and $2 would be Need a regex.
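The difference can be checked directly. The following is a minimal sketch, not code from the book: split_header is a hypothetical helper introduced here for illustration, and the greedy pattern is simply the original with the "?" removed.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a pattern to a line and return the two captures,
# or an empty list if the line does not match.
sub split_header {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? ($1, $2) : ();
}

my $minimal = qr/^(.*?): (.*)$/;   # pattern from the question
my $greedy  = qr/^(.*): (.*)$/;    # same pattern without the "?"

my $line = 'Subject: Urgent: Need a regex';

for my $case (['minimal', $minimal], ['greedy', $greedy]) {
    my ($label, $pattern) = @$case;
    my ($name, $value) = split_header($pattern, $line);
    print "$label: name=[$name] value=[$value]\n";
}
# minimal: name=[Subject] value=[Urgent: Need a regex]
# greedy: name=[Subject: Urgent] value=[Need a regex]
```

The greedy group first consumes the whole line and then backtracks until a ": " can follow it, so it settles on the last colon-space pair rather than the first.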

暮凉 2024-11-15 16:07:37

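Consider what happens if the subject line is Subject: RE: Reply to something.

The minimal quantifier stops after Subject, but a greedy quantifier would also match the RE.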

江南月 2024-11-15 16:07:37

Because otherwise it would match every character up to the last ': ' (colon followed by a space). For example, without the minimal quantifier, this string:

Test: My: Weird: String

would put "Test: My: Weird" in the first group. With the minimal quantifier, the first group matches only "Test".
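A related way to get the same first group without depending on the quantifier's greediness is a negated character class. This is only a sketch under the assumption that field names never contain a colon; parse_header_line is a hypothetical helper name, not something from the book or the question.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "Name: value" line.
# Assumption: the field name never contains a colon, so [^:]* cannot
# run past the first colon regardless of greediness.
sub parse_header_line {
    my ($line) = @_;
    chomp $line;
    return $line =~ /^([^:]*): (.*)$/ ? ($1, $2) : ();
}

my ($name, $value) = parse_header_line("Test: My: Weird: String\n");
print "name=[$name] value=[$value]\n";
# name=[Test] value=[My: Weird: String]
```

Unlike the minimal quantifier, this version does not match at all when the first colon is not followed by a space, which may or may not be what you want.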

左秋 2024-11-15 16:07:37

The reason it uses a minimal quantifier is that it does not need to read any further than the colon. And in fact, it should not. I'm not sure what characters can exist in these keywords, but I am pretty sure . is a bit too wide, and that is the problem. If your fields contain any colons, a non-minimal regex would gobble it all up, for example: