How can I extract multiple email addresses from a single line with a Perl regular expression?
I have a text file that was originally a MySQL dump of a database table. How do I write a Perl script to extract all the email addresses from this text file?
The problem I am having is that I read the document line by line and run a regular expression on each line; however, when there is more than one email address on the same line, my script seems to fail.
open (FP, '<my-large-file.txt');
while($line = <FP>)
{
    #if($line =~ /\s([\S]{1,80}[@]{1}[\S]{2,100})\s/)
    #if($line =~ /([\S]{1,80}[@]{1}[\S]{2,100})\s/)
    #if($line =~ /([\S]{1,80}[@]{1}[\S]{2,100})[,]/)
    {
        push(@emails, $1);
    }
}
close (FP);
I have been playing with the above code but did not get the desired outcome.
Use the /g modifier to match a regular expression multiple times:
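A minimal sketch of the per-line approach, assuming a simplified, hypothetical email pattern (far from RFC 822 compliant) and an in-memory string standing in for my-large-file.txt. In list context a /g match returns every match on the line, so nothing is lost when a line holds several addresses:

```perl
use strict;
use warnings;

# Hypothetical, simplified pattern: local part, "@", then a dotted domain.
# It is NOT a full RFC 822 validator.
my $email_re = qr/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/;

# Sample data standing in for the real dump file (an assumption).
my $dump = "INSERT INTO users VALUES (1,'alice\@example.com'),(2,'bob\@example.org');\n"
         . "-- contact: carol\@example.net\n";

# Opening a reference to a scalar reads from the string as if it were a file.
open(my $fh, '<', \$dump) or die "Cannot open input: $!";

my @emails;
while (my $line = <$fh>) {
    # List-context /g returns all matches on this line, not just the first.
    push @emails, $line =~ /$email_re/g;
}
close($fh);

print "$_\n" for @emails;
```

Note that the match is bound explicitly to $line; a bare /.../g here would run against $_, which this loop never sets.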
Sounds like you're happy with the regex but just need to capture multiple matches.
To do that, you need to stop using if and use the /g modifier. Additionally, if the file is small enough, you could read the whole file into a single string, since you can now deal with multiple matches.
Here is an example:
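A minimal sketch of the whole-file variant, assuming the file fits in memory. The pattern is a simplified, hypothetical one, and an in-memory string stands in for my-large-file.txt:

```perl
use strict;
use warnings;

# Sample data standing in for my-large-file.txt (an assumption).
my $dump = "INSERT INTO users VALUES (1,'alice\@example.com'),(2,'bob\@example.org');\n"
         . "-- contact: carol\@example.net\n";

open(my $fh, '<', \$dump) or die "Cannot open input: $!";
# Undefining $/ (the input record separator) makes <$fh> return everything at once.
my $text = do { local $/; <$fh> };
close($fh);

# No capturing parentheses: list-context /g returns each whole match.
# Simplified, hypothetical pattern; not an RFC 822 validator.
my @emails = $text =~ /[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/g;

print "$_\n" for @emails;
```

To run it on the real dump, open the file name instead of \$dump, e.g. open(my $fh, '<', 'my-large-file.txt') or die "...: $!";.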
The matches are returned in list context, so you don't need capturing parentheses or $1 either.
Ivan Nevostruev's answer has flaws.
This will not work because @matches is matching against $_, which will not hold the current line because <FP> is being read into $line. Also, IMHO, creating two array symbols (one from the other) seems poorly implemented. It seems better to me to create one symbol from the resulting list and have that one symbol be the container for all of the data you need. See below for my example.
Here's a nice link for an email regex: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html. I recommend working it in something like this (overkill? Yeah, probably (:):
Assuming your regex itself is correct, add the g modifier at the end so that it matches multiple times, and then return the list of matches. Your code then looks something like this:
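A hedged sketch of that shape (not necessarily the code this answer had in mind), assuming hypothetically that the addresses sit inside single-quoted SQL values, as they often do in a mysqldump INSERT line. Because the pattern has one capture group, list-context /g returns the captured text of every match rather than only the first $1. The helper name emails_in_line is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical pattern: a single-quoted value containing an "@" and no
# spaces or quotes. The parentheses capture just the address.
my $quoted_email_re = qr/'([^'\s]+\@[^'\s]+)'/;

# Hypothetical helper: returns every captured address on one line.
sub emails_in_line {
    my ($line) = @_;
    # With a capture group, list-context /g returns each captured $1.
    my @found = $line =~ /$quoted_email_re/g;
    return @found;
}

my @emails = emails_in_line(
    q{INSERT INTO users VALUES (1,'Alice Smith','alice@example.com'),(2,'Bob','bob@example.org');}
);
print "$_\n" for @emails;
```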
Assuming each email address is a word on its own, separated by spaces, you don't need much of a regex:
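A minimal sketch of the split-and-filter idea, under this answer's assumption that every address is a whitespace-separated word. The helper name emails_in_line and the sample line are hypothetical:

```perl
use strict;
use warnings;

# Assumption (from the answer): every address is a whitespace-separated word.
sub emails_in_line {
    my ($line) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace;
    # grep keeps only the words that contain an "@".
    return grep { /\@/ } split ' ', $line;
}

my @emails = emails_in_line("contacts: alice\@example.com bob\@example.org  end\n");
print "$_\n" for @emails;
```

On a raw mysqldump line the "words" would still carry quotes and commas (e.g. 'alice@example.com'),), so this only fits data that really is whitespace-separated.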
I think one possible failure is using \s instead of \b. Try the following condition:
Also, always check that the file was opened successfully, and use strict/warnings:
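A sketch combining this answer's three suggestions. One plausible reason \b helps: \s consumes the whitespace character it matches, so under /g a pattern like /\s(...)\s/ swallows the space that the next address needs as its own leading \s, and adjacent addresses get skipped; \b is zero-width and consumes nothing. The pattern is a simplified, hypothetical one, and the function name extract_emails_from_file is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical simplified pattern. \b is zero-width, so unlike \s it does
# not eat the separator that the next match would need.
my $email_re = qr/\b[\w.+-]+\@[\w-]+(?:\.[\w-]+)+\b/;

# Hypothetical helper: returns all addresses found in the given file.
sub extract_emails_from_file {
    my ($path) = @_;
    # Always check that open succeeded; $! holds the OS error message.
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    my @emails;
    while (my $line = <$fh>) {
        push @emails, $line =~ /$email_re/g;
    }
    close($fh);
    return @emails;
}

# Run as: perl extract_emails.pl my-large-file.txt
if (@ARGV) {
    print "$_\n" for extract_emails_from_file($ARGV[0]);
}
```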