Perl - How to count and print the occurrences of domains in an array of email addresses?
I have been struggling with this for a couple of days now and cannot seem to figure it out.
I have an array of email addresses that were created via push(@emails,$email) in a while loop.
I am attempting to create a list of unique domains with the number of occurrences of each in the array, ordered by number of occurrences.
So, if the array @emails has:
[email protected] [email protected] [email protected] [email protected]
I would like to print:
yadoo.com 2
geemail.net 1
zoohoo.org 1
I found this example that works on emails in a file, but it is WAY over my head. Can someone help me with a more verbose code example that works with an array of email addresses?
perl -e 'while(<>){chomp;/^[^@]+@([^@]+)$/;$h{$1}++;}
foreach $k (sort { $h{$b} <=> $h{$a} } keys %h) {print $h{$k}." ".$k."\n";}' infile
I also tried (more at my level of understanding, or lack thereof):
foreach my $domain (sort keys %$domains) {
print "$domain"."=";
print $domains->{$domain}."\n";
};
AND
my %countdoms;
$countdoms{$_}++ for @domains;
print "$_ $countdoms{$_}\n" for keys %countdoms;
The best result I got from many different attempts was the total count (1812, which is accurate) with the number 2 next to it. Am I close, possibly?
Instead of giving you another answer, let me explain what your code example is doing:
The first line counts the domains of the emails in the input files.
while(<>) iterates over the input files line by line. The input files are the file(s) passed as arguments, or stdin if no arguments were passed. Each line is placed in $_.
chomp; simply removes the newline from the end of $_.
/^[^@]+@([^@]+)$/ is the regular expression that parses out the domain, and it is applied to $_. It checks for something with no '@' in the first part, then an '@', and then no '@' in the last part. It remembers (captures) the last part, which is stored in $1. ^ and $ stand for the beginning and the end of the string, respectively.
$h{$1}++; uses the domain (in $1) to increment the count in the hash %h. This works even if the domain is not in the hash yet, because undef behaves like 0 here.
In order to make this work for your list, you can loop over your array with for (@emails) { ... } instead of while(<>), keeping the same loop body.
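A minimal sketch of that adaptation, assuming @emails has already been filled by your while loop and that its elements carry no trailing newline (the four addresses below are hypothetical stand-ins). The loop body is the one-liner's, with one addition: $1 is only meaningful after a successful match, so next unless skips any string the regex does not match.

```perl
use strict;
use warnings;

# Hypothetical sample data; in your script @emails is filled by push(@emails, $email).
my @emails = ('alice@yadoo.com', 'bob@yadoo.com', 'carol@geemail.net', 'dave@zoohoo.org');

my %h;    # domain => number of occurrences
for (@emails) {                       # each address is aliased to $_, like a line read by <>
    next unless /^[^@]+@([^@]+)$/;    # capture everything after the single '@' into $1
    $h{$1}++;                         # undef acts as 0, so the first ++ yields 1
}

# Same output step as the one-liner: highest count first, "count domain" per line.
foreach my $k (sort { $h{$b} <=> $h{$a} } keys %h) {
    print "$h{$k} $k\n";
}
```

If your addresses were pushed without chomp, chomp them first: [^@]+ also matches a newline, so "bob@yadoo.com\n" would otherwise be counted under a separate key.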
The second line prints the domains from the hash %h. sort { $h{$b} <=> $h{$a} } keys %h returns a list of domains sorted by descending occurrence, using the comparison function $h{$b} <=> $h{$a} to look up the counts. Note that it is $b <=> $a, not $a <=> $b; that is what makes the order descending. The rest of line 2 prints each count followed by its domain.
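To get exactly the layout from the question (domain first, then count) and a predictable order for equal counts, the comparison can fall through to an alphabetical tie-break with or. This is a sketch; the contents of %h below are hypothetical stand-ins for the counts built by the first line.

```perl
use strict;
use warnings;

# Hypothetical counts, as the counting loop would leave them in %h.
my %h = ('yadoo.com' => 2, 'zoohoo.org' => 1, 'geemail.net' => 1);

# $h{$b} <=> $h{$a}: larger counts first (descending).
# When two counts are equal, <=> returns 0, so "or" falls through to
# $a cmp $b, which orders those domains alphabetically (ascending).
my @sorted = sort { $h{$b} <=> $h{$a} or $a cmp $b } keys %h;

# Print "domain count", the layout asked for in the question.
print "$_ $h{$_}\n" for @sorted;
```

With these counts it prints yadoo.com 2, then geemail.net 1, then zoohoo.org 1. Without the tie-break, the relative order of the two domains with a count of 1 follows hash ordering and can differ between runs.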
If your email addresses are already in an array, this will get you a count for each domain. I'm sure someone can produce something prettier!
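A minimal sketch of counting each domain from an array, assuming one address per element. Instead of a regex it takes the piece after the last '@' with split, and it lowercases that piece so Yadoo.com and yadoo.com count as one domain; both choices, and the helper name count_domains, are illustrative assumptions rather than anything from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of domain => occurrence count.
sub count_domains {
    my @addresses = @_;
    my %count;
    for my $address (@addresses) {
        my @parts = split /@/, $address;    # "bob@yadoo.com" -> ("bob", "yadoo.com")
        next if @parts < 2;                 # no usable '@': skip this entry
        $count{ lc $parts[-1] }++;          # last piece is the domain; lowercase it
    }
    return \%count;
}

# Hypothetical sample data (note the mixed case in the second address).
my @emails = ('alice@yadoo.com', 'bob@Yadoo.com', 'carol@geemail.net', 'dave@zoohoo.org');
my $counts = count_domains(@emails);

# Highest count first, ties alphabetical, printed as aligned "domain count" columns.
for my $domain (sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts) {
    printf "%-12s %d\n", $domain, $counts->{$domain};
}
```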
It's a bit crude because I am rusty on Perl, but this should do the job:
Another variation:
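One variation, sketched under the assumption that the addresses are in @emails: first build a @domains list with map, then the counting line from the question ($countdoms{$_}++ for @domains) works unchanged, because @domains now holds one domain per address. The sample addresses are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the array built with push(@emails, $email).
my @emails = ('alice@yadoo.com', 'bob@yadoo.com', 'carol@geemail.net', 'dave@zoohoo.org');

# In list context a match returns its captured groups, or an empty list
# if it fails, so map yields one domain per matching address.
my @domains = map { /@([^@]+)$/ } @emails;

# The counting line from the question, unchanged.
my %countdoms;
$countdoms{$_}++ for @domains;

# Sort by count (descending) and print "domain count".
for my $domain (sort { $countdoms{$b} <=> $countdoms{$a} } keys %countdoms) {
    print "$domain $countdoms{$domain}\n";
}
```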