Counting word frequencies and then sorting them
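I'm writing a Perl script that should process a text, fill a dictionary with word frequencies, and then sort the dictionary. The text is an extract from "The Gold-Bug" by Edgar Allan Poe, and the purpose is to calculate the frequencies of all of the words. But I am doing something wrong, because I get no output. What am I doing wrong? Thanks.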
open(TEXT, "goldenbug.txt") or die("File not found");
while(<TEXT>)
{
    chomp;
    $_=lc;
    s/--/ /g;
    s/ +/ /g;
    s/[.,:;?"()]//g;
    @word=split(/ /);
    foreach $word (@words)
    {
        if( /(\w+)'\W/ )
        {
            if($1 eq 'bug')
            {
                $word=~s/'//g;
            }
        }
        if( /\W'(\w+)/)
        {
            if(($1 ne 'change') and ($1 ne 'em') and ($1 ne 'prentices'))
            {
                $word=~s/'//g;
            }
        }
        $dictionary{$word}+=1;
    }
}
foreach $word(sort byDescendingValues keys %dictionary)
{
    print "$word, $dictionary{$word}\n";
}
sub byDescendingValues
{
    $value=$dictionaty{$b} <=> $dictionary{$a};
    if ($value==0)
    {
        return $a cmp $b
    }
    else
    {
        return $value;
    }
}
Comments (2)
You have in your code:
@word=split(/ /);
foreach $word (@words)
You've named the array @word in the split, but you are using the array @words in the foreach loop. Since @words is never filled, the loop body never runs and nothing is counted. The split should be:
@words=split(/ /);
There is another typo in the byDescendingValues routine:
$value=$dictionaty{$b} <=> $dictionary{$a};
where $dictionaty should be $dictionary.
As suggested in the other answer, you really should add
use strict;
use warnings;
With these you could easily have caught both typos. Without them you'll waste a lot of your time.
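To illustrate how strict catches a misspelled variable name, here is a small sketch. The helper compile_error is hypothetical (it is not part of the question's script); it compiles a snippet with a string eval only so that the error can be shown without aborting the demonstration. In a real script you would simply put the two use lines at the top and let Perl refuse to run.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet under strict and warnings.
# Returns an empty string on success, or Perl's error message on failure.
sub compile_error {
    my ($code) = @_;
    my $ok = eval "use strict; use warnings; $code; 1";
    return $ok ? '' : $@;
}

# Same kind of typo as in the question: %dictionaty was never declared.
my $typo  = 'my %dictionary; $dictionaty{bug} += 1;';
my $fixed = 'my %dictionary; $dictionary{bug} += 1;';

print "typo:  ", compile_error($typo)  ? "rejected" : "accepted", "\n";
print "fixed: ", compile_error($fixed) ? "rejected" : "accepted", "\n";
# prints:
# typo:  rejected
# fixed: accepted
```

For the misspelled snippet the returned message begins with: Global symbol "%dictionaty" requires explicit package name. That is the error Perl would report at compile time for the original script once strict is enabled.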
As well as confusing @word and @words, you are also using $dictionaty instead of $dictionary. It is wise to
use strict;
use warnings;
at the start of your program and declare all of your variables using my. That way trivial bugs like this are caught by Perl itself.
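Putting the advice together, a minimal sketch of the counting and sorting with both typos fixed might look like the following. Several things here are assumptions rather than the asker's code: the sub names count_words and sorted_words are hypothetical, the input is a short inline sample instead of goldenbug.txt so that the example is self-contained, split ' ' replaces the s/ +/ /g plus split(/ /) pair, and the apostrophe special cases from the question are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: count lower-cased words in a list of lines.
sub count_words {
    my @lines = @_;                    # copy, so the caller's data is untouched
    my %dictionary;
    for my $line (@lines) {
        chomp $line;
        $line = lc $line;
        $line =~ s/--/ /g;             # treat a double dash as a word separator
        $line =~ s/[.,:;?"()]//g;      # strip the same punctuation as the question
        my @words = split ' ', $line;  # splits on runs of whitespace
        $dictionary{$_}++ for @words;
    }
    return %dictionary;
}

# Hypothetical helper: keys ordered by descending count, ties alphabetical.
sub sorted_words {
    my ($dictionary) = @_;             # hash reference
    return sort {
        $dictionary->{$b} <=> $dictionary->{$a} || $a cmp $b
    } keys %$dictionary;
}

# Inline sample text (an assumption; the real script reads goldenbug.txt).
my @sample = ("The bug -- the golden bug.", "A bug, a beetle.");
my %freq   = count_words(@sample);
print "$_, $freq{$_}\n" for sorted_words(\%freq);
# prints:
# bug, 3
# a, 2
# the, 2
# beetle, 1
# golden, 1
```

Because every variable is declared with my, writing @word in one place and @words in another, or $dictionaty for $dictionary, would stop the script at compile time instead of silently producing no output.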