Counting the letters of each word in a text with Perl
I am trying to write a program with Perl which should return the frequency of all words in a file and the length of each word in the file (not the sum of all characters!) to produce a Zipf curve from a Spanish text (it is not a big deal if you don't know what a Zipf curve is). Now my problem is: I can do the first part and get the frequency of all words, but I don't know how to get the length of each word! :( I know the statement $word_length = length($words), but after trying to change the code I really don't know where I should include it and how to count the length of each word.
This is what my code looks like so far:
#!/usr/bin/perl
use strict;
use warnings;

my %count_of;
while (my $line = <>) { #read from file or STDIN
    foreach my $word (split /\s+/gi, $line){
        $count_of{$word}++;
    }
}
print "All words and their counts: \n";
for my $word (sort keys %count_of) {
    print "$word: $count_of{$word}\n";
}
__END__
I hope somebody has some suggestions!
Comments (3)
You can use a hash of hashes if you want to store the length of each word.
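The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of what a hash of hashes could look like for this problem. The helper name word_stats, the keys count and length, and the sample lines are assumptions made for illustration; in the real program the lines would come from <> as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds a hash of hashes keyed by word,
# storing both the occurrence count and the word length.
sub word_stats {
    my @lines = @_;
    my %stats;
    for my $line (@lines) {
        # split ' ' ignores leading whitespace, so no empty "word" is counted
        for my $word (split ' ', $line) {
            $stats{$word}{count}++;
            $stats{$word}{length} = length $word;
        }
    }
    return \%stats;
}

# Sample input (assumed); replace with lines read from <> in practice
my $stats = word_stats("el gato y el perro\n", "el sol\n");
for my $word (sort keys %$stats) {
    print "$word: count=$stats->{$word}{count}, length=$stats->{$word}{length}\n";
}
```

Note that length counts characters only if the input has been decoded; on raw UTF-8 bytes, accented Spanish letters would count as more than one.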
This will print the length right next to the count:
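The snippet from this answer is missing as well. A sketch of the idea, assuming the %count_of hash from the question, is to call length on each key while printing, so no second data structure is needed. The helper name report_lines, the output format and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one formatted line per distinct word,
# with the word length computed on the fly next to the count.
sub report_lines {
    my ($count_of) = @_;
    return map {
        sprintf("%s: %d (length %d)\n", $_, $count_of->{$_}, length $_)
    } sort keys %$count_of;
}

# Sample input (assumed); the question reads lines from <> instead
my %count_of;
for my $line ("uno dos uno\n", "tres dos uno\n") {
    $count_of{$_}++ for split ' ', $line;
}

print "All words, their counts and lengths:\n";
print report_lines(\%count_of);
```

Since the length depends only on the word itself, computing it at print time avoids storing it for every word.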
Just for your information, another possibility is to count the letters with a substitution (s/// with \w, $1 and the g modifier).
It is not as clear a solution as toolic's, but it can give you another view on this issue (TIMTOWTDI :))
A little explanation:
\w with the g modifier matches every letter in your $word
$1 prevents s/// from overwriting the original $word
s/// returns the number of letters (matched by \w) in $word
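The exact expression from this answer is not preserved, so the sketch below is reconstructed from the explanation only and may differ from what was posted. The helper name count_word_chars and the sample words are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts word characters using the return value of s///g
sub count_word_chars {
    my ($word) = @_;
    # Every \w character is replaced by itself ($1), so $word is unchanged;
    # s///g returns the number of substitutions made (false if there were none).
    my $n = ($word =~ s/(\w)/$1/g);
    return $n || 0;
}

for my $word ("hola", "mundo", "hola,") {
    print "$word: ", count_word_chars($word),
          " word characters, length ", length($word), "\n";
}
```

One difference from length is worth keeping in mind: \w matches letters, digits and the underscore but not punctuation, so a token such as "hola," yields 4 here while length returns 5.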