Using Perl to count the letters of each word in a text

Posted 2024-11-11 14:00:57

I am trying to write a program in Perl that should return the frequency of all words in the file and the length of each word in the file (not the sum of all characters!) to produce a Zipf curve from a Spanish text (it is not a big deal if you don't know what a Zipf curve is). Now my problem is: I can do the first part and I get the frequency of all words, but I don't know how to get the length of each word! :( I know the statement $word_length = length($word), but after trying to change the code I really don't know where I should include it and how to count the length of each word.

This is what my code looks like so far:

#!/usr/bin/perl
use strict;
use warnings;

my %count_of;
while (my $line = <>) { #read from file or STDIN
  foreach my $word (split /\s+/, $line){
     $count_of{$word}++;
  }
}
print "All words and their counts: \n";
for my $word (sort keys %count_of) {
  print "$word: $count_of{$word}\n";
}
__END__

I hope somebody has some suggestions!

Answers (3)

尽揽少女心 2024-11-18 14:00:57

You can use a hash of hashes if you want to store the length of each word.

#!/usr/bin/perl
use strict;
use warnings;

my %count_of;
while (my $line = <>) {    # read from file or STDIN
    foreach my $word (split /\s+/, $line) {
        $count_of{$word}{word_count}++;
        $count_of{$word}{word_length} = length($word);
    }
}

print "All words and their counts and length: \n";
for my $word (sort keys %count_of) {
    print "$word: $count_of{$word}{word_count} ";
    print "Length of the word: $count_of{$word}{word_length}\n";
}
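One thing to watch with Spanish input: unless the text is decoded, length counts bytes, so a UTF-8 word such as "año" reports 4 instead of 3. Below is a minimal sketch of the same hash-of-hashes approach, assuming the input is UTF-8 encoded. The count_words helper and the sample sentence are hypothetical, added only for illustration; an in-memory string stands in for a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
# Assumption: input and terminal are UTF-8. ':std' sets the layer on
# STDIN/STDOUT/STDERR; files opened in this scope without explicit layers
# get UTF-8 decoding by default too.
use open qw(:std :encoding(UTF-8));

# Hypothetical helper: reads decoded lines from $fh and returns a hash ref
# mapping each word to { word_count => N, word_length => L }.
sub count_words {
    my ($fh) = @_;
    my %count_of;
    while (my $line = <$fh>) {
        # split ' ' splits on runs of whitespace and ignores leading whitespace
        foreach my $word (split ' ', $line) {
            $count_of{$word}{word_count}++;
            # On decoded text, length() counts characters, not bytes
            $count_of{$word}{word_length} = length($word);
        }
    }
    return \%count_of;
}

# Demo: simulate a UTF-8 encoded file with an in-memory byte string
my $bytes = encode('UTF-8', "el a\x{f1}o y el d\x{ed}a\n");
open my $fh, '<:encoding(UTF-8)', \$bytes or die "Cannot open string: $!";
my $counts = count_words($fh);
close $fh;

for my $word (sort keys %$counts) {
    print "$word: $counts->{$word}{word_count} ",
          "Length of the word: $counts->{$word}{word_length}\n";
}
```

For a real file, a plain open my $fh, '<', $path inside the scope of use open should pick up the UTF-8 layer automatically, so the same helper can be called as count_words($fh), with $path being whatever file you want to read.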
万劫不复 2024-11-18 14:00:57

This will print the length right next to the count:

  print "$word: $count_of{$word} ", length($word), "\n";
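To show where that line goes, here is the question's program with the print statement swapped in, as a sketch. To keep it self-contained it reads from an in-memory string through a hypothetical count_words helper instead of <>, and it collects the report in a string before printing it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the question's counting loop, reading from a
# filehandle instead of <> so the demo can use an in-memory string.
sub count_words {
    my ($fh) = @_;
    my %count_of;
    while (my $line = <$fh>) {
        # split ' ' behaves like /\s+/ but ignores leading whitespace
        foreach my $word (split ' ', $line) {
            $count_of{$word}++;
        }
    }
    return %count_of;
}

my $text = "el perro y el gato\n";
open my $in, '<', \$text or die "Cannot open string: $!";
my %count_of = count_words($in);
close $in;

my $report = "All words and their counts: \n";
for my $word (sort keys %count_of) {
    # The only change from the question: length($word) after the count
    $report .= "$word: $count_of{$word} " . length($word) . "\n";
}
print $report;
```

Since the length can always be recomputed from the word itself, it does not need to be stored; computing it at print time leaves the original %count_of structure unchanged.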
南冥有猫 2024-11-18 14:00:57

Just for your information: another possibility besides

length($word)

might be:

$word =~ s/(\w)/$1/g

It is not as clear a solution as toolic's, but it can give you another view on this issue (TIMTOWTDI :))

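A short explanation:

\w with the g modifier matches every word character in $word (letters, but also digits and the underscore)

Replacing each match with $1, i.e. the character itself, leaves the original $word unchanged

In scalar context, s/// returns the number of substitutions made, i.e. the number of characters in $word matched by \w (an empty string if there are none)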
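A sketch comparing the s/// trick with length and with two match-based counts. It assumes the words are already decoded character strings; the measure helper and the sample words are hypothetical. It shows where the counts differ: punctuation left attached when splitting on whitespace, and digits, which \w counts but \p{L} does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Assumption: words are decoded character strings. unicode_strings makes \w
# treat characters such as "ñ" (U+00F1) as word characters.
use feature 'unicode_strings';
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper returning (length, s/// count, match count, letters)
sub measure {
    my ($word) = @_;
    my $copy = $word;
    # The answer's trick: s///g returns the number of substitutions.
    # Each \w character is replaced by itself, so $copy stays the same.
    # With no match s/// returns "", so "|| 0" turns that into 0.
    my $by_subst = ($copy =~ s/(\w)/$1/g) || 0;
    # Same count without modifying anything: a list assignment in scalar
    # context yields the number of matches.
    my $by_match = () = $word =~ /\w/g;
    # \w also matches digits and "_"; \p{L} counts letters only.
    my $letters = () = $word =~ /\p{L}/g;
    return (length($word), $by_subst, $by_match, $letters);
}

for my $word ("a\x{f1}o,", "hola!", "mp3") {
    my ($len, $subst, $match, $letters) = measure($word);
    print "$word: length=$len s///=$subst match=$match letters=$letters\n";
}
```

The () = ... =~ /\w/g form gives the same number as the s/// trick without touching the string, and it yields 0 rather than an empty string when nothing matches; \p{L} is the one to use if only letters should count toward the word length.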
