Counting sentences/words in Perl using regular expressions
This regex thing is getting old. :(
Yet another question:
I need to count the number of words and number of sentences in a paragraph. The code I tried using was this:
my $sentencecount = $file =~ s/((^|\s)\S).*?(\.|\?|\!)/$1/g;
my $count = $file =~ s/((^|\s)\S)/$2/g;
print "Input file $ARGV[1] contains $sentencecount sentences and $count words.";
Both counts come back as 63. I know this is incorrect, at least as far as the word count goes. Is this a result of using a substitution for the counting process? If so, how do I correct it?
Comments (3)
I suggest looking into the Perl split function; see perlfunc(1).
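A minimal sketch of the split approach, assuming the text is already in $file. The sample string and the choice of [.?!]+ as the sentence delimiter are illustrative assumptions, not the answerer's original code:

```perl
use strict;
use warnings;

# Hypothetical sample text; in the question this would be the file contents.
my $file = "Hello world. How are you? Fine, thanks!";

# split ' ' splits on runs of whitespace and skips leading whitespace.
my @words = split ' ', $file;
my $count = scalar @words;

# Split on runs of sentence-ending punctuation; grep drops pieces
# that are empty or contain only whitespace.
my @sentences = grep { /\S/ } split /[.?!]+/, $file;
my $sentencecount = scalar @sentences;

print "Input contains $sentencecount sentences and $count words.\n";
```

Unlike s///, split leaves $file untouched, so the second count still sees the full text.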
Doing //g matching in scalar context (incrementing a counter once per match) avoids building an enormous list of all words or all sentences, which saves memory if the file is large. Counting sentences by matching runs of end-of-sentence delimiters treats any number of consecutive delimiters as a single sentence end (e.g. Hello... world! is counted as 2 sentences).
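The scalar-context //g technique might look like the sketch below. The sample text and the patterns \S+ and [.?!]+ are assumptions chosen for illustration. Because a plain match does not modify $file, the second count runs against the same text as the first. In the question's code, the first s///g had already rewritten $file before the word count ran.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the file contents.
my $file = "Hello... world! Is this a test? Yes.";

# In scalar context each //g match resumes where the previous one ended,
# so the loop body runs once per word without building a list.
my $count = 0;
$count++ while $file =~ /\S+/g;

# A run of delimiters such as "..." matches once, so it counts once.
my $sentencecount = 0;
$sentencecount++ while $file =~ /[.?!]+/g;

print "Input contains $sentencecount sentences and $count words.\n";
```

When a //g match finally fails, the search position of $file is reset, which is why the second loop starts again from the beginning of the string.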
This gets the count of sentences and chars from $file.
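The snippet this answer refers to is not preserved here, so the following is only a guess at one way to get both numbers. It assumes sentences end in runs of ., ? or !, and that "chars" means the string length. Variable names other than $file are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the file contents.
my $file = "One. Two? Three!";

# Assigning the match list to an empty list and then to a scalar
# yields the number of matches (this does build the list temporarily).
my $sentences = () = $file =~ /[.?!]+/g;

# length counts every character, including spaces and punctuation.
my $chars = length $file;

print "Input contains $sentences sentences and $chars chars.\n";
```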