Counting sentences/words in Perl with regular expressions

Posted 2024-10-15 04:33:06

This regex thing is getting old. :(
Yet another question:
I need to count the number of words and number of sentences in a paragraph. The code I tried using was this:

my $sentencecount = $file =~ s/((^|\s)\S).*?(\.|\?|\!)/$1/g;
my $count = $file =~ s/((^|\s)\S)/$2/g;
print "Input file $ARGV[1] contains $sentencecount sentences and $count words.";

My results return 63 for both counts. I know this is incorrect, at least as far as the word count goes. Is this a result of using a substitution for the counting process? If so, how do I correct this?
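Why both counts come out equal: s///g returns the number of substitutions, but it also modifies $file in place. The first substitution reduces each sentence to its first character, so the second substitution sees exactly one single-letter "word" per sentence. A minimal sketch reproducing this on a hypothetical sample paragraph (the string is an assumption; the two regexes are the question's own):

```perl
use strict;
use warnings;

# Hypothetical sample paragraph standing in for the contents of $file.
my $file = "Perl is fun. Regexes are hard! Are they worth it? Yes.";

# The question's first substitution. s///g returns the number of
# replacements, but it also rewrites $file: each sentence is replaced
# by $1, i.e. its first non-space character plus any whitespace before it.
my $sentencecount = $file =~ s/((^|\s)\S).*?(\.|\?|\!)/$1/g;
print "After the first s///g: '$file'\n";    # prints 'P R A Y'

# The second substitution runs on that shrunken string, where every
# sentence is now a one-letter "word", so it finds exactly as many
# matches as the first substitution did.
my $count = $file =~ s/((^|\s)\S)/$2/g;

print "sentences=$sentencecount words=$count\n";    # prints sentences=4 words=4
```

The sample actually contains 11 words. Counting with a match (m//g) instead of a substitution leaves $file untouched, so the two counts no longer interfere with each other.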

梦在深巷 2024-10-22 04:33:06

I suggest looking into Perl's split function; see perlfunc(1):

           If EXPR is omitted, splits the $_ string.  If PATTERN is also
           omitted, splits on whitespace (after skipping any leading
           whitespace).  Anything matching PATTERN is taken to be a
           delimiter separating the fields.  (Note that the delimiter may
           be longer than one character.)
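A minimal sketch of counting with split, assuming a hypothetical sample string. split ' ' is the special case the excerpt describes (split on whitespace after skipping leading whitespace). The sentence rule is an assumption, not part of the answer: split on runs of . ! ? and keep only fragments that contain a non-space character.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for $file.
my $file = "  Perl is fun. Regexes are hard! Are they worth it? Yes.\n";

# Awk-style split: runs of whitespace separate fields, leading
# whitespace is skipped. An array in scalar context gives its length.
my @words      = split ' ', $file;
my $word_count = @words;

# Assumed sentence rule: split on runs of terminators and drop
# whitespace-only fragments (here, the "\n" after the final ".").
my @sentences      = grep { /\S/ } split /[.!?]+/, $file;
my $sentence_count = @sentences;

print "words=$word_count sentences=$sentence_count\n";    # words=11 sentences=4
```

Assigning to a real array matters here: writing my $n = () = split ' ', $file; is a known trap, because split treats a list assignment with no variables as an implicit LIMIT of 1 and returns a single field.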

浅紫色的梦幻 2024-10-22 04:33:06

my $wordCount = 0;
++$wordCount while $file =~ /\S+/g;

my $sentenceCount = 0;
++$sentenceCount while $file =~ /[.!?]+/g;

Doing //g matching in scalar context, as here, avoids building an enormous list of all words or all sentences, which saves memory if the file is large. The sentence-counting code counts a run of consecutive end-of-sentence delimiters as a single sentence (e.g. "Hello... world!" is counted as 2 sentences, not 4).
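The same two loops run on a hypothetical sample string, to show the "..." rule and one edge case: a final sentence with no terminator is not counted by this pattern.

```perl
use strict;
use warnings;

# Hypothetical sample: "..." then "!", and an unterminated final "Yes".
my $file = "Hello... world! This is a test. Is it? Yes";

my $wordCount = 0;
++$wordCount while $file =~ /\S+/g;           # one hit per run of non-space chars

# The loop above ends on a failed match, which resets pos($file),
# so this loop scans the string again from the beginning.
my $sentenceCount = 0;
++$sentenceCount while $file =~ /[.!?]+/g;    # one hit per run of terminators

print "words=$wordCount sentences=$sentenceCount\n";    # words=9 sentences=4
```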

奢欲 2024-10-22 04:33:06

This gets the count of words and non-space characters from $file:

my $file = "This is praveen worki67ng in RL websolutions";
my $count   = () = $file =~ /\S+/g;   # number of words
my $counter = () = $file =~ /\S/g;    # number of non-space characters
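How the () = idiom counts: m//g in list context returns every match, and a list assignment evaluated in scalar context yields the number of elements on its right-hand side. A sketch using the answer's string, plus a sentence count that is not in the answer (the sample paragraph for it is hypothetical):

```perl
use strict;
use warnings;

my $file = "This is praveen worki67ng in RL websolutions";

# Assigning the match list to () and taking that assignment in scalar
# context gives the number of matches.
my $count   = () = $file =~ /\S+/g;    # words (runs of non-space chars)
my $counter = () = $file =~ /\S/g;     # non-space characters

# Not in the answer: the same idiom counts sentences if the pattern
# matches runs of terminators, so "..." counts once.
my $para      = "Hello... world! Is it? Yes.";
my $sentences = () = $para =~ /[.!?]+/g;

print "words=$count chars=$counter sentences=$sentences\n";
# words=7 chars=38 sentences=4
```

Unlike the scalar-context while loop, this form builds the full list of matches before counting it, which only matters for very large inputs.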
