
Word count in Rails?

Posted 2024-08-19 04:36:22

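Suppose I have a Blog model with a title and a body. How can I display the number of words in the body and the number of characters in the title? I want the output to look like this:

Title: Lorem Body: Lorem Lorem Lorem

This post has a word count of 3.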
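
The answers below focus on the body's word count; the title's character count is just String#length. Here is a minimal sketch, assuming a plain Ruby Struct stands in for the Blog model. `body_word_count` and `title_char_count` are hypothetical method names; in a Rails app they would be instance methods on the model.

```ruby
# Hypothetical stand-in for the Blog model: a plain Struct with the same
# two attributes. In Rails these methods would live on the model class.
Post = Struct.new(:title, :body) do
  # Number of whitespace-separated words in the body (nil-safe).
  def body_word_count
    body.to_s.split.size
  end

  # Number of characters in the title. String#length counts characters,
  # not bytes, so an accented letter such as "ç" counts once.
  def title_char_count
    title.to_s.length
  end
end

post = Post.new("Lorem", "Lorem Lorem Lorem")
puts "Title: #{post.title} Body: #{post.body}"
puts "This post has a word count of #{post.body_word_count}."
puts "The title is #{post.title_char_count} characters long."
```

In a view you would then render `post.body_word_count` wherever the count should appear.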


拥抱影子 2024-08-26 04:36:22

"Lorem Lorem Lorem".scan(/\w+/).size
=> 3

UPDATE: if you need to match rock-and-roll as one word, you could do:

"Lorem Lorem Lorem rock-and-roll".scan(/[\w-]+/).size
=> 4
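
To see why the update is needed, a small comparison of the two patterns on the same string (the variable names are just for illustration):

```ruby
text = "Lorem Lorem Lorem rock-and-roll"

# \w matches only [A-Za-z0-9_] in Ruby, so each hyphen ends a match and
# "rock-and-roll" becomes three separate words.
plain = text.scan(/\w+/)
p plain.size       # => 6

# Adding "-" to the character class keeps hyphenated words together.
hyphenated = text.scan(/[\w-]+/)
p hyphenated.size  # => 4
```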

楠木可依 2024-08-26 04:36:22

Also:

"Lorem Lorem Lorem".split.size
=> 3
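
A real post body usually contains newlines, tabs and doubled spaces. A quick sketch of why the no-argument form of split copes with that, contrasted with splitting on a literal space regex:

```ruby
# A typical post body: leading spaces, a blank line, a tab, trailing spaces.
body = "  Lorem\n\nLorem\tLorem   "

# With no argument, String#split splits on runs of whitespace and ignores
# leading/trailing whitespace, so no empty strings are produced.
words = body.split
p words       # => ["Lorem", "Lorem", "Lorem"]
p words.size  # => 3

# Splitting on a single-space regex instead turns each extra space into
# an empty "word".
p "Lorem  Lorem Lorem".split(/ /).size  # => 4
```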

森末i 2024-08-26 04:36:22

If you're interested in performance, I wrote a quick benchmark:

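require 'benchmark'
require 'active_support/core_ext/string/filters' # provides String#squish

# Where "shakespeare" is the full text of The Complete Works of William Shakespeare.
# The file name below is a placeholder; point it at your own copy of the text.
shakespeare = File.read('shakespeare.txt')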

puts 'Benchmarking shakespeare.scan(/\w+/).size x50'
puts Benchmark.measure { 50.times { shakespeare.scan(/\w+/).size } }
puts 'Benchmarking shakespeare.squish.scan(/\w+/).size x50'
puts Benchmark.measure { 50.times { shakespeare.squish.scan(/\w+/).size } }
puts 'Benchmarking shakespeare.split.size x50'
puts Benchmark.measure { 50.times { shakespeare.split.size } }
puts 'Benchmarking shakespeare.squish.split.size x50'
puts Benchmark.measure { 50.times { shakespeare.squish.split.size } }

The results:

Benchmarking shakespeare.scan(/\w+/).size x50
 13.980000   0.240000  14.220000 ( 14.234612)
Benchmarking shakespeare.squish.scan(/\w+/).size x50
 40.850000   0.270000  41.120000 ( 41.109643)
Benchmarking shakespeare.split.size x50
  5.820000   0.210000   6.030000 (  6.028998)
Benchmarking shakespeare.squish.split.size x50
 31.000000   0.260000  31.260000 ( 31.268706)

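In other words, squish is slow with Very Large Strings™. Other than that, split is faster than scan: more than twice as fast when neither uses squish (about 6.0 s vs 14.2 s for 50 runs).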
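
For word counting, squish is also redundant before split, because the no-argument split already ignores leading, trailing and repeated whitespace. A sketch illustrating this, assuming a plain-Ruby stand-in for squish (`squish_like` is a hypothetical helper, not the ActiveSupport source) so it runs without Rails; for ordinary ASCII whitespace the count comes out the same either way:

```ruby
# Hypothetical stand-in for ActiveSupport's String#squish: strip both ends
# and collapse every run of whitespace into a single space.
def squish_like(str)
  str.strip.gsub(/[[:space:]]+/, " ")
end

text = " Lorem \n Lorem\t\tLorem  "

# split with no argument already skips leading, trailing and repeated
# whitespace, so squishing first only adds work.
p text.split.size               # => 3
p squish_like(text).split.size  # => 3
```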

爱你不解释 2024-08-26 04:36:22

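The answers here have a couple of issues:

They don't account for UTF-8/Unicode characters with diacritics: áâãêü, etc.
They don't account for apostrophes and hyphens. So a regex like /\w+/ counts Joe's as two words, Joe and s, which is clearly incorrect. The same goes for twenty-two, which is a single compound word.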

Something like this works better and accounts for those issues:

foo.scan(/[\p{Alpha}\-']+/)
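
A quick check of that pattern against the cases listed above, with two caveats worth knowing: \p{Alpha} excludes digits, so numbers are not counted, and a stand-alone hyphen is counted as a word.

```ruby
text = "Joe's twenty-two caçapão"

# ASCII-only \w breaks words at ', -, ç and ã, so it over-counts.
p text.scan(/\w+/).size                    # => 7

# Unicode letters plus hyphens and apostrophes keep each word whole.
p text.scan(/[\p{Alpha}\-']+/).size        # => 3

# Caveats: digits are not letters, and a lone "-" still matches.
p "22 words".scan(/[\p{Alpha}\-']+/).size  # => 1
p "a - b".scan(/[\p{Alpha}\-']+/).size     # => 3
```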

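You might want to look at my Words Counted gem. It can count words, their occurrences, their lengths, and a few other things. It's also well documented.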

counter = WordsCounted::Counter.new(post.body)
counter.word_count #=> 3
counter.most_occuring_words #=> [["lorem", 3]]
# Counting is case-insensitive,
# so `Hello` and `hello` are counted as the same word.
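
If you only need case-insensitive word frequencies and would rather not add a dependency, plain Ruby gets close. This is a sketch of the idea, not how the Words Counted gem is implemented; `most_frequent_words` is a hypothetical helper, and Array#tally needs Ruby 2.7 or later.

```ruby
# Hypothetical helper: the `limit` most frequent words, case-insensitive,
# ties broken alphabetically.
def most_frequent_words(text, limit = 3)
  text.downcase
      .scan(/[\p{Alpha}\-']+/)               # same word pattern as above
      .tally                                 # {"word" => count, ...}
      .sort_by { |word, count| [-count, word] }
      .first(limit)
end

p most_frequent_words("Lorem lorem LOREM ipsum")  # => [["lorem", 3], ["ipsum", 1]]
```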

音盲 2024-08-26 04:36:22

"caçapão adipisicing elit".scan(/[\w-]+/).size 
=> 5

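But as we can see, the sentence has only 3 words. The problem lies with the accented characters: in Ruby, the regex \w matches only the ASCII word characters [A-Za-z0-9_], so ç and ã are treated as separators.

An improved solution would be: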

I18n.transliterate("caçapão adipisicing elit").scan(/[\w-]+/).size
=> 3
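
An alternative, swapped in here rather than taken from the answer, is to keep the accented letters and use a Unicode-aware character class instead of transliterating. In Ruby, the POSIX bracket [[:word:]] matches Unicode letters, marks, numbers and connector punctuation in UTF-8 strings, and this version does not need the i18n gem:

```ruby
text = "caçapão adipisicing elit"

# ASCII-only \w splits "caçapão" into "ca", "ap" and "o".
p text.scan(/[\w-]+/).size         # => 5

# [[:word:]] is Unicode-aware, so accented letters stay inside the word.
p text.scan(/[[:word:]\-]+/).size  # => 3
```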

波浪屿的海角声 2024-08-26 04:36:22

"Lorem Lorem Lorem".scan(/\S+/).size
=> 3
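
\S+ matches runs of non-whitespace, which is exactly what the no-argument split produces, so the two approaches give the same count. A short comparison with \w+ on text containing punctuation:

```ruby
text = "Joe's rock-and-roll - 22!"

# \S+ and split agree: both keep punctuation attached to words and both
# count the lone "-" as a word.
p text.scan(/\S+/).size  # => 4
p text.split.size        # => 4

# \w+ drops punctuation but breaks words at apostrophes and hyphens.
p text.scan(/\w+/).size  # => 6
```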

About the author: 浪漫人生路