Word count in Rails?

Posted 2024-08-19 04:36:22, 112 words, 6 views, 0 comments

Say I have a blog model with Title and Body. How do I show the number of words in Body and the number of characters in Title? I want the output to be something like this:

Title: Lorem
Body: Lorem Lorem Lorem

This post has word count of 3.

Comments (6)

拥抱影子 2024-08-26 04:36:22
"Lorem Lorem Lorem".scan(/\w+/).size
=> 3

UPDATE: if you need to match rock-and-roll as one word, you could do something like:

"Lorem Lorem Lorem rock-and-roll".scan(/[\w-]+/).size
=> 4
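
To tie this back to the question, here is a minimal sketch of how the counts could be exposed on the model. `Post`, `word_count`, and `title_length` are hypothetical names, and a plain `Struct` stands in for the ActiveRecord model so the snippet runs without Rails.

```ruby
# Hypothetical stand-in for the blog model; in a Rails app these methods
# would live on the ActiveRecord class rather than on a Struct.
Post = Struct.new(:title, :body) do
  # Number of words in the body, using the scan approach from this answer.
  def word_count
    body.to_s.scan(/[\w-]+/).size
  end

  # Number of characters in the title.
  def title_length
    title.to_s.length
  end
end

post = Post.new("Lorem", "Lorem Lorem Lorem")
puts "Title: #{post.title} (#{post.title_length} characters)"
puts "Body: #{post.body}"
puts "This post has word count of #{post.word_count}."
```

Calling `to_s` first means a `nil` title or body is counted as empty instead of raising an error.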
楠木可依 2024-08-26 04:36:22

Also:

"Lorem Lorem Lorem".split.size
=> 3
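
With no arguments, `String#split` splits on runs of whitespace and ignores leading and trailing whitespace, so irregular spacing and newlines do not inflate the count. A small sketch (the sample string is made up for illustration):

```ruby
text = "  Lorem   Lorem\nLorem  "

# Splits on any run of whitespace; leading and trailing whitespace is dropped.
words = text.split

puts words.inspect
puts words.size
```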
森末i 2024-08-26 04:36:22

If you're interested in performance, I wrote a quick benchmark:

require 'benchmark'
require 'bigdecimal/math'
require 'active_support/core_ext/string/filters'

# Where "shakespeare" is the full text of The Complete Works of William Shakespeare...

puts 'Benchmarking shakespeare.scan(/\w+/).size x50'
puts Benchmark.measure { 50.times { shakespeare.scan(/\w+/).size } }
puts 'Benchmarking shakespeare.squish.scan(/\w+/).size x50'
puts Benchmark.measure { 50.times { shakespeare.squish.scan(/\w+/).size } }
puts 'Benchmarking shakespeare.split.size x50'
puts Benchmark.measure { 50.times { shakespeare.split.size } }
puts 'Benchmarking shakespeare.squish.split.size x50'
puts Benchmark.measure { 50.times { shakespeare.squish.split.size } }

The results:

Benchmarking shakespeare.scan(/\w+/).size x50
 13.980000   0.240000  14.220000 ( 14.234612)
Benchmarking shakespeare.squish.scan(/\w+/).size x50
 40.850000   0.270000  41.120000 ( 41.109643)
Benchmarking shakespeare.split.size x50
  5.820000   0.210000   6.030000 (  6.028998)
Benchmarking shakespeare.squish.split.size x50
 31.000000   0.260000  31.260000 ( 31.268706)

In other words, squish is slow with Very Large Strings™. Other than that, split is faster (twice as fast if you're not using squish).
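
The benchmark above assumes `shakespeare` is already loaded. For a version that runs on its own, here is a sketch that uses a synthetic string in place of the real text. The `sample` string and the iteration count are assumptions, so the timings will not match the figures above, and `squish` is left out so that ActiveSupport is not required.

```ruby
require 'benchmark'

# Synthetic stand-in for a large text; this is not the Shakespeare corpus.
sample = "Lorem ipsum dolor sit amet " * 20_000

# Sanity check: both approaches should agree on this ASCII-only input.
scan_count  = sample.scan(/\w+/).size
split_count = sample.split.size
puts "scan: #{scan_count}, split: #{split_count}"

Benchmark.bm(12) do |x|
  x.report('scan(/\w+/)') { 10.times { sample.scan(/\w+/).size } }
  x.report('split')       { 10.times { sample.split.size } }
end
```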

爱你不解释 2024-08-26 04:36:22

The answers here have a couple of issues:

  1. They don't account for non-ASCII (Unicode) characters such as letters with diacritics: áâãêü etc.
  2. They don't account for apostrophes and hyphens. So Joe's will be counted as two words, Joe and 's, which is obviously incorrect. The same goes for twenty-two, which is a single compound word.

Something like this works better and accounts for those issues:

foo.scan(/[\p{Alpha}\-']+/)
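
As a quick check of that pattern, the sketch below runs it on a made-up sentence containing an apostrophe, a hyphenated compound, and an accented word:

```ruby
text = "Joe's twenty-two cafés"

# \p{Alpha} matches Unicode letters; the class also allows - and '.
words = text.scan(/[\p{Alpha}\-']+/)

puts words.inspect
puts words.size
```

Note that a free-standing hyphen or apostrophe also matches this pattern, so input such as `"a - b"` would be counted as three words.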

You might want to look at my Words Counted gem. It lets you count words, their occurrences, their lengths, and a couple of other things. It's also very well documented.

counter = WordsCounted::Counter.new(post.body)
counter.word_count #=> 3
counter.most_occuring_words #=> [["lorem", 3]]
# This also takes capitalisation into account.
# So `Hello` and `hello` are counted as the same word.
音盲 2024-08-26 04:36:22
"caçapão adipisicing elit".scan(/[\w-]+/).size 
=> 5

But as we can see, the sentence has only 3 words. The problem is the accented characters: in Ruby, \w matches only [A-Za-z0-9_], so ç and ã are not treated as word characters and split the word apart.

An improved solution would be

I18n.transliterate("caçapão adipisicing elit").scan(/[\w-]+/).size
=> 3
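
An alternative that avoids transliteration is a Unicode-aware character class. This is a sketch of a different technique from the one in this answer: `\p{Word}` is Ruby's Unicode counterpart of `\w`, so accented letters stay inside the word and the `i18n` gem is not needed.

```ruby
text = "caçapão adipisicing elit"

ascii_count   = text.scan(/[\w-]+/).size        # \w is ASCII-only
unicode_count = text.scan(/[\p{Word}-]+/).size  # \p{Word} is Unicode-aware

puts ascii_count
puts unicode_count
```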
波浪屿的海角声 2024-08-26 04:36:22
"Lorem Lorem Lorem".scan(/\S+/).size
=> 3
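
Because `\S+` matches any run of non-whitespace characters, it keeps accented and hyphenated words as single tokens, but it also counts free-standing punctuation as a word. A small sketch with made-up input:

```ruby
accented = "caçapão rock-and-roll"
accented_count = accented.scan(/\S+/).size   # accents and hyphens stay in one token

dashed = "Lorem - ipsum"
non_space_count = dashed.scan(/\S+/).size    # the lone dash counts as a token
word_char_count = dashed.scan(/\w+/).size    # the lone dash is skipped

puts accented_count
puts non_space_count
puts word_char_count
```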