What is the fastest way to validate that a field has no more than n words?

Posted 2024-08-31 08:29:35

I have a Ruby-on-Rails model:

class Candidate < ActiveRecord::Base
  validates_presence_of :application_essay
  validate :validate_length_of_application_essay

  protected

  def validate_length_of_application_essay
    return if application_essay.blank? # don't add a second error message if they didn't fill it out
    errors.add(:application_essay, :too_long) unless ...
  end
end

Without dropping into C, what is the fastest way to check that the application_essay contains no more than 500 words? You can assume that most essays will be at least 200 words, are unlikely to be more than 5000 words, and are in English (or the pseudo-English sometimes called "business-ese"). You can also classify anything you want as a "word" as long as your classification would be immediately obvious to a typical user. (NB: this is not the place to debate what a "typical user" is :) )

6 comments

你曾走过我的故事 2024-09-07 08:29:35

In Rails 3, using :tokenizer with a lambda works too.

validates_length_of :essay, :minimum => 100, :too_short => "Your essay must be at least 100 words.", :tokenizer => lambda {|str| str.scan(/\w+/) }

It may not be the fastest, but it is certainly the cleanest way.
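
To see what that tokenizer actually counts, the lambda can be run on its own in plain Ruby, outside Rails. The following is only a small sketch with a made-up sample sentence. It shows that `\w+` treats apostrophes and hyphens as separators, which may matter when the count is used for an upper limit:

```ruby
# The same tokenizer lambda as in the validation, run outside Rails.
tokenizer = lambda { |str| str.scan(/\w+/) }

sample = "I don't think business-ese counts."  # made-up sample text
tokens = tokenizer.call(sample)

# \w matches only letters, digits, and underscore, so "don't" and
# "business-ese" are each split into two tokens.
p tokens            # => ["I", "don", "t", "think", "business", "ese", "counts"]
puts tokens.length  # => 7
```

A typical reader would count five words in that sentence, so a pattern such as `/\S+/` (runs of non-whitespace) may be closer to what users expect.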

殊姿 2024-09-07 08:29:35

You're not going to get any faster than a linear search, sorry (unless this is for some sort of text editor and you can keep track incrementally).
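
A linear scan can still stop as soon as the limit is exceeded, so a very long essay costs about as much as one that is just over the limit. Below is a minimal sketch of that idea. The helper name `at_most_words?` is hypothetical, and a "word" is assumed to be any run of non-whitespace characters:

```ruby
# Hypothetical helper: returns true if `text` contains at most `limit`
# whitespace-separated words. Scanning stops at the first word past the limit.
def at_most_words?(text, limit)
  count = 0
  text.scan(/\S+/) do
    count += 1
    # Returning from inside the block exits the method early.
    return false if count > limit
  end
  true
end

puts at_most_words?("one two three", 3)       # => true
puts at_most_words?("one two three four", 3)  # => false
```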

无声静候 2024-09-07 08:29:35

I would just use something like:

string.split(" ").length <= 500

What performance issue are you seeing?
A string of 500 or so words shouldn't be much of a problem.
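
As a quick illustration of why this one-liner is usually enough: in Ruby, `split(" ")` with a single-space argument splits on any run of whitespace and ignores leading and trailing whitespace, so newlines, tabs, and double spaces do not inflate the count. The sample text below is made up:

```ruby
essay = "  First line\n\nSecond   line\twith tabs  "  # made-up sample text

# A single-space separator makes split treat any whitespace run as one break.
words = essay.split(" ")

p words                   # => ["First", "line", "Second", "line", "with", "tabs"]
puts words.length <= 500  # => true
```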

﹎☆浅夏丿初晴 2024-09-07 08:29:35

You could estimate the typical length of a word and guess the number of words by dividing the character count by it.

Some hints here: http://blogamundo.net/lab/wordlengths/

You could try something like 5.1 and run a few tests to see how accurate it is.

Dividing by 6.1 is probably better, since each word is followed by whitespace.

Keep in mind that this assumes your text is not just a huge amount of whitespace or something similar.
But if you are really only interested in making sure it has no more than x words, you could pick a low number of characters per word, maybe 5: if the text has fewer than x times 5 characters, you can be pretty sure it does not have more than x words.

So you may be better off doing a linear search, as stated in the other answers. A linear search isn't that bad at all. It just depends on what you want to do.
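
One way to combine the two ideas is to use the character count only as a cheap pre-check and fall back to an exact count otherwise. The sketch below is an assumption-laden illustration, not a benchmarked solution. The helper name `within_word_limit?` and the `chars_per_word` parameter are hypothetical. With the default of 2 (one character plus one separator per word) the pre-check cannot give a wrong answer, while a value such as 5 is only a heuristic:

```ruby
# Hypothetical helper: length pre-check first, exact linear count second.
# chars_per_word = 2 is a safe lower bound (one letter plus one separator);
# a larger value such as 5 is a heuristic and can misjudge unusual text.
def within_word_limit?(text, limit, chars_per_word = 2)
  # Too few characters to possibly hold more than `limit` words.
  return true if text.length < limit * chars_per_word
  # Otherwise count whitespace-separated words exactly.
  text.split(" ").length <= limit
end

puts within_word_limit?("a b c", 3)       # => true
puts within_word_limit?("a b c d", 3)     # => false
puts within_word_limit?("a b c d", 3, 5)  # => true (the heuristic is fooled by one-letter words)
```

Since the question assumes most essays have at least 200 words, the safe pre-check will rarely trigger for a 500-word limit, so the exact count does most of the work in practice.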

大海や 2024-09-07 08:29:35

There's a plugin for that, though I haven't used it myself :)

http://code.google.com/p/validates-word-count/

That plugin replaces each run of adjacent "word characters" with a single character, then removes all non-word characters and counts what is left. I'm not sure whether it's the fastest, though.
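
The approach described above can be sketched in plain Ruby as follows. This is a reconstruction from the description only, not the plugin's actual source, and the method name `word_count_by_substitution` is hypothetical:

```ruby
# Sketch of the described technique, NOT the plugin's real code.
def word_count_by_substitution(text)
  # Collapse each run of word characters into a single placeholder character.
  collapsed = text.gsub(/\w+/, "x")
  # Remove everything that is not a word character (spaces, punctuation).
  only_words = collapsed.gsub(/\W/, "")
  # One placeholder character remains per word.
  only_words.length
end

puts word_count_by_substitution("Hello, world! It's 2024.")  # => 5
```

Note that "It's" counts as two words here, because the apostrophe is not a word character.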

养猫人 2024-09-07 08:29:35

Here is a nice article that you might like:

http://dotnetperls.com/word-count
