What's the fastest way to validate that a field has no more than n words?
I have a Ruby on Rails model:

```ruby
class Candidate < ActiveRecord::Base
  validates_presence_of :application_essay
  validate :validate_length_of_application_essay

  protected

  def validate_length_of_application_essay
    return if application_essay.blank? # don't add a second error message if they didn't fill it out
    errors.add(:application_essay, :too_long) unless ...
  end
end
```
Without dropping into C, what is the fastest way to check that `application_essay` contains no more than 500 words? You can assume that most essays will be at least 200 words, are unlikely to be more than 5000 words, and are in English (or the pseudo-English sometimes called "business-ese"). You can also classify anything you want as a "word" as long as your classification would be immediately obvious to a typical user. (NB: this is not the place to debate what a "typical user" is :) )
In Rails 3, using a `:tokenizer` with a `lambda` works too. It may not be the fastest, but it is certainly the cleanest way.
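The Rails 3 length validator accepts a `:tokenizer` option; applied to this model it would look something like `validates_length_of :application_essay, :maximum => 500, :tokenizer => lambda { |str| str.scan(/\w+/) }`. Note that the option is gone as of Rails 5, so newer apps need a custom validation instead. The sketch below runs that same lambda in plain Ruby, outside Rails, so you can see what it counts as a "word"; the constant and method names are illustrative, not part of Rails.

```ruby
# Same tokenizer lambda that would be passed as :tokenizer to
# validates_length_of in Rails 3; run here in plain Ruby, no Rails needed.
WORD_TOKENIZER = lambda { |str| str.scan(/\w+/) }

# Number of "words" the tokenizer finds (runs of letters, digits, underscores).
def essay_word_count(essay)
  WORD_TOKENIZER.call(essay).length
end

puts essay_word_count("Hello, world! It's fine.") # prints 5 ("It's" counts as two words)
```

Because `\w+` splits on apostrophes and hyphens, contractions and hyphenated words count as several words. A tokenizer such as `lambda { |str| str.scan(/\S+/) }` counts whitespace-separated chunks instead, which may be closer to what a typical user expects.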
You're not going to get any faster than a linear search, sorry (unless this is for some sort of text editor and you can keep track incrementally).
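Since an exact count has to walk the text linearly, about the most you can do in plain Ruby is stop as soon as the limit is exceeded. A minimal sketch, assuming a "word" is any run of non-whitespace characters; `within_word_limit?` is a hypothetical helper name:

```ruby
# Returns true if +text+ has at most +limit+ whitespace-separated words.
# The scan stops at the first word past the limit, so an over-long essay
# is not tokenized in full.
def within_word_limit?(text, limit)
  count = 0
  text.scan(/\S+/) do
    count += 1
    return false if count > limit # exits the method, ending the scan early
  end
  true
end

puts within_word_limit?("one two three", 3)      # prints true
puts within_word_limit?("one two three four", 3) # prints false
```

In the model, the check could then read `errors.add(:application_essay, :too_long) unless within_word_limit?(application_essay, 500)`. For essays that are under the limit the whole string is still scanned; the early exit only saves work when the essay is too long.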
I would just use something like:
What performance issue are you seeing?
A string of 500 or so words shouldn't be much of a problem.
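The most straightforward check is probably to split on whitespace and count the pieces. A minimal sketch; `word_count` is an illustrative name, not an existing Rails helper:

```ruby
# String#split with no arguments splits on runs of whitespace and
# ignores leading/trailing whitespace, so "  a  b " yields ["a", "b"].
def word_count(text)
  text.split.length
end

essay = "word " * 500
puts word_count(essay)        # prints 500
puts word_count(essay) <= 500 # prints true
```

This allocates one string per word, but at the sizes described in the question (a few thousand words at most) that should be cheap compared with the rest of a request.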
You could estimate the typical length of a word and guess the number of words by dividing.
Some hints here: http://blogamundo.net/lab/wordlengths/
You could try something like 5.1 and see how accurate you are by running a few tests.
Well, probably divide by 6.1, since you also have whitespace.
Keep in mind that you would be assuming your text isn't just a huge amount of whitespace or something similar.
But if you're really just interested in making sure it has no more than x words, you could use a low average word length, maybe 5: if the text has fewer than 5x characters, you can be pretty sure it does not have more than x words.
So you may be better off doing a linear search, as stated in the other answers. A linear search isn't that bad at all. It just depends on what you want to do.
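A sketch of both ideas in plain Ruby, with illustrative method names. `estimated_word_count` uses the answer's rough figure of 6.1 characters per word (including the space), so it is only a guess. `within_word_limit_fast?` swaps in a stricter shortcut than the "5 characters per word" rule: n whitespace-separated words need at least 2n - 1 characters, so a text that is too short to hold limit + 1 words can be accepted without counting.

```ruby
# Rough estimate from the answer: about 6.1 characters per word,
# counting the separating whitespace. Only a guess; it is not exact.
def estimated_word_count(text)
  (text.length / 6.1).round
end

# Exact check with a safe shortcut: more than +limit+ words needs at least
# (limit + 1) one-character words plus +limit+ separators, i.e.
# 2 * limit + 1 characters. Anything shorter must pass, so it skips the split.
def within_word_limit_fast?(text, limit)
  return true if text.length <= 2 * limit
  text.split.length <= limit
end

puts estimated_word_count("x" * 610)       # prints 100
puts within_word_limit_fast?("a b c", 3)   # prints true (short-circuit)
puts within_word_limit_fast?("a b c d", 3) # prints false
```

With a 500-word limit the shortcut only applies to texts of at most 1000 characters, so for essays in the question's 200+ word range it will usually fall through to the exact count.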
There's a plugin for that, though I haven't used it myself :)
http://code.google.com/p/validates-word-count/
That plugin replaces every run of adjacent "word characters" with a single character, then removes all non-word characters and counts what remains. Not sure if it's the fastest, though.
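The technique described above can be sketched in plain Ruby. This illustrates the described approach only and is not the plugin's actual code; `collapse_and_count_words` is a hypothetical name:

```ruby
# 1. Replace every run of word characters (\w+) with a single "x".
# 2. Delete every remaining non-word character (\W).
# 3. What is left is one "x" per word, so its length is the word count.
def collapse_and_count_words(text)
  text.gsub(/\w+/, "x").gsub(/\W/, "").length
end

puts collapse_and_count_words("Hello, world! It's fine.") # prints 5
```

This builds two intermediate strings, while `text.scan(/\w+/).length` gives the same count with a single regex pass, so I wouldn't expect this approach to be faster.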
Here is a nice article that you might like
http://dotnetperls.com/word-count