Truncate words function in JavaScript (studying dojo's code)
A 'truncate words' function would take a string of words and return only the first, let's say, 10 words.
In dojo (a JavaScript library) they have such a function, whose code is this:
truncatewords: function(value, arg){
    // summary: Truncates a string after a certain number of words
    // arg: Integer
    //      Number of words to truncate after
    arg = parseInt(arg);
    if(!arg){
        return value;
    }
    for(var i = 0, j = value.length, count = 0, current, last; i < value.length; i++){
        current = value.charAt(i);
        if(dojox.dtl.filter.strings._truncatewords.test(last)){
            if(!dojox.dtl.filter.strings._truncatewords.test(current)){
                ++count;
                if(count == arg){
                    return value.substring(0, j + 1);
                }
            }
        }else if(!dojox.dtl.filter.strings._truncatewords.test(current)){
            j = i;
        }
        last = current;
    }
    return value;
}
where dojox.dtl.filter.strings._truncatewords is /(&.*?;|<.*?>|(\w[\w-]*))/g
Why isn't this written like so:
function truncate(value, arg) {
    var value_arr = value.split(' ');
    if(arg < value_arr.length) {
        value = value_arr.slice(0, arg).join(' ');
    }
    return value;
}
and what are the differences?
Your split should take into account that any sequence of blank characters is a word separator. You should split on a regexp like \s+. But other than that, it seems dojo's code takes entities and xml tags as words as well. If you know you don't have such things in your string, your implementation might do the trick. Be careful though that your slice does not go beyond the number of words found, this might need a little check.
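For illustration, here is a minimal sketch of the whitespace-based approach suggested above. The name truncateOnWhitespace is hypothetical (it is not part of dojo), and the sketch assumes the input contains no entities or tags that should count as words.

```javascript
// Hypothetical helper, not part of dojo: truncate after `arg` words,
// treating any run of whitespace as a single separator.
function truncateOnWhitespace(value, arg) {
  var limit = parseInt(arg, 10);      // accept 10 as well as "10"
  if (!limit || limit < 0) {
    return value;                     // no usable limit: leave the string alone
  }
  var words = value.trim().split(/\s+/);
  if (limit >= words.length) {
    return value;                     // fewer words than the limit: nothing to cut
  }
  return words.slice(0, limit).join(' ');
}
```

Note that Array.prototype.slice clamps its end index to the array length, so the length check here is not guarding against an out-of-range error; it only makes sure a short string is returned untouched rather than re-joined with single spaces.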
The code you're looking at is from the dtl library, which is for supporting the django templating language. (http://www.dojotoolkit.org/book/dojo-book-0-9/part-5-dojox/dojox-dtl). I'm sure the code in there is not for just doing a straight string split, but rather parsing the templates they're using.
Also, looking at that regex, they're handling a lot more scenarios than just spaces...for example, the <.*?> will cause any group of words enclosed in opening and closing tags to be considered a "word".
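To make the regex point concrete, here is a small sketch that lists what the pattern quoted in the question picks out of a string. The tokens helper and the sample text are made up for illustration; only the pattern itself comes from the question.

```javascript
// Pattern copied from the question; the helper name `tokens` is hypothetical.
var truncatewordsPattern = /(&.*?;|<.*?>|(\w[\w-]*))/g;

// Return every substring the pattern matches, in order of appearance.
function tokens(text) {
  return text.match(truncatewordsPattern) || [];
}

// Each tag, each entity and each hyphenated word comes out as one match.
var sample = tokens('<b>well-known</b> &amp; more');
// sample is ['<b>', 'well-known', '</b>', '&amp;', 'more']
```

A plain split(' ') on the same input would instead yield '<b>well-known</b>', '&amp;' and 'more', so the two approaches disagree on what counts as a word as soon as markup is involved.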
function declaration: this is probably a JavaScript object, and using function_name: function(params) {... helps keep JavaScript out of the global scope.
arg variable: they're ensuring that an integer was passed. Using parseInt() will allow both 10 and "10" to be accepted.
value: if value has only 8 words, you can't count to 10, so the loop is bounded by the length of value. Otherwise, you'd get an array out of bounds or object does not exist error.
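As a rough illustration of the parseInt() point, the sketch below mirrors the guard at the top of truncatewords. The helper name normalizeCount is hypothetical, and passing an explicit radix of 10 is my addition; the dojo code quoted in the question calls parseInt(arg) without one.

```javascript
// Hypothetical helper mirroring the guard in truncatewords:
// returns a usable word count, or null when no truncation should happen.
function normalizeCount(arg) {
  var n = parseInt(arg, 10); // explicit radix is an addition, not in the quoted code
  return n ? n : null;       // NaN and 0 are both falsy, so both mean "return value as is"
}

normalizeCount(10);     // 10
normalizeCount("10");   // 10
normalizeCount("abc");  // null, because parseInt gives NaN
normalizeCount(0);      // null
```

This is why the original function simply returns value when arg is missing, zero, or not numeric.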
The regex has 3 parts: &.*?; (entities), <.*?> (tags), and \w[\w-]* (words).
It's not just splitting on spaces. It's looking for things it thinks could be part of a word, and once it finds something that is not, it ups the word count.
It should take a comma or pipe separated list and work as well as with a space separated list.
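Here is a sketch of that idea written with exec() over the whole string instead of dojo's character-by-character loop. To be clear, this is not dojo's implementation: the function name truncateByTokens is hypothetical, and only the pattern is taken from the question.

```javascript
// Hypothetical alternative, not dojo's code: count regex matches as words
// and cut the string right after the match that reaches the limit.
function truncateByTokens(value, arg) {
  var limit = parseInt(arg, 10);
  if (!limit || limit < 0) {
    return value;
  }
  // A fresh regex per call, so no lastIndex state leaks between calls.
  var pattern = /(&.*?;|<.*?>|(\w[\w-]*))/g;
  var count = 0;
  while (pattern.exec(value) !== null) {
    count++;
    if (count === limit) {
      // lastIndex points just past the end of the current match.
      return value.substring(0, pattern.lastIndex);
    }
  }
  return value; // fewer tokens than the limit
}

truncateByTokens('one,two|three four', 2); // 'one,two'
```

Because the separators are simply whatever the pattern does not match, commas and pipes behave like spaces here. One caveat of this sketch: cutting after a token can leave an opening tag without its closing tag, as in truncateByTokens('<b>hi</b> there', 2), which gives '<b>hi'.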