A truncate-words function in JavaScript (studying dojo's code)

Posted on 2024-07-24 02:27:29

A 'truncate words' function takes a string of words and returns only the first, let's say, 10 words.

Dojo (a JavaScript library) has such a function, whose code is this:

truncatewords: function(value, arg){
    // summary: Truncates a string after a certain number of words
    // arg: Integer
    //      Number of words to truncate after
    arg = parseInt(arg);
    if(!arg){
        return value;
    }

    for(var i = 0, j = value.length, count = 0, current, last; i < value.length; i++){
        current = value.charAt(i);
        if(dojox.dtl.filter.strings._truncatewords.test(last)){
            if(!dojox.dtl.filter.strings._truncatewords.test(current)){
                ++count;
                if(count == arg){
                    return value.substring(0, j + 1);
                }
            }
        }else if(!dojox.dtl.filter.strings._truncatewords.test(current)){
            j = i;
        }
        last = current;
    }
    return value;
}

where dojox.dtl.filter.strings._truncatewords is /(&.*?;|<.*?>|(\w[\w-]*))/g

Why isn't this written like so:

function truncate(value, arg) {
    var value_arr = value.split(' ');
    if (arg < value_arr.length) {
        value = value_arr.slice(0, arg).join(' ');
    }
    return value;
}

and what are the differences?

Comments (4)

埋情葬爱 2024-07-31 02:27:29

Your split should take into account that any sequence of whitespace characters is a word separator. You should split on a regexp like \s+.

But other than that, it seems dojo's code takes entities and xml tags as words as well. If you know you don't have such things in your string, your implementation might do the trick. Be careful, though, that your slice does not go beyond the number of words found; this might need a little check.
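As a rough illustration of the \s+ suggestion, here is a minimal sketch. The function name truncateOnWhitespace is hypothetical and not part of dojo. Note that it collapses runs of whitespace to single spaces in the result, and that Array.prototype.slice clamps its end index to the array length, so asking for more words than exist simply returns all of them.

```javascript
// Hypothetical helper (not dojo code): split on any run of whitespace.
function truncateOnWhitespace(value, count) {
    // trim() avoids empty strings at the ends of the array
    var words = value.trim().split(/\s+/);
    // slice() clamps to the array length, so a large count is harmless
    return words.slice(0, count).join(' ');
}

// Splitting on a single space counts the gap between two spaces as a "word":
console.log('a  b'.split(' '));                            // [ 'a', '', 'b' ]

console.log(truncateOnWhitespace('a  b\tc\nd', 2));        // "a b"
console.log(truncateOnWhitespace('only three words', 10)); // "only three words"
```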

墨落成白 2024-07-31 02:27:29

The code you're looking at is from the dtl library, which is for supporting the django templating language. (http://www.dojotoolkit.org/book/dojo-book-0-9/part-5-dojox/dojox-dtl). I'm sure the code in there is not for just doing a straight string split, but rather parsing the templates they're using.

Also, looking at that regex, they're handling a lot more scenarios than just spaces...for example, the <.*?> will cause any group of words enclosed in opening and closing tags to be considered a "word".
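To see what the quoted pattern picks out of marked-up text, one can run it with String.prototype.match. This is only a sketch: the sample string is made up for illustration, and match() is used here instead of dojo's character-by-character loop.

```javascript
// Pattern as quoted in the question.
var wordPattern = /(&.*?;|<.*?>|(\w[\w-]*))/g;

// Made-up sample containing tags and a character entity.
var sample = '<b>Hello</b> &amp; world';

// Each tag, each entity, and each run of word characters is one match.
console.log(sample.match(wordPattern));
// [ '<b>', 'Hello', '</b>', '&amp;', 'world' ]

// A plain split on spaces keeps the tags glued to the text instead.
console.log(sample.split(' '));
// [ '<b>Hello</b>', '&amp;', 'world' ]
```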

鸠书 2024-07-31 02:27:29

  1. Function declaration: this is probably a method on a JavaScript object, and using function_name: function(params) {... helps keep code out of the global scope.
  2. By checking the arg variable, they're ensuring that an integer was passed. Using parseInt() allows both 10 and "10" to be accepted.
  3. This method can handle more delimiters than spaces, thanks to the regex being used.
  4. This code is safe against array overflow. You can't count to 10 if there are only 8 words in value. Otherwise, you'd get an array out of bounds or object does not exist error.

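
The parseInt() check from point 2 can be tried in isolation. The sketch below uses a hypothetical helper name, normalizeCount, and passes an explicit radix of 10, which the quoted dojo code does not do.

```javascript
// Hypothetical helper mirroring the guard in truncatewords:
// parse the argument, and treat anything falsy (NaN or 0) as "no limit".
function normalizeCount(arg) {
    var n = parseInt(arg, 10); // explicit radix is an addition, not in the dojo code
    return n ? n : null;
}

console.log(normalizeCount(10));    // 10
console.log(normalizeCount('10'));  // 10
console.log(normalizeCount('abc')); // null, because parseInt gives NaN
console.log(normalizeCount('0'));   // null, because 0 is falsy
```

In the dojo function, the falsy case returns the input string unchanged rather than null.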
总以为 2024-07-31 02:27:29

The regex has three parts:

  1. &.*?; will match character entities (like &amp;)
  2. <.*?> will match anything in angle brackets
  3. (\w[\w-]*) will match strings that start with a character from [a-zA-Z0-9_], followed by any number of those characters or dashes

It's not just splitting on spaces. It's looking for things it thinks could be part of a word, and once it finds something that is not, it ups the word count.

It should take a comma- or pipe-separated list and work as well as with a space-separated list.
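
A small sketch of that last point, under stated assumptions: truncateTokens is a hypothetical helper that walks the string with RegExp.prototype.exec, which is a different technique from dojo's per-character loop, and the sample list is invented.

```javascript
// Pattern as quoted in the question.
var wordPattern = /(&.*?;|<.*?>|(\w[\w-]*))/g;

// Hypothetical helper: keep everything up to the end of the n-th token.
function truncateTokens(value, n) {
    wordPattern.lastIndex = 0; // a /g regex remembers its position, so reset it
    var count = 0;
    while (wordPattern.exec(value) !== null) {
        if (++count === n) {
            // lastIndex now points just past the n-th token
            return value.substring(0, wordPattern.lastIndex);
        }
    }
    return value; // fewer than n tokens: nothing to cut
}

var list = 'one,two|three four';
console.log(list.split(' ').length);         // 2
console.log(list.match(wordPattern).length); // 4
console.log(truncateTokens(list, 3));        // "one,two|three"
```

Because the pattern describes what a word looks like rather than what separates words, commas and pipes act as separators without being listed anywhere.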
