Writing a bookmarklet that counts word frequency

Posted on 2024-09-14 18:19:33

I want to create a bookmarklet that counts all the words on a webpage and then displays the results, ordered from most to least frequent, in an absolutely positioned div.

Every Google search I've done turns up ways to count the total number of words in a form, a textarea, or a div with a known id. That's not what I want. I want the number of times each word (each \w+ match) appears on the entire webpage.

I know enough JavaScript to know that I don't know how to do this.

Comments (1)

佞臣 2024-09-21 18:19:33

Something like this should work:

function countWordFrequency() {
  // A prototype-less object, so words such as "constructor" or "__proto__"
  // cannot collide with inherited properties.
  var freq = Object.create(null);
  // Traverse the DOM looking for text nodes.
  recurseTextNodes(function(textNode) {
    // Remove punctuation, then split the text on runs of whitespace.
    var words = textNode.data.replace(/[^\w\s]/g, '').split(/\s+/),
        len = words.length;
    // Count the word frequency.
    for (var i = 0; i < len; i++) {
      // split() yields empty strings for leading or trailing whitespace; skip them.
      if (words[i] === '') continue;
      freq[words[i]] = (freq[words[i]] || 0) + 1;
    }
  });
  return freq;
}

This solution may be overly simple in how it strips punctuation and splits words, but it should demonstrate the idea. The recurseTextNodes function is left as an exercise for the reader =). How you package this routine as a bookmarklet (especially how you display the results to the end user) also matters, but again, I'll assume you have some idea of how to do that.
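
For anyone who wants a starting point for the pieces left open above, here is one possible sketch. It is not part of the original answer. The optional `root` parameter on `recurseTextNodes` and the helpers `sortByFrequency` and `showResults` are names made up for illustration, and skipping `script`, `style` and `noscript` elements is an assumption about what counts as page text.

```javascript
// Hypothetical helper: calls visit(textNode) for every text node under root.
// Falls back to document.body when no root is given (browser only).
function recurseTextNodes(visit, root) {
  var node = root || document.body;
  // Assumption: text inside these elements is not visible page content.
  if (node.nodeType === 1 && /^(script|style|noscript)$/i.test(node.nodeName)) {
    return;
  }
  if (node.nodeType === 3) { // 3 is Node.TEXT_NODE
    visit(node);
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    recurseTextNodes(visit, children[i]);
  }
}

// Hypothetical helper: turns the frequency map into [word, count] pairs,
// sorted from most to least frequent.
function sortByFrequency(freq) {
  var pairs = [];
  for (var word in freq) {
    if (Object.prototype.hasOwnProperty.call(freq, word)) {
      pairs.push([word, freq[word]]);
    }
  }
  pairs.sort(function (a, b) { return b[1] - a[1]; });
  return pairs;
}

// Hypothetical helper: shows the pairs in an absolutely positioned div.
// Needs a browser, since it touches document.
function showResults(pairs) {
  var lines = [];
  for (var i = 0; i < pairs.length; i++) {
    lines.push(pairs[i][0] + ': ' + pairs[i][1]);
  }
  var box = document.createElement('div');
  box.style.cssText = 'position:absolute;top:10px;right:10px;z-index:99999;' +
    'max-height:80vh;overflow:auto;background:#fff;color:#000;' +
    'border:1px solid #000;padding:8px;font:12px monospace;white-space:pre';
  box.textContent = lines.join('\n');
  document.body.appendChild(box);
}
```

To use it as a bookmarklet, the functions (together with countWordFrequency) would go inside a `javascript:(function(){ ... })();` URL whose last statement is `showResults(sortByFrequency(countWordFrequency()));`, usually after minifying everything onto one line.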
