Storing word frequency data

Posted on 2024-12-11 17:53:34

I am trying to store word frequency data using Mongo. Each word needs to be associated with a user so I can calculate how often an individual uses each word. Currently my words collection looks like this:

{'Hello':3, 'user_id':1}

This obviously only works on a 'One To One' basis and is no good.

I am trying to work out how best to make this a 'One To Many' relationship between the user and the words. Would I store the user relationship in my words collection like so:

{'word':"Hello", 'users':[{'id':1, 'count':4},{'id':2, 'count':10}]}

Or would I attach the word counts to the user collection instead?

{'id':1, 'username':'SomeUser', 'words':[{'Hello':4}]}

The obvious disadvantage of the second approach is that the same words will be used across different users, so having a single words collection would help to keep the data size down.

Can anyone advise me as to what I should do here? Is there a method I have perhaps overlooked in the documentation?

Comments (1)

行雁书 2024-12-18 17:53:34

"The obvious disadvantage to the second approach is that the same words will be used across different users, so having a single words collection would help to keep the data size down."

No, that is the nature of using a document database. In NoSQL solutions, data size is usually not the main concern; what matters is how easily and how quickly you can access your data.

Your first approach is a typical textbook relational model. There is no advantage to using it in Mongo (though you can model data relationally in Mongo). The second approach, by contrast, gives you:

  • Faster reads/writes, since every word is stored inside the user document. You don't need to perform multiple queries for this.
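
As a rough illustration of the embedded approach, the sketch below builds the filter and update documents that would bump one user's counts in a single write. It rests on several assumptions that are not in the question: `words` is stored as a subdocument ({'hello': 4}) rather than the array of single-key objects shown above, words are lowercased and split on whitespace, and the helper names `build_word_count_update` and `apply_inc` are hypothetical. `apply_inc` is only an in-memory stand-in that mimics what Mongo's `$inc` operator does, so the example runs without a server.

```python
from collections import Counter


def build_word_count_update(user_id, text):
    """Return (filter, update) documents adding the word counts of `text` to one user."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    # Words containing "." would need escaping, since "." separates path segments.
    counts = Counter(text.lower().split())
    # "$inc" adds to a numeric field and creates the field if it is missing.
    update = {"$inc": {f"words.{word}": n for word, n in counts.items()}}
    return {"id": user_id}, update


def apply_inc(doc, update):
    """In-memory stand-in for the effect of $inc on a document (illustration only)."""
    for path, n in update["$inc"].items():
        *parents, leaf = path.split(".")
        target = doc
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = target.get(leaf, 0) + n
    return doc


user = {"id": 1, "username": "SomeUser", "words": {"hello": 4}}
flt, upd = build_word_count_update(1, "Hello hello world")
apply_inc(user, upd)
print(user["words"])  # {'hello': 6, 'world': 1}
```

With PyMongo, the same pair could presumably be sent as `db.users.update_one(flt, upd, upsert=True)`, where `db.users` is an assumed collection name. One user's counts are then read or updated in a single operation, which is the point the answer makes.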