Storing word frequency data
I am trying to store word frequency data using Mongo. Each word needs to be associated with a user so I can calculate how often an individual uses each word. Currently my words collection looks like this:
{'Hello':3, 'user_id':1}
This obviously only works on a 'One To One' basis and is no good.
I am trying to work out how best to make this a 'One To Many' relationship between the user and the words. Would I store the user relationship in my words collection like so:
{'word':"Hello", 'users':[{'id':1, 'count':4},{'id':2, 'count':10}]}
Or would I attach the word counts to the user collection instead?
{'id':1, 'username':'SomeUser', 'words':[{'Hello':4}]}
The obvious disadvantage of the second approach is that the same words will be used across different users, so having a single words collection would help to keep the data size down.
Can anyone advise me as to what I should do here? Is there a method I have perhaps overlooked in the documentation?
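To make the two candidate layouts concrete, here is a minimal sketch in plain Python, with no MongoDB connection involved. The lists stand in for the collections, the document shapes are copied from the examples above, and the helper names `count_from_words` and `count_from_users` are hypothetical, introduced only for illustration.

```python
# Plain-Python stand-ins for the two document shapes in the question.
# No database is used; the helper names below are hypothetical.

# Layout A: one document per word, with per-user counts embedded.
words_collection = [
    {'word': 'Hello', 'users': [{'id': 1, 'count': 4}, {'id': 2, 'count': 10}]},
]

# Layout B: one document per user, with word counts embedded.
users_collection = [
    {'id': 1, 'username': 'SomeUser', 'words': [{'Hello': 4}]},
]


def count_from_words(collection, word, user_id):
    """Return a user's count for a word in layout A, or 0 if absent."""
    for doc in collection:
        if doc['word'] == word:
            for entry in doc['users']:
                if entry['id'] == user_id:
                    return entry['count']
    return 0


def count_from_users(collection, word, user_id):
    """Return a user's count for a word in layout B, or 0 if absent."""
    for doc in collection:
        if doc['id'] == user_id:
            for entry in doc['words']:
                if word in entry:
                    return entry[word]
    return 0
```

Either layout can answer "how often does user 1 say Hello?". The difference is which document has to be found first: the word's document in layout A, or the user's document in layout B.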
Nope, that's the nature of using a document DB. Data size is really not a concern in NoSQL solutions; the important thing is how easily and how quickly you can access your data.
Your first approach is a typical textbook relational model. There is no advantage to using it in Mongo (though you can model the data relationally in Mongo). Instead, the second approach gives you