MongoDB document schema
I've been working on a web project with a MongoDB database layer. I have a particular entity that I cannot map to a document database properly, so I thought it would be better to get some feedback.
Say I have User and Item collections. Users can like or dislike items. Items also have tags, and users can like or dislike those tags too. I need to be able to look up like/dislike counts fast enough.
What I came up with is something like this (for an item):
{
  name: "Item Name",
  statistics: {
    likes: 5,
    dislikes: 6
  },
  tags: [
    { name: "Foo", likes: 10, dislikes: 20 },
    { name: "Bar", likes: 5, dislikes: 1 }
  ]
}
This is pretty decent. But the problem is, I need to know if a user liked / disliked a tag or item. Now, what I came up with is something like this:
{
  name: "Item Name",
  statistics: {
    likes: 5,
    dislikes: 6
  },
  tags: [
    {
      name: "Foo",
      likes: 2,
      dislikes: 1,
      votes: [
        { user: "user1_id", vote: 1 },  // like
        { user: "user2_id", vote: 1 },  // like
        { user: "user3_id", vote: -1 }  // dislike
      ]
    },
    {
      name: "Bar",
      likes: 0,
      dislikes: 0,
      votes: []
    }
  ]
}
This looks promising, and the biggest benefit I see here is that I can do atomic updates if someone changes his mind and dislikes something that he liked before.
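For illustration, here is a rough sketch of what such a single-document update could look like with the embedded schema above. It does not use a `$push`/`$pull` pair; it overwrites the stored vote with `$set` and adjusts the counters with `$inc` in the same update, relying on `arrayFilters` (available from MongoDB 3.6). The function name `buildFlipTagVote` and the sample ids are made up for the example, and it assumes each user has at most one vote entry per tag and that votes are always 1 or -1.

```javascript
// Builds the arguments for one updateOne() call that flips a user's vote on
// one tag and adjusts that tag's counters in the same operation.
// Field names follow the embedded schema above; the function name is hypothetical.
function buildFlipTagVote(itemId, tagName, userId, oldVote, newVote) {
  if (oldVote === newVote || Math.abs(oldVote) !== 1 || Math.abs(newVote) !== 1) {
    throw new Error("expected a change between 1 (like) and -1 (dislike)");
  }
  // +1 when moving to "like", -1 when moving to "dislike"
  const step = newVote === 1 ? 1 : -1;
  return {
    // Match only while the stored vote is still oldVote, so repeating the
    // same request does not adjust the counters a second time.
    filter: {
      _id: itemId,
      tags: {
        $elemMatch: {
          name: tagName,
          votes: { $elemMatch: { user: userId, vote: oldVote } },
        },
      },
    },
    update: {
      $set: { "tags.$[t].votes.$[v].vote": newVote },
      $inc: { "tags.$[t].likes": step, "tags.$[t].dislikes": -step },
    },
    // t selects the tag, v selects this user's vote inside that tag
    options: { arrayFilters: [{ "t.name": tagName }, { "v.user": userId }] },
  };
}

const flip = buildFlipTagVote("item1_id", "Foo", "user1_id", 1, -1);
console.log(JSON.stringify(flip, null, 2));
// With a driver collection handle named `items` (assumed, not shown):
//   await items.updateOne(flip.filter, flip.update, flip.options);
```

Because everything lives in one document, the vote and both counters change together or not at all. The price is that every vote rewrites, and usually grows, the item document.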
But I expect around 10 tags in each item, with maybe 100 votes each, so I would have around 1000 nested vote objects per item. I know that MongoDB can handle 16 MB documents, but is it still OK to store this much data in one document?
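To put a rough number on that, the sketch below builds one item with 10 tags and 100 votes per tag and measures its serialized size. It uses the JSON string length only as a proxy: the real BSON size will differ, and real user ids may be longer or shorter than the made-up `userN_id` strings used here. `buildSampleItem` is a hypothetical helper.

```javascript
// Build one item document with the sizes from the question:
// tagCount tags, votesPerTag votes in each tag.
function buildSampleItem(tagCount, votesPerTag) {
  const tags = [];
  for (let t = 0; t < tagCount; t++) {
    const votes = [];
    for (let v = 0; v < votesPerTag; v++) {
      // Made-up user ids; alternate likes and dislikes
      votes.push({ user: `user${t * votesPerTag + v}_id`, vote: v % 2 === 0 ? 1 : -1 });
    }
    const likes = votes.filter((x) => x.vote === 1).length;
    tags.push({ name: `Tag${t}`, likes, dislikes: votes.length - likes, votes });
  }
  return { name: "Item Name", statistics: { likes: 5, dislikes: 6 }, tags };
}

const item = buildSampleItem(10, 100);
const voteCount = item.tags.reduce((n, t) => n + t.votes.length, 0);
// All characters are ASCII here, so string length equals UTF-8 byte count.
const approxBytes = JSON.stringify(item).length;
const limitBytes = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

console.log(`nested votes: ${voteCount}`);
console.log(`approximate size: ${approxBytes} bytes`);
console.log(`share of the 16 MB limit: ${((approxBytes / limitBytes) * 100).toFixed(3)}%`);
```

Under these assumptions the document comes out on the order of tens of kilobytes, a small fraction of the limit, so raw size is unlikely to be the deciding factor.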
Should I go for a normalized model, maybe with a "tagvotes" collection and an "itemvotes" collection? It actually feels more natural to me.
Just wondering if I'm thinking relational or rational?
Thanks.
Comments (2)
At some point trying to embed everything becomes impossible in any M x N type of situation as M and N grow. Well before you reach that point you need to create a separate collection and do client-side joins; but that doesn't mean you have to normalize everything totally.
In this case, think about what views you want to show the user: clearly you will want to show the item, how many likes and dislikes it has, the set of tags that have been applied to it, and maybe how popular each of those tags is. But the actual list of users who liked/disliked the object and liked/disliked each tag can go into a separate document (in a separate collection).
With a schema like that you can do one query to get the item and everything you need to display alongside that item. And then, if you need it, just one more query to get the current user's opinions about that item and all of the tags they have voted on that are relevant to that item.
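A minimal sketch of that two-query pattern follows. The `votes` collection, its field names (`item`, `user`, `tag`, `vote`) and the helper functions are assumptions made for illustration; the item document keeps only the counters and the tag list.

```javascript
// Hypothetical shape of one document in a separate "votes" collection.
// tag is null when the vote is on the item itself.
function makeVote(itemId, userId, tag, vote) {
  return { item: itemId, user: userId, tag, vote };
}

// Query 1: the item with its denormalized counters and tags.
function itemQuery(itemId) {
  return { _id: itemId };
}

// Query 2: everything the current user has voted on for that item.
function userVotesQuery(itemId, userId) {
  return { item: itemId, user: userId };
}

// Turn the documents returned by query 2 into a lookup the view can use.
function indexVotes(voteDocs) {
  const result = { item: 0, tags: {} }; // 0 means "no vote"
  for (const v of voteDocs) {
    if (v.tag === null) result.item = v.vote;
    else result.tags[v.tag] = v.vote;
  }
  return result;
}

// Sample data standing in for the result of query 2.
const sampleVotes = [
  makeVote("item1_id", "user1_id", null, 1),
  makeVote("item1_id", "user1_id", "Foo", -1),
];
console.log(itemQuery("item1_id"), userVotesQuery("item1_id", "user1_id"));
console.log(indexVotes(sampleVotes));
// With driver collection handles `items` and `votes` (assumed, not shown):
//   const item = await items.findOne(itemQuery(itemId));
//   const mine = await votes.find(userVotesQuery(itemId, userId)).toArray();
// A unique index such as { item: 1, user: 1, tag: 1 } on votes would keep
// query 2 fast and prevent duplicate votes.
```

The item query never touches the per-user vote data, so its cost stays flat as the number of voters grows.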
I don't see problems with the amount of data you store per object, but your read/update patterns are worrying: every time you fetch the item, you'll also fetch all the votes, each user's id, etc. Also, when adding votes, you will grow the object. Sometimes MongoDB will have to reallocate your object, which takes a bit of time. Over time, it will learn that you are frequently growing objects and the padding factor will increase, but frequently growing objects is not the best idea.
The atomic update is a bit tricky. You can use $pull and $push, but off the top of my head I don't know how you can also keep the likes and dislikes counts in sync. Moreover, what happens if a user really changes his mind? You'd have to do both $push and $pull, and that is not possible if I remember correctly.
Are you thinking relational or rational? Both. This is a relational problem :-)
Now I wanted to conclude that you should denormalize the counts and store the relations in a different collection, but Hightechrider already wrote that. Too slow. ;-)
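As a sketch of how the denormalized counts could be kept in step with votes stored elsewhere, the helper below works out how the likes and dislikes counters must change when a vote moves from one value to another, then builds a `$inc` update for the matching tag. The function names and the convention 1 = like, -1 = dislike, 0 = no vote are assumptions for the example.

```javascript
// Counter changes needed when a vote goes from oldVote to newVote.
// Convention (assumed): 1 = like, -1 = dislike, 0 = no vote.
function counterDelta(oldVote, newVote) {
  const delta = { likes: 0, dislikes: 0 };
  if (oldVote === 1) delta.likes -= 1;
  if (oldVote === -1) delta.dislikes -= 1;
  if (newVote === 1) delta.likes += 1;
  if (newVote === -1) delta.dislikes += 1;
  return delta;
}

// Arguments for an updateOne() call that applies the delta to one tag's
// counters on the item document, using the positional $ operator.
function buildTagCounterUpdate(itemId, tagName, oldVote, newVote) {
  const d = counterDelta(oldVote, newVote);
  return {
    filter: { _id: itemId, "tags.name": tagName },
    update: { $inc: { "tags.$.likes": d.likes, "tags.$.dislikes": d.dislikes } },
  };
}

console.log(counterDelta(0, 1));  // new like
console.log(counterDelta(1, -1)); // like changed to dislike
console.log(JSON.stringify(buildTagCounterUpdate("item1_id", "Foo", 1, -1)));
// With a driver collection handle named `items` (assumed, not shown):
//   const u = buildTagCounterUpdate(itemId, tagName, oldVote, newVote);
//   await items.updateOne(u.filter, u.update);
```

Note that writing the vote document and updating the item counters are two separate operations, so without a multi-document transaction the counts can drift briefly, or permanently if the second write fails. Recomputing the counters from the votes collection now and then is one way to correct that.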