Relations in document-oriented databases?

Posted on 2024-08-23 16:02:13

I'm interested in document-oriented databases, and I'd like to play with MongoDB. So I started a fairly simple project (an issue tracker), but I'm having a hard time thinking in a non-relational way.

My problems:

  1. I have two objects that relate to each other (e.g. issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}} - here I have a user related to the issue). Should I create another document 'user' and reference it in 'issue' document by its id (like in relational databases), or should I leave all the user's data in the subdocument?

  2. If I have objects (subdocuments) in a document, can I update them all in a single query?

Comments (6)

内心旳酸楚 2024-08-30 16:02:13

I'm totally new to document-oriented databases, and right now I'm trying to develop a sort of CMS using Node.js and MongoDB, so I'm facing the same problems as you.

By trial and error I found this rule of thumb: I make a collection for every entity that may be a "subject" for my queries, while embedding the rest inside other objects.

For example, comments in a blog entry can be embedded, because they are usually bound to the entry itself and I can't think of a useful query made globally on all comments. On the other hand, tags attached to a post might deserve their own collection, because even if they're bound to the post, you might want to reason globally about all the tags (for example, to build a list of trending topics).
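As a rough illustration of this rule of thumb, the sketch below models the two shapes with plain JavaScript objects rather than a live MongoDB connection. The field names (`comments`, `tagIds`, `uses`) and the `trendingTags` helper are invented for the example and are not part of the original post.

```javascript
// Comments are embedded: they are only ever read through their post.
const posts = [
  {
    _id: 1,
    title: "Hello",
    tagIds: ["mongodb", "nosql"],
    comments: [{ author: "ann", text: "Nice" }],
  },
  { _id: 2, title: "Schemas", tagIds: ["mongodb"], comments: [] },
];

// Tags get their own collection because they are queried globally.
const tags = [
  { _id: "nosql", uses: 1 },
  { _id: "mongodb", uses: 2 },
];

// A "global" query on tags: the most used ones first.
function trendingTags(tagCollection, limit) {
  return [...tagCollection]
    .sort((a, b) => b.uses - a.uses)
    .slice(0, limit)
    .map((t) => t._id);
}

console.log(trendingTags(tags, 1));
```

The comments never need a query of their own, so they travel with the post; the tags are the "subject" of a query, so they live in their own collection.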

夏日浅笑〃 2024-08-30 16:02:13

In my mind this is actually pretty simple. Embedded documents can only be accessed via their master document. If you can envision a need to query an object outside the context of the master document, then don't embed it. Use a ref.

For your example

issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}

I would make issue and reporter each their own document, and reference the reporter in the issue. You could also reference a list of issues in reporter. This way you won't duplicate reporters in issues, you can query them each separately, you can query reporter by issue, and you can query issues by reporter. If you embed reporter in issue, you can only query the one way, reporter by issue.

If you embed documents, you can update them all in a single query, but you have to repeat the update in each master document. This is another good reason to use reference documents.
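To make the referenced layout concrete, here is a minimal in-memory sketch using plain JavaScript arrays in place of collections. The `reporterId` field and both helper functions are assumptions made for this example only.

```javascript
// Hypothetical data: each reporter is stored once, issues point to it by id.
const reporters = [{ _id: "u1", username: "qwer", role: "manager" }];
const issues = [
  { code: "asdf-11", title: "asdf", reporterId: "u1" },
  { code: "asdf-12", title: "zxcv", reporterId: "u1" },
];

// Reporter by issue: find the issue, then follow the reference.
function reporterOfIssue(code) {
  const issue = issues.find((i) => i.code === code);
  return issue ? reporters.find((r) => r._id === issue.reporterId) : undefined;
}

// Issues by reporter: filter on the stored reference.
function issuesOfReporter(reporterId) {
  return issues.filter((i) => i.reporterId === reporterId);
}

// A role change touches exactly one document.
reporters[0].role = "director";

console.log(reporterOfIssue("asdf-12").role);
console.log(issuesOfReporter("u1").map((i) => i.code));
```

Both directions of the query work, and the role is stored in a single place, which is the point made above about not repeating the update in every master document.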

迷途知返 2024-08-30 16:02:13

The beauty of MongoDB and other "NoSQL" products is that there isn't any schema to design. I use MongoDB and I love it: no SQL queries to write and no awful JOIN queries! So, to answer your two questions.

1 - If you create multiple documents, you'll need to make two calls to the DB. I'm not saying that's a bad thing, but if you can throw everything into one document, why not? I recall that when I used MySQL, I would create a "blog" table and a "comments" table. Now I append the comments to the record in the same collection (aka table) and keep building on it.

2 - Yes ...

固执像三岁 2024-08-30 16:02:13

Schema design in document-oriented DBs can seem difficult at first, but while building my startup with Symfony2 and MongoDB I found that 80% of the time it is just like working with a relational DB.


At first, think of it like a normal DB:

To start, just create your schema as you would with a relational DB:

Each entity should have its own collection, especially if you'll need to paginate the documents in it.

(in Mongo you can somewhat paginate nested document arrays, but the capabilities are limited)
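The limited pagination of nested arrays mentioned here can be pictured with the following sketch. It works on a plain JavaScript object; the `post` document and the `commentPage` helper are hypothetical. In MongoDB itself a comparable window can be requested with the `$slice` projection operator.

```javascript
// Hypothetical post with an embedded array of comments.
const post = {
  _id: 1,
  comments: ["c1", "c2", "c3", "c4", "c5"].map((text) => ({ text })),
};

// Return one page of the embedded array (page numbers start at 0).
// In MongoDB a similar window can be requested with a $slice projection,
// e.g. { comments: { $slice: [skip, limit] } }.
function commentPage(doc, page, pageSize) {
  const skip = page * pageSize;
  return doc.comments.slice(skip, skip + pageSize);
}

console.log(commentPage(post, 1, 2).map((c) => c.text)); // second page
```

This only cuts a window out of the array in its stored order; sorting or filtering the embedded items by other criteria is where the approach gets awkward.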


Then just remove overly complicated normalization:

  • Do I need a separate category table? (simply write the category in a column/property as a string or embedded doc)
  • Can I store comments count directly as an Int in the Author collection? (then update the count with an event, for example in Doctrine ODM)

Embedded documents:

Use embedded documents only for:

  • clarity (nested documents such as addressInfo or billingInfo in the User collection)
  • storing tags/categories (e.g. [{name: "Sport", parent: "Hobby", page: "/sport"}])
  • storing simple multiple values (e.g. in the User collection: a list of specialties, a list of personal websites)

Don't use them when:

  • the parent document would grow too large
  • you need to paginate them
  • you feel the entity is important enough to deserve its own collection

Duplicate values across collections and precompute counts:

Duplicate some column/attribute values from one collection to another if you need to query on those values in the where conditions (remember, there are no joins).

E.g. in the Ticket collection, also store the author name (not only the ID).

Also, if you need counters (number of tickets opened by user, by category, etc.), precompute them.
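A small sketch of both ideas, duplicating the author name into each ticket and keeping a precomputed counter, might look as follows. It uses plain JavaScript arrays instead of real collections, and the names `openTicket`, `authorName` and `openTickets` are assumptions for illustration.

```javascript
const users = [{ _id: "u1", name: "Ann", openTickets: 0 }];
const tickets = [];

// Opening a ticket copies the author name and bumps the counter
// (with MongoDB the counter bump would typically be an $inc update).
function openTicket(userId, title) {
  const user = users.find((u) => u._id === userId);
  if (!user) throw new Error("unknown user " + userId);
  tickets.push({ title, authorId: user._id, authorName: user.name });
  user.openTickets += 1;
}

openTicket("u1", "Login fails");
openTicket("u1", "Typo on home page");

// Query by author name without touching the users collection.
const byAnn = tickets.filter((t) => t.authorName === "Ann");
console.log(byAnn.length, users[0].openTickets);
```

The price of this convenience is that a renamed author has to be rewritten in every ticket, so it suits values that rarely change.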


Embed references:

When you have a One-to-Many or Many-to-Many reference, use an embedded array with the list of the referenced document ids (see MongoDB DB Ref).

You'll need to use an event again to remove an ID if the referenced document gets deleted.
(There is an extension for Doctrine ODM if you use it: Reference Integrity)

This kind of reference is managed directly by Doctrine ODM: Reference Many
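The following sketch shows an embedded array of referenced IDs together with the cleanup that has to happen when a referenced document is deleted. It is plain JavaScript over in-memory arrays; `memberIds` and `deleteUser` are hypothetical names, and no Doctrine ODM API is used or implied.

```javascript
const projects = [
  { _id: "p1", name: "Tracker", memberIds: ["u1", "u2"] },
  { _id: "p2", name: "Website", memberIds: ["u2"] },
];
let users = [{ _id: "u1" }, { _id: "u2" }];

// Delete a user and strip its id from every referencing array
// (in MongoDB the second step would typically be a $pull update).
function deleteUser(userId) {
  users = users.filter((u) => u._id !== userId);
  for (const project of projects) {
    project.memberIds = project.memberIds.filter((id) => id !== userId);
  }
}

deleteUser("u2");
console.log(projects.map((p) => p.memberIds));
```

Nothing enforces this cleanup automatically in the sketch, which is why the text above recommends hooking it into a delete event.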


It's easy to fix errors:

If you find out late that you have made a mistake in the schema design, it's quite simple to fix it with a few lines of JavaScript run directly in the Mongo console.

(stored procedures made easy: no need for complex migration scripts)
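As an idea of what such a fix-up script can look like, the sketch below turns a category stored as a bare string into an embedded document. The data and the `migrateCategory` function are made up, and the loop runs over a plain array; in the Mongo console the same function could be applied while iterating over a cursor and writing each changed document back, which is left out here.

```javascript
// Hypothetical tickets, one still using the old string-valued category.
const tickets = [
  { _id: 1, title: "Login fails", category: "Bug" },
  { _id: 2, title: "Dark mode", category: { name: "Feature" } }, // already migrated
];

// Rewrite old-style documents in place; safe to run more than once.
function migrateCategory(doc) {
  if (typeof doc.category === "string") {
    doc.category = { name: doc.category };
  }
  return doc;
}

tickets.forEach(migrateCategory);
console.log(tickets.map((t) => t.category.name));
```

Checking the old shape before rewriting keeps the script idempotent, so an interrupted run can simply be repeated.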

Warning: don't use Doctrine ODM Migrations, you'll regret it later.

聚集的泪 2024-08-30 16:02:13

I have redone this answer, since the original took the relation the wrong way round because I misread the question.

issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}

Whether embedding some important information about the user (creator) of the ticket is a wise decision depends on the specifics of the system.

Are you giving these users the ability to log in and report issues they find? If so, you will likely want to factor that relation out into a user collection.

On the other hand, if that is not the case then you could easily get away with this schema. The one problem I see here is if you wish to contact the reporter and their job role has changed, that's somewhat awkward; however, that is a real world dilemma, not one for the database.

Since the subdocument represents a single one-to-one relation to a reporter you also should not suffer fragmentation problems mentioned in my original answer.

There is one glaring problem with this schema: it duplicates repeated data that changes (normal form stuff).

Let's take an example. Imagine you hit the real world dilemma I spoke about earlier and a user called Nigel wants his role to reflect his new job position from now on. This means you have to update all rows where Nigel is the reporter and change his role to that new position. This can be a lengthy and resource consuming query for MongoDB.

To contradict myself again: if you only had maybe 100 tickets (i.e. something manageable) per user, then the update operation would likely not be too bad and would, in fact, be quite easily manageable for the database; plus, since the documents (hopefully) do not move, this would be a completely in-place update.

So whether this should be embedded or not depends heavily upon your querying, your documents and so on. However, I would say this schema isn't a good idea, specifically because of the duplication of changing data across many root documents. Technically, yes, you could get away with it, but I would not try.

I would instead split the two out.

If I have objects (subdocuments) in a document, can I update them all in a single query?

Just like the relation style in my original answer, yes and easily.

For example, let's update the role of Nigel to MD (as hinted earlier) and change the ticket status to completed:

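// multi: true applies the change to every matching ticket; without it only the first match is updated
db.tickets.update({'reporter.username': 'Nigel'}, {$set: {'reporter.role': 'MD', status: 'completed'}}, {multi: true})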

So a single document schema does make CRUD easier in this case.
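To see why this single call can still be expensive, the sketch below mimics the update on an in-memory array: every ticket that embeds Nigel as reporter has to be rewritten. The sample tickets and the `updateReporter` helper are assumptions for illustration, not MongoDB API.

```javascript
// Hypothetical tickets collection with the reporter embedded in each ticket.
const tickets = [
  { code: "a-1", status: "open", reporter: { username: "Nigel", role: "manager" } },
  { code: "a-2", status: "open", reporter: { username: "Nigel", role: "manager" } },
  { code: "a-3", status: "open", reporter: { username: "Sam", role: "dev" } },
];

// Apply the change to every matching ticket and report how many were touched.
function updateReporter(username, role, status) {
  let modified = 0;
  for (const ticket of tickets) {
    if (ticket.reporter.username === username) {
      ticket.reporter.role = role;
      ticket.status = status;
      modified += 1;
    }
  }
  return modified;
}

const modified = updateReporter("Nigel", "MD", "completed");
console.log(modified);
```

The number of writes grows with the number of tickets per reporter, which is the duplication cost discussed earlier in this answer.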

One thing to note, given your wording ("update them all"): you cannot use the positional operator to update all subdocuments under a root document. It will update only the first match.

Again hopefully that makes sense and I haven't left anything out. HTH


Original Answer

here I have a user related to the issue). Should I create another document 'user' and reference it in 'issue' document by its id (like in relational databases), or should I leave all the user's data in the subdocument?

This is a considerable question and requires some background knowledge before continuing.

The first thing to consider is the size of an issue:

issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}

This is not very big, and since you no longer need the reporter information (that would be on the root document) it could be smaller still. However, issues are never that simple. If you take a look at the MongoDB JIRA, for example https://jira.mongodb.org/browse/SERVER-9548 (a random page that proves my point), the contents of a "ticket" can actually be quite considerable.

The only way you would gain a true benefit from embedding the tickets would be if you could store ALL of a user's information in a single 16 MB block of contiguous storage, which is the maximum size of a BSON document (as currently imposed by mongod).

I don't think you would be able to store all tickets under a single user.

Even if you were to shrink the ticket to, say, a code, a title and a description, you could still suffer from the "swiss cheese" problem caused by regular updates and changes to documents in MongoDB. As ever, http://www.10gen.com/presentations/storage-engine-internals is a good reference for what I mean.

You would typically witness this problem as users add multiple tickets to their root user document. The tickets themselves will change as well but maybe not in a drastic or frequent manner.

You can, of course, remedy this problem a bit by using power of 2 sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes which will do exactly what it says on the tin.

Ok, hypothetically, if you were to only have code and title then yes, you could store the tickets as subdocuments in the root user without too many problems, however, this is something that comes down to specifics that the bounty assignee has not mentioned.

If I have objects (subdocuments) in a document, can I update them all in a single query?

Yes, quite easily. This is one thing that becomes easier with embedding. You could use a query like:

db.users.update({user_id:uid,'tickets.code':'asdf-1'}, {$set:{'tickets.$.title':'Oh NOES'}})

Note, however, that you can only update ONE subdocument at a time using the positional operator. This means you cannot, in a single atomic operation, update all ticket dates on a single user to 5 days in the future.
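The first-match behaviour of the positional operator can be modelled on a plain JavaScript object as below. The `user` document and the two helpers are hypothetical; they imitate the semantics described above rather than call MongoDB.

```javascript
const user = {
  user_id: 7,
  tickets: [
    { code: "asdf-1", title: "First" },
    { code: "asdf-2", title: "Second" },
    { code: "asdf-1", title: "Duplicate code" },
  ],
};

// Mimics 'tickets.$.title': only the first matching element changes.
function setFirstMatch(doc, code, title) {
  const ticket = doc.tickets.find((t) => t.code === code);
  if (ticket) ticket.title = title;
}

// Changing every element needs a loop here (or, in MongoDB 3.6 and later,
// the all-positional operator $[]).
function setAll(doc, title) {
  doc.tickets.forEach((t) => {
    t.title = title;
  });
}

setFirstMatch(user, "asdf-1", "Oh NOES");
console.log(user.tickets.map((t) => t.title));
```

Only the first element with code asdf-1 is retitled; the later element with the same code keeps its old title, which is exactly the limitation noted above.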

As for adding a new ticket, that is quite simple:

db.users.update({user_id: uid}, {$push: {tickets: {code: 'asdf-1', title: "Whoop"}}})

So yes, depending on your queries, you can quite simply update the entire user's data in a single call.

That was quite a long answer so hopefully I haven't missed anything out, hope it helps.

憧憬巴黎街头的黎明 2024-08-30 16:02:13

I like MongoDB, but I have to say that I will use it a lot more soberly in my next project.

Specifically, I have not had as much luck with the Embedded Document facility as people promise.

Embedded documents seem to be useful for composition (see UML Composition), but not for aggregation. Leaf nodes are great; anything in the middle of your object graph should not be an embedded document. It will make searching and validating your data more of a struggle than you'd want.

One thing that is absolutely better in MongoDB is your many-to-X relationships. You can do a many-to-many with only two tables, and it's possible to represent a many-to-one relationship on either table. That is, you can either put 1 key in N rows, or N keys in 1 row, or both. Notably, queries to accomplish set operations (intersection, union, disjoint set, etc) are actually comprehensible by your coworkers. I have never been satisfied with these queries in SQL. I often have to settle for "two other people will understand this".

If you've ever had your data get really big, you know that inserts and updates can be constrained by how much the indexes cost. You need fewer indexes in MongoDB; an index on A-B-C can be used to query for A, A & B, or A & B & C (but not B, C, B & C or A & C). Plus the ability to invert a relationship lets you move some indexes to secondary tables. My data hasn't gotten big enough to try, but I'm hoping that will help.
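The prefix rule stated here can be written down as a small check. The `usesIndexPrefix` function is a hypothetical helper that models only the rule as described in this post; it is not how MongoDB's query planner is implemented.

```javascript
// True when the queried fields are exactly a leading prefix of the index.
function usesIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexFields.length) return false;
  return indexFields.slice(0, wanted.size).every((f) => wanted.has(f));
}

const index = ["A", "B", "C"];
const queries = [["A"], ["A", "B"], ["A", "B", "C"], ["B"], ["B", "C"], ["A", "C"]];
for (const q of queries) {
  console.log(q.join(" & "), usesIndexPrefix(index, q));
}
```

Under this model one compound index covers three query shapes, which is the basis for the claim that fewer indexes are needed.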
