MongoDB, C# and NoRM + Denormalization
I am trying to use MongoDB, C# and NoRM to work on some sample projects, but at this point I'm having a hard time wrapping my head around the data model. With an RDBMS, related data is no problem. In MongoDB, however, I'm having a difficult time deciding what to do with it.
Let's use StackOverflow as an example... I have no problem understanding that the majority of data on a question page should be included in one document. Title, question text, revisions, comments... all good in one document object.
Where I start to get hazy is on the question of user data like username, avatar, reputation (which changes especially often)... Do you denormalize and update thousands of document records every time there is a user change or do you somehow link the data together?
What is the most efficient way to accomplish a user relationship without causing tons of queries to happen on each page load? I noticed the DbReference<T> type in NoRM, but haven't found a great way to use it yet. What if I have nullable optional relationships?
Thanks for your insight!
Comments (4)
The balance that I have found is using SQL as the normalized database and Mongo as the denormalized copy. I use an ESB to keep them in sync with each other. I use a concept that I call "prepared documents" and "stored documents". Stored documents are data that is only kept in Mongo, which is useful for data that isn't relational. Prepared documents contain data that can be rebuilt from the normalized database. They act as living caches in a way - they can be rebuilt from scratch if the data ever falls out of sync (for complicated documents this is an expensive process, because rebuilding them requires many queries). They can also be updated one field at a time. This is where the service bus comes in. It responds to events sent after the normalized database has been updated and then updates the relevant Mongo prepared documents.
Play to each database's strengths. Allow SQL to be the write database that ensures data integrity. Let Mongo be the read-only database that is blazing fast and can contain sub-documents, so that you need fewer queries.
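To make the "prepared document" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the dictionaries `sql_users`, `sql_posts` and `mongo_posts` only stand in for the normalized tables and the Mongo collection, and the function names are hypothetical. It is not NoRM or ESB code, just the shape of the two operations described above (full rebuild, and a single-field update driven by an event).

```python
# In-memory stand-ins (assumptions): sql_users / sql_posts play the
# normalized SQL tables, mongo_posts plays the Mongo collection.
sql_users = {1: {"name": "alice", "reputation": 120}}
sql_posts = {10: {"title": "NoRM question", "author_id": 1}}
mongo_posts = {}


def rebuild_prepared_post(post_id):
    """Rebuild one prepared document from scratch out of normalized data."""
    post = sql_posts[post_id]
    author = sql_users[post["author_id"]]
    mongo_posts[post_id] = {
        "_id": post_id,
        "title": post["title"],
        "author": {
            "id": post["author_id"],
            "name": author["name"],
            "reputation": author["reputation"],
        },
    }


def on_user_reputation_changed(event):
    """Bus handler: patch one field in every prepared document for that user."""
    for doc in mongo_posts.values():
        if doc["author"]["id"] == event["user_id"]:
            doc["author"]["reputation"] = event["reputation"]


rebuild_prepared_post(10)

# The write goes to the normalized store first; the event follows.
sql_users[1]["reputation"] = 135
on_user_reputation_changed({"user_id": 1, "reputation": 135})
```

If the two stores ever drift apart, calling `rebuild_prepared_post` again regenerates the document from the normalized data.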
** EDIT **
I just re-read your question and realized what you were actually asking for. I'm leaving my original answer in case it's helpful at all.
The way I would handle the Stack Overflow example you gave is to store the user id in each comment. You would load up the post, which would have all of the comments in it. That's one query.
You would then traverse the comment data and pull out an array of the user ids that you need to load. Then load those as a batch query (using the Q.In() query operator). That's two queries total. You would then need to merge the data together into its final form. There is a balance to strike between doing it like this and using something like an ESB to update each document. Use what works best for each individual scenario in your data structure.
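The two-query approach can be sketched like this. It is a hedged illustration in plain Python with in-memory dictionaries standing in for the two collections; `find_post`, `find_users_in` and `load_post_view` are hypothetical names, and `find_users_in` only mimics what a batch `$in` lookup would return. A comment whose user cannot be found ends up with `None`, which is one simple way to treat an optional relationship.

```python
# In-memory stand-ins for two collections (an assumption, not a real driver).
posts = {
    10: {
        "_id": 10,
        "title": "NoRM question",
        "comments": [
            {"user_id": 1, "text": "Try embedding."},
            {"user_id": 2, "text": "Or reference."},
            {"user_id": 1, "text": "Both work."},
        ],
    },
}
users = {
    1: {"_id": 1, "name": "alice", "reputation": 120},
    2: {"_id": 2, "name": "bob", "reputation": 45},
}


def find_post(post_id):
    """Query 1: load the post together with its embedded comments."""
    return posts[post_id]


def find_users_in(user_ids):
    """Query 2: one batch lookup, the equivalent of an $in query."""
    return {uid: users[uid] for uid in user_ids if uid in users}


def load_post_view(post_id):
    """Merge the results of both queries into the shape the page needs."""
    post = find_post(post_id)
    wanted = {c["user_id"] for c in post["comments"]}  # de-duplicated ids
    found = find_users_in(wanted)
    comments = [
        {"text": c["text"], "user": found.get(c["user_id"])}  # None if missing
        for c in post["comments"]
    ]
    return {"title": post["title"], "comments": comments}


view = load_post_view(10)
```

However many comments the post has, the page load stays at two lookups, because the user ids are de-duplicated before the batch query.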
I think you need to strike a balance.
If I were you, I'd just reference the user id in each post instead of storing their name/reputation.
Unlike in an RDBMS, though, you would opt to have the comments embedded in the document.
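As a rough picture of that document shape, here is a small Python sketch. The field names (`user_id`, `comments`, and so on) and the helper functions are my own assumptions, not anything NoRM prescribes: comments live inside the question document, while users appear only as ids.

```python
def new_question(question_id, title, body, author_id):
    """A question document: comments are embedded, the author is referenced by id."""
    return {
        "_id": question_id,
        "title": title,
        "body": body,
        "user_id": author_id,
        "comments": [],
    }


def add_comment(question, user_id, text):
    """Embed a comment in the question; store only the commenter's id."""
    question["comments"].append({"user_id": user_id, "text": text})
    return question


q = new_question(10, "MongoDB, C# and NoRM", "How should I model users?", 1)
add_comment(q, 2, "Reference the user id.")
```

Because no name or reputation is copied into the question, a change to a user touches only that user's own document.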
Why do you want to avoid denormalization and updating 'thousands of document records'? MongoDB is designed for denormalization. Stack Overflow processes millions of pieces of data in the background. Some data can be stale for a short period, and that's okay.
So the main idea is that you should have denormalized documents so you can display them quickly in the UI.
You can't query by a referenced document, so one way or another you need denormalization.
I also suggest having a look at the CQRS architecture.
Try looking into CQRS and the event sourcing architecture. They will allow you to update all of this data through a queue.
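A very reduced sketch of updating denormalized data through a queue might look like the following. It uses only Python's standard `collections.deque` as the queue; the event shape, `rename_user` and `drain_queue` are hypothetical names, and a real CQRS or event-sourced system would add persistence, retries and a proper message broker. The point is only that the write records an event, and a worker later fans it out to every denormalized copy, so readers may briefly see the old value.

```python
from collections import deque

# Denormalized read documents (in-memory stand-in for a collection).
questions = [
    {"_id": 10, "author": {"id": 1, "name": "alice"}},
    {"_id": 11, "author": {"id": 1, "name": "alice"}},
    {"_id": 12, "author": {"id": 2, "name": "bob"}},
]
event_queue = deque()


def rename_user(user_id, new_name):
    """Command side: record the change as an event; documents are untouched."""
    event_queue.append({"type": "UserRenamed", "user_id": user_id, "name": new_name})


def drain_queue():
    """Worker: apply queued events to every denormalized copy.

    Returns the number of documents that were updated.
    """
    updated = 0
    while event_queue:
        event = event_queue.popleft()
        if event["type"] == "UserRenamed":
            for doc in questions:
                if doc["author"]["id"] == event["user_id"]:
                    doc["author"]["name"] = event["name"]
                    updated += 1
    return updated


rename_user(1, "alice_s")
stale_name = questions[0]["author"]["name"]  # still the old name before the worker runs
updated = drain_queue()
```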