MongoDB schema design - referencing vs. embedding
I am writing a simulation which requires a backing database to store the results. The simulation writes a huge amount of data. For obvious performance reasons, I chose to try out a NoSQL database, specifically MongoDB. However, I'm a bit puzzled over my data model.
In the relational world, the schema would translate to this:
- Simulation holds simulation configuration, status, etc.
- Scenario describes a specific simulation case.
- Realization groups TestResults.
The simulation works as follows. First we create a configuration (which maps to the Simulation table) and specify the scenarios and how many Realizations to calculate. Then we start the simulation. The simulation creates realizations in a scenario (in parallel, so many realizations are calculated at the same time and inserted into the scenario the simulation is currently working on).
However, in NoSQL, specifically MongoDB, relations are bad and slow, so I should make use of embedded documents as much as possible. So I came up with this:
This model should give me the best performance when first calculating all realizations and THEN saving them to the database as a single insert (of the Scenario).
However, for performance reasons, I want to insert a Realization into the Scenario as soon as it is computed, which would require updating the Scenario every time a realization is completed. Is this a bad idea? The MongoDB reference says that when an embedded document is added to a parent document, the parent document is updated in place, but there is a performance loss anyway.
Would it be faster not to embed Realization into Scenario but to reference it instead? How much performance would be lost when reading and aggregating the data later? Are there any other pitfalls I should know about?
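To make the embedded layout concrete, here is a minimal sketch of what such a Scenario document might look like when every realization is computed first and the whole thing is written once. The original schema is not shown in the post, so every field name here (`simulation_id`, `realizations`, `test_results`, and so on) is an assumption, and the documents are plain Python dicts rather than a real database write.

```python
def build_scenario(simulation_id, name, realization_results):
    """Assemble one Scenario document with all realizations embedded.

    All field names are hypothetical; the original schema is not known.
    """
    return {
        "simulation_id": simulation_id,  # assumed reference to the Simulation document
        "name": name,
        "realizations": [
            {"index": i, "test_results": list(results)}
            for i, results in enumerate(realization_results)
        ],
    }


# Compute everything first, then build a single document.
scenario = build_scenario("sim-1", "baseline", [[12, 34], [56, 78]])

# With pymongo, this would then be one write, for example:
#   db.scenarios.insert_one(scenario)
```

The whole scenario lives in one document, so it is bounded by MongoDB's per-document size limit (16 MB).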
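Updating the Scenario each time a realization finishes would typically be done with the `$push` update operator. The sketch below only builds the filter and update documents and applies them to an in-memory dict so the effect is visible; `apply_push` is a hypothetical stand-in for the server, and the field names are assumptions.

```python
def push_realization_update(scenario_id, realization):
    """Return the (filter, update) pair that appends one realization."""
    filter_doc = {"_id": scenario_id}
    update_doc = {"$push": {"realizations": realization}}
    return filter_doc, update_doc


def apply_push(document, update_doc):
    """In-memory stand-in for what $push does; for illustration only."""
    for field, value in update_doc["$push"].items():
        document.setdefault(field, []).append(value)
    return document


scenario = {"_id": "scn-1", "name": "baseline", "realizations": []}

for i in range(3):
    flt, upd = push_realization_update("scn-1", {"index": i, "test_results": [i]})
    apply_push(scenario, upd)
    # With pymongo, each finished realization would be one call:
    #   db.scenarios.update_one(flt, upd)
```

Each push makes the parent document larger, so a scenario with many realizations keeps growing toward the document size limit.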
Thanks.
Comments (2)
It depends on how you will use the data - embedding can involve updating multiple documents, so it is slow to write, but a read always touches only one document, so it will be fast. Referencing is the opposite - a write goes to a single document (fast), but a read has to fetch multiple documents (slow).
Aside from potential limitations such as reaching the maximum document size (16 MB in MongoDB, which includes everything embedded in the document), it just comes down to which type of performance is more important for your scenario.
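The referencing side of this trade-off can be sketched as follows, assuming a separate `realizations` collection whose documents carry a `scenario_id` field. Both collections are modeled as plain Python lists, and all names are hypothetical.

```python
scenarios = [{"_id": "scn-1", "name": "baseline"}]
realizations = []  # stands in for a separate `realizations` collection


def insert_realization(scenario_id, index, test_results):
    """Write path: one small, independent document per realization."""
    doc = {"scenario_id": scenario_id, "index": index, "test_results": test_results}
    realizations.append(doc)  # pymongo: db.realizations.insert_one(doc)
    return doc


def load_scenario_with_realizations(scenario_id):
    """Read path: the scenario plus a second lookup for its realizations."""
    # pymongo: db.scenarios.find_one({"_id": scenario_id})
    scenario = next(s for s in scenarios if s["_id"] == scenario_id)
    # pymongo: db.realizations.find({"scenario_id": scenario_id})
    children = [r for r in realizations if r["scenario_id"] == scenario_id]
    return {**scenario, "realizations": children}


insert_realization("scn-1", 0, [12, 34])
insert_realization("scn-1", 1, [56, 78])
loaded = load_scenario_with_realizations("scn-1")
```

Writes never touch the parent scenario, but every read of a full scenario needs the extra lookup, for which an index on `scenario_id` would normally be wanted.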
Another thing you should consider is whether you are going to update your records.
For example, if you have an embedded list of users (let's say friends) and you change the first name of one of the users in the users collection, you must iterate over the whole friends list and manually update their first name.
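A small in-memory sketch of that update cost, assuming each user document embeds copies of its friends as `{"user_id", "first_name"}` entries (a hypothetical layout): renaming one user means touching every document that embeds a copy of that user.

```python
def rename_user(users, user_id, new_first_name):
    """Rename a user and every embedded copy of that user in friends lists.

    Returns the number of embedded copies that had to be updated.
    """
    touched = 0
    for user in users:
        if user["_id"] == user_id:
            user["first_name"] = new_first_name
        for friend in user.get("friends", []):
            if friend["user_id"] == user_id:
                friend["first_name"] = new_first_name
                touched += 1
    return touched


users = [
    {"_id": 1, "first_name": "Ann", "friends": [{"user_id": 2, "first_name": "Bob"}]},
    {"_id": 2, "first_name": "Bob", "friends": [{"user_id": 1, "first_name": "Ann"}]},
    {"_id": 3, "first_name": "Cid", "friends": [{"user_id": 1, "first_name": "Ann"},
                                                {"user_id": 2, "first_name": "Bob"}]},
]

touched = rename_user(users, 1, "Anna")

# With pymongo, and assuming each friend appears at most once per list,
# the embedded copies could be updated with the positional operator:
#   db.users.update_many({"friends.user_id": 1},
#                        {"$set": {"friends.$.first_name": "Anna"}})
```

With references instead of embedded copies, only the single user document would need to change.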