Are document-oriented databases better suited than relational databases for persisting objects?
In terms of database usage, the last decade was the age of the ORM, with hundreds of them competing to persist our object graphs in plain old-fashioned RDBMSs. Now we seem to be witnessing the coming of age of document-oriented databases. These databases are highly optimized for schema-free documents, but they are also very attractive for their ability to scale out and query a cluster in parallel.
Document-oriented databases also hold a couple of advantages over RDBMSs for persisting data models in object-oriented designs. As the collections are schema-free, one can store objects belonging to different classes in an inheritance hierarchy side by side. Also, as the domain model changes, so long as the code can cope with getting back objects written by an old version of the domain classes, one can avoid having to migrate the whole database at every change.
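To make the side-by-side storage concrete, here is a minimal sketch in Python. It assumes a made-up `Animal` hierarchy, a `_type` tag field and a plain list standing in for a schema-free collection; none of these names come from a real database API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Animal:
    name: str


@dataclass
class Dog(Animal):
    breed: str


@dataclass
class Bird(Animal):
    wingspan_cm: float


# Hypothetical registry mapping a stored type tag back to its class.
TYPES = {cls.__name__: cls for cls in (Animal, Dog, Bird)}


def to_document(obj):
    """Serialize an object to a JSON document tagged with its class name."""
    doc = asdict(obj)
    doc["_type"] = type(obj).__name__
    return json.dumps(doc)


def from_document(raw):
    """Rebuild an object of the right subclass from a stored JSON document."""
    doc = json.loads(raw)
    cls = TYPES[doc.pop("_type")]
    return cls(**doc)


# One schema-free "collection" holding different subclasses side by side.
collection = [to_document(Dog("Rex", "collie")), to_document(Bird("Tweety", 20.0))]
restored = [from_document(raw) for raw in collection]
```

Each document carries only the fields of its own class, so no table-per-class or nullable-column mapping is needed. The price is that the application, not the database, has to know what a `Dog` document looks like.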
On the other hand, the performance benefits of document-oriented databases appear to come mainly from storing deeper documents: in object-oriented terms, classes that are composed of other classes, for example a blog post and its comments. In most of the examples I can come up with, though, such as the blog one, the gain in read access would appear to be offset by the penalty of having to write the whole blog post "document" every time a new comment is added.
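The blog trade-off can be sketched roughly as follows, assuming a plain dict as a stand-in for a document store that always reads and writes whole documents. The `save`, `load` and `add_comment` helpers are hypothetical, not any product's API.

```python
import json

# Hypothetical in-memory stand-in for a document store: id -> serialized JSON.
store = {}


def save(doc_id, doc):
    """Write the whole document and return the number of bytes written."""
    raw = json.dumps(doc)
    store[doc_id] = raw
    return len(raw.encode("utf-8"))


def load(doc_id):
    """A single lookup returns the post together with all of its comments."""
    return json.loads(store[doc_id])


def add_comment(doc_id, author, text):
    """Read-modify-write: the full post is rewritten for one new comment."""
    post = load(doc_id)
    post["comments"].append({"author": author, "text": text})
    return save(doc_id, post)


first_write = save("post-1", {"title": "Hello", "body": "x" * 1000, "comments": []})
second_write = add_comment("post-1", "ann", "Nice post")
```

Reading the post and its comments is one lookup, but adding a short comment rewrites the 1000-character body as well. Some document stores offer in-place update operators (MongoDB's `$push`, for instance), which this sketch deliberately does not model.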
It looks to me as though document-oriented databases can bring significant benefits to object-oriented systems if one takes extreme care to organize the objects in deep graphs optimized for the way the data will be read and written, but this means knowing the use cases up front. In the real world, we often don't know them until we actually have a live implementation we can profile.
So is the case of relational vs. document-oriented databases one of swings and roundabouts? I'm interested in people's opinions and advice, particularly from anyone who has built a significant application on a document-oriented database.
Comments (1)
Well, it depends on how your data is structured and on your data-access patterns.
Document databases store and retrieve documents, and the basic atomic unit of storage is a document. As you said, you need to think about your data-access patterns and use cases to create a smart document model. When your domain model can be split and partitioned across a set of documents, a document database works like a charm. For example, for blog software, a CMS or wiki software, a document database works extremely well. As long as you can find a good way to squeeze your data into documents, you won't have any problems. But don't try to fit a relational model into a document database.
As soon as your data-access patterns use a lot of 'navigation' across relations, graph or object databases are a more natural choice.
Another thing is the read/write performance trade-off. Take blog software, for example. In a traditional RDBMS data model the data is normalized. This means that reading the data is expensive, because to read a blog post you have to read from different tables, resolve relations with joins, and so on. In exchange, changing a tag is inexpensive.
In contrast, in a document database reading a blog post is cheap, because you just load the post document. However, updating is probably more expensive, because you need to store the whole document again. Or worse, you have to go through a lot of documents to change something (the rename-a-tag scenario). In most systems, reading is way more important than writing, so it actually makes sense to use denormalized data stores.
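The rename-a-tag scenario can be sketched like this, with plain Python dicts standing in for both layouts. The data and function names are made up for illustration.

```python
# Normalized layout (relational style): tag names live in one place and
# posts reference them by id.
tags = {1: "database", 2: "orm"}
post_tags = {"post-1": [1, 2], "post-2": [1]}


def rename_tag_normalized(tag_id, new_name):
    """One row changes, however many posts use the tag."""
    tags[tag_id] = new_name
    return 1  # rows written


def tags_for_post_normalized(post_id):
    """Reading needs a join-like lookup across both structures."""
    return [tags[tag_id] for tag_id in post_tags[post_id]]


# Denormalized layout (document style): each post embeds its tag names.
posts = {
    "post-1": {"title": "A", "tags": ["database", "orm"]},
    "post-2": {"title": "B", "tags": ["database"]},
    "post-3": {"title": "C", "tags": ["python"]},
}


def rename_tag_denormalized(old, new):
    """Every document carrying the tag has to be rewritten."""
    written = 0
    for doc in posts.values():
        if old in doc["tags"]:
            doc["tags"] = [new if tag == old else tag for tag in doc["tags"]]
            written += 1
    return written


rows_written = rename_tag_normalized(1, "databases")
docs_written = rename_tag_denormalized("database", "databases")
```

The normalized rename writes one row, while the denormalized rename writes once per affected document. Reads go the other way: the document already contains its tag names, whereas the normalized read has to resolve them.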
I think that on large databases the schema-free design can have its advantages. In an RDBMS you need to upgrade your schema, which is a really painful process, especially converting the existing data to the new schema. In a schema-free database, your application needs to deal with that, which gives more flexibility. For example, you can upgrade the schema on the fly when an old document is accessed. This way, you can keep your giant database up and running while the application handles older versions on the fly.
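Upgrading on access might look roughly like this. The sketch assumes a `schema_version` field, a made-up name-splitting upgrade and a dict standing in for the database; all of these are illustrative choices rather than a prescribed scheme.

```python
CURRENT_VERSION = 2

# Hypothetical store holding documents written by two application versions.
store = {
    "u1": {"name": "Ada Lovelace"},  # version 1: written before versioning
    "u2": {"schema_version": 2, "first_name": "Alan", "last_name": "Turing"},
}


def upgrade_v1_to_v2(doc):
    """Split the old single 'name' field into first and last name."""
    first, _, last = doc["name"].partition(" ")
    return {"schema_version": 2, "first_name": first, "last_name": last}


# Upgrade steps keyed by the version they upgrade from.
UPGRADES = {1: upgrade_v1_to_v2}


def load(doc_id):
    """Return a current-version document, upgrading and saving old ones."""
    doc = store[doc_id]
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
        store[doc_id] = doc  # write back so the upgrade happens only once
    return doc


user = load("u1")
```

Old documents are converted one at a time as they are read, so there is no stop-the-world migration. The cost is that the upgrade functions have to stay in the codebase for as long as old documents may still exist.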