在单个应用程序中使用多种数据库类型对数据进行建模
将应用程序的数据模型分解为不同的数据库系统是否有意义?例如,应用程序将所有用户数据和关系存储在图形数据库中(非常适合存储关系),而将其他数据存储在文档数据库中,例如 CouchDB 或 MongoDB?这将要求用户图形数据库引用文档数据库中的唯一 ID,反之亦然。
这是否使数据模型和应用程序过于复杂?或者这是否充分利用了两种类型的数据库系统来扩展您的应用程序?
Does it make sense to break up the data model of an application into different database systems? For example, the application stores all user data and relationships in a graph database (ideal for storing relationships), while storing other data in a document database, such as CouchDB or MongoDB? This would require the user graph database to reference unique ids in the document databases and vice versa.
Is this over complicating the data model and application? Or is this using the best uses of both types of database systems for scaling your application?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
它绝对是有意义的,并且完全取决于您的应用程序的要求。如果您可以使用其他数据库系统来完成它们真正擅长的事情。
以全文搜索为例。当然,您可以使用像 MySql 这样的关系数据库进行或多或少复杂的全文搜索。但是有一些系统,例如 Lucene/Solr,针对此类事情进行了优化,并且可以在数百万个文档中快速搜索。因此,您可以使用这些系统来执行其特殊任务(此处:进行漂亮的全文搜索),然后返回标识符,并可能从 RDBMS 加载关系结构化数据。
或者 CouchDB。我在一些项目中使用 couchDB 作为缓存系统。与关系数据库结合。当然,我需要关心一致性,但这绝对值得付出努力。它极大地提高了项目的性能,并将服务器上的负载从 2 降低到 0.2。 :)
It definitely can make sense and depends fully on the requirements of your application. If you can use other database systems for things in which they are really good at.
Take for example full text search. Of course you can do more or less complex full text searches with a relational database like MySql. But there are systems like e.g. Lucene/Solr which are optimized for such things and can search fast in millions of documents. So you could use these systems for their special task (here: make a nifty full text search), then you return the identifiers and maybe load the relational structured data from the RDBMS.
Or CouchDB. I use couchDB in some projects as a caching systems. In combination with a relational database. Of course I need to care about consistency, but it it's definitely worth the effort. It pushed performance in the projects a lot and decreases for example load on the server from 2 to 0.2. :)
例如,类似的事情称为跨存储持久性。正如您提到的,您会将某些数据存储在关系数据库中,将社交关系存储在 graphdb 中,将用户生成的数据(文档)存储在文档数据库中,并将用户提供的多媒体文件(图片、音频、视频)存储在 Blob 存储(如 S3)中。
它主要是查看用例并确保从您需要的任何地方都可以访问每个商店的“主”或索引键(来回)。您可以将实际的查找封装在您的域或 dao 层中。
一些框架(例如 Spring Data 项目)提供了一些开箱即用的初始跨存储持久性,主要是集成具有不同 NOSQL 数据存储的 JPA。例如 Spring Data Graph 允许将您的实体存储在 JPA 中并添加社交图或其他高度互连的内容将数据作为次要关注点< /a> 并利用 a graphdb 用于典型的遍历和其他图形操作(例如排名、建议等)
Something like this is for instance called cross-store persistence. As you mentioned you would store certain data in your relational database, social relationships in a graphdb, user-generated data (documents) in a document-db and user provided multimedia files (pictures, audio, video) in a blob-store like S3.
It is mainly about looking at the use-cases and making sure that from wherever you need it you might access the "primary" or index key of each store (back and forth). You can encapsulate the actual lookup in your domain or dao layer.
Some frameworks like the Spring Data projects provide some initial kind of cross-store persistence out of the box, mostly integrating JPA with a different NOSQL datastore. For instance Spring Data Graph allows it to store your entities in JPA and add social graphs or other highly interconnected data as a secondary concern and leverage a graphdb for the typical traversal and other graph operations (e.g. ranking, suggestions etc.)
另一个术语是多语言持久性。
对于这个问题,有两种相反的立场:
优点:
“与此相反,我是多语言持久性的忠实粉丝。这仅仅意味着为每个用例使用正确的存储后端。例如文件存储、SQL、图形数据库、数据仓库、内存数据库、网络缓存目前,主要使用两种存储方式:文件和 SQL 数据库,这两种存储方式都不是针对每种使用情况的最佳选择。”
缺点:
“我认为我不需要说我是多语言持久性的支持者。而且我相信 Unix 工具哲学。但是在向系统添加更多组件时,您应该意识到这样的系统复杂性正在“爆炸式增长”因此,运营成本也会增加(注意:您还记得为什么 Twitter 开始使用 Cassandra 吗?更不用说您的系统拥有的组件越多,就必须投入更多的注意力和精力来弄清楚整体系统可用性等关键方面,延迟、吞吐量、和一致性。”
Another term for this is polyglot persistence.
Here are two contrary positions on the question:
Pro:
"Contrary to that, I’m a big fan of polyglot persistence. This simply means using the right storage backend for each of your usecases. For example file storages, SQL, graph databases, data ware houses, in-memory databases, network caches, NoSQL. Today there are mostly two storages used, files and SQL databases. Both are not optimal for every usecase."
Con:
"I don’t think I need to say that I’m a proponent of polyglot persistence. And that I believe in Unix tools philosophy. But while adding more components to your system, you should realize that such a system complexity is “exploding” and so will operational costs grow too (nb: do you remember why Twitter started to into using Cassandra?) . Not to mention that the more components your system has the more attention and care must be invested figuring out critical aspects like overall system availability, latency, throughput, and consistency."