Am I missing something about document databases?
I've been looking at the rise of the NoSQL movement and the accompanying rise in popularity of document databases like MongoDB, RavenDB, and others. While there are quite a few things about these that I like, I feel like I'm not understanding something important.
Let's say that you are implementing a store application, and you want to store products in the database, each of which has a single category. In a relational database, this would be accomplished by having two tables, a product table and a category table, and the product table would have a field (called perhaps "category_id") which would reference the row in the category table holding the correct category entry. This has several benefits, including non-repetition of data.
It also means that if you misspelled the category name, for example, you could update the category table and then it's fixed, since that's the only place that value exists.
In document databases, though, this is not how it works. You completely denormalize, meaning that in the "products" document you would have a value holding the actual category string, leading to lots of repeated data and making errors much more difficult to correct. Thinking about this more, doesn't it also mean that running queries like "give me all products with this category" can return results that lack integrity?
Of course the way around this is to re-implement the whole "category_id" thing in the document database, but when I get to that point in my thinking, I realize I should just stay with relational databases instead of re-implementing them.
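To make the two options concrete, here is a minimal sketch that uses plain Python lists and dicts as stand-ins for collections and documents. No real database driver is involved, and every name in it (`products_embedded`, `rename_category`, `products_in_category`, the sample data) is made up for illustration.

```python
# Stand-in "collections": plain Python data, not a real document database.

# Style 1: fully denormalized, the category name is copied into every product.
products_embedded = [
    {"_id": 1, "name": "Kettle", "category": "Kitchn"},   # misspelled
    {"_id": 2, "name": "Toaster", "category": "Kitchn"},  # misspelled
    {"_id": 3, "name": "Novel", "category": "Books"},
]

def rename_category(products, old, new):
    """Fix a category name by rewriting every product that carries it."""
    touched = 0
    for product in products:
        if product["category"] == old:
            product["category"] = new
            touched += 1
    return touched

# Style 2: reference by id, so the name lives in exactly one place.
categories = {10: {"name": "Kitchn"}, 20: {"name": "Books"}}
products_referenced = [
    {"_id": 1, "name": "Kettle", "category_id": 10},
    {"_id": 2, "name": "Toaster", "category_id": 10},
    {"_id": 3, "name": "Novel", "category_id": 20},
]

def products_in_category(products, categories, category_name):
    """Application-side 'join': resolve the name to ids, then filter."""
    ids = {cid for cid, cat in categories.items() if cat["name"] == category_name}
    return [p["name"] for p in products if p["category_id"] in ids]

# Embedded style: one write per affected product.
embedded_writes = rename_category(products_embedded, "Kitchn", "Kitchen")

# Referenced style: a single write, but reads need the extra lookup.
categories[10]["name"] = "Kitchen"
kitchen_products = products_in_category(products_referenced, categories, "Kitchen")
```

In the embedded style the fix costs one write per affected product; in the referenced style it costs a single write, at the price of an application-side lookup on every read.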
This leads me to believe I'm missing some key point about document databases that sends me down this incorrect path. So I wanted to put it to Stack Overflow: what am I missing?
Comments (4)
True, denormalizing means storing additional data. It also means fewer collections (tables in SQL), and therefore fewer relations between pieces of data. A single document can contain the information that would otherwise come from multiple SQL tables.
Now, if your database is distributed across multiple servers, it's more efficient to query a single server instead of multiple servers. With the denormalized structure of document databases, it's much more likely that you only need to query a single server to get all the data you need. With a SQL database, chances are that your related data is spread across multiple servers, making queries very inefficient.
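A rough sketch of that argument, assuming records are placed on servers by a simple hash of their key. The placement function, the server count, and the record keys are all hypothetical; real databases use their own partitioning schemes.

```python
import zlib

NUM_SERVERS = 4  # hypothetical cluster size

def server_for(key: str) -> int:
    """Pick a server from a stable hash of the record key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SERVERS

def servers_touched(keys) -> set:
    """Set of servers that must be contacted to read all the given keys."""
    return {server_for(k) for k in keys}

# Denormalized: the post, its tags, and its comments live in one document.
document_keys = ["post:42"]

# Normalized: the same data is split into separately placed rows.
row_keys = ["post:42", "tag:7", "tag:9", "comment:1001", "comment:1002"]

doc_servers = servers_touched(document_keys)
row_servers = servers_touched(row_keys)
```

Reading the single document always touches exactly one server. Reading the normalized rows touches one or more, depending on where each row happens to land.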
Also true. Most NoSQL solutions don't guarantee things such as referential integrity, which is standard in SQL databases. As a result, your application is responsible for maintaining relations between data. However, as the number of relations in a document database is very small, it's not as hard as it may sound.
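As a minimal sketch of what "the application is responsible" can look like, the check below scans documents for references that point at nothing. Plain Python dicts stand in for documents, and the helper name `find_dangling_references` and the sample data are assumptions made for illustration.

```python
def find_dangling_references(documents, field, valid_ids):
    """Return the _id of every document whose reference points nowhere."""
    return [doc["_id"] for doc in documents if doc.get(field) not in valid_ids]

category_ids = {10, 20}
products = [
    {"_id": 1, "name": "Kettle", "category_id": 10},
    {"_id": 2, "name": "Novel", "category_id": 20},
    {"_id": 3, "name": "Lamp", "category_id": 99},  # category 99 was never created
    {"_id": 4, "name": "Mug"},                      # reference missing entirely
]

dangling = find_dangling_references(products, "category_id", category_ids)
```

A SQL database would reject the last two rows at write time through a foreign-key constraint; here the application has to run a check like this itself, on write or periodically.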
One of the advantages of a document database is that it is schema-less. You're completely free to define the contents of a document at all times; you're not tied to a predefined set of tables and columns as you are with a SQL database.
Real-world example
If you're building a CMS on top of a SQL database, you'll either have a separate table for each CMS content type, or a single table with generic columns in which you store all types of content. With separate tables, you'll have a lot of tables. Just think of all the join tables you'll need for things like tags and comments for each content type. With a single generic table, your application is responsible for correctly managing all of the data. Also, the raw data in your database is hard to update and quite meaningless outside of your CMS application.
With a document database, you can store every type of CMS content in a single collection, while maintaining a strongly defined structure within each document. You could also store all tags and comments within the document, making data retrieval very efficient. This efficiency and flexibility come at a price: your application bears more responsibility for the integrity of the data. On the other hand, scaling out costs much less with a document database than with a SQL database.
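A small sketch of that layout, again with plain Python dicts standing in for documents. The field names, the sample content, and the helpers `with_tag` and `comment_count` are hypothetical.

```python
# One "content" collection holding two different content types.
content = [
    {
        "_id": "a1",
        "type": "article",
        "title": "Choosing a database",
        "body": "Relational or document?",
        "tags": ["databases", "nosql"],
        "comments": [
            {"author": "ann", "text": "Nice overview."},
            {"author": "bob", "text": "What about graphs?"},
        ],
    },
    {
        "_id": "e1",
        "type": "event",
        "title": "Database meetup",
        "starts_at": "2011-05-01T18:00",  # a field only events have
        "tags": ["databases"],
        "comments": [],
    },
]

def with_tag(collection, tag):
    """Return titles of all documents carrying the tag, whatever their type."""
    return [doc["title"] for doc in collection if tag in doc.get("tags", [])]

def comment_count(collection, doc_id):
    """Comments are embedded, so finding the document is all that is needed."""
    for doc in collection:
        if doc["_id"] == doc_id:
            return len(doc.get("comments", []))
    return None

tagged = with_tag(content, "databases")
article_comments = comment_count(content, "a1")
```

No join tables are needed for tags or comments, and the two content types coexist even though only events carry `starts_at`.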
Advice
As you can see, both SQL and NoSQL solutions have advantages and disadvantages. As David already pointed out, each type has its uses. I recommend that you analyze your requirements and create two data models, one for a SQL solution and one for a document database. Then choose the solution that fits best, keeping scalability in mind.
I'd say that the number one thing you're overlooking (at least based on the content of the post) is that document databases are not meant to replace relational databases. The example you give does, in fact, work really well in a relational database. It should probably stay there. Document databases are just another tool for accomplishing tasks in another way; they're not suited to every task.
Document databases were made to address the problem that (looking at it the other way around) relational databases aren't the best way to solve every problem. Both designs have their uses; neither is inherently better than the other.
Take a look at the Use Cases on the MongoDB website: http://www.mongodb.org/display/DOCS/Use+Cases
A document db gives a feeling of freedom when you start. You no longer have to write create table and alter table scripts. You simply embed details in the master 'records'.
But after a while you realize that you are locked in a different way. It becomes less easy to combine or aggregate the data in a way that you didn't think was needed when you stored the data. Data mining/business intelligence (searching for the unknown) becomes harder.
That means that it is also harder to check if your app has stored the data in the db in a correct way.
For instance, say you have two collections, each with approximately 10,000 'records'. Now you want to know which ids are present in 'table' A but not in 'table' B.
Trivial with SQL, a lot harder with MongoDB.
But I like MongoDB!
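The A-versus-B comparison above can be sketched as follows. The SQL side uses Python's built-in `sqlite3` module; the document side uses plain lists of dicts as stand-ins for two collections and assumes the store offers no server-side join, so the comparison happens in application code. Table names, collection names, and the sample ids are made up.

```python
import sqlite3

ids_a = range(1, 10001)     # 10,000 ids in "A"
ids_b = range(1, 10001, 2)  # "B" holds only the odd ids

# SQL side: a single anti-join query answers the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO a (id) VALUES (?)", ((i,) for i in ids_a))
conn.executemany("INSERT INTO b (id) VALUES (?)", ((i,) for i in ids_b))
sql_missing = [
    row[0]
    for row in conn.execute(
        "SELECT a.id FROM a LEFT JOIN b ON b.id = a.id "
        "WHERE b.id IS NULL ORDER BY a.id"
    )
]
conn.close()

# Document side (assumed: no join available): fetch the ids of both
# collections and compute the difference in the application.
collection_a = [{"_id": i} for i in ids_a]
collection_b = [{"_id": i} for i in ids_b]
ids_in_b = {doc["_id"] for doc in collection_b}
app_missing = sorted(
    doc["_id"] for doc in collection_a if doc["_id"] not in ids_in_b
)
```

Both approaches give the same answer, but the second one pulls every id out of the store and does the work in the application.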
OrientDB, for example, supports schema-less, schema-full, or mixed mode. In some contexts you need constraints, validation, etc., but you also need the flexibility to add fields without touching the schema. This is the mixed schema mode.
Example:
In this example, the fields "name" and "surname" are mandatory (because they are defined in the schema), but the field "invented" has been created only for this document. The rest of your app doesn't need to know about it, but you can execute queries against it:
It will return only the documents with the field "invented".
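The mixed-mode idea can be illustrated with a small sketch in plain Python. This is not OrientDB's API: the `REQUIRED_FIELDS` set, the `validate` and `with_field` helpers, and the sample values are hypothetical stand-ins for a schema that declares "name" and "surname" while still allowing extra fields such as "invented".

```python
REQUIRED_FIELDS = {"name", "surname"}  # hypothetical schema for "Person"

def validate(doc: dict) -> dict:
    """Enforce the declared fields; any extra fields pass through untouched."""
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise ValueError(f"missing mandatory fields: {sorted(missing)}")
    return doc

people = [
    validate({"name": "Jane", "surname": "Doe"}),
    validate({"name": "John", "surname": "Roe", "invented": "a widget"}),
]

def with_field(docs, field):
    """Query on a field that the schema knows nothing about."""
    return [doc for doc in docs if field in doc]

inventors = with_field(people, "invented")
```

Documents lacking a mandatory field are rejected, while the undeclared "invented" field is stored and can still be queried.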