Am I missing something about document databases?

Posted 2024-09-13 08:01:30

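I've been watching the rise of the NoSQL movement and the accompanying popularity of document databases such as MongoDB and RavenDB. While there are quite a few things I like about them, I feel like I'm not understanding something important.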

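Say you are implementing a store application and you want to store products in the database, each of which has exactly one category. In a relational database, this would be done with two tables, a products table and a categories table. The products table would have a field (perhaps called "category_id") that references the row in the categories table holding the correct category entry. This has several benefits, including no duplication of data.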

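It also means that if, for example, you misspell a category name, you can update the categories table and the problem is fixed, because that is the only place the value exists.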
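A minimal sketch of that relational layout, assuming SQLite through Python's built-in `sqlite3` module and hypothetical table, column, and product names. The misspelled category lives in exactly one row, so one `UPDATE` fixes it for every product, and the foreign key makes the database itself reject a product that points at a non-existent category.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE products (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
""")
conn.execute("INSERT INTO categories (id, name) VALUES (1, 'Elecronics')")  # typo
conn.executemany(
    "INSERT INTO products (name, category_id) VALUES (?, ?)",
    [("Laptop", 1), ("Phone", 1), ("Headphones", 1)],
)

# Fixing the misspelling touches exactly one row, however many products use it.
cur = conn.execute("UPDATE categories SET name = 'Electronics' WHERE id = 1")
print(cur.rowcount)

# Every product sees the corrected name through the join.
rows = conn.execute("""
    SELECT p.name, c.name
    FROM products p JOIN categories c ON c.id = p.category_id
    ORDER BY p.id
""").fetchall()
print(rows)

# Referential integrity: the database refuses a dangling category_id.
rejected = False
try:
    conn.execute("INSERT INTO products (name, category_id) VALUES ('Ghost', 99)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```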

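In document databases, though, this is not how it works. You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data, and errors are much more difficult to correct. Thinking about it further, doesn't this also mean that running a query like "give me all products in this category" can return results that lack integrity?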
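To make that concern concrete, here is a minimal in-memory sketch, assuming a plain Python list of dicts stands in for a "products" collection (the data and the helper names `find_by_category` and `rename_category` are hypothetical). Because the category string is copied into each document, a typo in a single write silently drops that product from a category query, and a correction has to find and rewrite every affected document instead of one row. A real store would do this with a multi-document update such as MongoDB's `updateMany`.

```python
# A list of dicts stands in for a "products" collection; no real database is used.
products = [
    {"_id": 1, "name": "Laptop", "category": "Electronics"},
    {"_id": 2, "name": "Phone", "category": "Elecronics"},  # typo slipped in on one write
    {"_id": 3, "name": "Headphones", "category": "Electronics"},
]

def find_by_category(collection, category):
    """'Give me all products in this category' is just a string match on a copied value."""
    return [doc["name"] for doc in collection if doc["category"] == category]

def rename_category(collection, old, new):
    """A correction must rewrite every document that embeds the old string."""
    touched = 0
    for doc in collection:
        if doc["category"] == old:
            doc["category"] = new
            touched += 1
    return touched

print(find_by_category(products, "Electronics"))  # the misspelled Phone is missing
print(rename_category(products, "Elecronics", "Electronics"))
print(find_by_category(products, "Electronics"))  # now all three products
```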

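Of course, the way around this is to re-implement the whole "category_id" idea in the document database, but when I think that through, I realize I should just use a relational database instead of re-implementing one.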
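Re-implementing `category_id` in a document database looks roughly like the sketch below, assuming two in-memory dicts stand in for "categories" and "products" collections (all names and data are hypothetical). The rename happens in one place again, but the "join" is now application code, and nothing in the data itself stops `category_id` from pointing at a category that doesn't exist. Some document databases do offer server-side help with this (MongoDB's `$lookup` aggregation stage, for instance), which this sketch does not model.

```python
# Two dicts stand in for "categories" and "products" collections (hypothetical data).
categories = {
    "cat-1": {"_id": "cat-1", "name": "Electronics"},
}
products = [
    {"_id": 1, "name": "Laptop", "category_id": "cat-1"},
    {"_id": 2, "name": "Phone", "category_id": "cat-1"},
]

def products_with_category(products, categories):
    """Application-side 'join': a second lookup per product, done by our code, not the database."""
    result = []
    for p in products:
        cat = categories.get(p["category_id"])  # None if the reference is dangling
        result.append((p["name"], cat["name"] if cat else None))
    return result

categories["cat-1"]["name"] = "Consumer Electronics"  # one place to fix, as in SQL
print(products_with_category(products, categories))
```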

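This makes me believe I'm missing some key point about document databases that is leading me down the wrong path. So I wanted to put it to Stack Overflow: what am I missing?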

风吹雨成花 2024-09-20 08:01:30

You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data [...]

True, denormalizing means storing additional data. It also means fewer collections (tables in SQL), and therefore fewer relationships between pieces of data. A single document can contain information that would otherwise come from multiple SQL tables.

Now, if your database is distributed across multiple servers, it's more efficient to query a single server than several. With the denormalized structure of document databases, it's much more likely that a single server holds all the data a query needs. With a SQL database, your related rows may well be spread across several servers, so a join has to gather data from each of them, which makes queries much less efficient.
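A toy illustration of that routing argument, assuming simple hash partitioning over a hypothetical cluster of `NUM_SHARDS` servers where every record is placed by its own key (real SQL and NoSQL clusters use various schemes, and some SQL setups deliberately co-locate related rows). A normalized product and its category are placed independently, so reading both may involve two servers; a product document with its category embedded always lives on exactly one.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key: str) -> int:
    """Hash partitioning: each record lives on the shard chosen by hashing its key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def shards_for_join(product_id: str, category_id: str) -> set[int]:
    """Normalized: product row and category row are placed by their own keys."""
    return {shard_for("products/" + product_id), shard_for("categories/" + category_id)}

def shards_for_document(product_id: str) -> set[int]:
    """Denormalized: the category is embedded, so one key (one shard) answers the read."""
    return {shard_for("products/" + product_id)}

print(len(shards_for_document("42")))  # always a single shard
print(len(shards_for_join("42", "7")))  # 1 or 2, depending on where the hashes land
```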

[...] and errors are much more difficult to correct.

Also true. Most NoSQL solutions don't guarantee things such as referential integrity, which SQL databases commonly provide. As a result, your application is responsible for maintaining the relationships between data. However, since a document database has very few relationships to begin with, this is not as hard as it may sound.
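What "your application is responsible" can look like in practice: a minimal sketch, assuming in-memory dicts as collections and hypothetical names (`DanglingReferenceError`, `insert_product`, `rename_category`). The application checks the reference before writing, which is what a foreign key would do, and cascades a rename into every denormalized copy itself. Depending on the store, that multi-document update may not be atomic.

```python
class DanglingReferenceError(Exception):
    """Raised by our own code, because the database will not raise it for us."""

categories = {"cat-1": {"name": "Electronics"}}
products = {}

def insert_product(product_id, name, category_id):
    # Application-enforced foreign key: check the reference before writing.
    if category_id not in categories:
        raise DanglingReferenceError(f"unknown category {category_id!r}")
    products[product_id] = {
        "name": name,
        "category_id": category_id,
        "category_name": categories[category_id]["name"],  # denormalized copy
    }

def rename_category(category_id, new_name):
    # Application-enforced cascade: keep every denormalized copy in sync.
    categories[category_id]["name"] = new_name
    updated = 0
    for doc in products.values():
        if doc["category_id"] == category_id:
            doc["category_name"] = new_name
            updated += 1
    return updated

insert_product("p1", "Laptop", "cat-1")
insert_product("p2", "Phone", "cat-1")
try:
    insert_product("p3", "Ghost", "cat-9")
except DanglingReferenceError as exc:
    print("rejected:", exc)
print(rename_category("cat-1", "Consumer Electronics"))  # both products updated
```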

One of the advantages of a document database is that it is schema-less. You're completely free to define the contents of a document at all times; you're not tied to a predefined set of tables and columns as you are with a SQL database.
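A small sketch of what "schema-less" means for the data, assuming a list of dicts stands in for one collection (products, fields, and the `describe` helper are hypothetical). Documents in the same collection carry different fields, and readers handle missing fields themselves rather than relying on a fixed column list.

```python
# Documents in the same hypothetical "products" collection need not share fields.
products = [
    {"_id": 1, "name": "Laptop", "category": "Electronics", "ram_gb": 16},
    {"_id": 2, "name": "T-shirt", "category": "Clothing", "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "Gift card", "category": "Other"},  # no extra fields at all
]

def describe(doc):
    """Readers tolerate missing fields instead of relying on a fixed column list."""
    return (doc["name"], doc.get("ram_gb"), doc.get("sizes"))

for doc in products:
    print(describe(doc))
```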

Real-world example

If you're building a CMS on top of a SQL database, you'll either have a separate table for each CMS content type, or a single table with generic columns in which you store all types of content. With separate tables, you'll have a lot of tables. Just think of all the join tables you'll need for things like tags and comments for each content type. With a single generic table, your application is responsible for correctly managing all of the data. Also, the raw data in your database is hard to update and quite meaningless outside of your CMS application.
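A sketch of the "single generic table" option, assuming SQLite via Python's `sqlite3` and a hypothetical table with placeholder columns `col1` to `col3`. It shows why the raw rows mean little outside the application: only the application's own mapping (here the hypothetical `FIELD_MAP`) knows that for a video `col2` is a URL and `col3` a duration stored as text.

```python
import sqlite3

# One generic table for every content type (all names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE content (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    col1 TEXT,
    col2 TEXT,
    col3 TEXT
)""")
conn.executemany(
    "INSERT INTO content (type, col1, col2, col3) VALUES (?, ?, ?, ?)",
    [
        ("article", "Hello world", "Body text...", None),
        ("video", "Launch demo", "https://example.com/v.mp4", "312"),  # duration as TEXT
    ],
)

# Only the application knows what each generic column means for each type.
FIELD_MAP = {
    "article": ("title", "body", None),
    "video": ("title", "url", "duration_s"),
}

def load(row):
    """Turn a generic row back into a meaningful record using the app-side mapping."""
    _id, type_, *cols = row
    names = FIELD_MAP[type_]
    return {"id": _id, "type": type_, **{n: v for n, v in zip(names, cols) if n}}

items = [load(r) for r in conn.execute("SELECT * FROM content ORDER BY id")]
print(items)
```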

With a document database, you can store each type of CMS content in a single collection, while maintaining a strongly defined structure within each document. You could also store all tags and comments within the document, making data retrieval very efficient. This efficiency and flexibility come at a price: your application takes on more responsibility for managing the integrity of the data. On the other hand, scaling out is much cheaper with a document database than with a SQL database.
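A minimal sketch of the document-database side of the CMS example, assuming a dict keyed by `_id` stands in for one "content" collection; the content types, the `REQUIRED` field sets, and the helper functions are hypothetical. Tags and comments are embedded, so one lookup returns a whole page, while per-type structure and integrity checks live in application code.

```python
# Hypothetical "content" collection keyed by _id; each type keeps its own structure.
content = {
    "article-1": {
        "_id": "article-1", "type": "article",
        "title": "Hello world", "body": "Body text...",
        "tags": ["intro", "news"],
        "comments": [{"author": "alice", "text": "Nice post"}],
    },
    "video-1": {
        "_id": "video-1", "type": "video",
        "title": "Launch demo", "url": "https://example.com/v.mp4", "duration_s": 312,
        "tags": ["demo"],
        "comments": [],
    },
}

# The "strongly defined structure" is enforced by the application, per content type.
REQUIRED = {
    "article": {"title", "body", "tags", "comments"},
    "video": {"title", "url", "duration_s", "tags", "comments"},
}

def validate(doc):
    missing = REQUIRED[doc["type"]] - set(doc)
    if missing:
        raise ValueError(f"{doc['_id']} is missing {sorted(missing)}")

def get_page(content_id):
    """One key lookup returns the item with its tags and comments; no join tables."""
    return content[content_id]

def add_comment(content_id, author, text):
    # Integrity is the application's job, e.g. refusing comments on missing content.
    doc = content.get(content_id)
    if doc is None:
        raise KeyError(content_id)
    doc["comments"].append({"author": author, "text": text})

for doc in content.values():
    validate(doc)
print(get_page("article-1")["tags"])
```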

Advice

As you can see, both SQL and NoSQL solutions have advantages and disadvantages. As David already pointed out, each type has its uses. I recommend analyzing your requirements and creating two data models, one for a SQL solution and one for a document database. Then choose the solution that fits best, keeping scalability in mind.
