
How to stop thinking "relationally"

Posted 2024-07-25 15:54:07

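At work, we recently started a project using CouchDB (a document-oriented database). I've been having a hard time unlearning all of my relational database knowledge.

I was wondering how some of you overcame this obstacle. How did you stop thinking relationally and start thinking documentally (I apologise for making up that word)?

Any suggestions? Helpful hints?

Edit: If it makes any difference, we're using Ruby and CouchPotato to connect to the database.

Edit 2: SO has been pestering me to accept an answer. I chose the one that helped me learn the most, I think. However, I don't think there is a real "correct" answer.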


凌乱心跳 2024-08-01 15:54:07


I think, after reading a few pages on this subject, that it all depends on the type of data you are dealing with.

RDBMSes represent a top-down approach, where you, the database designer, assert the structure of all data that will exist in the database. You define that a Person has a First, Middle and Last Name, a Home Address, and so on, and you can enforce this using an RDBMS. If you don't have a column for a Person's HomePlanet, tough luck for the wanna-be Person whose HomePlanet is not Earth; you'll have to add a column at a later date or the data can't be stored in the RDBMS. Most programmers make assumptions like this in their apps anyway, so this isn't a dumb thing to assume and enforce. Defining things can be good. But if you need to log additional attributes in the future, you'll have to add them in. The relational model assumes that your data attributes won't change much.

"Cloud" type databases built around something like MapReduce, CouchDB in your case, do not make that assumption, and instead look at data from the bottom up. Data is input as documents, which can have any number of varying attributes. The model assumes that your data, by its very definition, is diverse in the types of attributes it can have. It says, "I just know that I have this document in database Person that has a HomePlanet attribute of "Eternium" and a FirstName of "Lord Nibbler" but no LastName." This model fits webpages: every webpage is a document, but the actual contents/tags/keys of the documents vary so widely that you can't fit them into the rigid structure that an RDBMS hands down from on high. This is why Google thinks the MapReduce model roxors soxors: Google's data set is so diverse that it has to allow for ambiguity from the get-go, and it is so massive that it has to be processed in parallel (which MapReduce makes trivial). The document-database model assumes that your data's attributes may or will change a lot, or are very diverse, which in a relational database would show up as "gaps" and lots of sparsely populated columns. While you could use an RDBMS to store data like this, it would get ugly really fast.

To answer your question, then: you can't think "relationally" at all when looking at a database that uses the MapReduce paradigm, because it doesn't actually have an enforced relation. It's a conceptual hump you'll just have to get over.
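To make the bottom-up view a little more concrete, here is a minimal Python sketch, not CouchDB code. It assumes documents are plain dictionaries and mimics a map/reduce view in memory; the names `people`, `map_home_planet`, `reduce_count` and `run_view`, and the second and third sample documents, are made up for illustration (CouchDB itself defines views with map and reduce functions, written in JavaScript by default).

```python
from collections import defaultdict

# Hypothetical "Person" documents: each one carries only the attributes it has.
people = [
    {"_id": "p1", "FirstName": "Lord Nibbler", "HomePlanet": "Eternium"},
    {"_id": "p2", "FirstName": "Ada", "LastName": "Lovelace"},
    {"_id": "p3", "FirstName": "Leela", "LastName": "Turanga", "HomePlanet": "Earth"},
]


def map_home_planet(doc):
    """Map step: emit (planet, 1) only for documents that have a HomePlanet."""
    if "HomePlanet" in doc:
        yield doc["HomePlanet"], 1


def reduce_count(values):
    """Reduce step: add up the values emitted for one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


planet_counts = run_view(people, map_home_planet, reduce_count)
print(planet_counts)
```

Nothing in the sketch declares which attributes a Person may have. The map function simply skips documents that lack the attribute it cares about, which is the part that tends to feel strange coming from a fixed table schema.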

A good article I ran into that compares and contrasts the two kinds of database pretty well is MapReduce: A Major Step Back, which argues that MapReduce-paradigm databases are a technological step backwards and are inferior to RDBMSes. I have to disagree with the author's thesis, and would submit that the database designer simply has to select the right one for his or her situation.


爱格式化 2024-08-01 15:54:07

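It's all about the data. If you have data that makes the most sense relationally, a document store may not be useful. A typical document-based system is a search server: you have a huge data set and want to find a specific item/document, and the document is static or versioned.

In an archive-type situation, the documents might literally be documents that don't change and have very flexible structures. It doesn't make sense to store their metadata in a relational database, since they are all very different and very few documents will share the same tags. Document-based systems don't store null values.

Non-relational/document-like data makes sense when it is denormalized, and when it doesn't change much or you don't care as much about consistency.

If your use case fits a relational model well, then it's probably not worth squeezing it into a document model.

Here is a good article about non-relational databases.

Another way of thinking about it is that a document is a row. Everything about a document is in that row and is specific to that document. Rows are easy to split on, so scaling is easier.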
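As a rough illustration of why self-contained documents are easy to split, here is a small Python sketch that assigns each document to a shard by hashing its id. It is only an assumption-laden toy: `shard_for`, `partition`, the `_id` values and the shard count are all hypothetical, and it does not describe how CouchDB or any other particular database distributes data.

```python
import hashlib


def shard_for(doc_id, shard_count):
    """Pick a shard from a stable hash of the document id."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(docs, shard_count):
    """Place each whole document on exactly one shard."""
    shards = [[] for _ in range(shard_count)]
    for doc in docs:
        shards[shard_for(doc["_id"], shard_count)].append(doc)
    return shards


# Hypothetical archive documents with different attributes each.
docs = [
    {"_id": "report-2009", "title": "Annual report", "pages": 40},
    {"_id": "memo-17", "author": "R. Smith"},
    {"_id": "scan-0042", "format": "tiff", "tags": ["invoice"]},
]

shards = partition(docs, shard_count=2)
for index, shard in enumerate(shards):
    print(index, [doc["_id"] for doc in shard])
```

Because everything about a document travels with it, reading one document back never requires joining across shards; queries that span many documents are a different matter.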


诺曦 2024-08-01 15:54:07


In CouchDB, like Lotus Notes, you really shouldn't think about a Document as being analogous to a row.

Instead, a Document is a relation (table).

Each document has a number of rows--the field values:

ValueID(PK)  Document ID(FK)   Field Name        Field Value
========================================================
92834756293  MyDocument        First Name        Richard
92834756294  MyDocument        States Lived In   TX
92834756295  MyDocument        States Lived In   KY

Each View is a cross-tab query that selects across a massive UNION ALL of every Document.

So, it's still relational, but not in the most intuitive sense, and not in the sense that matters most: good data management practices.
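The analogy above can be sketched in a few lines of Python. This is only an illustration of the mental model, not of how CouchDB stores or indexes anything: the helper names `to_rows` and `cross_tab` and the second document are assumptions, and the ValueID column is left out for brevity.

```python
# Two documents; "OtherDocument" is a hypothetical addition for the example.
documents = {
    "MyDocument": {"First Name": "Richard", "States Lived In": ["TX", "KY"]},
    "OtherDocument": {"First Name": "Ann"},
}


def to_rows(doc_id, doc):
    """Yield (doc_id, field_name, field_value); a list value becomes several rows."""
    for field, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            yield doc_id, field, item


# The "UNION ALL" over the field-value rows of every document.
all_rows = [row for doc_id, doc in documents.items() for row in to_rows(doc_id, doc)]


def cross_tab(rows, fields):
    """Pivot the rows back into one record per document, keeping only `fields`."""
    table = {}
    for doc_id, field, value in rows:
        if field in fields:
            table.setdefault(doc_id, {}).setdefault(field, []).append(value)
    return table


view = cross_tab(all_rows, {"States Lived In"})
print(all_rows)
print(view)
```

A document that lacks the selected field simply does not appear in the result, which mirrors how a missing attribute produces no row at all rather than a NULL.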


昇り龍 2024-08-01 15:54:07

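Document-oriented databases do not reject the concept of relations; they just sometimes leave it to the application to dereference the links (CouchDB), or even support relations between documents directly (MongoDB). What matters more is that DODBs are schema-less. In table-based storage this property can only be achieved with significant overhead (see richardtallent's answer), but here it is done far more efficiently. What we really should learn when switching from an RDBMS to a DODB is to forget about tables and start thinking about data. That is what sheepsimulator calls the "bottom-up" approach. It is an ever-evolving schema, not a predefined Procrustean bed. Of course, this does not mean that schemata should be abandoned altogether. Your application must interpret the data and somehow constrain its form (by organizing documents into collections, or by building models with validation methods), but this is now the application's job.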
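A model with validation methods might look roughly like the following Python sketch. It is not CouchPotato's API or any other library's; the `Person` class, its field names and its error messages are assumptions chosen to show the constraint living in application code instead of in the database.

```python
class Person:
    """Hypothetical application-side model wrapping a schema-less document."""

    # Fields the application insists on, with the type it expects.
    REQUIRED_FIELDS = {"first_name": str}
    # Fields the application understands but does not require.
    OPTIONAL_FIELDS = {"last_name": str, "home_planet": str}

    def __init__(self, doc):
        self.doc = dict(doc)

    def errors(self):
        """Return a list of problems; unknown extra fields are allowed."""
        problems = []
        for field, expected in self.REQUIRED_FIELDS.items():
            if field not in self.doc:
                problems.append(f"missing required field: {field}")
            elif not isinstance(self.doc[field], expected):
                problems.append(f"wrong type for field: {field}")
        for field, expected in self.OPTIONAL_FIELDS.items():
            if field in self.doc and not isinstance(self.doc[field], expected):
                problems.append(f"wrong type for field: {field}")
        return problems

    def is_valid(self):
        return not self.errors()


nibbler = Person({"first_name": "Lord Nibbler", "home_planet": "Eternium"})
nameless = Person({"last_name": "Unknown", "favourite_snack": "ham"})
print(nibbler.is_valid(), nameless.errors())
```

The database would accept either document; only the application decides that a Person without a first name is a problem, and it can relax or tighten that rule without a migration.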
