How to stop thinking "relationally"
At work, we recently started a project using CouchDB (a document-oriented database). I've been having a hard time un-learning all of my relational db knowledge.
I was wondering how some of you overcame this obstacle. How did you stop thinking relationally and start thinking documentally (I apologise for making up that word)?
Any suggestions? Helpful hints?
Edit: If it makes any difference, we're using Ruby & CouchPotato to connect to the database.
Edit 2: SO was hassling me to accept an answer. I chose the one that helped me learn the most, I think. However, there's no real "correct" answer, I suppose.
After perusing a couple of pages on this subject, I think it all depends on the types of data you are dealing with.
RDBMSes represent a top-down approach, where you, the database designer, assert the structure of all data that will exist in the database. You define that a Person has a First, Middle, and Last Name and a Home Address, etc., and you can enforce this using an RDBMS. If you don't have a column for a Person's HomePlanet, tough luck, wanna-be-Person who has a different HomePlanet than Earth: you'll have to add a column at a later date, or the data can't be stored in the RDBMS. Most programmers make assumptions like this in their apps anyway, so this isn't a dumb thing to assume and enforce. Defining things can be good. But if you need to log additional attributes in the future, you'll have to add them in. The relational model assumes that your data's attributes won't change much.
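Here is a minimal JavaScript sketch of what that top-down contract amounts to. It is not RDBMS code, and `PERSON_COLUMNS` and `toPersonRow` are hypothetical names made up for illustration. The column list is fixed up front, an attribute with no column is rejected, and a missing attribute becomes a null.

```javascript
// Hypothetical fixed schema for a Person "table": every row has exactly these columns.
const PERSON_COLUMNS = ["FirstName", "MiddleName", "LastName", "HomeAddress"];

// Convert a plain object into a row that fits the schema.
// Unknown attributes are rejected (you would have to add a column first),
// and missing attributes become null placeholders.
function toPersonRow(person) {
  for (const key of Object.keys(person)) {
    if (!PERSON_COLUMNS.includes(key)) {
      throw new Error(`No column for attribute "${key}"; add one to the schema first`);
    }
  }
  return PERSON_COLUMNS.map((column) => (column in person ? person[column] : null));
}

// Fits the schema: the missing columns are filled with null.
console.log(toPersonRow({ FirstName: "Philip", LastName: "Fry" }));
// -> [ 'Philip', null, 'Fry', null ]

// Does not fit: there is no HomePlanet column.
try {
  toPersonRow({ FirstName: "Lord Nibbler", HomePlanet: "Eternium" });
} catch (err) {
  console.log(err.message);
}
```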
"Cloud"-type databases using something like MapReduce (in your case, CouchDB) do not make the above assumption, and instead look at data from the bottom up. Data is input as documents, which can have any number of varying attributes. The model assumes that your data, by its very definition, is diverse in the types of attributes it could have. It says, "I just know that I have this document in database Person that has a HomePlanet attribute of 'Eternium' and a FirstName of 'Lord Nibbler' but no LastName." This model fits webpages: every webpage is a document, but the actual contents/tags/keys of the documents vary so widely that you can't fit them into the rigid structure that the DBMS pontificates from on high. This is why Google thinks the MapReduce model roxors soxors. Google's data set is so diverse that it needs to build in ambiguity from the get-go, and because the data sets are massive, it needs to be able to use parallel processing (which MapReduce makes trivial). The document-database model assumes that your data's attributes may/will change a lot or be very diverse, with the "gaps" and sparsely populated columns you would find if the same data were stored in a relational database. While you could use an RDBMS to store data like this, it would get ugly really fast.
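The bottom-up view can be sketched with a CouchDB-style map function. This is a hedged, standalone approximation. In CouchDB the view server supplies `emit` and sorts rows with its own collation rules. Here, the hypothetical `runView` helper passes `emit` in as an argument and sorts keys as plain strings. The documents reuse the Person example above and deliberately have different attributes.

```javascript
// Documents in the same "Person" database need not share attributes.
const people = [
  { _id: "fry", FirstName: "Philip", LastName: "Fry", HomePlanet: "Earth" },
  { _id: "nibbler", FirstName: "Lord Nibbler", HomePlanet: "Eternium" }, // no LastName
  { _id: "bender", FirstName: "Bender" }, // no HomePlanet either
];

// A map function in the style CouchDB views use: look at one document and emit
// zero or more key/value pairs. In CouchDB, emit() is provided by the view server;
// this sketch passes it in explicitly so it runs standalone.
function byHomePlanet(doc, emit) {
  if (doc.HomePlanet) {
    emit(doc.HomePlanet, doc.FirstName);
  }
}

// Tiny stand-in for the view engine: run the map over every document,
// then sort the emitted rows by key (plain string order, not CouchDB collation).
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Rows for fry (Earth) and nibbler (Eternium); bender has no HomePlanet and emits nothing.
console.log(runView(people, byHomePlanet));
```

Nothing had to be declared up front. A document without the attribute is simply skipped by the view.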
To answer your question, then: you can't think "relationally" at all when looking at a database that uses the MapReduce paradigm, because it doesn't actually enforce relations. It's a conceptual hump you'll just have to get over.
A good article I ran into that compares and contrasts the two kinds of databases pretty well is "MapReduce: A major step backwards". It argues that MapReduce-paradigm databases are a technological step backwards and are inferior to RDBMSes. I have to disagree with the author's thesis, and would submit that the database designer simply has to select the right one for his/her situation.
It's all about the data. If you have data that makes most sense relationally, a document store may not be useful. A typical document-based system is a search server: you have a huge data set and want to find a specific item/document, and the document is static or versioned.
In an archive-type situation, the documents might literally be documents that don't change and have very flexible structures. It doesn't make sense to store their metadata in a relational database, since they are all very different and very few documents may share the same tags. Document-based systems don't store null values.
Non-relational/document-like data makes sense when denormalized, which works when the data doesn't change much or when you don't care as much about consistency.
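Here is a small illustration of that trade-off, using hypothetical blog-post documents. The data and the `byline`/`renameAuthor` helpers are made up. Each post embeds a copy of its author's name instead of referencing an authors table. Reads therefore need no join, but a rename has to rewrite every copy.

```javascript
// Hypothetical denormalized documents: each post carries a copy of the author's name.
const posts = [
  { _id: "post-1", title: "Relax", author: { id: "alice", name: "Alice" } },
  { _id: "post-2", title: "Views", author: { id: "alice", name: "Alice" } },
  { _id: "post-3", title: "Replication", author: { id: "bob", name: "Bob" } },
];

// Reading is cheap: everything needed to render a post is in one document, no join.
function byline(post) {
  return `${post.title} by ${post.author.name}`;
}

// Writing is the cost: a rename must touch every embedded copy. Until all copies
// are rewritten, readers can see old and new names mixed, which is fine only if
// the data rarely changes or you don't care as much about consistency.
function renameAuthor(docs, authorId, newName) {
  let touched = 0;
  for (const doc of docs) {
    if (doc.author.id === authorId) {
      doc.author.name = newName;
      touched += 1;
    }
  }
  return touched;
}

console.log(byline(posts[0]));                             // Relax by Alice
console.log(renameAuthor(posts, "alice", "Alice Smith"));  // 2
```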
If your use case fits a relational model well then it's probably not worth squeezing it into a document model.
Here's a good article about non-relational databases.
Another way of thinking about it: a document is a row. Everything about a document is in that row, and it is specific to that document. Rows are easy to split on, so scaling is easier.
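One way to see why self-contained documents split easily is to assign each document to a shard by hashing its `_id`. This is a generic hash-partitioning sketch, not a description of how CouchDB distributes data. The FNV-1a hash and the `shardFor`/`partition` helpers are choices made for illustration.

```javascript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash >>> 0;
}

// Because a document never has to be split up, its ID alone decides where it lives.
function shardFor(docId, shardCount) {
  return fnv1a(docId) % shardCount;
}

// Distribute documents across shardCount buckets.
function partition(docs, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const doc of docs) {
    shards[shardFor(doc._id, shardCount)].push(doc);
  }
  return shards;
}

const docs = ["a", "b", "c", "d", "e", "f"].map((id) => ({ _id: `person-${id}` }));
const shards = partition(docs, 3);
shards.forEach((shard, i) => console.log(i, shard.map((d) => d._id)));
```

A plain modulo means most documents move when the shard count changes. Real systems often use consistent hashing or a fixed number of partitions for that reason.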
In CouchDB, like Lotus Notes, you really shouldn't think about a Document as being analogous to a row.
Instead, a Document is a relation (table).
Each document has a number of rows: its field values.
Each View is a cross-tab query that selects across a massive UNION ALL of every Document.
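The documents-as-relations reading can be made concrete with a hypothetical sketch. Each document is flattened into (docId, field, value) rows, and all of those rows are concatenated (the UNION ALL). A "view" then cross-tabulates the chosen fields back into one row per document. The helper names are made up, and this models the idea rather than CouchDB's implementation.

```javascript
// Hypothetical documents; each one is treated as its own small (field, value) relation.
const documents = [
  { _id: "doc1", FirstName: "Lord Nibbler", HomePlanet: "Eternium" },
  { _id: "doc2", FirstName: "Leela" },
];

// One document -> the rows of the relation it represents.
function documentRows(doc) {
  return Object.entries(doc)
    .filter(([field]) => field !== "_id")
    .map(([field, value]) => ({ docId: doc._id, field, value }));
}

// Conceptually, a view reads UNION ALL of every document's rows...
function unionAll(docs) {
  return docs.flatMap(documentRows);
}

// ...and cross-tabs them: the requested fields become columns, one output row per
// document, with null where a document simply has no such field.
function crossTab(rows, columns) {
  const byDoc = new Map();
  for (const { docId, field, value } of rows) {
    if (!byDoc.has(docId)) {
      byDoc.set(docId, Object.fromEntries(columns.map((c) => [c, null])));
    }
    if (columns.includes(field)) byDoc.get(docId)[field] = value;
  }
  return [...byDoc].map(([docId, cols]) => ({ docId, ...cols }));
}

// doc1 has both columns filled; doc2 has HomePlanet: null.
console.log(crossTab(unionAll(documents), ["FirstName", "HomePlanet"]));
```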
So, it's still relational, but not in the most intuitive sense, and not in the sense that matters most: good data management practices.
Document-oriented databases do not reject the concept of relations. They just sometimes let applications dereference the links (CouchDB), or even have direct support for relations between documents (MongoDB). What's more important is that DODBs are schema-less. In table-based storage this property can only be achieved with significant overhead (see the answer by richardtallent), but here it's done more efficiently. What we really should learn when switching from an RDBMS to a DODB is to forget about tables and start thinking about data. That's what sheepsimulator calls the "bottom-up" approach. It's an ever-evolving schema, not a predefined Procrustean bed. Of course, this does not mean that schemata should be completely abandoned in any form. Your application must interpret the data and somehow constrain its form (for example, by organizing documents into collections, or by making models with validation methods), but this is now the application's job.
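Here is a sketch of moving the schema into the application. It is written in the shape CouchDB accepts for a design document's `validate_doc_update` function: it receives `(newDoc, oldDoc, userCtx, secObj)` (the last argument in recent versions), and throwing `{forbidden: ...}` rejects the write. The `type: "person"` convention, the required `FirstName`, and the `tryWrite` harness are assumptions for illustration. The same rule could equally live in a Ruby model's validations.

```javascript
// Application-level schema rule in the shape of a CouchDB validate_doc_update
// function. The "type"/"person" convention and required FirstName are hypothetical.
function validatePerson(newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted) return;           // always allow deletions
  if (newDoc.type !== "person") return;  // only constrain person documents
  if (typeof newDoc.FirstName !== "string" || newDoc.FirstName === "") {
    throw { forbidden: "A person needs a FirstName." };
  }
  // Everything else (LastName, HomePlanet, ...) stays optional and unconstrained.
}

// Local harness outside CouchDB: returns the rejection message, or null if accepted.
function tryWrite(doc) {
  try {
    validatePerson(doc, null, {}, {});
    return null;
  } catch (err) {
    return err.forbidden;
  }
}

console.log(tryWrite({ type: "person", FirstName: "Lord Nibbler", HomePlanet: "Eternium" })); // null
console.log(tryWrite({ type: "person", LastName: "Fry" })); // A person needs a FirstName.
```

Only the one rule the application cares about is enforced, and new attributes can appear without any migration.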
Maybe you should read this:
http://books.couchdb.org/relax/getting-started
I've only just heard of it myself. It's interesting, but I have no idea how to implement it in a real-world application ;)
One thing you can try is getting a copy of Firefox and Firebug and playing with the map and reduce functions in JavaScript. They're actually quite cool and fun, and they appear to be the basis of how to get things done in CouchDB.
Here's Joel's little article on the subject: http://www.joelonsoftware.com/items/2006/08/01.html
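Here is a plain-array example of the kind you could paste into a console (the `orders` data is made up). `map` transforms each item independently and `reduce` folds the results. The grouped version is only roughly analogous to a CouchDB view whose map emits `(customer, total)` and whose reduce sums values, queried with `group=true`.

```javascript
// Made-up data to play with.
const orders = [
  { customer: "alice", total: 12 },
  { customer: "bob", total: 5 },
  { customer: "alice", total: 30 },
];

// map: transform every item on its own, so the work could be spread out in parallel.
const totals = orders.map((order) => order.total); // [12, 5, 30]

// reduce: fold the mapped values into a single result.
const grandTotal = totals.reduce((sum, t) => sum + t, 0); // 47

// Grouped by key: map each order to a [key, value] pair (like emit(key, value)),
// then reduce the pairs into per-key sums.
const perCustomer = orders
  .map((order) => [order.customer, order.total])
  .reduce((acc, [key, value]) => {
    acc[key] = (acc[key] || 0) + value;
    return acc;
  }, {}); // { alice: 42, bob: 5 }

console.log(grandTotal, perCustomer);
```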