Pros and cons of document-based databases vs. relational databases
I've been trying to see if I can accomplish some requirements with a document based database, in this case CouchDB. Two generic requirements:
- CRUD of entities with some fields that have a unique index on them
- ecommerce web app like eBay (better description here).
And I'm beginning to think that a document-based database isn't the best choice to address these requirements. Furthermore, I can't imagine a use for a document-based database (maybe my imagination is too limited).
Can you explain to me whether I am asking for pears from an elm when I try to use a document-oriented database for these requirements?
Comments (6)
You need to think of how you approach the application in a document oriented way. If you simply try to replicate how you would model the problem in an RDBMS then you will fail. There are also different trade-offs that you might want to make. ([ed: not sure how this ties into the argument but:] Remember that CouchDB's design assumes you will have an active cluster of many nodes that could fail at any time. How is your app going to handle one of the database nodes disappearing from under it?)
One way to think about it is to imagine you didn't have any computers, just paper documents. How would you create an efficient business process using bits of paper being passed around? How can you avoid bottlenecks? What if something goes wrong?
Another angle you should think about is eventual consistency, where you will get into a consistent state eventually, but you may be inconsistent for some period of time. This is anathema in RDBMS land, but extremely common in the real world. The canonical transaction example is transferring money between bank accounts. How does this actually happen in the real world: through a single atomic transaction, or through different banks issuing credit and debit notices to each other? What happens when you write a cheque?
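To make the bank example concrete, here is a minimal sketch in plain Python (no database involved). The `Notice` and `Ledger` names, the notice ids, and the amounts are all made up for illustration. Each bank only records the notices it has received, so the two sides disagree for a while and then settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notice:
    notice_id: str  # hypothetical unique id carried by each notice
    account: str
    amount: int     # cents; positive = credit, negative = debit


class Ledger:
    """One bank's view of the world: the notices it has received so far."""

    def __init__(self):
        self.notices = {}

    def receive(self, notice):
        # Keyed by notice id, so a duplicate delivery changes nothing.
        self.notices[notice.notice_id] = notice

    def balance(self, account):
        return sum(n.amount for n in self.notices.values() if n.account == account)


bank_a, bank_b = Ledger(), Ledger()
debit = Notice("t1-debit", "alice", -500)
credit = Notice("t1-credit", "bob", 500)

bank_a.receive(debit)  # Alice's bank records the debit first
in_flight = bank_a.balance("alice") + bank_b.balance("bob")  # money is "missing" for now

bank_b.receive(credit)  # the credit notice arrives later
bank_b.receive(credit)  # a repeated delivery is harmless
settled = bank_a.balance("alice") + bank_b.balance("bob")
```

Between the two deliveries the totals do not add up; once every notice has arrived they do. That window is the "inconsistent for some period of time" part.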
So let's look at your examples:
If I understand this correctly in CouchDB terms, you want to have a collection of documents where some named value is guaranteed to be unique across all those documents? That case isn't generally supportable because documents may be created on different replicas.
So we need to look at the real-world problem and see if we can model that. Do you really need them to be unique? Can your application handle multiple docs with the same value? Do you need to assign a unique identifier? Can you do that deterministically? A common scenario where this is required is where you need a unique sequential identifier. This is tough to solve in a replicated environment. In fact, if the unique id must be strictly sequential with respect to creation time, it's impossible if you also need the id straight away. You need to relax at least one of those constraints.
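One way to read "can you do that deterministically?" is to derive the document id from the value that must be unique. The sketch below is an assumption-laden illustration, not CouchDB code: `doc_id_for`, `FakeStore` and `ConflictError` are hypothetical names, and the in-memory store only imitates a single database rejecting a second document with an existing id. Across replicas the same clash would surface later as a conflict to resolve rather than an immediate rejection.

```python
import hashlib


def doc_id_for(kind, unique_value):
    """Derive a document id from the field that must be unique (hypothetical helper)."""
    normalized = unique_value.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{kind}:{digest[:32]}"


class ConflictError(Exception):
    """Raised by the stand-in store when an id is already taken."""


class FakeStore:
    """In-memory stand-in for one database node; not a real client library."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, body):
        if doc_id in self.docs:
            raise ConflictError(doc_id)
        self.docs[doc_id] = body


store = FakeStore()
store.create(doc_id_for("user", "Alice@Example.com"), {"name": "Alice"})

try:
    # Same e-mail after normalisation, so it maps to the same id.
    store.create(doc_id_for("user", " alice@example.com "), {"name": "Impostor"})
    duplicate_rejected = False
except ConflictError:
    duplicate_rejected = True
```

This gives uniqueness for one field per document type at most, and it does nothing for the strictly sequential case described above.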
I'm not sure what to add here as the last comment you made on that post was to say "very useful! thanks". Was there something missing from the approach outlined there that is still causing you a problem? I thought MrKurt's answer was pretty full and I added a little enhancement that would reduce contention.
Is there a need to normalize the data?
I am in the same boat. I am loving CouchDB at the moment, and I think the whole functional style is great. But when exactly do we start to use these databases in earnest for applications? I mean, yes, we can all start to develop applications extremely quickly and cruft-free, with all those nasty hang-ups about normal forms left by the wayside and no schemas. But, to borrow a phrase, "we are standing on the shoulders of giants". There are good reasons to use an RDBMS, to normalise, and to use schemas. My old Oracle head is reeling at the thought of data without form.
My main wow factor on couchdb is the replication stuff and the versioning system working in tandem.
I have been racking my brain for the last month trying to grok the storage mechanisms of CouchDB. Apparently it uses B-trees but doesn't store data based on normal form. Does this mean that it is really, really smart and realises that bits of data are repeated, so it just makes a pointer to the existing B-tree entry?
So far I am thinking of xml documents, config files, resource files streamed to base64 strings.
But would I use CouchDB for structured data? I don't know; any help on this is greatly appreciated.
Might be useful in storing RDF data or even free form text.
A possibility is to have a main relational database that stores definitions of items that can be retrieved by their IDs, and a document database for the descriptions and/or specifications of those items. For example, you could have a relational database with a Products table with the following fields:
And that Specifications field would actually contain a reference to a document with the technical specifications of the product. This way, you have the best of both worlds.
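A rough sketch of that split, using `sqlite3` from the Python standard library for the relational side and a plain dict standing in for the document database. The column names other than `specifications`, the sample product, and the `spec-1001` key are all invented for illustration, since the field list is not shown above.

```python
import json
import sqlite3

# Stand-in for a document database: key -> JSON document (hypothetical data).
document_store = {
    "spec-1001": json.dumps({"weight_kg": 1.2, "ports": ["usb-c", "hdmi"]}),
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL UNIQUE,   -- uniqueness enforced relationally
        price_cents    INTEGER NOT NULL,
        specifications TEXT NOT NULL           -- reference to a document, not the document
    )
    """
)
conn.execute(
    "INSERT INTO products (id, name, price_cents, specifications) VALUES (?, ?, ?, ?)",
    (1, "Laptop", 99900, "spec-1001"),
)


def load_product(product_id):
    """Fetch the relational row, then follow its reference into the document store."""
    row = conn.execute(
        "SELECT name, price_cents, specifications FROM products WHERE id = ?",
        (product_id,),
    ).fetchone()
    if row is None:
        return None
    name, price_cents, spec_ref = row
    return {
        "name": name,
        "price_cents": price_cents,
        "specifications": json.loads(document_store[spec_ref]),
    }


product = load_product(1)
```

The indexed, constrained fields stay in the table, while the free-form specification can vary from product to product without schema changes.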
Document-based DBs are best suited for storing, well, documents. Lotus Notes is a common implementation and Notes email is an example. For what you are describing (eCommerce, CRUD, etc.), relational DBs are better designed for storage and retrieval of data items/elements that are indexed (as opposed to documents).
Re CRUD: the whole REST paradigm maps directly to CRUD (or vice versa). So if you know that you can model your requirements with resources (identifiable via URIs) and a basic set of operations (namely CRUD), you may be very near to a REST-based system, which quite a few document-oriented systems provide out of the box.
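As a small illustration of that mapping, here is an in-memory sketch in Python. The `handle` function, the `resources` dict and the example URI are hypothetical, and the verb table is just one common convention: real systems differ, and CouchDB for instance also creates a document with PUT when you supply the id.

```python
# One common (not universal) mapping from HTTP methods to CRUD operations.
CRUD_FOR_METHOD = {
    "POST": "create",
    "GET": "read",
    "PUT": "update",
    "DELETE": "delete",
}

resources = {}  # URI -> document body


def handle(method, uri, body=None):
    """Apply one CRUD operation to the resource at `uri`; returns (status, body)."""
    operation = CRUD_FOR_METHOD[method]
    if operation == "create":
        if uri in resources:
            return 409, None  # already exists
        resources[uri] = body
        return 201, body
    if uri not in resources:
        return 404, None
    if operation == "read":
        return 200, resources[uri]
    if operation == "update":
        resources[uri] = body
        return 200, body
    del resources[uri]  # the only remaining operation is delete
    return 204, None


created = handle("POST", "/items/42", {"title": "Lamp"})
fetched = handle("GET", "/items/42")
updated = handle("PUT", "/items/42", {"title": "Desk lamp"})
deleted = handle("DELETE", "/items/42")
missing = handle("GET", "/items/42")
```

If your requirements fit this shape (resources addressed by URI plus these four operations), most of the API is already defined by the store.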