Does it make sense to use NoSQL for a non-distributed system? (Trying to understand eventual consistency)
I have been reading and learning about NoSQL and MongoDB, CouchDB, etc, for the last two days, but I still can't tell if this is the right kind of storage for me.
What worries me is the eventual consistency thing. Does that type of consistency only kick in when using clusters? (I'm hosting my sites on a single dedicated server, so I don't know if I can benefit from NoSQL.) For which kinds of applications is it OK to have eventual consistency (instead of ACID), and for which ones isn't it? Can you give me some examples? What's the worst thing that can happen in an application for which eventual consistency is OK?
Another thing I read is that MongoDB keeps a lot of things in memory. The docs say something about 32-bit systems having a 2 GB limit on data. Is that because of the RAM limitation of 32-bit systems?
I can speak only for CouchDB, but there is no need to choose between eventual consistency and ACID; they are not in the same category.
CouchDB is fully ACID. A document update is atomic, consistent, isolated and durable (using CouchDB's recommended production setting of delayed_commits=false, your update is flushed to disk before the 201 success code is returned). What CouchDB does not provide is multi-item transactions (since these are very hard to scale when the items are stored on separate servers). The confusion between 'transaction' and 'ACID' is regrettable but excusable, given that typical RDBMSs usually support both.
Eventual consistency is about how database replicas converge on the same data set. Consider a master-slave setup in a traditional RDBMS. Some configurations of that relationship will use a distributed transaction mechanism, such that both master and slave are always in lock-step. However, it is common to relax this for performance reasons. The master can make transactions locally and then forward them lazily to the slave via a transaction journal. This is also 'eventual consistency', the two servers will converge on the same data set when the journal is fully drained. CouchDB goes further and removes the distinction between master and slaves. That is, CouchDB servers can be treated as equal peers, with changes made at any host being correctly replicated to the others.
The trick to eventual consistency is in how updates to the same item at different hosts are handled. In CouchDB, these separate updates are detected as 'conflicts' on the same item, and replication ensures that all of the conflicting updates are present at all hosts. CouchDB then chooses one of these to present as the current revision. This choice can be revised by deleting the conflicts one doesn't want to keep.
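That conflict-handling behaviour can be sketched in a few lines of Python. This is a toy model, not CouchDB's actual revision algorithm (the real winner is picked from the `_rev` history with a deterministic tie-break), but it shows the two key properties: replication preserves every conflicting update on every host, and each peer independently picks the same current revision without coordinating.

```python
import hashlib

class Peer:
    """A toy CouchDB-like peer: keeps every revision of a document and
    deterministically picks one 'current' revision among conflicts."""

    def __init__(self):
        self.revisions = {}  # doc_id -> set of (rev_id, body) tuples

    def update(self, doc_id, body):
        # Content-derived revision id, a stand-in for CouchDB's _rev hash.
        rev_id = hashlib.sha1(body.encode()).hexdigest()[:8]
        self.revisions.setdefault(doc_id, set()).add((rev_id, body))

    def replicate_from(self, other):
        # Replication merges revision sets, so conflicting updates made
        # at different peers all end up present on every host.
        for doc_id, revs in other.revisions.items():
            self.revisions.setdefault(doc_id, set()).update(revs)

    def current(self, doc_id):
        # Deterministic winner: the highest rev id. Every peer makes the
        # same choice, so all peers converge without talking to each other.
        return max(self.revisions[doc_id])[1]

a, b = Peer(), Peer()
a.update("doc1", "written at A")
b.update("doc1", "written at B")   # concurrent update at another peer: a conflict

a.replicate_from(b)
b.replicate_from(a)

assert len(a.revisions["doc1"]) == 2           # both conflicting revisions survive
assert a.current("doc1") == b.current("doc1")  # both peers agree on the winner
```

Deleting the losing revision from the set (on any peer, then replicating) is the analogue of resolving the conflict by hand.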
NoSQL databases solve a set of problems that are hard(er) to solve with a traditional RDBMS. NoSQL can be the right storage for you if any of your problems are in that set.

Eventual consistency "kicks in" when you might read back a different/previous version of the data from the one that was just persisted. For example:
You persist the same piece of data into MORE THAN ONE location, let's say A and B. Depending on the configuration, a persist operation may return after persisting only to A (and not to B just yet). Right after that you read that data from B, where it is not there yet. Eventually it will be there, but unfortunately not when you read it back.
NOT OK => You have a family bank account with $100 available. Now you and your spouse try to buy something at the same time (at different stores) for $100. If the bank had implemented this with an "eventual consistency" model, over more than one node for example, your spouse could have spent $100 a couple of milliseconds after you had already spent all of it. Not exactly a good day for the bank.

OK => You have 10000 followers on Twitter. You tweeted "Hey who wants to do some hacking tonight?". 100% consistency would mean that ALL those 10000 would receive your invitation at the same time. But nothing bad would really happen if John saw your tweet 2 seconds after Mary did.

As for the worst case: a huge latency between, e.g., the moment node A gets the data and the moment node B gets the same data (when they are back in sync). If the NoSQL solution is at all solid, that is about the worst that can happen.
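The bank scenario can be simulated without any real database. The sketch below is a made-up toy model (the names `LazyReplicaSet`, `drain_journal`, etc. are illustrative, not any driver's API): writes are acknowledged once node A has them and reach node B only when a journal is drained, exactly the lazy master-to-slave forwarding described above.

```python
class Node:
    """One storage node with its own local copy of the data."""
    def __init__(self):
        self.data = {}

class LazyReplicaSet:
    """Toy model of eventual consistency: a write is acknowledged once
    node A has it; node B only catches up when the journal is drained."""

    def __init__(self):
        self.a = Node()
        self.b = Node()
        self.journal = []  # changes not yet forwarded to B

    def write(self, key, value):
        self.a.data[key] = value            # acknowledged after hitting A only
        self.journal.append((key, value))   # B will get it "eventually"

    def read_from_b(self, key):
        return self.b.data.get(key)

    def drain_journal(self):
        # Replication catches up; both nodes converge on the same data set.
        for key, value in self.journal:
            self.b.data[key] = value
        self.journal.clear()

rs = LazyReplicaSet()
rs.write("balance", 100)
rs.drain_journal()                    # both nodes agree: $100 available

rs.write("balance", 0)                # you spend the $100 via node A
spouse_sees = rs.read_from_b("balance")
assert spouse_sees == 100             # node B is stale: the second purchase
                                      # would be approved too - an overdraft

rs.drain_journal()
assert rs.read_from_b("balance") == 0  # eventually, the nodes converge
```

The Twitter case is the same mechanism with a harmless payload: between `write` and `drain_journal`, John (reading B) simply hasn't seen the tweet Mary (reading A) already has.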
From the MongoDB docs:
"MongoDB is a server process that runs on Linux, Windows and OS X. It can be run both as a 32 or 64-bit application. We recommend running in 64-bit mode, since Mongo is limited to a total data size of about 2GB for all databases in 32-bit mode."
Brewer's CAP theorem is the best source for understanding which options are available to you. I can say that it all depends, but if we are talking about Mongo, then it provides horizontal scalability out of the box, which is always nice in some situations.
Now about consistency. Actually you have three options for keeping your data up to date:
1) First thing to consider is "safe" mode or "getLastError()", as indicated by Andreas. If you issue a "safe" write, you know that the database has received the insert and applied the write. However, MongoDB only flushes to disk every 60 seconds, so the server can fail before the data is on disk.
2) Second thing to consider is "journaling" (v1.8+). With journaling turned on, data is flushed to the journal every 100 ms, so you have a smaller window of time before failure. The drivers have an "fsync" option (check the exact name) that goes one step further than "safe": it waits for acknowledgement that the data has been flushed to disk (i.e. the journal file). However, this only covers one server. What happens if the hard drive on the server just dies? Well, you need a second copy.
3) Third thing to consider is replication. The drivers support a "W" parameter that says "replicate this data to N nodes" before returning. If the write does not reach "N" nodes before a certain timeout, the write fails (an exception is thrown). However, you have to configure "W" correctly based on the number of nodes in your replica set. Again, because a hard drive could fail, even with journaling, you'll want to look at replication. Then there's replication across data centers, which is too long to get into here. The last thing to consider is your requirement to "roll back". From my understanding, MongoDB does not have this "roll back" capability. If you're doing a batch insert, the best you'll get is an indication of which elements failed.
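The semantics of that "W" parameter can be sketched without a real cluster. The `ReplicaSetClient` below is a made-up toy, not a real driver API (in current MongoDB drivers this is expressed as a write concern, e.g. `w` and `wtimeout` options); it only models the rule "the write succeeds iff at least W nodes acknowledge it before the timeout":

```python
class WriteTimeout(Exception):
    """Raised when fewer than W replicas acknowledge within the timeout."""

class ReplicaSetClient:
    """Toy model of the "W" parameter. replica_lag_ms lists each node's
    (simulated) time, in milliseconds, to acknowledge a write."""

    def __init__(self, replica_lag_ms):
        self.replica_lag_ms = replica_lag_ms

    def write(self, w, timeout_ms):
        if w > len(self.replica_lag_ms):
            # A common misconfiguration: W exceeds the replica set size,
            # so no write could ever succeed.
            raise ValueError("W is larger than the replica set")
        # The W fastest acknowledgements decide the outcome.
        acks = sorted(self.replica_lag_ms)[:w]
        in_time = [t for t in acks if t <= timeout_ms]
        if len(in_time) < w:
            raise WriteTimeout(
                f"only {len(in_time)} ack(s) within {timeout_ms}ms, needed {w}")
        return {"ok": True, "acks": w}

client = ReplicaSetClient(replica_lag_ms=[1, 5, 250])  # one slow secondary

assert client.write(w=2, timeout_ms=100)["ok"]  # two fast nodes: succeeds
try:
    client.write(w=3, timeout_ms=100)           # needs the slow node: fails
except WriteTimeout as exc:
    print("write failed:", exc)
```

This also illustrates why "W" must match your replica set: with three nodes, `w=2` survives one slow or dead member, while `w=3` turns any single laggard into a failed write.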
Anyhow, there are a lot of scenarios where data consistency becomes the developer's responsibility: it is up to you to be careful, cover all the scenarios, and adjust the DB schema, because there is no "this is the right way to do it" in Mongo like we are used to with RDBMSs.
About memory: this is purely a performance question. MongoDB keeps indexes and the "working set" in RAM, so by limiting your RAM you limit your working set. You can actually have an SSD and a smaller amount of RAM rather than a huge amount of RAM and an HDD; at least those are the official recommendations. Anyhow, this question is individual, and you should run performance tests for your specific use cases.