What does MongoDB not being ACID compliant before v4 really mean?
I am not a database expert and have no formal computer science background, so bear with me. I want to know what kinds of real-world problems can occur if you use an old MongoDB version prior to v4, which was not ACID compliant. The same question applies to any database that is not ACID compliant.
I understand that MongoDB can perform Atomic Operations, but that they don't "support traditional locking and complex transactions", mostly for performance reasons. I also understand the importance of database transactions, and the example of when your database is for a bank, and you're updating several records that all need to be in sync, you want the transaction to revert back to the initial state if there's a power outage so credit equals purchase, etc.
But when I get into conversations about MongoDB, those of us that don't know the technical details of how databases are actually implemented start throwing around statements like:
MongoDB is way faster than MySQL and Postgres, but there's a tiny chance, like 1 in a million, that it "won't save correctly".
That "won't save correctly" part is referring to this understanding: If there's a power outage right at the instant you're writing to MongoDB, there's a chance for a particular record (say you're tracking pageviews in documents with 10 attributes each), that one of the documents only saved 5 of the attributes… which means over time your pageview counters are going to be "slightly" off. You'll never know by how much, you know they'll be 99.999% correct, but not 100%. This is because, unless you specifically made this a mongodb atomic operation, the operation is not guaranteed to have been atomic.
So my question is, what is the correct interpretation of when and why MongoDB may not "save correctly"? What parts of ACID does it not satisfy, and under what circumstances, and how do you know when that 0.001% of your data is off? Can't this be fixed somehow? If not, this seems to mean that you shouldn't store things like your users table in MongoDB, because a record might not save. But then again, that 1/1,000,000 user might just need to "try signing up again", no?
I am just looking for a list of when and why negative things happen with an ACID noncompliant database like MongoDB, and ideally whether there's a standard workaround (like running a background job to clean up data, or only using SQL for this, etc.).
Comments (10)
It's actually not correct that MongoDB is not ACID-compliant. On the contrary, MongoDB is ACID-compliant at the document level.
Any update to a single document is atomic, consistent, isolated, and durable (given an appropriate write concern).
What MongoDB doesn't have is transactions -- that is, multiple-document updates that can be rolled back and are ACID-compliant.
Note that you can build transactions on top of the ACID-compliant updates to a single document, by using two-phase commit.
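The two-phase commit idea can be sketched with a toy in-memory model. This is a minimal illustration, not MongoDB code: the dictionaries `accounts` and `transactions` and every function name below are hypothetical stand-ins, and each helper touches exactly one "document", which is the only kind of change that would be atomic on its own.

```python
# Toy stand-ins for two collections. Each helper below changes exactly one
# "document", mirroring what a single atomic update could do.
accounts = {
    "A": {"balance": 1000, "pending": []},
    "B": {"balance": 1000, "pending": []},
}
transactions = {}


def apply_to_account(account_id, txn_id, delta):
    """Apply delta once; skipped if this transaction is already recorded."""
    doc = accounts[account_id]
    if txn_id not in doc["pending"]:
        doc["balance"] += delta
        doc["pending"].append(txn_id)


def clear_pending(account_id, txn_id):
    """Remove the transaction marker from one account document."""
    doc = accounts[account_id]
    if txn_id in doc["pending"]:
        doc["pending"].remove(txn_id)


def resume(txn_id):
    """Drive a transaction forward from whatever state it was left in."""
    txn = transactions[txn_id]
    if txn["state"] == "initial":
        txn["state"] = "pending"
    if txn["state"] == "pending":
        apply_to_account(txn["source"], txn_id, -txn["amount"])
        apply_to_account(txn["destination"], txn_id, txn["amount"])
        txn["state"] = "applied"
    if txn["state"] == "applied":
        clear_pending(txn["source"], txn_id)
        clear_pending(txn["destination"], txn_id)
        txn["state"] = "done"


def transfer(txn_id, source, destination, amount):
    """Record the intent first, then run the steps."""
    transactions[txn_id] = {
        "source": source,
        "destination": destination,
        "amount": amount,
        "state": "initial",
    }
    resume(txn_id)
```

Because every step is guarded by the transaction id, a process that crashes halfway can call `resume` again without debiting an account twice. Note that this gives recoverability, not isolation: a reader can still observe the intermediate state in which only one account has been updated.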
One thing you lose with MongoDB is multi-collection (table) transactions. Atomic modifiers in MongoDB can only work against a single document.
If you need to remove an item from inventory and add it to someone's order at the same time - you can't. Unless those two things - inventory and orders - exist in the same document (which they probably do not).
I encountered this very same issue in an application I am working on and had two possible solutions to choose from:
1) Structure your documents as best you can and use atomic modifiers wherever possible, and for the remaining bit, use a background process to clean up records that may be out of sync. For example, I remove items from inventory and add them to a reservedInventory array of the same document using atomic modifiers.
This lets me always know that items are NOT available in the inventory (because they are reserved by a customer). When the customer checks out, I then remove the items from the reservedInventory. It's not a standard transaction, and since the customer could abandon the cart, I need some background process to go through and find abandoned carts and move the reserved inventory back into the available inventory pool.
This is obviously less than ideal, but it's the only part of a large application where MongoDB does not fit the need perfectly. Plus, it has worked flawlessly thus far. This may not be possible for many scenarios, but because of the document structure I am using, it fits well.
2) Use a transactional database in conjunction with MongoDB. It is common to use MySQL to provide transactions for the things that absolutely need them while letting MongoDB (or any other NoSQL) do what it does best.
If my solution from #1 does not work in the long run, I will investigate further into combining MongoDB with MySQL but for now #1 suits my needs well.
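The reservation approach from option 1 can be sketched as follows. This is a toy in-memory model, not the author's actual code: the field names `available` and `reserved`, the function names, and the timestamp handling are all assumptions made for illustration. In MongoDB, each function would correspond to one update on a single product document (for example, combining `$inc` and `$push` with a filter on the available quantity), which is what makes each step atomic.

```python
def reserve(product, cart_id, qty, now):
    """Move qty from the available pool into the reserved list of the same
    document. Returns False if there is not enough stock."""
    if product["available"] < qty:
        return False
    product["available"] -= qty
    product["reserved"].append({"cart": cart_id, "qty": qty, "at": now})
    return True


def checkout(product, cart_id):
    """The customer paid: drop the reservation, the stock is gone for good."""
    product["reserved"] = [r for r in product["reserved"] if r["cart"] != cart_id]


def release_abandoned(product, now, max_age):
    """Background job: return reservations older than max_age to the pool.
    Returns the quantity that was released."""
    kept, released = [], 0
    for reservation in product["reserved"]:
        if now - reservation["at"] > max_age:
            released += reservation["qty"]
        else:
            kept.append(reservation)
    product["reserved"] = kept
    product["available"] += released
    return released
```

A background process would call something like `release_abandoned` periodically for each product. The design choice is that inventory is never oversold, at the cost of stock being temporarily unavailable while an abandoned cart waits to expire.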
A good explanation is contained in "Starbucks Does Not Use Two Phase Commit".
It's not about NoSQL databases, but it does illustrate the point that sometimes you can afford to lose a transaction or have your database in an inconsistent state temporarily.
I wouldn't consider it to be something that needs to be "fixed". The fix is to use an ACID-compliant relational database. You choose a NoSQL alternative when its behavior meets your application requirements.
As of MongoDB v4.0, multi-document ACID transactions are supported. Through snapshot isolation, transactions provide a globally consistent view of data and enforce all-or-nothing execution to maintain data integrity.
They feel like transactions from the relational world, e.g.:
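A minimal sketch of what such a transaction might look like with PyMongo, assuming a `MongoClient` connected to a replica set running MongoDB 4.0 or later. The function name `place_order`, the database name `shop`, and the collection and field names are hypothetical and chosen only for illustration.

```python
def place_order(client, sku, qty, customer_id):
    """Decrement stock and insert an order in one multi-document transaction.

    `client` is expected to be a pymongo.MongoClient (an assumption of this
    sketch); the "shop" database and its collections are hypothetical.
    """
    db = client["shop"]
    with client.start_session() as session:
        # start_transaction() commits when the block exits normally and
        # aborts if an exception propagates out of it.
        with session.start_transaction():
            result = db["inventory"].update_one(
                {"sku": sku, "qty": {"$gte": qty}},
                {"$inc": {"qty": -qty}},
                session=session,
            )
            if result.modified_count != 1:
                # Raising here aborts the transaction; nothing is written.
                raise ValueError("insufficient stock")
            db["orders"].insert_one(
                {"sku": sku, "qty": qty, "customer": customer_id},
                session=session,
            )
```

Every operation that should be part of the transaction has to be passed the same `session`; an operation called without it runs outside the transaction.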
See https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb
I think other people gave good answers already.
However, I would like to add that there are ACID NoSQL databases (like http://ravendb.net/ ). So the decision is not simply NoSQL without ACID versus relational with ACID.
"Won't save correctly" could mean:
1) By default MongoDB does not save your changes to the drive immediately. So there is a possibility that you tell a user "update is successful", a power outage happens, and the update is lost. MongoDB provides options to control the level of update "durability". It can wait for the other replica(s) to receive the update (in memory), wait for the write to reach the local journal file, etc.
2) There are no easy "atomic" updates to multiple collections, or even to multiple documents in the same collection. It's not a problem in most cases because it can be circumvented with Two Phase Commit, or by restructuring your schema so updates are made to a single document. See this question: Document Databases: Redundant data, references, etc. (MongoDB specifically)
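The durability point can be illustrated with a toy model. This is a sketch of the general idea only, not of how MongoDB is implemented: the class `ToyStore` and its methods are hypothetical. The `journaled` flag plays the role that the journal option of a write concern plays in MongoDB (in PyMongo, `WriteConcern(j=True)`), where the client is acknowledged only after the write has reached the on-disk journal.

```python
class ToyStore:
    """Toy model: writes land in memory first and reach the journal only
    when flushed. Only the journal survives a crash."""

    def __init__(self):
        self.memory = {}
        self.journal = {}  # stands in for data that is safely on disk

    def write(self, key, value, journaled=False):
        self.memory[key] = value
        if journaled:
            # Wait for the journal before acknowledging the write.
            self.flush()
        return "ok"  # acknowledgement returned to the client

    def flush(self):
        """Copy everything currently in memory to the journal."""
        self.journal.update(self.memory)

    def crash_and_restart(self):
        """Power loss: memory is rebuilt from the journal alone."""
        self.memory = dict(self.journal)
```

In this model both kinds of write return "ok", but only the journaled one is guaranteed to still be there after `crash_and_restart`. That is the "told the user it succeeded, then lost it" case described in point 1, and waiting for the journal is the price paid to avoid it.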
Please read about the ACID properties to gain better understanding.
Also in the MongoDB documentation you can find a question and answer.
Atomic on document level only. It does not comply with the definition of atomic that we know from relational database systems, in particular the link above. In this sense MongoDB does not comply with the A from ACID.
Consistent by default. However, you can read from secondary servers in a replica set. You can only have eventual consistency in this case. This is useful if you don't mind reading slightly outdated data.
Isolation (again according to above definition):
Durability - you can configure this behaviour with the write concern option, not sure though. Maybe someone knows better.
I believe some research is ongoing to move NoSQL towards ACID constraints or similar. This is a challenge because NoSQL databases are usually fast(er) and ACID constraints can slow down performance significantly.
The only reason atomic modifiers work against a single collection is that the MongoDB developers recently exchanged a database lock for a collection-wide write lock, deciding that the increased concurrency was worth the trade-off. At its core, MongoDB is a memory-mapped file: they've delegated the buffer-pool management to the machine's VM subsystem. Because the data is always in memory, they're able to get away with very coarse-grained locks: you'll be performing in-memory-only operations while holding the lock, which will be extremely fast. This differs significantly from a traditional database system, which is sometimes forced to perform I/O while holding a page lock or a row lock.
"In MongoDB, an operation on a single document is atomic" - that limitation is a thing of the past.
In the new version of MongoDB 4.0 you CAN :
There are, however, a few limitations on how and which operations can be performed.
Check the Mongo Doc.
https://docs.mongodb.com/master/core/transactions/
You can implement atomic multi-key updates (serializable transactions) on the client side if your storage supports per-key linearizability and compare-and-set (which is true for MongoDB). This approach is used in Google's Percolator and in CockroachDB, but nothing prevents you from using it with MongoDB.
I've created a step-by-step visualization of such transactions. I hope it will help you to understand them.
If you're fine with the read committed isolation level, then it makes sense to take a look at RAMP transactions by Peter Bailis. They can also be implemented for MongoDB on the client side.
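The compare-and-set primitive that the client-side approach above relies on can be sketched with a toy in-memory store. This shows only the single-key building block, not the full multi-key protocol used by Percolator or CockroachDB. The `store` dictionary, the `version` field, and both function names are assumptions for illustration; with MongoDB, one common way to get the same effect is to include the expected version in the filter of a single-document update.

```python
# Toy key-value store; each value carries a version used for compare-and-set.
store = {"counter": {"value": 0, "version": 0}}


def compare_and_set(key, expected_version, new_value):
    """Write new_value only if nobody changed the key since we read it."""
    doc = store[key]
    if doc["version"] != expected_version:
        return False  # someone else got there first; caller must retry
    doc["value"] = new_value
    doc["version"] += 1
    return True


def update_with_retry(key, fn, max_attempts=5):
    """Optimistic read-modify-write loop built on compare_and_set."""
    for _ in range(max_attempts):
        doc = store[key]
        seen_value, seen_version = doc["value"], doc["version"]
        if compare_and_set(key, seen_version, fn(seen_value)):
            return True
    return False
```

A writer that read a stale version simply fails and retries, so no update is silently overwritten. Multi-key transactions layer additional bookkeeping (such as a transaction record and per-key locks or intents) on top of this primitive.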