How to work around the lack of transactions in MongoDB?
I know there are similar questions here, but they either tell me to switch back to a regular RDBMS if I need transactions, or to use atomic operations or two-phase commit. The second solution seems the best choice. The third I don't wish to follow because it seems that many things could go wrong and I can't test it in every aspect. I'm having a hard time refactoring my project to perform atomic operations. I don't know whether this comes from my limited viewpoint (I have only worked with SQL databases so far), or whether it actually can't be done.
We would like to pilot test MongoDB at our company. We have chosen a relatively simple project: an SMS gateway. It allows our software to send SMS messages to the cellular network, and the gateway does the dirty work of actually communicating with the providers via different communication protocols. The gateway also manages the billing of the messages. Every customer who applies for the service has to buy some credits. The system automatically decreases the user's balance when a message is sent and denies access if the balance is insufficient. Also, because we are customers of third-party SMS providers, we may have our own balances with them. We have to keep track of those as well.
I started thinking about how I can store the required data with MongoDB if I cut down some complexity (external billing, queued SMS sending). Coming from the SQL world, I would create a separate table for users, another one for SMS messages, and one for storing the transactions regarding the users' balance. Let's say I create separate collections for all of those in MongoDB.
Imagine an SMS sending task with the following steps in this simplified system:
check if the user has sufficient balance; deny access if there's not enough credit
send and store the message in the SMS collection with the details and cost (in the live system the message would have a `status` attribute, and a task would pick it up for delivery and set the price of the SMS according to its current state)
decrease the user's balance by the cost of the sent message
log the transaction in the transaction collection
Now what's the problem with that? MongoDB can do atomic updates only on one document. In the previous flow it could happen that some kind of error creeps in and the message gets stored in the database but the user's balance is not updated and/or the transaction is not logged.
I came up with two ideas:
Create a single collection for the users, and store the balance as a field, with user-related transactions and messages as subdocuments in the user's document. Because we can update documents atomically, this actually solves the transaction problem. Disadvantages: if the user sends many SMS messages, the size of the document could become large and the 4MB document limit could be reached. Maybe I can create history documents in such scenarios, but I don't think this would be a good idea. Also, I don't know how fast the system would be if I push more and more data to the same big document.
Create one collection for users, and one for transactions. There can be two kinds of transactions: credit purchases with a positive balance change and sent messages with a negative balance change. A transaction may have a subdocument; for example, for sent messages the details of the SMS can be embedded in the transaction. Disadvantages: I don't store the current user balance, so I have to calculate it every time a user tries to send a message to tell whether the message can go through. I'm afraid this calculation can become slow as the number of stored transactions grows.
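The two ideas above can be sketched as plain filter, update, and pipeline documents. This is only a minimal sketch: the field names (`balance`, `transactions`, `user_id`, `amount`) and both helper functions are hypothetical, and the `$gte` guard on the balance is an addition that folds the "sufficient balance" check into the same single-document update.

```python
from datetime import datetime, timezone


def build_send_update(user_id, sms, cost):
    """Idea 1: one atomic update on the user's own document.

    Returns a (filter, update) pair. The filter only matches when the
    balance covers the cost, so the check and the deduction cannot be
    separated by another writer.
    """
    entry = {"sms": sms, "cost": cost, "at": datetime.now(timezone.utc)}
    flt = {"_id": user_id, "balance": {"$gte": cost}}
    update = {
        "$inc": {"balance": -cost},          # deduct the cost
        "$push": {"transactions": entry},    # embed the log entry
    }
    return flt, update


def balance_pipeline(user_id):
    """Idea 2: aggregation pipeline that sums signed transaction amounts."""
    return [
        {"$match": {"user_id": user_id}},
        {"$group": {"_id": "$user_id", "balance": {"$sum": "$amount"}}},
    ]
```

With pymongo, the first pair would be passed to something like `users.update_one(flt, update)` and the second to `transactions.aggregate(balance_pipeline(user_id))`. The first keeps everything in one document, at the price of unbounded growth; the second keeps documents small, at the price of a sum over the user's whole history on every send.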
I'm a little bit confused about which method to pick. Are there other solutions? I couldn't find any best practices online about how to work around these kinds of problems. I guess many programmers who are trying to become familiar with the NoSQL world are facing similar problems in the beginning.
As of 4.0, MongoDB will have multi-document ACID transactions. The plan is to enable those in replica set deployments first, followed by sharded clusters. Transactions in MongoDB will feel just like the transactions developers are familiar with from relational databases: they'll be multi-statement, with similar semantics and syntax (like `start_transaction` and `commit_transaction`). Importantly, the changes to MongoDB that enable transactions do not impact performance for workloads that do not require them. For more details see here.
Having distributed transactions doesn't mean that you should model your data as you would in a tabular relational database. Embrace the power of the document model and follow the recommended data modeling practices.
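For the SMS flow in the question, a multi-document transaction might look roughly like this with pymongo's session API. This is a sketch under assumptions: the database name `smsgateway`, the collection names, and the field names are all hypothetical, and it presumes a deployment that supports transactions (a replica set, per the answer above).

```python
def send_sms_in_transaction(client, user_id, text, cost):
    """Deduct credit, store the message, and log the charge as one unit."""
    db = client["smsgateway"]  # hypothetical database name
    with client.start_session() as session:
        # The transaction commits when this block exits normally and
        # aborts if an exception escapes it.
        with session.start_transaction():
            result = db.users.update_one(
                {"_id": user_id, "balance": {"$gte": cost}},
                {"$inc": {"balance": -cost}},
                session=session,
            )
            if result.matched_count != 1:
                raise ValueError("insufficient balance or unknown user")
            db.messages.insert_one(
                {"user_id": user_id, "text": text, "cost": cost,
                 "status": "queued"},
                session=session,
            )
            db.transactions.insert_one(
                {"user_id": user_id, "amount": -cost},
                session=session,
            )
```

Every operation has to receive the same `session`, otherwise it runs outside the transaction. Actually delivering the SMS is a side effect that no database transaction can roll back, so it is left out of the block here.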
Check this out, by Tokutek. They develop a plugin for Mongo that promises not only transactions but also a boost in performance.
To get to the point: if transactional integrity is a must, then don't use MongoDB; use only components that support transactions. It is extremely hard to build something on top of a non-ACID-compliant component in order to give it ACID-like functionality. Depending on the individual use cases, it may make sense to separate actions into transactional and non-transactional ones in some way...
This is not really a problem. The error you mentioned is either a logic error (a bug) or an I/O error (network or disk failure). Errors of that kind can leave both transactionless and transactional stores in an inconsistent state. For example, if the SMS has already been sent but an error occurs while storing the message, the send cannot be rolled back, which means it won't be logged, the user's balance won't be reduced, and so on.
The real problem here is that a user can exploit a race condition and send more messages than their balance allows. This also applies to an RDBMS, unless you send the SMS inside a transaction that locks the balance field (which would be a major bottleneck). A possible solution for MongoDB is to use `findAndModify` first to reduce the balance and check it: if the result is negative, disallow sending and refund the amount (an atomic increment). Otherwise, continue sending, and if sending fails, refund the amount. A balance history collection can also be maintained to help fix or verify the balance field.
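A minimal sketch of that reserve-then-refund flow, written against a pymongo-style collection. It is a slight variation on the description above: instead of decrementing and then checking for a negative result, the filter only matches when the balance covers the cost, so the balance never goes negative. The function names, the `balance` field, and the `deliver` callback are hypothetical.

```python
def reserve_credit(users, user_id, cost):
    """Atomically deduct `cost`, but only if the balance covers it.

    find_one_and_update (pymongo's findAndModify) returns the matched
    document, or None when nothing matched the filter.
    """
    previous = users.find_one_and_update(
        {"_id": user_id, "balance": {"$gte": cost}},
        {"$inc": {"balance": -cost}},
    )
    return previous is not None


def refund_credit(users, user_id, cost):
    """Give the reserved credit back with an atomic increment."""
    users.update_one({"_id": user_id}, {"$inc": {"balance": cost}})


def send_sms(users, user_id, cost, deliver):
    """Reserve credit, call `deliver`, and refund if delivery raises."""
    if not reserve_credit(users, user_id, cost):
        return False  # insufficient balance (or unknown user)
    try:
        deliver()
    except Exception:
        refund_credit(users, user_id, cost)
        raise
    return True
```

Two concurrent sends for the same user cannot both pass the check, because the check and the deduction are one operation on one document. A crash between the reservation and the refund can still leave the balance too low, which is where the balance history collection mentioned above helps.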
The project is simple, but you have to support transactions for payment, which makes the whole thing difficult. So, for example, a complex portal system with hundreds of collections (forum, chat, ads, etc.) is in some respects simpler, because if you lose a forum or chat entry, nobody really cares. If, on the other hand, you lose a payment transaction, that's a serious issue.
So, if you really want a pilot project for MongoDB, choose one which is simple in that respect.
Transactions are absent in MongoDB for valid reasons. This is one of the things that make MongoDB faster.
In your case, if transactions are a must, Mongo does not seem to be a good fit.
Maybe RDBMS + MongoDB, but that will add complexity and make the application harder to manage and support.
This is probably the best blog post I have found on implementing transaction-like features for MongoDB:
Syncing Flag: best for just copying data over from a master document
Job Queue: very general purpose, solves 95% of cases. Most systems need to have at least one job queue around anyway!
Two Phase Commit: this technique ensures that each entity always has all the information needed to get to a consistent state
Log Reconciliation: the most robust technique, ideal for financial systems
Versioning: provides isolation and supports complex structures
Read this for more info: https://dzone.com/articles/how-implement-robust-and
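As an illustration of the log reconciliation idea from the list above, here is a minimal sketch of one reading of it (not taken from the linked article): treat the transaction log as the source of truth, recompute each balance from it, and report stored balances that disagree. The `balance`, `user_id` and `amount` fields are hypothetical.

```python
def reconcile(users, transactions):
    """Return {user_id: (stored, expected)} for every balance that
    disagrees with the sum of that user's logged transactions."""
    expected = {}
    for tx in transactions:
        uid = tx["user_id"]
        expected[uid] = expected.get(uid, 0) + tx["amount"]

    mismatches = {}
    for user in users:
        want = expected.get(user["_id"], 0)
        if user["balance"] != want:
            mismatches[user["_id"]] = (user["balance"], want)
    return mismatches
```

Both arguments are plain iterables of dictionaries, so cursors from calls such as `db.users.find()` and `db.transactions.find()` could be passed in directly. A periodic job of this kind does not prevent inconsistencies; it detects them after the fact so they can be repaired.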
This is late, but I think it will help in the future. I use Redis to build a queue to solve this problem.
Requirement:
The image below shows two actions that need to execute concurrently, but phase 2 and phase 3 of action 1 need to finish before phase 2 of action 2 starts, or vice versa (a phase can be a REST API request, a database request, or executing JavaScript code...).
How a queue helps you
A queue makes sure that the blocks of code between `lock()` and `release()`, across many functions, will not run at the same time, which isolates them from each other.
How to build a queue
I will only focus on how to avoid the race condition when building a queue on the backend. If you don't know the basic idea of a queue, come here.
The code below only shows the concept; you need to implement it correctly.
But `isRunning()`, `setStateToRelease()` and `setStateToRunning()` need to be isolated themselves, or else you face the race condition again. To do this I chose Redis, for its ACID properties and scalability.
The Redis documentation says this about its transactions:
P.S.:
I use Redis because my service already uses it; you can use any other mechanism that supports isolation.
The `action_domain` in my code above is for when you only need action 1 called by user A to block action 2 of user A, without blocking other users. The idea is to use a unique lock key for each user.
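The per-user locking idea can be illustrated in a single process with Python's standard library. This is only a sketch of the concept, not the Redis-based queue the answer describes: `locked` and `charge` are hypothetical names, and across several processes or servers a shared store such as Redis would have to take the place of the in-memory dictionary of locks.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

_registry_guard = threading.Lock()      # protects the dict of locks itself
_locks = defaultdict(threading.Lock)    # one lock per action_domain


@contextmanager
def locked(action_domain):
    """Run the enclosed block exclusively for one action_domain."""
    with _registry_guard:
        lock = _locks[action_domain]
    with lock:
        yield


def charge(balances, user_id, cost):
    """Check and deduct under the user's own lock, so two actions for
    the same user are serialized while other users are not blocked."""
    with locked(f"user:{user_id}"):
        if balances[user_id] < cost:
            return False
        balances[user_id] -= cost
        return True
```

Keying the lock by user rather than using one global lock is what keeps unrelated users from waiting on each other.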
Transactions are available now in MongoDB 4.0. Sample here