Is moving documents between collections a good way to represent state changes in MongoDB?
I have two collections, one (A) containing items to be processed (relatively small) and one (B) with those already processed (fairly large, with extra result fields).
Items are read from A, get processed and save()'d to B, then remove()'d from A.
The rationale is that the two collections can have different indexes, and that the "incoming" collection can be kept very small and fast this way.
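For reference, here is a hypothetical reconstruction of that flow in Python with pymongo; the question does not show its actual code. `coll_a`, `coll_b` and `process` are assumed names, and `replace_one(..., upsert=True)` stands in for the legacy `save()`, which PyMongo 4 removed. The comments mark where the failures listed below can occur.

```python
def move_item(coll_a, coll_b, process):
    """Read one item from A, process it, save it to B, then remove it from A.

    coll_a / coll_b: assumed to be pymongo Collection objects.
    process: hypothetical callable that does the real work and returns a
    dict of result fields to store alongside the item in B.
    """
    item = coll_a.find_one()
    if item is None:
        return None  # nothing waiting in A
    results = process(item)  # side effects happen here, outside the database
    done = {**item, **results}
    # save() equivalent: insert or overwrite the processed copy in B.
    # If this write fails without raising (e.g. unacknowledged writes),
    # the delete below still runs and the item is lost.
    coll_b.replace_one({"_id": item["_id"]}, done, upsert=True)
    # If the process dies here, or the delete times out, the item is now
    # in both A and B and will be picked up and processed again.
    coll_a.delete_one({"_id": item["_id"]})
    return done
```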
I've run into two issues with this:
- if either remove() or save() times out or otherwise fails under load, I lose the item completely, or process it twice
- if both fail, the side effects happen but there is no record of that
I can sidestep the double-failure case with findAndModify locks (not otherwise needed, since we have a process-level lock), but then we have stale-lock issues, and partial failures can still happen. As far as I can tell, there's no way to atomically remove + save across different collections (maybe by design?)
Is there a Best Practice for this situation?
I've not tried this myself yet, but the new book 50 Tips and Tricks for MongoDB Developers mentions several times using cron jobs (or services/schedulers) to clean up data like this. You could leave the documents in Collection A flagged for deletion and run a daily job to clear them out, reducing the overall scope of the original transaction.
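A minimal sketch of that idea in Python with pymongo, assuming the worker sets a hypothetical `delete_me` flag on the A document (instead of removing it) once its write to B has succeeded. The function names are also hypothetical. The clean-up job deletes a flagged document only after confirming its copy exists in B, so a failed write to B never loses the item.

```python
def mark_done(coll_a, item_id):
    # Replaces remove(): a cheap single-document update the worker makes
    # after the write to B has succeeded.
    coll_a.update_one({"_id": item_id}, {"$set": {"delete_me": True}})


def purge_flagged(coll_a, coll_b):
    """Periodic clean-up job (run from cron or a scheduler).

    Deletes documents from A that are flagged for deletion and whose
    processed copy is confirmed to exist in B. Returns how many were removed.
    """
    removed = 0
    for doc in coll_a.find({"delete_me": True}, {"_id": 1}):
        if coll_b.find_one({"_id": doc["_id"]}, {"_id": 1}) is None:
            continue  # no copy in B yet: keep it in A for investigation
        result = coll_a.delete_one({"_id": doc["_id"], "delete_me": True})
        removed += result.deleted_count
    return removed
```

Workers picking up new items would then query A with `{"delete_me": {"$ne": True}}` so flagged documents are not processed again.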
From what I've learned so far, I'd never leave the database in a state where I rely on the next database action succeeding, unless it is the last action (journaling will replay the last db action upon recovery). For example, I have a three-phase account registration process where I create a user in CollectionA and then add another related document to CollectionB. When I create the user, I embed the details of the CollectionB document in CollectionA in case the second write fails. Later I will write a process that removes the embedded data from CollectionA if the document in CollectionB exists.
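The embed-then-reconcile idea from that registration example, sketched in Python with pymongo. `collection_a` and `collection_b` follow the names in the text; the `pending_b` field, the function names, and the choice of the user's `_id` as the CollectionB document's `_id` are hypothetical. Recreating a missing CollectionB document from the embedded copy goes slightly beyond what the text describes (it only removes the embedded data), but it is what the embedded copy makes possible.

```python
def create_user(collection_a, collection_b, user_id, b_doc):
    # Write 1 carries an embedded copy of the CollectionB document, so the
    # information survives even if write 2 never happens.
    collection_a.insert_one({"_id": user_id, "pending_b": b_doc})
    # Write 2: if this fails, reconcile() can repair it later.
    collection_b.replace_one({"_id": user_id}, {**b_doc, "_id": user_id}, upsert=True)


def reconcile(collection_a, collection_b):
    """Later pass: drop embedded copies whose CollectionB document exists,
    first recreating any CollectionB document the second write never made.
    Returns how many CollectionB documents had to be recreated."""
    recreated = 0
    for user in collection_a.find({"pending_b": {"$exists": True}}):
        uid = user["_id"]
        if collection_b.find_one({"_id": uid}) is None:
            doc = {**user["pending_b"], "_id": uid}
            collection_b.replace_one({"_id": uid}, doc, upsert=True)
            recreated += 1
        # The CollectionB document now exists, so the embedded copy can go.
        collection_a.update_one({"_id": uid}, {"$unset": {"pending_b": ""}})
    return recreated
```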
Not having transactions does cause pain points like this, but I think in some cases there are new ways of thinking about it. In my case, time will tell as I progress with my app.
Yes, this is by design. At the time of this answer, MongoDB explicitly did not provide joins or transactions (multi-document transactions were added later, in MongoDB 4.0). Remove + save is a form of transaction.
You really have two low-complexity options here; both involve findAndModify.
Option #1: a single collection
Based on your description, you are basically building a queue with some extra features. If you use a single collection, you can use findAndModify to update the status of each item as it is being processed. Unfortunately, that means you will lose this: ...that the "incoming" collection can be kept very small and fast this way.
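A minimal sketch of Option #1 in Python with pymongo, where `find_one_and_update` is the driver's findAndModify. The `state`, `worker` and `claimed_at` fields and the function names are hypothetical. `release_stale` shows one way to handle the stale locks mentioned in the question: a claim older than a timeout is returned to `pending`.

```python
import datetime


def claim_next(items, worker_id, now=None):
    """Atomically flip one 'pending' item to 'processing' (findAndModify).
    Returns the item as it was before the update, or None if none is pending."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return items.find_one_and_update(
        {"state": "pending"},
        {"$set": {"state": "processing", "worker": worker_id, "claimed_at": now}},
    )


def complete(items, item_id, worker_id, results):
    """Mark the item done, but only if this worker still holds the claim."""
    res = items.update_one(
        {"_id": item_id, "state": "processing", "worker": worker_id},
        {"$set": {"state": "done", "results": results}},
    )
    return res.modified_count == 1


def release_stale(items, max_age, now=None):
    """Put items stuck in 'processing' for longer than max_age (a timedelta)
    back to 'pending'. Returns how many claims were released."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    res = items.update_many(
        {"state": "processing", "claimed_at": {"$lt": now - max_age}},
        {"$set": {"state": "pending"}, "$unset": {"worker": "", "claimed_at": ""}},
    )
    return res.modified_count
```

Because `complete()` checks the worker, a worker whose claim was released as stale cannot overwrite the result written by the worker that took over.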
Option #2: two collections
The other option is basically a two-phase commit, leveraging findAndModify. Take a look at the MongoDB documentation on two-phase commits for the details.
Once an item is processed in A, you set a field to flag it for deletion. You then copy that item over to B. Once it is copied to B, you can remove the item from A.
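A minimal sketch of those three phases in Python with pymongo, assuming a hypothetical `moving` flag as the "flag it for deletion" field and storing the results in A during phase 1 so that an interrupted move can be finished from A alone. Phase 1 uses `find_one_and_update` (findAndModify); the copy to B is an upsert keyed on `_id`, so repeating it is harmless. All function and field names are illustrative.

```python
def _copy_then_remove(coll_a, coll_b, flagged):
    """Phases 2 and 3. Safe to repeat: the upsert is keyed on _id, so a
    second copy overwrites the first instead of duplicating it."""
    copy_doc = {k: v for k, v in flagged.items() if k != "moving"}
    coll_b.replace_one({"_id": copy_doc["_id"]}, copy_doc, upsert=True)
    coll_a.delete_one({"_id": copy_doc["_id"], "moving": True})


def move_to_b(coll_a, coll_b, item_id, results):
    """Phase 1: flag the item and record its results in A in one atomic step."""
    before = coll_a.find_one_and_update(
        {"_id": item_id, "moving": {"$ne": True}},
        {"$set": {"moving": True, "results": results}},
    )
    if before is None:
        return False  # no such item, or a move is already in flight
    _copy_then_remove(coll_a, coll_b, {**before, "moving": True, "results": results})
    return True


def recover_moves(coll_a, coll_b):
    """Run at start-up or on a schedule: finish moves interrupted after phase 1.
    Returns how many moves were completed."""
    finished = 0
    for flagged in coll_a.find({"moving": True}):
        _copy_then_remove(coll_a, coll_b, flagged)
        finished += 1
    return finished
```

Workers should skip documents in A whose `moving` flag is set. This makes the bookkeeping recoverable, but it cannot undo side effects: if the process dies after doing the work but before phase 1, the item is still processed twice.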