Can you share your thoughts on how you would implement data versioning in MongoDB? (I've asked a similar question regarding Cassandra. If you have any thoughts on which database is better suited for this, please share.)
Suppose that I need to version records in a simple address book. (Address book records are stored as flat JSON objects.) I expect that the history:
- will be used infrequently
- will be used all at once to present it in a "time machine" fashion
- will not hold more than a few hundred versions for a single record.
The history won't expire.
I'm considering the following approaches:
- Create a new object collection to store the history of records, or the changes to the records. It would store one object per version, with a reference to the address book entry. Such records would look as follows:
{
'_id': 'new id',
'user': user_id,
'timestamp': timestamp,
'address_book_id': 'id of the address book record',
'old_record': {'first_name': 'Jon', 'last_name':'Doe' ...}
}
This approach can be modified to store an array of versions per document, but that seems to be a slower approach without any advantages.
- Store versions as serialized (JSON) objects attached to the address book entries. I'm not sure how to attach such objects to MongoDB documents. Perhaps as an array of strings.
(Modelled after Simple Document Versioning with CouchDB)
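For the first approach, here is a minimal Python sketch of one history record per version, plus the "time machine" read that pulls every version of one entry in order. A plain list stands in for the MongoDB history collection, and the helper names `make_history_record` and `history_for` are hypothetical, so treat it as an illustration of the record shape rather than a finished design.

```python
import copy
from datetime import datetime, timezone


def make_history_record(user_id, address_book_id, old_record, timestamp=None):
    """Build one history document holding a full snapshot of a record."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    return {
        # No '_id' here: MongoDB would assign one on insert.
        "user": user_id,
        "timestamp": timestamp,
        "address_book_id": address_book_id,
        # Deep copy so later edits to the live record don't alter history.
        "old_record": copy.deepcopy(old_record),
    }


def history_for(history, address_book_id):
    """Return all stored versions of one entry, oldest first."""
    matches = [h for h in history if h["address_book_id"] == address_book_id]
    return sorted(matches, key=lambda h: h["timestamp"])


# A list stands in for the history collection in this sketch.
history = []
entry = {"_id": "ab1", "first_name": "Jon", "last_name": "Doe"}

history.append(make_history_record("u1", entry["_id"], entry, timestamp=1))
entry["last_name"] = "Dough"
history.append(make_history_record("u1", entry["_id"], entry, timestamp=2))

versions = history_for(history, "ab1")
```

Since the history is read all at once and only per entry, an index on `address_book_id` (possibly combined with `timestamp`) would probably be enough to serve this read in a real collection.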
The first big question when diving into this is "how do you want to store changesets?"
My personal approach would be to store diffs. Because the display of these diffs is really a special action, I would put the diffs in a different "history" collection.
I would use a separate collection to save memory space. You generally don't want the full history for a simple query. So by keeping the history out of the object, you also keep it out of the commonly accessed memory when that data is queried.
To make my life easy, I would make a history document contain a dictionary of time-stamped diffs. Something like this:
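A minimal Python sketch of such a history document, assuming a `changes` field keyed by timestamp and diffs that store the new values of changed fields. The field name, the sample values, and the helpers `flat_diff` and `record_change` are hypothetical and only illustrate the idea.

```python
def flat_diff(old, new):
    """Return the fields of a flat record that were added, changed or removed.

    Removed fields are recorded with the value None (an assumption of this sketch).
    """
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = {k: None for k in old if k not in new}
    return {**changed, **removed}


def record_change(history_doc, timestamp, old, new):
    """Add one time-stamped diff to the history document, if anything changed."""
    diff = flat_diff(old, new)
    if diff:
        # Keys are strings because field names in a MongoDB document are strings.
        history_doc["changes"][str(timestamp)] = diff
    return history_doc


# One history document per address book record, stored in its own collection.
history_doc = {"_id": "id of address book record", "changes": {}}

v1 = {"city": "Omaha", "state": "Nebraska"}
v2 = {"city": "Kansas City", "state": "Missouri"}
v3 = {"city": "Kansas City", "state": "Kansas"}

record_change(history_doc, 1234567, v1, v2)
record_change(history_doc, 1234568, v2, v3)
```

Storing only the changed fields keeps each entry small; the trade-off is that rebuilding an old version means replaying the diffs in timestamp order.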
To make my life really easy, I would make this part of the DataObjects (EntityWrapper, whatever) that I use to access my data. Generally these objects have some form of history, so you can easily override the save() method to make this change at the same time.
UPDATE: 2015-10
It looks like there is now a spec for handling JSON diffs. This seems like a more robust way to store the diffs / changes.
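The update does not name the spec. Assuming it refers to JSON Patch (RFC 6902), a diff between two flat records could be stored as a list of `add`, `remove` and `replace` operations. The sketch below only handles flat objects, and the function names are hypothetical; a maintained JSON Patch library would be the safer choice for anything nested.

```python
def escape_pointer(key):
    """Escape a key for use in a JSON Pointer path (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")


def flat_json_patch(old, new):
    """Build JSON Patch operations that turn flat dict `old` into `new`."""
    ops = []
    for key in old:
        if key not in new:
            ops.append({"op": "remove", "path": "/" + escape_pointer(key)})
    for key, value in new.items():
        path = "/" + escape_pointer(key)
        if key not in old:
            ops.append({"op": "add", "path": path, "value": value})
        elif old[key] != value:
            ops.append({"op": "replace", "path": path, "value": value})
    return ops


def apply_flat_patch(doc, ops):
    """Apply add/replace/remove operations to a copy of a flat dict."""
    result = dict(doc)
    for op in ops:
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        key = op["path"][1:].replace("~1", "/").replace("~0", "~")
        if op["op"] == "remove":
            del result[key]
        else:
            result[key] = op["value"]
    return result


old = {"first_name": "Jon", "last_name": "Doe", "fax": "555"}
new = {"first_name": "John", "last_name": "Doe", "email": "j@example.com"}
ops = flat_json_patch(old, new)
```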
There is a versioning scheme called "Vermongo" which addresses some aspects which haven't been dealt with in the other replies.
One of these issues is concurrent updates, another one is deleting documents.
Vermongo stores complete document copies in a shadow collection. For some use cases this might cause too much overhead, but I think it also simplifies many things.
https://github.com/thiloplanz/v7files/wiki/Vermongo
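To illustrate the general idea of full copies in a shadow collection with a version check on write, here is a small in-memory Python sketch. It is not Vermongo's code: the `_version` field name, the `VersionConflict` exception and both functions are assumptions made for the example, and a dict and a list stand in for the two collections.

```python
import copy


class VersionConflict(Exception):
    """Raised when a writer's copy is older than the stored document."""


def save_versioned(current, shadow, doc):
    """Save `doc`, archiving the previous full copy in `shadow`.

    `current` maps _id -> latest document; `shadow` is a list of old copies.
    """
    stored = current.get(doc["_id"])
    if stored is None:
        new_doc = {**copy.deepcopy(doc), "_version": 1}
    else:
        # Optimistic check: the writer must have read the latest version.
        # In a real database this check and the write would have to be
        # one atomic conditional update.
        if doc.get("_version") != stored["_version"]:
            raise VersionConflict(doc["_id"])
        shadow.append(copy.deepcopy(stored))
        new_doc = {**copy.deepcopy(doc), "_version": stored["_version"] + 1}
    current[doc["_id"]] = new_doc
    return new_doc


def delete_versioned(current, shadow, doc_id):
    """Delete a document but keep its last full copy in `shadow`."""
    stored = current.pop(doc_id)
    shadow.append(copy.deepcopy(stored))


current, shadow = {}, []
a = save_versioned(current, shadow, {"_id": "ab1", "first_name": "Jon"})
b = save_versioned(current, shadow, {**a, "first_name": "John"})
```

A second writer that still holds version 1 (`a` above) would get a `VersionConflict` instead of silently overwriting version 2, which is the concurrent-update case mentioned above.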
Here's another solution using a single document for the current version and all old versions:
data contains all versions. The data array is ordered; new versions are only ever appended with $push to the end of the array. data.vid is the version id, which is an incrementing number.
Get the most recent version:
Get a specific version by vid:
Return only specified fields:
Insert new version: (and prevent concurrent insert/update)
2 is the vid of the current most recent version and 3 is the new version getting inserted. Because you need the most recent version's vid, it's easy to get the next version's vid: nextVID = oldVID + 1.
The $and condition will ensure that 2 is the latest vid.
This way there's no need for a unique index, but the application logic has to take care of incrementing the vid on insert.
Remove a specific version:
That's it!
(remember the 16MB per document limit)
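The operations described in this answer could be expressed roughly as follows. This Python sketch only builds the filter, projection and update documents as plain dicts, so it runs without a database; the `content` field and all function names are hypothetical. The operators themselves (`$slice`, the positional `$` projection, `$and`, `$not`, `$gt`, `$push`, `$pull`) are standard MongoDB operators.

```python
def latest_version_projection():
    """Projection that returns only the last element of the `data` array."""
    return {"data": {"$slice": -1}}


def specific_version_query(doc_id, vid):
    """Filter and projection selecting the single array element with this vid."""
    query = {"_id": doc_id, "data.vid": vid}
    projection = {"data.$": 1}
    return query, projection


def insert_version_update(doc_id, old_vid, content):
    """Filter and update that push a new version only if old_vid is still the latest."""
    query = {
        "_id": doc_id,
        # $and is needed because the same field is constrained twice.
        "$and": [
            {"data.vid": {"$not": {"$gt": old_vid}}},  # nothing newer than old_vid
            {"data.vid": old_vid},                     # old_vid itself is present
        ],
    }
    update = {"$push": {"data": {"vid": old_vid + 1, "content": content}}}
    return query, update


def remove_version_update(vid):
    """Update that pulls the array element with this vid."""
    return {"$pull": {"data": {"vid": vid}}}


# Version 2 is the latest we read, so we try to insert version 3.
query, update = insert_version_update("ab1", 2, "baz")
```

With pymongo, these would be passed to calls such as `collection.find_one(query, projection)` and `collection.update_one(query, update)`. If the update result reports that nothing was modified, another writer most likely inserted a version first, and the application can re-read and retry.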
If you're looking for a ready-to-roll solution -
Mongoid has simple versioning built in
http://mongoid.org/en/mongoid/docs/extras.html#versioning
mongoid-history is a Ruby plugin that provides a significantly more complicated solution with auditing, undo and redo
https://github.com/aq1018/mongoid-history
I worked through this solution, which accommodates published, draft and historical versions of the data:
I explain the model further here: http://software.danielwatrous.com/representing-revision-data-in-mongodb/
For those that may implement something like this in Java, here's an example:
http://software.danielwatrous.com/using-java-to-work-with-versioned-data/
Including all the code that you can fork, if you like
https://github.com/dwatrous/mongodb-revision-objects
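As a rough illustration of keeping published, draft and historical versions in one document, here is a small Python sketch. The layout (`published`, `draft`, `history`), the `archived_at` field and the function names are assumptions made for this example and are not taken from the linked article or repository.

```python
import copy


def new_document(draft):
    """Start a document that has a draft but nothing published yet."""
    return {"published": None, "draft": copy.deepcopy(draft), "history": []}


def publish(doc, timestamp):
    """Promote the draft to published, archiving the previous published version."""
    if doc["published"] is not None:
        doc["history"].append(
            {"archived_at": timestamp, "document": doc["published"]}
        )
    # Copy so that later draft edits don't change what is published.
    doc["published"] = copy.deepcopy(doc["draft"])
    return doc


doc = new_document({"first_name": "Jon", "last_name": "Doe"})
publish(doc, timestamp=1)
doc["draft"]["first_name"] = "John"  # edit the draft only
publish(doc, timestamp=2)
```

Readers would only ever look at `published`, editors at `draft`, and the "time machine" view at `history`.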
Another option is to use the mongoose-history plugin.
I have used the package below for a Meteor/MongoDB project, and it works well. The main advantage is that it stores history/revisions within an array in the same document, so there is no need for additional publications or middleware to access the change history. It can keep a limited number of previous versions (e.g. the last ten), and it also supports change concatenation (so all changes that happen within a specific period are covered by one revision).
nicklozon/meteor-collection-revisions
Another sound option is to use Meteor Vermongo (here)