What's the best way of saving a document with revisions in a key value store?

Posted 2024-10-31 11:15:17

I'm new to key-value stores and I need your recommendation. We're working on a system that manages documents and their revisions, a bit like a wiki does. We're thinking about saving this data in a key-value store.

Please don't just recommend the database you prefer: we want to hack it so that we can use many different key-value databases. We're using node.js, so we can easily work with JSON.

My question is: what should the structure of the database look like? We have metadata for each document (timestamp, lasttext, id, latestrevision) and we have data for each revision (the change, the author, timestamp, etc.). So, which key/value structure do you recommend?

thx

Comments (3)

海未深 2024-11-07 11:15:17

Cribbed from the MongoDB groups. It is somewhat specific to MongoDB; however, it is pretty generic.

Most of these history implementations break down to two common strategies.

Strategy 1: embed history

In theory, you can embed the history of a document inside of the document itself. This can even be done atomically.

> db.docs.save( { _id : 1, text : "Original Text" } ) 
> var doc = db.docs.findOne() 
> db.docs.update( {_id: doc._id}, { $set : { text : 'New Text' }, $push : { hist : doc.text } } ) 
> db.docs.find() 
{ "_id" : 1, "hist" : [ "Original Text" ], "text" : "New Text" } 
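Since the question asks for something that works across many key-value stores, here is a minimal sketch of the embedded-history idea without MongoDB operators. It assumes a store that only offers get and put on string values; the `Map`, the `doc:<id>` key format, and the helper names `getJson`, `putJson` and `saveEmbedded` are all hypothetical stand-ins, not part of any particular database API.

```javascript
// In-memory stand-in for a key-value store; a real client would replace this Map.
const store = new Map();

// Hypothetical helpers: values are stored as JSON strings.
function getJson(key) {
  const raw = store.get(key);
  return raw === undefined ? null : JSON.parse(raw);
}

function putJson(key, value) {
  store.set(key, JSON.stringify(value));
}

// Replace the text of a document and push the previous text onto its embedded history.
function saveEmbedded(id, newText) {
  const key = `doc:${id}`; // assumed key format
  const doc = getJson(key) || { _id: id, text: null, hist: [] };
  if (doc.text !== null) {
    doc.hist.push(doc.text);
  }
  doc.text = newText;
  putJson(key, doc); // one write: the current text and its history change together
  return doc;
}

saveEmbedded(1, "Original Text");
const updated = saveEmbedded(1, "New Text");
console.log(updated);
```

Note that this is a read-modify-write cycle: the single put keeps text and history consistent with each other, but two concurrent writers can still overwrite one another unless the store offers some form of compare-and-set.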

Strategy 2: write history to separate collection

> db.docs.save( { _id : 1, text : "Original Text" } ) 
> var doc = db.docs.findOne() 
> db.docs_hist.insert ( { orig_id : doc._id, ts : Math.round((new Date()).getTime() / 1000), data : doc } ) 
> db.docs.update( {_id:doc._id}, { $set : { text : 'New Text' }  } ) 

Here you'll see that I do two writes: one to the master collection and one to the history collection. To get fast history lookups, index the history collection on the original ID:

> db.docs_hist.createIndex( { orig_id : 1, ts : 1 } )
> db.docs_hist.find( { orig_id : 1 } ).sort( { ts : -1 } )
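In a plain key-value store there is no secondary index, so one way to approximate the separate-collection strategy is to encode the document ID and a sortable timestamp into the history key. The sketch below assumes a `Map` as the store and made-up key formats (`doc:<id>` and `doc_hist:<id>:<timestamp>`); the function names are illustrative only.

```javascript
// In-memory stand-in for a key-value store; swap in a real client as needed.
const kv = new Map();

// Zero-pad the timestamp so that keys sort lexicographically in time order.
function histKey(id, ts) {
  return `doc_hist:${id}:${String(ts).padStart(13, "0")}`;
}

// Two writes: archive the current document, then overwrite it.
function saveWithHistory(id, newText, ts) {
  const docKey = `doc:${id}`;
  const current = kv.get(docKey);
  if (current !== undefined) {
    kv.set(histKey(id, ts), current); // write 1: history entry
  }
  kv.set(docKey, JSON.stringify({ _id: id, text: newText })); // write 2: current doc
}

// Newest-first history, emulating find({ orig_id }).sort({ ts: -1 }) with a key scan.
function history(id) {
  const prefix = `doc_hist:${id}:`;
  return [...kv.keys()]
    .filter((k) => k.startsWith(prefix))
    .sort()
    .reverse()
    .map((k) => JSON.parse(kv.get(k)));
}

saveWithHistory(1, "Original Text", 1000);
saveWithHistory(1, "Second Text", 2000);
saveWithHistory(1, "Third Text", 3000);
console.log(history(1).map((d) => d.text));
```

The scan over all keys is only for the in-memory stand-in. Stores that keep keys ordered usually offer a range or prefix query that does the same job more cheaply.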

  • Both strategies can be enhanced by only displaying diffs
  • You could hybridize by adding a link from history collection to original collection

What's the best way of saving a document with revisions in a key value store?

It's hard to say there is a "best way". There are obviously some trade-offs being made here.

Embedding:

  • atomic changes on a single doc
  • can result in large documents, which may exceed reasonable size limits
  • probably have to enhance code to avoid returning full hist when not necessary

Separate collection:

  • easier to write queries
  • not atomic, needs two operations (do you have transactions?)
  • more storage space (extra indexes on original docs)

千鲤 2024-11-07 11:15:17

I'd keep a hierarchy of the real data under each document with the revision data attached, for instance (the field name "revisions" is only illustrative):

{
  "revisions" : [
    {
      "timestamp" : "2011040711350621",
      "data" : { ... the real data here .... }
    },
    {
      "timestamp" : "2011040711350716",
      "data" : { ... the real data here .... }
    }
  ]
}

Then use the push operation to add new versions and periodically remove the old versions. You can use the last (or first) filter to only get the latest copy at any given time.
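As a rough sketch of the push-and-prune idea in plain JavaScript: the `revisions` field name, the `keep` limit, and both function names are assumptions made for illustration, and the document is a plain object that would be serialized to JSON before being written to whatever store is in use.

```javascript
// Append a new version to the revision list, keeping at most `keep` entries.
function pushRevision(doc, data, timestamp, keep) {
  doc.revisions.push({ timestamp, data });
  if (doc.revisions.length > keep) {
    // Drop the oldest entries from the front of the list.
    doc.revisions.splice(0, doc.revisions.length - keep);
  }
  return doc;
}

// The latest copy is the last element because revisions are appended in order.
function latest(doc) {
  return doc.revisions.length ? doc.revisions[doc.revisions.length - 1] : null;
}

const doc = { revisions: [] };
pushRevision(doc, { text: "v1" }, "2011040711350621", 2);
pushRevision(doc, { text: "v2" }, "2011040711350716", 2);
pushRevision(doc, { text: "v3" }, "2011040711350802", 2);
console.log(latest(doc).data.text, doc.revisions.length);
```

Pruning on every push keeps the document size bounded; pruning periodically in a separate job, as suggested above, would work just as well.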

A君 2024-11-07 11:15:17

I think there are multiple approaches, and this question is old, but I'll give my two cents since I was working on this earlier this year. I have been using MongoDB.

In my case, I had a User account that then had Profiles on different social networks. We wanted to track changes to social network profiles and wanted revisions of them so we created two structures to test out. Both methods had a User object that pointed to foreign objects. We did not want to embed objects from the get-go.

A User looked something like:

User {
  "tags"              : [Tags]
  "notes"             : "Notes"
  "facebook_profile"  : <combo_foreign_key>
  "linkedin_profile"  : <same as above>
}

Then, for the combo_foreign_key, we used this pattern (using Ruby interpolation syntax for simplicity):

combo_foreign_key = "#{User.key}__#{new_profile.last_updated_at}"

facebook_profiles {
  combo_foreign_key: facebook_profile
  ... and you keep adding your foreign objects in this pattern
}

This gave us O(1) lookup of the latest FacebookProfile of a User, but required us to keep the latest foreign key stored in the User object. If we wanted all of the FacebookProfiles, we would ask for all keys in the facebook_profiles collection with the prefix "#{User.key}__", and this was O(N)...
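For the node.js crowd, the first structure could look roughly like the sketch below. The two `Map` objects stand in for the users and facebook_profiles collections, and the function names are invented for this example; only the key pattern comes from the description above.

```javascript
const users = new Map();            // stand-in for the users collection
const facebookProfiles = new Map(); // stand-in for the facebook_profiles collection

// Same pattern as "#{User.key}__#{new_profile.last_updated_at}".
function comboKey(userKey, lastUpdatedAt) {
  return `${userKey}__${lastUpdatedAt}`;
}

// Store a new profile version and repoint the user at it.
function addProfile(userKey, profile) {
  const key = comboKey(userKey, profile.last_updated_at);
  facebookProfiles.set(key, profile);
  const user = users.get(userKey) || {};
  user.facebook_profile = key; // the user must always hold the latest key
  users.set(userKey, user);
}

// O(1): follow the key stored on the user.
function latestProfile(userKey) {
  const user = users.get(userKey);
  return user ? facebookProfiles.get(user.facebook_profile) : undefined;
}

// O(N): scan every key for the user's prefix.
function allProfiles(userKey) {
  const prefix = `${userKey}__`;
  const found = [];
  for (const [key, profile] of facebookProfiles) {
    if (key.startsWith(prefix)) found.push(profile);
  }
  return found;
}

addProfile("u1", { name: "Old Name", last_updated_at: 100 });
addProfile("u1", { name: "New Name", last_updated_at: 200 });
addProfile("u2", { name: "Someone Else", last_updated_at: 150 });
console.log(latestProfile("u1").name, allProfiles("u1").length);
```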

The second strategy we tried was storing an array of those FacebookProfile keys on the User object so the structure of the User object changed from

  "facebook_profile"  : <combo_foreign_key>

to

  "facebook_profile"  : [<combo_foreign_key>]

Here we'd just append the new combo_key when we added a new profile variation. Then we'd do a quick sort of the "facebook_profile" attribute and look up the largest key to get our latest profile copy. This method had to sort M strings and then look up the FacebookProfile by the largest item in that sorted list. It was a little slower for grabbing the latest copy, but it gave us the advantage of knowing every version of a User's FacebookProfile in one fell swoop, and we did not have to worry about ensuring that foreign_key was really the latest profile object.
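A small sketch of this second structure, again with a `Map` standing in for the facebook_profiles collection. The zero-padding of the timestamp is an assumption added here so that sorting the keys as strings matches time order; the function names are made up.

```javascript
const facebookProfiles = new Map(); // stand-in for the facebook_profiles collection

// Append a new profile version; the user keeps every combo key in an array.
function addProfile(user, profile) {
  // Assumed: numeric timestamps, zero-padded so string order equals time order.
  const stamp = String(profile.last_updated_at).padStart(13, "0");
  const key = `${user.key}__${stamp}`;
  facebookProfiles.set(key, profile);
  user.facebook_profile.push(key);
}

// Sort the M keys and look up the profile stored under the largest one.
function latestProfile(user) {
  const sorted = [...user.facebook_profile].sort();
  return facebookProfiles.get(sorted[sorted.length - 1]);
}

// Every version is reachable straight from the user object, with no prefix scan.
function allProfiles(user) {
  return user.facebook_profile.map((key) => facebookProfiles.get(key));
}

const user = { key: "u1", facebook_profile: [] };
addProfile(user, { name: "Newer", last_updated_at: 200 });
addProfile(user, { name: "Older", last_updated_at: 100 }); // arrives out of order
console.log(latestProfile(user).name, allProfiles(user).length);
```

Because the latest version is found by sorting rather than by a stored pointer, a late-arriving older version cannot accidentally become "the latest".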

At first our revision counts were pretty small and they both worked pretty well. I think I prefer the first one over the second now.

Would love input from others on how they went about solving this issue. The Git idea suggested in another answer actually sounds really neat to me, and for our use case it would work quite well... Cool.
