How to store edit history efficiently?

Posted on 2024-07-29 16:57:26


I was just wondering how sites like Stack Overflow and Wikipedia store their edit history indefinitely and allow users to roll back edits. Can someone recommend any resources/books/articles on how to do this with any suitable technology (such as databases, etc.)?

Thanks a lot!


Comments (1)

寄居者 2024-08-05 16:57:26


There are a number of options; the simplest, of course, is simply to record every version independently. For a site like Stack Overflow, where posts usually aren't edited very many times, this is appropriate. For something like Wikipedia, however, one needs to be cleverer to save space.
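A minimal sketch of that simplest approach, using SQLite from Python. The table and column names (post_revisions, body, and so on) are made up for illustration and are not Stack Overflow's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_revisions (
        post_id   INTEGER NOT NULL,
        revision  INTEGER NOT NULL,
        edited_at TEXT NOT NULL DEFAULT (datetime('now')),
        body      TEXT NOT NULL,          -- full copy of the text, every time
        PRIMARY KEY (post_id, revision)
    )
""")

def save_edit(post_id: int, body: str) -> int:
    """Store a new revision as a complete, independent copy of the text."""
    (rev,) = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) + 1 FROM post_revisions WHERE post_id = ?",
        (post_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO post_revisions (post_id, revision, body) VALUES (?, ?, ?)",
        (post_id, rev, body),
    )
    return rev

def rollback(post_id: int, revision: int) -> int:
    """Rolling back is just re-saving an old revision's text as the newest revision."""
    (old_body,) = conn.execute(
        "SELECT body FROM post_revisions WHERE post_id = ? AND revision = ?",
        (post_id, revision),
    ).fetchone()
    return save_edit(post_id, old_body)

save_edit(1, "First draft")
save_edit(1, "First draft, now with more detail")
print(rollback(1, 1))  # -> 3: revision 1's text is stored again as revision 3
```

Every edit costs the full size of the post, but reading or rolling back any revision is a single indexed lookup, which is why the trade-off is reasonable when edits are rare.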

In the case of Wikipedia, pages are initially stored with each version separate, in the text table. Periodically, a number of older revisions are compressed together, then packed into a single field. Since there will be a lot of repetition, you save a lot of space this way.
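A rough sketch of that batching idea in Python: concatenate a run of old revisions and compress them as one blob. The real MediaWiki text-storage layer is considerably more involved than this:

```python
import json
import zlib

# Three nearly identical revisions of the same page.
revisions = [
    "The quick brown fox jumps over the lazy dog. " * 20,
    "The quick brown fox jumps over the lazy dog. " * 20 + "It barks.",
    "The quick brown fox jumps over the lazy dog. " * 20 + "It barked loudly.",
]

# Pack the whole batch into a single compressed field. Because consecutive
# revisions repeat most of their text, the compressor removes the redundancy.
packed = zlib.compress(json.dumps(revisions).encode("utf-8"), level=9)

raw = sum(len(r.encode("utf-8")) for r in revisions)
print(f"raw: {raw} bytes, packed: {len(packed)} bytes")

# The cost: reading any one old revision means decompressing the whole batch.
assert json.loads(zlib.decompress(packed))[1] == revisions[1]
```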

You might also want to look into how some version control systems do it. For example, Subversion uses skip deltas, where revisions are stored as a difference from a revision halfway down the history. This means that one has to examine at most O(log n) revisions to reconstruct the revision of interest.
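A simplified sketch of skip-delta base selection, roughly following the scheme described in Subversion's design notes (delta against the revision number with its lowest set bit cleared). This is hypothetical illustration code, not Subversion's implementation, but it shows why the chain stays logarithmic:

```python
def skip_delta_base(rev: int) -> int:
    """Clear the lowest set bit of the revision number, e.g. 13 (0b1101) -> 12 (0b1100)."""
    return rev & (rev - 1)

def delta_chain(rev: int) -> list[int]:
    """Revisions that must be read (as deltas) to reconstruct `rev` from revision 0."""
    chain = [rev]
    while rev > 0:
        rev = skip_delta_base(rev)
        chain.append(rev)
    return chain

print(delta_chain(13))    # [13, 12, 8, 0]                     -> 3 deltas
print(delta_chain(1000))  # [1000, 992, 960, 896, 768, 512, 0] -> 6 deltas
# The chain length is at most the number of set bits in rev, i.e. O(log n).
```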

Git, on the other hand, uses something more similar to Wikipedia's approach.

Revisions are stored as individually compressed 'loose' objects at first, then periodically git takes all of the loose objects, sorts them according to a somewhat complex heuristic, then builds compressed deltas between 'nearby' objects and dumps the result as a packfile.

The number of revisions that need to be read to reconstruct a file is bounded by an argument to the pack building process. This has the interesting property that deltas can be built between objects that are unrelated, in some cases.
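You can watch this happen in an ordinary repository. Below is a small Python driver (assuming git is on the PATH) that makes a handful of commits, shows the loose objects, and then repacks them; `--window` limits how many nearby objects are compared as delta candidates, and `--depth` bounds the delta-chain length mentioned above:

```python
import pathlib
import subprocess
import tempfile

repo = tempfile.mkdtemp()

def git(*args: str) -> str:
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

git("init", "-q")
git("config", "user.email", "demo@example.com")
git("config", "user.name", "demo")

page = pathlib.Path(repo, "page.txt")
for i in range(20):  # 20 small edits -> 20 very similar blobs
    page.write_text("some page content\n" * 100 + f"edit number {i}\n")
    git("add", "page.txt")
    git("commit", "-q", "-m", f"edit {i}")

print(git("count-objects", "-v"))  # many loose objects, no packs yet
git("repack", "-a", "-d", "--window=10", "--depth=5")
print(git("count-objects", "-v"))  # objects now deltified inside a packfile
```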
