Existing solutions for file deltas/versioning in Java
When versioning files or optimizing file backups, one idea is to store only the delta, that is, the data that has been modified.
This sounds simple at first, but actually determining where unmodified data ends and new data starts turns out to be a difficult task.
Is there an existing framework that already does something like this, or an efficient file comparison algorithm?
Comments (5)
XDelta is not Java, but it is worth looking at anyway. There is a Java version of it, but I don't know how stable it is.
Instead of rolling your own, you might consider leveraging an open source version control system (e.g., Subversion). You get a lot more than just a delta versioning algorithm that way.
It sounds like you are describing a difference-based storage scheme. Most source control systems use such a scheme to minimize their storage requirements. The *nix "diff" command can generate the data you would need to implement it on your own.
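To make the idea of a difference-based scheme concrete, here is a minimal sketch of a line-level diff built on the textbook longest-common-subsequence table. It is not how the *nix diff command or any particular version control system is implemented; the class name `LineDiff` and the "- " / "+ " prefixes are made up for illustration, and the quadratic table is only practical for small inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any tool named in this thread.
class LineDiff {

    // Returns an edit script that turns oldLines into newLines.
    // Each entry is prefixed with "  " (unchanged), "- " (removed) or "+ " (added).
    static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size();
        int m = newLines.size();

        // lcs[i][j] = length of the longest common subsequence of
        // oldLines[i..] and newLines[j..].
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (oldLines.get(i).equals(newLines.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }

        // Walk the table from the start and emit the edit script.
        List<String> script = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                script.add("  " + oldLines.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + oldLines.get(i));
                i++;
            } else {
                script.add("+ " + newLines.get(j));
                j++;
            }
        }
        while (i < n) script.add("- " + oldLines.get(i++));
        while (j < m) script.add("+ " + newLines.get(j++));
        return script;
    }
}
```

Storing only the "- " and "+ " entries, together with their positions, is enough to rebuild one version from the other, which is the storage saving described above.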
Here's a Java library that can compute diffs between two plain text files:
http://code.google.com/p/google-diff-match-patch/
I don't know of any library for binary diffs, though. Try googling for 'java binary diff' ;-)
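For binary data, one very simple way to find what changed, short of a real binary diff library, is to split both versions into fixed-size blocks and compare a hash of each block. The sketch below assumes both versions fit in memory; the class name `BlockCompare` and the block size are illustrative choices, and in an actual backup tool the old hashes would presumably be stored rather than recomputed. Its main weakness is that inserting a single byte shifts every later block, so everything after the insertion is reported as changed.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not a real binary diff format.
class BlockCompare {

    // Returns the indices of the fixed-size blocks whose content differs
    // between oldData and newData (including blocks present in only one).
    static List<Integer> changedBlocks(byte[] oldData, byte[] newData, int blockSize) {
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        int longest = Math.max(oldData.length, newData.length);
        int blocks = (longest + blockSize - 1) / blockSize; // round up
        List<Integer> changed = new ArrayList<>();
        for (int b = 0; b < blocks; b++) {
            byte[] oldHash = hashBlock(sha, oldData, b, blockSize);
            byte[] newHash = hashBlock(sha, newData, b, blockSize);
            if (!Arrays.equals(oldHash, newHash)) {
                changed.add(b);
            }
        }
        return changed;
    }

    // Hashes block b of data, or returns null if the block lies past the end.
    private static byte[] hashBlock(MessageDigest sha, byte[] data, int b, int blockSize) {
        int from = b * blockSize;
        if (from >= data.length) {
            return null;
        }
        int len = Math.min(blockSize, data.length - from);
        sha.update(data, from, len);
        return sha.digest(); // digest() also resets the MessageDigest for reuse
    }
}
```

Only the blocks listed in the result would need to be copied into the next backup. Tools such as rsync get around the shifting problem with rolling checksums, which this sketch does not attempt.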
In my opinion, the Bsdiff tool is the best choice for binary files. It uses suffix sorting (Larsson and Sadakane's qsufsort) and takes advantage of how executable files change. Bsdiff was written in C by Colin Percival. Diff files created by Bsdiff are generally smaller than the files created by Xdelta.
It is also worth noting that Bsdiff uses the bzip2 compression algorithm. Binary patches created by Bsdiff can sometimes be compressed further using other compression algorithms (such as the one used by the WinRAR archiver).
Here is the site where you can find the Bsdiff documentation and download Bsdiff for free: http://www.daemonology.net/bsdiff/
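To give a rough feel for why suffix sorting helps here, the sketch below sorts all suffixes of the old file and then binary-searches that order to find the longest run of bytes in the old file that matches the new file at a given position. This is only an illustration of the general idea and not Bsdiff's actual algorithm or patch format: it uses a plain comparison sort instead of qsufsort, looks only for exact matches, and the class and method names are made up.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not Bsdiff's implementation.
class SuffixMatch {

    // Builds a suffix array of data with a plain comparison sort.
    // Simple but slow; fine for small inputs only.
    static Integer[] suffixArray(byte[] data) {
        Integer[] sa = new Integer[data.length];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = i;
        }
        Arrays.sort(sa, (a, b) -> compare(data, a, data, b));
        return sa;
    }

    // Lexicographic comparison of x[i..] and y[j..] as unsigned bytes.
    static int compare(byte[] x, int i, byte[] y, int j) {
        while (i < x.length && j < y.length) {
            int d = (x[i] & 0xff) - (y[j] & 0xff);
            if (d != 0) {
                return d;
            }
            i++;
            j++;
        }
        return (x.length - i) - (y.length - j); // the shorter remainder sorts first
    }

    // Number of leading bytes shared by x[i..] and y[j..].
    static int matchLength(byte[] x, int i, byte[] y, int j) {
        int n = 0;
        while (i + n < x.length && j + n < y.length && x[i + n] == y[j + n]) {
            n++;
        }
        return n;
    }

    // Finds the longest match in oldData for the bytes starting at newData[pos].
    // Returns {offsetInOld, length}; length is 0 when nothing matches.
    static int[] longestMatch(byte[] oldData, Integer[] sa, byte[] newData, int pos) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(oldData, sa[mid], newData, pos) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // lo is the insertion point; the best match is one of its two neighbours.
        int[] best = {0, 0};
        for (int k = lo - 1; k <= lo; k++) {
            if (k < 0 || k >= sa.length) {
                continue;
            }
            int len = matchLength(oldData, sa[k], newData, pos);
            if (len > best[1]) {
                best[0] = sa[k];
                best[1] = len;
            }
        }
        return best;
    }
}
```

A patch generator built on this would emit "copy length bytes from offsetInOld" for long matches and literal bytes for everything else, then compress the result.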