Can Subversion store OpenXML Office documents efficiently?
I have been managing Subversion as an engineering document storage repository for my company. It is working fairly well; however, I have a question about how MS Office 2007 formats are (or should be) handled by Subversion.
I'm looking at an Excel 2007 spreadsheet (extension .xlsx) in my working copy to which Subversion has applied the svn:mime-type property application/octet-stream. This means that Subversion is treating it as binary, right?
I was hoping that the new MS Office document formats would be stored efficiently by Subversion. My understanding is that a full copy of a binary file will be made on every commit of that file, whereas if the file is text, a small change to the file will result in a small amount of additional data being added to the repository (in a typical situation at least).
I don't understand much of the details of XML, but I thought that an XML file was text, and that it would therefore be efficiently stored by Subversion.
Is it possible to configure Subversion so that MS Office OpenXML documents are stored efficiently?
Follow-up (2009-11-09): I've found that Office documents can be stored as plain text using the Office 2003 XML document formats (Excel: XML Spreadsheet 2003; Word: Word XML Document). There is a warning about loss of formatting, but I have yet to encounter any noticeable loss of formatting.
Sadly, you can't currently do this with Subversion, but there has been some discussion around this:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=651443
Have you ever tried to open an OpenXML file in a text editor?
In short: it is not text; it is still binary. So no, you can't make Subversion handle it any differently.
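A quick way to confirm this is to look at the first bytes of a .docx or .xlsx file: they are the ZIP signature PK\x03\x04, not XML text. The Python sketch below builds a tiny, hypothetical package in memory instead of reading a real Office file. It treats the presence of a [Content_Types].xml part (which every Open Packaging Conventions package contains) as a heuristic, so it is an illustration rather than a full validator.

```python
import io
import zipfile


def looks_like_openxml(data: bytes) -> bool:
    """Heuristic check: an OpenXML file (.docx/.xlsx/.pptx) is a ZIP
    package whose root contains a [Content_Types].xml part."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as package:
        return "[Content_Types].xml" in package.namelist()


def build_fake_package() -> bytes:
    """Build a tiny, hypothetical package in memory.
    It mimics the layout only; Office would not open it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", '<?xml version="1.0"?><Types/>')
        package.writestr("word/document.xml", "<document>hello</document>")
    return buffer.getvalue()


data = build_fake_package()
print(data[:4])                 # ZIP local file header signature
print(looks_like_openxml(data))
```

The XML parts are inside the archive, but what lands on disk (and in the repository) is the compressed ZIP container.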
From the OpenXML article on Wikipedia:
In other words, OpenXML files are actually zip files with XML files in them. Compression or encryption "scrambles" the data, sabotaging Subversion's ability to generate deltas between revisions. This is not related to the svn:mime-type property: Subversion considers all files to be binary when generating deltas.

In Dutch we have a saying: "measuring is knowing". The graph below shows the results of an experiment where I imported a 500K OpenXML document into an SVN 1.6 repository (revision 1). I then added a paragraph from another document, saved and committed. This was repeated 5 times (revisions 2 to 6).
As you can see, committing a new docx revision that just adds a paragraph will cost you about 150K of disk space. This is still much more efficient than storing a full copy of each revision without the help of a version control system.
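Putting rough numbers on the six revisions, assuming every revision stays close to the original 500K (each one actually grows by a paragraph, so the full-copy total is if anything an underestimate):

```latex
\underbrace{500\,\text{K}}_{\text{rev. 1}} + 5 \times 150\,\text{K} \approx 1250\,\text{K}
\qquad\text{vs.}\qquad
6 \times 500\,\text{K} = 3000\,\text{K}\ \text{(full copies)}
```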
I also repeated the experiment with a separate test repository, this time uncompressing each revision of the docx before committing it. As you can see, the document revisions would be stored much more efficiently if they weren't compressed. It's also interesting to see that Subversion's own data compression is about as efficient as zip: storing the first revision of an uncompressed docx in Subversion takes about the same space as the original docx.
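The scrambling effect can be reproduced on a small scale. The sketch below is an illustration, not Subversion's actual delta algorithm. It uses the common prefix plus common suffix of two revisions as a crude stand-in for "how much of the new revision a delta could reuse". It also uses zlib.compress, which applies the same DEFLATE algorithm as zip, in place of real docx packaging. The word list, sizes and insertion point are arbitrary assumptions.

```python
import random
import zlib


def shared_fraction(old: bytes, new: bytes) -> float:
    """Crude delta proxy: fraction of `new` covered by the longest
    common prefix plus the longest common suffix shared with `old`."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix overlaps bytes already counted in the prefix.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return (prefix + suffix) / len(new)


def make_words(rng: random.Random, n_words: int) -> list[str]:
    """Generate pseudo-random 'document text' from a small, made-up vocabulary."""
    vocab = ["revision", "commit", "delta", "paragraph", "document",
             "repository", "binary", "text", "storage", "office"]
    return [rng.choice(vocab) for _ in range(n_words)]


rng = random.Random(42)            # fixed seed so the run is repeatable
words = make_words(rng, 5000)
extra = make_words(rng, 100)       # the "added paragraph"

cut = len(words) // 10             # insert near the start of the document
old_text = " ".join(words).encode()
new_text = " ".join(words[:cut] + extra + words[cut:]).encode()

plain = shared_fraction(old_text, new_text)
packed = shared_fraction(zlib.compress(old_text), zlib.compress(new_text))

print(f"uncompressed: {plain:.1%} of the new revision is shared")
print(f"compressed:   {packed:.1%} of the new revision is shared")
```

The uncompressed revisions share nearly everything. After compression, the bytes that follow the edit no longer line up with the previous revision, so far less of the new revision can be reused.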
YMMV.
Subversion handles binary files quite well. It does not store a full copy for every commit but only an efficient binary diff.
See the FAQ about this.
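To illustrate what a "binary diff" means here, the toy Python sketch below encodes a new version as a list of "copy this range of the old bytes" and "insert these new bytes" operations, with no reliance on line breaks or any text structure. It uses difflib.SequenceMatcher for the matching. It is not Subversion's svndiff format or algorithm, and the function names are hypothetical.

```python
import random
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list[tuple]:
    """Toy binary delta: ("copy", offset, length) and ("insert", data) ops."""
    ops = []
    # autojunk=False: on bytes, the default heuristic would treat
    # frequently occurring byte values as junk and miss good matches.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": bytes of `old` that are simply not copied; no op needed.
    return ops


def apply_delta(old: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the new version from the old bytes plus the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def inserted_bytes(ops: list[tuple]) -> int:
    """Count how many bytes the delta has to store literally."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


rng = random.Random(1)
old = bytes(rng.randrange(256) for _ in range(2000))   # arbitrary "binary file"
new = old[:300] + b"\x00\xffNEW DATA\xff\x00" + old[300:]

ops = make_delta(old, new)
assert apply_delta(old, ops) == new
print(inserted_bytes(ops), "literal bytes stored instead of", len(new))
```

Only the inserted bytes need to be stored literally; everything else is a reference into the previous version. This works only while the unchanged bytes still match the old version, which is exactly what compression breaks.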