Version control of Word documents with svn or Mercurial
As far as I know, Microsoft moved to some sort of XML-based representation in the most recent version of Office. If that's really true, then I would assume that version control would work, although you would obviously have to resolve any embedded changes with the old <<<<<<, ======, >>>>>> marks in them before loading Word.
This other question mentions the issue, but it seems to be taken as a foregone conclusion that version control simply won't work with Word, and I want to know why.
Related question: Is version control (i.e. Subversion) applicable in document tracking?
Comments (7)
There's the zipdoc extension for Mercurial, which seems to handle compressed files such as XML-based Word documents by storing them uncompressed internally, so that deltas and merges become meaningful. I have not tested it, but it sounds like the thing you're looking for.
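To make the "store it uncompressed" idea concrete, here is a minimal Python sketch. It is not the zipdoc extension and does not use any Mercurial API; it only rewrites a zip container (such as a .docx) so that every member is stored without compression, which is the property that lets a VCS compute useful deltas. The function name `store_uncompressed` is made up for this illustration.

```python
import zipfile


def store_uncompressed(src, dst):
    """Copy every member of the zip archive `src` into `dst` without compression.

    `src` and `dst` may be file paths or binary file-like objects.
    Hypothetical helper for illustration; not part of zipdoc or Mercurial.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_STORED) as zout:
        for info in zin.infolist():
            # Keep the original name and timestamp so repeated runs on the
            # same input produce the same bytes.
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.compress_type = zipfile.ZIP_STORED
            stored.external_attr = info.external_attr
            zout.writestr(stored, zin.read(info.filename))


# Example use with hypothetical file names:
# store_uncompressed("report.docx", "report.stored.docx")
```

Word should still be able to open the rewritten file, since stored (uncompressed) members are valid in a zip archive, but that is an assumption worth checking against your own documents.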
The foregone conclusion is that although most, if not all, version control systems, Mercurial included, do indeed work with binary files, they suck at diffing and merging them.
Word files are binary in nature. Yes, the latest incarnations of Office have switched to the "Office Open XML" format, which includes XML, but they still wrap the entire thing in a zip file, which means it is still binary (and yes, I know that all files are in fact binary, you know what I mean).
Now, many version control systems, including both Mercurial and Subversion, can be told how to merge any file type they consider binary by giving them an external merge tool that can do the job.
This basically means that if you can find a program that can take two Word files, diff them, and allow you to reconcile differences, then you're in business.
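As a rough illustration of what such a program has to do, the following Python sketch pulls the paragraph text out of two .docx files and produces a unified diff. It is a text-only comparison under stated assumptions: the main document part is assumed to live at `word/document.xml` (the usual location in files written by Word), formatting, tables, and tracked changes are ignored, and the helper names `docx_paragraphs` and `diff_docx` are invented here. It is not a merge tool.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx):
    """Return the plain text of each paragraph in a .docx (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        # Assumption: the main part is at the conventional path.
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return paragraphs


def diff_docx(old, new):
    """Return a unified diff (list of lines) of the paragraph text of two documents."""
    return list(
        difflib.unified_diff(
            docx_paragraphs(old),
            docx_paragraphs(new),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# Example use with hypothetical file names:
# print("\n".join(diff_docx("contract_v1.docx", "contract_v2.docx")))
```

A diff like this is readable, but turning it back into a merged .docx is the hard part, which is why a dedicated comparison tool is usually the practical choice.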
If you unzipped the Word file and versioned its contents, then yes, you could get merge conflicts that you can resolve through Mercurial. However, the contents would still be in a format that you didn't write yourself, so reconciling difficult merge conflicts might be not just difficult but impossible.
In short, version control systems excel at storing binary files, but they suck at diffing and merging them.
If you never need to diff or merge, you can use Mercurial or Subversion or whatever, and it will work just great.
The new formats are in fact XML based, however the .docx file itself is actually a zip file. So ultimately it is still a binary file...
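This is easy to check for yourself. The short Python sketch below tests for the zip signature at the start of the file and lists the parts inside the container; the helper names `looks_like_zip` and `list_parts` are made up for this example.

```python
import zipfile

# Signature of a zip local file header, found at the start of any non-empty zip.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Return True if the raw bytes begin with the zip local-file-header signature."""
    return data.startswith(ZIP_LOCAL_HEADER)


def list_parts(docx):
    """Return the member names of a .docx container (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        return zf.namelist()


# Example use with a hypothetical file name:
# with open("report.docx", "rb") as fh:
#     print(looks_like_zip(fh.read(4)))
# print(list_parts("report.docx"))
```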
I suppose it depends on who will be using the documents. Usually only developers are comfortable with using VCSs, so you may be complicating the lives of people who just want to access via a shared drive.
On the other hand, revision history is often very important, and I often see Word documents with big summaries at the top listing all of the changes, which seems really silly.
I think that cloud-based solutions like Google Docs will probably fill this gap in the future. Or maybe just a team wiki. Generally you are trading off some of the fancier Word features for a more open sharing experience, but Google Docs is becoming pretty powerful.
I'd put the use case in the foreground. Quite a lot of people in the world need tools to compare two versions of the same Word document, but they're not developers; they are, for example, attorneys. At my law firm clients, documents go out to their clients and come back with edits, so a document-based comparison is absolutely necessary. They use either the built-in Word comparison function or third-party tools (WorkShare DeltaView is something like an industry standard). These tools can also compare PDF documents.
The use case here is clearly content-driven: the attorneys need to quickly get an overview of the differences between two versions of a contract. Both versions can be stored in a document management system as "versions", or in the case of DeltaView, the delta file can be stored for further review.
What can be the use case for a developer? Source control systems mean "SOURCE" control, and not "control all stuff coming up in my project". I'd rather store project-related documents (plans, specs, requirements, e-mails) in another store, not in Mercurial. - On the other hand, I often use Word documents or Word templates as part of the solution in document template projects, and of course these documents are source, so they are saved in the repo. But the need to visualize differences has been relatively small up to now, especially if your commit comments are good ("Version 1 - init", "Version 2: added textbox in header", "Version 3: added footer information" etc.).
Replies to various points or assumptions read here:
Consider saving in XML instead of docx, if you want keyword expansion:
Save your file as .xml instead of .docx. Your file gets much bigger (it is no longer zipped), but you may save space through svn compression, which I expect is more efficient on text than on binaries.
That seems to work for me.
Rodolphe
Depends on the setting.
If it's a short-lived doc that you want to track changes in, then use Word's internal change tracking.
Otherwise use SVN or SharePoint or some other external means of recording versioned documents. If you don't, you run the risk that anybody could overwrite the file and all the versioning information would be lost.