There are formats that are actually zip files in disguise, e.g. docx or odt. If I store them directly in version control, they are handled as binary files. My ideal solution would be to:
- have a hook that creates a foo.docx/ directory for each foo.docx file before commit, unzipping all files into it
- optionally, have a hook that reindents the XML files
- have a hook that recreates foo.docx from the stored files after update
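A minimal Python sketch of the unpack and rebuild steps, using only the standard zipfile module. It assumes the unpacked copy lives in a sibling directory such as foo.docx.unpacked (a hypothetical name), since a file and a directory cannot both be foo.docx in the same working copy. Reindenting is left out: changing whitespace inside the XML can change the document's text, so if you do it at all, it is safer on a copy used only for diffing.

```python
import os
import zipfile


def unpack(doc_path: str, out_dir: str) -> None:
    """Extract every member of the zip-based document into out_dir."""
    with zipfile.ZipFile(doc_path) as zf:
        zf.extractall(out_dir)


def repack(src_dir: str, doc_path: str) -> None:
    """Rebuild the document from the files under src_dir."""
    members = []
    for dirpath, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(dirpath, name)
            # Zip member names always use forward slashes.
            arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
            members.append((arcname, full))
    # "mimetype" first, then everything else in a stable, sorted order.
    members.sort(key=lambda m: (m[0] != "mimetype", m[0]))

    with zipfile.ZipFile(doc_path, "w") as zf:
        for arcname, full in members:
            # ODF requires the mimetype entry to be stored uncompressed.
            method = zipfile.ZIP_STORED if arcname == "mimetype" else zipfile.ZIP_DEFLATED
            zf.write(full, arcname, compress_type=method)
```

The mimetype special case comes from ODF, which requires that entry to be the first one and uncompressed; a docx has no such entry, so all of its members are simply deflated.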
I don't want the docx files themselves to be version-controlled. (I am aware of a related question where a different approach with a custom diff was suggested.)
Is this doable? Is this doable with Mercurial?
UPDATE:
I know about hooks. I am interested in the specifics. Here is a session to demonstrate the expected behavior.
> hg add foo.docx
> hg status
A foo.docx
> hg commit
> # Change foo.docx with external editor
> hg status
M foo.docx
> hg diff
+++ foo.docx/word/document.xml
- <w:t>An idea</w:t>
+ <w:t>A much better idea</w:t>
I was wondering the same thing, and just came across the ZipDoc extension/filter for Mercurial, which seems to do exactly this!
Haven't tried it yet, but it looks promising!
If you can get past the hurdle of successfully unzipping and zipping the OpenOffice documents, then you should be able to use the filter system we have in Mercurial. That lets you transform files on every read/write from/to the repository.
You will unfortunately have to do more than just unzip the foo.docx file. The problem is that you need to generate a single file as output, so perhaps you can unzip foo.docx and then tar up the generated files. You'll then be versioning the tarball, which should work since a tarball is just an uncompressed concatenation of all the individual files with some meta information. Come to think of it, a simpler solution would be to zip the unpacked foo.docx file again but specify no compression. That should give similar results to using tar.
Solving this problem is something I've wanted to do myself, so please report back by sending a mail to the Mercurial mailing list.
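A sketch of the no-compression idea as an encode/decode filter pair. Mercurial filters configured in the [encode] and [decode] sections of hgrc run a command that, with the pipe: specifier, reads the file on stdin and writes the transformed file on stdout. The script name store_zip.py and its store/deflate arguments are hypothetical, as is the exact wiring shown in the docstring.

```python
"""Filter sketch: rewrite a zip-based document with or without compression.

Assumed hgrc wiring (store_zip.py and its arguments are hypothetical):
    [encode]
    **.docx = pipe: python store_zip.py store
    [decode]
    **.docx = pipe: python store_zip.py deflate
"""
import io
import sys
import zipfile


def rezip(data: bytes, compression: int) -> bytes:
    """Copy every member of the zip in `data` into a new zip that uses `compression`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            # ODF's mimetype entry must stay uncompressed in both directions.
            method = zipfile.ZIP_STORED if info.filename == "mimetype" else compression
            # A fresh ZipInfo keeps name and timestamp but no stale flags or sizes.
            dst.writestr(zipfile.ZipInfo(info.filename, info.date_time),
                         src.read(info), compress_type=method)
    return out.getvalue()


def main(mode: str) -> None:
    compression = zipfile.ZIP_STORED if mode == "store" else zipfile.ZIP_DEFLATED
    sys.stdout.buffer.write(rezip(sys.stdin.buffer.read(), compression))


# Act as a filter only when invoked with a valid mode argument.
if __name__ == "__main__" and sys.argv[1:] in (["store"], ["deflate"]):
    main(sys.argv[1])
```

Keeping member timestamps from the input means that storing the same unchanged document twice should produce the same bytes, which is what lets the repository deltas stay small.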
You can use a precommit hook to unzip, and an update hook to zip. See Mercurial: The Definitive Guide on how to use hooks.
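A sketch of what the two hooks could do, written as plain Python functions rather than against Mercurial's hook API; how they are registered, and how the resulting files are then added or removed with hg, is not shown. It assumes the functions are given the repository root and that each foo.docx is mirrored in a hypothetical foo.docx.unpacked/ directory. precommit_sync also deletes mirrors whose document no longer exists, which covers the rename case.

```python
import os
import shutil
import zipfile

SUFFIX = ".unpacked"  # hypothetical naming convention for the mirror directory


def precommit_sync(root: str) -> None:
    """Unpack every .docx under root; delete mirrors whose .docx is gone."""
    for dirpath, dirnames, filenames in os.walk(root):
        for d in list(dirnames):
            if d == ".hg" or d.endswith(SUFFIX):
                dirnames.remove(d)  # never descend into the store or a mirror
            if d.endswith(SUFFIX) and d[: -len(SUFFIX)] not in filenames:
                # foo.docx was renamed or deleted, so foo.docx.unpacked is stale.
                shutil.rmtree(os.path.join(dirpath, d))
        for name in filenames:
            if name.endswith(".docx"):
                doc = os.path.join(dirpath, name)
                mirror = doc + SUFFIX
                # Start from scratch so members removed from the document disappear.
                shutil.rmtree(mirror, ignore_errors=True)
                with zipfile.ZipFile(doc) as zf:
                    zf.extractall(mirror)


def update_rebuild(root: str) -> None:
    """Recreate each foo.docx from foo.docx.unpacked/ (e.g. after hg update)."""
    for dirpath, dirnames, _filenames in os.walk(root):
        for d in list(dirnames):
            if d == ".hg" or d.endswith(SUFFIX):
                dirnames.remove(d)
            if not d.endswith(SUFFIX):
                continue
            mirror = os.path.join(dirpath, d)
            with zipfile.ZipFile(mirror[: -len(SUFFIX)], "w", zipfile.ZIP_DEFLATED) as zf:
                for sub, _dirs, files in os.walk(mirror):
                    for f in sorted(files):
                        full = os.path.join(sub, f)
                        zf.write(full, os.path.relpath(full, mirror).replace(os.sep, "/"))
```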
Be careful about renames. If you rename foo.docx to bar.docx, your precommit hook will need to delete foo.docx/ and add bar.docx/.
UPDATE (sorry for giving an entry-level answer to a 1k-rep user)
If you want to use the unpacked docx for core hg operations like diff (status can work with the packed file), you'd have to go with an extension. I think you can take a similar approach to the keyword extension and wrap the repo object with your own. I have written some extensions, but not at that hard-core level, so I can't provide more details.
If you want to get crazy, you could even merge the unpacked files. But it's probably safer to treat the document as binary and use an external tool to diff and merge.
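A sketch of such an external diff tool: given two versions of a document, it prints a unified diff of their XML parts with one header per member, close to the hg diff output shown in the question. The member filter (.xml and .rels), the UTF-8 decoding, and splitting at >< for readability are assumptions; the split text is only displayed, never written back into a document.

```python
import difflib
import sys
import zipfile


def xml_members(path: str) -> dict:
    """Map member name -> text for the XML parts of the zip-based document at path."""
    with zipfile.ZipFile(path) as zf:
        return {
            name: zf.read(name).decode("utf-8", errors="replace")
            for name in zf.namelist()
            if name.endswith((".xml", ".rels"))
        }


def lines(xml: str) -> list:
    """Break between adjacent tags so each element lands on its own line (display only)."""
    return [line + "\n" for line in xml.replace("><", ">\n<").splitlines()]


def doc_diff(old_path: str, new_path: str, label: str) -> str:
    """Unified diff of the XML inside two documents, with one header pair per member."""
    old, new = xml_members(old_path), xml_members(new_path)
    out = []
    for name in sorted(old.keys() | new.keys()):
        header = f"{label}/{name}"
        out.extend(difflib.unified_diff(lines(old.get(name, "")),
                                        lines(new.get(name, "")), header, header))
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python docdiff.py OLD NEW
    sys.stdout.write(doc_diff(sys.argv[1], sys.argv[2], sys.argv[2]))
```

Members that are identical in both versions produce no output, so only the parts that actually changed show up.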
I've been struggling with this exact problem for the last few days and have written a small .NET utility to extract and normalise Excel files in such a way that they're much easier to store in source control. I've published the executable here:
https://bitbucket.org/htilabs/ooxmlunpack/downloads/OoXmlUnpack.exe
...and the source here:
https://bitbucket.org/htilabs/ooxmlunpack
If there's any interest I'm happy to make this more configurable, but at the moment, you should put the executable in a folder (e.g. the root of your source repository) and when you run it, it will:
Clearly not all of these things are necessary, but the end result is a spreadsheet file that will still open in Excel but which is much more amenable to diffing and incremental compression. Also, storing the extracted files as well makes it much more obvious in the version history what changes have been applied in each version.
If there's any appetite out there, I'm happy to make the tool more configurable since I guess not everyone will want the contents extracted, or possibly the values removed from formula cells, but these are both very useful to me at the moment.
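For the formula-cell option, a rough sketch of the idea; this is not OoXmlUnpack's actual code, which isn't shown here. Worksheet parts of an .xlsx live under xl/worksheets/, and a formula cell stores its formula in <f> followed by the cached result in <v>. The regular expression assumes Excel's usual layout with <v> immediately after <f>, and it is used instead of an XML parse-and-reserialize round trip because that could rewrite namespace prefixes.

```python
import re
import zipfile

# A self-closing <f .../> (shared formula) or <f ...>text</f>, followed by the cached <v>.
_CACHED_VALUE = re.compile(r"(<f\b[^>]*/>|<f\b[^>]*(?<!/)>[^<]*</f>)<v>[^<]*</v>")


def strip_formula_values(sheet_xml: str) -> str:
    """Drop the cached <v> that follows each formula, keeping the formula itself."""
    return _CACHED_VALUE.sub(r"\1", sheet_xml)


def strip_workbook(src: str, dst: str) -> None:
    """Copy the .xlsx at src to dst, stripping cached values in every worksheet part."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info)
            name = info.filename
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                data = strip_formula_values(data.decode("utf-8")).encode("utf-8")
            zout.writestr(zipfile.ZipInfo(name, info.date_time), data,
                          compress_type=zipfile.ZIP_DEFLATED)
```

Plain value cells (a <v> with no preceding <f>) are left alone, so only computed results disappear from the stored history.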
In tests, a 2MB spreadsheet 'unpacks' to 21MB, but I was then able to store five versions of it, with small changes between each, in a 1.9MB Mercurial data file, and visualise the differences between versions effectively using Beyond Compare in text mode.