Why is my Git repository so much larger than the Mercurial version?
I've converted a Mercurial repository to Git, using fast-export. But the Git repository is huge: 18 GB for Git vs. 3.4 GB for Mercurial. None of my cleanup steps have helped.
My Mercurial repository is dominated by one 65 MB file (Anki flashcards in SQLite format) that gets updated daily. Its history has grown to be 2.9 GB, under .hg/store/data.
I was hoping Git might be able to compress the history a little better, but I have been unable to shrink the repository below 18 GB!
I have tried git prune, git gc, and others, to no avail. I even tried zipping the .git folder, and it still came out to be exactly 18 GB.
Am I missing something?
Update: I tried Bazaar (bzr), and it compressed my repository to only 2.3 GB. Nice!
Comments (3)
One reason could be that Mercurial has a very compact storage format that involves diffs, even for binaries. Since re-creating a version from a long chain of diffs can be very time consuming, it stores a full snapshot as soon as the diffs plus the old original exceed double the size of a full snapshot.
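To make that rule concrete, here is a minimal sketch of the threshold exactly as I described it above. It is not Mercurial's actual code; the function name and the 3 MB daily delta are made up for illustration, and only the 65 MB file size comes from the question.

```python
from typing import Sequence


def should_store_snapshot(base_size: int, delta_sizes: Sequence[int], full_size: int) -> bool:
    """Apply the rule as stated in this answer (an assumption, not Mercurial's source):
    store a full snapshot once base + accumulated deltas exceed 2x a full snapshot."""
    chain_cost = base_size + sum(delta_sizes)
    return chain_cost > 2 * full_size


# Hypothetical numbers: a 65 MB file with a 3 MB delta added every day.
FILE_MB = 65
DAILY_DELTA_MB = 3  # assumed, not measured

days = 0
deltas = []
while not should_store_snapshot(FILE_MB, deltas, FILE_MB):
    deltas.append(DAILY_DELTA_MB)
    days += 1

print("a new full snapshot would be stored after", days, "daily deltas")
```

Under those assumed numbers the chain is cut roughly every three weeks, so the history stays mostly deltas with an occasional full copy.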
Personally, I would try storing a dump of your SQLite database instead of the database file itself and see where that gets you. It might be far more efficient.
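A minimal sketch of producing such a dump with Python's standard `sqlite3` module follows. The function name and the file names in the usage note are hypothetical; `Connection.iterdump()` is the standard-library call that yields the SQL statements. I have not tried this on an Anki collection, so treat it as a starting point.

```python
import sqlite3


def dump_sqlite_to_sql(db_path: str, dump_path: str) -> None:
    """Write a plain-text SQL dump of the database at db_path to dump_path."""
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, "w", encoding="utf-8") as out:
            # iterdump() yields the schema and data as SQL statements, one at a time.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()
```

You would call something like `dump_sqlite_to_sql("collection.db", "collection.sql")` before each commit and version the `.sql` file. A text dump changes line by line when a few rows change, which tends to suit diff-based storage better than the binary database pages.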
I do not know what git's storage format is. But I'm guessing it does not involve diffs in the same way as Mercurial's does.
If the git gc is failing, try manually running a git repack and then git gc.
My observations with SVN, Git and Hg:
I have always observed that SVN and Hg repositories were much smaller than the corresponding Git repositories. This is because, in Git, each change to a file, text or binary, adds a new full object for it. In SVN, only the diff is added, even for binaries, and the binary diffing in SVN is very good as well.
But this is where the pack files come in, since they store only the diffs (deltas) between similar objects and are compressed as well. Even with packing, I have observed that Git repositories tend to be larger, depending on the kind of files and the amount of change those files undergo. This is something I have come to accept with Git, and it is a compromise I am willing to make given how fast the various operations are with Git.
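The difference between one full object per change and a delta against a similar object can be shown with a toy example. This is only an illustration: the prefix/suffix delta below is a made-up scheme, not Git's pack format, and the sizes are synthetic random bytes standing in for a binary file.

```python
import random
import zlib


def naive_delta(old: bytes, new: bytes):
    """Return (prefix_len, suffix_len, middle): the part of `new` that differs from `old`."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new version from the old one plus the delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


rng = random.Random(0)
version1 = bytes(rng.getrandbits(8) for _ in range(20000))
patch = bytes(rng.getrandbits(8) for _ in range(100))
version2 = version1[:5000] + patch + version1[5100:]  # a small in-place change

# One compressed full object per version.
full_objects_size = len(zlib.compress(version1)) + len(zlib.compress(version2))

# One compressed full object plus a compressed delta (16 bytes assumed for bookkeeping).
delta = naive_delta(version1, version2)
delta_size = len(zlib.compress(version1)) + len(zlib.compress(delta[2])) + 16

print("two full objects:", full_objects_size, "bytes")
print("base plus delta: ", delta_size, "bytes")
```

With a small change to a poorly compressible file, the second figure is close to half the first, which is the saving packing is meant to recover.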
Running git gc --aggressive on a repository migrated from Mercurial worked for me. It reduced from 500 MB to 150 MB.
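If you want to script this and see the before and after sizes, a small wrapper along these lines might help. It is a sketch under assumptions: the function names are hypothetical, the `use_repack` variant follows the repack-then-gc suggestion from another comment, and the window and depth values of 250 are arbitrary picks rather than recommendations. The git subcommands and flags themselves (`count-objects -vH`, `gc --aggressive`, `repack -a -d -f --window --depth`) are standard.

```python
import subprocess


def cleanup_commands(use_repack: bool = False, window: int = 250, depth: int = 250):
    """Build the git commands to run; sizes are reported before and after."""
    report = ["git", "count-objects", "-vH"]
    if use_repack:
        # Recompute all deltas with a wider search window, then do a normal gc.
        work = [
            ["git", "repack", "-a", "-d", "-f", f"--window={window}", f"--depth={depth}"],
            ["git", "gc"],
        ]
    else:
        work = [["git", "gc", "--aggressive"]]
    return [report] + work + [report]


def run_cleanup(repo_path: str, use_repack: bool = False) -> None:
    """Run the commands inside repo_path, stopping at the first failure."""
    for cmd in cleanup_commands(use_repack=use_repack):
        subprocess.run(cmd, cwd=repo_path, check=True)
```

Calling `run_cleanup("/path/to/repo")` prints the object counts and pack size before and after. On a repository dominated by one large binary file this can take a long time and a lot of memory, and there is no guarantee it will get anywhere near the Mercurial size.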