I just saw the first Git tutorial at http://blip.tv/play/Aeu2CAI.
How does Git store all the versions of all the files, and how can it still be more economical in space than Subversion, which saves only the latest version of the code?
I know this can be done using compression, but that would come at the cost of speed, yet the tutorial also says that Git is much faster (although its biggest gain comes from the fact that most of its operations are offline).
So, my guess is that
- Git compresses data extensively
- It is still faster because uncompression + work is still faster than network_fetch + work
Am I correct? Even close?
I assume you are asking how it is possible for a git clone (full repository + checkout) to be smaller than checked-out sources in Subversion. Or did you mean something else?
This question is answered in the comments
Repository size
First, you should take into account that alongside the checkout (working version), Subversion stores a pristine copy (the last version) in those .svn subdirectories. The pristine copy is stored uncompressed in Subversion.
Second, Git uses the following techniques to make the repository smaller:
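As one concrete illustration of the space savings: Git is documented to store each file version as a content-addressed object (the SHA-1 of a short header plus the content) and to zlib-compress it, so identical content is kept only once. Below is a minimal sketch, assuming an in-memory dict stands in for the object database. The ObjectStore class, its method names, and the sample file contents are hypothetical and are not Git's API.

```python
import hashlib
import zlib


class ObjectStore:
    """Hypothetical in-memory stand-in for Git's object database."""

    def __init__(self):
        self.objects = {}  # hex object id -> zlib-compressed bytes

    def put_blob(self, content: bytes) -> str:
        # Git hashes a small header plus the content: b"blob <size>\0<content>".
        data = b"blob " + str(len(content)).encode() + b"\0" + content
        oid = hashlib.sha1(data).hexdigest()
        # Identical content always maps to the same id, so it is stored once.
        self.objects.setdefault(oid, zlib.compress(data))
        return oid

    def get_blob(self, oid: str) -> bytes:
        data = zlib.decompress(self.objects[oid])
        # Drop the "blob <size>" header; everything after the first NUL is content.
        _header, _, content = data.partition(b"\0")
        return content


store = ObjectStore()
v1 = b"int main(void) { return 0; }\n" * 200  # made-up file content
v2 = v1 + b"/* one more line */\n"

# Ten "commits" that only ever contain two distinct versions of the file.
ids = [store.put_blob(v1 if i % 2 == 0 else v2) for i in range(10)]

stored_bytes = sum(len(blob) for blob in store.objects.values())
raw_bytes = 10 * len(v1)
print(f"{len(store.objects)} objects, {stored_bytes} bytes stored vs {raw_bytes} bytes raw")
```

Ten snapshots end up as two stored objects here, which is the effect the question is asking about. Real repositories additionally pack objects as deltas, which this sketch does not attempt to model.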
Performance (speed of operations)
First, any operation that involves the network is much slower than a local operation. For example, comparing the current state of the working area with some other version, or getting a log (history), involves a network connection and network transfer in Subversion but is a local operation in Git, so it is of course much slower in Subversion. BTW, this is a difference between centralized version control systems (using a client-server workflow) and distributed version control systems (using a peer-to-peer workflow) in general, not only between Subversion and Git.
Second, if I understand it correctly, nowadays the limiting factor is not CPU but IO (disk access). It is therefore possible that the gain from reading less data from disk thanks to compression (and being able to mmap it in memory) outweighs the cost of decompressing the data.
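To get a feel for this trade-off on your own machine, here is a rough sketch that compresses some repetitive, source-like text with zlib (the compression library Git uses for objects) and times the decompression. The sample text is synthetic and purely an assumption, so the ratio and timing it prints say nothing definitive about real repositories.

```python
import time
import zlib

# Synthetic, repetitive "source code" standing in for real files (an assumption).
raw = b"".join(
    b"def handler_%d(request):\n    return render(request, 'page_%d.html')\n\n" % (i, i)
    for i in range(5000)
)
packed = zlib.compress(raw)

# Time only the decompression step, i.e. the extra CPU work paid on each read.
start = time.perf_counter()
restored = zlib.decompress(packed)
decompress_seconds = time.perf_counter() - start

ratio = len(packed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, ratio={ratio:.2f}")
print(f"decompression took {decompress_seconds * 1000:.2f} ms")
```

Comparing the decompression time with how long your disk (or network) takes to deliver the bytes saved gives a first estimate of whether compression pays off.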
Third, Git was designed with performance in mind (see e.g. the GitHistory page on the Git Wiki):
- [...] core.trustctime config variable).
- [...] pack.depth, which defaults to 50. Git has a delta cache to speed up access. There is a (generated) packfile index for fast access to objects in a packfile.
- [...] "git log" as fast as possible, and you see it almost immediately, even if generating the full history would take more time; it doesn't wait for the full history to be generated before displaying it.
I am not a Git hacker, and I probably missed some techniques and tricks that Git uses for better performance. Note, however, that Git relies heavily on POSIX features (such as memory-mapped files) for that, so the gain might not be as large on MS Windows.
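The point about "git log" showing its first page almost immediately can be illustrated with lazy iteration: produce history entries one at a time and stop once the first page is filled. This is only a sketch of the general idea, not how Git is implemented. The commits dict, walk_history, traced, and the page size of 20 are all hypothetical.

```python
from itertools import islice


def walk_history(commits, head):
    """Yield commit ids from head back to the root, one at a time."""
    current = head
    while current is not None:
        yield current
        current = commits[current]["parent"]


def traced(history, visited):
    """Record which commits were actually produced by the walk."""
    for commit_id in history:
        visited.append(commit_id)
        yield commit_id


# Hypothetical linear history: commit i has parent i - 1, commit 0 is the root.
commits = {
    i: {"parent": i - 1 if i > 0 else None, "msg": f"change {i}"}
    for i in range(100_000)
}

visited = []
# Take just the first "page"; the remaining history is never walked.
first_page = list(islice(traced(walk_history(commits, 99_999), visited), 20))

for commit_id in first_page:
    print(commit_id, commits[commit_id]["msg"])
print(f"walked {len(visited)} of {len(commits)} commits")
```

Because the walk is a generator, the cost of showing the first page depends on the page size rather than on the length of the whole history.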
Not a complete answer, but those comments (from AlBlue) might help on the space management aspect of the question:
As for the speed aspect, I mentioned it in my "How fast is git over subversion with remote operations?" answer (as Linus said in his Google presentation, paraphrasing here: "anything involving network will just kill the performance").
The GitBenchmark document mentioned by Jakub Narębski is also a good addition, even though it doesn't deal directly with Subversion.
It does list the kinds of operations you need to monitor on a DVCS, performance-wise.
Other Git benchmarks are mentioned in this SO question.