
How does a Mercurial repository grow over time?

Posted 2024-09-29 23:32:13

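Suppose I create a repository, add x files to it, and commit. Suppose the size after the initial commit is a MB.

Is there any way to estimate how large the repository will be after a year?
If the number of lines of code increases by 10%, will the repository grow correspondingly?
How do the numbers of commits, branches, tags, etc. affect the size of the repository?
Will 10,000 commits in the same year make the repository grow (noticeably) more than 1,000 commits?
Or is my question worded wrongly?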


爱的故事 2024-10-06 23:32:13

Changes to a Mercurial repository are stored either as full files or as compressed deltas against the previous revision:

https://www.mercurial-scm.org/wiki/FAQ#FAQ.2BAC8-TechnicalDetails.How_does_Mercurial_store_its_data.3F

Mercurial decides whether to store a full file or a delta based on how much has changed.
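To make the full-versus-delta idea concrete, here is a minimal sketch. It is a simplification, not Mercurial's actual revlog code: a context-free unified diff from `difflib` stands in for Mercurial's binary delta format, `zlib` stands in for revlog compression, and the `CHAIN_FACTOR` rule (take a new full snapshot once the accumulated deltas get too large relative to the file) is a hypothetical threshold chosen for illustration.

```python
import difflib
import zlib

# Hypothetical threshold: once the compressed deltas accumulated since the last
# full snapshot would exceed this multiple of the new revision's size, store a
# full snapshot instead. Mercurial's real heuristic differs in detail.
CHAIN_FACTOR = 2.0


def compressed_size(data: bytes) -> int:
    """Size of data after zlib compression."""
    return len(zlib.compress(data))


def make_delta(old: str, new: str) -> bytes:
    """A unified diff with no context lines, standing in for a binary delta."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True), n=0
    )
    return "".join(diff).encode()


def store_revisions(revisions):
    """Return one ("full" | "delta", stored_bytes) entry per revision."""
    log = []
    chain = 0  # compressed delta bytes since the last full snapshot
    prev = None
    for text in revisions:
        raw = text.encode()
        if prev is not None:
            delta = compressed_size(make_delta(prev, text))
            if chain + delta <= CHAIN_FACTOR * len(raw):
                log.append(("delta", delta))
                chain += delta
                prev = text
                continue
        # First revision, or the delta chain got too expensive: full snapshot.
        log.append(("full", compressed_size(raw)))
        chain = 0
        prev = text
    return log


base = "".join(f"line {i}\n" for i in range(200))
edited = base.replace("line 5\n", "line five\n")
print(store_revisions([base, edited]))  # a full snapshot, then a small delta
print(store_revisions(["a\n", "b\n"]))  # delta outweighs a 2-byte file: two full snapshots
```

A one-line edit to a large file costs only a few dozen bytes, while a change that is large relative to the file ends up stored as a new full copy.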

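This means the total size of the repository grows not only with the lines of code you add, but also with:

the number of changes made to existing code;
the number of changes made to each file in each commit;
the number of files added and later deleted.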

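Mercurial keeps every file that has ever been committed, even after it is deleted. You could add a 1 GB file to your repository and then delete it; the line count would not increase, but the repository would become considerably larger because the file remains in its history.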
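The following toy model illustrates why deleting a file does not shrink the repository: history is append-only, so removal only changes the working copy. It is a sketch under stated simplifications (no compression, no metadata); the class and method names are hypothetical and are not part of Mercurial's API.

```python
class AppendOnlyHistory:
    """Toy model of a history that never forgets: every committed revision is
    kept, and removing a file only records the removal. Compression and
    metadata are ignored."""

    def __init__(self):
        self.working_copy = {}  # path -> size in bytes of the current version
        self.stored_bytes = 0   # bytes ever written into history

    def commit_add(self, path: str, size: int) -> None:
        self.working_copy[path] = size
        self.stored_bytes += size

    def commit_remove(self, path: str) -> None:
        # The file leaves the working copy, but its old revision stays in history.
        del self.working_copy[path]

    def working_size(self) -> int:
        return sum(self.working_copy.values())


h = AppendOnlyHistory()
h.commit_add("huge.bin", 1 << 30)  # a 1 GiB file
h.commit_remove("huge.bin")
print(h.working_size(), h.stored_bytes)  # 0 1073741824
```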

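To answer your questions in turn:

I think a rough estimate of the repository's size after x months is feasible, provided the overall rate of change stays steady (i.e., files are added, deleted, and changed at the same rate, and each commit changes roughly the same number of lines).
A 10% increase in lines of code tells us nothing about how many lines were deleted or changed, so it does not necessarily translate into the same increase in repository size.
Tags add no more than a few bytes to a Mercurial repository. Neither do branches, until you start committing to them, at which point they add the same overhead as committing on the tip. Assuming the same rate of change, the number of commits should be roughly proportional to the repository size.
Committing 10 times as often probably won't make the repository much bigger, because the main driver of repository size is the rate of change, not the number of commits.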
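Under the steady-rate assumption above, a back-of-the-envelope model is size ≈ a + months × (delta bytes per month) + commits × (per-commit overhead). The sketch below assumes that model; every number in it is hypothetical, including the 300-byte per-commit overhead, which is a placeholder and not a measured Mercurial figure. Note that lines of code do not appear in it at all, only stored change and commit count.

```python
def projected_size(initial_bytes: int, months: int, delta_bytes_per_month: int,
                   commits: int, overhead_per_commit: int = 300) -> int:
    """Rough repository size after `months`, assuming a steady rate of change.

    delta_bytes_per_month: stored (compressed) delta bytes added per month.
    commits: total commits over the period.
    overhead_per_commit: hypothetical fixed cost of one commit's metadata.
    """
    return (initial_bytes
            + months * delta_bytes_per_month
            + commits * overhead_per_commit)


# Same amount of change in a year, split into 1,000 vs 10,000 commits.
print(projected_size(5_000_000, 12, 2_000_000, 1_000))   # 29300000
print(projected_size(5_000_000, 12, 2_000_000, 10_000))  # 32000000
```

With these made-up numbers, ten times as many commits adds about 9% because the delta term dominates; a larger per-commit overhead or smaller deltas would shift that ratio.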


╰◇生如夏花灿烂 2024-10-06 23:32:13


Directly estimating the size in a year is obviously impossible, unless you have some idea of the number of commits and the final size of the work tree.

That said, git is pretty disk-space efficient. It absolutely never stores more than one copy of a given version of a file (this is internally represented as a blob), and older blobs are delta-compressed into packs. This means that it is very efficient at storing plain text, and very inefficient with large binary files. If your project is predominantly plain text, you almost certainly have nothing to worry about.
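The "never stores more than one copy" property follows from content addressing: git names a blob by hashing a `blob <size>\0` header followed by the file's bytes (in the default SHA-1 object format; repositories can also use SHA-256). The sketch below computes that ID and uses a plain dict as a hypothetical stand-in for the object database. Real git additionally zlib-compresses loose objects and delta-compresses packs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to file contents in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Toy object store keyed by blob ID: identical contents collapse to one entry.
store = {}
files = [("a.txt", b"same text\n"), ("copy/a.txt", b"same text\n"), ("b.txt", b"other\n")]
for _, data in files:
    store[git_blob_id(data)] = data

print(len(store))        # 2
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Running `git hash-object` on a file with the same contents should report the same ID.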

Branches and tags have essentially no effect on size. Sure, a branch's reflog could get up to a few KB, but that's nothing to worry about. Lightweight tags are pretty much just a stored SHA1, and annotated tags just add a tiny bit of metadata to that.
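To put a number on "pretty much just a stored SHA1": in its loose form, a lightweight tag is a file under `refs/tags/` containing the 40-hex-digit commit ID and a newline. The sketch writes one into a temporary directory rather than a real repository, and the commit ID is a hypothetical placeholder. Git may also move refs into a single `packed-refs` file, which this sketch does not model.

```python
import os
import tempfile


def write_lightweight_tag(git_dir: str, name: str, commit_id: str) -> int:
    """Write a loose lightweight tag ref and return the file's size in bytes."""
    path = os.path.join(git_dir, "refs", "tags", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(commit_id.encode("ascii") + b"\n")  # 40 hex digits + newline
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as git_dir:
    placeholder_id = "a" * 40  # hypothetical commit ID, not a real object
    print(write_lightweight_tag(git_dir, "v1.0", placeholder_id))  # 41
```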

As for lines of code and number of commits, it's hard to say exactly. Generally the commits are a much bigger factor than the lines of code; you can have many, many versions of files all adding up (even when represented as deltas), but the current content only has to be stored once. This is backed up by the fact that work trees tend to be much smaller than the .git directory. For example, my clone of git.git has a 17MB work tree and a 39MB .git directory. Other projects I examined had similar ratios.
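To check the work-tree-to-.git ratio on your own projects, a small script can sum file sizes on each side. This sketch reports apparent file sizes (what `os.path.getsize` returns), which can differ from the disk blocks that `du` counts. The `meta_dir` parameter is an illustrative addition so the same function works for a Mercurial `.hg` directory, and `demo_layout` builds a tiny fake repository layout purely for demonstration.

```python
import os
import tempfile


def dir_size(root, skip=None):
    """Sum of apparent file sizes under root, not descending into `skip` dirs."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if skip in dirnames:
            dirnames.remove(skip)  # prune in place so os.walk skips it
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def repo_sizes(repo_root, meta_dir=".git"):
    """Return (work tree bytes, metadata bytes); use meta_dir=".hg" for Mercurial."""
    work = dir_size(repo_root, skip=meta_dir)
    meta = dir_size(os.path.join(repo_root, meta_dir))
    return work, meta


def demo_layout():
    """Build a tiny fake layout in a temp dir and measure it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".git", "objects"))
        with open(os.path.join(root, "main.py"), "wb") as f:
            f.write(b"x" * 100)
        with open(os.path.join(root, ".git", "objects", "pack.bin"), "wb") as f:
            f.write(b"y" * 300)
        return repo_sizes(root)


print(demo_layout())  # (100, 300)
```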

More commits of equal size would certainly make the repository grow more, but taking 1000 commits and splitting them up into 10000 (encompassing the same changes) wouldn't make the repository much bigger. The commit objects themselves are small; it's the differences in the files that take space. You might see an initial spike in size, as commits are only periodically delta-compressed, but once git gc --auto gets triggered, those commits will get compressed back down.
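As a rough check on "the commit objects themselves are small", the sketch below serializes a minimal git commit object (header `commit <size>\0`, then `tree`, `parent`, `author`, and `committer` lines, a blank line, and the message). The IDs, author, and timestamp are hypothetical placeholders, and real commits can carry extra headers such as signatures, so actual sizes vary.

```python
import zlib


def commit_object(tree_id, parent_id, author, timestamp, message):
    """Serialize a minimal git commit object (header + body) before compression."""
    body = (
        f"tree {tree_id}\n"
        f"parent {parent_id}\n"
        f"author {author} {timestamp} +0000\n"
        f"committer {author} {timestamp} +0000\n"
        f"\n"
        f"{message}\n"
    ).encode()
    return b"commit " + str(len(body)).encode() + b"\0" + body


obj = commit_object(
    tree_id="1" * 40,    # hypothetical placeholder IDs
    parent_id="2" * 40,
    author="Jane Doe <jane@example.com>",
    timestamp=1700000000,
    message="Fix typo in README",
)
print(len(obj))                 # 232
print(len(zlib.compress(obj)))  # size as a zlib-compressed loose object
```

A couple of hundred bytes per commit is small next to the file deltas that commits carry, which is why splitting the same changes into more commits adds relatively little.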

The best generalization I can make is that a repository's .git directory will tend to grow at a rate proportional to the amount of delta per time, which in general should be proportional to work tree size and the rate at which people are modifying the project. This is of course so general as to be completely unhelpful, but there you are.

If you want to estimate, I'd just take some data over the first month or so, and try and fit a curve.
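One simple way to "fit a curve" is an ordinary least-squares straight line through periodic size samples (for example, from `du -sk .git`), extrapolated to one year. The straight line is an assumption that matches the steady-rate case; if growth is clearly non-linear, a different curve would be needed. The measurements below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(days, sizes, target_day):
    """Predict the size on target_day from the fitted line."""
    slope, intercept = fit_line(days, sizes)
    return intercept + slope * target_day


# Hypothetical weekly measurements of the .git directory, in MB.
days = [0, 7, 14, 21, 28]
sizes_mb = [5.0, 5.4, 5.9, 6.3, 6.8]
print(f"{extrapolate(days, sizes_mb, 365):.1f} MB after a year (rough guess)")
```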
