How does a Mercurial repository grow over time?

Posted on 2024-09-29 23:32:13

Let's say I create a repository, add x files to it and commit. Say the size is a MB after the initial commit.

  • Is there any way to estimate how large the repository is going to be in one year's time?

  • If the number of lines of code has increased by 10%, will the repository have grown accordingly?

  • How do the number of commits, branches, tags, etc. factor into the repository size?

  • Will 10000 commits in the same year make the repository grow (noticeably) more than, say, 1000 commits?

  • Maybe my question is wrongly phrased?

Comments (4)

爱的故事 2024-10-06 23:32:13

Changes to a Mercurial repository are stored as either a complete file or as a compressed delta against the previous version:

https://www.mercurial-scm.org/wiki/FAQ#FAQ.2BAC8-TechnicalDetails.How_does_Mercurial_store_its_data.3F

Mercurial decides whether to store a complete file or a delta based on the amount of change involved.

This means that it's not just adding lines of code that will increase the total size of a repository, but also:

  1. The number of changes made to existing code.
  2. The number of changes made to each file per commit.
  3. The number of files that are added and subsequently deleted.

Mercurial retains all deleted files. You could add a 1GB file to your repository and then delete it; the number of lines hasn't increased, but because the file remains in the repository, the repository will be considerably larger.

To answer your questions in turn:

  • I imagine it's feasible to roughly estimate the size of a repository after x months, assuming that you maintain a steady overall rate of change to the repository (i.e. you add/remove/alter files at the same rate, changing roughly the same number of lines per commit).

  • Increasing the number of lines of code by 10% doesn't tell us how many lines were deleted/altered, so an increase in lines of code won't necessarily correspond to the same increase in repo size.

  • Tags don't affect Mercurial repo size by more than a handful of bytes. Nor do branches, until you start working on them, at which point they add the same overhead as working on the tip. Repo size should be reasonably proportional to the number of commits, assuming each commit carries roughly the same amount of change.

  • Committing 10x as often probably won't increase the repository size much, as it is the rate of change that is the main influence on repo size, not the number of commits.
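
To make the "rate of change, not line count" point concrete, here is a toy sketch in Python. It is not Mercurial's actual storage format: `stored_size`, `make_lines` and the two example histories are hypothetical names invented for illustration, and the model simply stores each revision as either a zlib-compressed full text or a zlib-compressed line diff against the previous revision, whichever is smaller. Both histories end with the same 220-line file, but one of them rewrites half the file and then reverts it along the way.

```python
import difflib
import random
import zlib


def stored_size(revisions):
    """Toy model: bytes needed to keep every revision of one file.

    The first revision is kept as a compressed full text. Each later revision
    is kept as a compressed line diff against its predecessor, or as a
    compressed full text when that is smaller.
    """
    total = 0
    previous = None
    for text in revisions:
        full = len(zlib.compress(text.encode()))
        if previous is None:
            total += full
        else:
            diff = "".join(difflib.unified_diff(
                previous.splitlines(keepends=True),
                text.splitlines(keepends=True)))
            total += min(full, len(zlib.compress(diff.encode())))
        previous = text
    return total


def make_lines(count, seed):
    """Generate reproducible pseudo-random lines that compress poorly."""
    rng = random.Random(seed)
    return [f"line {i}: {rng.getrandbits(64):016x}\n" for i in range(count)]


base = make_lines(200, seed=1)
grown = base + make_lines(20, seed=2)           # net growth: 10% more lines
rewritten = make_lines(100, seed=3) + base[100:]  # half the file replaced

# Same starting and final content, different amounts of churn in between.
quiet = ["".join(base), "".join(grown)]
churned = ["".join(base), "".join(rewritten), "".join(grown)]

quiet_size = stored_size(quiet)
churned_size = stored_size(churned)
print(quiet_size, churned_size)
```

In this model the churned history needs more storage than the quiet one even though both end with identical content, which is why a 10% increase in lines of code says little about repository growth on its own.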

╰◇生如夏花灿烂 2024-10-06 23:32:13

Directly estimating the size in a year is obviously impossible, unless you have some idea of the number of commits and the final size of the work tree.

That said, git is pretty disk-space efficient. It absolutely never stores more than one copy of a given version of a file (this is internally represented as a blob), and older blobs are delta-compressed into packs. This means that it is very efficient at storing plain text, and very inefficient with large binary files. If your project is predominantly plain text, you almost certainly have nothing to worry about.

Branches and tags have essentially no effect on size. Sure, a branch's reflog could get up to a few KB, but that's nothing to worry about. Lightweight tags are pretty much just a stored SHA1, and annotated tags just add a tiny bit of metadata to that.

As for lines of code and number of commits, it's hard to say exactly. Generally the commits are a much bigger factor than the lines of code; you can have many, many versions of files all adding up (even when represented as deltas), whereas the current content only has to be stored once. This is backed up by the fact that work trees tend to be much smaller than the .git directory. For example, my clone of git.git has a 17MB work tree and a 39MB .git directory. Other projects I examined had similar ratios.

More commits of equal size would certainly make the repository grow more, but taking 1000 commits and splitting them up into 10000 (encompassing the same changes) wouldn't make the repository much bigger. The commit objects themselves are small; it's the differences in the files that take space. You might see an initial spike in size, as commits are only periodically delta-compressed, but once git gc --auto gets triggered, those commits will get compressed back down.

The best generalization I can make is that a repository's .git directory will tend to grow at a rate proportional to the amount of delta per time, which in general should be proportional to work tree size and the rate at which people are modifying the project. This is of course so general as to be completely unhelpful, but there you are.

If you want to estimate, I'd just take some data over the first month or so, and try to fit a curve.
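
As a minimal sketch of that approach, the Python below fits a straight line by ordinary least squares and extrapolates it. The weekly sizes are hypothetical placeholder values, not measurements, and a straight line is only an assumption: it matches the "steady rate of change" case and may be a poor fit for a real project.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical placeholder data: repository size in MB at the end of each
# of the first four weeks. Replace with your own measurements.
weeks = [1, 2, 3, 4]
sizes_mb = [12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(weeks, sizes_mb)
projected_mb = slope * 52 + intercept  # extrapolate to week 52
print(f"growth per week: {slope:.2f} MB, projected size: {projected_mb:.1f} MB")
```

Extrapolating a month of data out to a year is a rough guide at best, so it is worth re-fitting as more measurements come in.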

落日海湾 2024-10-06 23:32:13

Take a look at the GitBenchmarks page on the Git wiki, specifically the sections "Repository size benchmarks" and "Other benchmarks and references" (taking into account when each benchmark was made and which versions it used), and in particular the entry at the end of the page:

  • DVCS Round-up: One System to Rule Them All? -- Part 3 by Robert Fendt on Linux Developer Network, from 27-01-2009, contains results of two synthetic benchmarks testing how a system acts under stress (number of commits in the repository, or number of files committed).

    The test system was a VM running Ubuntu 8.10, and the software versions used were SVK 2.0.2 (latest is 2.2.3), darcs 2.1.0 (latest is 2.4.4), monotone 0.42 (latest is 0.48), Bazaar 1.10 (latest is 2.2.1), Mercurial 1.1.2 (latest is 1.6.4), and Git 1.6.1 (latest is 1.7.3).

郁金香雨 2024-10-06 23:32:13

If you're worried about size mushrooming, go and clone some online projects and examine the size of their repositories. There are plenty of large projects to choose from, with branches, commits, etc. My experience is that git and Mercurial are pretty good about keeping size down; the size is more a reflection of the files that you put into them (and their size) than of overhead.
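
One way to do that comparison is with a small Python helper like the sketch below. `tree_size` and `repo_sizes` are hypothetical names, and the only assumption about the repository is that its history lives in a top-level metadata directory (`.hg` for Mercurial, `.git` for git). It reports apparent file sizes, not disk blocks actually used.

```python
import os


def tree_size(root, skip=()):
    """Sum the sizes in bytes of regular files under root.

    Directory names listed in `skip` are ignored only at the top level.
    """
    total = 0
    for current, dirs, files in os.walk(root):
        if current == root:
            # Prune in place so os.walk does not descend into skipped dirs.
            dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            path = os.path.join(current, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def repo_sizes(repo_root, meta_dir=".hg"):
    """Return (history_bytes, work_tree_bytes) for a cloned repository."""
    history = tree_size(os.path.join(repo_root, meta_dir))
    work_tree = tree_size(repo_root, skip=(meta_dir,))
    return history, work_tree
```

For example, `repo_sizes("path/to/clone")` measures a Mercurial clone and `repo_sizes("path/to/clone", meta_dir=".git")` a git one; the ratio of the two numbers shows how much of the clone is history rather than checked-out files.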
