How often should you use git-gc?

Posted on 2024-07-04 16:09:37

How often should you use git-gc?

The manual page simply says:

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

Are there some commands to get some object counts to find out whether it's time to gc?

Comments (11)

小…红帽 2024-07-11 16:09:37

It depends mostly on how much the repository is used. With one user checking in once a day and a branch/merge/etc. operation once a week, you probably don't need to run it more than once a year.

With several dozen developers working on several dozen projects each checking in 2-3 times a day, you might want to run it nightly.

It won't hurt to run it more frequently than needed, though.

What I'd do is run it now, then a week from now take a measurement of disk utilization, run it again, and measure disk utilization again. If it drops 5% in size, then run it once a week. If it drops more, then run it more frequently. If it drops less, then run it less frequently.
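The measure-then-adjust approach above could be scripted roughly as follows. This is only a sketch: the function names repo_size_kb, percent_drop and suggest_interval are made up for illustration, and the 5% pivot is simply the figure suggested in this answer.

```shell
# Size of a repository's .git directory in KiB (du -sk is POSIX).
repo_size_kb() (
    du -sk "${1:-.}/.git" | awk '{ print $1 }'
)

# Integer percentage by which the size shrank: percent_drop BEFORE AFTER
percent_drop() (
    before=$1
    after=$2
    if [ "$before" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( (before - after) * 100 / before ))
)

# Turn the measured drop into the advice given in the answer above.
suggest_interval() (
    drop=$1
    if [ "$drop" -gt 5 ]; then
        echo "run gc more often"
    elif [ "$drop" -eq 5 ]; then
        echo "keep the current schedule"
    else
        echo "run gc less often"
    fi
)
```

A possible session: record before=$(repo_size_kb), run git gc, record after=$(repo_size_kb), then call suggest_interval "$(percent_drop "$before" "$after")".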

只等公子 2024-07-11 16:09:37

Recent versions of git run gc automatically when required, so you shouldn't have to do anything. See the Options section of man git-gc(1): "Some git commands run git gc --auto after performing operations that could create many loose objects."
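To make "when required" a little more concrete: according to the git-config documentation, git gc --auto acts when the number of loose objects exceeds gc.auto (default 6700) or the number of packs exceeds gc.autoPackLimit (default 50). The helper below only mirrors that documented rule. The name auto_gc_due is hypothetical, and Git's own check works from an estimate, so treat the result as an approximation.

```shell
# auto_gc_due LOOSE PACKS [LOOSE_LIMIT] [PACK_LIMIT]
# Succeeds (exit status 0) when either documented threshold is exceeded.
auto_gc_due() (
    loose=$1
    packs=$2
    loose_limit=${3:-6700}   # documented default of gc.auto
    pack_limit=${4:-50}      # documented default of gc.autoPackLimit

    # gc.auto = 0 switches automatic gc off altogether.
    if [ "$loose_limit" -le 0 ]; then
        return 1
    fi
    [ "$loose" -gt "$loose_limit" ] && return 0
    # gc.autoPackLimit = 0 disables only the pack-count check.
    [ "$pack_limit" -gt 0 ] && [ "$packs" -gt "$pack_limit" ]
)
```

The configured limits, if any, can be read with git config --get gc.auto and git config --get gc.autoPackLimit; both print nothing when the value is unset.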

久随 2024-07-11 16:09:37

If you're using Git-Gui, it tells you when you should worry:

This repository currently has approximately 1500 loose objects.

The following command gives a similar number:

$ git count-objects

However, judging from its source, git-gui does the math itself: it counts something under the .git/objects folder and probably produces an approximation (I don't know Tcl well enough to read it properly!).

In any case, it seems to give the warning based on an arbitrary threshold of around 300 loose objects.
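For scripting, git count-objects -v is easier to work with than the plain form, because it prints one "key: value" pair per line (count, size, in-pack, packs and so on). A small sketch for pulling out a single field follows; count_objects_field is a made-up helper name, and it reads the command's output on stdin so it can be tried without a repository.

```shell
# Print one field (for example count or packs) from the output of
# `git count-objects -v`, which is expected on stdin.
count_objects_field() (
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
)
```

Inside a repository this would be used as: git count-objects -v | count_objects_field count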

眸中客 2024-07-11 16:09:37

Drop it in a cron job that runs every night (afternoon?) when you're sleeping.
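One way such a nightly job might look is sketched below. The function names, the script path and the /srv/git root are all assumptions made for the example, and only non-bare repositories (directories containing a .git directory) are picked up.

```shell
# Print every repository under ROOT (directories that contain a .git directory).
list_repos() (
    find "${1:-.}" -type d -name .git -prune | sed 's|/\.git$||' | sort
)

# Run `git gc --quiet` in each repository found under ROOT.
gc_all_repos() (
    list_repos "$1" | while IFS= read -r repo; do
        git -C "$repo" gc --quiet || echo "gc failed in $repo" >&2
    done
)

# Hypothetical crontab entry (02:30 every night), assuming this file is saved
# as /usr/local/bin/git-gc-all.sh and ends with the line: gc_all_repos "$1"
#   30 2 * * * /usr/local/bin/git-gc-all.sh /srv/git
```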

还不是爱你 2024-07-11 16:09:37

You can do it without any interruption, with the new (Git 2.0 Q2 2014) setting gc.autodetach.

See commit 4c4ac4d and commit 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):

gc --auto takes time and can block the user temporarily (but not any less annoyingly).
Make it run in background on systems that support it.
The only thing lost with running in background is printouts. But gc output is not really interesting.
You can keep it in foreground by changing gc.autodetach.


Since that 2.0 release, there was a bug though: git 2.7 (Q4 2015) will make sure to not lose the error message.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)

gc: save log from daemonized gc --auto and print it next time

While commit 9f673f9 (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly.
Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log.
Following gc --auto will not run and gc.log printed out until the user removes gc.log.
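As a rough illustration of the behaviour described above, the helpers below check whether a background gc --auto left messages in gc.log and, if so, print them and remove the file so that automatic gc is no longer blocked. The function names are invented for this sketch, and the git directory is passed in explicitly (defaulting to .git).

```shell
# Succeeds when a previous background `gc --auto` left a non-empty gc.log
# in the given git directory (default: .git).
has_gc_log() (
    [ -s "${1:-.git}/gc.log" ]
)

# Show the saved messages, then remove the log so `gc --auto` can run again.
show_and_clear_gc_log() (
    log="${1:-.git}/gc.log"
    [ -s "$log" ] || return 0
    cat "$log" && rm -f "$log"
)
```

To keep gc --auto in the foreground instead, the setting mentioned above can be switched off with git config gc.autoDetach false.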

淡水深流 2024-07-11 16:09:37

This quote is taken from Version Control with Git:

Git runs garbage collection automatically:

• If there are too many loose objects in the repository

• When a push to a remote repository happens

• After some commands that might introduce many loose objects

• When some commands such as git reflog expire explicitly request it

And finally, garbage collection occurs when you explicitly request it
using the git gc command. But when should that be? There’s no solid
answer to this question, but there is some good advice and best
practice.

You should consider running git gc manually in a few
situations:

• If you have just completed a git filter-branch. Recall that
filter-branch rewrites many commits, introduces new ones, and leaves
the old ones on a ref that should be removed when you are satisfied
with the results. All those dead objects (that are no longer
referenced since you just removed the one ref pointing to them)
should be removed via garbage collection.

• After some commands that might introduce many loose objects. This
might be a large rebase effort, for example.
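The filter-branch cleanup described in the first bullet follows the steps listed in the git-filter-branch documentation: delete the backup refs under refs/original/, expire the reflogs, then gc with immediate pruning. A cautious sketch is shown here; the names run and purge_rewritten_history and the DRY_RUN variable are assumptions of this example. The steps are destructive, so it is worth running once with DRY_RUN set.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

purge_rewritten_history() (
    # 1. Delete the backup refs that filter-branch keeps under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while IFS= read -r ref; do
        run git update-ref -d "$ref"
    done
    # 2. Expire reflog entries that still point at the old commits.
    run git reflog expire --expire=now --all
    # 3. Collect and prune unreachable objects right away.
    run git gc --prune=now
)
```

With DRY_RUN=1 the function only prints the update-ref, reflog and gc commands it would run; the read-only for-each-ref query is still executed.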

And on the flip side,
when should you be wary of garbage collection?

• If there are orphaned refs that you might want to recover

• In the context of git rerere and you do not need to save the
resolutions forever

• In the context of only tags and branches being sufficient to cause
Git to retain a commit permanently

• In the context of FETCH_HEAD retrievals (URL-direct retrievals via
git fetch) because they are immediately subject to garbage collection

眼趣 2024-07-11 16:09:37

I use git gc after I do a big checkout and have a lot of new objects. It can save space. E.g. if you check out a big SVN project using git-svn and then run git gc, you typically save a lot of space.

妄断弥空 2024-07-11 16:09:37

You don't have to use git gc very often, because git gc (Garbage collection) is run automatically on several frequently used commands:

git pull
git merge
git rebase
git commit

Source: git gc best practices and FAQS

以可爱出名 2024-07-11 16:09:37

For another point of view, note that you can have repos where you DO NOT WANT garbage collection, automatic or otherwise (repos used as reference repositories, possibly local clones, etc.), because some other repository relies on this repository's objects and may become invalid if objects disappear or the files they are in get different names.

This may be a fairly typical situation on a space-conscious CI farm with some single repository used as a baseline (maybe even over NFS or similar) to spawn build workspaces for many different test/build scenarios. There you can run git config gc.auto 0 in the repository to avoid mishaps (0 is the value the git-config documentation gives for disabling automatic gc), and use domain-specific scripting to GC only when you know it is safe (e.g. no builds running => no agents to corrupt mid-flight), or even never.
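The "only GC when you know it is safe" idea might be scripted along these lines. Everything here is an assumption made for illustration: the marker-file convention (each running build drops a NAME.building file into a shared directory) and the function names are invented. The check is also not atomic, so a build starting between the check and the gc could still be affected.

```shell
# Succeeds when no build marker files (*.building) exist in MARKER_DIR.
# The marker convention is invented for this example.
no_builds_running() (
    for marker in "$1"/*.building; do
        [ -e "$marker" ] && return 1
    done
    return 0
)

# gc_reference_repo REPO MARKER_DIR: run gc only while the farm is idle.
gc_reference_repo() (
    repo=$1
    markers=$2
    if no_builds_running "$markers"; then
        git -C "$repo" gc --quiet
    else
        echo "builds in progress, skipping gc" >&2
        return 1
    fi
)
```

Automatic gc would stay disabled in the reference repository itself, for example with git config gc.auto 0.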

Conversely, you may want to use a common reference repository and then detach workspace repos after instantiating the particular commit they would build (this copies just the needed objects, possibly sped up by shallowness/depth settings for that workspace) to make them independent and so reducing the time-window when it is critical to not-GC the main repository.

Some reasons to do this trickery include:

  • Using a CI farm with a slow link to the SCM platform (e.g. reaching out to GitHub, etc. from a corporate LAN), so that you only suffer the long-ish git clone or similar operations (and eat the uplink traffic, which may be costly in corporate setups) once per build and not for each scenario;
  • Making sure the commit you want to build is available to all agents during this build (if someone force-pushes to the original repo/branch on the SCM platform, as often happens during PR preparations from private forks, a direct checkout from it may be impossible by the time the build agent is ready to do the work, because the SCM platform claims the commit hash does not exist), or, for named branch builds, ensuring that the same tip commit is used in all scenarios of the same build (and yes, some teams do not shy away from redefining a git tag over time, too);
  • As a continuation of the above, your build scenario might in fact prepare and archive a tarball of the git repository (garbage-collected and all), and distribute it to build agents as a temporary artifact for faster workspace instantiation. Such an approach is more useful when the agents are not on the same build host or even the same LAN.

Source/Disclaimer: lessons learned while making https://github.com/networkupstools/jenkins-dynamatrix/blob/master/src/org/nut/dynamatrix/DynamatrixStash.groovy and similar projects

孤单情人 2024-07-11 16:09:37

I use it when I make a big commit, above all when I remove many files from the repository. Afterwards, commits are faster.

彼岸花ソ最美的依靠 2024-07-11 16:09:37

Note that the downside of garbage-collecting your repository is that, well, the garbage gets collected. As we all know as computer users, files we consider garbage right now might turn out to be very valuable three days in the future. The fact that git keeps most of its debris around has saved my bacon several times – by browsing all the dangling commits, I have recovered much work that I had accidentally canned.

So don’t be too much of a neat freak in your private clones. There’s little need for it.

OTOH, the value of data recoverability is questionable for repos used mainly as remotes, e.g. the place all the devs push to and/or pull from. There, it might be sensible to kick off a GC run and a repacking frequently.
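For the recovery scenario described above, git fsck reports unreachable objects with lines such as "dangling commit <id>". A small filter that keeps only the commit ids could look like this; dangling_commits is a hypothetical helper name, and it reads the fsck output on stdin.

```shell
# Read `git fsck` output on stdin and print the ids of dangling commits.
dangling_commits() (
    awk '$1 == "dangling" && $2 == "commit" { print $3 }'
)
```

In a repository this would be used as git fsck --lost-found | dangling_commits, after which each id can be inspected with git show and, if it is worth keeping, given a name with git branch.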
