How often should you use git-gc?
The manual page simply says:
Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.
Are there some commands to get some object counts to find out whether it's time to gc?
It depends mostly on how much the repository is used. With one user checking in once a day and a branch/merge/etc operation once a week you probably don't need to run it more than once a year.
With several dozen developers working on several dozen projects each checking in 2-3 times a day, you might want to run it nightly.
It won't hurt to run it more frequently than needed, though.
What I'd do is run it now, then a week from now take a measurement of disk utilization, run it again, and measure disk utilization again. If it drops 5% in size, then run it once a week. If it drops more, then run it more frequently. If it drops less, then run it less frequently.
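The measure-then-decide procedure above could be sketched roughly as follows. This is only an illustration: the function names and the `tolerance` parameter (how close to 5% still counts as "about 5%") are assumptions of the sketch, not anything Git provides.

```python
import os


def dir_size_bytes(path):
    """Total size of all regular files under path (e.g. a repository's .git directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def gc_schedule_advice(size_before, size_after, tolerance=1.0):
    """Apply the 5% rule of thumb to sizes measured before and after a gc run.

    tolerance is in percentage points and is an assumption of this sketch.
    """
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    drop = 100.0 * (size_before - size_after) / size_before
    if drop > 5.0 + tolerance:
        return "run more often than weekly"
    if drop < 5.0 - tolerance:
        return "run less often than weekly"
    return "run weekly"
```

One way to use it: call `dir_size_bytes` on the `.git` directory a week after the first gc, run `git gc`, measure again, and pass both numbers to `gc_schedule_advice`.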
Recent versions of git run gc automatically when required, so you shouldn't have to do anything. See the Options section of man git-gc(1): "Some git commands run git gc --auto after performing operations that could create many loose objects."
If you're using Git-Gui, it tells you when you should worry:
The following command will bring a similar number:
Except that, judging from its source, git-gui does the math by itself: it actually counts something in the .git/objects folder and probably comes up with an approximation (I don't know Tcl well enough to read that properly!).
In any case, it seems to give the warning based on an arbitrary number of around 300 loose objects.
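The command from this answer did not survive here, so as a stand-in, this is a minimal sketch that reads the loose-object count from `git count-objects -v`, which prints `key: value` lines such as `count: 312`. The helper names are made up for illustration, and the default threshold of 300 simply mirrors the rough git-gui figure mentioned above; it is not a Git default.

```python
import subprocess


def parse_count_objects(text):
    """Turn `git count-objects -v` output ("key: value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def loose_object_count(repo="."):
    """Run git in repo and return the number of loose objects it reports."""
    out = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_count_objects(out)["count"]


def worth_a_gc(loose, threshold=300):
    """threshold is an assumption taken from the git-gui observation above."""
    return loose >= threshold
```

For example, `worth_a_gc(loose_object_count("."))` gives a yes/no answer for the repository in the current directory.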
Drop it in a cron job that runs every night (afternoon?) when you're sleeping.
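A cron-driven variant might look something like the sketch below. Everything here is hypothetical: the script name, the paths in the crontab comment, and the assumption that the repositories are non-bare checkouts sitting directly under one root directory (bare repositories have no `.git` entry and would be skipped).

```python
import os
import subprocess
import sys

# Hypothetical crontab entry (03:00 every night); adjust paths to your setup:
#   0 3 * * * /usr/bin/python3 /opt/scripts/nightly_gc.py /srv/repos


def find_repos(root):
    """Return sorted paths of directories directly under root that contain a .git entry."""
    repos = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path) and os.path.exists(os.path.join(path, ".git")):
            repos.append(path)
    return repos


def gc_command(repo):
    """Command line for a quiet gc run in the given repository."""
    return ["git", "-C", repo, "gc", "--quiet"]


def gc_all(root, run=subprocess.run):
    """Run gc in every repository under root; return the ones where it failed."""
    failed = []
    for repo in find_repos(root):
        if run(gc_command(repo)).returncode != 0:
            failed.append(repo)
    return failed


# Only act when exactly one argument (the root directory) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(1 if gc_all(sys.argv[1]) else 0)
```

The `run` parameter exists only so the loop can be exercised without touching real repositories.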
You can do it without any interruption, with the new (Git 2.0, Q2 2014) setting gc.autodetach.
See commit 4c4ac4d and commit 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):
Since that 2.0 release, there was a bug though: Git 2.7 (Q4 2015) will make sure not to lose the error message.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds). (Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)
This quote is taken from: Version Control with Git
I use git gc after I do a big checkout and have a lot of new objects. It can save space. E.g. if you check out a big SVN project using git-svn and then run git gc, you typically save a lot of space.
You don't have to use git gc very often, because git gc (garbage collection) is run automatically on several frequently used commands:
Source: git gc best practices and FAQs
Just for another point of view, note that you can have repos where you DO NOT WANT to do garbage collection, automatic or otherwise (repos used as reference repositories, possibly local clones, etc.), because some other repository relies on this one's objects and may become invalid if objects disappear or the files they are in get different names.
This may be a fairly typical situation on a space-conscious CI farm with a single repository used as a baseline (maybe even over NFS or similar) to spawn build workspaces for many different test/build scenarios. There you can set git config gc.auto 0 in the repository to avoid mishaps (gc.auto is a numeric setting, and 0 disables automatic gc), and use domain-specific scripting to only GC when you know it is safe to (e.g. no builds running => no agents to corrupt mid-flight), or even never.
Conversely, you may want to use a common reference repository and then detach workspace repos after instantiating the particular commit they would build (this copies just the needed objects, possibly sped up by shallowness/depth settings for that workspace) to make them independent, thereby reducing the time window in which it is critical not to GC the main repository.
Some reasons to do this trickery include:
git clone or similar operations (and eat the uplink traffic, which may be costly in corporate setups) once per build and not for each scenario;
git tag over time, too);
Source/Disclaimer: lessons learned while making https://github.com/networkupstools/jenkins-dynamatrix/blob/master/src/org/nut/dynamatrix/DynamatrixStash.groovy and similar projects
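The "only GC when you know it is safe" idea for a shared reference repository could be sketched along these lines. Note that `gc.auto` is a numeric setting, and setting it to 0 is the documented way to turn automatic gc off. The `builds_running` callable is hypothetical: it stands in for whatever check your CI setup can provide, and the function names are invented for this example.

```python
import subprocess


def disable_auto_gc_command(repo):
    """Command that sets gc.auto=0, which switches off automatic gc in that repo."""
    return ["git", "-C", repo, "config", "gc.auto", "0"]


def gc_when_idle(repo, builds_running, run=subprocess.run):
    """Run `git gc` only if no build is using the reference repository.

    builds_running is a hypothetical callable supplied by the CI setup.
    Returns True if gc was started, False if it was skipped.
    """
    if builds_running():
        return False
    run(["git", "-C", repo, "gc"], check=True)
    return True
```

This sketch does not close the window between the idle check and the gc run; a real setup would probably also need to keep new builds from starting while gc is in progress.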
I use it when I do a big commit, above all when I remove many files from the repository. Afterwards, commits are faster.
Note that the downside of garbage-collecting your repository is that, well, the garbage gets collected. As we all know as computer users, files we consider garbage right now might turn out to be very valuable three days in the future. The fact that git keeps most of its debris around has saved my bacon several times – by browsing all the dangling commits, I have recovered much work that I had accidentally canned.
So don’t be too much of a neat freak in your private clones. There’s little need for it.
OTOH, the value of data recoverability is questionable for repos used mainly as remotes, e.g. the place all the devs push to and/or pull from. There, it might be sensible to kick off a GC run and a repack frequently.
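For the "browsing all the dangling commits" part, one possible approach is to list them before running gc. The sketch below relies on `git fsck` reporting unreachable tips as lines of the form `dangling commit <id>`; the helper names are made up for this example.

```python
import subprocess


def parse_dangling_commits(fsck_output):
    """Extract commit ids from `git fsck` lines of the form 'dangling commit <id>'."""
    ids = []
    for line in fsck_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["dangling", "commit"]:
            ids.append(parts[2])
    return ids


def dangling_commits(repo="."):
    """Run `git fsck` in repo and return the ids of dangling commits it reports."""
    # check=False: fsck exits non-zero when it finds problems, but its report
    # is still worth reading in that case.
    out = subprocess.run(
        ["git", "-C", repo, "fsck"],
        check=False, capture_output=True, text=True,
    ).stdout
    return parse_dangling_commits(out)
```

Each id returned can then be inspected with `git show <id>` to decide whether it is worth keeping, for instance by pointing a branch at it.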