How to trigger garbage collection on a remote Git repository?

Posted 2024-09-08 09:20:23

As we know, we can periodically run git gc to pack objects under .git/objects.

In the case of a remote central Git repository (bare or not), though, after many pushes there are many files under myproj.git/objects; each commit seems to create a new file there.

How can I pack that many files? (I mean the ones in the remote central bare repository, not the ones in a local clone.)

梓梦 2024-09-15 09:20:23

The remote repo should be configured to run gc as needed after commits are pushed to it. See the documentation of gc.auto in the git-gc and git-config man pages.
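A minimal sketch of that configuration, assuming you have a shell on the server hosting the bare repository. REPO is a hypothetical path (the sketch falls back to a scratch bare repo so it runs anywhere). gc.auto and receive.autogc are real Git settings; 6700 is gc.auto's documented default, and receive.autogc (default true) makes receive-pack run automatic housekeeping after each push.

```shell
#!/bin/sh
set -eu
# REPO is a hypothetical path; default to a scratch bare repo for demonstration.
REPO="${REPO:-$(mktemp -d)/myproj.git}"
[ -d "$REPO" ] || git init --bare --quiet "$REPO"

# "git gc --auto" only does real work once there are roughly more than
# gc.auto loose objects (6700 is Git's default; 0 disables automatic gc).
git --git-dir="$REPO" config gc.auto 6700

# Have receive-pack run automatic housekeeping after every push (default: true).
git --git-dir="$REPO" config receive.autogc true

# Show the effective values.
git --git-dir="$REPO" config --get gc.auto
git --git-dir="$REPO" config --get receive.autogc
```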

However, a remote repo shouldn't need all that much garbage collection, since it will rarely have dangling (unreachable) commits. These usually result from things like branch deletion and rebasing, which typically happen only in local repos.

So gc is needed more for repacking, which is for saving storage space rather than removing actual garbage. The gc.auto variable is sufficient for taking care of this.


Update in 2024:

Garbage in GitHub repos is commonplace nowadays, because the typical PR workflow involves a lot of force-pushing and rebasing.

However, you can't influence GC in a remote repo, GitHub or not. You can do it only if you have shell access on the remote system that's hosting the repo. In other words, garbage collection has to be a local operation on the machine where the repo is located.

GitHub does do GC, but it happens invisibly. Sometimes this has unfortunate and surprising consequences when best practices aren't being followed. For example, it's possible to refer to the commit hash of a PR in a source-based dependency in another project, while waiting for a PR to be accepted upstream. If that PR branch is subsequently rebased, the source dependency will continue working for a while, because the dangling commit can still be fetched if its hash is known.

However, when GH does get around to doing GC on the repo, the other project's build will suddenly break with a "missing reference" error because the commit no longer exists in the dependency's repo. This can be very mystifying, especially if the person who set up the source dependency is no longer around. What's more, it can be extremely hard to figure out what the missing hash was originally referring to, because it's no longer part of any branch.

If you're very lucky, the commit will still exist in someone's clone of the repo that hasn't been garbage-collected, and the reflogs can be used to find out what branch the commit was originally in. The source dependency can then be updated to use the rebased version of the PR, or can be dropped altogether if the PR has now been merged.
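A minimal sketch of that reflog search. It builds a scratch repository in which a commit on a hypothetical branch feature is orphaned (a hard reset stands in for the rebase), then checks the reflogs of every local and remote-tracking branch for the lost hash. MISSING, feature, and the scratch repo are illustrative; in practice MISSING would be the hash from the broken build, and only the final loop would be run inside the old clone.

```shell
#!/bin/sh
set -eu
# Scratch repo standing in for someone's old clone (everything here is made up).
DEMO="$(mktemp -d)"
git init --quiet "$DEMO"
cd "$DEMO"
git config user.name "Example"
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "base"
git checkout --quiet -b feature
git commit --quiet --allow-empty -m "PR commit"
MISSING="$(git rev-parse HEAD)"       # the hash another project pinned
git reset --quiet --hard HEAD~1       # stands in for the rebase that orphans it
git commit --quiet --allow-empty -m "PR commit (rebased)"

# No branch contains MISSING any more, but each branch's reflog still lists
# every commit that branch has pointed at, so search them for the lost hash.
FOUND=""
for ref in $(git for-each-ref --format='%(refname)' refs/heads/ refs/remotes/); do
  if git reflog show --format='%H' "$ref" | grep -q "^$MISSING"; then
    echo "$MISSING was once on $ref"
    FOUND="$FOUND$ref "
  fi
done
```

Searching refs/remotes/ as well matters because a collaborator's clone usually only knows the PR branch as a remote-tracking ref such as refs/remotes/origin/<branch>.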

The moral of the story: be very careful when referring to commits in PRs. They aren't stable and can disappear without warning.

凉城凉梦凉人心 2024-09-15 09:20:23

While you should have some process that takes care of this automatically and periodically, it's no problem to run git gc on a bare repository:

git@domU:/pix/git/repositories/abd.git$ ls -l

total 28
drwxrwxr-x   2 git git    6 2010-06-06 02:44 branches
-rw-rw-r--   1 git git   66 2010-06-06 02:44 config
-rw-r--r--   1 git git   23 2011-03-15 18:19 description
-rw-rw-r--   1 git git   23 2010-06-06 02:44 HEAD
drwxrwxr-x   2 git git 4096 2010-06-06 02:44 hooks
drwxrwxr-x   2 git git   20 2010-06-06 02:44 info
drwxrwxr-x 260 git git 8192 2010-09-01 00:26 objects
drwxrwxr-x   4 git git   29 2010-06-06 02:44 refs

$ git gc
Counting objects: 3833, done.
Compressing objects:  31% (1085/3500)...
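To see what git gc does to the loose files described in the question, this sketch pushes three small commits to a scratch bare repository (all paths are hypothetical) and compares git count-objects -v before and after. Pushes with fewer objects than receive.unpackLimit (default 100) are stored as one loose file per object, which is why myproj.git/objects fills up; gc moves the reachable ones into a pack.

```shell
#!/bin/sh
set -eu
# Scratch "central" bare repo and a clone that pushes to it (paths made up).
WORK="$(mktemp -d)"
CENTRAL="$WORK/myproj.git"
git init --bare --quiet "$CENTRAL"
git init --quiet "$WORK/clone"
cd "$WORK/clone"
git config user.name "Example"
git config user.email "example@example.com"

# Three small pushes; each arrives as loose objects (one file per object).
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit --quiet -m "commit $i"
  git push --quiet "$CENTRAL" HEAD:refs/heads/main
done

# Helper: print one field of "git count-objects -v" for the central repo.
field() { git --git-dir="$CENTRAL" count-objects -v | sed -n "s/^$1: //p"; }

LOOSE_BEFORE="$(field count)"
git --git-dir="$CENTRAL" gc --quiet
LOOSE_AFTER="$(field count)"
echo "loose objects before gc: $LOOSE_BEFORE, after gc: $LOOSE_AFTER"
echo "objects now in packs: $(field in-pack)"
```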
吾性傲以野 2024-09-15 09:20:23

after many pushes, there are many files under myproj.git/objects

There won't be as many with Git 2.11+ (Q4 2016) and a pre-receive hook.
In that scenario (a push rejected by the hook), you won't have to trigger a git gc to clean up after it at all.

See commit 62fe0eb, commit e34c2e0, commit 722ff7f, commit 2564d99, commit 526f108 (03 Oct 2016) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 25ab004, 17 Oct 2016)

receive-pack: quarantine objects until pre-receive accepts

In order for the receiving end of "git push" to inspect the received history and decide to reject the push, the objects sent from the sending end need to be made available to the hook and the mechanism for the connectivity check, and this was done traditionally by storing the objects in the receiving repository and letting "git gc" expire them.

Instead, store the newly received objects in a temporary area, and make them available by reusing the alternate object store mechanism to them only while we decide if we accept the check, and once we decide, either migrate them to the repository or purge them immediately.

That temporary area will be set by the new environment variable GIT_QUARANTINE_ENVIRONMENT.

That way, if a (big) push is rejected by a pre-receive hook, those big objects won't be lying around for 90 days waiting for git gc to clean them up.
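A sketch of that scenario, assuming Git 2.11 or later: a scratch bare repository gets a hypothetical pre-receive hook that rejects any push introducing a blob larger than 1 MiB (the limit, the hook logic, and all paths are illustrative, not something Git ships). After the rejected push, git count-objects -v on the central repository should report no loose objects, because the quarantined objects were purged instead of being left for git gc.

```shell
#!/bin/sh
set -eu
# Scratch setup (all paths hypothetical): a bare "central" repo plus a clone.
DEMO="$(mktemp -d)"
git init --bare --quiet "$DEMO/central.git"
git init --quiet "$DEMO/clone"
mkdir -p "$DEMO/central.git/hooks"

# Hypothetical pre-receive hook: reject any push that introduces a blob larger
# than MAX_BYTES (an assumed 1 MiB limit). "rev-list $new --not --all" lists
# the objects reachable from the pushed tip but from no existing ref.
cat > "$DEMO/central.git/hooks/pre-receive" <<'EOF'
#!/bin/sh
MAX_BYTES=1048576
status=0
while read -r old new ref; do
  case "$new" in *[!0]*) ;; *) continue ;; esac   # all zeros: branch deletion
  big=$(git rev-list --objects "$new" --not --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk -v max="$MAX_BYTES" '$1 == "blob" && $2 > max { print $3 }')
  if [ -n "$big" ]; then
    echo "rejecting $ref: blob(s) larger than $MAX_BYTES bytes: $big" >&2
    status=1
  fi
done
exit "$status"
EOF
chmod +x "$DEMO/central.git/hooks/pre-receive"

# Try to push a commit that adds a 2 MiB file.
cd "$DEMO/clone"
git config user.name "Example"
git config user.email "example@example.com"
head -c 2097152 /dev/zero > big.bin
git add big.bin
git commit --quiet -m "add big file"
if git push --quiet "$DEMO/central.git" HEAD:refs/heads/main; then
  PUSH_RESULT=accepted
else
  PUSH_RESULT=rejected
fi
echo "push $PUSH_RESULT"

# Git 2.11+ purges the quarantined objects of a rejected push, so the
# central repo should report "count: 0" (no loose objects left behind).
git --git-dir="$DEMO/central.git" count-objects -v
```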

千秋岁 2024-09-15 09:20:23

This question should shed some light on how often you should run garbage collection.

The easiest option would be to use a scheduled task in Windows or a cron job in Unix to run git gc periodically. This way you don't even need to think about it.
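A minimal sketch of the Unix side, assuming the bare repositories all sit directly under one directory. The script path in the crontab comment and the /srv/git default for GIT_ROOT are hypothetical; run the job as the account that owns the repositories so gc does not leave files with the wrong owner.

```shell
#!/bin/sh
# Hypothetical nightly maintenance script for a server that hosts bare repos.
# Assumed crontab entry (daily at 03:00; the script path is made up):
#   0 3 * * * /usr/local/bin/gc-all-repos.sh
set -eu

# Run "git gc" in every bare repository (*.git directory) directly under $1.
gc_all() {
  for repo in "$1"/*.git; do
    [ -d "$repo" ] || continue          # the glob matched nothing
    echo "gc: $repo"
    git --git-dir="$repo" gc --quiet
  done
}

# GIT_ROOT is an assumption; point it at wherever your repositories live.
gc_all "${GIT_ROOT:-/srv/git}"
```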
