Preventing git push from sending the entire repository when the local copy is not up to date

Posted on 2024-09-30 21:23:00

Related question: "why does Git send whole repository each time push origin master"

The short version: When working with two Git repositories, even if 99% of the commit objects are identical, using git push to send a commit to repository B when origin is set to point to repo A causes all objects (200MB +) to be transferred.

The much longer version: We have a second Git repository set up on our continuous integration server. After we have prepared our commit objects locally, instead of pushing directly to origin/master as one normally would, we instead push our changes to a branch on this second repository. The CI server picks up the new branch, auto-rebases it onto master, runs our integration tests and, if all is well, pushes the branch to origin/master on the master repo.

The CI server also periodically calls git fetch to retrieve the latest copy of origin/master from the master repo, in case someone has gone around the CI process and pushed directly.

This works wonderfully, especially if one does a git fetch; git rebase origin/master before pushing to the CI repo; Git only sends the commit objects that are not already in origin/master. If one skips the fetch/rebase step before pushing, the process still works, but Git appears to send, if not all, then a majority of commit objects to the CI repo -- currently more than 200MB worth. (A fresh clone of our repo clocks in at 225MB.)

Are we doing something wrong? Is there a way to correct this behaviour such that Git only sends the commit objects it needs to form the branch on the CI repo? We can obviously work around the issue by doing a pre-push git fetch; git rebase origin/master, but it feels like we should be able to skip that step, especially because pushing directly to the master repo does not present the same problem.

Our repos are served up by Gitosis 0.2, and our clients are overwhelmingly running msysgit 1.7.3.1-preview.

Comments (3)

_畞蕅 2024-10-07 21:23:00

...auto-rebases it onto master...

I think that is the root of the problem right there. Every time your CI server does this auto-rebase step, it will create a whole new set of commits relative to the nearest common ancestor of the current and the master branch.

The next time you push your code to the CI server, it doesn't actually have all those objects anymore (they're not reachable from any live heads), so it asks your client to send them all again.

You should be able to see this happening by watching the SHA1 commit IDs of the commits you're making. You will probably find that the commit IDs of local commits no longer match the corresponding commit IDs in the rebased branch on the CI server.
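The ID change described above can be reproduced in a throwaway repository. This is a minimal sketch, not the CI server's actual setup: the branch name topic, the file names and the demo identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch: show that rebasing gives a commit a new SHA-1 ID.
# Runs in a temporary repository; all names here are illustrative assumptions.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults

echo base > base.txt
git add base.txt
git commit -q -m "base"

# A topic branch with one commit, as a developer would push to the CI repo.
git checkout -q -b topic
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature"
before=$(git rev-parse HEAD)

# Meanwhile master moves on.
git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "other work"

# Roughly what the CI server does: rebase the topic branch onto master.
git checkout -q topic
git rebase -q master
after=$(git rev-parse HEAD)

echo "before rebase: $before"
echo "after rebase:  $after"
```

The two printed IDs differ even though the change itself is identical, because the rebased commit has a different parent.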

分開簡單 2024-10-07 21:23:00

It turns out the simplest solution to this problem is to fetch right before the push:

$ git fetch origin master
$ git push user@host:repo.git HEAD:refs/heads/commit128952690069

In our case, it's important to fetch a specific branch into FETCH_HEAD: the user's local branch state is unaffected, but we still receive the most up-to-date set of objects from the main repository, so the git push that follows always has the ancestor commit present when Git starts to pack objects.

I did some tinkering with git pack-objects: if one builds a pack file containing the commits <common_ancestor>..HEAD, it packs only as much data as is required:

$ echo $(git merge-base master origin/master)..HEAD | git pack-objects --revs --thin --stdout --all-progress-implied > packfile

However, issuing git push with the repository in the same state causes all objects to get packed and sent.

I suspect what happens is that upon connecting to the Git repo, one receives the SHA of the latest revision in the repo -- if Git does not have the commit object represented by that SHA locally, it cannot run git merge-base to determine the common ancestor; therefore, it must send all the objects to the remote repo. If that commit object does exist, then git merge-base succeeds, and the pack file can be built referencing the common ancestor.
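One way to probe that suspicion is to check which branch tips a remote advertises and whether they exist locally, then count the objects reachable from HEAD but not from those known tips. The sketch below is only an approximation of what a push has to send, not Git's actual negotiation code. The repositories main.git, ci.git, dev and other, and the helpers known_tips and count_to_send, are hypothetical names used for the demonstration.

```shell
#!/bin/sh
# Sketch: which tips advertised by a remote does the local repository already have?
# All repositories are throwaway; every name below is an illustrative assumption.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

root=$(mktemp -d)
cd "$root"

# Developer repository with one commit, published to a bare "main" repository.
git init -q dev
cd dev
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git commit -q -m "one"
cd ..
git clone -q --bare dev main.git

# Someone else pushes a new commit to main; the CI repository mirrors it.
git clone -q main.git other
cd other
echo two >> file.txt
git commit -q -a -m "two"
git push -q origin master
cd ..
git clone -q --bare main.git ci.git

# Back in the developer repository: local work on top of the stale master.
cd dev
git remote add origin ../main.git
echo local > local.txt
git add local.txt
git commit -q -m "local work"

# Print the branch tips advertised by remote $1 that exist locally as commits.
known_tips() {
    git ls-remote --heads "$1" | while read -r sha ref; do
        if git cat-file -e "$sha^{commit}" 2>/dev/null; then
            echo "$sha"
        fi
    done
}

# Count objects reachable from HEAD but not from any locally known tip of $1.
# The unquoted substitution is intentional: each tip becomes one argument.
count_to_send() {
    git rev-list --objects HEAD --not $(known_tips "$1") | wc -l | tr -d ' '
}

known_before=$(known_tips ../ci.git | wc -l | tr -d ' ')
send_before=$(count_to_send ../ci.git)

git fetch -q origin master   # the workaround: fetch right before pushing

known_after=$(known_tips ../ci.git | wc -l | tr -d ' ')
send_after=$(count_to_send ../ci.git)

echo "known CI tips before fetch: $known_before, after fetch: $known_after"
echo "objects to send before fetch: $send_before, after fetch: $send_after"
```

In this toy setup the CI repository's only tip is unknown to the developer repository until the fetch, which is the situation in which nothing can be excluded from the pack.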

一个人的旅程 2024-10-07 21:23:00

It sounds like your local repositories got out of sync with the CI server repository. The fact that a push from you to the CI server does this means that your local repository has a different set of commit hashes. It could go something like this:

git clone master
(... do work ...)
git push ci branch
(... CI does a rebase that changes all the commit hashes you pushed ...)
(... CI does its testing and pushes to master ...)
(... Now master and CI match but the hashes of all the commits you just pushed
     don't exist anywhere except your local machine ...)
(... do work ...)
git push ci branch

That last push is going to contain the entire set of commits from your first push because the CI's rebase changed all of their hashes and you're still working off the original commits that you created.
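The fetch-and-rebase workaround mentioned in the question gets the local branch back in step because git rebase skips commits whose changes are already present upstream. The following is a minimal sketch of that effect in a throwaway repository. The branch ci-rebased is a hypothetical stand-in for the origin/master that the CI server produced, and the cherry-pick imitates the CI server's rebase.

```shell
#!/bin/sh
# Sketch: rebasing local work onto the CI-rebased history drops the commits
# that were already applied there. All names are illustrative assumptions.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master

echo base > base.txt
git add base.txt
git commit -q -m "base"

# The developer's original commit.
git checkout -q -b topic
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature"

# master moves on in the meantime.
git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "other work"

# Stand-in for the CI server's result: the same change, re-created on top of master.
git checkout -q -b ci-rebased
git cherry-pick topic >/dev/null

# The developer resynchronises: the original commit is recognised as already applied.
git checkout -q topic
git rebase -q ci-rebased

echo "topic:      $(git rev-parse topic)"
echo "ci-rebased: $(git rev-parse ci-rebased)"
```

After the rebase, topic points at the same commit as ci-rebased, so later work builds on commits the CI repository already has.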
