Updating a development team with rewritten Git repository history, removing big files

Posted 2024-10-07 11:04:07

I have a git repo with some very large binaries in it. I no longer need them, and I don't care about being able to checkout the files from earlier commits. So, to reduce the repo size, I want to delete the binaries from the history altogether.

After a web search, I concluded that my best (only?) option is to use git-filter-branch:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' HEAD

Does this seem like a good approach so far?

Assuming the answer is yes, I have another problem to contend with. The git manual has this warning:

WARNING! The rewritten history will have different object names for all the objects and will not converge with the original branch. You will not be able to easily push and distribute the rewritten branch on top of the original branch. Please do not use this command if you do not know the full implications, and avoid using it anyway, if a simple single commit would suffice to fix your problem. (See the "RECOVERING FROM UPSTREAM REBASE" section in git-rebase(1) for further information about rewriting published history.)

We have a remote repo on our server. Each developer pushes to and pulls from it. Based on the warning above (and my understanding of how git-filter-branch works), I don't think I'll be able to run git-filter-branch on my local copy and then push the changes.

So, I'm tentatively planning to go through the following steps (a rough command sketch follows the list):

  1. Tell all my developers to commit, push, and stop working for a bit.
  2. Log into the server and run the filter on the central repo.
  3. Have everyone delete their old copies and clone again from the server.
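
A rough sketch of what steps 2 and 3 might look like, assuming the central repo is a bare repository at a placeholder path /srv/git/project.git (back it up first):

# Step 2: on the server, rewrite history directly in the bare repo
cd /srv/git/project.git
git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' HEAD

# Step 3: each developer discards the old clone and starts fresh
git clone ssh://server/srv/git/project.git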

Does this sound right? Is this the best solution?

4 Answers

迷路的信 2024-10-14 11:04:07

Yes, your solution will work. You also have another option: instead of doing this on the central repo, run the filter on your clone and then push it back with git push --force --all. This will force the server to accept the new branches from your repository. This replaces step 2 only; the other steps will be the same.
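
A minimal sketch of that alternative, assuming the remote is named origin and rewriting all branches rather than just HEAD (so no unrewritten branch gets pushed back):

# Rewrite every branch in your local clone, then overwrite the server's refs
git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' -- --all
git push --force --all origin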

If your developers are pretty Git-savvy, then they might not have to delete their old copies; for example, they could fetch the rewritten branches and rebase their topic branches onto them as appropriate.
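
For instance, a hypothetical sketch of moving a topic branch onto the rewritten history (old-base and the branch names are placeholders):

# old-base = the commit on the pre-rewrite master that my-topic forked from
git fetch origin
git rebase --onto origin/master old-base my-topic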

微凉 2024-10-14 11:04:07

Your plan is good (though it would be better to perform the filtering on a bare clone of your repository, rather than on the central server), but in preference to git-filter-branch you should use my BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch designed specifically for removing large files from Git repos.

Download the Java jar (requires Java 6 or above) and run this command:

$ java -jar bfg.jar  --strip-blobs-bigger-than 1MB  my-repo.git

Any blob over 1MB in size (that isn't in your latest commit) will be totally removed from your repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive
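
Put together, one possible end-to-end run might look like this (the repository URL is a placeholder; working on a fresh bare mirror keeps the live server copy safe until you push):

$ git clone --mirror ssh://server/srv/git/project.git
$ java -jar bfg.jar --strip-blobs-bigger-than 1MB project.git
$ cd project.git
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive
$ git push    # a mirror clone pushes all rewritten refs back to the server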

The BFG is typically 10-50x faster than running git-filter-branch and the options are tailored around these two common use-cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

几度春秋 2024-10-14 11:04:07

If you don't make your developers re-clone it's likely that they will manage to drag the large files back in. For example, if they carefully splice onto the new history you will create and then happen to git merge from a local project branch that was not rebased, the parents of the merge commit will include the project branch which ultimately points at the entire history you erased with git filter-branch.
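
A hypothetical sequence showing how that happens (branch names are illustrative):

git fetch origin                  # picks up the rewritten branches
git checkout master
git reset --hard origin/master    # carefully splice onto the new history
git merge my-old-topic            # the merge ties the new history to the old one
git push origin master            # and the erased blobs travel back to the server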

南烟 2024-10-14 11:04:07

Your solution is not complete. You should include --tag-name-filter cat as an argument to filter-branch so that tags containing the large files are rewritten as well. You should also rewrite all refs, not just HEAD, since the commits could be on multiple branches.

Here is some better code:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' --tag-name-filter cat -- --all
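
One follow-up worth knowing: filter-branch keeps backups of the old refs under refs/original/, and the reflog still pins the old objects, so the repository will not actually shrink until those are cleared, for example:

git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now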

GitHub has a good guide:
https://help.github.com/articles/remove-sensitive-data
