Git repository still large after removing large files from repository history
I have a codebase that (until now) used git to store its dependencies. The repository itself is available here (warning: it's HUGE). Needless to say, I need to remove the dependencies from the repository history in order to cut it down to a reasonable size.
I started by using David Underhill's instructions to remove the lib directory from the history. Even after doing this, however, the repository is still over 300M. Issuing git prune and git repack helps, but it's still over 180M.
In an attempt to find any bloated blobs, I issued
git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head
with these results:
105526b5d3d398b9989d88c2f9fc2d1dc96a85b8 blob 35685609 33600527 31978828
d296935e6ac5f3f58b50c789394c9769116e9c34 blob 35658016 33593241 112485744
50636f931180a32764edadd854968a971a083f8a blob 28360290 25897864 233390
b9e4dd37428e879a258f297b7f5bcfb9ba869695 blob 13108002 11640713 66661788
08d2720b2414aa07ce419b17d5f80c333c7313b7 blob 12551621 11124009 89231035
6197a478a461275a0396f20c28487e9ae619a5f9 blob 11975135 11058259 148211988 1 50636f931180a32764edadd854968a971a083f8a
549eb0c73776fd0ede27a2fcb03366f76f45a13c blob 9136086 8166649 166451273
5bc0a0f04a7004bc16cfab1c091c6b369fb74049 blob 9072616 8270262 80951514
741480238a6a6ce612cf089245dd46d6890fba9f blob 8858569 8080252 101294029
744226651c55b14c1aa8affb78fba4fdf02b577c blob 7412220 6766404 186825167
This is where I'm stuck. I can git show these blobs and see that they look very much like jar files, but I can't figure out why they're still in the repo.
Various attempts to find their filenames failed. git repack -a, git repack -ad, and git repack -Ad all seem to have no effect.
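One common way to map a blob ID from the verify-pack output back to a filename is to list every object together with its path and filter on the ID. The following is only a sketch: blob_paths is a hypothetical helper name, not something from the question, and it assumes it is run inside the repository.

```shell
# Sketch: print the path(s) under which a blob ID is recorded.
# blob_paths is a hypothetical helper name used for illustration.
blob_paths() {
    blob="$1"
    # rev-list --objects prints "<object-id> <path>" for trees and blobs.
    # --all walks every ref (including backup refs such as refs/original/),
    # and --reflog also walks commits that only the reflog still mentions.
    git rev-list --objects --all --reflog | grep "^$blob"
}

# Usage (not run here):
#   blob_paths 105526b5d3d398b9989d88c2f9fc2d1dc96a85b8
```

If this prints nothing for a blob, no ref or reflog entry reaches it any more, which would suggest the object is simply waiting to be pruned rather than still being part of the history.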
Answers (4)
Use --prune=now on git gc
Although you'd successfully written your unwanted objects out of history, it looks like those unwanted objects were not being pruned because they were too young to be pruned by default (see the configuration docs on git gc for a bit more detail). Using git gc --prune=now should handle that, or you could see this answer for a more nuclear option.
Although that should fix your final problem, an underlying problem was the difficulty of finding big blobs in order to remove them using git filter-branch - to which I would say:
...don't use git filter-branch
git filter-branch is painful to use for a task like this, and there's a much better, less well-known tool called The BFG, specifically designed for removing large files from Git repos. The core command to remove big files takes a single size threshold. With a 10MB threshold, any blob over 10MB in size (that isn't in your latest commit) will be totally removed from your repository's history - you don't have to manually find the files yourself, and files in protected commits are safe.
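The command itself did not survive on this page, so here is a hedged sketch of what a BFG run with a 10MB threshold could look like. The --strip-blobs-bigger-than option is the BFG's size filter; the strip_big_blobs wrapper, the BFG_JAR variable and the my-repo.git path are assumptions made for illustration.

```shell
# Sketch: run the BFG with an explicit size threshold.
# strip_big_blobs and BFG_JAR are hypothetical names; point BFG_JAR at
# wherever the downloaded BFG jar actually lives.
strip_big_blobs() {
    # $1 = size threshold (for example 10M), $2 = path to the repository
    java -jar "${BFG_JAR:-bfg.jar}" --strip-blobs-bigger-than "$1" "$2"
}

# Usage (not run here), typically against a fresh mirror clone:
#   git clone --mirror <original> my-repo.git
#   strip_big_blobs 10M my-repo.git
```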
You can then use git gc to clean away the dead data.
The BFG is typically hundreds of times faster than running git-filter-branch on a big repo, and its options are tailored around two common use-cases: removing big files, and removing passwords, credentials and other private data.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
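The git gc cleanup described in this answer could be sketched roughly as follows. By default git gc only prunes unreachable objects older than a grace period (gc.pruneExpire, two weeks unless configured otherwise), which is why --prune=now matters here. The purge_unreachable name is hypothetical, and expiring the reflog first is an assumption about what is still holding on to the old commits.

```shell
# Sketch: drop reflog entries, then prune unreachable objects immediately.
# purge_unreachable is a hypothetical helper name.
purge_unreachable() {
    repo="${1:-.}"
    # Reflog entries can still reference the rewritten-away commits.
    git -C "$repo" reflog expire --expire=now --all || return 1
    # --prune=now skips the default grace period for unreachable objects.
    git -C "$repo" gc --quiet --prune=now --aggressive
}

# Usage (not run here):
#   purge_unreachable /path/to/repo
```

Note that this is destructive: once the reflog is expired and the objects are pruned, the old history cannot be recovered from this clone.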
Have you tried running git gc? http://www.kernel.org/pub/software/scm/git/docs/git-gc.html
You need to run David Underhill's script on each branch in the repository to ensure the references are removed from all branches.
Then, as in the further discussion, initialize a new repository with git init and either git pull from the original or git remote add origin <original> and then pull all branches.
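A rough sketch of the "fresh repository" approach, assuming the original has already been rewritten on every branch. The fresh_copy name and both arguments are hypothetical, and git fetch is used here in place of git pull so that all branches arrive as remote-tracking refs in one step.

```shell
# Sketch: build a new repository that only receives what the original's
# branches and tags still reach. fresh_copy is a hypothetical helper name.
fresh_copy() {
    # $1 = path or URL of the original, $2 = directory for the new repository
    git init --quiet "$2" || return 1
    git -C "$2" remote add origin "$1" || return 1
    # Fetch all branches and tags from the original.
    git -C "$2" fetch --quiet --tags origin
}

# Usage (not run here):
#   fresh_copy <original> slim-repo
```

The idea is that objects no branch or tag reaches are never transferred, so the new repository starts out without them.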
I had accidentally stored large .jpa backups of my site in git:
git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY_BIG_DIRECTORY_OR_FILE' --tag-name-filter cat -- --all
Replace MY_BIG_DIRECTORY_OR_FILE with the folder in question to completely rewrite your history, including tags.
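One thing worth knowing after a rewrite like this: git filter-branch keeps the pre-rewrite refs under refs/original/, so the old objects stay reachable until those backups are deleted. A sketch of removing them, where drop_filter_branch_backups is a hypothetical helper name:

```shell
# Sketch: delete the backup refs that git filter-branch leaves behind.
# drop_filter_branch_backups is a hypothetical helper name.
drop_filter_branch_backups() {
    repo="${1:-.}"
    # List every ref under refs/original/ and delete each one.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git -C "$repo" update-ref -d "$ref"
    done
}

# Usage (not run here), followed by a gc that prunes immediately:
#   drop_filter_branch_backups .
#   git reflog expire --expire=now --all
#   git gc --prune=now
```

Only do this once you are satisfied with the rewritten history, since the backup refs are the easy way back.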