How to remove unused objects from a git repository?

Posted on 2024-09-25 03:15:29

I accidentally added, committed and pushed a huge binary file with my very latest commit to a Git repository.

How can I make Git remove the object(s) that was/were created for that commit so my .git directory shrinks to a sane size again?

Edit: Thanks for your answers; I tried several solutions. None worked. For example, the one from GitHub removed the files from the history, but the size of the .git directory hasn't decreased:

$ BADFILES=$(find test_data -type f -exec echo -n "'{}' " \;)

$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $BADFILES" HEAD
Rewrite 14ed3f41474f0a2f624a440e5a106c2768edb67b (66/66)
rm 'test_data/images/001.jpg'
[...snip...]
rm 'test_data/images/281.jpg'
Ref 'refs/heads/master' was rewritten

$ git log -p # looks nice

$ rm -rf .git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
Counting objects: 625, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (598/598), done.
Writing objects: 100% (625/625), done.
Total 625 (delta 351), reused 0 (delta 0)

$ du -hs .git
174M    .git
$ # still 175 MB :-(

Comments (9)

如痴如狂 2024-10-02 03:15:29

I answered this elsewhere, and will copy here since I'm proud of it!

... and without further ado, may I present to you this useful script, git-gc-all, guaranteed to remove all your git garbage until they might come up with extra config variables:

git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 \
  -c gc.rerereresolved=0 -c gc.rerereunresolved=0 \
  -c gc.pruneExpire=now gc "$@"

The --aggressive option might be helpful.

NOTE: this will remove ALL unreferenced thingies, so don't come crying to me if you decide later that you wanted to keep some of them!

You might also need to run something like these first, oh dear, git is complicated!!

git remote rm origin
rm -rf .git/refs/original/ .git/refs/remotes/ .git/*_HEAD .git/logs/
git for-each-ref --format="%(refname)" refs/original/ |
  xargs -n1 --no-run-if-empty git update-ref -d

I put all this in a script, here:

https://ucm.dev/t/bin.git/git-gc-all-ferocious

沐歌 2024-10-02 03:15:29

Your git reflog expire --all is incorrect. It removes reflog entries that are older than the expire time, which defaults to 90 days. Use git reflog expire --all --expire=now.

My answer to a similar question deals with the problem of really scrubbing unused objects from a repository.
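To make the difference concrete, here is a minimal sketch that reproduces the situation in a throwaway repository and then applies the corrected reflog command. The scratch directory, the file names and the "Example" identity are all placeholders invented for the demo, not anything from the question.

```shell
#!/bin/sh
# Sketch: a dropped commit stays alive through the reflog until its
# entries are expired with --expire=now.
set -e
repo=$(mktemp -d)                    # hypothetical scratch location
cd "$repo"
git init -q .
git config user.name "Example"       # placeholder identity for the demo commits
git config user.email "example@example.invalid"

echo "small" > readme.txt
git add readme.txt
git commit -q -m "good commit"

# Simulate the accident: commit 1 MiB of random (incompressible) data.
dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
git add big.bin
git commit -q -m "oops: big binary"
big_blob=$(git rev-parse HEAD:big.bin)

# Drop the bad commit from the branch; the reflog still references it.
git reset -q --hard HEAD~1
# git reset records the old tip in ORIG_HEAD; delete it so that nothing
# in the demo still names the dropped commit.
rm -f .git/ORIG_HEAD

git reflog expire --all --expire=now   # without --expire=now, recent entries survive
git gc -q --prune=now                  # prune unreachable objects regardless of age

# cat-file -e exits non-zero once the object is gone.
if git cat-file -e "$big_blob" 2>/dev/null; then
  blob_state=present
else
  blob_state=pruned
fi
echo "big blob is $blob_state"
```

On a real repository only the last two git commands matter; the rest just sets up something to prune.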

揪着可爱 2024-10-02 03:15:29

1) Remove the file from the git repo (but not from the filesystem):

  • git rm --cached path/to/file

2) Shrink the repo using:

  • git gc,

  • or git gc --aggressive

  • or git prune

or a combination of the above as suggested in this question: Reduce git repository size

拥抱影子 2024-10-02 03:15:29

This guide on removing sensitive data can apply, using the same method. You'll be rewriting history to remove that file from every revision it was present in. This is destructive and will cause repo conflicts with any other checkouts, so warn any collaborators first.

If you want to keep the binary available in the repo for other people, then there's no real way to do what you want. It's pretty much all or none.

茶色山野 2024-10-02 03:15:29

The key for me turned out to be running git repack -A -d -f and then git gc to reduce the size of the single git pack I had.
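If you want to see whether the repack actually changed anything, one way is to read the size-pack field (in KiB) from git count-objects -v before and after. The sketch below does that on a small throwaway repository; the pack_kib helper, the scratch directory and the "Example" identity are made up for the illustration.

```shell
#!/bin/sh
# Sketch: measure pack size around `git repack -A -d -f` followed by `git gc`.
set -e

# pack_kib (hypothetical helper): print the size-pack value, in KiB,
# reported by `git count-objects -v` for the repository at $1.
pack_kib() {
  git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

repo=$(mktemp -d)                    # hypothetical scratch repository
git init -q "$repo"
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.invalid"
for i in 1 2 3; do
  echo "revision $i" > "$repo/file.txt"
  git -C "$repo" add file.txt
  git -C "$repo" commit -q -m "revision $i"
done

git -C "$repo" gc -q                 # start from a packed state
before=$(pack_kib "$repo")

# -A: repack everything, leaving unreachable objects loose
# -d: delete the packs made redundant; -f: recompute deltas from scratch
git -C "$repo" repack -q -A -d -f
git -C "$repo" gc -q
after=$(pack_kib "$repo")

echo "pack size: ${before} KiB before, ${after} KiB after"
```

To try it on your own repository, point repo at its path and skip the setup loop. Note that the repack alone does not discard objects that are still referenced from a reflog or a backup ref.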

澜川若宁 2024-10-02 03:15:29

Hi!

Git only receives the objects it actually needs when cloning a repository (if I understand it correctly).

So you can amend the last commit to remove the file added by mistake, then push your changes to the remote repository (with the -f option, to overwrite the old commit on the server too).

Then, when you make a new clone of that repo, its .git directory should be as small as it was before the big file(s) were committed.

Optionally, if you want to remove the unnecessary files from the server too, you can delete the repository on the server and push your newly cloned copy (which has the full history).
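The amend, force-push and re-clone sequence can be tried end to end with a local bare repository standing in for the server. This is only a sketch: every path, the file names and the "Example" identity are invented for the demo. One detail worth knowing is that cloning from a plain local path copies the object store wholesale, so the sketch passes --no-local to get the same object negotiation a network clone would use.

```shell
#!/bin/sh
# Sketch: amend away a big file, force-push, and inspect a fresh clone.
set -e
work=$(mktemp -d)                       # hypothetical scratch directory
git init -q --bare "$work/server.git"   # stands in for the real remote
git init -q "$work/local"
cd "$work/local"
git config user.name "Example"          # placeholder identity
git config user.email "example@example.invalid"
git remote add origin "$work/server.git"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "good commit"

# The accident: a real edit plus a 1 MiB binary in the same commit, pushed.
echo "v2" > notes.txt
dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
git add notes.txt big.bin
git commit -q -m "edit notes (and, by mistake, big.bin)"
git push -q origin HEAD
big_blob=$(git rev-parse HEAD:big.bin)

# Fix: drop the file from the index, amend, and overwrite the remote branch.
git rm -q --cached big.bin
git commit -q --amend --no-edit
git push -q -f origin HEAD

# With --no-local the clone goes through the normal transport and only
# receives objects reachable from the remote's refs.
git clone -q --no-local "$work/server.git" "$work/fresh"
if git -C "$work/fresh" cat-file -e "$big_blob" 2>/dev/null; then
  fresh_state=present
else
  fresh_state=absent
fi
echo "big blob in fresh clone: $fresh_state"
```

The server copy and the original working clone still hold the unreachable blob until they are garbage-collected, which is why the answer suggests replacing the server repository if its size matters too.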

甜是你 2024-10-02 03:15:29
git filter-branch --index-filter 'git rm --cached --ignore-unmatch Filename' --prune-empty -- --all

Remember to change Filename for the one you want to remove from the repository.
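The rewrite on its own does not free any disk space, because filter-branch keeps backups under refs/original/ and the reflog still points at the old commits. Below is a rough sketch of the whole sequence on a throwaway repository, with big.bin standing in for Filename; the scratch directory, file names and "Example" identity are assumptions made for the demo. FILTER_BRANCH_SQUELCH_WARNING=1 only silences the deprecation notice that recent Git versions print.

```shell
#!/bin/sh
# Sketch: filter-branch across all refs, then the cleanup needed to shrink .git.
set -e
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Example"       # placeholder identity
git config user.email "example@example.invalid"

echo "one" > a.txt
git add a.txt
git commit -q -m "first"
dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
git add big.bin
git commit -q -m "add big.bin"
echo "two" > a.txt
git commit -q -am "later work"
big_blob=$(git rev-parse HEAD:big.bin)

# Rewrite every ref, removing big.bin from each commit's index.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
  --prune-empty -- --all >/dev/null 2>&1

# filter-branch leaves backups of the old refs under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done

git reflog expire --all --expire=now   # drop reflog entries naming old commits
git gc -q --prune=now                  # prune the now-unreachable objects

if git cat-file -e "$big_blob" 2>/dev/null; then
  blob_state=present
else
  blob_state=pruned
fi
echo "big blob is $blob_state"
```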

我要还你自由 2024-10-02 03:15:29

In 2020 the documentation for git-filter-branch discourages its use and recommends using an alternative such as git-filter-repo. It can also be used instead of BFG.

Note that the chapter on Rewriting History in the git book hasn't been updated. Neither has GitHub's recommendation on removing sensitive data.
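For comparison, a sketch of the same cleanup with git-filter-repo. It is a separate tool that has to be installed, so the script skips itself when it is missing. The scratch repository, big.bin and the "Example" identity are placeholders; --force is needed here only because filter-repo refuses to run on anything that does not look like a fresh clone.

```shell
#!/bin/sh
# Sketch: remove one path from all history with git-filter-repo, if available.
set -e
if ! git filter-repo --version >/dev/null 2>&1; then
  echo "git-filter-repo is not installed; skipping"
  result=skipped
else
  repo=$(mktemp -d)                  # hypothetical scratch repository
  cd "$repo"
  git init -q .
  git config user.name "Example"     # placeholder identity
  git config user.email "example@example.invalid"

  echo "one" > a.txt
  git add a.txt
  git commit -q -m "first"
  dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
  git add big.bin
  git commit -q -m "add big.bin"
  big_blob=$(git rev-parse HEAD:big.bin)

  # --path selects big.bin; --invert-paths keeps everything except it.
  git filter-repo --force --path big.bin --invert-paths >/dev/null 2>&1

  # filter-repo normally expires reflogs and repacks on its own; these two
  # commands are a belt-and-braces repeat of that cleanup.
  git reflog expire --all --expire=now
  git gc -q --prune=now

  if git cat-file -e "$big_blob" 2>/dev/null; then
    result=present
  else
    result=pruned
  fi
  echo "big blob is $result"
fi
```

If size rather than a specific path is the criterion, filter-repo also has a --strip-blobs-bigger-than option that takes a size such as 10M.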
