How to remove old versions of media files from a Git repository

Posted 2024-11-15 16:38:27

I have a Git repository with several huge media files (images and audio files). Several versions of these media files have been committed to the repo one after another. The files are successively refined versions of the same assets, and they have the same names.

I want to keep only the latest version in the Git repository, because it is becoming too big.
What is the simplest way to do this?
How can I propagate these changes correctly to the upstream repository?

Answers (5)

蓝色星空 2024-11-22 16:38:27

Old thread but in case someone else stumbles along here…

GitHub & Bitbucket both recommend using BFG Repo-Cleaner.

See:
GitHub: Remove Sensitive Data
Bitbucket: Reduce Repository Size
Bitbucket: Maintaining a Git Repository

Example that removes every file over 1 MB, plus every .jpg, .png and .mp3 file, from all commits except HEAD:

# First get the latest bfg.jar, then:
$ git clone --mirror git://example.com/some-big-repo.git
$ java -jar bfg.jar --strip-blobs-bigger-than 1M --delete-files '*.{jpg,png,mp3}' some-big-repo.git
$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push

Note: now that you've pushed the updated revisions, the remote repository must also run its own git gc; otherwise you won't see the size reduction (see e.g. https://stackoverflow.com/a/28782154/3419541).
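To see why the server-side gc matters, here is a small local sketch. Assumptions: a throwaway bare repository stands in for the remote, media/song.mp3 is a hypothetical file filled with 200 KB of random data, and instead of BFG a single orphan commit stands in for the rewritten history so the script needs nothing but git. It pushes three versions, force-pushes the rewrite, and measures the bare repository with git count-objects before and after running git gc there.

```shell
#!/bin/sh
set -eu
# Throwaway identity so commits work on any machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Disk usage of a repository's objects (loose + packed), in KiB
repo_kib() {
  git -C "$1" count-objects -v | awk '/^size:|^size-pack:/ { s += $2 } END { print s }'
}

# Bare repository standing in for the upstream server
git init -q --bare "$work/server.git"
git -C "$work/server.git" symbolic-ref HEAD refs/heads/main

# Local repository that pushes three versions of a hypothetical media file
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
mkdir media
for v in 1 2 3; do
  head -c 200000 /dev/urandom > media/song.mp3
  git add media/song.mp3
  git commit -q -m "song.mp3 v$v"
done
git remote add origin "$work/server.git"
git push -q origin main

# Stand-in for the BFG rewrite: one orphan commit holding only the latest file
git checkout -q --orphan rewritten
git commit -q -m "Media, latest version only"
git push -q --force origin rewritten:main

# The old blobs are still stored on the server until it runs gc itself
before=$(repo_kib "$work/server.git")
git -C "$work/server.git" reflog expire --expire=now --all
git -C "$work/server.git" gc -q --prune=now
after=$(repo_kib "$work/server.git")
echo "server objects: ${before} KiB before gc, ${after} KiB after"
```

On a server you control, the last two git commands are what you would run inside the bare repository after the force-push.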

Finally, re-clone the repository to be sure that you don't accidentally re-commit the old media file blobs.

亽野灬性zι浪 2024-11-22 16:38:27

Check the 'Removing Objects' section of the chapter Maintenance and Data Recovery in the Pro Git book. It explains step by step how to remove objects from a Git repository, but be warned that the procedure is destructive.
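The chapter starts by finding which blobs are large and which paths they were committed under, before rewriting history with git filter-branch. A self-contained sketch of just that diagnosis, assuming a hypothetical throwaway repository with three 200 KB versions of media/song.mp3 (it only lists objects and deletes nothing):

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway repository with three versions of a hypothetical media file
git init -q "$work/repo"
cd "$work/repo"
mkdir media
echo "Game assets" > README.md
for v in 1 2 3; do
  head -c 200000 /dev/urandom > media/song.mp3
  git add README.md media
  git commit -q -m "song.mp3 v$v"
done

# Pack everything so verify-pack can report per-object sizes
git gc -q
git count-objects -vH

# The three largest blobs as "size sha" lines (field 3 of verify-pack -v is the object size)
largest=$(git verify-pack -v .git/objects/pack/pack-*.idx |
  awk '$2 == "blob" { print $3, $1 }' | sort -n | tail -3)

# Map each blob hash back to the path it was committed under
paths=$(printf '%s\n' "$largest" | while read -r size sha; do
  git rev-list --objects --all | awk -v sha="$sha" '$1 == sha { print $2 }'
done)
printf '%s\n' "$paths"
```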

荒人说梦 2024-11-22 16:38:27

I have a script (GitHub gist here) that removes a selection of unwanted folders from the entire history of a Git repository, or deletes all but the latest version of a folder.

It's hard-coded to assume that all git repositories are in ~/repos, but that's easy to change. It should also be easy to adapt to work with individual files.
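For the single-file case in the question, one possible approach is sketched below with git filter-branch; it is not the gist's code, which is not reproduced here. Assumptions: a hypothetical media/song.mp3, with "latest version" meaning the blob at the current HEAD. The index filter removes the file from every commit whose copy differs from that blob, so commits that already held the final version keep it; the backup refs under refs/original/ and the reflogs are then cleared so git gc can actually drop the old blobs. Git's own documentation warns about filter-branch's pitfalls and points to git filter-repo instead; filter-branch is used here only because it ships with Git.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's deprecation pause
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Object storage of the current repository, in KiB
repo_kib() { git count-objects -v | awk '/^size:|^size-pack:/ { s += $2 } END { print s }'; }

# Throwaway repository: three versions of the file, then an unrelated commit
git init -q "$work/repo"
cd "$work/repo"
mkdir media
echo "Game assets" > README.md
for v in 1 2 3; do
  head -c 200000 /dev/urandom > media/song.mp3
  git add README.md media
  git commit -q -m "song.mp3 v$v"
done
echo "Credits" >> README.md
git commit -q -am "Update README"
git gc -q
before=$(repo_kib)

# Hypothetical path; both variables are exported so the filter can read them
export MEDIA_PATH=media/song.mp3
export KEEP_BLOB="$(git rev-parse "HEAD:$MEDIA_PATH")"

# In every commit on every ref, drop MEDIA_PATH unless it holds the latest blob
git filter-branch -f --index-filter '
  blob=$(git ls-files -s -- "$MEDIA_PATH" | cut -d" " -f2)
  if [ -n "$blob" ] && [ "$blob" != "$KEEP_BLOB" ]; then
    git rm -q --cached -- "$MEDIA_PATH"
  fi
' -- --all >/dev/null

# Remove the refs/original/ backups and reflog entries, then prune
git for-each-ref --format='delete %(refname)' refs/original/ | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now
after=$(repo_kib)
echo "${before} KiB -> ${after} KiB"
```

Every rewritten commit gets a new hash, so the result still has to be force-pushed and collaborators have to adopt the new history.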

原来是傀儡 2024-11-22 16:38:27

As mentioned already, you will be rewriting history here, so any collaborators will have to rebase their work onto the new history (git rebase).
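If a collaborator merged the new upstream into their old branch, the old commits, and with them the old blobs, would become reachable again. A sketch of the rebase instead, assuming a hypothetical setup: a local bare repository as the remote, a branch named main, and (as a stand-in for the real cleanup) a maintainer who replaces the history with one orphan commit. The collaborator records where the old upstream ended, fetches, and replays only their own commits onto the new history with git rebase --onto.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Bare repository standing in for the shared upstream
git init -q --bare "$work/server.git"
git -C "$work/server.git" symbolic-ref HEAD refs/heads/main

# Maintainer publishes three versions of a hypothetical media file
git init -q "$work/maint"
cd "$work/maint"
git symbolic-ref HEAD refs/heads/main
mkdir media
for v in 1 2 3; do
  head -c 50000 /dev/urandom > media/song.mp3
  git add media
  git commit -q -m "song.mp3 v$v"
done
git remote add origin "$work/server.git"
git push -q origin main

# Collaborator clones and makes a local commit that is not pushed yet
git clone -q "$work/server.git" "$work/collab"
cd "$work/collab"
echo "Mixing notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Maintainer rewrites history (stand-in: one orphan commit) and force-pushes
cd "$work/maint"
git checkout -q --orphan rewritten
git commit -q -m "Media, latest version only"
git push -q --force origin rewritten:main

# Collaborator: remember where the old upstream ended, fetch the new one,
# then replay only the commits made after that point
cd "$work/collab"
old_upstream=$(git rev-parse origin/main)
git fetch -q origin
git rebase -q --onto origin/main "$old_upstream" main
git log --oneline main
```

Collaborators without local work can simply re-clone, as the BFG answer recommends.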

As for stripping a particular file from history, GitHub has a nice walkthrough.

For a solution going forward, you should look at putting the binary files in a submodule.

Git's submodule support allows a repository to contain, as a subdirectory, a checkout of an external project. Submodules maintain their own identity; the submodule support just stores the submodule repository location and commit ID, so other developers who clone the containing project ("superproject") can easily clone all the submodules at the same revision. Partial checkouts of the superproject are possible: you can tell Git to clone none, some or all of the submodules.

https://git-scm.com/docs/git-submodule

https://git-scm.com/book/en/v2/Git-Tools-Submodules
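A minimal sketch of that layout, assuming a hypothetical local media repository mounted at media/ inside the main project. The -c protocol.file.allow=always setting is only there because the submodule URL in this demo is a local path, which recent Git versions refuse for submodules by default. The media repository still accumulates every version, but the main project's own history only records the submodule's commit ID.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical repository that holds only the binary assets
git init -q "$work/media"
cd "$work/media"
head -c 100000 /dev/urandom > song.mp3
git add song.mp3
git commit -q -m "Add song.mp3"

# Main project ("superproject") that mounts it at media/
git init -q "$work/project"
cd "$work/project"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"
# Local-path submodule URLs need protocol.file.allow=always on recent Git
git -c protocol.file.allow=always submodule --quiet add "$work/media" media
git commit -q -m "Add media submodule"
cat .gitmodules
```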

冷情妓 2024-11-22 16:38:27

As far as I know, this can't be done, because in Git every commit depends on the contents of the entire history up to that point. So the only way to get rid of the old, big files would be to "replay" the entire commit history (preferably with the same commit timestamps and authors), omitting the big files. Note that this will produce an entirely separate commit history.
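The dependency comes from how commits are stored: each commit object records the hash of its tree and the hash of its parent, so changing an old commit changes the hash of every commit after it. A small sketch that prints a commit object, assuming a throwaway repository with a hypothetical song.txt:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/repo"
cd "$work/repo"
echo "take 1" > song.txt
git add song.txt
git commit -q -m "First take"
echo "take 2" > song.txt
git commit -q -am "Second take"

# The commit object names its tree and its parent by hash, so rewriting
# any earlier commit necessarily changes this commit's hash as well
git cat-file -p HEAD
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
```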

This is obviously not a very viable approach, so the lesson is probably "don't use git to version huge binary files". Instead, you could perhaps have a separate (ignored) folder for the files and use a separate system to version control them.
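If the media already live in the repository, ignoring the folder is not enough by itself, because .gitignore only affects untracked files. A sketch, assuming a hypothetical media/ folder: stop tracking it with git rm --cached (the files stay on disk), then ignore it. This keeps future versions out of Git but does not remove the versions already in history; that still needs one of the history rewrites from the other answers.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway repository where a hypothetical media/ folder is already tracked
git init -q "$work/repo"
cd "$work/repo"
mkdir media
head -c 100000 /dev/urandom > media/song.mp3
git add media
git commit -q -m "Add media"

# Stop tracking the folder (files stay on disk), then ignore it
git rm -r -q --cached media
printf 'media/\n' > .gitignore
git add .gitignore
git commit -q -m "Move media out of version control"

git check-ignore -v media/song.mp3
git status --short
```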
