How to remove old versions of media files from a Git repository
I have a Git repository with several huge media files (images and audio files). Several versions of these media files have been successively committed to the repo. The files are successively refined versions of the same assets, and they have the same names.
I want to keep only the latest version in the Git repository, because it is becoming too big.
What is the simplest way to do this?
How can I propagate these changes correctly to the upstream repository?
Comments (5)
Old thread but in case someone else stumbles along here…
GitHub & Bitbucket both recommend using BFG Repo-Cleaner.
See:
GitHub: Remove Sensitive Data
Bitbucket: Reduce Repository Size
Bitbucket: Maintaining a Git Repository
Example to remove files over 1 Megabyte, as well as jpgs, pngs and mp3s that are not in HEAD:
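A minimal sketch of such a run, following the usage shown in the BFG documentation. The helper name `strip_old_media`, the repository URL and the location of `bfg.jar` are hypothetical placeholders, so adjust them to your setup. BFG protects the files in your latest commit (HEAD) by default, which is why the current versions of the assets survive.

```shell
# Hypothetical helper (not part of BFG): strip_old_media <repo-url> <path-to-bfg.jar>
strip_old_media() {
  local repo_url=$1
  local bfg_jar=$2
  local mirror_dir
  mirror_dir=$(basename "$repo_url")   # e.g. some-big-repo.git

  # Work on a fresh bare mirror clone, not on your working copy.
  git clone --mirror "$repo_url" "$mirror_dir" &&
  # Drop blobs over 1 MB, plus jpg/png/mp3 files that are not in HEAD.
  java -jar "$bfg_jar" --strip-blobs-bigger-than 1M \
       --delete-files '*.{jpg,png,mp3}' "$mirror_dir" &&
  (
    cd "$mirror_dir" &&
    # BFG only rewrites history; this step physically removes the old blobs.
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive &&
    # Push the rewritten refs back to the remote.
    git push
  )
}
```

It would be called along the lines of `strip_old_media git://example.com/some-big-repo.git ./bfg.jar`. Make a backup of the repository first, since the rewrite cannot be undone once pushed.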
Note: now that you've pushed the updated revs, the remote repository should also run its own git gc, or else you won't see the size reduction (see e.g. https://stackoverflow.com/a/28782154/3419541).
Finally, re-clone the repository to be sure that you don't accidentally re-commit the old media file blobs.
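The point about garbage collection can be seen in a throwaway repository: an object that no ref points to keeps taking up space until git gc prunes it. This is only a sketch; the temporary repository and the random blob are stand-ins for your real repo and the old media files.

```shell
set -euo pipefail

# Throwaway repository (stand-in for the real one).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Write a ~100 KB blob that nothing references, like an old media
# version left behind after a history rewrite.
blob=$(head -c 100000 /dev/urandom | git hash-object -w --stdin)

echo "Before gc:"
git count-objects -vH

# Expire reflogs and prune unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now

echo "After gc:"
git count-objects -vH

# Check whether the unreferenced blob is really gone.
if git cat-file -e "$blob" 2>/dev/null; then
  pruned=no
else
  pruned=yes
fi
echo "blob pruned: $pruned"
```

On a hosted remote you usually cannot run this yourself, so the size shown there only drops once the host runs its own gc.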
Check the section on 'Removing Objects' in the chapter Maintenance and Data Recovery in the Pro Git book. It describes the steps for removing objects from a Git repo. Be warned, though, that it is destructive.
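As a rough illustration in the spirit of that section, here is a self-contained sketch that uses git filter-branch on a throwaway repository to drop every old version of one file and then re-add only the latest one. The file names (`asset.bin`, `main.txt`) and the demo repository are made up for the example, and the steps are not copied from the book.

```shell
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning delay on newer Git

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.email "demo@example.com"
git config user.name "Demo"

# Three successive versions of the same (hypothetical) asset.
echo "source code" > main.txt
for v in 1 2 3; do
  echo "media version $v" > asset.bin
  git add -A
  git commit -q -m "Asset version $v"
done
old_blob=$(git rev-parse HEAD~2:asset.bin)   # version 1, to be purged

# Keep a copy of the latest version, then drop the path from every commit.
cp asset.bin "$work/asset.latest"
git filter-branch -f --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch asset.bin' -- --all >/dev/null

# Re-add only the latest version as a fresh commit.
cp "$work/asset.latest" asset.bin
git add asset.bin
git commit -q -m "Add latest asset only"

# Destructive cleanup: drop backup refs and reflogs, then prune.
git for-each-ref --format='%(refname)' refs/original |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

# Count how many distinct blobs named asset.bin remain reachable.
versions=$(git rev-list --all --objects | grep -c ' asset.bin$')
echo "versions of asset.bin left in history: $versions"
```

Every commit hash changes in the process, so the result has to be force-pushed and everyone else has to pick up the new history.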
I have a script (GitHub gist here) to remove a selection of unwanted folders from the entire history of a Git repo, or to delete all but the latest version of a folder.
It's hard-coded to assume that all Git repositories are in ~/repos, but that's easy to change. It should also be easy to adapt to work with individual files.
As mentioned already, you will be rewriting history here, so you will have to get collaborators (if any) to do a git rebase.
As for stripping a particular file from history, GitHub has a nice walkthrough.
For a solution going forward, you should look at putting the binary files in a submodule.
https://git-scm.com/docs/git-submodule
https://git-scm.com/book/en/v2/Git-Tools-Submodules
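To give an idea of what the submodule setup looks like, here is a small sketch using two throwaway local repositories. The names `media-assets`, `project` and `media` are invented for the example; in practice the submodule URL would point at a hosted repository, and the `protocol.file.allow` override is only needed because this demo clones from a local path.

```shell
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# Hypothetical repository that holds only the binary assets.
git init -q media-assets
cd media-assets
git config user.email "demo@example.com"
git config user.name "Demo"
echo "placeholder image" > logo.png
git add logo.png
git commit -q -m "Add logo"
cd ..

# Main project that references the assets as a submodule.
git init -q project
cd project
git config user.email "demo@example.com"
git config user.name "Demo"

# Recent Git versions block local-path submodule clones by default,
# hence the one-off protocol.file.allow override for this demo.
git -c protocol.file.allow=always submodule --quiet add "$work/media-assets" media
git commit -q -m "Track media assets as a submodule"

# The main repo stores only a pointer (a commit id), not the media blobs.
submodule_path=$(git config -f .gitmodules submodule.media.path)
echo "submodule registered at: $submodule_path"
git submodule status
```

The main repository then stays small, and the asset history can be rewritten or replaced without touching the code history.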
As far as I know, this can't be done, because in Git, every commit depends on the contents of the entire history up to that point. So the only way to get rid of the old, big files would be to "replay" the entire commit history (preferably with the same commit timestamps and authors), omitting the big files. Note that this will produce an entirely separate commit history.
This is obviously not a very viable approach, so the lesson is probably "don't use git to version huge binary files". Instead, you could perhaps have a separate (ignored) folder for the files and use a separate system to version control them.
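For what it's worth, the ignored-folder idea might look like the sketch below. The folder name `media/` and the sample file are just placeholders, and whatever system ends up versioning that folder is outside the scope of the example.

```shell
set -euo pipefail

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Hypothetical folder for large binaries, managed outside Git.
mkdir media
echo "placeholder audio" > media/theme.wav

# Tell Git to ignore everything under media/.
echo "media/" > .gitignore
git add -A

# Only .gitignore is staged; the media file never enters the repository.
tracked=$(git ls-files)
echo "tracked files: $tracked"
git check-ignore -v media/theme.wav
```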