Remove sensitive files and their commits from Git history
I would like to put a Git project on GitHub but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for capistrano).
I know I can add these filenames to .gitignore, but this would not remove their history within Git.
I also don't want to start over again by deleting the /.git directory.
Is there a way to remove all traces of a particular file in your Git history?
For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your git repository is entirely local or whether you have a remote repository elsewhere yet; if it is remote and not secured from others you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with it gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.
With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:
Note for Windows users: use double quotes (") instead of singles in this command
Update 2019:
This is the current code from the FAQ:
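The FAQ's command block did not survive on this page. As a rough sketch only, the following shows the filter-branch form commonly documented for this task, run against a throwaway repository. The repository, the file config/deploy.rb and the password value are made up so that the snippet can run end to end.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning and delay

# Throwaway repository (hypothetical content) with a secret in its history.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

mkdir config
echo 'set :password, "hunter2"' > config/deploy.rb
echo 'hello' > README
git add .
git commit -q -m "Add project with deploy config"
echo 'more' >> README
git commit -q -am "Update README"

# Rewrite every branch and tag, dropping the file from each commit's index.
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch config/deploy.rb" \
  --prune-empty --tag-name-filter cat -- --all

# No commit on any branch or tag should touch the file any more.
# (--all is avoided here because it would also list the refs/original/
# backups that filter-branch keeps.)
git log --branches --tags --oneline -- config/deploy.rb

# With a real remote, the rewritten history would then be force-pushed:
# git push origin --force --all
```

The double quotes around the index filter follow the note for Windows users above; on other systems single quotes work as well.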
Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.
To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.
Tip: Execute git rebase --interactive
In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:
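The command itself is missing from this copy of the answer; it is presumably an amend of the last commit. A minimal sketch of that fix, assuming a made-up settings.conf and password value:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo 'timeout = 30' > settings.conf
git add settings.conf
git commit -q -m "Add settings"

# The latest commit accidentally includes a password (made-up value).
echo 'retries = 3' >> settings.conf
echo 'password = hunter2' >> settings.conf
git commit -q -am "Tune settings"

# Remove the sensitive line, then fold the fix into that same commit.
grep -v '^password' settings.conf > settings.tmp
mv settings.tmp settings.conf
git commit -q -a --amend --no-edit

# The rewritten commit no longer contains the password.
git show HEAD:settings.conf
```

The old commit stays in the local reflog for a while, but it is no longer part of the branch and will not be pushed.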
That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase.
That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, and save and quit. Git will walk through the changes and stop at each commit you marked, where you can remove the sensitive information, amend the commit, and continue the rebase.
Repeat this for each change with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
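The rebase commands are not preserved here, so the following is only a sketch of the flow described above. It assumes a made-up repository in which the variable base stands in for the last commit shared with the remote, and it uses GIT_SEQUENCE_EDITOR with sed to change "pick" to "edit" so that no editor has to be opened by hand.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo 'hello' > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)      # stands in for the last pushed commit

echo 'password = hunter2' > secrets.txt
echo 'code' > app.txt
git add secrets.txt app.txt
git commit -q -m "Add app (and, by mistake, secrets)"

echo 'more code' >> app.txt
git commit -q -am "Extend app"

# Mark the first commit in the todo list as "edit" without opening an editor.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i "$base"

# Git is now stopped on the offending commit: drop the file and amend.
git rm -q secrets.txt
git commit -q --amend --no-edit
GIT_EDITOR=true git rebase --continue

# The file no longer appears anywhere in the branch history.
git log --oneline -- secrets.txt
```

When working by hand, plain git rebase --interactive against the remote branch opens the same todo list in your editor.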
Changing your passwords is a good idea, but for the process of removing passwords from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch explicitly designed for removing private data from Git repos.
Create a private.txt file listing the passwords, etc, that you want to remove (one entry per line) and then run the BFG with its --replace-text option on the repo.
All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc to clean away the dead data.
The BFG is typically 10-50x faster than running git-filter-branch, and the options are simplified and tailored around two common use-cases: removing very large files, and removing passwords, credentials and other private data.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
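The commands from this answer are not preserved on this page. A sketch of the steps it describes, assuming the BFG is available as a bfg command on PATH (many installs run it as java -jar bfg.jar instead) and using a made-up repository and password:

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q repo
git -C repo config user.name "Example User"
git -C repo config user.email "user@example.com"

echo 'password = hunter2' > repo/settings.conf
git -C repo add settings.conf
git -C repo commit -q -m "Add settings"
# The BFG leaves the latest commit alone, so clean that one up by hand first.
echo 'password = CHANGE_ME' > repo/settings.conf
git -C repo commit -q -am "Stop hard-coding the password"

# One secret per line.
printf '%s\n' 'hunter2' > private.txt

if command -v bfg >/dev/null 2>&1; then
  # Replace every listed string in older commits.
  bfg --replace-text private.txt repo
  # Expire reflogs and prune the objects that are no longer referenced.
  git -C repo reflog expire --expire=now --all
  git -C repo gc --prune=now --aggressive
else
  echo "bfg not found on PATH; skipping the rewrite" >&2
fi
```

The BFG is usually pointed at a fresh mirror clone rather than a working copy; the plain repository here only keeps the sketch short.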
git filter-repo is now officially recommended over git filter-branch
This is mentioned in the manpage of git filter-branch itself, as of Git 2.24.
With git filter-repo, you can either remove certain files, as described in "Remove folder and its contents from git/GitHub's history".
This automatically removes empty commits.
Or you can replace certain strings, as described in "How to replace a string in whole Git history?".
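A minimal sketch of the file-removal case, assuming git-filter-repo is installed separately and found on PATH (the rewrite is skipped otherwise). The repository and file names are made up, and --force is used only because this throwaway repository is not a fresh clone.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

mkdir config
echo 'set :password, "hunter2"' > config/deploy.rb
echo 'hello' > README
git add .
git commit -q -m "Add project with deploy config"

ran=no
if command -v git-filter-repo >/dev/null 2>&1; then
  # Keep everything except the given path, in every commit.
  git filter-repo --force --invert-paths --path config/deploy.rb
  ran=yes
else
  echo "git-filter-repo not installed; skipping the rewrite" >&2
fi

# For strings instead of files, put rules of the form
#   hunter2==>REDACTED
# into a text file and pass that file with --replace-text.
```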
If you pushed to GitHub, force pushing is not enough: delete the repository or contact support
Even if you force push one second afterwards, it is not enough, as explained below.
The only valid courses of action are:
- Is what leaked a changeable credential like a password?
  - yes: modify your passwords immediately, and consider using more OAuth and API keys!
  - no (naked pics): do you care if all issues in the repository get nuked?
    - no: delete the repository
    - yes: contact support
Force pushing a second later is not enough because:
GitHub keeps dangling commits for a long time.
GitHub staff does have the power to delete such dangling commits if you contact them, however.
I experienced this first hand when I uploaded all GitHub commit emails to a repo: they asked me to take it down, so I did, and they did a gc. Pull requests that contain the data have to be deleted as well, however: that repo data remained accessible up to one year after the initial takedown due to this.
Dangling commits can be seen either through the commit web UI or through the API.
One convenient way to get the source at that commit then is to use the download zip method, which can accept any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip
It is possible to fetch the missing SHAs by listing API events with "type": "PushEvent". E.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback machine)
There are scrapers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly pull GitHub data and store it elsewhere.
I could not find if they scrape the actual commit diff, and that is unlikely because there would be too much data, but it is technically possible, and the NSA and friends likely have filters to archive only stuff linked to people or commits of interest.
If you delete the repository instead of just force pushing however, commits do disappear even from the API immediately and give 404, e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824 This works even if you recreate another repository with the same name.
To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and did:
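The commands used for that test are not preserved here. The same effect can be reproduced locally; in this sketch a bare repository stands in for GitHub (an assumption, since GitHub's own retention behaviour cannot be tested offline), and the file and password are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git      # local stand-in for the hosted repository
git init -q local
cd local
git config user.name "Example User"
git config user.email "user@example.com"

echo 'password = hunter2' > settings.conf
git add settings.conf
git commit -q -m "Add settings"
git push -q ../remote.git HEAD:refs/heads/main
leaked=$(git rev-parse HEAD)

# "Fix" the commit and force-push over it a moment later.
echo 'password = CHANGE_ME' > settings.conf
git commit -q -a --amend --no-edit
git push -q --force ../remote.git HEAD:refs/heads/main

# The branch has moved on, but the old commit is still stored on the remote;
# anyone who knows its SHA can read it until it is garbage collected.
git --git-dir=../remote.git cat-file -p "$leaked:settings.conf"
```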
See also: How to remove a dangling commit from GitHub?
I recommend this script by David Underhill; it worked like a charm for me.
It adds these commands in addition to natacado's filter-branch to clean up the mess it leaves behind:
Full script (all credit to David Underhill)
The last two commands may work better if changed to the following:
You can use git forget-blob.
The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:
It will disappear from all the commits in your history, reflog, tags and so on.
I run into the same problem every now and then, and every time I have to come back to this post and others. That's why I automated the process.
Credits to contributors from Stack Overflow that allowed me to put this together
Here is my solution on Windows:
Make sure that the path is correct, otherwise it won't work.
I hope it helps.
Use filter-branch:
To be clear: The accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or really don't care about the history of your repo.
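An alternative would be to remove the sensitive file, delete all Git information from your code, delete the GitHub repository, and then push your code to a new repository as you normally would:
https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/
This will of course remove all commit history, branches, and issues from both your GitHub repo and your local Git repo. If this is unacceptable you will have to use an alternate approach.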
Call this the nuclear option.
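The local half of this approach can be sketched as follows. The exact steps of the original list are not preserved here, so the sequence is an assumption, and the project contents are made up; deleting and recreating the GitHub repository has to be done on GitHub itself.

```shell
set -eu
project=$(mktemp -d)
cd "$project"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo 'password = hunter2' > secrets.txt
echo 'code' > app.txt
git add .
git commit -q -m "Add app and secrets"
echo 'more code' >> app.txt
git commit -q -am "Extend app"

# Nuclear option: drop the secret, throw away all history, start again.
rm secrets.txt
rm -rf .git
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
git add .
git commit -q -m "Initial commit"

# One commit, no trace of the old history.
git log --oneline
```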
In my Android project I had admob_keys.xml as a separate XML file in the app/src/main/res/values/ folder. To remove this sensitive file I used the script below, and it worked perfectly.
I've had to do this a few times to-date. Note that this only works on 1 file at a time.
Get a list of all commits that modified a file. The one at the bottom will be the first commit:
git log --pretty=oneline --branches -- pathToFile
To remove the file from history use the first commit sha1 and the path to file from the previous command, and fill them into this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <sha1-where-the-file-was-first-added>..
Considering that OP is using GitHub, if one commits sensitive data into a Git repo, one can remove it entirely from the history by using one of the following options (read more about them below):
The git filter-repo tool (view source on GitHub).
The BFG Repo-Cleaner tool (it is open source - view source on GitHub).
After one of the previous options, there are additional steps to follow. Check the section Additional below.
If the goal is to remove a file that was added in the most recent unpushed commit, read the section Alternative below.
For future considerations, to prevent similar situations, check the For the Future section below.
Option 1
Using git filter-repo.
Let us now remove one file from the history of one's repo and add it to .gitignore (to prevent committing it again).
Before moving forward, make sure that one has git filter-repo installed (read here how to install it), and that one has a local copy of one's repo (if that is not the case, see here how to clone a repository).
Open GitBash and access the repository.
(Optional) Back up the .git/config file.
Run git filter-repo --invert-paths --path PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA, replacing PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA with the path to the file you want to remove, not just its filename, to:
Force Git to process, but not check out, the entire history of every branch and tag.
Remove the specified file (as well as empty commits generated as a result).
Remove some configs (such as the remote URL stored in the .git/config file).
Overwrite one's existing tags.
Add the file with sensitive data to .gitignore.
Check if everything was removed from one's repository history, and that all branches are checked out. Only then move to the next step.
Force-push the local changes to overwrite your repository on GitHub.com, as well as all the branches you've pushed up. A force push is required to remove sensitive data from your commit history. Read the first note at the bottom of this answer for more details on this.
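For the "check if everything was removed" step, one possible way to inspect the history with plain Git is sketched below. The repository is made up and deliberately still contains the secret in an older commit, so that both checks have something to report; on a properly cleaned repository both would print nothing.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo 'password = hunter2' > secrets.txt
echo 'code' > app.txt
git add .
git commit -q -m "Add app and secrets"
# Deleting the file in a later commit does not remove it from history.
git rm -q secrets.txt
git commit -q -m "Remove secrets"

# Check 1: commits on any ref that ever touched the path.
touching=$(git log --all --oneline -- secrets.txt)

# Check 2: revision:file pairs whose content still contains the string.
leaky=$(git grep -l 'hunter2' $(git rev-list --all) || true)

echo "Commits touching the path:"
echo "$touching"
echo "Revisions still containing the string:"
echo "$leaky"
```

Only when both checks come back empty is it worth force-pushing.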
Option 2
Using BFG Repo-Cleaner. This is faster and simpler than git filter-branch.
For example, to remove one's file with sensitive data and leave the latest commit untouched, run bfg --delete-files YOUR-FILE-WITH-SENSITIVE-DATA.
To replace all text listed in passwords.txt wherever it can be found in the repository's history, run bfg --replace-text passwords.txt.
After the sensitive data is removed, one must force push one's changes to GitHub.
Additional
After using one of the options above:
Contact GitHub Support.
(If working with a team) Tell them to rebase, not merge, any branches they created off of one's old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that one just went to the trouble of purging.
After some time has passed and one is confident that there were no unintended side effects, one can force all objects in one's local repository to be dereferenced and garbage collected with the following commands (using Git 1.8.5 or newer):
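The commands are missing from this copy. One commonly documented sequence is sketched below on a throwaway repository that was first rewritten with git filter-branch, so that there are backup refs to delete. All names and contents are made up.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo 'password = hunter2' > secrets.txt
echo 'code' > app.txt
git add .
git commit -q -m "Add app and secrets"
blob=$(git rev-parse HEAD:secrets.txt)   # object id of the leaked content

# Rewrite history; filter-branch keeps backups under refs/original/.
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch secrets.txt" -- --all >/dev/null

# Drop the backup refs, expire the reflog, then prune unreachable objects.
git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --quiet --prune=now

# The leaked blob should no longer exist in the object database.
git cat-file -t "$blob" 2>/dev/null || echo "blob $blob is gone"
```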
Alternative
If the file was added with the most recent commit, and one has not pushed to GitHub.com, one can delete the file and amend the commit:
Open GitBash and access the repository.
To remove the file, enter git rm --cached followed by the path to the file.
Commit this change using --amend -CHEAD.
Push one's commits to GitHub.com.
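A minimal sketch of these steps, assuming a made-up repository in which secrets.txt was added by the most recent, unpushed commit:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo 'hello' > README
git add README
git commit -q -m "Initial commit"

echo 'password = hunter2' > secrets.txt
echo 'code' > app.txt
git add .
git commit -q -m "Add app"

# Stop tracking the file but keep the local copy on disk.
git rm -q --cached secrets.txt
# Rewrite the latest commit, reusing its message and authorship.
git commit -q --amend -CHEAD

# git push    # push the rewritten, smaller commit (no remote in this sketch)
git log --oneline -- secrets.txt
```

Because only --cached is used, secrets.txt remains in the working directory as an untracked file, which is why adding it to .gitignore afterwards is a good idea.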
For the Future
In order to prevent sensitive data from being exposed, other good practices include:
Use a visual program to commit the changes. There are various alternatives (such as GitHub Desktop, GitKraken, gitk, ...) and it could be easier to track the changes.
Avoid the catch-all commands git add . and git commit -a. Instead, use git add filename and git rm filename to individually stage files.
Use git add --interactive to individually review and stage changes within each file.
Use git diff --cached to review the changes that one has staged for commit. This is the exact diff that git commit will produce as long as one doesn't use the -a flag.
Generate secret keys in secure hardware (HSM boxes, hardware keys like Yubikey / Solokey) so that they never leave it.
Train the team on X.509.
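The staging advice above can be tried out on a scratch repository. In this sketch (file names are made up) only the named file is staged, and the staged set is reviewed before committing:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo 'code' > app.txt
echo 'password = hunter2' > secrets.txt

# Stage by name instead of "git add ." so that stray files stay out.
git add app.txt

# Review exactly what the next commit will contain.
git diff --cached --name-only

git commit -q -m "Add app"

# secrets.txt was never staged, so it is still reported as untracked.
git status --short
```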
Notes:
When one force pushes, it rewrites the repository history, which removes sensitive data from the commit history. That may overwrite commits that other people have based their work on.
This answer uses content from some GitHub posts:
Removing sensitive data from a repository
About large files on GitHub
So, it looks something like this: