How to substitute text in files throughout git history?

Posted on 2024-09-30 16:41:07

I've always used an interface based git client (smartGit) and thus don't have much experience with the git console.

However, I now face the need to substitute a string in all .txt files from history (so, not erasing the whole file but just substituting a string). I found the following command:

git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all

I tried this, and unfortunately noticed that while the password did get changed, all binary files got corrupted. Images, etc. would all be corrupted.

Is there a better way to do this that won't corrupt my binary files?

Thanks.

EDIT:

I got mixed up with something. The actual code that caused binary files to get corrupted was:

$ git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"

Strangely enough, the command at the top actually removed every file that contained my password.

Comments (7)

I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history.

You should carefully follow these steps here: https://rtyley.github.io/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 7 or above) and run this command (where my-repo.git is the folder name of the bare clone of your repo):

$ java -jar bfg.jar  --replace-text replacements.txt -fi '*.php'  my-repo.git

The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line - note the comments shouldn't be included):

PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass         # replace with 'examplePass' instead
PASSWORD3==>                    # replace with the empty string
regex:password=\w+==>password=  # Replace, using a regex
regex:\r(\n)==>$1               # Replace Windows newlines with Unix newlines

Your entire repository history will be scanned, and .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.

Full disclosure: I'm the author of the BFG Repo-Cleaner.
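
Because the annotations in the listing above are explanatory only, the file handed to `--replace-text` would look more like the following. This is a sketch that reuses the same example entries; writing it with a quoted here-document is just one convenient way to keep the shell from expanding `$1` or the backslashes.

```shell
# Write replacements.txt without the explanatory comments.
# The quoted 'EOF' stops the shell from expanding $1 or backslashes.
cat > replacements.txt <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
PASSWORD3==>
regex:password=\w+==>password=
regex:\r(\n)==>$1
EOF
```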

很酷不放纵 2024-10-07 16:41:07

You can avoid touching undesired files by passing -name "pattern" to find.

This works for me:

git filter-branch --tree-filter "find . -name '*.php' -exec sed -i -e \
    's/originalpassword/newpassword/g' {} \;"
冰葑 2024-10-07 16:41:07

With Git 2.24 (Q4 2019), git filter-branch (and BFG) is deprecated.

newren/git-filter-repo does NOT do what you want.
It has an example that is ALMOST what you want in its example section:

cd repo
git filter-repo --path-glob '*.txt' --replace-text expressions.txt

with expressions.txt:

literal:originalpassword==>newpassword

However, WARNING: as Hasturkun adds in the comments:

Using --path-glob (or --path) causes git filter-repo to only keep files matching those specifications.
The functionality to only replace text in specific files is available in bfg-ish as -fi, or via the lint-history script.
Otherwise, it looks like this is currently only possible with a custom commit callback.
See newren/git-filter-repo issue 74.

Which makes sense, considering the --replace-text option is itself a blob callback.


Q1 2024, newren/git-filter-repo issue 74 proposes (from Daniil):

Solution

git filter-branch --tree-filter "find . -path './src/*' -regextype egrep -regex '.*\.(hpp|cpp)' -exec perl -0777 -pe 's{\n\n\n+}{\n\n}g' -i {} \;" <branch/HEAD/hash..HEAD>

It replaces runs of more than one blank line with a single blank line.

呆橘 2024-10-07 16:41:07

More info on git-filter-repo

https://stackoverflow.com/a/58252169/895245 gives the basics, here is some more info.

Install

As of git 2.5 at least, it is not shipped with mainline git, so see: https://superuser.com/questions/1563034/how-do-you-install-git-filter-repo/1589985#1589985

python3 -m pip install --user git-filter-repo

Usage tips

Here is the more common approach I tend to use:

git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx') HEAD

where:

  • Bash process substitution allows us to not create a file for simple replaces. If your shell does not support this feature, you just have to write it to a file instead:

    echo 'my_password==>xxxxxxxx' > tmp
    git filter-repo --replace-text tmp HEAD
    
  • HEAD makes it affect only the current branch

Modify only a range of commits

How to modify only a range of commits with git filter-repo instead of the entire branch history?

git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx') --refs HEAD~2..HEAD

Replace using the Python API

For more complex replacements, you can use the Python API, see: How to use git filter-repo as a library with the Python module interface?
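
When more than one string has to go, the file-based variant above can be extended by generating the expressions file in a loop. A minimal sketch: the helper name `write_expressions` and the sample secrets are hypothetical, while the `literal:` prefix and the `==>` separator are the ones used in the examples in this thread.

```shell
# Hypothetical helper: emit one "literal:OLD==>NEW" line per secret.
# First argument is the replacement, the rest are the strings to replace.
write_expressions() {
    replacement=$1
    shift
    for secret in "$@"; do
        printf 'literal:%s==>%s\n' "$secret" "$replacement"
    done
}

expr_file=$(mktemp)
write_expressions 'xxxxxxxx' 'my_password' 'my_other_password' > "$expr_file"
cat "$expr_file"

# Then, inside a fresh clone (not executed here; git filter-repo refuses
# to run on anything that does not look like a fresh clone unless --force is given):
#   git filter-repo --replace-text "$expr_file"
```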

迷迭香的记忆 2024-10-07 16:41:07

I created a file at /usr/local/git/findsed.sh with the following contents:

find . -name 'githubDirToSubmodule.sh' -exec sed -i '' -e 's/What I want to remove//g' {} \;

I ran the command:

git filter-branch --tree-filter "sh /usr/local/git/findsed.sh"

Explanation of commands

When you run git filter-branch, it goes through each revision that you ever committed, one by one. --tree-filter runs the findsed.sh script on each committed revision, saves the result, then moves on to the next revision.

The find command finds a specific file or set of files and executes (-exec) the sed editor on each one. sed is a command that takes the regex after s/ and replaces it with the string between / and /g (blank in my example). {} is a reference to the file path that was given by the find command. The file path is fed to sed, so that sed knows what to work on. \; just ends the -exec command.

Separating the shell script and the command into separate pieces makes for less complication when it comes to quotes, '' or "".

Peculiarities

I successfully implemented this on a Mac, and apparently sed is a particular (older?) version on Macs. This matters, as it sometimes behaves differently. Make sure to use sed -i '' , or else it adds a "-e" to the end of the file names, thinking that this is what I wanted my backup files to be named. -i '' says: don't make backup files, just edit the files in place.

Specifying -name 'filename.sh' helped me avoid another issue that I could not solve. There was another .sh file that ended without a newline character. For some reason sed would add a newline character to the end of it, despite 's/blah/blah/g' not matching anything in that file. So instead of figuring out that issue, I just told find to ignore all other files.
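
The `-i` difference described above comes from BSD sed (as shipped on macOS) requiring a backup-suffix argument after `-i`, while GNU sed takes an optional suffix attached directly to `-i`. One form that both accept, offered as a sketch rather than as what was actually run here, is to attach a suffix with no space and delete the backup afterwards. The scratch directory and the `.bak` suffix are arbitrary choices.

```shell
workdir=$(mktemp -d)
printf 'keep What I want to remove this\n' > "$workdir/githubDirToSubmodule.sh"

# -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed.
sed -i.bak -e 's/What I want to remove//g' "$workdir/githubDirToSubmodule.sh"

# Drop the backup so that --tree-filter does not commit it as a new file.
rm -f "$workdir/githubDirToSubmodule.sh.bak"

cat "$workdir/githubDirToSubmodule.sh"
```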

Additional commands that work

Additionally, I found these commands to work in the findsed.sh file (only one command at a time, not multiple, so comment the others out with #):

find . -name '.publishNewZenPackFromGithub.sh.swp' -exec rm -f {} \;
find . -name '*' -exec grep -H PassToRemove {} \;

Enjoy!

梦亿 2024-10-07 16:41:07

Could be a shell expansion issue. If filter-branch is losing the quotes around "*.php" by the time it evaluates the command, it may be expanding to nothing, thus git ls-files -z listing all files.

You could check the filter-branch source or try different quoting tricks, but what I'd do is just make a one-line shell script that does your tree-filter and pass that script instead.

盛夏尉蓝 2024-10-07 16:41:07

Since this comes up in Google for git replace text in history, and since using non-git tools is sometimes more trouble than it's worth, here's a command that will replace multi-line text all the way from ${COMMIT} onwards to HEAD.

Warning: This is NOT for beginners. It uses git filter-branch, so all of its caveats/pitfalls/etc. apply. Make sure you've committed/backed up everything you need to save, so you don't lose data.

With that said, create the alias in Bash as follows:

git config --global alias.filter-branch-replace-text '!main() { set -eu && if [ -n "${BASH_VERSION+x}" ]; then set -o pipefail; fi && local pattern patternq replacement replacementq commit && pattern="$1" && shift && replacement="$1" && shift && commit="$1" && shift && local sed_binary_flags="" && if [ msys = "${OSTYPE-}" ]; then sed_binary_flags="-b"; fi && patternq="$(printf "%s" "${pattern}" | sed ${sed_binary_flags} "s/'\''/'\''\\\\'\'''\''/g")." && patternq="'\''${patternq%.}'\''" && replacementq="$(printf "%s" "${replacement}" | sed ${sed_binary_flags} "s/'\''/'\''\\\\'\'''\''/g")." && replacementq="'\''${replacementq%.}'\''" && git filter-branch --tree-filter "for path in $(printf "%s\n" "$@" | sed ${sed_binary_flags} -e "s/'\''/'\''\\\\'\'''\''/g" -e "s/\(.*\)/'\''\1'\''/" | tr "\n" " ")"'\''; do if [ -f "${path}" ]; then perl -0777 -i -s -p -e "s/\\Q\$q\\E/\$s/sgm" -- -q='\''"${patternq}"'\'' -s='\''"${replacementq}"'\'' -- "${path}"; fi || break; done'\'' "${commit}~1..HEAD" --; } && main'

and you can then invoke it from Bash as follows:

git filter-branch-replace-text \
    $')\r\n{' \
    $') /* EOL */\r\n{' \
    "${COMMIT}" \
    src/*.txt

Note that this performs literal text replacement, not regular expression replacement.

If you need regexes, you'll need to remove the \Q and \E in the Perl command (which perform escaping) and properly escape the strings as needed for the s/$q/$s/sgm command yourself.

And if you want to pretty-print the script, you can format it like this:

(f="$(git --no-pager config --get alias.filter-branch-replace-text)" && eval "${f%&&*}" && declare -f "${f%%()*}")
