How do I remove an author from a Git repository?

Posted on 2024-09-13 14:04:43

If I create a Git repository and publish it publicly (e.g. on GitHub etc.), and I get a request from a contributor to the repository to remove or obscure their name for whatever reason, is there a way of doing so easily?

Basically, I have had such a request and may want to replace their name and e-mail address with something like "Anonymous Contributor" or maybe a SHA-1 hash of their e-mail address or something like that.

Comments (4)

も让我眼熟你 2024-09-20 14:04:43

Jeff is quite right: git filter-branch is the right track. It expects a script that manipulates the environment variables describing each commit. For your use case, you probably want something like this:

git filter-branch --env-filter '
    # Rewrite the author identity only on commits made by this author.
    if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then
        export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="[email protected]"
    fi
    '

You can test that it works like this:

$ cd /tmp
$ mkdir filter-branch && cd filter-branch
$ git init
Initialized empty Git repository in /private/tmp/filter-branch/.git/
$ 
$ touch hi && git add . && git commit -m bla
[master (root-commit) 081f7f5] bla
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 hi
$ echo howdi >> hi && git commit -a -m bla
[master a466a18] bla
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git log
commit a466a18e4dc48908f7ba52f8a373dab49a6cfee4
Author: Niko Schwarz <[email protected]>
Date:   Thu Aug 12 09:43:44 2010 +0200

    bla

commit 081f7f50921edc703b55c04654218fe95d09dc3c
Author: Niko Schwarz <[email protected]>
Date:   Thu Aug 12 09:43:34 2010 +0200

    bla
$ 
$ git filter-branch --env-filter '
> if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then \    
> export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="[email protected]"; \
> fi
> '
Rewrite a466a18e4dc48908f7ba52f8a373dab49a6cfee4 (2/2)
Ref 'refs/heads/master' was rewritten
$ git log
commit 5f0dfc0dc9a325a3f3aaf4575369f15b0fb21fe9
Author: Jon Doe <[email protected]>
Date:   Thu Aug 12 09:43:44 2010 +0200

    bla

commit 3cf865fa0a43d2343b4fb6c679c12fc23f7c6015
Author: Jon Doe <[email protected]>
Date:   Thu Aug 12 09:43:34 2010 +0200

    bla

Please beware: there is no way to change the author's name without changing the hash of every rewritten commit and of all commits that descend from it. That will make later merging a pain for people who have been using your repository.
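The filter above touches only the author fields, but each commit also records a committer name and e-mail. The following is a minimal sketch that rewrites both on every ref, run against a throwaway repository so it can be tried safely. The replacement identity ("Anonymous Contributor", anonymous@example.invalid) and the variable names OLD_NAME, NEW_NAME and NEW_EMAIL are placeholders chosen for illustration, not part of the original answer.

```shell
#!/bin/sh
# Sketch: rewrite author AND committer identity on all refs of a scratch repo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name="Niko Schwarz" -c user.email="niko@example.invalid" \
    commit -q --allow-empty -m "bla"

# Recent Git versions print a deprecation notice and pause before running
# filter-branch; this variable suppresses that.
export FILTER_BRANCH_SQUELCH_WARNING=1

git filter-branch -f --env-filter '
    OLD_NAME="Niko Schwarz"                  # hypothetical: name to replace
    NEW_NAME="Anonymous Contributor"         # hypothetical replacement name
    NEW_EMAIL="anonymous@example.invalid"    # hypothetical replacement e-mail
    if [ "$GIT_AUTHOR_NAME" = "$OLD_NAME" ]; then
        export GIT_AUTHOR_NAME="$NEW_NAME" GIT_AUTHOR_EMAIL="$NEW_EMAIL"
    fi
    if [ "$GIT_COMMITTER_NAME" = "$OLD_NAME" ]; then
        export GIT_COMMITTER_NAME="$NEW_NAME" GIT_COMMITTER_EMAIL="$NEW_EMAIL"
    fi
' -- --all >/dev/null

# Show author and committer of every rewritten commit.
git log --format='%an <%ae> / %cn <%ce>'
```

In a real repository only the filter-branch invocation is needed. The originals are kept under refs/original/ until you delete them, so the old name remains in the local repository until those refs are removed.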

瑾夏年华 2024-09-20 14:04:43

If you ever have to "anonymize" a git repo not just for one user, but all users, Git 2.2 (November 2014) provides an interesting feature with the improved and enhanced git fast-export:

See commit a872275 and commit 75d3d65 by Jeff King (peff):

teach fast-export an --anonymize option:

Sometimes users want to report a bug they experience on their repository, but they are not at liberty to share the contents of the repository.
It would be useful if they could produce a repository that has a similar shape to its history and tree, but without leaking any information.
This "anonymized" repository could then be shared with developers (assuming it still replicates the original problem).

This patch implements an "--anonymize" option to fast-export, which generates a stream that can recreate such a repository.
Producing a single stream makes it easy for the caller to verify that they are not leaking any useful information. You can get an overview of what will be shared by running a command like:

git fast-export --anonymize --all |
perl -pe 's/\d+/X/g' |
sort -u |
less

which will show every unique line we generate, modulo any numbers (each anonymized token is assigned a number, like "User 0", and we replace it consistently in the output).

In addition to anonymizing, this produces test cases that are relatively small (compared to the original repository) and fast to generate (compared to using filter-branch, or modifying the output of fast-export yourself).

Doc:

If the --anonymize option is given, git will attempt to remove all identifying information from the repository while still retaining enough of the original tree and history patterns to reproduce some bugs.

With this option, git will replace all refnames, paths, blob contents, commit and tag messages, names, and email addresses in the output with anonymized data.
Two instances of the same string will be replaced equivalently (e.g., two commits with the same author will have the same anonymized author in the output, but bear no resemblance to the original author string).
The relationship between commits, branches, and tags is retained, as well as the commit timestamps (but the commit messages and refnames bear no resemblance to the originals).
The relative makeup of the tree is retained (e.g., if you have a root tree with 10 files and 3 trees, so will the output), but their names and the contents of the files will be replaced.
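To turn the anonymized stream into an actual repository, it can be piped into git fast-import. Here is a small sketch using two scratch directories; the repository contents, the author identity and the variable names src and dst are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: export an anonymized history and import it into a fresh repository.
set -e

src=$(mktemp -d)   # hypothetical original repository
dst=$(mktemp -d)   # hypothetical anonymized copy

git -C "$src" init -q
echo "secret text" > "$src/secret.c"
git -C "$src" add secret.c
git -C "$src" -c user.name="Niko Schwarz" -c user.email="niko@example.invalid" \
    commit -q -m "confidential message"

git -C "$dst" init -q
git -C "$src" fast-export --anonymize --all |
    git -C "$dst" fast-import --quiet

# The imported history has the same shape but anonymized names and messages.
git -C "$dst" log --all --format='%an <%ae> %s'
```

The anonymized branch name may differ from the original, which is why the log is taken with --all instead of naming a branch.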


See also Git 2.28 (Q3 2020): "git fast-export --anonymize" learned to take a customized mapping, which lets users make its output more usable for debugging.

See commit f39ad38, commit 8a49495, commit 65b5d9f (25 Jun 2020), and commit d5bf91f, commit 6416a86, commit 55b0145, commit a0f6564, commit 7f40759, commit 750bb32, commit b897bf5, commit b8c0689 (23 Jun 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0a23331, 06 Jul 2020)

fast-export: allow seeding the anonymized mapping

Helped-by: Eric Sunshine
Signed-off-by: Jeff King

After you anonymize a repository, it can be hard to find which commits correspond between the original and the result, and thus hard to reproduce commands that triggered bugs in the original.

Let's make it possible to seed the anonymization map.
This lets users either:

  • mark names to be retained as-is, if they don't consider them secret (in which case their original commands would just work)
  • map names to new values, which lets them adapt the reproduction recipe to the new names without revealing the originals

The implementation is fairly straight-forward.
We already store each anonymized token in a hashmap (so that the same token appearing twice is converted to the same result). We can just introduce a new "seed" hashmap which is consulted first.

This does make a few more promises to the user about how we'll anonymize things (e.g., token-splitting pathnames). But it's unlikely that we'd want to change those rules, even if the actual anonymization of a single token changes. And it makes things much easier for the user, who can unblind only a directory name without having to specify each path within it.

One alternative to this approach would be to anonymize as we see fit, and then dump the whole refname and pathname mappings to a file. This does work, but it's a bit awkward to use (you have to manually dig the items you care about out of the mapping).

git fast-export now has:

--anonymize-map=<from>[:<to>]:

Convert token <from> to <to> in the anonymized output.
If <to> is omitted, map <from> to itself (i.e., do not anonymize it).

Reproducing some bugs may require referencing particular commits or
paths, which becomes challenging after refnames and paths have been
anonymized.
You can ask for a particular token to be left as-is or
mapped to a new value.

For example, if you have a bug which reproduces with git rev-list sensitive -- secret.c, you can run:

---------------------------------------------------
$ git fast-export --anonymize --all \
      --anonymize-map=sensitive:foo \
      --anonymize-map=secret.c:bar.c \
      >stream
---------------------------------------------------

After importing the stream, you can then run git rev-list foo -- bar.c
in the anonymized repository.

Note that paths and refnames are split into tokens at slash boundaries.
The command above would anonymize subdir/secret.c as something like
path123/bar.c; you could then search for bar.c in the anonymized
repository to determine the final pathname.

To make referencing the final pathname simpler, you can map each path
component; so if you also anonymize subdir to publicdir, then the
final pathname would be publicdir/bar.c.
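The mapping described above can be tried end to end on a scratch repository. This sketch assumes Git 2.28 or later; the branch name sensitive and the path subdir/secret.c come from the documentation excerpt, while the temporary directories and the commit identity are placeholders.

```shell
#!/bin/sh
# Sketch: seed the anonymization map so a known ref and path survive under
# chosen names (requires Git 2.28+ for --anonymize-map).
set -e

src=$(mktemp -d)   # hypothetical original repository
dst=$(mktemp -d)   # hypothetical anonymized copy

git -C "$src" init -q
git -C "$src" symbolic-ref HEAD refs/heads/sensitive
mkdir "$src/subdir"
echo "int main(void) { return 0; }" > "$src/subdir/secret.c"
git -C "$src" add subdir/secret.c
git -C "$src" -c user.name="Niko Schwarz" -c user.email="niko@example.invalid" \
    commit -q -m "add secret.c"

git -C "$dst" init -q
git -C "$src" fast-export --anonymize --all \
      --anonymize-map=sensitive:foo \
      --anonymize-map=secret.c:bar.c \
      --anonymize-map=subdir:publicdir |
    git -C "$dst" fast-import --quiet

# The reproduction recipe, adapted to the mapped names.
git -C "$dst" rev-list foo -- publicdir/bar.c
```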


Before Git 2.34 (Q4 2021), the output from "git fast-export", when its anonymization feature is in use, showed an annotated tag incorrectly.

See commit 2f040a9 (31 Aug 2021) by Tal Kelrich (hasturkun).
(Merged by Junio C Hamano -- gitster -- in commit febba80, 10 Sep 2021)

fast-export: fix anonymized tag using original length

Signed-off-by: Tal Kelrich

Commit 7f40759 ("fast-export: tighten anonymize_mem() interface to handle only strings", 2020-06-23, Git v2.28.0-rc0 -- merge listed in batch #7) changed the interface used in anonymizing strings, but failed to update the size of annotated tag messages to match the new anonymized string.

As a result, exporting tags having messages longer than 13 characters would create output that couldn't be parsed by fast-import, as the data length indicated was larger than the data output.

Reset the message size when anonymizing, and add a tag with a "long" message to the test.

棒棒糖 2024-09-20 14:04:43

You can make the change in your local repository, git commit --amend the appropriate commit (where you added the name), and then git push --force to update GitHub with your version of the repository.

The original commit with the contributor's name will still be available in your local reflog until it expires, but it would take a lot of effort to find it. If this is a concern, you can obliterate that specific commit from the reflog too -- see git help reflog for the syntax and how to find it in the list.
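As a rough sketch of that workflow when the commit in question is the most recent one, the following uses a local bare repository as a stand-in for GitHub. All names, e-mail addresses and paths are placeholders. Expiring the whole reflog, as done here, is a blunter tool than deleting a single entry.

```shell
#!/bin/sh
# Sketch: amend the latest commit's author, force-push, then drop local traces.
set -e

remote=$(mktemp -d)   # hypothetical stand-in for the published repository
work=$(mktemp -d)     # hypothetical local clone

git init -q --bare "$remote"
git -C "$work" init -q
git -C "$work" remote add origin "$remote"
echo "howdi" > "$work/hi"
git -C "$work" add hi
git -C "$work" -c user.name="Niko Schwarz" -c user.email="niko@example.invalid" \
    commit -q -m "bla"
branch=$(git -C "$work" symbolic-ref --short HEAD)
git -C "$work" push -q origin "$branch"

# Replace the author; the committer becomes whoever runs the amend.
git -C "$work" -c user.name="Maintainer" -c user.email="maintainer@example.invalid" \
    commit -q --amend --no-edit \
    --author="Anonymous Contributor <anonymous@example.invalid>"
git -C "$work" push -q --force origin "$branch"

# Expire all reflog entries and prune objects that are no longer reachable.
git -C "$work" reflog expire --expire=now --all
git -C "$work" gc -q --prune=now

git -C "$remote" log -1 --format='%an <%ae>' "$branch"
```

This only cleans up your own clone and the remote branch tip. Copies that other people have already fetched are unaffected.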

只有一腔孤勇 2024-09-20 14:04:43

If you want to change more than one commit, check out the man page for

git filter-branch --env-filter

You can use git-filter-branch to change the content or metadata of previous commits.

Note that since you're not dealing with a purely local branch (it has already been pushed to GitHub), you have no way to remove the author from the copy of anyone who has already cloned your branch.

It's also generally bad practice to modify a branch which has already been published, since it can lead to confusion for people who are tracking the branch.
