How would Git handle a SHA-1 collision on a blob?

Posted on 2025-01-08 12:44:02

This probably never happened in the real-world yet, and may never happen, but let's consider this: say you have a git repository, make a commit, and get very very unlucky: one of the blobs ends up having the same SHA-1 as another that is already in your repository. Question is, how would Git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?

More a brain-teaser than an actual problem, but I found the issue interesting.


Comments (6)

风月客 2025-01-15 12:44:02

I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.0~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160-bit to 4-bit by applying the following diff and rebuilding git:

--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
    blk_SHA1_Update(ctx, padlen, 8);

    /* Output hash */
-   for (i = 0; i < 5; i++)
-       put_be32(hashout + i * 4, ctx->H[i]);
+   for (i = 0; i < 1; i++)
+       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
+   for (i = 1; i < 5; i++)
+       put_be32(hashout + i * 4, 0);
 }

Then I did a few commits and noticed the following.

  1. If a blob already exists with the same hash, you will not get any warnings at all. Everything seems to be ok, but when you push, someone clones, or you revert, you will lose the latest version (in line with what is explained above).
  2. If a tree object already exists and you make a blob with the same hash: Everything will seem normal, until you either try to push or someone clones your repository. Then you will see that the repo is corrupt.
  3. If a commit object already exists and you make a blob with the same hash: same as #2 - corrupt
  4. If a blob already exists and you make a commit object with the same hash, it will fail when updating the "ref".
  5. If a blob already exists and you make a tree object with the same hash. It will fail when creating the commit.
  6. If a tree object already exists and you make a commit object with the same hash, it will fail when updating the "ref".
  7. If a tree object already exists and you make a tree object with the same hash, everything will seem ok. But when you commit, all of the repository will reference the wrong tree.
  8. If a commit object already exists and you make a commit object with the same hash, everything will seem ok. But when you commit, the commit will never be created, and the HEAD pointer will be moved to an old commit.
  9. If a commit object already exists and you make a tree object with the same hash, it will fail when creating the commit.

For #2 you will typically get an error like this when you run "git push":

error: object 0400000000000000000000000000000000000000 is a tree, not a blob
fatal: bad blob object
error: failed to push some refs to origin

or:

error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000)

if you delete the file and then run "git checkout file.txt".
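The mismatched-type errors above can be reproduced for any object ID with `git cat-file -t`, which reports what kind of object a hash currently names in the repository. A minimal sketch driving git from Python (assumes `git` is on the PATH; the file name and contents are made up for illustration):

```python
import os
import subprocess
import tempfile

def object_type(repo: str, sha: str) -> str:
    """Ask git what kind of object a given ID names (blob, tree, commit, tag)."""
    out = subprocess.run(
        ["git", "-C", repo, "cat-file", "-t", sha],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

# Throwaway repository with a single blob written to the object store.
repo = tempfile.mkdtemp()
subprocess.run(["git", "-C", repo, "init", "-q"], check=True)
path = os.path.join(repo, "file.txt")
with open(path, "w") as f:
    f.write("hello\n")
sha = subprocess.run(
    ["git", "-C", repo, "hash-object", "-w", path],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(object_type(repo, sha))  # prints "blob"; in case #2 a colliding ID would report "tree"
```

In the experiment, an error like `object 04000000... is a tree, not a blob` means exactly that a tree was found where this lookup was expected to return a blob.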

For #4 and #6, you will typically get an error like this:

error: Trying to write non-commit object
f000000000000000000000000000000000000000 to branch refs/heads/master
fatal: cannot update HEAD ref

when running "git commit". In this case you can typically just type "git commit" again, since this will create a new hash (because of the changed timestamp).

For #5 and #9, you will typically get an error like this:

fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object

when running "git commit".

If someone tries to clone your corrupt repository, they will typically see something like:

git clone (one repo with collided blob,
d000000000000000000000000000000000000000 is commit,
f000000000000000000000000000000000000000 is tree)

Cloning into 'clonedversion'...
done.
error: unable to read sha1 file of s (d000000000000000000000000000000000000000)
error: unable to read sha1 file of tullebukk
(f000000000000000000000000000000000000000)
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

What "worries" me is that in two cases (2,3) the repository becomes corrupt without any warnings, and in 3 cases (1,7,8), everything seems ok, but the repository content is different than what you expect it to be. People cloning or pulling will have a different content than what you have. The cases 4,5,6 and 9 are ok, since it will stop with an error. I suppose it would be better if it failed with an error at least in all cases.

音盲 2025-01-15 12:44:02

Original answer (2012) (see shattered.io 2017 SHA1 collision below)

That old (2006) answer from Linus might still be relevant:

Nope. If it has the same SHA1, it means that when we receive the object from the other end, we will not overwrite the object we already have.

So what happens is that if we ever see a collision, the "earlier" object in any particular repository will always end up overriding. But note that "earlier" is obviously per-repository, in the sense that the git object network generates a DAG that is not fully ordered, so while different repositories will agree about what is "earlier" in the case of direct ancestry, if the object came through separate and not directly related branches, two different repos may obviously have gotten the two objects in different order.

However, the "earlier will override" is very much what you want from a security standpoint: remember that the git model is that you should primarily trust only your own repository.
So if you do a "git pull", the new incoming objects are by definition less trustworthy than the objects you already have, and as such it would be wrong to allow a new object to
replace an old one.

So you have two cases of collision:

  • the inadvertent kind, where you somehow are very very unlucky, and two files end up having the same SHA1.
    At that point, what happens is that when you commit that file (or do a "git-update-index" to move it into the index, but not committed yet), the SHA1 of the new contents will be computed, but since it matches an old object, a new object won't be created, and the commit-or-index ends up pointing to the old object.
    You won't notice immediately (since the index will match the old object SHA1, and that means that something like "git diff" will use the checked-out copy), but if you ever do a tree-level diff (or you do a clone or pull, or force a checkout) you'll suddenly notice that that file has changed to something completely different than what you expected.
    So you would generally notice this kind of collision fairly quickly.
    In related news, the question is what to do about the inadvertent collision.
    First off, let me remind people that the inadvertent kind of collision is really really really damn unlikely, so we'll quite likely never ever see it in the full history of the universe.
    But if it happens, it's not the end of the world: what you'd most likely have to do is just change the file that collided slightly, and just force a new commit with the changed contents (add a comment saying "/* This line added to avoid collision */") and then teach git about the magic SHA1 that has been shown to be dangerous.
    So over a couple of million years, maybe we'll have to add one or two "poisoned" SHA1 values to git. It's very unlikely to be a maintenance problem ;)

  • The attacker kind of collision because somebody broke (or brute-forced) SHA1.
    This one is clearly a lot more likely than the inadvertent kind, but by definition it's always a "remote" repository. If the attacker had access to the local repository, he'd have much easier ways to screw you up.
    So in this case, the collision is entirely a non-issue: you'll get a "bad" repository that is different from what the attacker intended, but since you'll never actually use his colliding object, it's literally no different from the attacker just not having found a collision at all, but just using the object you already had (ie it's 100% equivalent to the "trivial" collision of the identical file generating the same SHA1).
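The "earlier will override" rule Linus describes can be sketched as a content-addressed store whose write is a no-op when the ID already exists. This is a simplified model for illustration, not Git's actual code:

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: first writer wins, like Git's object database."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # If an object with this ID already exists, keep the earlier one.
        self.objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]

store = ObjectStore()
k1 = store.put(b"original contents")
# Simulate a collision: pretend a second, different object hashed to the same ID
# (no real SHA-1 collision is computed here).
store.objects.setdefault(k1, b"colliding contents")  # no-op: earlier object survives
print(store.get(k1))  # b'original contents'
```

In a real collision the second write would silently resolve to the earlier object's ID, which matches the silent-data-loss behaviour reported in the experiment above.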

The question of using SHA-256 is regularly mentioned, but has not been acted upon for now (2012).
Note: starting 2018 and Git 2.19, the code is being refactored to use SHA-256.


Note (Humor): you can force a commit to a particular SHA1 prefix, with the project gitbrute from Brad Fitzpatrick (bradfitz).

gitbrute brute-forces a pair of author+committer timestamps such that the resulting git commit has your desired prefix.

Example: https://github.com/bradfitz/deadbeef


Daniel Dinnyes points out in the comments to 7.1 Git Tools - Revision Selection, which includes:

A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.


Even more recently (February 2017), shattered.io demonstrated the possibility of forging a SHA1 collision:
(see much more in my separate answer, including Linus Torvalds' Google+ post)

  • a/ still requires over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.
  • b/ would forge one file (with the same SHA1), but with the additional constraint that its content and size together would produce the identical SHA1 (a collision on the content alone is not enough; see "How is the git hash calculated?"): a blob SHA1 is computed from the content and its size.
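The constraint in b/ comes from how a blob ID is formed: Git hashes a `blob <size>` header (NUL-terminated) followed by the content, so a collision on the raw file bytes alone does not collide the blob IDs unless the sizes also line up. A minimal sketch:

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """SHA-1 of a Git blob: the header 'blob <size>\\0' is prepended to the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

data = b"hello\n"
print(hashlib.sha1(data).hexdigest())  # plain sha1sum of the file bytes
print(git_blob_sha1(data))             # what `git hash-object` would print
```

The two printed hashes differ, which is why the shattered PDFs (identical `sha1sum`) still receive distinct blob IDs inside Git.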

See "Lifetimes of cryptographic hash functions" from Valerie Anita Aurora for more.
In that page, she notes:

Google spent 6500 CPU years and 110 GPU years to convince everyone we need to stop using SHA-1 for security critical applications.
Also because it was cool

See more in my separate answer below.

梦亿 2025-01-15 12:44:02

According to Pro Git:

If you do happen to commit an object that hashes to the same SHA-1 value as a previous object in your repository, Git will see the previous object already in your Git database and assume it was already written. If you try to check out that object again at some point, you’ll always get the data of the first object.

So it wouldn't fail, but it wouldn't save your new object either.
I don't know how that would look on the command line, but that would certainly be confusing.

A bit further down, that same reference attempts to illustrate the likely-ness of such a collision:

Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
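That figure can be sanity-checked with the standard birthday approximation, P ≈ 1 − exp(−n² / 2¹⁶¹), for n objects under a 160-bit hash. The numbers below are my own back-of-the-envelope check, not from the book; they land at roughly a one-in-three chance, the same ballpark as the quoted 50%:

```python
import math

def collision_probability(n: float, bits: int = 160) -> float:
    """Birthday approximation: P(collision) ~= 1 - exp(-n^2 / 2^(bits+1))."""
    return 1 - math.exp(-n * n / 2 ** (bits + 1))

# 6.5 billion programmers * 1e6 objects/second * 5 years:
n = 6.5e9 * 1e6 * 5 * 365.25 * 24 * 3600
print(f"{n:.2e} objects -> {collision_probability(n):.0%} chance of a collision")
```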

简单 2025-01-15 12:44:02

To add to my previous answer from 2012, there is now (Feb. 2017, five years later) an example of an actual SHA-1 collision with shattered.io, where you can craft two colliding PDF files: that is, obtain a SHA-1 digital signature on the first PDF file which can also be abused as a valid signature on the second PDF file.
See also "At death’s door for years, widely used SHA1 function is now dead", and this illustration.

Update 26 of February: Linus confirmed the following points in a Google+ post:

(1) First off - the sky isn't falling. There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a "content identifier" for a content-addressable system like git.

(2) Secondly, the nature of this particular SHA1 attack means that it's actually pretty easy to mitigate against, and there's already been two sets of patches posted for that mitigation.

(3) And finally, there's actually a reasonably straightforward transition to some other hash that won't break the world - or even old git repositories.

Regarding that transition, see the Q1 2018 Git 2.16 adding a structure representing hash algorithm. The implementation of that transition has started.

Starting with Git 2.19 (Q3 2018), Git has picked SHA-256 as NewHash and is in the process of integrating it into the code (meaning SHA1 is still the default (Q2 2019, Git 2.21), but SHA-256 will be the successor).


Original answer (25th of February)
But:

Joey Hess tried those PDFs in a Git repo, and he found:

That includes two files with the same SHA and size, which do get
different blobs thanks to the way git prepends the header to the
content.

joey@darkstar:~/tmp/supercollider>sha1sum  bad.pdf good.pdf 
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  bad.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  good.pdf
joey@darkstar:~/tmp/supercollider>git ls-tree HEAD
100644 blob ca44e9913faf08d625346205e228e2265dd12b65    bad.pdf
100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1    good.pdf

While appending identical data to these colliding files does generate
other collisions, prepending data does not.

So the main vector of attack (forging a commit) would be:

  • Generate a regular commit object;
  • use the entire commit object + NUL as the chosen prefix, and
  • use the identical-prefix collision attack to generate the colliding good/bad objects.
  • ... and this is useless because the good and bad commit objects still point to the same tree!

Plus, you can already detect cryptanalytic collision attacks against SHA-1 present in each file with cr-marcstevens/sha1collisiondetection.

Adding a similar check in Git itself would have some computation cost.

On changing the hash, Linus comments:

The size of the hash and the choice of the hash algorithm are independent issues.
What you'd probably do is switch to a 256-bit hash, use that
internally and in the native git database, and then by default only
show the hash as a 40-character hex string (kind of like how we
already abbreviate things in many situations).
That way tools around git don't even see the change unless passed in
some special "--full-hash" argument (or "--abbrev=64" or whatever -
the default being that we abbreviate to 40).
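The display scheme Linus sketches is just truncation of the hex digest; the flag names in his quote are hypothetical, as is this sketch:

```python
import hashlib

def display_hash(data: bytes, abbrev: int = 40) -> str:
    """Full SHA-256 internally; show only the first `abbrev` hex chars by default."""
    return hashlib.sha256(data).hexdigest()[:abbrev]

h40 = display_hash(b"some object")      # looks like a familiar 40-char ID
h64 = display_hash(b"some object", 64)  # the hypothetical '--full-hash' view
print(h40, len(h40))
print(h64, len(h64))
```

The short form is a prefix of the full digest, so tooling that only ever compares the 40-character view keeps working unchanged.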

Still, a transition plan (from SHA1 to another hash function) would be complex, but it is actively studied.
A convert-to-object_id campaign is in progress.


Update 20th of March: GitHub details a possible attack and its protection:

SHA-1 names can be assigned trust through various mechanisms. For instance, Git allows you to cryptographically sign a commit or tag. Doing so signs only the commit or tag object itself, which in turn points to other objects containing the actual file data by using their SHA-1 names. A collision in those objects could produce a signature which appears valid, but which points to different data than the signer intended. In such an attack the signer only sees one half of the collision, and the victim sees the other half.

Protection:

The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair.

GitHub.com now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair. That prevents attackers from using GitHub to convince a project to accept the "innocent" half of their collision, as well as preventing them from hosting the malicious half.

See "sha1collisiondetection" by Marc Stevens


Again, with Q1 2018 Git 2.16 adding a structure representing hash algorithm, the implementation of a transition to a new hash has started.
As mentioned above, the newly supported hash will be SHA-256.

多像笑话 2025-01-15 12:44:02

There are several different attack models for hashes like SHA-1, but the one usually discussed is collision search, including Marc Stevens' HashClash tool.

"As of 2012, the most efficient attack against SHA-1 is considered to
be the one by Marc Stevens[34] with an estimated cost of $2.77M to
break a single hash value by renting CPU power from cloud servers."

As folks pointed out, you could force a hash collision with git, but doing so won't overwrite the existing objects in another repository. I'd imagine even git push -f --no-thin won't overwrite the existing objects, but I'm not 100% sure.

That said, if you hack into a remote repository then you could make your false object the older one there, possibly embedding hacked code into an open source project on github or similar. If you were careful then maybe you could introduce a hacked version that new users downloaded.

I suspect, however, that many things the project's developers might do could either expose or accidentally destroy your multi-million dollar hack. In particular, that's a lot of money down the drain if some developer, whom you didn't hack, ever runs the aforementioned git push --no-thin after modifying the affected files, and sometimes even without the --no-thin.

原谅过去的我 2025-01-15 12:44:02

I think cryptographers would celebrate.

Quote from Wikipedia article on SHA-1:

In February 2005, an attack by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu was announced.
The attacks can find collisions in the full version of SHA-1, requiring fewer than 2^69 operations. (A brute-force search would require 2^80 operations.)
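For scale, those figures make the attack 2^(80−69) = 2^11 times cheaper than brute force:

```python
brute_force = 2 ** 80  # generic birthday search on a 160-bit hash
wang_attack = 2 ** 69  # Wang, Yin, and Yu (2005)
speedup = brute_force // wang_attack
print(speedup)  # 2048, i.e. a 2^11-fold reduction in work
```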
