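How would Git handle a SHA-1 collision on a blob?
This has probably never happened in the real world, and may never happen, but let's consider it: say you have a git repository, make a commit, and get very, very unlucky: one of the blobs ends up having the same SHA-1 as another one that is already in your repository. The question is, how would Git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?
This is more a brain-teaser than an actual problem, but I found the issue interesting.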
I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.9~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160-bit to 4-bit by applying the following diff and rebuilding git:
Then I did a few commits and noticed the following.
For #2 you will typically get an error like this when you run "git push":
or:
if you delete the file and then run "git checkout file.txt".
For #4 and #6, you will typically get an error like this:
when running "git commit". In this case you can typically just type "git commit" again, since this will create a new hash (because of the changed timestamp).
For #5 and #9, you will typically get an error like this:
when running "git commit".
If someone tries to clone your corrupt repository, they will typically see something like:
What "worries" me is that in two cases (2, 3) the repository becomes corrupt without any warning, and in three cases (1, 7, 8) everything seems OK, but the repository content differs from what you expect it to be. People cloning or pulling will have different content than what you have. Cases 4, 5, 6 and 9 are OK, since Git stops with an error. I suppose it would be better if it at least failed with an error in all cases.
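To get a feel for how quickly a 4-bit hash runs out of room, here is a minimal Python sketch. It is a toy model, not the patched Git from the experiment: it truncates the SHA-1 of arbitrary byte strings (no Git object headers), and the names `truncated_id` and `first_collision` are made up for illustration.

```python
import hashlib

HASH_BITS = 4  # assumption: mirrors the 4-bit experiment; real Git uses 160 bits


def truncated_id(data: bytes, bits: int = HASH_BITS) -> int:
    """Return the top `bits` bits of SHA-1(data) as an integer."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def first_collision(bits: int = HASH_BITS):
    """Store distinct objects until two of them share a truncated ID."""
    seen = {}  # truncated ID -> first content stored under it
    n = 0
    while True:
        content = f"object {n}\n".encode()
        oid = truncated_id(content, bits)
        if oid in seen:
            return n + 1, seen[oid], content
        seen[oid] = content
        n += 1


count, old, new = first_collision()
print(f"collision after {count} objects: {old!r} vs {new!r}")
```

With only 16 possible IDs, a collision is guaranteed by the 17th distinct object at the latest, which is why a handful of commits was enough to hit every case in the experiment.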
Original answer (2012) (see the shattered.io 2017 SHA1 collision below)
That old (2006) answer from Linus might still be relevant:
The question of using SHA-256 is regularly mentioned, but has not been acted upon for now (2012).
Note: starting 2018 and Git 2.19, the code is being refactored to use SHA-256.
Note (Humor): you can force a commit to a particular SHA1 prefix, with the project gitbrute from Brad Fitzpatrick (bradfitz).
Example: https://github.com/bradfitz/deadbeef
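The idea of forcing a hash prefix can be sketched in a few lines of Python. This is only an illustration of the brute-force principle: it hashes an arbitrary byte string rather than a real Git commit object, and appending a counter to the message is an assumption made here, not a description of how gitbrute itself varies the commit. The function name `brute_prefix` is hypothetical.

```python
import hashlib
from itertools import count


def brute_prefix(message: str, prefix: str):
    """Append a counter to `message` until its SHA-1 hex digest starts with `prefix`."""
    for nonce in count():
        candidate = f"{message}\nnonce {nonce}\n".encode()
        digest = hashlib.sha1(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


nonce, digest = brute_prefix("example commit", "beef")
print(nonce, digest)
```

Each extra hex digit in the prefix multiplies the expected work by 16, which is why matching a short prefix is a joke project while matching a full 40-digit hash is a cryptographic attack.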
Daniel Dinnyes points, in the comments, to 7.1 Git Tools - Revision Selection, which includes:
Even more recently (February 2017), shattered.io demonstrated the possibility of forging a SHA1 collision (see much more in my separate answer, including Linus Torvalds' Google+ post):
See "Lifetimes of cryptographic hash functions" from Valerie Anita Aurora for more.
In that page, she notes:
See more in my separate answer below.
According to Pro Git:
So it wouldn't fail, but it wouldn't save your new object either.
I don't know how that would look on the command line, but that would certainly be confusing.
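The "doesn't fail, doesn't save" behaviour can be modelled with a toy content-addressed store. This is a sketch under stated assumptions, not Git's actual object database: `ToyObjectStore` and `weak_hash` are hypothetical names, and the hash is deliberately cut down to one hex digit so that a collision is easy to find.

```python
import hashlib


class ToyObjectStore:
    """Content-addressed store that never overwrites an existing ID."""

    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.objects = {}

    def write(self, content: bytes) -> str:
        oid = self.hash_fn(content)
        # First write wins: an existing ID is assumed to already hold this content.
        self.objects.setdefault(oid, content)
        return oid

    def read(self, oid: str) -> bytes:
        return self.objects[oid]


def weak_hash(content: bytes) -> str:
    """Deliberately weak: first hex digit of SHA-1, so collisions are easy."""
    return hashlib.sha1(content).hexdigest()[:1]


store = ToyObjectStore(weak_hash)
first = b"original file\n"
oid = store.write(first)

# Search for different content that lands on the same weak ID.
n = 0
while weak_hash(f"other file {n}\n".encode()) != oid:
    n += 1
second = f"other file {n}\n".encode()

assert store.write(second) == oid  # no error is raised...
print(store.read(oid))             # ...but the first content is what comes back
```

The write of the colliding object succeeds silently, and every later read returns the older content, which is the confusing outcome described above.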
A bit further down, that same reference attempts to illustrate the likelihood of such a collision:
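For a rough number of your own, the usual birthday approximation p ≈ 1 - exp(-n² / (2 · 2^bits)) can be evaluated directly. The sketch below assumes SHA-1 behaves like an ideal 160-bit random function and only covers accidental collisions, not deliberate attacks; `collision_probability` is a name chosen here for illustration.

```python
import math


def collision_probability(n_objects: float, hash_bits: int = 160) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    exponent = -(n_objects ** 2) / (2.0 * 2.0 ** hash_bits)
    # expm1 keeps precision when the probability is astronomically small.
    return -math.expm1(exponent)


for n in (1e9, 1e18, 2.0 ** 80):
    print(f"{n:.3g} objects -> p ~= {collision_probability(n):.3g}")
```

The probability only becomes noticeable once the number of objects approaches 2^80, many orders of magnitude beyond any real repository.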
To add to my previous answer from 2012, there is now (Feb. 2017, five years later) an example of an actual SHA-1 collision with shattered.io, where you can craft two colliding PDF files: that is, obtain a SHA-1 digital signature on the first PDF file which can also be abused as a valid signature on the second PDF file.
See also "At death’s door for years, widely used SHA1 function is now dead", and this illustration.
Update, 26th of February: Linus confirmed the following points in a Google+ post:
Regarding that transition, see the Q1 2018 Git 2.16 adding a structure representing hash algorithm. The implementation of that transition has started.
Starting Git 2.19 (Q3 2018), Git has picked SHA-256 as NewHash, and is in the process of integrating it to the code (meaning SHA1 is still the default (Q2 2019, Git 2.21), but SHA2 will be the successor)
Original answer (25th of February)
But:
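It does have some issue for git-svn though. Or rather with svn itself, as seen here.
... git fsck, as mentioned by Linus Torvalds today.
git fsck would warn about a commit message with opaque data hidden after a NUL (although NUL isn't always present in a fraudulent file).
Not everybody turns on transfer.fsck, but GitHub does: any push would be aborted in the case of a malformed object or a broken link. Although... there is a reason this is not activated by default.
The actual issue is in creating two Git repositories with the same head commit hash and different contents. And even then, the attack remains convoluted.
Linus adds:
"The whole point of an SCM is that it isn't about a one-time event, but about continuous history. That also fundamentally means that a successful attack needs to work over time, and not be detectable. If you can fool an SCM one time, insert your code, and it gets detected next week, you didn't actually do anything useful. You only burned yourself."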
Joey Hess tried those PDFs in a Git repo and found:
So the main vector of attack (forging a commit) would be:
Plus, you can already detect cryptanalytic collision attacks against SHA-1 present in each file with cr-marcstevens/sha1collisiondetection.
Adding a similar check in Git itself would have some computation cost.
On changing the hash, Linus comments:
A transition plan (from SHA1 to another hash function) would still be complex, but it is being actively studied.
A convert-to-object_id campaign is in progress:
Update, 20th of March: GitHub details a possible attack and its protection:
Protection:
See "sha1collisiondetection" by Marc Stevens.
Again, with Q1 2018 Git 2.16 adding a structure representing hash algorithm, the implementation of a transition to a new hash has started.
As mentioned above, the new supported Hash will be SHA-256.
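To make the object naming concrete, here is a small Python sketch of how a blob ID is derived: Git hashes a short header (`blob`, the content length, a NUL byte) followed by the content, not the raw file. The SHA-256 variant assumes the new object format keeps the same header layout and only swaps the hash function; `git_blob_id` is a name chosen here for illustration.

```python
import hashlib


def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Hash `blob <size>\\0<content>`, the layout Git uses when naming a blob."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


data = b"hello\n"
print("blob id (sha1):  ", git_blob_id(data, "sha1"))
print("blob id (sha256):", git_blob_id(data, "sha256"))
print("plain sha1:      ", hashlib.sha1(data).hexdigest())
```

Because of the header, the plain SHA-1 of a file differs from its blob ID, so two files that collide under plain SHA-1 do not automatically collide as Git blobs.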
There are several different attack models for hashes like SHA-1, but the one usually discussed is collision search, including Marc Stevens' HashClash tool.
As folks pointed out, you could force a hash collision with git, but doing so won't overwrite the existing objects in another repository. I'd imagine even git push -f --no-thin won't overwrite the existing objects, but I'm not 100% sure.
That said, if you hack into a remote repository then you could make your false object the older one there, possibly embedding hacked code into an open source project on github or similar. If you were careful then maybe you could introduce a hacked version that new users downloaded.
I suspect, however, that many things the project's developers might do could either expose or accidentally destroy your multi-million dollar hack. In particular, that's a lot of money down the drain if some developer, who you didn't hack, ever runs the aforementioned git push --no-thin after modifying the affected files, sometimes even without the --no-thin depending.
I think cryptographers would celebrate.
Quote from Wikipedia article on SHA-1: