Why does git hash-object return a different hash than openssl sha1?

Posted on 2024-10-21 20:51:42

Context: I downloaded a file (Audirvana 0.7.1.zip) from code.google to my MacBook Pro (Mac OS X 10.6.6).

I wanted to verify the checksum, which for that particular file is posted as 862456662a11e2f386ff0b24fdabcb4f6c1c446a (SHA-1). git hash-object gave me a different hash, but openssl sha1 returned the expected 862456662a11e2f386ff0b24fdabcb4f6c1c446a.

The following experiment seems to rule out any possible download corruption or newline differences and to indicate that there are actually two different algorithms at play:

$ echo A > foo.txt
$ cat foo.txt
A
$ git hash-object foo.txt 
f70f10e4db19068f79bc43844b49f3eece45c4e8
$ openssl sha1 foo.txt 
SHA1(foo.txt)= 7d157d7c000ae27db146575c08ce30df893d3a64

What's going on?

Comments (5)

橘寄 2024-10-28 20:51:43

You see a difference because git hash-object doesn't just hash the bytes in the file: before hashing, it prepends the string "blob " followed by the file size in bytes (as a decimal number) and a NUL byte to the file's contents. There are more details in another answer on Stack Overflow.

Or, to convince yourself, try something like:

$ echo -n hello | git hash-object --stdin
b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0

$ printf 'blob 5\0hello' > test.txt
$ openssl sha1 test.txt
SHA1(test.txt)= b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0
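
The same check can be sketched without git or openssl. This is a minimal illustration using Python's standard hashlib module; the function names plain_sha1 and git_blob_sha1 are made up for this example and are not part of any Git API.

```python
import hashlib


def plain_sha1(data: bytes) -> str:
    """SHA-1 over the raw bytes only, like `openssl sha1` on a file."""
    return hashlib.sha1(data).hexdigest()


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 over 'blob <size>' + NUL + data, the form described above."""
    header = b"blob %d\x00" % len(data)  # size is the byte count, in decimal
    return hashlib.sha1(header + data).hexdigest()


# The two digests differ because the hashed byte strings differ.
print(plain_sha1(b"hello"))
print(git_blob_sha1(b"hello"))

# Hashing the header and content by hand gives the blob ID again.
print(plain_sha1(b"blob 5\x00hello") == git_blob_sha1(b"hello"))
```

The last line mirrors the printf/openssl experiment above: once the header is part of the input, a plain SHA-1 reproduces the value that git hash-object reports.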

梦开始←不甜 2024-10-28 20:51:43

The SHA1 digest is calculated over a header string followed by the file data. The header consists of the object type, a space and the object length in bytes as decimal. This is separated from the data by a null byte.

So:

$ git hash-object foo.txt
f70f10e4db19068f79bc43844b49f3eece45c4e8
$ ( perl -e '$size = (-s shift); print "blob $size\x00"' foo.txt \
               && cat foo.txt ) | openssl sha1
f70f10e4db19068f79bc43844b49f3eece45c4e8

One consequence of this is that "the" empty tree and "the" empty blob have different IDs. That is:

e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 always means "empty file"
4b825dc642cb6eb9a060e54bf8d69288fbee4904 always means "empty directory"

You will find that you can in fact do git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 in a new git repository with no objects registered, because it is recognised as a special case and never actually stored (with modern Git versions). By contrast, if you add an empty file to your repo, a blob "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" will be stored.
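
To see why the empty blob and the empty tree get different IDs, the header rule can be sketched in Python. This assumes only what is stated above (type, space, decimal length, NUL, then the body); git_object_sha1 is a hypothetical helper name, not a Git function.

```python
import hashlib


def git_object_sha1(obj_type: str, body: bytes) -> str:
    """Hash '<type> <length>' + NUL + body, following the header rule above."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Both bodies are empty, so only the type word in the header differs.
empty_blob = git_object_sha1("blob", b"")
empty_tree = git_object_sha1("tree", b"")

print(empty_blob)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

With zero bytes of content, the digest depends entirely on the header, which is why each of these two IDs is the same in every repository.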

反目相谮 2024-10-28 20:51:43

Git stores objects as [object type, object length, delimiter (\0), content].
In your case:

$ echo "A" | git hash-object --stdin
f70f10e4db19068f79bc43844b49f3eece45c4e8

Try calculating the hash as:

$ echo -e "blob 2\0A" | shasum 
f70f10e4db19068f79bc43844b49f3eece45c4e8  -

Note the -e flag (needed in bash so that \0 is emitted as a NUL byte) and that the length is 2 rather than 1, because echo appends a newline to "A".
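
The length pitfall can also be shown in Python. This is a small sketch, assuming the file holds exactly the two bytes that echo A writes ("A" plus a newline); git_blob_sha1 is an illustrative name only.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Hash 'blob <byte length>' + NUL + data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


content = b"A\n"  # what `echo A > foo.txt` is assumed to write: 2 bytes
right = git_blob_sha1(content)

# Forgetting the newline in the length gives a header of "blob 1",
# and therefore a different digest from the one git reports.
wrong = hashlib.sha1(b"blob 1\x00" + content).hexdigest()

print(len(content))
print(right)
print(right == wrong)
```

The length in the header counts bytes, not visible characters, so trailing newlines and multi-byte characters all have to be included.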

一腔孤↑勇 2024-10-28 20:51:43

The answer lies here:

How to assign a Git SHA1's to a file without Git?

git hashes a header (the object type and size) plus the contents, not just the contents.

That is a good enough answer for now, and the takeaway is that git is not the tool for checksumming downloads.
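
For checking a download against a published SHA-1, a plain digest over the file bytes is what is needed. The following is a minimal sketch with Python's hashlib; file_sha1 and matches_published are hypothetical helper names, and the chunk size is an arbitrary choice.

```python
import hashlib


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Plain SHA-1 of a file's bytes, read in chunks to keep memory use low."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's SHA-1 with a published hex checksum, ignoring case."""
    return file_sha1(path) == published_hex.strip().lower()
```

A call such as matches_published("Audirvana 0.7.1.zip", "862456662a11e2f386ff0b24fdabcb4f6c1c446a") would then play the role of the openssl sha1 check in the question.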

铜锣湾横着走 2024-10-28 20:51:43

Watch out for filters!

git hash-object may run the file through filters before calculating the SHA. Typically, \r\n line endings are converted to \n. This is why git hash-object and git hash-object --no-filters can give different results.
Other content may be filtered as well, and .gitattributes can affect the result.

A small example using Windows cmd:

Create test files in a new folder:

$ echo this is a test $Id$ > test1.txt
$ echo this is a test $Id: ffbf88668784c14e809c8c449d799b654d7a5fc5 $ > test2.txt

Now use git hash-object:

$ git hash-object test1.txt
0c3a75d8155d54c2367e290cf7f33434805410be

$ git hash-object test2.txt
60fff1b8ec47ed41254719681e32369d640d6a0f

$ git hash-object --no-filters test2.txt
2f68d9b80a38fb800f039ef9062c764d2a4d4352

Different files lead to different hashes, as expected, but git does filter the file in some way, since --no-filters changes the result.

Now create a git repo and a .gitattributes file in the folder:

$ git init .
Initialized empty Git repository in ~/.git

$ echo *.txt ident > .gitattributes

$ git hash-object test1.txt
0c3a75d8155d54c2367e290cf7f33434805410be

$ git hash-object test2.txt
0c3a75d8155d54c2367e290cf7f33434805410be

$ git hash-object --no-filters test2.txt
2f68d9b80a38fb800f039ef9062c764d2a4d4352

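Now test1 and test2 have the same hash, because the ident attribute collapses $Id: ... $ back to $Id$ before hashing. The --no-filters option still gives the same value as before.

Conclusion: you can get the same hash from git and openssl (once the blob header is accounted for), but you need to make sure that your file is not affected by git filters.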
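
The effect of such filters can be imitated in Python. This is only a rough simulation under stated assumptions: normalize_crlf and collapse_ident are crude stand-ins written for this example, not Git's actual filter code, and the exact bytes that cmd's echo writes (a trailing space and \r\n) are assumed, so the digests are not expected to match the transcript above.

```python
import hashlib
import re


def git_blob_sha1(data: bytes) -> str:
    """Hash 'blob <byte length>' + NUL + data."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def normalize_crlf(data: bytes) -> bytes:
    """Stand-in for an end-of-line conversion: CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def collapse_ident(data: bytes) -> bytes:
    """Stand-in for the ident attribute on check-in: $Id: ... $ becomes $Id$."""
    return re.sub(rb"\$Id:[^$\n]*\$", b"$Id$", data)


# Assumed file contents, not verified against real cmd output.
test1 = b"this is a test $Id$ \r\n"
test2 = b"this is a test $Id: ffbf88668784c14e809c8c449d799b654d7a5fc5 $ \r\n"

# Unfiltered, the two files hash differently.
print(git_blob_sha1(test1) == git_blob_sha1(test2))

# After the ident-style rewrite, the bytes (and so the hashes) coincide.
print(git_blob_sha1(collapse_ident(test1)) == git_blob_sha1(collapse_ident(test2)))

# Line-ending conversion alone is enough to change a hash.
print(git_blob_sha1(b"a\r\n") == git_blob_sha1(normalize_crlf(b"a\r\n")))
```

Any transformation applied before hashing changes the input bytes, which is why --no-filters is the option to reach for when the goal is to hash the file exactly as it sits on disk.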
