How does Git compute file hashes?

Posted on 2024-12-01 14:56:31

The SHA1 hashes stored in the tree objects (as returned by git ls-tree) do not match the SHA1 hashes of the file content (as returned by sha1sum):

$ git cat-file blob 4716ca912495c805b94a88ef6dc3fb4aff46bf3c | sha1sum
de20247992af0f949ae8df4fa9a37e4a03d7063e  -

How does Git compute file hashes? Does it compress the content before computing the hash?

Comments (7)

空城缀染半城烟沙 2024-12-08 14:56:31

Git prefixes the object with "blob ", followed by the length (as a
human-readable integer), followed by a NUL character, and hashes the result:

$ echo -e 'blob 14\0Hello, World!' | shasum
8ab686eafeb1f44702738c8b0f24f2567c36da6d

Source: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html
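
On the compression part of the question: as far as I understand Git's loose-object format, the header plus content is hashed first and only then zlib-compressed for storage, so the object name does not depend on the compression. A minimal Python sketch of that ordering follows. The function names blob_store_bytes and loose_object are made up for illustration, and the code only mimics the format; it does not read or write a real repository.

```python
import hashlib
import zlib


def blob_store_bytes(content):
    """Return the uncompressed bytes Git hashes for a blob: header + content."""
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def loose_object(content):
    """Return (object name, zlib-compressed payload) for a blob."""
    store = blob_store_bytes(content)
    name = hashlib.sha1(store).hexdigest()  # hash the UNcompressed bytes
    return name, zlib.compress(store)       # compression happens afterwards


name, payload = loose_object(b"Hello, World!\n")
print(name)
# Decompressing the payload gives back exactly the bytes that were hashed.
print(zlib.decompress(payload) == blob_store_bytes(b"Hello, World!\n"))
```

Hashing the bare content without the header gives a different digest, which is why plain sha1sum on the output of git cat-file does not match.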

携余温的黄昏 2024-12-08 14:56:31

I am only expanding on the answer by @Leif Gruenwoldt and detailing what is in the reference that answer provides.

Do It Yourself

  • Step 1. Create an empty text document (name does not matter) in your repository
  • Step 2. Stage and Commit the document
  • Step 3. Identify the hash of the blob by executing git ls-tree HEAD
  • Step 4. Find the blob's hash to be e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
  • Step 5. Snap out of your surprise and read below

How does Git compute its blob hashes

    Blob Hash (SHA1) = SHA1("blob " + <size_of_file> + "\0" + <contents_of_file>)

The text blob⎵ is a constant prefix, and \0 is also constant: it is the NUL character. The <size_of_file> (the content length in bytes, written as a decimal number) and <contents_of_file> vary depending on the file.

See: What is the file format of a git commit object?

And that's all, folks!

But wait! Did you notice that the <filename> is not a parameter used for the hash computation? Two files will have the same hash if their contents are the same, regardless of their names and of the date and time they were created. This is one of the reasons Git handles moves and renames better than other version control systems.

Do It Yourself (Extended)

  • Step 6. Create another empty file with a different filename in the same directory
  • Step 7. Compare the hashes of both your files.

Note:

The link does not mention how the tree object is hashed. I am not certain of the algorithm and parameters; from my observation, however, it probably computes a hash based on all the blobs and trees it contains (probably their hashes).
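
For what it is worth, the tree case can be sketched too. To the best of my knowledge a tree is hashed like a blob, except that the header says "tree" and the body is a sequence of entries, each made of the mode, a space, the name, a NUL byte, and the 20-byte binary ID of the blob or subtree. The helper names below are illustrative, and the sorting is simplified (Git orders directory names as if they ended in "/"), so treat this as a sketch for flat trees rather than a full implementation.

```python
import hashlib


def blob_hash(content):
    """SHA-1 of a blob: header "blob <size>\\0" followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def tree_hash(entries):
    """entries: iterable of (mode, name, hex_object_id) tuples."""
    body = b""
    # Simplification: plain byte-wise sort by name (fine for files only).
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)  # 20 raw bytes, not 40 hex characters
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


empty_blob = blob_hash(b"")
print(tree_hash([]))  # the empty tree
tree_a = tree_hash([("100644", "a.txt", empty_blob)])
tree_b = tree_hash([("100644", "b.txt", empty_blob)])
print(tree_a != tree_b)  # renaming a file changes the tree, not the blob
```

This is also where file names end up: they are part of the tree's body, which is why a rename leaves the blob hash untouched.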

你对谁都笑 2024-12-08 14:56:31

git hash-object

This is a quick way to verify your test method:

s='abc'
# pass the data as an argument, not as the printf format string
printf '%s' "$s" | git hash-object --stdin
printf 'blob %d\0%s' "$(printf '%s' "$s" | wc -c)" "$s" | sha1sum

Output:

f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f
f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f  -

where sha1sum is in GNU Coreutils.
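
The same cross-check can be scripted in Python. This sketch computes the hash by hand and, if a git executable is on the PATH, asks git hash-object --stdin for its answer as well. The function names are invented for the example, and returning None when git is missing or fails is just one possible way to handle that case.

```python
import hashlib
import shutil
import subprocess


def manual_blob_hash(data):
    """SHA-1 over "blob <size>\\0" + data, computed without git."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def git_cli_blob_hash(data):
    """Ask `git hash-object --stdin`; return None if git is unavailable."""
    if shutil.which("git") is None:
        return None
    try:
        result = subprocess.run(
            ["git", "hash-object", "--stdin"],
            input=data, capture_output=True, check=True,
        )
    except subprocess.CalledProcessError:
        return None
    return result.stdout.decode("ascii").strip()


data = b"abc"
manual = manual_blob_hash(data)
from_git = git_cli_blob_hash(data)
print(manual)
print("git not available" if from_git is None else from_git == manual)
```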

Then it comes down to understanding the format of each object type. We have already covered the trivial blob, here are the others:

梦里泪两行 2024-12-08 14:56:31

I needed this for some unit tests in Python 3 so thought I'd leave it here.

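import hashlib

def git_blob_hash(data):
    """Return the SHA-1 that Git assigns to a blob with this content."""
    if isinstance(data, str):
        data = data.encode()  # str is encoded as UTF-8 before hashing
    # Git hashes the header "blob <size in bytes>\0" followed by the content
    data = b'blob ' + str(len(data)).encode() + b'\0' + data
    h = hashlib.sha1()
    h.update(data)
    return h.hexdigest()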

I stick to \n line endings everywhere, but in some circumstances Git may also change your line endings before calculating this hash, so you may need a .replace('\r\n', '\n') in there too.
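
To make the line-ending point concrete, here is a small sketch that hashes the same text with CRLF and with LF endings. The helper git_blob_hash_lf is a hypothetical name, and replacing every CRLF is only an approximation of what Git's line-ending conversion does when it is enabled.

```python
import hashlib


def git_blob_hash(data):
    """SHA-1 over "blob <size>\\0" + data for a bytes object."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def git_blob_hash_lf(data):
    """Hash after converting CRLF to LF (the size is taken after conversion)."""
    return git_blob_hash(data.replace(b"\r\n", b"\n"))


crlf = b"Hello, World!\r\n"
lf = b"Hello, World!\n"
print(git_blob_hash(crlf) == git_blob_hash(lf))     # raw bytes differ, so the hashes differ
print(git_blob_hash_lf(crlf) == git_blob_hash(lf))  # equal once line endings are normalized
```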

自由如风 2024-12-08 14:56:31

Based on Leif Gruenwoldt's answer, here is a shell function that can stand in for git hash-object:

git-hash-object () { # substitute when the `git` command is not available
    local type=blob
    [ "$1" = "-t" ] && shift && type=$1 && shift
    # depending on eol/autocrlf settings, you may want to substitute CRLFs by LFs
    # by using `perl -pe 's/\r$//'` instead of `cat` in the next 2 commands
    local size
    size=$(cat "$1" | wc -c | tr -d ' ')  # strip the padding some `wc` versions add
    # printf is more portable than `echo -en` for emitting the NUL byte
    ( printf '%s %s\0' "$type" "$size"; cat "$1" ) | sha1sum | sed 's/ .*$//'
}

Test:

$ echo 'Hello, World!' > test.txt
$ git hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
$ git-hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
云之铃。 2024-12-08 14:56:31

This is a Python 3 version that computes the hash of a binary file (the example above was written with text in mind).

Note that the code is a snippet for inspiration, not a complete script. It is wrapped here in a def whose name, download_if_changed, is only a placeholder: it downloads a file again only when the local copy's blob hash differs from the expected one.

import hashlib
import os

import requests  # third-party: pip install requests


def download_if_changed(targetFile, sha, fullFile):
    """Fetch fullFile (a URL) into targetFile unless the local copy already has blob hash sha."""
    if os.path.exists(targetFile):
        with open(targetFile, 'rb') as existing:
            content = existing.read()
        # Git hashes "blob <size>\0" followed by the raw bytes
        header = f"blob {len(content)}\0".encode('utf-8')
        if hashlib.sha1(header + content).hexdigest() == sha:
            return False  # local copy is current, nothing to download
    with requests.get(fullFile) as response2:
        response2.raise_for_status()
        with open(targetFile, 'wb') as newfile:  # 'wb' truncates any stale content
            newfile.write(response2.content)
    return True
笑梦风尘 2024-12-08 14:56:31

Git 2.45 (Q2 2024), batch 10, now offers official documentation on this.

See commit 28636d7 (12 Mar 2024) by Dirk Gouders (dgouders-whs).
(Merged by Junio C Hamano -- gitster -- in commit 509a047, 21 Mar 2024)

Documentation/user-manual.txt: example for generating object hashes

Signed-off-by: Dirk Gouders

Add a simple example on how object hashes can be generated manually.

Further, because the document suggests to have a look at the initial commit, clarify that some details changed since that time.

user-manual now includes in its man page:

for 'file' (the earliest versions of Git hashed slightly differently
but the conclusion is still the same).

The following is a short example that demonstrates how these hashes
can be generated manually:

Let's assume a small text file with some simple content:

$ echo "Hello world" >hello.txt

We can now manually generate the hash Git would use for this file:

  • The object we want the hash for is of type "blob" and its size is
    12 bytes.

  • Prepend the object header to the file content and feed this to
    sha1sum:

$ { printf "blob 12\0"; cat hello.txt; } | sha1sum
802992c4220de19a90767f3000a79a31b98d0df7  -

That manually constructed hash can be verified using git hash-object
which of course hides the addition of the header:

$ git hash-object hello.txt
802992c4220de19a90767f3000a79a31b98d0df7
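
The manual's shell example translates directly to Python. The sketch below feeds the header and then the file to SHA-1 in chunks, so a large file never has to be held in memory at once. The function name git_blob_hash_file and the chunk size are arbitrary choices for the example, not anything taken from Git.

```python
import hashlib
import os
import tempfile


def git_blob_hash_file(path, chunk_size=65536):
    """Hash a file as a Git blob, reading it in fixed-size chunks."""
    size = os.path.getsize(path)
    h = hashlib.sha1()
    h.update(b"blob " + str(size).encode("ascii") + b"\0")  # the object header
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    with open(path, "wb") as f:
        f.write(b"Hello world\n")  # the same 12 bytes as `echo "Hello world"`
    digest = git_blob_hash_file(path)

print(digest)
```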