Combining MD5 hash values
When calculating a single MD5 checksum on a large file, what technique is generally used to combine the various MD5 values into a single value? Do you just add them together? I'm not really interested in any particular language, library or API which will do this; rather I'm just interested in the technique behind it. Can someone explain how it is done?
Given the following algorithm in pseudo-code:
    MD5Digest X
    for each file segment F
        MD5Digest Y = CalculateMD5(F)
        Combine(X, Y)
But what exactly would Combine do? Does it add the two MD5 digests together, or what?
7 Answers
With that in mind, you don't want to "combine" two MD5 hashes. With any MD5 implementation, you have an object that keeps the current checksum state. So you can extract the MD5 checksum at any time, which is very handy when hashing two files that share the same beginning. For big files, you just keep feeding in data - there's no difference whether you hash the file at once or in blocks, as the state is remembered. In both cases you will get the same hash.
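For instance, a minimal Java sketch of that property (the input data and the split point are arbitrary):

    import java.security.MessageDigest;
    import java.util.Arrays;

    public class SameHash {
        public static void main(String[] args) throws Exception {
            byte[] data = "pretend this is a large file".getBytes("UTF-8");

            // Hash everything in one call...
            byte[] atOnce = MessageDigest.getInstance("MD5").digest(data);

            // ...or feed the same bytes in two chunks; the state object remembers.
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, 0, 10);
            md.update(data, 10, data.length - 10);
            byte[] inChunks = md.digest();

            System.out.println(Arrays.equals(atOnce, inChunks)); // prints true
        }
    }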
MD5 is an iterative algorithm. You don't need to calculate a ton of small MD5s and then combine them somehow. You just read small chunks of the file and add them to the digest as you're going, so you never have to have the entire file in memory at once. Here's a Java implementation.
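A minimal sketch of what such an implementation might look like, using java.security.MessageDigest (the file name and buffer size are placeholders):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.security.MessageDigest;

    public class FileMd5 {
        public static void main(String[] args) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            try (InputStream in = new FileInputStream("bigfile.bin")) { // placeholder path
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n); // feed each chunk into the running digest
                }
            }
            byte[] digest = md.digest(); // finalize once all chunks are in
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(hex);
        }
    }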
Et voilà. You have the MD5 of an entire file without ever having the whole file in memory at once.
It's worth noting that if for some reason you do want MD5 hashes of subsections of the file as you go along (this is sometimes useful for doing interim checks on a large file being transferred over a low-bandwidth connection), then you can get them by cloning the digest object at any time, like so:
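A sketch of that trick, assuming the provider's MessageDigest supports clone() (the default JDK MD5 implementation does):

    import java.math.BigInteger;
    import java.security.MessageDigest;

    public class InterimMd5 {
        public static void main(String[] args) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update("first segment".getBytes("UTF-8")); // data received so far

            // Clone the running state; finalizing the clone leaves the original intact.
            MessageDigest snapshot = (MessageDigest) md.clone();
            byte[] interim = snapshot.digest(); // MD5 of the data fed in so far
            System.out.printf("interim: %032x%n", new BigInteger(1, interim));

            md.update("second segment".getBytes("UTF-8")); // keep feeding the original
            byte[] full = md.digest(); // MD5 of the whole stream
            System.out.printf("full:    %032x%n", new BigInteger(1, full));
        }
    }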
This does not affect the actual digest object, so you can continue to work with the overall MD5 hash.
It's also worth noting that MD5 is an outdated hash for cryptographic purposes (such as verifying file authenticity from an untrusted source) and should be replaced with something better in most circumstances, such as SHA-1. For non-cryptographic purposes, such as verifying file integrity between two trusted sources, MD5 is still adequate.
A Python 2.7 example for AndiDog's answer. File 123.txt has multiple lines.
For large files that can't fit in memory, the data can be read line by line or chunk by chunk. One use of this MD5 is comparing two large files when the diff command fails.
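A minimal sketch along those lines, using hashlib (the file name 123.txt comes from the answer; the code runs under Python 2.7 as well as 3):

    import hashlib

    md5 = hashlib.md5()
    with open('123.txt', 'rb') as f:
        for line in f:            # read line by line instead of all at once
            md5.update(line)
    print(md5.hexdigest())

    # For binary data, fixed-size chunks work just as well:
    # for chunk in iter(lambda: f.read(8192), b''):
    #     md5.update(chunk)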
The OpenSSL library allows you to add blocks of data to an ongoing hash (SHA-1/MD5), and when you have finished adding all the data you call the Final method, which will output the final hash. You don't calculate MD5 on each individual block and then add it; rather, you add the data to the ongoing hash method from the OpenSSL library. This will then give you an MD5 hash of all the individual data blocks, with no limit on the input data size.
http://www.openssl.org/docs/crypto/md5.html#
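A minimal sketch of that pattern with OpenSSL's MD5_Init/MD5_Update/MD5_Final API (the file name is a placeholder; these functions are deprecated since OpenSSL 3.0 in favor of the EVP interface):

    #include <stdio.h>
    #include <openssl/md5.h>

    int main(void) {
        MD5_CTX ctx;
        MD5_Init(&ctx);                       /* start the ongoing hash */

        FILE *f = fopen("bigfile.bin", "rb"); /* placeholder file name */
        if (!f) return 1;

        unsigned char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            MD5_Update(&ctx, buf, n);         /* add each block of data */
        fclose(f);

        unsigned char digest[MD5_DIGEST_LENGTH];
        MD5_Final(digest, &ctx);              /* output the final hash */

        for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
            printf("%02x", digest[i]);
        putchar('\n');
        return 0;
    }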
This question doesn't make much sense, as the MD5 algorithm takes input of any length. A decent library should have functions so that you don't have to add the entire message at a single time: the message is broken down into blocks and hashed sequentially, with each block's processing depending only on the resulting hash from the previous round.
The pseudocode in the Wikipedia article should give an overview of how the algorithm works.
Most digest calculation implementations allow you to feed them the data in smaller blocks. You can't combine multiple MD5 digests in a way that makes the result equal to the MD5 of the entire input. MD5 does some padding and uses the number of processed bytes in the final stage, which makes the original engine state unrecoverable from the final digest value.
Here is a C# way to combine hashes. Let's make extension methods to simplify the user code.
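A plausible reconstruction using System.Security.Cryptography (the method names AppendBlock and FinishWith are assumptions; the real work is done by TransformBlock and TransformFinalBlock):

    using System.Security.Cryptography;

    public static class Md5Extensions
    {
        // Feed an intermediate chunk into the ongoing hash state.
        public static void AppendBlock(this MD5 md5, byte[] chunk)
        {
            md5.TransformBlock(chunk, 0, chunk.Length, null, 0);
        }

        // Feed the final chunk, finish the computation, and return the digest.
        public static byte[] FinishWith(this MD5 md5, byte[] lastChunk)
        {
            md5.TransformFinalBlock(lastChunk, 0, lastChunk.Length);
            return md5.Hash;
        }
    }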
Usage:
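(A hypothetical example; the data and the two-chunk split are arbitrary.)

    using System;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    class Demo
    {
        static void Main()
        {
            byte[] part1 = Encoding.UTF8.GetBytes("hello ");
            byte[] part2 = Encoding.UTF8.GetBytes("world");

            // h1: hash the whole message in one call
            byte[] h1;
            using (var md5 = MD5.Create())
                h1 = md5.ComputeHash(part1.Concat(part2).ToArray());

            // h2: feed the same message in two chunks through the ongoing state
            byte[] h2;
            using (var md5 = MD5.Create())
            {
                md5.AppendBlock(part1);
                h2 = md5.FinishWith(part2);
            }

            Console.WriteLine(h1.SequenceEqual(h2)); // True
        }
    }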
h1 and h2 are the same. That's it.