Can an MD5 (or other) hash be calculated using buffered reads?

Posted 2024-08-18 13:02:09

I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:

    private byte[] calcHash(string file)
    {
        // "using" disposes the hash object and closes the file even if hashing throws
        using (System.Security.Cryptography.HashAlgorithm ha = System.Security.Cryptography.MD5.Create())
        using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
        {
            return ha.ComputeHash(fs);
        }
    }

However, the files are normally written just beforehand in a buffered manner (say, 32 MB at a time). I'm convinced I saw an overload of a hash function that let me calculate an MD5 (or other) hash at the same time as writing, i.e., calculate the hash of one buffer, then feed that resulting hash into the next iteration.

Something like this: (pseudocode-ish)

byte [] hash = new byte [] { 0,0,0,0,0,0,0,0 };
while(!eof)
{
   buffer = readFromSourceFile();
   writefile(buffer);
   hash = calchash(buffer, hash);
}

hash should now be the same as what running the calcHash function on the entire file would produce.

Now, I can't find any overloads like that in the .NET 3.5 Framework. Am I dreaming? Has it never existed, or am I just lousy at searching? The reason for writing and calculating the checksum in one pass is that, with files this large, reading them a second time is expensive.

Comments (5)

娇妻 2024-08-25 13:02:09

You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.

// Init
MD5 md5 = MD5.Create();
int offset = 0;

// For each block:
offset += md5.TransformBlock(block, 0, block.Length, block, 0);

// For last block:
md5.TransformFinalBlock(block, 0, block.Length);

// Get the hash
byte[] hash = md5.Hash;

Note: it also works (at least with the MD5 provider) to pass every block to TransformBlock and then finalize the process by passing an empty block to TransformFinalBlock.
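Putting the pieces above together, here is a minimal, self-contained sketch of the write-and-hash loop from the question: it copies one stream to another in fixed-size chunks and feeds each chunk to TransformBlock as it goes. `CopyAndHash` and the chunk sizes are illustrative choices, not part of the answer; for real files you would pass FileStreams and a chunk size such as 32 MB.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: copies `source` to `destination` in chunks of `chunkSize`
// bytes and hashes each chunk as it is written, so the data is read only once.
byte[] CopyAndHash(Stream source, Stream destination, int chunkSize)
{
    using (MD5 md5 = MD5.Create())
    {
        byte[] buffer = new byte[chunkSize];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            // Hash only the bytes actually read; a null output buffer skips the copy.
            md5.TransformBlock(buffer, 0, read, null, 0);
        }
        // An empty final block finalizes the hash.
        md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        return md5.Hash;
    }
}

// Usage: 1000 pseudo-random bytes copied in 64-byte chunks.
byte[] data = new byte[1000];
new Random(42).NextBytes(data);
var output = new MemoryStream();
byte[] chunked = CopyAndHash(new MemoryStream(data), output, 64);

// Compare against hashing the whole buffer in one call.
byte[] reference;
using (MD5 hasher = MD5.Create()) { reference = hasher.ComputeHash(data); }
Console.WriteLine(chunked.SequenceEqual(reference)); // True
```

Passing `null` as the output buffer is allowed and avoids copying the input a second time; the `offset` bookkeeping in the snippet above is not needed just to compute the hash.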

随波逐流 2024-08-25 13:02:09

I like the answer above, but for completeness, and as a more general solution, take a look at the CryptoStream class. If you are already handling streams, it is easy to wrap your stream in a CryptoStream, passing a HashAlgorithm as the ICryptoTransform parameter.

using (var file = new FileStream("foo.txt", FileMode.Create, FileAccess.Write))
using (var md5 = MD5.Create())
using (var cs = new CryptoStream(file, md5, CryptoStreamMode.Write))
{
    while (notDoneYet)
    {
        byte[] buffer = Get32MB();
        cs.Write(buffer, 0, buffer.Length);
    }
    cs.FlushFinalBlock(); // finalize the hash before reading it
    System.Console.WriteLine(BitConverter.ToString(md5.Hash));
}

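You have to finalize the hash before reading it, either by closing the CryptoStream or by calling FlushFinalBlock(). Until then the HashAlgorithm doesn't know the data is complete, and reading md5.Hash throws a CryptographicUnexpectedOperationException.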
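A runnable sketch of the CryptoStream approach, assuming the caller owns the destination stream (here a MemoryStream stands in for the output file). `WriteAndHash` is a hypothetical helper name. The hash transform passes the data through unchanged, so the destination receives exactly the bytes written.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: writes each chunk to `destination` through a CryptoStream,
// so the MD5 is computed as a side effect of writing. Returns the finished hash.
byte[] WriteAndHash(Stream destination, IEnumerable<byte[]> chunks)
{
    using (MD5 md5 = MD5.Create())
    {
        // Not disposed on purpose: disposing a CryptoStream also closes `destination`.
        var cs = new CryptoStream(destination, md5, CryptoStreamMode.Write);
        foreach (byte[] chunk in chunks)
        {
            cs.Write(chunk, 0, chunk.Length);
        }
        // Runs TransformFinalBlock on the hash; md5.Hash is unavailable before this.
        cs.FlushFinalBlock();
        return md5.Hash;
    }
}

// Usage: two chunks written to an in-memory "file".
var output = new MemoryStream();
byte[] hash = WriteAndHash(output, new List<byte[]> { new byte[] { 1, 2, 3 }, new byte[] { 4, 5 } });
Console.WriteLine(BitConverter.ToString(hash));
```

Leaving the CryptoStream undisposed inside the helper keeps ownership of the destination with the caller; if you own both, disposing the CryptoStream (as in the answer) is simpler.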

倦话 2024-08-25 13:02:09

It seems you can use TransformBlock / TransformFinalBlock, as shown in this sample: Displaying progress updates when hashing large files
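The linked sample is not reproduced here. The following is only a minimal sketch of the same idea under my own assumptions: hash a stream chunk by chunk with TransformBlock and invoke a callback with the running byte count after each chunk. `HashWithProgress` and its parameters are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes `input` chunk by chunk and reports the number of
// bytes processed so far after each chunk (e.g. to drive a progress bar).
byte[] HashWithProgress(Stream input, int chunkSize, Action<long> onProgress)
{
    using (MD5 md5 = MD5.Create())
    {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            md5.TransformBlock(buffer, 0, read, null, 0);
            total += read;
            onProgress(total);
        }
        md5.TransformFinalBlock(buffer, 0, 0); // empty final block finalizes
        return md5.Hash;
    }
}

// Usage: 10000 bytes in 4096-byte chunks, percentage computed from the stream length.
var data = new MemoryStream(new byte[10000]);
byte[] hash = HashWithProgress(data, 4096,
    done => Console.WriteLine($"{done * 100 / data.Length}%")); // prints 40%, 81%, 100%
Console.WriteLine(BitConverter.ToString(hash));
```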

放赐 2024-08-25 13:02:09

Hash algorithms are expected to handle this situation and are typically implemented with 3 functions:

hash_init() - Called to allocate resources and begin the hash.
hash_update() - Called with new data as it arrives.
hash_final() - Complete the calculation and free resources.

Look at http://www.openssl.org/docs/crypto/md5.html or http://www.openssl.org/docs/crypto/sha.html for good, standard examples in C; I'm sure there are similar libraries for your platform.
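In .NET, the closest match to this init/update/final pattern is `System.Security.Cryptography.IncrementalHash`: `CreateHash` (init), `AppendData` (update) and `GetHashAndReset` (final), with `Dispose` freeing the resources. It is not available in .NET 3.5, which the question targets; as far as I know it requires .NET Core or .NET Framework 4.7 or later. A minimal sketch, where `Md5HexInChunks` is a hypothetical helper:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: hashes the ASCII bytes of each chunk in order and
// returns the MD5 as lowercase hex.
string Md5HexInChunks(params string[] chunks)
{
    using (IncrementalHash hasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5)) // init
    {
        foreach (string chunk in chunks)
        {
            hasher.AppendData(Encoding.ASCII.GetBytes(chunk)); // update, once per chunk
        }
        byte[] digest = hasher.GetHashAndReset(); // final; the object could be reused
        return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
    } // Dispose frees the underlying resources
}

Console.WriteLine(Md5HexInChunks("a", "bc")); // 900150983cd24fb0d6963f7d28e17f72 (MD5 of "abc")
```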

青巷忧颜 2024-08-25 13:02:09

I've just had to do something similar, but wanted to read the file asynchronously. It's using TransformBlock and TransformFinalBlock and is giving me answers consistent with Azure, so I think it is correct!

using System;
using System.Buffers;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

private static async Task<string> CalculateMD5Async(string fullFileName)
{
    var block = ArrayPool<byte>.Shared.Rent(8192);
    try
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = new FileStream(fullFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8192, true))
            {
                int length;
                while ((length = await stream.ReadAsync(block, 0, block.Length).ConfigureAwait(false)) > 0)
                {
                    md5.TransformBlock(block, 0, length, null, 0);
                }
                md5.TransformFinalBlock(block, 0, 0);
            }
            var hash = md5.Hash;
            return Convert.ToBase64String(hash);
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(block);
    }
}