Can a buffered read be used to calculate an MD5 (or other) hash?

Posted 2024-08-18 13:02:09

I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:

    // Requires: using System.IO; using System.Security.Cryptography;
    private byte[] calcHash(string file)
    {
        // The using blocks dispose the hash object and close the file even if an exception is thrown.
        using (HashAlgorithm ha = MD5.Create())
        using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
        {
            return ha.ComputeHash(fs);
        }
    }

However, the files are normally written just beforehand in a buffered manner (say, 32 MB at a time). I'm quite sure I once saw an overload of a hash function that let me calculate an MD5 (or other) hash at the same time as writing, i.e. calculate the hash of one buffer, then feed the resulting hash into the next iteration.

Something like this (pseudocode):

    byte[] hash = new byte[] { 0, 0, 0, 0, 0, 0, 0, 0 };
    while (!eof)
    {
        buffer = readFromSourceFile();
        writefile(buffer);
        hash = calchash(buffer, hash);
    }

hash would now be the same as the result of running the calcHash function on the entire file.
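The pseudocode mixes up two different ideas, and it helps to separate them. Feeding the previous digest back in (hashing previousHash + buffer) is a different construction and does not reproduce the whole-file hash. Buffered hashing works because the hash object keeps its internal state between calls. The minimal C# sketch below compares the two approaches. It assumes a .NET 5+ console program with top-level statements, uses an in-memory string instead of a file, and uses a tiny chunk size chosen only for illustration:

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] data = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog");
const int chunkSize = 4; // illustrative chunk size, deliberately tiny

// Reference: hash of the whole input in one call.
byte[] oneShot;
using (var md5 = MD5.Create())
{
    oneShot = md5.ComputeHash(data);
}

// Stateful incremental hashing: one object fed chunk by chunk.
byte[] incremental;
using (var md5 = MD5.Create())
{
    for (int i = 0; i < data.Length; i += chunkSize)
    {
        int count = Math.Min(chunkSize, data.Length - i);
        md5.TransformBlock(data, i, count, null, 0);
    }
    md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    incremental = md5.Hash;
}

// Naive chaining as in the pseudocode: hash = MD5(previousHash + buffer).
byte[] chained = Array.Empty<byte>();
using (var md5 = MD5.Create())
{
    for (int i = 0; i < data.Length; i += chunkSize)
    {
        int count = Math.Min(chunkSize, data.Length - i);
        chained = md5.ComputeHash(chained.Concat(data.Skip(i).Take(count)).ToArray());
    }
}

Console.WriteLine(incremental.SequenceEqual(oneShot)); // True
Console.WriteLine(chained.SequenceEqual(oneShot));     // False
```

Only the stateful version matches. The chained digest is a well-defined value, but it is not the MD5 of the file.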

Now, I can't find any overloads like that in the .NET 3.5 Framework. Am I dreaming? Has it never existed, or am I just lousy at searching? I want to write and calculate the checksum in one pass because the files are so large that reading each one back a second time just to hash it would be wasteful.

娇妻 2024-08-25 13:02:09

You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.

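    // Requires: using System.Security.Cryptography;
    // Init
    MD5 md5 = MD5.Create();
    int offset = 0;

    // For each block (TransformBlock returns the number of bytes it processed;
    // the output buffer may also be null if you don't need a copy of the input):
    offset += md5.TransformBlock(block, 0, block.Length, block, 0);

    // For the last block:
    md5.TransformFinalBlock(block, 0, block.Length);

    // Get the hash
    byte[] hash = md5.Hash;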

Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
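Here is a minimal sketch of the pattern in the note, written as a .NET 5+ console program with top-level statements. Every chunk goes through TransformBlock, and an empty array finalises the hash. HashInChunks and the 4096-byte buffer are illustrative choices. The sketch also runs with SHA256, which shows that these calls belong to the HashAlgorithm base class rather than to MD5 alone:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: hashes a stream chunk by chunk with any HashAlgorithm.
static byte[] HashInChunks(Stream input, HashAlgorithm algorithm, int bufferSize)
{
    var buffer = new byte[bufferSize];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Hash only the bytes actually read; null means no output copy is needed.
        algorithm.TransformBlock(buffer, 0, read, null, 0);
    }
    // Finalise with an empty block.
    algorithm.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    return algorithm.Hash;
}

var data = new byte[100_000];
new Random(42).NextBytes(data);

using var md5 = MD5.Create();
using var sha256 = SHA256.Create();
byte[] md5Chunked = HashInChunks(new MemoryStream(data), md5, 4096);
byte[] sha256Chunked = HashInChunks(new MemoryStream(data), sha256, 4096);

// Compare against one-shot hashes from separate instances.
using var md5Ref = MD5.Create();
using var shaRef = SHA256.Create();
Console.WriteLine(md5Chunked.SequenceEqual(md5Ref.ComputeHash(data)));    // True
Console.WriteLine(sha256Chunked.SequenceEqual(shaRef.ComputeHash(data))); // True
```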

随波逐流 2024-08-25 13:02:09

I like the answer above, but for completeness, and as a more general solution, have a look at the CryptoStream class. If you are already working with streams, it is easy to wrap your stream in a CryptoStream, passing a HashAlgorithm as the ICryptoTransform parameter.

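    // Requires: using System; using System.IO; using System.Security.Cryptography;
    using (var md5 = MD5.Create())
    {
        // FileMode.Create: the output file is written from scratch.
        using (var file = new FileStream("foo.txt", FileMode.Create, FileAccess.Write))
        using (var cs = new CryptoStream(file, md5, CryptoStreamMode.Write))
        {
            while (notDoneYet)                      // placeholder loop condition
            {
                byte[] buffer = Get32MB();          // placeholder for your buffered source
                cs.Write(buffer, 0, buffer.Length); // writes to the file and updates the hash
            }
            cs.FlushFinalBlock();                   // finalises the hash
        }
        Console.WriteLine(BitConverter.ToString(md5.Hash));
    }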

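You have to finalise the CryptoStream (call FlushFinalBlock, or Close/Dispose it) before reading md5.Hash. Until then the HashAlgorithm doesn't know the input is complete, and the Hash property throws.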
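Below is a runnable sketch of the same idea, assuming a .NET 5+ console program with top-level statements. A MemoryStream stands in for the output file, and 1 MB chunks stand in for the 32 MB buffers. The sketch checks that the bytes pass through to the destination unchanged and that the streamed hash matches a one-shot ComputeHash:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Stand-ins (assumptions): in-memory source data, 1 MB chunks instead of 32 MB,
// and a MemoryStream instead of the output FileStream.
var source = new byte[3 * 1024 * 1024 + 123];
new Random(1).NextBytes(source);
const int chunkSize = 1024 * 1024;
var destination = new MemoryStream();

using var md5 = MD5.Create();
byte[] streamedHash;
using (var cs = new CryptoStream(destination, md5, CryptoStreamMode.Write))
{
    for (int offset = 0; offset < source.Length; offset += chunkSize)
    {
        int count = Math.Min(chunkSize, source.Length - offset);
        // Each write goes through the hash transform and on to `destination`.
        cs.Write(source, offset, count);
    }
    // Finalise the hash; reading md5.Hash before this call would throw here.
    cs.FlushFinalBlock();
    streamedHash = md5.Hash;
}

// MemoryStream.ToArray still works after the CryptoStream has closed `destination`.
byte[] written = destination.ToArray();

using var reference = MD5.Create();
byte[] expected = reference.ComputeHash(source);

Console.WriteLine(written.SequenceEqual(source));        // True
Console.WriteLine(streamedHash.SequenceEqual(expected)); // True
```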

倦话 2024-08-25 13:02:09

It seems you can use TransformBlock / TransformFinalBlock, as shown in this example: Displaying progress updates when hashing large files
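The linked example isn't reproduced here, so what follows is an independent sketch of the idea, not that code. Because TransformBlock is called once per chunk, progress can be reported between calls. HashWithProgress, its Action<long, long> callback and the buffer sizes are hypothetical choices. The sketch assumes a seekable stream (so Length is known) and a .NET 5+ console program with top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: hashes `input` in chunks, calling progress(bytesDone, totalBytes) after each one.
static byte[] HashWithProgress(Stream input, Action<long, long> progress, int bufferSize = 81920)
{
    using var md5 = MD5.Create();
    var buffer = new byte[bufferSize];
    long total = input.Length; // requires a seekable stream
    long done = 0;
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        md5.TransformBlock(buffer, 0, read, null, 0);
        done += read;
        progress(done, total);
    }
    md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    return md5.Hash;
}

var data = new byte[250_000];
new Random(7).NextBytes(data);
var reports = new List<long>();

byte[] hash = HashWithProgress(new MemoryStream(data),
    (done, total) => { reports.Add(done); Console.WriteLine($"{100 * done / total}%"); },
    bufferSize: 100_000);

using var check = MD5.Create();
Console.WriteLine(hash.SequenceEqual(check.ComputeHash(data))); // True
```

With 250,000 bytes and 100,000-byte reads, the callback fires three times and prints 40%, 80% and 100%.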

放赐 2024-08-25 13:02:09

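Hash algorithms are expected to handle this situation, and are usually implemented as three functions:

hash_init() - call this to allocate resources and start the hash.
hash_update() - call this whenever new data arrives.
hash_final() - finish the calculation and free the resources.

See http://www.openssl.org/docs/crypto/md5.html or http://www.openssl.org/docs/crypto/sha.html for good standard C examples; I'm sure your platform has similar libraries.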
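For comparison, newer .NET versions provide System.Security.Cryptography.IncrementalHash, which has exactly this init/update/final shape. It is available in .NET Framework 4.7 and later and in .NET Core, but not in the asker's .NET 3.5. The minimal sketch below is written as a console program with top-level statements (C# 9+), and the three input strings are made-up chunks:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// hash_init(): create the incremental hash object (Dispose, via `using`, frees its resources).
using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5);

// hash_update(): append each piece of data as it arrives.
foreach (var piece in new[] { "The quick brown fox ", "jumps over ", "the lazy dog" })
{
    hasher.AppendData(Encoding.ASCII.GetBytes(piece));
}

// hash_final(): produce the digest; the object is reset and could be reused.
byte[] digest = hasher.GetHashAndReset();
Console.WriteLine(BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant());
// prints 9e107d9d372bb6826bd81d3542a419d6
```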

青巷忧颜 2024-08-25 13:02:09

I've just had to do something similar, but wanted to read the file asynchronously. It's using TransformBlock and TransformFinalBlock and is giving me answers consistent with Azure, so I think it is correct!

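    // Requires: using System; using System.Buffers; using System.IO;
    //           using System.Security.Cryptography; using System.Threading.Tasks;
    private static async Task<string> CalculateMD5Async(string fullFileName)
    {
        // Rent a reusable buffer (at least 8192 bytes) instead of allocating one per call.
        var block = ArrayPool<byte>.Shared.Rent(8192);
        try
        {
            using (var md5 = MD5.Create())
            {
                // The final argument (useAsync: true) opens the file for asynchronous I/O.
                using (var stream = new FileStream(fullFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8192, true))
                {
                    int length;
                    while ((length = await stream.ReadAsync(block, 0, block.Length).ConfigureAwait(false)) > 0)
                    {
                        // Hash only the bytes actually read; no output copy is needed.
                        md5.TransformBlock(block, 0, length, null, 0);
                    }
                    // An empty final block finalises the hash.
                    md5.TransformFinalBlock(block, 0, 0);
                }
                var hash = md5.Hash;
                // Base64-encode the 16-byte digest.
                return Convert.ToBase64String(hash);
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(block);
        }
    }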
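If you can target .NET 5 or later, HashAlgorithm.ComputeHashAsync(Stream) does the same chunked asynchronous read for you; the original answer doesn't say which runtime it uses, so this is an assumption. The cross-check sketch below is a console program with top-level statements. It compares a TransformBlock loop in the style of the method above, generalised to any Stream under the hypothetical name ManualMd5Async, with the built-in call on in-memory data:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical helper: the same TransformBlock loop as above, but over any Stream.
static async Task<string> ManualMd5Async(Stream stream)
{
    using var md5 = MD5.Create();
    var block = new byte[8192];
    int length;
    while ((length = await stream.ReadAsync(block, 0, block.Length)) > 0)
    {
        md5.TransformBlock(block, 0, length, null, 0);
    }
    md5.TransformFinalBlock(block, 0, 0);
    return Convert.ToBase64String(md5.Hash);
}

var data = new byte[1_000_000];
new Random(3).NextBytes(data);

string manual = await ManualMd5Async(new MemoryStream(data));

// Built-in equivalent (.NET 5+): reads the stream asynchronously and returns the digest.
using var md5Builtin = MD5.Create();
string builtin = Convert.ToBase64String(await md5Builtin.ComputeHashAsync(new MemoryStream(data)));

Console.WriteLine(manual == builtin); // True
```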
