C# high-speed MD5/SHA hashing over a network

Posted 2024-10-13 05:15:28

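In a C# project that I am currently working on, we are trying to calculate the MD5 of a large number of files over a network (our current set is 2.7 million files; a client's set may exceed 10 million). With that many files to process, speed is an issue.

The reason we do this is to verify that each file was copied to a different location without modification.

We currently use the following code to calculate the MD5 of a file: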

MD5 md5 = new MD5CryptoServiceProvider();
StringBuilder sb = new StringBuilder();

byte[] hashMD5 = null;

try
{
   // Open stream to file to get MD5 hash for, create hash
   using (FileStream fsMD5 = new FileStream(sFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
      hashMD5 = md5.ComputeHash(fsMD5);
}
catch (Exception ex)
{
   clsLogging.logError(clsLogging.ErrorLevel.ERROR, ex);
}

string md5sum = "";
if (hashMD5 != null)
{
   // Change hash into readable text
   foreach (byte hex in hashMD5)
      sb.Append(hex.ToString("x2"));
   md5sum = sb.ToString();
}

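However, the speed of this isn't what my manager has been hoping for. We have gone through a number of changes to how, and for how many files, we calculate the MD5. For example, we did not hash files that we don't copy, until today, when my manager changed his mind: now ALL files must have an MD5 calculated for them, in case a client someday wants to mess with our program so that all files get copied, I guess.

I realize that the speed of the network (100 Mbit/s) is probably a major contributing factor. Is there an efficient way to calculate the MD5 of the contents of a file over a network?

Thanks in advance.
Trevor Watson

Edit: put all code in block instead of just a part of it.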



十六岁半 2024-10-20 05:15:28

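The bottleneck is that the whole file must be streamed (copied) over the network, and your code looks fine.
Switching hash functions will not help either: MD5, SHA-256, and SHA-512 take almost the same time to compute.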
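
To put a rough number on that bottleneck: a 100 Mbit/s link moves at most 12.5 MB/s, so the hashing time is bounded below by the transfer time no matter how fast the hash is. The question does not give the total data size, so the 1 TB figure below is purely a hypothetical illustration, and real throughput will be lower than the link rate.

```latex
t \;\ge\; \frac{8S}{R}, \qquad
S = 10^{12}\ \text{bytes (hypothetical)},\quad R = 10^{8}\ \text{bit/s}
\;\Longrightarrow\;
t \;\ge\; \frac{8 \times 10^{12}}{10^{8}}\ \text{s} = 8 \times 10^{4}\ \text{s} \approx 22.2\ \text{hours}
```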

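Two possible solutions to this problem:

1) Run a hasher on the remote system and store the hashes in separate files, if that is possible in your environment.

2) Create a part-wise hash of the file, so that you only copy a part of the file over the network.
I mean something like this: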

part1Hash = md5(file.getXXXBytesFromFileAtPosition1)
part2Hash = md5(file.getXXXBytesFromFileAtPosition2)
part3Hash = md5(file.getXXXBytesFromFileAtPosition3)
finalHash = part1Hash ^ part2Hash ^ part3Hash;

You have to test which parts of the file are best to read so that the hashes stay unique.
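
The pseudocode above could be sketched in C# roughly as follows. This is only an illustration under assumptions: the class name `PartialHasher`, the method `ComputePartialHash`, and the `offsets` and `chunkSize` parameters are hypothetical names, and which offsets to sample is left to the caller, as the answer suggests.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper illustrating the part-wise hash idea; not a library API.
public static class PartialHasher
{
    // Hashes up to chunkSize bytes at each offset and XORs the MD5 digests together.
    public static string ComputePartialHash(string path, long[] offsets, int chunkSize)
    {
        byte[] combined = new byte[16];      // an MD5 digest is 16 bytes
        byte[] buffer = new byte[chunkSize];

        using (MD5 md5 = MD5.Create())
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            foreach (long offset in offsets)
            {
                if (offset < 0 || offset >= fs.Length)
                    continue;                // skip offsets outside the file

                fs.Seek(offset, SeekOrigin.Begin);

                // Read may return fewer bytes than requested, so loop until the
                // buffer is full or the end of the file is reached.
                int total = 0;
                int read;
                while (total < chunkSize && (read = fs.Read(buffer, total, chunkSize - total)) > 0)
                    total += read;

                byte[] partHash = md5.ComputeHash(buffer, 0, total);
                for (int i = 0; i < combined.Length; i++)
                    combined[i] ^= partHash[i];
            }
        }

        // Lowercase hex, matching the "x2" formatting used in the question.
        return BitConverter.ToString(combined).Replace("-", "").ToLowerInvariant();
    }
}
```

Two caveats are worth keeping in mind. A partial hash cannot detect a modification that falls outside the sampled regions, so it is weaker than a full-file MD5 for verifying a copy. XOR is also order-insensitive, and two identical parts cancel each other out, so hashing the same region twice yields all zeros.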

Hope that helps...

Edit: changed to bitwise XOR.
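
For option 1 above, a minimal sketch of a hasher that runs on the machine holding the files might look like this. It assumes you are able to run code on that machine; `ManifestWriter`, `HashFile`, `WriteManifest`, and the one-line-per-file manifest format are all hypothetical choices for illustration, not something specified in the question.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes files locally so only the small manifest crosses the network.
public static class ManifestWriter
{
    // Computes the MD5 of one file as a lowercase hex string.
    public static string HashFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            return BitConverter.ToString(md5.ComputeHash(fs)).Replace("-", "").ToLowerInvariant();
        }
    }

    // Walks rootDir recursively and writes one "<hash>  <relative path>" line per file.
    // Keep manifestPath outside rootDir so the manifest does not hash itself.
    // Returns the number of files hashed.
    public static int WriteManifest(string rootDir, string manifestPath)
    {
        string fullRoot = Path.GetFullPath(rootDir).TrimEnd(Path.DirectorySeparatorChar)
                          + Path.DirectorySeparatorChar;
        int count = 0;

        using (StreamWriter writer = new StreamWriter(manifestPath))
        {
            foreach (string file in Directory.EnumerateFiles(fullRoot, "*", SearchOption.AllDirectories))
            {
                try
                {
                    string relative = file.Substring(fullRoot.Length);
                    writer.WriteLine(HashFile(file) + "  " + relative);
                    count++;
                }
                catch (IOException ex)
                {
                    // Report unreadable files and continue with the rest.
                    Console.Error.WriteLine("Skipped " + file + ": " + ex.Message);
                }
            }
        }
        return count;
    }
}
```

The client would then fetch only the manifest and compare its entries against hashes of the local copies, instead of pulling every file across the 100 Mbit/s link a second time.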
