High-speed MD5/SHA hashing over a network in C#
In a C# project that I am currently working on, we're attempting to calculate the MD5 of a large quantity of files over a network (current pot is 2.7 million, client pot may be in excess of 10 million). With the number of files that we are processing, speed is an issue.
The reason we do this is to verify the file was copied to a different location without modification.
We currently use the following code to calculate the MD5 of a file:
MD5 md5 = new MD5CryptoServiceProvider();
StringBuilder sb = new StringBuilder();
byte[] hashMD5 = null;
try
{
    // Open stream to file to get MD5 hash for, create hash
    using (FileStream fsMD5 = new FileStream(sFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        hashMD5 = md5.ComputeHash(fsMD5);
}
catch (Exception ex)
{
    clsLogging.logError(clsLogging.ErrorLevel.ERROR, ex);
}
string md5sum = "";
if (hashMD5 != null)
{
    // Change hash into readable text
    foreach (byte hex in hashMD5)
        sb.Append(hex.ToString("x2"));
    md5sum = sb.ToString();
}
However, this isn't as fast as my manager was hoping for. We've made a number of changes to how, and for how many files, we calculate the MD5 (i.e. we didn't do it for files that we don't copy... until today, when my manager changed his mind, so ALL files must have an MD5 calculated for them, in case at some future time a client wishes to bugger with our program so that all files are copied, I guess).
I realize that the speed of the network is probably a major contributing factor (100Mbit/s). Is there an efficient way to calculate the MD5 of the contents of a file over a network?
Thanks in advance.
Trevor Watson
Edit: put all code in block instead of just a part of it.
Comments (3)
The bottleneck is that the whole file must be streamed/copied over the network, and your code looks fine...
The different hash functions (MD5/SHA-256/SHA-512) have almost the same computation time.
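To put the bottleneck in numbers, here is a rough back-of-the-envelope sketch. The 1 MB average file size is purely an assumption (the question does not give one), and `EstimateTransferTime` is a hypothetical helper; it ignores protocol overhead and latency, so real transfers will be slower.

```csharp
using System;

// Time needed just to move the bytes across the link.
// Ignores protocol overhead and latency, so real numbers will be worse.
static TimeSpan EstimateTransferTime(long totalBytes, double linkMegabitsPerSecond)
{
    double bytesPerSecond = linkMegabitsPerSecond * 1_000_000 / 8.0; // 100 Mbit/s = 12.5 MB/s
    return TimeSpan.FromSeconds(totalBytes / bytesPerSecond);
}

// Hypothetical workload: 2.7 million files averaging 1 MB each (average size is an assumption).
long totalBytes = 2_700_000L * 1_000_000L;
TimeSpan t = EstimateTransferTime(totalBytes, 100);
Console.WriteLine($"{t.TotalHours:F0} hours"); // prints "60 hours"
```

Under that assumption the link alone costs about 60 hours, no matter which hash function runs on the receiving end.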
Two possible solutions for this problem:
1) Run a hasher on the remote system and store the hashes in separate files, if that is possible in your environment.
2) Create a part-wise hash of the file, so that you only copy a part of the file.
I mean something like that:
You have to test which parts of the file are best to read so that the hashes stay unique.
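A minimal sketch of one way a part-wise hash could look; this is not the answerer's code. It assumes that sampling the start, middle and end of the file plus its length is good enough for your data, and it feeds those samples into a single MD5 rather than combining hashes with XOR. `PartialMd5` and `sampleSize` are hypothetical names. Note the trade-off: a change outside the sampled regions that keeps the length the same goes undetected.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hashes the file length plus three samples (start, middle, end) of up to sampleSize bytes each.
// Only the sampled regions are read; changes elsewhere in the file are NOT detected.
static string PartialMd5(string path, int sampleSize)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var md5 = MD5.Create())
    using (var sampled = new MemoryStream())
    {
        long length = fs.Length;
        sampled.Write(BitConverter.GetBytes(length), 0, sizeof(long));

        // Sample offsets, clamped so they stay inside the file
        long lastStart = Math.Max(0, length - sampleSize);
        long[] offsets = { 0, Math.Min(length / 2, lastStart), lastStart };
        var buffer = new byte[sampleSize];
        foreach (long offset in offsets)
        {
            fs.Seek(offset, SeekOrigin.Begin);
            int total = 0, read;
            // Read may return fewer bytes than requested, so loop until the sample is full or EOF
            while (total < sampleSize && (read = fs.Read(buffer, total, sampleSize - total)) > 0)
                total += read;
            sampled.Write(buffer, 0, total);
        }
        byte[] hash = md5.ComputeHash(sampled.ToArray());
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}

// Usage (hypothetical path):
// string h = PartialMd5(@"\\server\share\file.bin", 64 * 1024);
```

With 64 KB samples, at most 192 KB per file crosses the network regardless of file size. Both sides of the comparison have to use the same sample size.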
Hope that helps...
Edit: changed to bitwise XOR
One possible approach would be to make use of the Task Parallel Library in .NET 4.0. 100Mbps will still be a bottleneck, but you should see a modest improvement.
I wrote a small application last year that walks the top levels of a folder tree checking folder and file security settings. Running over a 10Mbps WAN, it took about 7 minutes to complete one of our large file shares. When I parallelised the operation, the execution time came down to a bit over 1 minute.
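As a rough sketch of what that could look like for the hashing case: the helper names `Md5OfFile` and `HashAll` and the `maxParallel` parameter are made up for illustration, and the right degree of parallelism is something you would have to measure on your own network. Failed files get an empty string, mirroring the `md5sum = ""` default in the question's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

static string Md5OfFile(string path)
{
    // One MD5 instance per call: hash algorithm instances must not be shared across threads
    using (var md5 = MD5.Create())
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        return BitConverter.ToString(md5.ComputeHash(fs)).Replace("-", "").ToLowerInvariant();
}

// Hashes files concurrently; returns path -> hex MD5, or "" where the file could not be read.
static ConcurrentDictionary<string, string> HashAll(IEnumerable<string> paths, int maxParallel)
{
    var results = new ConcurrentDictionary<string, string>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
    Parallel.ForEach(paths, options, path =>
    {
        try { results[path] = Md5OfFile(path); }
        catch (IOException) { results[path] = ""; }
        catch (UnauthorizedAccessException) { results[path] = ""; }
    });
    return results;
}

// Usage (hypothetical share):
// var hashes = HashAll(Directory.EnumerateFiles(@"\\server\share", "*", SearchOption.AllDirectories), 8);
```

With millions of small files, much of the gain probably comes from overlapping the per-file open and close round trips rather than from extra bandwidth, which matches the experience described above.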
Why don't you try installing a 'client' on each machine that listens on a port and, when signaled, calculates the MD5 hash for the files requested?
The main server will then only need to ask each client to calculate the MD5. Using this distributed approach, you will gain the combined speed of all the clients and reduce network congestion.
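A bare-bones sketch of that idea, assuming a made-up one-line protocol: the server sends a path that is local to the agent, and the agent replies with the hex MD5 or the word ERROR. `ServeOne` and `RequestHash` are hypothetical names. There is no authentication or path restriction here, so treat it as an illustration only.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Security.Cryptography;
using System.Text;

// Agent side: handles one connection. Reads a path (one line), replies with the hex MD5 or "ERROR".
static void ServeOne(TcpListener listener)
{
    using (TcpClient client = listener.AcceptTcpClient())
    using (NetworkStream stream = client.GetStream())
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    using (var writer = new StreamWriter(stream, new UTF8Encoding(false)) { AutoFlush = true })
    {
        string path = reader.ReadLine();
        try
        {
            // The file is read from the agent's local disk, so only the hash crosses the network
            using (var md5 = MD5.Create())
            using (var fs = File.OpenRead(path))
                writer.WriteLine(BitConverter.ToString(md5.ComputeHash(fs)).Replace("-", "").ToLowerInvariant());
        }
        catch (Exception) { writer.WriteLine("ERROR"); }
    }
}

// Server side: asks the agent at host:port for the hash of a path local to that agent.
static string RequestHash(string host, int port, string remoteLocalPath)
{
    using (var client = new TcpClient(host, port))
    using (NetworkStream stream = client.GetStream())
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    using (var writer = new StreamWriter(stream, new UTF8Encoding(false)) { AutoFlush = true })
    {
        writer.WriteLine(remoteLocalPath);
        return reader.ReadLine();
    }
}

// Usage (hypothetical port and path):
//   agent:  var l = new TcpListener(IPAddress.Any, 9000); l.Start(); while (true) ServeOne(l);
//   server: string hash = RequestHash("fileserver01", 9000, @"D:\data\file.bin");
```

Each request moves a path and 32 hex characters over the wire instead of the whole file, which is where the reduction in network traffic comes from.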