How can I speed up MD5 checksum generation in VB.NET?

Posted on 2024-08-25 14:28:00

I'm working with some very large files residing on P2 (Panasonic) cards. Part of our process is to first generate a checksum of the file we are going to copy, then copy the file, then run a checksum on the copy to confirm that it copied correctly. The problem is that the files are large (70 GB+) and the process takes a long time to complete. This is an issue because we will eventually be dealing with thousands of these files.

I would like to find a faster way to generate the checksum than using System.Security.Cryptography.MD5CryptoServiceProvider. I don't care if this means using a specialized hardware card, provided it works and is not too ungodly expensive. I would prefer a method that provides some feedback on how far along the process is, so I can display progress as I do now.

The application is written in VB.NET. I would prefer to use the solution as a component, library, or reference within my application, but I'm willing to call an outside application if it improves the checksum speed enough.

Needless to say, the checksum must be consistent and correct. :-)

Thank you in advance for your time and efforts,

Richard

Comments (2)

爺獨霸怡葒院 2024-09-01 14:28:00

I see one potential way to speed up this process: calculate the MD5 of the source file while performing the copy, not prior to it. This will reduce the number of times you'll need to read the entire file from 3 (source hash, copy, destination hash) to 2 (copy, destination hash).

The downside is that you'll have to write your own copying code (as opposed to just relying on System.IO.File.Copy), and there's a non-zero chance that it will turn out to be slower than the 3-step process anyway.
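
A minimal sketch of such a copy-and-hash loop, assuming a plain FileStream copy is acceptable for your cards. The names CopyAndHash and BufferSize and the progress callback are hypothetical, and the 4 MB buffer is an arbitrary starting point to tune by measurement. Each chunk is fed to the hash with TransformBlock before it is written out, and progress is reported as it goes:

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module CopyWithHash
    ' Hypothetical helper: copies sourcePath to destPath in one pass and
    ' returns the MD5 of the bytes that were read from the source.
    Function CopyAndHash(sourcePath As String,
                         destPath As String,
                         Optional progress As Action(Of Long, Long) = Nothing) As Byte()
        Const BufferSize As Integer = 4 * 1024 * 1024 ' assumed value, not a recommendation
        Dim chunk(BufferSize - 1) As Byte

        Using hasher As MD5 = MD5.Create(),
              source As New FileStream(sourcePath, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, BufferSize, FileOptions.SequentialScan),
              target As New FileStream(destPath, FileMode.Create, FileAccess.Write,
                                       FileShare.None, BufferSize)
            Dim total As Long = source.Length
            Dim copied As Long = 0
            Dim read As Integer = source.Read(chunk, 0, chunk.Length)

            While read > 0
                ' Hash the chunk, then write the same bytes to the destination.
                hasher.TransformBlock(chunk, 0, read, Nothing, 0)
                target.Write(chunk, 0, read)
                copied += read
                If progress IsNot Nothing Then progress.Invoke(copied, total)
                read = source.Read(chunk, 0, chunk.Length)
            End While

            ' Finalize the hash with an empty last block.
            hasher.TransformFinalBlock(chunk, 0, 0)
            Return hasher.Hash
        End Using
    End Function
End Module
```

The destination file still needs its own hash pass afterwards to confirm the copy. BitConverter.ToString on the returned bytes gives a displayable string for comparison.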

Other than that, I don't think there's much you can do here, as the entire process is I/O-bound by design. You're spending most of your time reading and writing the file, and even at 100 MB/s (a respectable I/O speed for a typical SATA drive), you'll move about 5.8 GB/min at best, so a single pass over a 70 GB file takes roughly 12 minutes.

With a modern processor, the overhead of calculating the MD5 (or anything else) doesn't factor into things very much, so speeding it up won't improve your overall throughput. Crypto accelerators in particular won't help you here, as unless the driver implementation is very efficient, they'll add more overhead due to context switches required to feed the data to the external card than they'll save.

What you do want to improve is the I/O speed. The .NET Framework is already pretty efficient when it comes to this (using nicely sized buffers, overlapped I/O and such), but it's possible that an optimized native Windows application will perform better here. My advice: search for a few native MD5 calculators and see how they compare to your current .NET implementation. If the difference in hash calculation speed is more than 10%, it's worth switching to the external app.

雨夜星沙 2024-09-01 14:28:00

The correct answer is to avoid using MD5. MD5 is a cryptographic hash function, designed to provide certain cryptographic features. For merely detecting accidental corruption, it is way over-engineered and slow. There are many faster checksums, the design of which can be understood by examining the literature on error detection and correction. Some common examples are the CRC checksums, of which CRC32 is very common, but you can also relatively easily compute 64-bit, 128-bit, or even larger CRCs much faster than an MD5 hash.
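
As a rough illustration, a table-driven CRC-32 can be written in a few lines of VB.NET. This is a sketch, not a tuned implementation: the module and function names are made up for the example, it uses the common reflected polynomial &HEDB88320 (the one used by zip and PNG), and a byte-at-a-time loop like this is not guaranteed to beat the native MD5 provider, so benchmark it on your own files before switching.

```vb
Imports System
Imports System.IO

Module Crc32Checksum
    ' Lookup table for the reflected CRC-32 polynomial &HEDB88320.
    Private ReadOnly Table As UInteger() = BuildTable()

    Private Function BuildTable() As UInteger()
        Dim t(255) As UInteger
        For i As Integer = 0 To 255
            Dim c As UInteger = CUInt(i)
            For bit As Integer = 1 To 8
                If (c And 1UI) <> 0UI Then
                    c = &HEDB88320UI Xor (c >> 1)
                Else
                    c = c >> 1
                End If
            Next
            t(i) = c
        Next
        Return t
    End Function

    ' Folds count bytes of data into a running CRC register.
    Function UpdateCrc32(crc As UInteger, data As Byte(), count As Integer) As UInteger
        For i As Integer = 0 To count - 1
            crc = Table(CInt((crc Xor data(i)) And &HFFUI)) Xor (crc >> 8)
        Next
        Return crc
    End Function

    ' Hypothetical helper: streams a file through the CRC in 1 MB chunks.
    Function ComputeFileCrc32(filePath As String) As UInteger
        Dim chunk(1024 * 1024 - 1) As Byte
        Dim crc As UInteger = &HFFFFFFFFUI ' standard initial value
        Using stream As New FileStream(filePath, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, chunk.Length, FileOptions.SequentialScan)
            Dim read As Integer = stream.Read(chunk, 0, chunk.Length)
            While read > 0
                crc = UpdateCrc32(crc, chunk, read)
                read = stream.Read(chunk, 0, chunk.Length)
            End While
        End Using
        Return crc Xor &HFFFFFFFFUI ' standard final inversion
    End Function
End Module
```

For the ASCII bytes of "123456789", this CRC-32 variant should yield &HCBF43926, which makes a handy sanity check. Because the loop processes one chunk at a time, it can report progress the same way an incremental MD5 loop does.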
