Fast implementation of MD5 in C++
First of all, to be clear: I'm aware that a huge number of MD5 implementations exist for C++. What I'd like to know is whether there is a comparison showing which implementations are faster than the others. Since I'm running this MD5 hash function on files larger than 10 GB, speed really is a major concern here.
Comments (4)
There is a table available here:
http://www.golubev.com/gpuest.htm
It looks like your bottleneck will probably be your hard drive I/O.
I think the point avakar is trying to make is this: with modern processing power, the I/O speed of your hard drive is the bottleneck, not the calculation of the hash. A more efficient implementation will not help you, because hashing is (most likely) not the slowest step.
If you are doing anything special (thousands of rounds, for example) it may be different, but if you are just calculating the hash of a file, you need to speed up your I/O, not your math.
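One way to check this on your own machine is to time a plain sequential read of the file with no hashing at all. The sketch below is only an illustration: `ReadStats`, `measure_sequential_read` and the 1 MiB default chunk size are made-up names and values, not part of any library. Keep in mind that a second run over the same file may be served from the OS page cache and look much faster than the disk really is.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Result of one sequential pass over a file (hypothetical helper type).
struct ReadStats {
    std::uint64_t bytes = 0;  // total bytes read
    double seconds = 0.0;     // wall-clock time spent reading
};

// Reads the file front to back in fixed-size chunks and discards the data.
// `bytes` stays 0 if the file cannot be opened.
ReadStats measure_sequential_read(const std::string& path,
                                  std::size_t chunk_size = 1 << 20) {
    ReadStats stats;
    if (chunk_size == 0) chunk_size = 1;  // avoid a zero-length read loop

    std::ifstream in(path, std::ios::binary);
    if (!in) return stats;

    std::vector<char> buffer(chunk_size);
    const auto start = std::chrono::steady_clock::now();
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() reports the bytes actually read, including a short last chunk.
        stats.bytes += static_cast<std::uint64_t>(in.gcount());
    }
    const auto stop = std::chrono::steady_clock::now();
    stats.seconds = std::chrono::duration<double>(stop - start).count();
    return stats;
}
```

Dividing `bytes` by `seconds` gives the read throughput. If your MD5 run over the same file takes about as long as this read-only pass, a faster MD5 implementation will not buy you much.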
I don't think it matters much (on the same hardware; GPGPUs are indeed different hardware, and perhaps faster for that kind of problem). The core of MD5 is a loop of 32-bit additions, rotations and bitwise operations applied to each 64-byte block. What does matter is the quality of the compiler's optimizations.
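To give an idea of what the compiler has to optimize, here is a minimal sketch of a single round-1 step as described in RFC 1321. The function names are my own and this is not a complete MD5; a real implementation runs 64 such steps per block with different mixing functions, constants and shift amounts.

```cpp
#include <cstdint>

// Rotate a 32-bit word left by s bits (valid for 0 < s < 32).
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned s) {
    return (x << s) | (x >> (32 - s));
}

// Round-1 mixing function F: takes bits of c where b is 1, bits of d where b is 0.
constexpr std::uint32_t md5_f(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
    return (b & c) | (~b & d);
}

// One round-1 step: a = b + rotl(a + F(b, c, d) + m + k, s),
// where m is a message word, k a per-step constant and s the shift amount.
// All additions wrap modulo 2^32.
constexpr std::uint32_t md5_round1_step(std::uint32_t a, std::uint32_t b,
                                        std::uint32_t c, std::uint32_t d,
                                        std::uint32_t m, std::uint32_t k,
                                        unsigned s) {
    return b + rotl32(a + md5_f(b, c, d) + m + k, s);
}
```

Every step depends on the result of the previous one, so most of the speed difference between implementations comes from how well the compiler schedules and unrolls this chain.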
How you read the file also matters. On Linux, mmap, madvise and readahead could be relevant. Disk speed is probably the bottleneck (use an SSD if you can).
And are you sure you want MD5 specifically? There are simpler and faster hash algorithms (MD4, etc.). Still, your problem is more I/O-bound than CPU-bound.
I'm sure there are plenty of CUDA/OpenCL adaptations of the algorithm out there, which should give you a definite speedup when many inputs can be hashed in parallel (a single MD5 stream is processed block by block, in order). You could also take the basic algorithm, think it through a bit, and get your own CUDA/OpenCL implementation going.
Block ciphers are perfect candidates for this type of implementation.
You could also get a C implementation and grab a copy of the Intel C compiler to see how well it does. The vectorization extensions in Intel CPUs are amazing for speed boosts.