Java - Hash algorithms - Fastest implementations
I want to know what the best and fastest implementations of hash algorithms for Java are, especially MD5 and SHA-2 (SHA-512 or SHA-256). I want a function that takes a string as an argument and returns the hash as the result. Thank you.
Edit: This is for mapping each URL to a unique hash. Since MD5 is not that reliable in this area, I'm more interested in finding the best and fastest implementation of the SHA-2 algorithms. Note that I know even SHA-2 might produce the same hash for some URLs, but I can live with that.
First things first: speed is overrated. You should measure before declaring that a given algorithm is "too slow". Most of the time, hash function speed makes no noticeable difference anyway. If you have qualms about security, then first select a hash function which is secure enough, and only then worry about performance.
Moreover, you want to hash "strings". A Java String is, internally, a chunk from an array of char values which represent Unicode code points (actually, Unicode 16-bit code units which encode the code points using UTF-16). A hash function takes as input a sequence of bits or bytes. So you will have to make a conversion step, e.g. str.getBytes("UTF-8"), to obtain your string as a bunch of bytes. It is likely that the conversion step will have a non-negligible cost when compared to the hashing itself.
Note: beware of URL-encoding! In a URL, some bytes can be replaced with sequences beginning with a '%' sign; this is meant to support non-printable characters, but it can be used on "standard" characters as well (e.g., replacing 'a' with '%61'). This means that two strings which are distinct (in the String.equals() sense) may actually represent the same URL (as far as URL processing is concerned). Depending on your situation, this may or may not be an issue.
You should first try to use Java's MessageDigest API with the standard (already installed) JCE provider (i.e. you call MessageDigest.getInstance("SHA-256")), and benchmark the result. Theoretically, the JCE may map the call to an implementation with "native" code (written in C or assembly), which will be faster than what you can get with Java.
That being said...
sphlib is an open-source implementation of many cryptographic hash functions, in C and in Java. The code has been optimized for speed, and, in practice, the Java version turns out to be faster than what the standard JRE from Sun/Oracle offers. Use this link in case the previous link fails (the main host server is sometimes down for maintenance, as seems to be the case right now) (warning: 10 MB download). The archive also contains a report (which was presented at the second SHA-3 candidate conference in 2010) which gives some measured performance figures on several platforms, for SHA-2 and the 14 "second round" candidates for the upcoming SHA-3.
But you really should run benchmarks in your actual situation. For instance, effects on the L1 cache can have a drastic effect on performance, and cannot be accurately predicted by taking the function code and running it in isolation.
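To make the "measure first" advice concrete, here is a rough timing sketch using only the standard MessageDigest API, with the UTF-8 conversion included in the timed loop. The class name HashBench, the method timeHashing and the example.com URLs are made up for illustration. This is not a rigorous benchmark (a harness such as JMH would handle warm-up and JIT effects more carefully), and as noted above, numbers taken in isolation may not carry over to a real application.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashBench {
    /** Hashes every URL `rounds` times and returns the elapsed time in nanoseconds. */
    static long timeHashing(String algorithm, String[] urls, int rounds) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("No provider offers " + algorithm, e);
        }
        int sink = 0; // consume the results so the work cannot be optimized away
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String url : urls) {
                // String -> bytes conversion is timed too, since it is part of the real cost
                byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
                sink += digest[0];
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Integer.MIN_VALUE) {
            System.out.print(""); // keeps `sink` observable
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; use your own URLs for meaningful numbers.
        String[] urls = new String[1000];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = "http://example.com/page/" + i;
        }
        for (String alg : new String[] {"MD5", "SHA-256", "SHA-512"}) {
            timeHashing(alg, urls, 50); // crude warm-up pass
            int rounds = 200;
            long ns = timeHashing(alg, urls, rounds);
            double perSecond = (double) rounds * urls.length / (ns / 1e9);
            System.out.printf("%s: %.0f hashes/s%n", alg, perSecond);
        }
    }
}
```

Swapping in another provider or library only requires changing how the digest object is obtained, so the same loop can compare implementations on your own data.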
Edit: I originally read the question as what's "the fastest hash algorithm" and it has been clarified to be "the fastest implementation of each algorithm". It's a valid question and others have pointed out faster implementations. However unless you're hashing large amounts of data in a short amount of time, it's simply not going to matter very much. I doubt it's usually worth the time and complexity to use something other than what's provided with the standard JCE.
For URL addresses you'd need to be hashing with SHA-256 upward of a million per second on modern hardware to require something faster. I can't imagine most applications needing more than a thousand per second (over 86 million per day), which means the overall CPU time spent hashing would be far less than 1%. So even if you had an infinitely fast hash algorithm you'd only be able to improve overall performance by 1% at best.
Original Answer: Wanting both the best and the fastest is contradictory: the better hashes are generally slower. If you really need speed and security isn't much of a concern, then use MD5. If you need the best security, then go with SHA-256 or even SHA-512. You haven't mentioned what you're using it for, so it's hard to recommend one or the other. You're probably safest going with SHA-256, as it should be fast enough for most use cases on modern hardware anyhow. Here's how you can do it:
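The code that originally followed is not preserved here. A minimal sketch of what a SHA-256 call with the standard MessageDigest API can look like is below; the class and method names (Sha256Example, sha256) are my own, and the input is assumed to be encoded as UTF-8 before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Example {
    /** Returns the raw 32-byte SHA-256 digest of the UTF-8 encoding of the input. */
    static byte[] sha256(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return md.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] digest = sha256("http://example.com/");
        System.out.println(digest.length + " bytes");
    }
}
```

Changing the algorithm name to "SHA-512" or "MD5" gives the other digests mentioned above.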
If you're using this for security purposes, like hashing a password, then you should add salt to the digest as well. If you want a printable string out of the hash, you can encode it back to a string as hex:
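The original snippet is missing here as well. One way to do the hex step, as a sketch with made-up names (HexDigestExample, toHex, sha256Hex), is shown below; it covers only the hex encoding, not salting.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HexDigestExample {
    /** Encodes bytes as a lowercase hexadecimal string, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16)); // high nibble
            sb.append(Character.forDigit(b & 0xF, 16));        // low nibble
        }
        return sb.toString();
    }

    /** SHA-256 of the UTF-8 bytes of the input, as a 64-character hex string. */
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return toHex(md.digest(input.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("http://example.com/"));
    }
}
```

The hex form doubles the size (64 characters for 32 bytes), so if the hash is only used as a map key, keeping the raw bytes may be the cheaper choice.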
Check these out: Lots of SHA / MD5 examples
Also:
From same thread: Fast MD5
String hash = MD5.asHex(MD5.getHash(new File(filename)));
Another thing to consider is using MD4. It is not as safe as MD5, but is computed even faster. Windows up to XP used to store and exchange passwords in MD4, so we use this hash because it still allows us to provide authentication services to this platform.
Consider BLAKE2 which is faster and more secure than the hashes mentioned above.
MD5, SHA-1, SHA-256, and SHA-512 are susceptible to length-extension attacks.
MD5 and SHA-1 are vulnerable to collisions.
MD5 is vulnerable to chosen-prefix collisions.
SHA-3 and BLAKE2 have no known security issues and can generate digests of varying length.
SHA-3 is fastest when implemented in hardware; BLAKE2 is fastest when using software implementations.
BLAKE2b is optimized for 64-bit platforms and produces digests of any size between 1 and 64 bytes.
BLAKE2s is optimized for 8 to 32-bit platforms and produces digests of any size between 1 and 32 bytes.
Here are benchmarks for AES, MD5, SHA-256, and BLAKE2b.
https://blake2.net/
https://www.cryptopp.com/benchmarks.html
In the first link, BLAKE2b (947 Mbits) is much faster than SHA-256 (413 Mbits) and MD5 (632 Mbits).
In the second link, AES-256 CBC (805 Mbits) and BLAKE2b (776 Mbits) are about equal in speed and faster than SHA-256 (275 Mbits) and MD5 (602 Mbits).
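For Java specifically, it may help to check which of these algorithms the installed providers actually offer before benchmarking them. The sketch below uses only MessageDigest; the class and method names are made up. Two assumptions: "SHA3-256" should be available from Java 9 onward, and, as far as I know, the JDK's bundled providers do not include BLAKE2, so a third-party provider such as Bouncy Castle would have to be registered. The name "BLAKE2B-256" is how I believe such a provider registers it, so treat it as a guess.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestLengths {
    /** Digest length in bytes, or -1 if no installed provider offers the algorithm. */
    static int digestLength(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength();
        } catch (NoSuchAlgorithmException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // "BLAKE2B-256" is an assumed provider-specific name; expect "not available"
        // unless a provider that supplies it has been registered.
        String[] names = {"MD5", "SHA-1", "SHA-256", "SHA-512", "SHA3-256", "BLAKE2B-256"};
        for (String name : names) {
            int len = digestLength(name);
            System.out.println(name + ": " + (len < 0 ? "not available" : len + " bytes"));
        }
    }
}
```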
For a string, just call hashCode(), because it is cheaper in memory overhead. Otherwise I recommend this code for a private hash:
FYI:
http://lemire.me/blog/2015/10/22/faster-hashing-without-effort/