Can someone verify the correctness of getting the MD5 hash with this method?
MessageDigest m=MessageDigest.getInstance("MD5");
StringBuffer sb = new StringBuffer();
if(nodeName!=null) sb.append(nodeName);
if(nodeParentName!=null) sb.append(nodeParentName);
if(nodeParentFieldName!=null) sb.append(nodeParentFieldName);
if(nodeRelationName!=null) sb.append(nodeRelationName);
if(nodeViewName!=null) sb.append(nodeViewName);
if(treeName!=null) sb.append(treeName);
if(nodeValue!=null && nodeValue.trim().length()>0) sb.append(nodeValue);
if(considerParentHash) sb.append(parentHash);
m.update(sb.toString().getBytes("UTF-8"),0,sb.toString().length());
BigInteger i = new BigInteger(1,m.digest());
hash = String.format("%1$032X", i);
The idea behind these lines of code is that we append all the values of a class/model to a StringBuffer and then return the padded hash of the result (the hex string we get back from Java is sometimes only 30 or 31 characters long, so the last line pads it with leading zeros).
I can verify that this works, but I have a feeling it fails at some point (our application fails, and I believe this is the probable cause).
Can anyone see a reason why this wouldn't work? Are there any workarounds to make this code less error-prone (e.g. removing the need for the strings to be UTF-8)?
There are a few odd things in your code.
A character may take more than one byte in UTF-8. So you should not pass the string length as the final parameter of the update() call, but the length of the byte array that getBytes() actually returned. As suggested by Paŭlo, use the update() method that takes a single byte[] as its parameter.
The output of MD5 is a sequence of 16 bytes with quite arbitrary values. If you interpret it as an integer (which is what your call to BigInteger() does), you get a numerical value smaller than 2^128, possibly much smaller. When converted back to hexadecimal digits, you may get 32, 31, 30... or fewer than 30 characters. Your use of the "%032X" format string left-pads with enough zeros, so your code works, but it is rather indirect (the output of MD5 was never an integer to begin with).
You assemble the hash input elements with raw concatenation. This may cause problems. For instance, if nodeName is "foo" and nodeParentName is "barqux", then the MD5 input will begin with (the UTF-8 encoding of) "foobarqux". If nodeName is "foobar" and nodeParentName is "qux", then the MD5 input will also begin with "foobarqux". You do not say why you want to use a hash function, but usually one uses a hash function to get a unique trace of some piece of data; two distinct data elements should yield distinct hash inputs.
When handling nodeValue, you call trim(), which suggests that this string could begin and/or end with whitespace and that you do not want that whitespace in the hash input. But you do include it, since you append nodeValue and not nodeValue.trim().
If what you are trying to do has any relation to security, then you should not use MD5, which is cryptographically broken. Use SHA-256 instead.
Hashing an XML element is normally done through canonicalization (which handles whitespace, attribute order, text representation, and so on). See this question on the topic of canonicalizing XML data with Java.
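The points in this answer can be combined into a rough sketch. This is not the asker's actual class: the class name FieldHasher and its method names are hypothetical, and prefixing each field with a presence flag and its byte length is only one possible way to keep field boundaries unambiguous. The digest bytes are converted to hex directly, without going through BigInteger.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, written for illustration only.
public class FieldHasher {

    // Hashes the fields so that ("foo", "barqux") and ("foobar", "qux")
    // produce different inputs. Each field is written as a presence flag,
    // then its UTF-8 byte length, then the UTF-8 bytes themselves.
    public static String hash(String algorithm, String... fields) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (String field : fields) {
                if (field == null) {
                    out.writeBoolean(false);   // null is distinct from ""
                    continue;
                }
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                out.writeBoolean(true);
                out.writeInt(bytes.length);    // byte length, not char length
                out.write(bytes);
            }
            out.flush();
            MessageDigest md = MessageDigest.getInstance(algorithm);
            // digest(byte[]) consumes the whole array; no offset/length needed.
            return toHex(md.digest(buffer.toByteArray()));
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Two hex digits per byte, so leading zero bytes are never lost.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }
}
```

A call such as FieldHasher.hash("SHA-256", nodeName, nodeParentName, treeName) would then replace the StringBuffer concatenation. If whitespace around nodeValue should not count, pass nodeValue.trim() rather than nodeValue.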
One possible problem is here:
m.update(sb.toString().getBytes("UTF-8"),0,sb.toString().length());
As Robin Green said, the UTF-8 encoding can produce a byte[] that is longer than your original string (it does so exactly when the String contains non-ASCII characters). In that case, you are only hashing the start of your String.
Better to write it like this:
m.update(sb.toString().getBytes("UTF-8"));
Of course, non-ASCII characters in your string would not cause an exception, just a different hash from the one you would otherwise get. You should try to boil your failure down to an SSCCE, as lesmana recommended.
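To make the truncation effect concrete, here is a small sketch. The class LengthMismatchDemo and the sample strings are made up for illustration. It hashes the same text once with the three-argument update() call from the question and once with the whole byte array. For two strings that differ only in the second byte of their last character, the truncated variant ends up hashing identical bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, written for illustration only.
public class LengthMismatchDemo {

    // As in the question: hashes only the first input.length() bytes.
    public static byte[] md5Truncated(String input) throws NoSuchAlgorithmException {
        MessageDigest m = MessageDigest.getInstance("MD5");
        m.update(input.getBytes(StandardCharsets.UTF_8), 0, input.length());
        return m.digest();
    }

    // Hashes every byte of the UTF-8 encoding.
    public static byte[] md5Full(String input) throws NoSuchAlgorithmException {
        MessageDigest m = MessageDigest.getInstance("MD5");
        m.update(input.getBytes(StandardCharsets.UTF_8));
        return m.digest();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String a = "caf\u00e9"; // 4 chars, 5 UTF-8 bytes (last char is C3 A9)
        String b = "caf\u00e8"; // 4 chars, 5 UTF-8 bytes (last char is C3 A8)

        System.out.println("chars: " + a.length()
                + ", bytes: " + a.getBytes(StandardCharsets.UTF_8).length);
        // Both truncated inputs are the bytes 'c', 'a', 'f', 0xC3.
        System.out.println("truncated hashes equal: "
                + Arrays.equals(md5Truncated(a), md5Truncated(b)));
        System.out.println("full hashes equal: "
                + Arrays.equals(md5Full(a), md5Full(b)));
    }
}
```

For pure ASCII input both methods agree, which would explain why the original code appears to work in most tests.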