UTF-16 encoding in Java compared with C#
I am trying to encode a string as UTF-16 and compute its MD5 hash. Strangely, Java and C# return different results when I do this.
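Here is the Java code:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class Md5Utf16 {
        public static void main(String[] args) throws NoSuchAlgorithmException {
            String str = "preparar mantecado con coca cola";
            MessageDigest digest = MessageDigest.getInstance("MD5");
            // Same charset as getBytes("UTF-16"), without the checked UnsupportedEncodingException
            digest.update(str.getBytes(StandardCharsets.UTF_16));
            byte[] hash = digest.digest();
            StringBuilder output = new StringBuilder();
            for (byte b : hash) {
                // Two lowercase hex digits per byte
                output.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
            }
            System.out.println(output);
        }
    }
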
The output for this is: 249ece65145dca34ed310445758e5504
Here is the C# code:

    using System;
    using System.Security.Cryptography;
    using System.Text;

    public static class Md5Example
    {
        public static string GetMD5Hash()
        {
            string input = "preparar mantecado con coca cola";
            // Encoding.Unicode is UTF-16 little-endian; GetBytes does not prepend a byte-order mark
            byte[] bs = Encoding.Unicode.GetBytes(input);
            using (MD5 md5 = MD5.Create())
            {
                bs = md5.ComputeHash(bs);
            }
            StringBuilder s = new StringBuilder();
            foreach (byte b in bs)
            {
                s.Append(b.ToString("x2")); // "x2" already produces lowercase hex
            }
            string output = s.ToString();
            Console.WriteLine(output);
            return output;
        }
    }

The output for this is: c04d0f518ba2555977fa1ed7f93ae2b3
I am not sure why the outputs differ. How can I change the code above so that both return the same output?
UTF-16 != UTF-16.

In Java, getBytes("UTF-16") returns a big-endian representation preceded by a big-endian byte-order mark (FE FF). C#'s System.Text.Encoding.Unicode.GetBytes returns a little-endian representation with no byte-order mark. I can't check your code from here, but I think you'll need to specify the conversion precisely. Try getBytes("UTF-16LE") in the Java version.
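Following this answer's suggestion, here is a minimal sketch of the Java side hashing UTF-16LE bytes instead. It assumes Java 7 or later for StandardCharsets; the class and method names (Md5Utf16Le, md5HexUtf16Le) are hypothetical, chosen only for illustration. If the C# output reported in the question is accurate, this should print the same digest as the C# code, because both now hash the same byte sequence.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only
public class Md5Utf16Le {
    // MD5 of the string's UTF-16LE bytes (no byte-order mark), which is
    // the byte sequence C#'s Encoding.Unicode.GetBytes produces
    public static String md5HexUtf16Le(String text) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK, which ships an MD5 implementation
            throw new IllegalStateException(e);
        }
        byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_16LE));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b & 0xff)); // two lowercase hex digits
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(md5HexUtf16Le("preparar mantecado con coca cola"));
    }
}
```

Using String.format("%02x", ...) instead of the Integer.toString trick from the question is only a readability choice; both yield two lowercase hex digits per byte.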
The first thing I notice (and it may not be the only problem) is that C#'s Encoding.Unicode.GetBytes() produces little-endian output, while Java's natural byte order is big-endian.
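To see the byte-order difference directly, here is a small sketch that prints the raw bytes Java produces for each UTF-16 variant. The class name Utf16ByteOrder and the sample string "pr" are arbitrary choices for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16ByteOrder {
    // Format bytes as space-separated two-digit uppercase hex values
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "pr";
        // "UTF-16": big-endian, with a leading big-endian byte-order mark
        System.out.println("UTF-16   " + hex(s.getBytes(StandardCharsets.UTF_16)));
        // "UTF-16BE": big-endian, no byte-order mark
        System.out.println("UTF-16BE " + hex(s.getBytes(StandardCharsets.UTF_16BE)));
        // "UTF-16LE": little-endian, no byte-order mark (same layout as C#'s Encoding.Unicode.GetBytes)
        System.out.println("UTF-16LE " + hex(s.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

This prints:

    UTF-16   FE FF 00 70 00 72
    UTF-16BE 00 70 00 72
    UTF-16LE 70 00 72 00

Only the last line matches what Encoding.Unicode.GetBytes returns. For the 32-character string in the question, the byte-order mark also makes Java's "UTF-16" output 66 bytes long instead of 64.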
You could use System.Text.Encoding.Unicode.GetString(byte[]) to convert the bytes back to a string. That way you can be sure everything happens in the same Unicode encoding.
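The same round-trip idea on the Java side, as a sketch (Utf16Decode is a hypothetical class name): Java's rough counterpart of Encoding.Unicode.GetString(byte[]) is new String(bytes, StandardCharsets.UTF_16LE). Decoding with the encoding that produced the bytes recovers the string. Decoding the same bytes with Java's "UTF-16" charset, which assumes big-endian when no byte-order mark is present, does not.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Decode {
    public static void main(String[] args) {
        String original = "coca cola";
        // Bytes as C#'s Encoding.Unicode.GetBytes would produce them: UTF-16LE, no BOM
        byte[] leBytes = original.getBytes(StandardCharsets.UTF_16LE);

        // Decoding with the same encoding recovers the original string
        String sameEncoding = new String(leBytes, StandardCharsets.UTF_16LE);

        // Java's "UTF-16" decoder assumes big-endian when there is no BOM,
        // so every code unit comes out byte-swapped ('c' 0x0063 becomes 0x6300)
        String mismatched = new String(leBytes, StandardCharsets.UTF_16);

        System.out.println(sameEncoding.equals(original)); // true
        System.out.println(mismatched.equals(original));   // false
    }
}
```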